j0rd1smit's comments

j0rd1smit · 2026-01-15T16:51:08 1768495868

j0rd1smit · on Feb 24, 2025

Any suggestions for good basic accounting learning resources?

dugmartin · on Feb 24, 2025

I have no association with the author but this site has been a good search result source for me over the last few years when I've had questions:

https://www.accountingcoach.com/

j0rd1smit · on March 17, 2024

This is also a huge problem in offline RL (learning a policy using only a dataset). If done naively, the learned policy will keep accumulating errors because it enters areas of the state space that are not well covered by the data. So the trick is to avoid these areas. In offline RL they do this by measuring epistemic uncertainty and using it as a regularization term in the loss function, so that the model learns to avoid these areas. This is a good blog post that explains it way better: https://jacobbuckman.com/2020-11-30-conceptual-fundamentals-...
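To make the uncertainty-penalty idea concrete, here is a minimal sketch. It assumes epistemic uncertainty is approximated by the disagreement (standard deviation) across an ensemble of Q-value estimates, which is only one common proxy and not necessarily what the linked post uses. The names `pessimistic_value`, `pick_action`, `beta` and all the numbers are hypothetical, and the penalty is applied at action selection instead of inside a full training loss to keep the example short.

```python
from statistics import mean, pstdev


def pessimistic_value(q_estimates, beta=1.0):
    """Mean of the ensemble's Q estimates minus beta times their spread.

    The spread stands in for epistemic uncertainty (an assumption):
    ensemble members tend to disagree where the dataset has little coverage.
    """
    return mean(q_estimates) - beta * pstdev(q_estimates)


def pick_action(ensemble_q, beta=1.0):
    """ensemble_q maps action -> list of Q estimates, one per ensemble member."""
    return max(ensemble_q, key=lambda a: pessimistic_value(ensemble_q[a], beta))


# Hypothetical estimates: one action is well covered by the dataset,
# the other is rarely seen, so the ensemble disagrees a lot about it.
ensemble_q = {
    "well_covered": [1.0, 1.1, 0.9, 1.0],
    "rarely_seen": [3.0, -1.0, 2.5, 0.5],
}

naive = pick_action(ensemble_q, beta=0.0)     # ignores uncertainty
cautious = pick_action(ensemble_q, beta=1.0)  # penalizes uncertainty

print(naive, cautious)
```

With `beta=0.0` the policy chases the higher but unreliable average of the rarely seen action; with `beta=1.0` the penalty outweighs that advantage and the policy stays where the data is. In an actual offline RL method the same kind of term would be added to the training loss.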

polygamous_bat · on March 17, 2024

A bigger issue with offline RL in the real world (i.e. not Atari video games) has been the assumption of reward labeling. Who’s giving you reward labels at scale? In my opinion, that’s why we haven’t seen any large-scale real-world success stories using offline RL.

AndrewKemendo · on March 17, 2024

I fully agree with you that instrumentation is one of the biggest barriers to collecting state, action, trajectory and reward feedback.

However, instrumentation assumes that there’s a control regime that could actually control whatever the system is mechanically, and that’s generally not true.

So it’s almost a chicken-and-egg problem: you can instrument non-autonomous control systems in order to get state-action-reward data, but because you don’t actually have an actuated control system that you can specify and build mechanically, your targets for the state-action-reward tuples aren’t the same.

That is to say, unless you’re actively collecting data from an autonomous system that’s being used non-autonomously, you’re not going to be able to transition from a non-autonomous control regime to an autonomous one.