
The example you're thinking of is actually in gridworld [1]. As you allude to, one of the parameters of the model is the cost of simply being alive for an additional time step. If the cost is negative (a reward), the agent will just sit there forever and accumulate infinite points. If it is zero, the agent might still just sit there to avoid falling into the hole, which carries a large penalty and ends the simulation. As you turn up the dial on the cost of living, the agent starts using more and more aggressive strategies to reach the goal quickly. But if you make the cost too big, it will just jump in the hole.
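To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not the Berkeley gridworld code and it does not solve an MDP; it just compares the expected return of four fixed strategies as the living cost changes. Every number in it (the +1 goal, the -10 hole, the step counts, the fall probabilities, the 100-step cap standing in for "forever") and the function names are assumptions I made up for illustration.

```python
def expected_returns(living_cost, horizon=100, goal=1.0, hole=-10.0):
    """Expected return of four fixed strategies (all numbers are hypothetical)."""

    def route(steps, p_fall):
        # Simplification: the agent is charged for every step of the route,
        # even on the runs where it falls into the hole early.
        return (1 - p_fall) * goal + p_fall * hole - living_cost * steps

    return {
        # Sitting never ends the episode, so cap it at `horizon` steps.
        "sit still": -living_cost * horizon,
        # Long detour: slower, but less likely to slip into the hole.
        "careful route": route(steps=8, p_fall=0.1),
        # Short path along the edge: faster, but riskier.
        "aggressive route": route(steps=4, p_fall=0.2),
        # Give up: one step, straight into the hole.
        "jump in hole": route(steps=1, p_fall=1.0),
    }


def best_strategy(living_cost):
    """Name of the strategy with the highest expected return."""
    returns = expected_returns(living_cost)
    return max(returns, key=returns.get)


if __name__ == "__main__":
    for cost in (-0.1, 0.0, 0.01, 0.5, 5.0):
        print(f"living cost {cost:>5}: {best_strategy(cost)}")
```

With these made-up numbers the winner goes from sitting still (costs -0.1 and 0), to the careful route (0.01), to the aggressive route (0.5), to jumping in the hole (5.0). A real gridworld agent would find its policy with something like value iteration over the whole grid, but the ordering of the regimes is the same idea.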

[1] https://inst.eecs.berkeley.edu/~cs188/fa18/assets/slides/lec...


