![Ricardo Hortelano Profile](https://pbs.twimg.com/profile_images/1705872362652966912/isBLZyTn_x96.jpg)
Ricardo Hortelano
@RHortelanoS
Followers
428
Following
6K
Statuses
4K
@molasalex If we set dt=0, I suppose we wouldn't see it run everyone over either. That would be another win, in a way... 😃
RT @XRarchitect: Finally got my Gaussian Splat picture up and running in Augmented Reality—all via the web This is how I want to capture p…
RT @suchenzang: the true bitter lesson: it's easier to lie, cheat, and steal than it is to actually do good work
Carmack gets it.
Offline reinforcement learning, where an agent tries to improve a behavior policy by observing another agent without actually playing, is a harder problem than it appears. The challenge isn’t to mimic the provided play, but to learn something better than what you have seen. The difference between online (traditional) RL and offline RL is that online RL is constantly "testing" its model by taking new actions as a result of changes to the model, while the offline training can bootstrap itself off into a coherent fantasy of great returns untested by reality. It may be just an artifact of value based RL in particular, but I am inclined to believe that it is a more fundamental truth about theoretical and observational science versus experimental science, and life in general.
@ID_AA_Carmack That's probably the next step on the ladder of causality from @yudapearl. An agent of that nature probably needs to learn from counterfactuals.