![Yuxi Li Profile](https://pbs.twimg.com/profile_images/1074541888139456512/LSaO-hEn_x96.jpg)
Yuxi Li
@yuxili99
Followers: 851 · Following: 600 · Statuses: 724
RL, AI, LLMs, agents, code, blockchain. Guest editor, MLJ SI. Co-chair for workshops at AAAI, ICML, NeurIPS. PhD @UAlberta.
Joined March 2012
@xwang_lk RLHF is inverse RL, so imitation learning. Different from "learning from demonstration" though, which is supervised learning, cf. "supervised fine-tuning". RLHF is not SL. RLHF is a principled approach to problems without a reward function, and most NLP problems lack objective objectives.
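For concreteness, a minimal sketch of the contrast drawn here, in standard notation (the symbols $\pi_\theta$, $r_\phi$, $\beta$ and the preference pair $(y_w, y_l)$ are the usual RLHF ones, not from the tweet): SFT does supervised learning on demonstrations, while RLHF first fits a reward model from preferences and then optimizes against it.

```latex
% Learning from demonstration / SFT: supervised learning on demos (x, y*)
\mathcal{L}_{\text{SFT}}(\theta) = -\,\mathbb{E}_{(x,\,y^{*}) \sim \mathcal{D}}\left[ \log \pi_\theta(y^{*} \mid x) \right]

% RLHF: no hand-written reward; fit r_phi from preference pairs (Bradley--Terry) ...
\mathcal{L}_{\text{RM}}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[ \log \sigma\!\left( r_\phi(x, y_w) - r_\phi(x, y_l) \right) \right]

% ... then maximize the learned reward under a KL penalty to the SFT policy
\max_\theta \; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\left[ r_\phi(x, y) \right] - \beta\, \mathrm{KL}\!\left( \pi_\theta(\cdot \mid x) \,\middle\|\, \pi_{\text{SFT}}(\cdot \mid x) \right)
```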
Paper by John Schultz, Jakub Adamek, @MatejJusup, @sharky6000, Michael Kaisers, @sarah_perrin_, Daniel Hennes, Jeremy Shar, Cannada Lewis, @anianruoss, @TZahavy, @PetarV_93, Laurel Prince, Satinder Singh, @ericmalmi and @weballergy
@zdhnarsil I think "PRM" is a misuse of terminology: there is no need to differentiate PRM from ORM, just RM. It should be a value function, or maybe reward shaping. Something is wrong in the way PRM is defined, e.g., with labels like (1, 0, -1). A short blog post, in mixed Chinese & English.
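One way to ground the "reward shaping" reading, assuming the standard potential-based shaping result (Ng, Harada & Russell, 1999), which is not spelled out in the tweet: take a potential $\Phi$ over states; identifying $\Phi$ with a value function yields a per-step "process" signal without changing the optimal policy.

```latex
% Potential-based shaping: add F to the environment reward r at every step
F(s, a, s') = \gamma\,\Phi(s') - \Phi(s)

% The shaped return telescopes, so optimal policies are preserved:
\sum_{t=0}^{T-1} \gamma^{t}\, F(s_t, a_t, s_{t+1}) = \gamma^{T}\Phi(s_T) - \Phi(s_0)
```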
@natolambert Seems there is no exact solution (from the answers so far and AFAIK). Can we say the numbers reported are "heuristic"? BTW, passing several tests, as with HumanEval, cannot guarantee code correctness. So, many (all?) code-generation papers are reporting "heuristic" results?
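A toy illustration of that point (a hypothetical function and test suite, not from HumanEval): a buggy implementation can pass a small test suite, so test pass rates are a heuristic proxy for correctness, not a proof.

```python
# Toy illustration (hypothetical example, not an actual HumanEval task):
# a wrong implementation that still passes a small test suite.

def is_prime(n: int) -> bool:
    """Intended: True iff n is prime. Buggy: declares every odd n >= 3 prime."""
    if n < 2:
        return False
    return n == 2 or n % 2 != 0  # wrong for odd composites such as 9 and 25

# A plausible-looking test suite; the buggy version passes all of it.
tests = [(2, True), (3, True), (4, False), (7, True), (10, False)]
for n, expected in tests:
    assert is_prime(n) == expected

print("all tests passed")  # yet is_prime(9) is True, so the code is incorrect
```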
@denny_zhou Can any LLM *guarantee* accuracy, not to mention optimality? Can any LLM beat AlphaZero at chess (without training data from an AI like AlphaZero)?
@sh_reya I talked about it. I am not an influencer though...
AI is NOT ready to automate programming yet! #artificialintelligence #LLM #LLMs #programming #SoftwareEngineering #SoftwareDevelopment #SoftwareEngineer #softwaretesting