Yuxi Li

@yuxili99

Followers: 851
Following: 600
Statuses: 724

RL, AI, LLMs, agent, code, blockchain. Guest editor, MLJ SI. Co-Chair for workshops in AAAI, ICML, NeurIPS. PhD @UAlberta.

Joined March 2012
@yuxili99
Yuxi Li
10 days
Building information perpetual motion machines? Reinforcement learning! Why not in 2023?
@yuxili99
Yuxi Li
20 days
@xwang_lk Thanks! NLP/LLMs are very different from AlphaGo.
@yuxili99
Yuxi Li
6 months
@karpathy Andrej’s tweet about RLHF is misleading. Why? A fundamental issue is that most NLP problems do not have objective objectives. RLHF is a promising way to learn the reward (objective) function. Pierluca basically explains it:
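To make "learning the reward function" concrete: a reward model can be fit from pairwise human comparisons without anyone writing down an explicit objective. A minimal sketch, assuming a linear reward over hand-made features and the Bradley-Terry preference model commonly used for RLHF reward models; the feature names and the comparison data are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def reward(w, x):
    # Linear reward model: r(x) = w . x over hand-made features.
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Fit w by gradient ascent on the Bradley-Terry log-likelihood,
    i.e. the sum over pairs of log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d/dw log sigmoid(margin) = (1 - sigmoid(margin)) * (chosen - rejected)
            g = 1.0 - sigmoid(margin)
            w = [wi + lr * g * (c - r) for wi, c, r in zip(w, chosen, rejected)]
    return w

# Hypothetical features per response: (answers_the_question, is_rude).
# Each pair is (response the labeler preferred, response they rejected).
PAIRS = [
    ((1.0, 0.0), (0.0, 0.0)),
    ((0.0, 0.0), (0.0, 1.0)),
    ((1.0, 1.0), (0.0, 1.0)),
    ((1.0, 0.0), (1.0, 1.0)),
]

w = fit_reward_model(PAIRS, dim=2)
print(w)  # positive weight on answering, negative weight on rudeness
```

No pair states an objective; the weights are inferred only from which response was preferred, and the learned reward then ranks unseen responses such as (1, 0) above (0, 1).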
@yuxili99
Yuxi Li
20 days
@garrytan LLMs are different from AlphaGo.
@yuxili99
Yuxi Li
20 days
@xwang_lk RLHF is inverse RL, and so a form of imitation learning. It differs from "learning from demonstration," though, which is supervised learning (cf. "supervised fine-tuning"). RLHF is not SL. RLHF is a principled approach to problems without a reward function, and most NLP problems lack objective objectives.
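A toy contrast between supervised imitation and the RL step, assuming a one-step problem with three candidate responses. Supervised fine-tuning on demonstrations is maximum-likelihood estimation, so it copies the demonstrated frequencies; the RL step optimizes a reward (here a hypothetical stand-in for a reward model learned from comparisons) with exact softmax policy gradient. The demonstrations and reward values are made up, and practical RLHF typically also adds a KL penalty toward the SFT policy, which this sketch omits.

```python
import math
from collections import Counter

RESPONSES = ["A", "B", "C"]

def sft_policy(demonstrations):
    """Supervised learning: the maximum-likelihood categorical policy
    is the empirical frequency of each demonstrated response."""
    counts = Counter(demonstrations)
    total = len(demonstrations)
    return {a: counts[a] / total for a in RESPONSES}

def softmax(logits):
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def rl_policy(reward, lr=1.0, steps=500):
    """RL step: exact policy gradient for a softmax policy on a one-step
    problem, dJ/dtheta_a = pi(a) * (r(a) - sum_b pi(b) * r(b))."""
    theta = {a: 0.0 for a in RESPONSES}
    for _ in range(steps):
        pi = softmax(theta)
        baseline = sum(pi[a] * reward[a] for a in RESPONSES)
        for a in RESPONSES:
            theta[a] += lr * pi[a] * (reward[a] - baseline)
    return softmax(theta)

# Hypothetical data: labelers happened to write A most often, but the
# reward learned from their comparisons scores C highest.
demos = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
learned_reward = {"A": 0.1, "B": 0.2, "C": 1.0}

print(sft_policy(demos))            # copies the demonstration frequencies
print(rl_policy(learned_reward))    # concentrates on the highest-reward response
```

The SFT policy can never do better than the demonstrations' distribution, while the RL policy moves toward whatever the learned reward ranks highest.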
@yuxili99
Yuxi Li
25 days
Paper by John Schultz, Jakub Adamek, @MatejJusup, @sharky6000, Michael Kaisers, @sarah_perrin_, Daniel Hennes, Jeremy Shar, Cannada Lewis, @anianruoss, @TZahavy, @PetarV_93, Laurel Prince, Satinder Singh, @ericmalmi and @weballergy
@yuxili99
Yuxi Li
1 month
Title: Mastering Board Games by External and Internal Planning with Language Models
Speaker: John Schultz, DeepMind
Time: Jan 16, 2-3 pm EST
Please mark your calendar!
@yuxili99
Yuxi Li
1 month
@zdhnarsil I think "PRM" (process reward model) is a misuse of terminology: there is no need to differentiate PRM and ORM (outcome reward model); it is just an RM. It should be a value function, or maybe reward shaping. Something is wrong with the way PRM is defined, e.g., with labels like (1, 0, -1). A short blog post, a hybrid of Chinese and English.
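One way to read "value function or reward shaping": a per-step score that estimates the chance of eventually reaching a correct answer is a value estimate V(s), and feeding it in through potential-based shaping (Ng, Harada & Russell, 1999) cannot change which trajectories are preferred, while adding raw +1/0/-1 step labels to the outcome reward can. A minimal sketch with gamma = 1; the trajectories, labels, and potentials are hypothetical.

```python
def raw_label_return(step_labels, outcome):
    # Treat per-step labels (+1 / 0 / -1) as extra rewards and add them
    # to the final outcome reward (1 = correct answer, 0 = wrong).
    return sum(step_labels) + outcome

def shaped_return(potentials, outcome):
    """Outcome reward plus potential-based shaping with gamma = 1.
    potentials[t] is phi(s_t) for s_0..s_T; phi(terminal) is 0 by convention.
    Each step adds F(s_t, s_{t+1}) = phi(s_{t+1}) - phi(s_t)."""
    shaping = sum(potentials[t + 1] - potentials[t] for t in range(len(potentials) - 1))
    return outcome + shaping

# Hypothetical trajectories for the same question (same start state s_0).
# phi(s) is read as an estimate of P(final answer correct | partial solution s).
wrong = {"labels": [1, 1, 1, 1, -1],
         "potentials": [0.6, 0.7, 0.8, 0.85, 0.9, 0.0], "outcome": 0}
right = {"labels": [1, -1, 1],
         "potentials": [0.6, 0.7, 0.5, 0.0], "outcome": 1}

for name, traj in (("wrong", wrong), ("right", right)):
    print(name,
          raw_label_return(traj["labels"], traj["outcome"]),   # wrong: 3, right: 2
          shaped_return(traj["potentials"], traj["outcome"]))  # about -0.6 vs 0.4
```

Because the shaping terms telescope to phi(s_T) - phi(s_0) = -phi(s_0), every trajectory from the same question shifts by the same constant, so the ranking by outcome survives; summing raw labels gives no such guarantee and here prefers the wrong answer.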
@yuxili99
Yuxi Li
1 month
@TonyZQin Title: Building Task-driven Conversational Agents for Business Phone Operations
@yuxili99
Yuxi Li
1 month
Title: Building Task-driven Conversational Agents for Business Phone Operations
Speaker: @TonyZQin
Time: Jan 8, 5:30 pm PT
Welcome!
@yuxili99
Yuxi Li
1 month
@natolambert There seems to be no exact solution (from the answers so far and AFAIK). Can we say the reported numbers are "heuristic"? BTW, passing several tests, as with HumanEval, cannot guarantee code correctness. So are many (all?) code generation papers reporting "heuristic" results?
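A small illustration of why a passing unit-test suite is evidence rather than proof of correctness. The function, the tests, and the bounded check below are hypothetical, written in the spirit of HumanEval-style unit tests.

```python
def is_prime_buggy(n: int) -> bool:
    # Plausible-looking but wrong: only rules out multiples of 2 and 3.
    if n < 2:
        return False
    for d in (2, 3):
        if n % d == 0 and n != d:
            return False
    return True

def is_prime_reference(n: int) -> bool:
    # Correct trial division up to sqrt(n).
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# A small hypothetical test suite: (input, expected output).
TESTS = [(1, False), (2, True), (3, True), (4, False),
         (9, False), (11, True), (15, False), (17, True)]

def passes_tests(fn, tests):
    return all(fn(x) == expected for x, expected in tests)

def first_counterexample(fn, reference, limit):
    # Bounded comparison against a reference; still only a finite check.
    for n in range(limit):
        if fn(n) != reference(n):
            return n
    return None

print(passes_tests(is_prime_buggy, TESTS))                             # True
print(first_counterexample(is_prime_buggy, is_prime_reference, 100))  # 25
```

The buggy version passes every test yet misclassifies 25 = 5 * 5. Even the bounded comparison only covers inputs below its limit, so it too yields a heuristic rather than a guarantee.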
@yuxili99
Yuxi Li
1 month
Title: Building Task-driven Conversational Agents for Business Phone Operations
Speaker: @TonyZQin
Time: Jan 8, 5:30 pm PT
Welcome! Please mark your calendar.
@yuxili99
Yuxi Li
1 month
@denny_zhou Can any LLM *guarantee* accuracy, not to mention optimality? Can any LLM beat AlphaZero at chess (without training data from an AI like AlphaZero)?
@yuxili99
Yuxi Li
1 month
@omarsar0 Shouldn't the title be "LLMs are not good enough" or "LLMs are not good enough for building (autonomous) agents"?
@yuxili99
Yuxi Li
1 month
Reflection 2024, Guesstimation 2025
@yuxili99
Yuxi Li
1 month
@sh_reya I talked about it. I am not an influencer though...
@yuxili99
Yuxi Li
2 months
@denny_zhou @aidan_mclau Gradient descent is search, in continuous spaces.
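To make the analogy concrete: discrete hill climbing and gradient descent are both local search; the first picks the best neighbor on a grid, the second moves along the negative gradient in a continuous space. A minimal sketch on a hypothetical quadratic f(x, y) = (x - 3)^2 + 2(y + 1)^2, with a fixed step size chosen for illustration.

```python
def f(p):
    x, y = p
    return (x - 3) ** 2 + 2 * (y + 1) ** 2

def grad_f(p):
    # Analytic gradient of f.
    x, y = p
    return (2 * (x - 3), 4 * (y + 1))

def hill_climb(p0, max_steps=100):
    # Discrete local search: move to the best of the 4 integer neighbors
    # while it strictly improves f.
    p = p0
    for _ in range(max_steps):
        x, y = p
        best = min([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)], key=f)
        if f(best) >= f(p):
            break
        p = best
    return p

def gradient_descent(p0, lr=0.1, steps=200):
    # Continuous local search: step against the gradient.
    p = p0
    for _ in range(steps):
        g = grad_f(p)
        p = (p[0] - lr * g[0], p[1] - lr * g[1])
    return p

print(hill_climb((0, 0)))            # (3, -1)
print(gradient_descent((0.0, 0.0)))  # close to (3.0, -1.0)
```

Both procedures reach the minimizer (3, -1) here; the difference is only in how the neighborhood of the current point is defined and explored.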