![Yuxi Li Profile](https://pbs.twimg.com/profile_images/1074541888139456512/LSaO-hEn_x96.jpg)
Yuxi Li
@yuxili99
Followers: 851 · Following: 600 · Statuses: 724
RL, AI, LLMs, agents, code, blockchain. Guest editor, MLJ SI. Co-chair for workshops at AAAI, ICML, NeurIPS. PhD @UAlberta.
Joined March 2012
@xwang_lk RLHF is inverse RL, so imitation learning. Different from "learning from demonstration" though, which is supervised learning, cf. "supervised fine-tuning". RLHF is not SL. RLHF is a principled approach to problems without a reward function, and most NLP problems lack objective objectives.
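For concreteness, a minimal sketch of the contrast drawn here, in standard notation (the symbols $\pi_\theta$, $r_\phi$, $\beta$ and the preference pair $(y_w, y_l)$ are the usual RLHF ones, not from the tweet): SFT does supervised learning on demonstrations, while RLHF first fits a reward model from preferences and then optimizes against it.

```latex
% Learning from demonstration / SFT: supervised learning on demos (x, y*)
\mathcal{L}_{\text{SFT}}(\theta) = -\,\mathbb{E}_{(x,\,y^{*}) \sim \mathcal{D}}\left[ \log \pi_\theta(y^{*} \mid x) \right]

% RLHF: no hand-written reward; fit r_phi from preference pairs (Bradley--Terry) ...
\mathcal{L}_{\text{RM}}(\phi) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[ \log \sigma\!\left( r_\phi(x, y_w) - r_\phi(x, y_l) \right) \right]

% ... then maximize the learned reward under a KL penalty to the SFT policy
\max_\theta \; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\left[ r_\phi(x, y) \right] - \beta\, \mathrm{KL}\!\left( \pi_\theta(\cdot \mid x) \,\middle\|\, \pi_{\text{SFT}}(\cdot \mid x) \right)
```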
Paper by John Schultz, Jakub Adamek, @MatejJusup, @sharky6000, Michael Kaisers, @sarah_perrin_, Daniel Hennes, Jeremy Shar, Cannada Lewis, @anianruoss, @TZahavy, @PetarV_93, Laurel Prince, Satinder Singh, @ericmalmi and @weballergy
@zdhnarsil I think "PRM" is a misuse of terminology: there is no need to differentiate PRM from ORM, just RM. It should be a value function, or maybe reward shaping. Something is wrong in the way PRM is defined, e.g., with labels like (1, 0, -1). A short blog post, in mixed Chinese & English.
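One way to ground the "reward shaping" reading, assuming the standard potential-based shaping result (Ng, Harada & Russell, 1999), which is not spelled out in the tweet: take a potential $\Phi$ over states; identifying $\Phi$ with a value function yields a per-step "process" signal without changing the optimal policy.

```latex
% Potential-based shaping: add F to the environment reward r at every step
F(s, a, s') = \gamma\,\Phi(s') - \Phi(s)

% The shaped return telescopes, so optimal policies are preserved:
\sum_{t=0}^{T-1} \gamma^{t}\, F(s_t, a_t, s_{t+1}) = \gamma^{T}\Phi(s_T) - \Phi(s_0)
```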
@natolambert Seems there is no exact solution (from the answers so far and AFAIK). Can we say the numbers reported are "heuristic"? BTW, passing several tests, as with HumanEval, cannot guarantee code correctness. So, many (all?) code-generation papers are reporting "heuristic" results?
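A toy illustration of that point (a hypothetical function and test suite, not from HumanEval): a buggy implementation can pass a small test suite, so test pass rates are a heuristic proxy for correctness, not a proof.

```python
# Toy illustration (hypothetical example, not an actual HumanEval task):
# a wrong implementation that still passes a small test suite.

def is_prime(n: int) -> bool:
    """Intended: True iff n is prime. Buggy: declares every odd n >= 3 prime."""
    if n < 2:
        return False
    return n == 2 or n % 2 != 0  # wrong for odd composites such as 9 and 25

# A plausible-looking test suite; the buggy version passes all of it.
tests = [(2, True), (3, True), (4, False), (7, True), (10, False)]
for n, expected in tests:
    assert is_prime(n) == expected

print("all tests passed")  # yet is_prime(9) is True, so the code is incorrect
```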
@denny_zhou Can any LLM *guarantee* accuracy, not to mention optimality? Can any LLM beat AlphaZero at chess (without training data from an AI like AlphaZero)?
@sh_reya I talked about it. I am not an influencer though...
AI is NOT ready to automate programming yet! #artificialintelligence #LLM #LLMs #programming #SoftwareEngineering #SoftwareDevelopment #SoftwareEngineer #softwaretesting