Yuandong Tian
@tydsh
Followers
23K
Following
3K
Statuses
946
Research Scientist Director in Meta GenAI. Doing reasoning. Novelist in spare time. PhD in @CMU_Robotics.
California, USA
Joined December 2009
Our new Token Assorted paper shows that pre-trained models can learn CoTs with mixed text and latent tokens. The latent tokens are encoded from text-based CoTs by a VQVAE, whose decoder also lets us understand the meaning of the latent tokens. The resulting fine-tuned models outperform baselines by 4-5% on multiple math datasets (MATH, GSM8K, Fresh-Gaokao-Math) with ~17% shorter CoTs. The same paradigm also works for synthetic tasks such as maze navigation, training a Transformer from scratch. Great work from the team: Andy Su, @zhuhl98, @YingchenX, @JiantaoJ and @qqyuzu!
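As a toy illustration of the VQVAE step mentioned above (codebook size, dimensions, and function names here are invented for the sketch, not taken from the paper): a continuous hidden vector is snapped to its nearest codebook entry, and the entry's index serves as the discrete latent token.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy codebook: 8 possible latent tokens, each a 4-dim embedding.
codebook = rng.normal(size=(8, 4))

def quantize(z):
    """Nearest-codebook lookup: continuous vector -> discrete token id."""
    dists = np.linalg.norm(codebook - z, axis=1)
    return int(np.argmin(dists))

def decode(token_id):
    """Return the codebook embedding; a real VQVAE decoder would map
    this back toward the original text-based CoT."""
    return codebook[token_id]

z = rng.normal(size=4)          # stand-in for an encoder output
tid = quantize(z)               # the discrete latent token
assert np.allclose(decode(tid), codebook[tid])
```

A codebook entry quantizes to its own index (distance zero), which is a quick sanity check on the lookup.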
Widely accepted: the longer the CoT, the better the performance, in TEXT space. What happens in LATENT space? We use latent discrete tokens to abstract away initial reasoning steps, reducing trace length while boosting performance! w/ Dijia, Hanlin, @YingchenX @JiantaoJ @tydsh #reasoning #llm
3
25
98
We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using full pre-training first, followed by QAT. We also discover that the representation changes substantially after low-bit QAT, showing "compensation" behaviors.
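A minimal sketch of what ternary (1.58-bit) weight quantization looks like, using a simple mean-absolute-value scale; this is an illustration of the general idea only, not ParetoQ's actual recipe (which involves QAT with learned scales and straight-through gradients).

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Quantize a weight tensor to {-1, 0, +1} with one per-tensor scale.

    The mean-abs scale is a common simple heuristic, used here for
    illustration; trained quantizers typically learn the scale.
    """
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)   # values in {-1, 0, 1}
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate full-precision weights."""
    return q * scale

w = np.array([0.9, -0.05, -1.3, 0.4])
q, s = ternary_quantize(w)
w_hat = dequantize(q, s)
```

Each weight now needs only ~1.58 bits (log2 of 3 states), at the cost of the reconstruction error `w - w_hat` that QAT then learns to compensate for.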
1
12
70
This is a very nice characteristic of DeepSeek-R1. Our Dualformer paper (ICLR'25) also shows such behavior once trained with mixed CoT / direct-answer data: one model switches between slow and fast thinking seamlessly. Does that mean R1 is also trained with a mixed CoT / direct-answer dataset 🤔, or is this just because, in the second stage of their RL training, DeepSeek incorporates 200k non-reasoning examples, some of which are simple and do not provide CoT?
If you deploy the DeepSeek-R1 model locally and find that it sometimes does not engage in thinking, add `<think>\n` at the end of the chat template to force the model to think.
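A minimal sketch of that workaround, assuming a DeepSeek-R1-style chat template; the marker strings below are illustrative stand-ins, so check the model's actual chat template (e.g. in its tokenizer config) for the real tokens.

```python
def build_prompt(user_msg: str, force_think: bool = True) -> str:
    """Assemble a chat prompt string, optionally forcing a thinking block.

    The <|User|>/<|Assistant|> markers are placeholders for whatever the
    deployed model's chat template actually uses.
    """
    prompt = f"<|User|>{user_msg}<|Assistant|>"
    if force_think:
        # Ending the prompt with "<think>\n" means generation starts
        # inside the thinking block, so the model cannot skip reasoning.
        prompt += "<think>\n"
    return prompt

prompt = build_prompt("What is 2+2?")
```

The trick works because the model then continues from inside an already-opened `<think>` section rather than deciding whether to open one.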
9
23
152
Every time I brainstorm with others about why Silicon Valley can innovate, I always tell the story that "you never know what crazy ideas may come out of an old garage from energetic young people. It is a decentralized system." When it becomes centralized, even just at the level of high-level ideas (e.g., "we only need scaling laws"), things will change.
A common disease in some Silicon Valley circles: a misplaced superiority complex. ⬇️⬇️⬇️
1
6
74
Believing in a distributed system of many open-source AIs seems to be on the right side of history.
I agree. History has demonstrated repeatedly that distributed systems consistently out-innovate centralized ones. They're stable and not tied to one person's whim. With AI, this model also educates daily, propelling the entire community forward.
1
1
24
@Francis_YAO_ Is that because more and more SFT data has "leaked" into the pre-training dataset?
3
0
32
ehhh... It would be crazy if that's true. FrontierMath is extremely challenging since the dataset is private and the problems are super diverse, each requiring on-demand learning of unseen, complicated, and deep math concepts... I definitely trust OpenAI people not to train on the test set, but there are always ways to construct a massive amount of internal data of a similar nature...
Remember o3's 25% performance on the FrontierMath benchmark? It turns out that OpenAI funded FrontierMath and has had access to most of the dataset. Mathematicians who've created the problems and solutions for the benchmark were not told OpenAI funded the work and would have access. That is:
- we don't know if OpenAI trained o3 on the benchmark, and it's unclear if their results can be trusted
- mathematicians, some of whom distrust OpenAI and would not want to contribute to general AI capabilities due to existential risk concerns, were misled: most didn't suspect a frontier AI company funded it.
From Epoch AI: "Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset." There was a "verbal agreement" with OpenAI, as if anyone trusts OpenAI's word at this point: "We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of an unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training."
3
0
67
Instead of generating 2 latent tokens, you can allow the model to generate 1 latent token, then force a switch back to text, and you will see language tokens follow. On the other hand, latent tokens may contain a lot of information (e.g., all possible paths up to depth K), which is hard to convert directly into language tokens.
1
0
7
Thanks for liking our continuous CoT paper!
Today I had the great idea of doing chain of thoughts in the continuous space. I know it's a great idea because @jaseweston and @tesatory already did it. Great read:
0
0
14
Nice experience. Define a function with natural language, and the function call is immediately available to you anywhere. "What you think immediately becomes what you get."
What if you could build AI features in seconds, without handling complex prompts, output schemas, or model confusion? Introducing Weco AI Functions: just call an AI function as if it's any other function in your code. (1/N)
0
4
23