Yuandong Tian Profile
Yuandong Tian

@tydsh

Followers
23K
Following
3K
Statuses
946

Research Scientist Director in Meta GenAI. Doing reasoning. Novelist in spare time. PhD in @CMU_Robotics.

California, USA
Joined December 2009
@tydsh
Yuandong Tian
2 days
Our new Token Assorted paper shows that pre-trained models can learn CoTs with mixed text and latent tokens. The latent tokens are encoded from text-based CoTs by a VQ-VAE, whose decoder also lets us interpret what the latent tokens mean. The resulting fine-tuned models outperform baselines by 4-5% on multiple math datasets (MATH, GSM8K, Fresh-Gaokao-Math) with ~17% shorter CoTs. The same paradigm also works for synthetic tasks such as maze navigation, training a Transformer from scratch. Great work from the team: Andy Su, @zhuhl98 , @YingchenX , @JiantaoJ and @qqyuzu !
@qqyuzu
Qinqing Zheng
2 days
Widely accepted: the longer the CoT, the better the performance - in TEXT space. What happens in LATENT space? We use latent discrete tokens to abstract away initial reasoning steps, reducing trace length while boosting performance! w/ Dijia, Hanlin, @YingchenX @JiantaoJ @tydsh #reasoning #llm
3
25
98
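The Token Assorted tweet above rests on a VQ-VAE that maps chunks of a text CoT to discrete codebook indices, which then appear as latent tokens mixed into the trace. Below is a minimal, hypothetical sketch of the vector-quantization step; the codebook size, tensor shapes, and straight-through trick are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch: quantize chunks of CoT hidden states into discrete
# codebook indices (latent tokens). Sizes and names are illustrative only.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 1024, dim: int = 768):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)  # learned latent-token embeddings

    def forward(self, z_e: torch.Tensor):
        # z_e: (batch, num_chunks, dim) - encoder outputs for CoT chunks
        flat = z_e.reshape(-1, z_e.shape[-1])                 # (B*N, dim)
        dists = torch.cdist(flat, self.codebook.weight)       # distance to every code
        codes = dists.argmin(dim=-1)                          # discrete latent-token ids
        z_q = self.codebook(codes).reshape_as(z_e)            # quantized embeddings
        # Straight-through estimator: gradients flow back to the encoder via z_e.
        z_q_st = z_e + (z_q - z_e).detach()
        return z_q_st, codes.reshape(z_e.shape[:-1])

# The discrete `codes` can replace the first reasoning steps of a text CoT,
# giving mixed sequences of latent tokens followed by ordinary text tokens.
```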
@tydsh
Yuandong Tian
2 days
Try ParetoQ 😃 A picture is worth a thousand words!
@zechunliu
Zechun Liu
2 days
Our ParetoQ is substantially better than previous work on ternary LLMs, such as the 1-bit era paper.
[image]
0
0
7
@tydsh
Yuandong Tian
2 days
We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using full pre-training followed by QAT. We also find that the representation changes substantially after low-bit QAT, showing "compensation" behaviors.
1
12
70
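For readers unfamiliar with QAT, the low-bit recipe above can be illustrated with a standard straight-through weight quantizer. The bit-width, per-tensor scale, and layer wrapper below are generic assumptions for illustration, not ParetoQ's actual quantizer.

```python
# Generic weight-only QAT sketch (not ParetoQ's exact scheme): fake-quantize
# weights to a low bit-width in the forward pass, let gradients pass straight through.
import torch
import torch.nn as nn

def fake_quant(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                      # e.g. 1 for 2-bit symmetric
    scale = w.abs().mean() / qmax + 1e-8            # simple per-tensor scale (assumption)
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()                   # straight-through estimator

class QATLinear(nn.Linear):
    def __init__(self, in_f: int, out_f: int, bits: int = 2, bias: bool = True):
        super().__init__(in_f, out_f, bias=bias)
        self.bits = bits

    def forward(self, x):
        return nn.functional.linear(x, fake_quant(self.weight, self.bits), self.bias)

# During QAT the full-precision weights keep receiving gradients, so they can drift
# far from their pre-trained values - one way the "compensation" behavior mentioned
# in the tweet can show up as large representation changes after low-bit QAT.
```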
@tydsh
Yuandong Tian
3 days
This is a very nice characteristic of DeepSeek-R1. Our Dualformer paper (ICLR'25) also shows such behavior once trained with mixed CoT / direct-answer data: one model switches between slow and fast thinking seamlessly. Does that mean R1 is also trained with mixed CoT / direct-answer data 🤔, or is it just because, in the second stage of their RL training, DeepSeek incorporates 200k non-reasoning examples, some of which are simple and come without CoT?
@Guodaya
Daya Guo
6 days
If you deploy the DeepSeek-R1 model locally and find that the model sometimes does not engage in thinking, add `<think>\n` at the end of the chat template to force the model to think.
9
23
152
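The workaround quoted above (appending `<think>\n` so R1 always enters its reasoning phase) can be applied to a local deployment roughly as follows. The model name and the use of Hugging Face's chat-template API are assumptions about a typical setup, not an official snippet.

```python
# Rough sketch of the quoted workaround for a local DeepSeek-R1 deployment:
# build the chat prompt, then append "<think>\n" so the model starts by thinking.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"   # placeholder; any local R1 variant
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"    # device_map needs `accelerate`
)

messages = [{"role": "user", "content": "What is 17 * 23?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "<think>\n"                                  # force the model into its thinking phase

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```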
@tydsh
Yuandong Tian
8 days
Every time I brainstorm with others about why Silicon Valley can innovate, I tell the same story: "You never know what crazy ideas may come out of an old garage full of energetic young people. It is a decentralized system." When it becomes centralized, even just at the level of high-level ideas (e.g., "we only need scaling laws"), things will change.
@ylecun
Yann LeCun
9 days
A common disease in some Silicon Valley circles: a misplaced superiority complex. ⬇️⬇️⬇️
1
6
74
@tydsh
Yuandong Tian
8 days
Believing in a distributed system of many open-source AIs seems to be on the right side of history 😃
@tydsh
Yuandong Tian
1 year
I agree. History has demonstrated repeatedly that distributed systems consistently out-innovate centralized ones. They're stable and not tied to one person's whim. With AI, this model also educates people daily, propelling the entire community forward.
1
1
24
@tydsh
Yuandong Tian
12 days
@Francis_YAO_ Is that because more and more SFT data has "leaked" into the pre-training dataset?
3
0
32
@tydsh
Yuandong Tian
20 days
ehhh... It would be crazy if that's true 😓. FrontierMath is extremely challenging since the dataset is private and the problems are super diverse, each requiring on-demand learning of unseen, complicated, and deep math concepts... I definitely trust OpenAI people not to train on the test set, but there are always ways to construct a massive amount of internal data of a similar nature...
@Mihonarium
Mikhail Samin
21 days
Remember o3's 25% performance on the FrontierMath benchmark? It turns out that OpenAI funded FrontierMath and has had access to most of the dataset. Mathematicians who've created the problems and solutions for the benchmark were not told OpenAI funded the work and will have access. That is:
- we don't know if OpenAI trained o3 on the benchmark, and it's unclear if their results can be trusted
- mathematicians, some of whom distrust OpenAI and would not want to contribute to general AI capabilities due to existential risk concerns, were misled: most didn't suspect a frontier AI company funded it.
From Epoch AI: "Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset."
There was a "verbal agreement" with OpenAI - as if anyone trusts OpenAI's word at this point: "We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training."
[images]
3
0
67
@tydsh
Yuandong Tian
24 days
Instead of generating 2 latent tokens, you can allow the model to generate 1 latent token, then force an end-of-thought token, and you will see language tokens following. On the other hand, latent tokens may contain a lot of information (e.g., all possible paths up to depth K), which is hard to convert directly into language tokens.
1
0
7
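A minimal sketch of the decoding trick described above, in the spirit of our continuous-CoT (Coconut) setup: run a continuous latent step by feeding the last hidden state back as the next input embedding, then force an end-of-thought marker and decode ordinary language tokens. The stand-in model, the "<eot>" string, and the single-latent-step setting are assumptions for illustration, not the paper's exact code.

```python
# Hypothetical Coconut-style decoding sketch: a continuous latent step, then an
# end-of-thought marker forces the model back into language-token generation.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "gpt2"                        # stand-in model; real setups fine-tune for this
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
emb = model.get_input_embeddings()

prompt_ids = tok("Question: ...", return_tensors="pt").input_ids
x = emb(prompt_ids)                      # (1, T, dim) input embeddings

with torch.no_grad():
    # 1 latent step: feed the last hidden state back as the next "token" embedding.
    for _ in range(1):
        h = model(inputs_embeds=x, output_hidden_states=True).hidden_states[-1]
        x = torch.cat([x, h[:, -1:, :]], dim=1)

    # Force an end-of-thought marker (hypothetical "<eot>" text here), then let
    # the model continue with ordinary language tokens.
    eot = emb(tok(" <eot>", return_tensors="pt").input_ids)
    x = torch.cat([x, eot], dim=1)
    for _ in range(20):
        logits = model(inputs_embeds=x).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)
        x = torch.cat([x, emb(next_id)], dim=1)
        print(tok.decode(next_id[0]), end="")
```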
@tydsh
Yuandong Tian
24 days
LaTRO finds discrete thought tokens to maximize final rewards; the thought tokens z are sampled by a "reasoner" within a variational framework. Here "latent" means the thinking process is not observable. In contrast, Coconut finds continuous latent thinking tokens. Very different in nature.
1
1
35
@tydsh
Yuandong Tian
1 month
@armandjoulin no worries! I should have followed u long time ago~
0
0
3
@tydsh
Yuandong Tian
1 month
Thanks for liking our continuous CoT paper 😀
@armandjoulin
Armand Joulin
1 month
Today I had the great idea of doing chain of thoughts in the continuous space. I know it's a great idea because @jaseweston and @tesatory already did it. Great read:
0
0
14
@tydsh
Yuandong Tian
1 month
Nice experience 😀. Define a function with natural language, and the function call is immediately available to you anywhere. "What you think immediately becomes what you get" 🚀🚀
@WecoAI
Weco AI
1 month
What if you could build AI features in seconds - without handling complex prompts, output schemas, or model confusion? Introducing Weco AI Functions: just call an AI function as if it's any other function in your code. (1/N)
0
4
23
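The pattern in the quoted announcement (define a function in natural language, then call it like any other function) can be mimicked with a small decorator around a chat-completion call. This is a generic toy illustration of the idea, not Weco's actual API; the model name and the JSON input/output contract are assumptions.

```python
# Generic illustration of "define a function in natural language, call it like code".
# NOT Weco's API - just a toy decorator around an LLM chat call.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ai_function(description: str, model: str = "gpt-4o-mini"):
    def decorator(fn):
        def wrapper(**kwargs):
            prompt = (
                f"You implement this function: {description}\n"
                f"Inputs (JSON): {json.dumps(kwargs)}\n"
                "Reply with the result as a single JSON value."
            )
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            # Toy parsing: assumes the model returns valid JSON.
            return json.loads(resp.choices[0].message.content)
        return wrapper
    return decorator

@ai_function("Return the sentiment of `text` as 'positive', 'neutral', or 'negative'.")
def sentiment(text: str): ...

print(sentiment(text="I love this paper!"))   # e.g. "positive"
```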