Yuandong Tian
@tydsh
Followers
23K
Following
3K
Statuses
946
Research Scientist Director in Meta GenAI. Doing reasoning. Novelist in spare time. PhD in @CMU_Robotics.
California, USA
Joined December 2009
Our new Token Assorted paper shows that pre-trained models can learn CoTs with mixed text and latent tokens. The latent tokens are encoded from text-based CoTs by a VQVAE, whose decoder also lets us understand the meaning of the latent tokens. The resulting fine-tuned models outperform baselines by 4-5% on multiple math datasets (MATH, GSM8K, Fresh-Gaokao-Math) with ~17% shorter CoTs. The same paradigm also works for synthetic tasks such as maze navigation, training a Transformer from scratch. Great work from the team: Andy Su, @zhuhl98, @YingchenX, @JiantaoJ and @qqyuzu!
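As a toy illustration of the VQVAE step mentioned above (codebook size, dimensions, and function names here are invented for the sketch, not taken from the paper): a continuous hidden vector is snapped to its nearest codebook entry, and the entry's index serves as the discrete latent token.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy codebook: 8 possible latent tokens, each a 4-dim embedding.
codebook = rng.normal(size=(8, 4))

def quantize(z):
    """Nearest-codebook lookup: continuous vector -> discrete token id."""
    dists = np.linalg.norm(codebook - z, axis=1)
    return int(np.argmin(dists))

def decode(token_id):
    """Return the codebook embedding; a real VQVAE decoder would map
    this back toward the original text-based CoT."""
    return codebook[token_id]

z = rng.normal(size=4)          # stand-in for an encoder output
tid = quantize(z)               # the discrete latent token
assert np.allclose(decode(tid), codebook[tid])
```

A codebook entry quantizes to its own index (distance zero), which is a quick sanity check on the lookup.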
Widely accepted: the longer the CoT, the better the performance, in TEXT space. What happens in LATENT space? We use latent discrete tokens to abstract away initial reasoning steps, reducing trace length while boosting performance! w/ Dijia, Hanlin, @YingchenX @JiantaoJ @tydsh #reasoning #llm
3
25
98
We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using full pre-training first, followed by QAT. We also discover that the representation changes substantially after low-bit QAT, showing "compensation" behaviors.
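A minimal sketch of what ternary (1.58-bit) weight quantization looks like, using a simple mean-absolute-value scale; this is an illustration of the general idea only, not ParetoQ's actual recipe (which involves QAT with learned scales and straight-through gradients).

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Quantize a weight tensor to {-1, 0, +1} with one per-tensor scale.

    The mean-abs scale is a common simple heuristic, used here for
    illustration; trained quantizers typically learn the scale.
    """
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)   # values in {-1, 0, 1}
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate full-precision weights."""
    return q * scale

w = np.array([0.9, -0.05, -1.3, 0.4])
q, s = ternary_quantize(w)
w_hat = dequantize(q, s)
```

Each weight now needs only ~1.58 bits (log2 of 3 states), at the cost of the reconstruction error `w - w_hat` that QAT then learns to compensate for.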
1
12
70
This is a very nice characteristic of DeepSeek-R1. Our Dualformer paper (ICLR'25) also shows such behavior once trained with mixed CoT / direct-answer data: one model switches between slow and fast thinking seamlessly. Does that mean R1 is also trained with a mixed CoT / direct-answer dataset 🤔, or is this just because, in the second stage of their RL training, DeepSeek incorporates 200k non-reasoning examples, some of which are simple and do not provide CoT?
If you deploy the DeepSeek-R1 model locally and find that it sometimes does not engage in thinking, add `<think>\n` at the end of the chat template to force the model to think.
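A minimal sketch of that workaround, assuming a DeepSeek-R1-style chat template; the marker strings below are illustrative stand-ins, so check the model's actual chat template (e.g. in its tokenizer config) for the real tokens.

```python
def build_prompt(user_msg: str, force_think: bool = True) -> str:
    """Assemble a chat prompt string, optionally forcing a thinking block.

    The <|User|>/<|Assistant|> markers are placeholders for whatever the
    deployed model's chat template actually uses.
    """
    prompt = f"<|User|>{user_msg}<|Assistant|>"
    if force_think:
        # Ending the prompt with "<think>\n" means generation starts
        # inside the thinking block, so the model cannot skip reasoning.
        prompt += "<think>\n"
    return prompt

prompt = build_prompt("What is 2+2?")
```

The trick works because the model then continues from inside an already-opened `<think>` section rather than deciding whether to open one.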
9
23
152
Every time I brainstorm with others about why Silicon Valley can innovate, I always tell the story that "you never know what crazy ideas may come out of an old garage from energetic young people. It is a decentralized system." When it becomes centralized, even just at the level of high-level ideas (e.g., "we only need scaling laws"), things will change.
A common disease in some Silicon Valley circles: a misplaced superiority complex. ⬇️⬇️⬇️
1
6
74
Believing in a distributed system of many open-source AIs seems to be on the right side of history.
I agree. History has demonstrated repeatedly that distributed systems consistently out-innovate centralized ones. They're stable and not tied to one person's whim. With AI, this model also educates daily, propelling the entire community forward.
1
1
24
@Francis_YAO_ Is that because more and more SFT data has "leaked" into the pre-training dataset?
3
0
32
ehhh... It would be crazy if that's true. FrontierMath is extremely challenging since the dataset is private and the problems are super diverse, each requiring on-demand learning of unseen, complicated, and deep math concepts... I definitely trust OpenAI people not to train on the test set, but there are always ways to construct a massive amount of internal data of a similar nature...
Remember o3's 25% performance on the FrontierMath benchmark? It turns out that OpenAI funded FrontierMath and has had access to most of the dataset. Mathematicians who've created the problems and solutions for the benchmark were not told OpenAI funded the work and would have access. That is:
- we don't know if OpenAI trained o3 on the benchmark, and it's unclear if their results can be trusted
- mathematicians, some of whom distrust OpenAI and would not want to contribute to general AI capabilities due to existential risk concerns, were misled: most didn't suspect a frontier AI company funded it.
From Epoch AI: "Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset." There was a "verbal agreement" with OpenAI, as if anyone trusts OpenAI's word at this point: "We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of an unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training."
3
0
67
Instead of generating 2 latent tokens, you can allow the model to generate 1 latent token, then force a switch back to text, and you will see language tokens follow. On the other hand, latent tokens may contain a lot of information (e.g., all possible paths up to depth K), which is hard to convert directly into language tokens.
1
0
7
Thanks for liking our continuous CoT paper!
Today I had the great idea of doing chain of thoughts in the continuous space. I know it's a great idea because @jaseweston and @tesatory already did it. Great read:
0
0
14
Nice experience. Define a function with natural language, and the function call is immediately available to you anywhere. "What you think immediately becomes what you get."
What if you could build AI features in seconds, without handling complex prompts, output schemas, or model confusion? Introducing Weco AI Functions: just call an AI function as if it's any other function in your code. (1/N)
0
4
23