Coming to
#NeurIPS23
now. Will be there until Friday night.
DM me to chat about: reasoning, AI for math, and what we’re doing
@xai
.
Also will be at
#MATHAI
workshop panel discussion on Friday morning. See you there!
Euclidean geometry problems have been my favorite math puzzles since middle school. The most intriguing part is the creation of auxiliary lines, which opens up space for imagination and the freedom to explore various diagrams. Once a proof is found, these auxiliary lines
Language models can dramatically improve their reasoning by learning from chains of thought that they generate.
With STaR, just a few worked examples can boost accuracy to that of a 30X larger model (GPT-J to GPT-3).
W.
@ericzelikman
, Noah Goodman
1/
After showing a few examples, large language models can translate natural language mathematical statements into formal specifications.
We autoformalize 4K theorems as new data to train our neural theorem prover, achieving SOTA on miniF2F!
1/
Paper:
Can Neural Networks solve IQ tests? We propose the Scattering Compositional Learner (SCL) for the RPM task. SCL improves SOTA from 63.9% to 95.0%. It is even capable of zero-shot generalization and learns disentangled representations!
paper:
(1/n)
How do you make a transformer recurrent?
You just turn the transformer 90 degrees, and apply it in the lateral direction!
Now, with recurrence, the context size is infinite!
Let's make the recurrence great again with Block-Recurrent Transformers:
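For intuition, here is a minimal Python sketch of the block-recurrent idea: a layer is applied to fixed-size blocks in sequence, carrying a state between blocks. The `toy_layer` and all names here are our own illustration, not the paper's implementation.

```python
# Toy sketch of block recurrence: process the sequence block by block,
# threading a recurrent state through the blocks (not the paper's code).

def block_recurrent_forward(tokens, block_size, layer, state):
    """Apply `layer` to each block in turn; `layer` maps (block, state) -> (out, state)."""
    outputs = []
    for i in range(0, len(tokens), block_size):
        out, state = layer(tokens[i:i + block_size], state)
        outputs.extend(out)
    return outputs, state

# Toy "layer": each token is shifted by the sum of the state so far;
# the state accumulates per-block sums.
def toy_layer(block, state):
    shift = sum(state)
    return [t + shift for t in block], state + [sum(block)]

outs, final_state = block_recurrent_forward([1, 2, 3, 4, 5, 6], 2, toy_layer, [0])
```

Because only a fixed-size state is carried across blocks, the cost per block is constant, which is what makes the effective context unbounded.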
You think the RNN era is over? Think again!
We introduce "Block-Recurrent Transformer", which applies a transformer layer in a recurrent fashion & beats transformer XL on LM tasks.
Paper:
W. DeLesley Hutchins, Imanol Schlag,
@Yuhu_ai_
&
@ethansdyer
1/
Super excited to share Minerva!! – a language model capable of solving MATH with a 50% success rate, which was predicted to happen in 2025 by Steinhardt et al. ()!
#Minerva
1/
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning.
Combining scale, data, and other ingredients dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM.
Autoformalization with LLMs in Lean!
@zhangir_azerbay
and Edward Ayers built a chat interface to formalize natural language mathematics in Lean:
Very impressive work!
🚨We are organizing the 2nd MATHAI workshop at NeurIPS!
Check it out if you're interested in AI for math, and machine reasoning in general🤯!
We have a great lineup of speakers & panelists!
See more in call for papers: 👇
Hello
#NeurIPS2022
! I'm in New Orleans and will be here until Thursday morning (Dec 1). Let's brainstorm AI for math, LLMs, Reasoning 🤯🤯!
We'll present 8 papers (1 oral and 7 posters) + 2 at workshops (MATHAI and DRL). Featuring recent breakthroughs in AI for math! See👇
Memorizing Transformer's camera ready is released!
Main updates:
1. Adding 8K memory gives gains comparable to a 5X-8X increase in model parameters.
2. You can easily turn a pretrained LLM into a memorizing transformer! (4% of pretraining cost to obtain 85% of the benefit)
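The external-memory idea can be sketched in a few lines: cached (key, value) pairs from earlier context are retrieved by nearest-neighbor lookup at each step. This toy version uses exact dot-product scoring over a tiny list; all names are illustrative, not the paper's API.

```python
# Toy kNN memory: retrieve the values whose cached keys best match the query
# (illustrative only; the real system uses approximate search at scale).

def knn_lookup(query, memory, k=2):
    """memory is a list of (key_vector, value); return top-k values by dot product."""
    scored = sorted(memory, key=lambda kv: -sum(q * x for q, x in zip(query, kv[0])))
    return [value for _, value in scored[:k]]

memory = [([1.0, 0.0], "A"), ([0.0, 1.0], "B"), ([1.0, 1.0], "C")]
retrieved = knn_lookup([1.0, 0.2], memory, k=2)
```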
Thanks a lot to
@Yuhu_ai_
,
@MarkusNRabe
and DeLesley Hutchins for their hard work updating our ICLR paper on retrieval-augmented language modeling, aka the "Memorizing Transformer"!
Here is a short thread on why we think this is important.
🧵 1/n
In May, we discovered that LLMs can autoformalize theorem statements:
In June, we showed that LLMs can solve challenging math problems with Minerva.
Now, we show LLMs can turn their generated informal proofs into verified formal proofs!🤯
What's next?😎
Large language models can write informal proofs, translate them into formal ones, and achieve SoTA performance in proving competition-level maths problems!
LM-generated informal proofs are sometimes more useful than the human ground truth 🤯
Preprint:
🧵
Excited to share this new work, which sheds light on the understanding of pre-training via synthetic tasks.
We did three experiments that iteratively simplify pre-training while still retaining gains.
Paper:
W. Felix Li,
@percyliang
.
1/
We discover that you can teach LLMs to solve longer problems *only* via in-context learning, instead of fine-tuning.
This is mind-blowing🤯🤯! -- certain skills are hard to encode in model weights, but much easier to acquire from the context.
🆕📜We study large language models’ ability to extrapolate to longer problems!
1) finetuning (with and without scratchpad) fails
2) few-shot scratchpad confers significant improvements
3) Many more findings (see the table & thread)
Paper: []
1/
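A few-shot scratchpad prompt of the kind studied here can be sketched as follows, using a running-sum task as a stand-in; the task and formatting are our own illustration, not the paper's exact setup.

```python
# Build a few-shot scratchpad prompt: each shot shows intermediate steps,
# so the model can imitate the step-by-step format on a longer test input.
# (Illustrative stand-in task, not the paper's exact setup.)

def scratchpad_example(xs):
    steps, total = [], 0
    for x in xs:
        total += x
        steps.append(f"running total = {total}")
    return f"Input: {xs}\n" + "\n".join(steps) + f"\nAnswer: {total}"

def build_prompt(train_inputs, test_input):
    shots = "\n\n".join(scratchpad_example(xs) for xs in train_inputs)
    return shots + f"\n\nInput: {test_input}\n"

prompt = build_prompt([[1, 2], [3, 4, 5]], [6, 7, 8, 9])
```

The test input is deliberately longer than any of the shots, mirroring the length-extrapolation setting.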
We’re excited to announce the MathAI workshop at ICLR 2021 : On the Role of Mathematical Reasoning in General Artificial Intelligence. Now accepting submissions!
Submission Link:
Deadline: Feb 26, 11:59PM PST
Quanta magazine covers our two works on large language models for mathematical reasoning: Autoformalization and Minerva.
Together, they show a path toward improving the reasoning capabilities of large language models in the future.
Can neural network agents prove theorems outside of the training distribution? We perform a systematic evaluation along 6 generalization dimensions with INT: an inequality theorem proving benchmark:
Joint work with Albert Jiang, Jimmy Ba,
@RogerGrosse
.
If you use K-FAC you only need to do 1 update (ACKTR), but if you use a first-order optimizer, you need to do 320 updates (PPO). AND the single K-FAC update still wins. This is what we (with
@baaadas
) found by comparing ACKTR vs. PPO vs. PPOKFAC.
The next figure shows a perfect translation of a grade school math problem by PaLM. This is remarkable because such a statement is completely out-of-distribution – no formal mathematicians are interested in formalizing grade school math problems ;)
4/
Compared to the 1st MATHAI workshop 1 year ago, the number of submissions this time almost doubled! Glad to see the field is growing rapidly 🙌
Also there are many mind-blowing works 🤯🤯 Stay tuned!
🚨👇Reminder that the submission deadline for the MATH-AI workshop at
#NeurIPS2022
is tomorrow -- Sep 30, 11:59pm PT.
Submit your recent works (e.g. ICLR submissions) if they are about Math&AI, reasoning, algorithmic capabilities!
Two papers in
@ICLR18
:
1. Short horizon bias in meta-learning optimization:
2. RELAX:
One invited to workshop:
3. Exploration in Meta-Reinforcement Learning
Our finding thus reveals a very surprising capability of these models: they have learned general, transferable knowledge that allows them to work with a low-resource formal language.
12/
Never focus too much on your short-term reward; the optimal strategy in the long run might be the complete opposite. Don't be fooled by short-horizon bias, both in life and in meta-learning.
Now, let the examples do the talking!
See the figure attached – Codex perfectly formalizes an IMO problem! It handles the negation “there is no function” by proof-by-contradiction. It understands the phrase “into itself” and correctly formalizes the co-domain of f.
I’m curious to find out how far we can push Minerva to theorem proving!
In the long run, I am expecting a great synergy between a strong natural language math model and an autoformalizer () to tackle challenging mathematical theorems!
3/
We propose STaR, a Self-Taught Reasoner. We start with few-shot prompting to generate rationales for all the problems in the dataset.
We collect the rationales that lead to the correct answer, and fine-tune the LLM on them.
6/
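The generate-filter-finetune step can be sketched as a simple loop. Everything below is a toy stand-in (the "model" evaluates arithmetic, "fine-tuning" just counts data), not a real model API.

```python
# One STaR-style iteration (toy sketch): sample a rationale and answer for
# each problem, keep only the rationales whose answer is correct, and
# fine-tune on the kept examples. All components below are stand-ins.

def star_iteration(problems, generate, finetune, model):
    kept = []
    for question, answer in problems:
        rationale, predicted = generate(model, question)
        if predicted == answer:          # filter on answer correctness
            kept.append((question, rationale, answer))
    return finetune(model, kept), kept

# Toy stand-ins: the "model" answers arithmetic by evaluating the expression,
# and "fine-tuning" just counts how much data it has seen.
def toy_generate(model, question):
    return f"compute {question}", eval(question)

def toy_finetune(model, data):
    return model + len(data)

model, kept = star_iteration([("1+1", 2), ("2*3", 6), ("2+2", 5)],
                             toy_generate, toy_finetune, model=0)
```

Note how the problem with the wrong label ("2+2", 5) is filtered out: only rationales that reach the correct answer enter the fine-tuning set.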
1. Formal math data is very scarce: all Isabelle proof scripts together amount to only about 180MB. 2. There is almost zero aligned data between natural language and formal mathematics, whereas docstrings for languages like Python are broadly available.
I'm glad to share that LIME is accepted at
#ICML2021
! One of the things I like about our publishing process is that there is always the next conference :) If you truly believe in your paper, then it will be published sooner or later! Just keep polishing 🛠️🛠️
@yaringal
We had a paper rejected with 8,7,6,6, with thorough reviews and lots of discussion.
The one-sentence reason for rejection -- that training on data is the wrong way to instill knowledge in an algorithm -- feels like something out of AAAI 1993.
At
#ICML2020
, we present OPtions as Responses (OPRE), an HRL agent in multi-agent settings. Our hierarchical agent generalizes to unseen opponent strategies and learns interpretable options. (1/n)
Paper:
Poster: .
Yeah, I am stunned by this. I don't know what to think of it. We worked so hard on this. Getting rejected by a one-sentence meta-review that overrides all the reviewers' decisions just seems so crazy and unfair.
We show two randomly chosen few-shot examples in the prompt, going from LaTeX to formal math (Isabelle). Note that these two examples are merely examples of syntactic translation, without much sophistication in reasoning or natural language understanding.
2/
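Assembling such a few-shot prompt is just string formatting; the example pair and the "Natural language:"/"Isabelle:" formatting below are our own illustration, not the paper's exact prompt.

```python
# Build a few-shot autoformalization prompt from (LaTeX, Isabelle) pairs.
# The example pair and formatting are illustrative, not the paper's prompt.

few_shot = [
    (r"Prove that $1 + 1 = 2$.", 'theorem "1 + 1 = (2::nat)"'),
]

def autoformalization_prompt(examples, target):
    parts = [f"Natural language: {nl}\nIsabelle: {fm}" for nl, fm in examples]
    parts.append(f"Natural language: {target}\nIsabelle:")
    return "\n\n".join(parts)

prompt = autoformalization_prompt(few_shot, r"Prove that $2 + 2 = 4$.")
```

The prompt ends right after "Isabelle:", so the model's completion is the formal statement itself.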
Why is this surprising? People know large language models can turn natural language descriptions into code. However, the existing known successes are limited to commonly used programming languages (e.g., Python). Formalizing mathematics is different for at least two reasons.
10/
We use Codex to formalize 3908 MATH problems. We then run expert iteration on these autoformalized statements. This allows us to achieve a new state of the art on the miniF2F theorem proving benchmark.
This is the first proof-of-concept of practical autoformalization!
7/
Can the model learn to formalize such problems if the prompt contains an example that explains the concept? We find if we add a tangentially related problem, then the model can formalize the “linear function” perfectly!
6/
🚨Call for Papers🚨 Submission to the
#NeurIPS2022
MATH-AI Workshop will be due on Sep 30, 11:59pm PT (2 days after ICLR😆). The page limit is 4 pages (not much workload🤩). Both work in progress and recently published work are welcome. Act NOW and see you in
#NewOrleans
!🥳🥳🍻
This is also a fundamentally iterative process: a better model generates better rationales, which can in turn be used to train an even better model.
7/
🔥Internship Opportunity on Improving the Reasoning Capabilities of Massive Language Models🔥: solving challenging problems in areas such as mathematics, science, programming, algorithms, and planning.
Please see the following link for more info:
STaR suggests many possible future directions. In general, any task that has an input and an output can be augmented with intermediate rationales.
Tasks that require multiple steps of reasoning benefit the most, such as theorem proving, program synthesis, etc.
10/
We further explore whether the model can handle more advanced mathematics beyond competition problems. We find these models are surprisingly good at turning formal statements into natural language as well!
8/
In addition, for the problems the model answered incorrectly, we give the model a hint: we tell the model the right answer, and ask it to provide a justification.
8/
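This hinting step amounts to putting the known answer into the prompt and asking for the reasoning. The prompt template below is our own illustration, not the paper's exact wording.

```python
# Rationalization-style prompt (sketch): include the correct answer as a hint
# and ask the model to justify it. Template wording is ours, not the paper's.

def rationalization_prompt(question, correct_answer):
    return (f"Q: {question}\n"
            f"(Hint: the answer is {correct_answer}.)\n"
            f"Explain step by step why, then state the answer.")

p = rationalization_prompt("What is 12 * 12?", 144)
```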
We see the model make a jump in reasoning: going from the definition "for all x, if x in A -> x in B" to the more concise and abstract phrase "A is a subset of B". The same holds for "finite intersections" and "arbitrary unions". See examples in the figures!
9/
Come and join us today (Wed) to learn about our recent works using neural nets for theorem proving
#ICLR2021
!
IsarStep: High-level Mathematical Reasoning
, 12-2pm ET
INT: Evaluating Generalization in Theorem Proving
, 8-10pm ET
I'll moderate a panel discussion tomorrow 10am PT/1pm ET at MATHAI , featuring Fields Medalist Tim Gowers
@wtgowers
and Turing Award winner Yoshua Bengio.
We will be discussing reasoning, the role of math in general intelligence, and the challenges ahead.
🔥Opening in our team – Blueshift🔥
We are looking for a research engineer interested in extending the capabilities of large language models.
Learn more about the role & apply here:
Learn about our team:
Please retweet :-) 🙏
Excited to share this new work! We trained a GNN-based branching heuristic for model counting. It generalizes to problems of much larger sizes, improving over SOTA by orders of magnitude.
Can neural network agents improve wall-clock performance of propositional model counters? We present Neuro#, a neuro-symbolic solver that can do that:
Joint work w/
@gilled34
,
@Yuhu_ai_
,
@cjmaddison
,
@RogerGrosse
, Edward Lee, Sanjit Seshia, Fahiem Bacchus
This morning I read through this new paper by James Martens. It's a great, extensive summary/review of second-order gradient-based optimization; highly recommended:
A new work with Emilio and other CMU collaborators. The goal is to meta-learn exploration. Instead of using a single agent to explore, which would result in a long-horizon problem, we have multiple agents explore simultaneously, sharing findings with one another.
Check out Emilio's new paper: Concurrent Meta Reinforcement Learning (w/
@Yuhu_ai_
,
@rsalakhu
, and others)
tl;dr CMRL learns a multi-agent communication protocol to coordinate exploration between parallel rollout agents.
@ericzelikman
Human reasoning is often the result of extended chains of thought.
We want to train a model that can generate explicit rationales before answering a question.
The main challenge: most datasets contain only question-answer pairs, not the intermediate rationales.
Autoformalization with LLMs in Lean... for everyone!
The chat interface for autoformalizing theorem statements in Lean built by myself and
@ewayers
is now publicly available as a vs-code extension.
"Exploring Length Generalization in Large Language Models" accepted as an *Oral presentation*! We discovered that certain skills are hard to encode in model weights, but much easier to acquire from the context.
5/10
We performed experiments on the arithmetic problems (from Nye et al.) and CommonsenseQA. On CQA, STaR with GPT-J attained 72.3%, on par with the result obtained by GPT-3 (73%) finetuned to directly output the final answer.
9/
@cHHillee
@giffmana
That's right. But Grok-1 (in the blog) was also not trained for benchmarks. So you'll see the raw model has pretty much the same numbers as in the blog post.
RELAX! Our new gradient estimator handles discrete variables and black-box functions. Now going to try hard attention, latent graphs, and more RL problems. by amazing students
@wgrathwohl
@chlekadl
@Yuhu_ai_
@geoffroeder
Fun fact: Hu et al. () found that most previously successful neural methods exploited a shortcut solution. After removing the dataset bias, those methods suffered a lot (e.g., CoPINet dropped from 91.4% to 46.3%). SCL was not affected at all.
(4/n)
One solution is to use human labels [Rajani et al. ]. But this is costly and hence not scalable. In addition, the model cannot improve beyond human labels.
3/
Looking ahead, we believe it may be possible to develop synthetic tasks that outperform natural pre-training on some downstream tasks: the complexity of existing natural data is fixed, while in some sense the complexity of fully synthetically generated data is infinite.
10/
SCL is designed to discover the compositional structures of the data. In RAVEN, it learns to discover compositions of objects, attributes, and relationships. The figure shows an example where SCL learns the concept of "size".
(2/n)
Camera-ready version of our paper on short-horizon bias, to appear at
#iclr2018
. It explains why you should always start with an aggressive learning rate and then decay it. Meta-optimization is hard because the objective is biased. A fantastic collaboration with
@mengyer
,
@RogerGrosse
and Renjie.
Generalization to longer horizons is the Achilles' heel of gradient-based meta-optimization. Short-horizon meta-optimizers decay the learning rate too quickly and stop making progress. New paper w/
@Yuhu_ai_
,
@mengyer
, and Renjie Liao.
If you are interested in solving challenging multi-step reasoning problems with LLMs, join us!
We have an opening for a Research Scientist position at Blueshift!
Learn more about the role & apply here:
Learn about our team:
Subgoal search is an appealing class of methods for solving complex tasks by considering intermediate subgoals that advance towards the goal. Is it beneficial to vary the subgoal distance (and how)? It turns out the answer is yes:
A thread:
1/8
Another solution is to use in-context learning to induce rationale generation [Nye et al. , Wei et al. ]. But few-shot performance significantly underperforms finetuning.
4/
A very cool work on natural language theorem proving from
@wellecks
et al.!
It's nice to see that many observations are shared between informal and formal math proving: the importance of premise selection, failure cases, etc.
Looking forward to combining the best of both worlds!
New paper:
Theorem proving in natural mathematical language, the mix of symbolic and natural language used by humans, tests reasoning and plays a central role in mathematical education.
Can language models prove theorems & help us when we're stuck? 1/N
🆕📜When can **Equilibrium Models** learn from simple examples to handle complex ones?
We identify a property — Path Independence — that enables this by letting EMs think for longer on hard examples.
(NeurIPS) 📝: []()
APE generates “Let’s work this out in a step by step way to be sure we have the right answer”, which increases text-davinci-002’s Zero-Shot-CoT performance on MultiArith (78.7 -> 82.0) and GSM8K (40.7->43.0). Just ask for the right answer?
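Using the discovered instruction is just string concatenation; the wrapper below is our own illustration of the zero-shot-CoT setup, with the instruction quoted from above.

```python
# Prepend the APE-discovered instruction as a zero-shot chain-of-thought
# trigger (the wrapper function is illustrative; the instruction string is
# the one quoted in the tweet).

APE_INSTRUCTION = ("Let's work this out in a step by step way "
                   "to be sure we have the right answer.")

def zero_shot_cot_prompt(question, instruction=APE_INSTRUCTION):
    return f"Q: {question}\nA: {instruction}"

p = zero_shot_cot_prompt("If I have 3 apples and buy 4 more, how many do I have?")
```

The model's completion then continues after the instruction, producing the step-by-step reasoning.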
@ericjang11
@shaneguML
By learning compositional structures, it can even generalize to unseen analogies. E.g., After learning (“color”, “constant”), and (“size”, “progression”), the model can generalize to (“color”, “progression”).
(3/n)
LLMs are not good at premise selection in theorem proving due to their limited context window. Thor addresses this by combining symbolic AI (Sledgehammer) with the LM to achieve SOTA:
6/10
Language models are bad at retrieving useful premises from large databases for theorem proving, mainly because they're limited by a small context window. We use symbolic tools to overcome this difficulty, boosting proof rates from 39% to 57%.
Thor:
1/
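Schematically, the division of labor can be sketched as a loop in which the LM proposes proof steps and delegates premise selection to a hammer via a special token. This is our own caricature with toy stand-ins, not Thor's actual interface.

```python
# Caricature of the Thor-style loop: the language model emits proof steps,
# and a special token hands the current goal to a symbolic hammer that
# searches the premise database. All components are toy stand-ins.

def prove(goal, lm_step, hammer, max_steps=10):
    for _ in range(max_steps):
        step = lm_step(goal)
        if step == "<hammer>":       # LM delegates premise selection
            return hammer(goal)
        goal = step                  # otherwise take the LM's step
    return False

# Toy stand-ins: "goals" are integers the LM reduces until it calls the hammer.
def toy_lm(goal):
    return "<hammer>" if goal <= 1 else goal - 1

ok = prove(3, toy_lm, lambda goal: goal == 1)
```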
We recently worked on extracting datasets for training neural theorem provers for Lean. Our model can prove 35.9% of the test theorems.
Check out the following Demo! We created a tool for querying a 3B GPT model when writing math proofs in VS code.
#InteractiveNeuralTheoremProving
Excited to share this demo of interactive neural theorem proving in Lean (joint WIP with Jason Rute,
@Yuhu_ai_
, Ed Ayers, and
@spolu
)!
Below, the `gpt` tactic is querying a 3B param transformer trained on Lean proofs. We can prove 35.9% of theorems in a held-out test set.