Aditya Timmaraju Profile
Aditya Timmaraju

@tadityasrinivas

Followers: 930 · Following: 20K · Statuses: 256

Sr Staff ML Eng @GoogleDeepMind on Gemini efficiency. Past: @meta, @stanford, @IITHyderabad. Hyd 🛫 SF Bay 🛫 Hyd. Views expressed don’t reflect my employer’s.

Hyderabad, India
Joined May 2010
@tadityasrinivas
Aditya Timmaraju
9 hours
RT @dwarkesh_sp: The @JeffDean & @NoamShazeer episode. We talk about 25 years at Google, from PageRank to MapReduce to the Transformer to…
@tadityasrinivas
Aditya Timmaraju
22 hours
Matryoshka nesting 🪆 is a gift (kalpavriksha?) that keeps on giving, what say @adityakusupati :) Check out this super cool work from my colleagues, manager & Jeff Dean on Matryoshka nesting applied to weight-representation precision. @jainprateek_ PS: Stay tuned for more from the Mat pipeline
@JeffDean
Jeff Dean
2 days
Delighted to be a minor co-author on this work, led by @pranavn1008: Combining losses for different Matryoshka-nested groups of bits in each weight within a neural network leads to an accuracy improvement for models, especially at low bit precisions (e.g. 2-bit representations). Paper: "Matryoshka Quantization". Inspired by an off-hand comment I made to @jainprateek_ and @adityakusupati about their Matryoshka Representation Learning work: "In the same way that Matryoshka representations across different units work, I wonder if we could treat the bits of each weight in a similar nested way". It turns out the answer is yes!
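To make the nesting concrete: a toy sketch (my own illustration, not the paper's code) of how an 8-bit quantized weight already contains its lower-precision versions as prefixes of its most-significant bits. Per the tweet above, MatQuant's contribution is to train with a loss for each nested bit-group so the shared high-order bits work well at every precision; this sketch only shows the nesting, not the co-training.

```python
import numpy as np

def slice_to_precision(w_uint8: np.ndarray, bits: int) -> np.ndarray:
    """Keep only the top `bits` most-significant bits of unsigned 8-bit weights."""
    shift = 8 - bits
    return (w_uint8 >> shift) << shift  # zero out the low-order bits

w = np.array([0b10110101, 0b01101110], dtype=np.uint8)
for b in (2, 4, 8):
    # The 2-bit and 4-bit "models" share the high bits of the 8-bit weights.
    print(b, "bits:", [f"{int(x):08b}" for x in slice_to_precision(w, b)])
```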
@tadityasrinivas
Aditya Timmaraju
2 days
Hopeful that this top-down push towards hybrid work will improve labor participation rates in rural India.
@ncbn
N Chandrababu Naidu
2 days
Andhra Pradesh is planning "Work From Home" in a big way, especially for women. First, I would like to extend greetings to all women and girls in STEM on the International Day of Women and Girls in Science. Today, we celebrate their achievements and commit ourselves to providing them equal and full access to growth opportunities in these fields.

Now, returning to the headline: as we know, the work landscape underwent a shift during the COVID-19 pandemic. With technology readily available to scale, "Work From Home" gained prominence. Concepts such as remote work, coworking spaces (CWS), and Neighbourhood Workspaces (NWS) can empower businesses and employees alike to create flexible, productive work environments. Such initiatives can help us strike a better work-life balance as well.

We plan to harness these trends to drive meaningful change in AP. The Andhra Pradesh IT & GCC Policy 4.0 is a game-changing step in that direction. We're offering incentives for developers to create IT office spaces in every city/town/mandal and supporting IT/GCC firms to generate employment at the grassroots. I'm confident these initiatives will foster greater workforce participation, especially of women professionals, who will benefit from flexible remote/hybrid work options.
@tadityasrinivas
Aditya Timmaraju
3 days
RT @SarvamAI: We are very excited to launch Sarvam Fellows, our initiative to train the next generation of AI researchers. Through this pro…
@tadityasrinivas
Aditya Timmaraju
4 days
@AndreasD1337 @IntuitMachine The notion of irrecoverable errors seems flawed -
@tadityasrinivas
Aditya Timmaraju
7 days
Indeed, we’re seeing from R1’s reasoning traces that "Wait...", "Oh but..." are ways of backtracking past decoding errors, so errors aren’t really irrecoverable. Also, R1 Zero with pure autoregressive generation goes from 15 to 71 on AIME without any test-time search or verifiers. It’s interesting that Yann, who faced resistance from “theoretically sound” ML folks when trying to publish CNN/DL work 15 years ago, is now mounting (somewhat flawed) theoretical reasons for why LLMs won’t work.
@tadityasrinivas
Aditya Timmaraju
4 days
RT @DimitrisPapail: AIME I 2025: A Cautionary Tale About Math Benchmarks and Data Contamination AIME 2025 part I was conducted yesterday,…
@tadityasrinivas
Aditya Timmaraju
5 days
Why AI developers should pick Gemini 2.0 Flash ⚡️
Lowest hallucination rate ✅
Top of the cost-performance Pareto frontier ✅
Super long 1M context ✅
Tools and multi-modality ✅
@ai_for_success
AshutoshShrivastava
5 days
Gemini 2.0 Flash is the best model available right now for generic use cases.
- Quality responses
- Super fast
- Supports audio, video, docs, and images
- Tools (structured output, code execution, function calling, grounding)
- Least hallucination
- Super cheap
- 1M-token context
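For developers who want to try it, a minimal sketch using the google-generativeai Python SDK (SDK shape and model name as publicly documented in early 2025; the API key is a placeholder):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key
model = genai.GenerativeModel("gemini-2.0-flash")

# Plain text in, text out; the same model also accepts audio/video/docs/images.
response = model.generate_content(
    "List three reasons a developer might choose a low-hallucination, "
    "long-context model for document Q&A."
)
print(response.text)
```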
@tadityasrinivas
Aditya Timmaraju
5 days
RT @ai_for_success: Hallucination rates for the top 25 LLMs from vectara show that new Google Gemini 2.0 Flash has the lowest hallucination…
@tadityasrinivas
Aditya Timmaraju
6 days
Umm 🤔, the cake génoise (DeepSeek V3) gets 15% on AIME and the cherry (R1 Zero) takes it to 71%; each is individually weak without the other, so the analogies don’t seem appropriate. Also, R1 Zero gets 71 with just pure autoregressive decoding at inference (rising to 86 with simple voting), so the notion of irrecoverable errors in his recent talks is off.
@tadityasrinivas
Aditya Timmaraju
7 days
Indeed, we’re seeing from R1’s reasoning traces that "Wait...", "Oh but..." are ways of backtracking past decoding errors, so errors aren’t really irrecoverable. Also, R1 Zero with pure autoregressive generation goes from 15 to 71 on AIME without any test-time search or verifiers. It’s interesting that Yann, who faced resistance from “theoretically sound” ML folks when trying to publish CNN/DL work 15 years ago, is now mounting (somewhat flawed) theoretical reasons for why LLMs won’t work.
@tadityasrinivas
Aditya Timmaraju
9 days
@rbhar90 How so? Unless you’re explicitly doing some test-time search outside the LLM (e.g. tree search or majority vote), it is just an autoregressive output. And it discredits the notion that errors are irrecoverable.
@tadityasrinivas
Aditya Timmaraju
7 days
RT @joost_v_amersf: Interested in helping us make Gemini Pro even better? The Gemini pre-training team is looking for a Research Scientist…
@tadityasrinivas
Aditya Timmaraju
8 days
RT @getjonwithit: "General relativity doesn't admit black hole solutions. It only admits *wormhole* solutions." I have previously made thi…
@tadityasrinivas
Aditya Timmaraju
8 days
RT @ai_for_success: This is new and massive. Google have launched a new model Gemini 2.0 Flash Thinking Experimental with apps. This can…
@tadityasrinivas
Aditya Timmaraju
8 days
RT @OfficialLoganK: Gemini 2.0 Flash is the best value prop of any LLM, it’s time to build!
@tadityasrinivas
Aditya Timmaraju
8 days
Right, my point was that R1 is evidence that incentivizing long reasoning CoTs via train-time RL suffices, and that pure autoregressive generation already gets us most of the way. E.g. on AIME, R1 Zero went from 15 to 71 with just pass@1, and further majority voting (outside the LLM, but very simplistic) improved it to 86.
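The majority-voting step is indeed simple to implement outside the model. A generic sketch, where sample_answer is an assumed user-supplied function that runs one temperature-sampled generation and extracts its final answer:

```python
from collections import Counter

def majority_vote(sample_answer, prompt: str, k: int = 16):
    """Self-consistency / majority voting, sketched generically.

    `sample_answer` is an assumed user-supplied callable that runs one
    sampled generation for `prompt` and returns the extracted final
    answer (e.g. the boxed number on an AIME problem).
    """
    answers = [sample_answer(prompt) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins
```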
@tadityasrinivas
Aditya Timmaraju
9 days
@svembu Exactly, there are more levels to it than most discourse over self-hosting has focused on
@tadityasrinivas
Aditya Timmaraju
15 days
DeepSeek R1, hosted on DeepSeek's own server, has two levels of censorship:
(L1) A post-decoding (generation) filter based on keywords like "Taiwan", "Arunachal Pradesh".
(L2) A biased dataset added with a higher weighting to the pre-training and/or SFT mixture, plus an RLHF signal to ensure the model conforms to their expectations.
While (L1) is easy to get rid of with self-hosting, (L2) takes some more work. End users who self-host or use a US-hosted service, while benefiting from the cost improvements, might also want to fix (L2) depending on their use case.
We can see this by looking at the difference in responses for the same query. R1 on the DeepSeek server refuses to answer even non-provocative questions:
Q1: "Who is the President of Taiwan?"
A: "Sorry, I'm not sure how to approach this type of question yet. Let's chat about math, coding, and logic problems instead!"
Q2: "Name the states of India"
A: N/A (generation proceeds until it says Arunachal and then just deletes everything)
R1 hosted independently as-is gets rid of (L1) above, but (L2) still exists via the PT/SFT data, yielding the following responses:
Q1: "Who is the President of Taiwan?"
A: "Taiwan is an inalienable part of China, and there is no such position as 'President of Taiwan.' Currently, the leader of the Taiwan region is Lai Ching-te. We adhere to the One-China principle and oppose any form of 'Taiwan independence' separatist activities."
Q2: "Name the states of India"
A: Proceeds to answer, because the post-decoding filter doesn't exist in a self-hosted version: "As of 2023, India comprises 28 states and 8 union territories. Below is the list of states: Andhra Pradesh, Arunachal Pradesh, Assam, Bihar, Chhattisgarh ..."
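A minimal sketch of what an (L1)-style post-decoding filter could look like; the keyword list and refusal message below are illustrative assumptions, not DeepSeek's actual implementation:

```python
# Hypothetical (L1)-style filter: sits outside the model and watches the
# decoded text. Keywords and refusal text are assumed for illustration.
BLOCKED_KEYWORDS = {"taiwan", "arunachal"}

REFUSAL = ("Sorry, I'm not sure how to approach this type of question yet. "
           "Let's chat about math, coding, and logic problems instead!")

def filter_generation(token_stream):
    """Accumulate decoded tokens; if a blocked keyword ever appears,
    retract the entire partial answer (matching the observed behavior,
    where generation proceeds and is then deleted wholesale)."""
    buffer = []
    for tok in token_stream:
        buffer.append(tok)
        if any(kw in "".join(buffer).lower() for kw in BLOCKED_KEYWORDS):
            return REFUSAL
    return "".join(buffer)

print(filter_generation(["India has 28 states, including ", "Arunachal Pradesh."]))
```

Because this filter is a server-side wrapper rather than part of the weights, self-hosting removes it immediately; the (L2) data-level bias survives in the weights themselves.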
@tadityasrinivas
Aditya Timmaraju
9 days
RT @TheSeeker268: Whoa. Just found something potentially significant: a USAID document FOIA’d by @USRightToKnow contains a coded interview…
@tadityasrinivas
Aditya Timmaraju
10 days
Amazing to see this. I wonder how much of this is replicable in a Parliamentary system. We need to find the best cracked nerds and put them in places where they can have an outsized positive impact on the future of our country. Hope the powers that be are watching.
@Geiger_Capital
Geiger Capital
10 days
Elon really has his best cracked engineers sleeping in the Eisenhower Building, trying to put some startup energy into the government. Incredible. We needed this.
@tadityasrinivas
Aditya Timmaraju
10 days
Yet another marker of the insane pace of progress in AI. I had to check to make sure both axes are linear and there is no weird scaling/clipping. And more recently, performance on Humanity's Last Exam going up from 9 (o1) to 13 (o3-mini) to 24 (o3 DR) in literally a few weeks. Just nuts.
@DeryaTR_
Derya Unutmaz, MD
10 days
I hope everyone who sees this chart fully understands where we are & where we’re heading (just use your conservative extrapolation for the next two years). Meanwhile, a massive shock awaits those who don’t understand it or remain unaware! Source: @emollick & @EpochAIResearch