Dwarak Profile
Dwarak

@DwaraknathG

Followers: 549 · Following: 1K · Statuses: 425

Pretraining @ Cohere

London, England
Joined September 2019
@DwaraknathG
Dwarak
2 days
New research from us!
@RobertY970316
Robert Yang
2 days
Excited to share our new paper: "Rope to Nope and Back Again: A New Hybrid Attention Strategy" where we propose a novel architecture that outperforms RoPE-NTK-based approaches with full attention span. (1/8)
0 replies · 0 reposts · 8 likes
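The abstract above gives only the high-level idea of the hybrid strategy. As a rough illustration, here is a hedged sketch in PyTorch of what a layer-wise RoPE/NoPE hybrid could look like: most attention layers apply rotary position embeddings, while some layers drop positional encoding entirely so they can attend over the full span. The interleaving schedule (`nope_every`), function names, and tensor sizes are illustrative assumptions, not the paper's actual recipe.

```python
import torch
import torch.nn.functional as F

def apply_rope(x, base=10000.0):
    # Rotary position embedding (rotate-half variant): rotate channel pairs
    # by a position-dependent angle so attention depends on relative position.
    t, d = x.shape[-2], x.shape[-1]
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    ang = torch.arange(t, dtype=torch.float32)[:, None] * freqs  # (t, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * ang.cos() - x2 * ang.sin(),
                      x1 * ang.sin() + x2 * ang.cos()], dim=-1)

def hybrid_attention(q, k, v, layer_idx, nope_every=4):
    # Assumed schedule: most layers apply RoPE; every `nope_every`-th layer
    # skips positional encoding (NoPE) and attends over the full span.
    if layer_idx % nope_every != nope_every - 1:
        q, k = apply_rope(q), apply_rope(k)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

q = k = v = torch.randn(1, 2, 16, 64)  # (batch, heads, seq_len, head_dim)
print(hybrid_attention(q, k, v, layer_idx=0).shape)  # torch.Size([1, 2, 16, 64])
```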
@DwaraknathG
Dwarak
14 days
RT @DAEIndia: 🇮🇳 @LIGOIndia Facility Inaugurated: Gateway to Gravitational Wave Research in India. Dr. A. K. Mohanty, Secretary, DAE & Chair…
0 replies · 615 reposts · 0 likes
@DwaraknathG
Dwarak
15 days
@sid_srk Ah I see, now I know why you’re not returning my calls
1 reply · 0 reposts · 1 like
@DwaraknathG
Dwarak
15 days
@bookwormengr @dhume Good to see very sensible points. Well done!
0 replies · 0 reposts · 2 likes
@DwaraknathG
Dwarak
15 days
Finally a good take.
@bookwormengr
GDP
16 days
Writing this as an Indian who works on AI in a leadership role for one of the largest companies in the world (though this is strictly my personal opinion, based on verifiable data). You heard it first here.

First, some more shocks: you heard about DeepSeek. Wait till you hear about Qwen (Alibaba), MiniMax, Kimi, and Doubao (ByteDance), all from China. Within China, DeepSeek is not unique, and its competition is close behind (not far behind). IMHO, China has 10 labs comparable to OpenAI/Anthropic and another 50 tier-2 labs. The world will discover them in the coming weeks in awe and shock.

AI is not hard (I am not high)

Ignore Sam Altman. Many of the teams that built foundation models are below 50 people (e.g. Mistral). In AI, the LLM science part is actually quite easy. All these models are Transformer decoder-only models, an architecture invented in late 2017. There have been improvements since then (flash attention, RoPE, MoE, PPO/DPO/GRPO), but they are relatively minor, open source, and easy to implement. Since building foundation models is easy, and Nvidia is there to help you (if not directly, then by sharing software like Megatron, which is an assembly line for building AI models), there are many foundation models built by Chinese labs as well as global labs. It is machines that learn by themselves, if you give them data and compute. This is unlike writing operating-system or database software. Also, everyone trains on the same data for the first stage, called pre-training: internet archives, books, GitHub code.

What part is hard then?

It is the parallel and distributed computing needed to run AI training jobs across thousands of GPUs that is hard. DeepSeek did a lot of innovation here to save on flops and network calls. They used an innovative architecture called Mixture of Experts and a new approach called GRPO with verifiable rewards, both of which entered the open domain through 2024. Also, a lot of data curation is needed, particularly for post-training, to teach the model the proper style of answering (SFT/DPO) or to teach it to reason (GRPO with verifiable rewards). SFT/DPO is where "stealing" from existing models to save the cost of manual labor may happen. LLM building is nothing that Indian engineers living in India cannot pull off. Don't worry about the Indians who have left; there are plenty in the country as of today.

Then why does India not have foundation models?

It is for the same reason India does not have a Google or Facebook of its own. You need to be able to walk before you can run. There is no protected market to practice your craft in the early days; you will get replaced by American service providers, as they are cheaper and better every single time. That is not the case with the Chinese players. They have a protected market and leadership that treats this skillset as existential due to geopolitics. So even if Chinese models are not good in the early days, they will continue to get funding from their conglomerates as well as provincial governments. Darwinian competition ensures the best rise to the top. Recall that DeepSeek took 2 years to get here without much revenue; they were funded by their parent. Also, most of their engineers are not PhDs. There is nothing that the engineers who built Ola/Swiggy/Flipkart cannot build. Remember, these services are second to none when you compare them to their Bay Area counterparts. Also, don't trivialize those services; there is brilliant engineering behind making them work at the price points at which they work.

Indian DARPA with 3B USD in funding over 3 years

What we need is a mentality that treats this skillset as existential. We need a national fund that will fund such teams, where the only expected output is benchmark performance, with benchmarks becoming harder every 6 months. No revenue needed to survive for the first 3 years. That money is loose change for the GOI and the world's richest men living in India. @protosphinx @balajis @vikramchandra @naval
0 replies · 0 reposts · 6 likes
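The thread above names "GRPO with verifiable rewards" as one of the genuinely hard post-training pieces DeepSeek got right. As a rough illustration of that idea, here is a hedged toy sketch: sample a group of answers per prompt, score each with a programmatic verifier, and turn group-relative scores into advantages. The verifier, function names, and toy data are illustrative assumptions, not DeepSeek's code.

```python
def verifier(answer: str, target: str) -> float:
    # Verifiable reward: 1.0 if the final answer matches exactly, else 0.0.
    # Real verifiers check math answers, unit tests, etc.
    return 1.0 if answer.strip() == target else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # GRPO's core trick: no learned value model; the baseline is simply the
    # mean reward of the sampled group, normalized by its std.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # falls back to 1.0 when all rewards are equal
    return [(r - mean) / std for r in rewards]

# Toy rollout: pretend the model sampled 4 answers to "2 + 2 = ?".
samples = ["4", "5", "4", "22"]
rewards = [verifier(s, "4") for s in samples]
print(group_relative_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
# Positive advantages upweight correct samples in the policy-gradient update,
# negative advantages downweight wrong ones.
```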
@DwaraknathG
Dwarak
15 days
RT @gkcs_: Indian tech influencers are reacting to DeepSeek in hilarious ways. "Why can't we build our own LLM? What's stopping us?" Noth…
0 replies · 155 reposts · 0 likes
@DwaraknathG
Dwarak
15 days
@gkcs_ First sensible tweet about all this stuff.
0 replies · 0 reposts · 1 like
@DwaraknathG
Dwarak
22 days
RT @AravSrinivas: Re India training its foundation models debate: I feel like India fell into the same trap I did while running Perplexity.…
0 replies · 2K reposts · 0 likes
@DwaraknathG
Dwarak
25 days
@LauraRuis is awesome :)
@MLStreetTalk
Machine Learning Street Talk
26 days
We spoke with @LauraRuis from @CohereForAI and @ucl about her paper "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models", where she demonstrated an interesting gap between how LLMs handle retrieval and reasoning queries, indicating that reasoning draws on synthesised procedural knowledge from pretraining.
0 replies · 0 reposts · 4 likes
@DwaraknathG
Dwarak
28 days
RT @ultasawaal: The only way to describe the morning of 15th Jan - 🤯 India’s first private satellite constellation just went up. Not a te…
0 replies · 123 reposts · 0 likes
@DwaraknathG
Dwarak
30 days
RT @AshwiniVaishnaw: Discussed options for building India’s sovereign LLM with Sarvam team. Huge potential to solve population scale probl…
0 replies · 379 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
@ishanspatil @PixxelSpace LFG 🇮🇳
0 replies · 0 reposts · 1 like
@DwaraknathG
Dwarak
1 month
RT @codegptAI: AI Models in Arabic: A Performance Comparison 🧪🌐 How do LLMs perform when answering in Classical Arabic? 🌍 Spoiler: GPT-4 i…
0 replies · 6 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
@ultasawaal Incredible work, Rahul! Very excited to see this. I work on LLMs - would love to chat if you’re down, to get a sense of what you’re doing and how I can contribute :)
0 replies · 0 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
RT @1vnzh: North is a workspace for intelligent agents to easily and securely collaborate with us at work, using the same tools, signals an…
0 replies · 14 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
RT @cohere: North provides a trusted platform that seamlessly integrates into the workplace tools and applications that employees already u…
0 replies · 5 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
RT @cohere: Employees can instantly customize and deploy AI agents that can help them perform complex tasks, regardless of technical backgr…
0 replies · 7 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
RT @cohere: Today, we’re launching early access for North! Our all-in-one secure AI workspace platform combines LLMs, search, and agents i…
0 replies · 99 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
RT @aidangomez: If your company is interested in joining the early access program, sign up here! (link is at the bottom of the blog) https:…
0 replies · 2 reposts · 0 likes
@DwaraknathG
Dwarak
1 month
@aidangomez Acyr is mad at everyone bro
0 replies · 0 reposts · 3 likes