Harveen Singh Chadha Profile
Harveen Singh Chadha

@HarveenChadha

Followers
3K
Following
4K
Statuses
2K

Data Scientist II at Microsoft | Built Vakyansh OSS in Bhashini | Testing LLMs for fun | Author: https://t.co/VEgrfTm8kV | Stock Market Investor | Views personal

Joined July 2019
@HarveenChadha
Harveen Singh Chadha
4 years
Open Source Alert: Very excited to announce we are open sourcing Vakyansh, a speech recognition framework to democratize speech recognition in Indic languages. Some key features: 1. End-to-end training and experimentation platform built on top of @facebookai Wav2Vec 2.0.
20
148
536
@HarveenChadha
Harveen Singh Chadha
3 days
India needs to set up at least 5 Tier-1 and 10 Tier-2 labs by the end of this year if we are genuinely serious about AI
10
6
166
@HarveenChadha
Harveen Singh Chadha
3 days
Last month, I was trying to build a parser using Azure ADI, as GPT-4o struggled with tables and low-resource script handling. I then passed bounding box regions from ADI as input to GPT-4o, and the results improved drastically. Some days later, I gave a random finance report to NotebookLM and was surprised by the parsing accuracy. I immediately tried gemini-2.0-Flash-Exp, and after some prompt modifications, I was able to get near-perfect outputs. In hindsight these results are not that surprising; in fact, flash thinking would be much better IMO. Google is really stepping up their game, not just on quality but on affordability as well.
0
0
9
@HarveenChadha
Harveen Singh Chadha
4 days
RT @danielhanchen: We managed to fit Llama 3.1 8B < 15GB with GRPO! Experience the R1 "aha moment" for free on Colab! Phi-4 14B also works…
0
288
0
@HarveenChadha
Harveen Singh Chadha
4 days
Copying code is something every dev does, and there is nothing wrong with it; in fact, open source promotes reusability (with proper attribution). However, ethical considerations arise when you build a business around it. The inference code provided by AI4Bharat is under the MIT license; the code in the Krutrim Translate repo is copied and modified to run with your model configuration, and you top it off with the Krutrim Community License without even forking the repo. Is it legal? Maybe yes (I don’t have much clarity). Is it ethical? I leave it up to you to decide. I wish you the best for your future projects; we all want you to win!
7
12
271
@HarveenChadha
Harveen Singh Chadha
5 days
If you think this is a one-off case: I just checked the commit history of the Chitrarth repo. It is copied from haotian-liu/LLaVA, and again the license is changed very conveniently.
5
17
362
@HarveenChadha
Harveen Singh Chadha
5 days
Trying Krutrim Translate today. Seeing the inference scripts and usage instructions, I am sure that this work was done by very uninterested devs. I mean, you literally just need to copy-paste from the indictrans2 repo 🤦‍♂️ The model works well though!
0
0
22
@HarveenChadha
Harveen Singh Chadha
6 days
Just tried it and 🤯
@amasad
Amjad Masad
6 days
Whatever you need… make an app for that. Now on your phone. For everyone. Free.
0
0
3
@HarveenChadha
Harveen Singh Chadha
6 days
Raising 1000 crores is not a joke with this portfolio of models. Just for comparison, Sarvam’s valuation is 966 crores. It’s a big day for the Indian AI ecosystem. I wonder how much AI4Bharat, with a similar portfolio, could raise if it went out to raise.
0
0
13
@HarveenChadha
Harveen Singh Chadha
6 days
@seyarkayarivu Please read “Feb 2024 released the model”, released where ?? And what is internal release ? By this logic openai-o3 was released before deepseek R1
2
0
4
@HarveenChadha
Harveen Singh Chadha
6 days
All models released by Krutrim today for the "open source" community come with the Krutrim Community License. This is comparable to what Llama has been using; while Llama 3.2 does not require a separate license if your MAU are <700M, Krutrim's limit is only 1M
2
3
19
@HarveenChadha
Harveen Singh Chadha
6 days
Even though Krutrim's vocabulary is almost double that of Sarvam, its fertility score and average token count in Hindi are still higher than Sarvam's.
Vocabulary size (Sarvam): 64128
Vocabulary size (Krutrim): 131072
Average token count (Sarvam): 33.98
Average token count (Krutrim): 44.90
Fertility score (Sarvam): 1.61
Fertility score (Krutrim): 2.13
Methodology: running each tokenizer on 100k random Hindi sentences.
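For readers who want to reproduce this kind of comparison, here is a minimal sketch of how the two metrics might be computed. It assumes fertility means total tokens divided by total whitespace-separated words, which the tweet does not state explicitly. The function names and the toy tokenizer are hypothetical stand-ins for illustration; they are not the author's script or either model's tokenizer.

```python
from typing import Callable, Iterable, List, NamedTuple


class TokenizerStats(NamedTuple):
    avg_token_count: float  # mean number of tokens per sentence
    fertility: float        # total tokens / total whitespace-separated words


def tokenizer_stats(
    sentences: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> TokenizerStats:
    """Compute average token count and fertility over a set of sentences.

    Assumption: "fertility" is tokens per whitespace-separated word.
    """
    n_sentences = 0
    n_tokens = 0
    n_words = 0
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # skip empty lines so they do not skew the averages
        n_sentences += 1
        n_words += len(words)
        n_tokens += len(tokenize(sentence))
    if n_sentences == 0:
        raise ValueError("no non-empty sentences to evaluate")
    return TokenizerStats(n_tokens / n_sentences, n_tokens / n_words)


def toy_char_pair_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: splits each word into 2-character chunks."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return tokens


if __name__ == "__main__":
    stats = tokenizer_stats(["abcd ef", "ghijk"], toy_char_pair_tokenizer)
    print(f"Average token count: {stats.avg_token_count:.2f}")
    print(f"Fertility score: {stats.fertility:.2f}")
```

To compare real tokenizers, any callable that maps a string to a list of tokens can be passed as `tokenize`. Under this definition, average token count divided by fertility gives the average words per sentence; the reported figures give roughly 21 words per sentence for both models (33.98 / 1.61 and 44.90 / 2.13), which is consistent with both tokenizers having been run on the same sentences.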
1
0
12
@HarveenChadha
Harveen Singh Chadha
6 days
Wow, krutrim just raised $230M
@bhash
Bhavish Aggarwal
6 days
Announcing the @Krutrim AI lab today! While we’ve been working on AI for a year, today we’re releasing our work to the open source community and also publishing a bunch of technical reports. Our focus is on developing AI for India - to make AI better on Indian languages, data scarcity, cultural context etc.
Here’s a list of models we’re releasing:
- Krutrim 2 and Krutrim 1 LLMs: While Krutrim 1 (India’s first LLM) was launched in Jan 24, it was a basic 7B model. We’re launching Krutrim 2 today as a much improved model. More here:
- Chitrarth 1: India’s first Vision Language Model built on top of Krutrim 1 capable of understanding images and documents. More here:
- Dhwani 1: India’s first Speech Language Model built on top of Krutrim 1 capable of tasks like Speech translations. More here:
- Vyakhyarth 1: State of the art Indic Embedding model for use cases like Search and RAG. More here:
- Krutrim Translate 1: State of the art text to text translation. More here:
In addition, since there was no global benchmark for Indic performance, we’ve developed “BharatBench” and the technical report is here: We’ve also published a bunch of technical reports and papers here:
Also announcing India’s first GB200 deployment in partnership with NVIDIA! Will be live by March and we will make it the largest supercomputer in India by end of year.
We’re nowhere close to global benchmarks yet but have made good progress in 1 year. And by open sourcing our models, we hope the entire Indian AI community collaborates to create a world class Indian AI ecosystem. We’re still learning to walk before we can run, hopefully within this year!
All our open source work here: Web: GitHub: Huggingface:
Also, announcing an investment of ₹2,000 Cr today into Krutrim and a commitment of ₹10,000 Cr by next year!
2
0
14
@HarveenChadha
Harveen Singh Chadha
8 days
There is nothing like “open-source Chinese AI”. It's like saying I am using open-source French AI (transformers) to run inference on my model. Open source is just open source.
@mxtaverse
Arjun*
8 days
hosting open-source Chinese AI on Indian servers and selling it at a price to nationalist crowd... ...sounds like the tech equivalent of importing from China, assembling here and slapping a Made In India sticker on it to get PLI benefits
2
3
37