![kourosh hakhamaneshi Profile](https://pbs.twimg.com/profile_images/1591885372664913920/uSAe3nOG_x96.jpg)
kourosh hakhamaneshi
@CyrusHakha
Followers
889
Following
2K
Statuses
707
ML engineer @anyscalecompute 💻 prev PhD, EECS, @UCBerkeley 👨🎓
California, USA
Joined September 2010
RT @askalphaxiv: We used Gemini 2 Flash to build Cursor for arXiv papers. Highlight any section of a paper to ask questions and “@” other p…
Cursor basically taught Microsoft the true potential of their original Copilot concept. Copilot before and after the emergence of Cursor is like night and day.
Today, we are infusing the power of agentic AI into the GitHub Copilot experience, elevating Copilot from pair to peer programmer 🤖 (1/4)
RT @robertnishihara: Join our @raydistributed meetup next Thursday at the @BytedanceTalk Bay Area headquarters along with @nvidia. We'll be…
We are going global :-)
Anyscale is expanding to India! We're opening our first international office. Come work with us to get this office off the ground (DM @jaikumarharikoa).
RT @robertnishihara: We're expanding to India and building a small elite team. If you want to be part of the founding team here, DM me.
Some of my earlier attempts on llama-1B-instruct did not show similar behaviors, i.e., they didn’t improve eval metrics, and there was no outstanding emergent behavior that I noticed. Many ablations are needed to understand the root cause of these behaviors, and the open-source community is already investigating the impact of these design choices on capability and generalization: the particular choice of RL algorithm, the size of the initial model, whether it should be instruct-tuned or not, the mixture of prompts used during RL, reward engineering, etc. It’s fascinating to see the power of open source once again.
RT @Alibaba_Qwen: The burst of DeepSeek V3 has attracted attention from the whole AI community to large-scale MoE models. Concurrently, we…
RT @vllm_project: 🚀 With the v0.7.0 release today, we are excited to announce the alpha release of vLLM V1: A major architectural upgrade w…
I still cannot wrap my head around the fact that pure RL can cause emergent behaviors like self-reflection, with words such as "hmm, wait..." and "umm". There must be a better explanation in the prior of the base model that RL is applied to.
There are some intriguing similarities between the r1 chains of thought and the o1-preview CoTs shared in papers and blog posts. In particular, note the heavy use of the words "wait" and "alternatively" as transition words for error correction and double-checking.
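The surface statistic being pointed at here is easy to check yourself. Below is a small, hypothetical helper that counts reflection "transition words" in a chain-of-thought trace; the marker list and the toy trace are illustrative assumptions, not drawn from r1 or o1 outputs.

```python
import re
from collections import Counter

# Illustrative set of self-reflection / error-correction markers.
MARKERS = ["wait", "hmm", "alternatively", "let me double-check"]

def count_reflection_markers(cot: str) -> Counter:
    """Count occurrences of each reflection marker in a CoT trace."""
    text = cot.lower()
    return Counter(
        {m: len(re.findall(r"\b" + re.escape(m) + r"\b", text)) for m in MARKERS}
    )

# Toy trace with the kind of backtracking language described above.
trace = (
    "So x = 12. Wait, I dropped a factor of 2. "
    "Hmm, let me redo this. Alternatively, substitute y first. "
    "Wait, that gives the same result."
)
counts = count_reflection_markers(trace)
```

Running this over real r1 vs. o1-preview traces (rather than the toy string) is the kind of quick comparison the tweet is gesturing at.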
Reproduction of key ideas for reducing overthinking in reasoning models. The key enablers are a contrastive preference-tuning algorithm like SimPO, a small amount of data (10k samples), and a fairly simple pair-construction trick.
1/5 ⚡️Presenting Sky-T1-32B-Flash⚡️, our open reasoning model that tackles "overthinking" to cut generation lengths (and inference cost!) by 50% without sacrificing accuracy – tuned with only $275! 📊Blog: 🏋️♀️Weights:
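For context, SimPO is a reference-free preference loss on length-normalized log-probabilities. Here is a minimal numpy sketch, assuming per-token log-probs for a chosen (shorter, correct) and a rejected (overthinking) response are already computed; the `beta`/`gamma` values and the toy pair are illustrative, not the values used for Sky-T1-32B-Flash.

```python
import numpy as np

def simpo_loss(logp_chosen, logp_rejected, beta=2.0, gamma=0.5):
    """SimPO: sigmoid loss on the length-normalized log-prob margin."""
    lp_w = np.mean(logp_chosen)    # avg per-token log-prob, chosen
    lp_l = np.mean(logp_rejected)  # avg per-token log-prob, rejected
    margin = beta * (lp_w - lp_l) - gamma
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# Toy pair: a short confident answer vs. a longer, lower-confidence one,
# mimicking a "concise vs. overthinking" preference pair.
short_ans = np.array([-0.2, -0.3, -0.25])
long_ans = np.array([-0.9, -1.1, -0.8, -1.0, -0.95])
loss = simpo_loss(short_ans, long_ans)
```

The length normalization (averaging per-token log-probs) is what makes this style of loss a natural fit for penalizing needlessly long generations.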
RT @NovaSkyAI: 1/5 ⚡️Presenting Sky-T1-32B-Flash⚡️, our open reasoning model that tackles "overthinking" to cut generation lengths (and inf…
The big takeaway from this work is that if you have high-quality tree-of-thought traces, simple SFT can lift reasoning capabilities very effectively. These traces don’t just include happy paths that jump straight to the chain of thought leading to the answer; they also include self-reflection, backtracking, etc., all in-context, effectively teaching the model how to course-correct if it makes a mistake. The next step in open-source research is generating these tree-of-thought traces independently of a teacher model (QwQ in this case). This is the step that requires the RL-with-search combo. Looking forward to working with the @NovaSkyAI team on answering some of these questions.
1/6 🚀 Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450! 📊Blog: 🏋️♀️Model weights:
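The SFT recipe described above amounts to flattening a trace, failed branches and backtracks included, into one training target. The sketch below is a hypothetical illustration of that idea; the format markers and the toy algebra trace are assumptions, not the Sky-T1 data format.

```python
def linearize_trace(question, steps, answer):
    """Join reasoning steps (happy path + corrections) into one SFT target."""
    body = "\n".join(steps)
    return f"Question: {question}\n{body}\nFinal answer: {answer}"

# A trace that includes a wrong branch, self-reflection, and a backtrack,
# so SFT sees course-correction in-context, not just the final path.
steps = [
    "Try factoring: x^2 - 5x + 6 = (x - 1)(x - 6)?",
    "Wait, (x - 1)(x - 6) = x^2 - 7x + 6, that's wrong.",
    "Backtrack: try (x - 2)(x - 3) = x^2 - 5x + 6. Correct.",
]
example = linearize_trace("Factor x^2 - 5x + 6.", steps, "(x - 2)(x - 3)")
```

Keeping the wrong branch in the target string is the point: the model is trained on the recovery, not just the answer.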
RT @NovaSkyAI: 1/6 🚀 Introducing Sky-T1-32B-Preview, our fully open-source reasoning model that matches o1-preview on popular reasoning an…
In my opinion, figuring out how to scale the process reward model, and how to augment it with human experts, carries the most weight in figuring out o1's RL training / inference.
There's a lot of confusion about o1's RL training and the emergence of RL as a popular post-training loss function. Yes, these are the same loss functions and similar data. BUT, the amount of compute used for o1's RL training is much more in line with pretraining. The words we use to describe training are strained already, but o1 may be better viewed as next-token pretraining, rl pretraining, and then some normal post-training.
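To make the process-reward-model idea concrete: at inference time a PRM can rank sampled reasoning chains by scoring each step. This is a hedged sketch under assumptions: the per-step scores are stubbed in rather than produced by a real model, and aggregating by the minimum step score is one common convention, not necessarily o1's.

```python
def chain_score(step_scores):
    """Aggregate per-step PRM scores; a chain is as good as its worst step."""
    return min(step_scores)

def best_of_n(chains_with_scores):
    """Pick the candidate chain with the highest aggregated PRM score."""
    return max(chains_with_scores, key=lambda cs: chain_score(cs[1]))

# Stubbed PRM scores in [0, 1] for three sampled reasoning chains.
candidates = [
    ("chain A", [0.9, 0.8, 0.4]),   # one weak step
    ("chain B", [0.7, 0.7, 0.7]),   # uniformly solid
    ("chain C", [0.95, 0.2, 0.9]),  # early mistake
]
best = best_of_n(candidates)
```

The min-aggregation choice is what lets a single bad step sink an otherwise strong-looking chain, which is exactly where step-level (process) rewards differ from outcome-only rewards.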
Love nice, practical, noiseless work.
I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵