Federico Cassano

@ellev3n11

Followers: 623 · Following: 122 · Media: 5 · Statuses: 189

Pinned Tweet
Federico Cassano (@ellev3n11) · 6 months
We finally released StarCoder2-Instruct! SC2-Instruct is the very first entirely self-aligned code LLM, trained with a fully permissive and transparent pipeline. On benchmarks, we are beating even versions of StarCoder2 trained on GPT-4-distilled data!

Federico Cassano (@ellev3n11) · 3 months
Happy to share that I'll be taking a break from school to work at @cursor_ai as a research scientist! I'm extremely excited to continue my research on using AI and PL for automating software engineering with such a talent-dense team.

Federico Cassano (@ellev3n11) · 3 months
@Calcolis @cremieuxrecueil Because we receive environmental feedback; the model in this Nature paper doesn't.

Federico Cassano (@ellev3n11) · 3 months
I'm happy to share that our paper on evaluating LLMs at editing code through natural language instructions (Can It Edit?) has been accepted at @COLM_conf, ranking in the top 5% of all submissions! Please check out our preprint for more:

Federico Cassano (@ellev3n11) · 5 months
10 months ago, these were the SOTA results for open-source code generation (from ). How things have changed! Benchmark: HumanEvalPack

Federico Cassano (@ellev3n11) · 12 days
I'll be at COLM 2024 from Monday to Thursday, presenting my work on evaluating LLMs at code editing. Feel free to DM me if you'd like to meet and chat about automating software engineering, reinforcement learning, test-time compute, or research at @cursor_ai !

Federico Cassano (@ellev3n11) · 3 months
Llama-3.1 trains on synthetic translations of Python to low-resource languages (e.g., PHP) to improve performance on MultiPL-E! In our work, conditionally accepted to OOPSLA 2024, we present several experiments in this direction:

Federico Cassano (@ellev3n11) · 16 days
@cristinaz_c In Italy if I want my employee to take home 2000€, I need to pay around 4200€, and it goes up with salary.

Federico Cassano (@ellev3n11) · 6 months
Also, huge shout-out to @milliondotjs for providing me with compute for a large part of this effort! Truly appreciated.

Federico Cassano (@ellev3n11) · 4 months
old news but MultiPL-T got (conditionally) accepted at OOPSLA 2024. See you in LA (probably)

Federico Cassano (@ellev3n11) · 4 months
You can full-parameter SFT Llama-3-70B on 8xH100 with 2TB of RAM. The perfect ratio of optimizer parameters to offload to CPU is 0.75
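A minimal sketch of what this setup could look like as a DeepSpeed ZeRO-3 config; the partial-offload "ratio" field (from ZeRO-Offload++) and all surrounding values are assumptions for illustration, not details from the tweet:

```python
# Hypothetical DeepSpeed ZeRO-3 config for full-parameter SFT of Llama-3-70B
# on 8xH100 with ~2TB of host RAM, offloading ~75% of optimizer state to CPU.
# The "ratio" knob is DeepSpeed's partial optimizer offload (ZeRO-Offload++);
# support depends on your DeepSpeed version.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state
        "offload_optimizer": {
            "device": "cpu",    # spill optimizer state to host RAM
            "pin_memory": True,
            "ratio": 0.75,      # fraction of optimizer state/updates on CPU
        },
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
}

# With Hugging Face Transformers, the dict can be passed directly:
# TrainingArguments(..., deepspeed=ds_config)
```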

Federico Cassano (@ellev3n11) · 3 months
@natfriedman We pre-trained StarCoder2 on that. I don't think anyone has used it though:

Federico Cassano (@ellev3n11) · 3 months
@HanchungLee most of these are artifacts of GPT-4 distillation, not true self-learning.

Federico Cassano (@ellev3n11) · 4 months
@corbtt @eugeneyan This is not true for code generation. Reproduced myself (disclaimer, n=1):
LiveCodeBench, GPT-4o:
- temperature=0 -> 43.5
- temperature=0.8 -> 42.8
- temperature=1 -> 42.2
This is likely because your metric is LLM-as-Judge.
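For context, a minimal sketch of how an execution-based comparison like this can be run; the `problems` data and the toy `run_tests` harness below are stand-ins for the real LiveCodeBench tooling, not its API:

```python
# Hypothetical sketch: compare execution-based pass@1 across temperatures.
from openai import OpenAI

client = OpenAI()

# Toy stand-in for a benchmark; real harnesses extract code from the reply
# and sandbox execution.
problems = [{
    "prompt": "Write a Python function add(a, b) returning a + b. Code only.",
    "tests": "assert add(1, 2) == 3",
}]

def run_tests(problem: dict, code: str) -> bool:
    """Exec the completion, then its unit tests; pass iff nothing raises."""
    env: dict = {}
    try:
        exec(code, env)
        exec(problem["tests"], env)
        return True
    except Exception:
        return False

def pass_at_1(temperature: float, n: int = 10) -> float:
    """Mean fraction of sampled completions that pass the tests."""
    scores = []
    for prob in problems:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prob["prompt"]}],
            temperature=temperature,
            n=n,
        )
        passed = sum(run_tests(prob, c.message.content) for c in resp.choices)
        scores.append(passed / n)
    return 100 * sum(scores) / len(scores)

for t in (0.0, 0.8, 1.0):
    print(f"temperature={t} -> {pass_at_1(t):.1f}")
```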

Federico Cassano (@ellev3n11) · 2 months
@TheZachMueller didn't know of this, amazing

Federico Cassano (@ellev3n11) · 5 months
@katieelink Recently there was this:

Federico Cassano (@ellev3n11) · 3 months
Interestingly, we find that while fine-tuning on natural data (from The Stack) in the low-resource language yields marginal improvements, fine-tuning on these self-translated samples significantly improves the model's performance, in some cases more than doubling it.

Federico Cassano (@ellev3n11) · 5 months
@jeremyphoward At the current time, models are able to improve with self-correction only if they are either used jointly with a specially-trained discriminator (ORM or PRM) or given environment feedback.

Federico Cassano (@ellev3n11) · 7 months
commits are to code what videos are to images; there is so much more data out there

Federico Cassano (@ellev3n11) · 6 months
@RekaAILabs to say that GPT-4 gets 76.5 on HumanEval is a wild claim

Federico Cassano (@ellev3n11) · 5 months
Humanity would have so much more free time if NVIDIA decided to shorten CUDA_VISIBLE_DEVICES

Federico Cassano (@ellev3n11) · 4 months
@casper_hansen_ I wish people would actually read this paper...

Federico Cassano (@ellev3n11) · 5 months
@ArjunGuha @TheZachMueller @Prince_Canuma @winglian @StasBekman @charles_irl I think it's pretty normal when you have diverse samples! Here is a wandb report:

Federico Cassano (@ellev3n11) · 3 months
We use compilers to translate function headers and tests from Python to a low-resource language, and then we use the LLM to fill in the body of the function. Before training on the translations, we execute them against the mechanically-translated tests, and only train on passing implementations.
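A minimal sketch of that loop; the helpers `translate_signature_and_tests`, `llm_fill_body`, and `run_tests` are hypothetical names standing in for the real compiler-based tooling:

```python
# Hypothetical sketch of the self-translation pipeline described above.
# All three helpers are illustrative stand-ins, not real APIs.
def build_training_set(python_functions, target_lang, samples_per_fn=20):
    training_set = []
    for fn in python_functions:
        # 1. Mechanically translate the function header and its unit tests
        #    from Python to the low-resource target language.
        header, tests = translate_signature_and_tests(fn, target_lang)
        # 2. Sample candidate bodies from the LLM in the target language.
        for _ in range(samples_per_fn):
            candidate = header + llm_fill_body(header, target_lang)
            # 3. Execute against the translated tests; keep only passing
            #    implementations as fine-tuning data.
            if run_tests(candidate, tests, target_lang):
                training_set.append(candidate)
    return training_set
```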

Federico Cassano (@ellev3n11) · 16 days
@kinopee_ai Happy birthday!

Federico Cassano (@ellev3n11) · 2 years
@dccybersec neither. Go, OCaml or Rust

Federico Cassano (@ellev3n11) · 7 months
@abacaj People keep falling for these. Recently, some papers even got prestigious awards at top conferences (ICLR oral) by using the 67 pass@1 numbers as their baselines. Anyway, people need to move away from HumanEval; it's very likely all these models have trained on the solutions.

Federico Cassano (@ellev3n11) · 18 days
I always knew CUPS was sus

Federico Cassano (@ellev3n11) · 4 years
National Cyber League coming up next week!

Federico Cassano (@ellev3n11) · 4 months
@daniel_d_kang @natfriedman SWE-Bench is in training data though

Federico Cassano (@ellev3n11) · 10 months
@bindureddy There have been numerous uncensored open-source models before; this is not the first one.

Federico Cassano (@ellev3n11) · 4 months
@moyix is this from CS?

Federico Cassano (@ellev3n11) · 5 months
@_akhaliq Or digit-wise tokenizer

Federico Cassano (@ellev3n11) · 5 months
@LoubnaBenAllal1 Fineweb-Code, obviously

Federico Cassano (@ellev3n11) · 6 months
@natfriedman I think @Muennighoff's paper showed this! They train LLMs on a mix of NL data and Python data at 10 different mixing rates and find that mixing in code provides a 2× increase in effective tokens, even when evaluating only NL tasks.

Federico Cassano (@ellev3n11) · 6 months
@ykilcher Not to be confused with GFlowNet's flow matching!

Federico Cassano (@ellev3n11) · 4 months
@SakanaAILabs Reminds me of Self-Taught Optimizer from @ericzelikman

Federico Cassano (@ellev3n11) · 5 months
@yacineMTB rust or bust

Federico Cassano (@ellev3n11) · 3 months
@justalexoki gotta keep the green garden thriving

Federico Cassano (@ellev3n11) · 5 months
@bneyshabur Note though that MATH is a dataset with all problems at roughly the same difficulty level. Not saying that we are hitting a wall, nor that this is not impressive! Just being nitpicky.

Federico Cassano (@ellev3n11) · 6 months
@deliprao everyone has a friend that watches their models train from start to finish

Federico Cassano (@ellev3n11) · 3 months
@emerywells The guys at @milliondotjs will break that number

Federico Cassano (@ellev3n11) · 4 months
dumped a whole water bottle on my @system76 lemur pro's keyboard. shut it down, dried it with a hair dryer. still kicking strong

Federico Cassano (@ellev3n11) · 1 year
@dhtikna @ArjunGuha Looks like the table got mixed up, thanks for pointing it out! The Lua 1B model performs much better: 17.3 pass@1, almost a 2× improvement. We will revise the PDF.

Federico Cassano (@ellev3n11) · 6 months
@jasontempborn @amanrsanger totally depends on the model and your definition of contamination. 87.8% of the issues and PRs in the benchmark are older than Jan 2023.

Federico Cassano (@ellev3n11) · 6 months
@SergioRocks Note how small the spread for Italy is. We really don't value talent back home.

Federico Cassano (@ellev3n11) · 10 months
@snagycs FYI, HotCRP says: "The site is not open for submissions at the moment." When will it go online?

Federico Cassano (@ellev3n11) · 4 months
@VJM0N @localghost aged like fine milk

Federico Cassano (@ellev3n11) · 6 months
@casper_hansen_ @amanrsanger cutoff needs to happen for repository creation, not issue/PR creation

Federico Cassano (@ellev3n11) · 1 year
@hardmaru @laion_ai grokking on chess moves!

Federico Cassano (@ellev3n11) · 6 months
@mSanterre @RekaAILabs most recent eval is 90. just reproduced myself. check the full leaderboard on evalplus:

Federico Cassano (@ellev3n11) · 28 days
@OfirPress @hughbzhang definitely not statistically significant after ~2022 due to the restricted sample size, but there is some effect.

Federico Cassano (@ellev3n11) · 5 months
@LoulyAdam @vikhyatk @LambdaAPI should be trainable with ZeRO 3

Federico Cassano (@ellev3n11) · 2 years
@wcrichton Also Principles of Abstract Interpretation has one! I think it's a very efficient way of learning one specific topic in a large book.

Federico Cassano (@ellev3n11) · 5 months
@dylanslack20 congratz!

Federico Cassano (@ellev3n11) · 4 months
@cloud11665 selling 8xh100 on pcie gotta be considered fraud

Federico Cassano (@ellev3n11) · 2 months
@TheZachMueller @cursor_ai I don't work on this part of the stack but I will certainly send this to the guys!

Federico Cassano (@ellev3n11) · 26 days
@brianryhuang openreview is the purgatory

Federico Cassano (@ellev3n11) · 7 months
@ylecun n=26, lol

Federico Cassano (@ellev3n11) · 4 months
@NeelNanda5 Yes, I suspect that these results apply mostly to PEFT and not full parameter finetuning

Federico Cassano (@ellev3n11) · 8 months
@fiveseveny noooooooooooooooooo. what is the pink thing T_T

Federico Cassano (@ellev3n11) · 2 months
@paulgauthier Paul, I love your work, but are you not worried that your evals may be contaminated? Exercism is in the training data of lots of models; I even used it as a fine-tuning set a while ago. Interested to hear your thoughts!

Federico Cassano (@ellev3n11) · 8 months
@GroqInc Internships? I know the right person for this

Federico Cassano (@ellev3n11) · 7 months
@Jason Pretty nuts to see people don't know how inflation works. If you adjust for CPI, you see a pretty flat relationship, I think. I've stared at many, many CPI curves...

Federico Cassano (@ellev3n11) · 3 months
@jwilson1717 @Calcolis @cremieuxrecueil Because they have learned a really good reward model from the environment. See actor-critic networks.

Federico Cassano (@ellev3n11) · 7 months
@StringChaos great stuff :)