Federico Cassano

@ellev3n11

Followers: 623 · Following: 122 · Media: 5 · Statuses: 189

Pinned Tweet
Federico Cassano (@ellev3n11) · 6 months
We finally released StarCoder2-Instruct! SC2-Instruct is the very first entirely self-aligned code LLM, trained with a fully permissive and transparent pipeline. On benchmarks, we are beating even versions of StarCoder2 trained on GPT-4-distilled data!

Federico Cassano (@ellev3n11) · 3 months
Happy to share that I'll be taking a break from school to work at @cursor_ai as a research scientist! I'm extremely excited to continue my research on using AI and PL for automating software engineering with such a talent-dense team.

Federico Cassano (@ellev3n11) · 3 months
@Calcolis @cremieuxrecueil Because we receive environmental feedback; the model in this Nature paper doesn't.

Federico Cassano (@ellev3n11) · 3 months
I'm happy to share that our paper on evaluating LLMs at editing code through natural language instructions (Can It Edit?) has been accepted at @COLM_conf, ranking in the top 5% of all submissions! Please check out our preprint for more:

Federico Cassano (@ellev3n11) · 5 months
10 months ago, these were the SOTA results for open-source code generation (from ). How things have changed! Benchmark: HumanEvalPack

Federico Cassano (@ellev3n11) · 12 days
I'll be at COLM 2024 from Monday to Thursday, presenting my work on evaluating LLMs at code editing. Feel free to DM me if you'd like to meet and chat about automating software engineering, reinforcement learning, test-time compute, or research at @cursor_ai !

Federico Cassano (@ellev3n11) · 3 months
Llama-3.1 trains on synthetic translations of Python to low-resource languages (e.g., PHP) to improve performance on MultiPL-E! In our work, conditionally accepted to OOPSLA 2024, we present several experiments in this direction:

Federico Cassano (@ellev3n11) · 16 days
@cristinaz_c In Italy if I want my employee to take home 2000€, I need to pay around 4200€, and it goes up with salary.

Federico Cassano (@ellev3n11) · 6 months
Also, huge shout-out to @milliondotjs for providing me with compute for a large part of this effort! Truly appreciated.

Federico Cassano (@ellev3n11) · 4 months
old news but MultiPL-T got (conditionally) accepted at OOPSLA 2024. See you in LA (probably)

Federico Cassano (@ellev3n11) · 4 months
You can full-parameter SFT Llama-3-70B on 8xH100 with 2TB of RAM. The perfect ratio of optimizer parameters to offload to CPU is 0.75
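A minimal sketch of what this setup could look like as a DeepSpeed ZeRO-3 config; the partial-offload "ratio" field (from ZeRO-Offload++) and all surrounding values are assumptions for illustration, not details from the tweet:

```python
# Hypothetical DeepSpeed ZeRO-3 config for full-parameter SFT of Llama-3-70B
# on 8xH100 with ~2TB of host RAM, offloading ~75% of optimizer state to CPU.
# The "ratio" knob is DeepSpeed's partial optimizer offload (ZeRO-Offload++);
# support depends on your DeepSpeed version.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state
        "offload_optimizer": {
            "device": "cpu",    # spill optimizer state to host RAM
            "pin_memory": True,
            "ratio": 0.75,      # fraction of optimizer state/updates on CPU
        },
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
}

# With Hugging Face Transformers, the dict can be passed directly:
# TrainingArguments(..., deepspeed=ds_config)
```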

Federico Cassano (@ellev3n11) · 3 months
@natfriedman We pre-trained StarCoder2 on that. I don't think anyone has used it though:

Federico Cassano (@ellev3n11) · 3 months
@HanchungLee most of these are artifacts of GPT-4 distillation, not true self-learning.

Federico Cassano (@ellev3n11) · 4 months
@corbtt @eugeneyan This is not true for code generation. Reproduced myself (disclaimer, n=1):
LiveCodeBench, GPT-4o:
- temperature=0 -> 43.5
- temperature=0.8 -> 42.8
- temperature=1 -> 42.2
This is likely because your metric is LLM-as-Judge.
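For context, a minimal sketch of how an execution-based comparison like this can be run; the `problems` data and the toy `run_tests` harness below are stand-ins for the real LiveCodeBench tooling, not its API:

```python
# Hypothetical sketch: compare execution-based pass@1 across temperatures.
from openai import OpenAI

client = OpenAI()

# Toy stand-in for a benchmark; real harnesses extract code from the reply
# and sandbox execution.
problems = [{
    "prompt": "Write a Python function add(a, b) returning a + b. Code only.",
    "tests": "assert add(1, 2) == 3",
}]

def run_tests(problem: dict, code: str) -> bool:
    """Exec the completion, then its unit tests; pass iff nothing raises."""
    env: dict = {}
    try:
        exec(code, env)
        exec(problem["tests"], env)
        return True
    except Exception:
        return False

def pass_at_1(temperature: float, n: int = 10) -> float:
    """Mean fraction of sampled completions that pass the tests."""
    scores = []
    for prob in problems:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prob["prompt"]}],
            temperature=temperature,
            n=n,
        )
        passed = sum(run_tests(prob, c.message.content) for c in resp.choices)
        scores.append(passed / n)
    return 100 * sum(scores) / len(scores)

for t in (0.0, 0.8, 1.0):
    print(f"temperature={t} -> {pass_at_1(t):.1f}")
```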

Federico Cassano (@ellev3n11) · 2 months
@TheZachMueller didn't know of this, amazing

Federico Cassano (@ellev3n11) · 5 months
@katieelink Recently there was this:

Federico Cassano (@ellev3n11) · 3 months
Interestingly, we find that while fine-tuning on natural data (from The Stack) in the low-resource language yields marginal improvements, fine-tuning on these self-translated samples significantly improves the model's performance, in some cases more than doubling it.

Federico Cassano (@ellev3n11) · 5 months
@jeremyphoward At the current time, models are able to improve with self-correction only if they are either used jointly with a specially-trained discriminator (ORM or PRM) or given environment feedback.

Federico Cassano (@ellev3n11) · 7 months
commits are to code what videos are to images; there is so much more data out there

Federico Cassano (@ellev3n11) · 6 months
@RekaAILabs to say that GPT-4 gets 76.5 on HumanEval is a wild claim

Federico Cassano (@ellev3n11) · 5 months
Humanity would have so much more free time if NVIDIA decided to shorten CUDA_VISIBLE_DEVICES

Federico Cassano (@ellev3n11) · 4 months
@casper_hansen_ I wish people would actually read this paper...

Federico Cassano (@ellev3n11) · 5 months
@ArjunGuha @TheZachMueller @Prince_Canuma @winglian @StasBekman @charles_irl I think it's pretty normal when you have diverse samples! Here is a wandb report:

Federico Cassano (@ellev3n11) · 3 months
We use compilers to translate function headers and tests from Python to a low-resource language, and then we use the LLM to fill in the body of the function. Before training on the translations, we execute them against the mechanically-translated tests, and only train on passing implementations.
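A minimal sketch of that loop; the helpers `translate_signature_and_tests`, `llm_fill_body`, and `run_tests` are hypothetical names standing in for the real compiler-based tooling:

```python
# Hypothetical sketch of the self-translation pipeline described above.
# All three helpers are illustrative stand-ins, not real APIs.
def build_training_set(python_functions, target_lang, samples_per_fn=20):
    training_set = []
    for fn in python_functions:
        # 1. Mechanically translate the function header and its unit tests
        #    from Python to the low-resource target language.
        header, tests = translate_signature_and_tests(fn, target_lang)
        # 2. Sample candidate bodies from the LLM in the target language.
        for _ in range(samples_per_fn):
            candidate = header + llm_fill_body(header, target_lang)
            # 3. Execute against the translated tests; keep only passing
            #    implementations as fine-tuning data.
            if run_tests(candidate, tests, target_lang):
                training_set.append(candidate)
    return training_set
```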

Federico Cassano (@ellev3n11) · 16 days
@kinopee_ai Happy birthday!

Federico Cassano (@ellev3n11) · 2 years
@dccybersec neither. Go, OCaml or Rust

Federico Cassano (@ellev3n11) · 7 months
@abacaj People keep falling for these. Recently, some papers even got prestigious awards at top conferences (ICLR oral) by using the 67 pass@1 numbers as their baselines. Anyway, people need to move away from HumanEval; it's very likely all these models have trained on the solutions.

Federico Cassano (@ellev3n11) · 18 days
I always knew CUPS was sus

Federico Cassano (@ellev3n11) · 4 years
National Cyber League coming up next week!

Federico Cassano (@ellev3n11) · 4 months
@daniel_d_kang @natfriedman SWE-Bench is in training data though

Federico Cassano (@ellev3n11) · 10 months
@bindureddy There have been numerous uncensored open-source models before; this is not the first one.

Federico Cassano (@ellev3n11) · 4 months
@moyix is this from CS?

Federico Cassano (@ellev3n11) · 5 months
@_akhaliq Or digit-wise tokenizer

Federico Cassano (@ellev3n11) · 5 months
@LoubnaBenAllal1 Fineweb-Code, obviously

Federico Cassano (@ellev3n11) · 6 months
@natfriedman I think @Muennighoff's paper showed this! They train LLMs on a mix of NL data and Python data at 10 different mixing rates and find that mixing in code provides a 2× increase in effective tokens, even when evaluating only NL tasks.

Federico Cassano (@ellev3n11) · 6 months
@ykilcher Not to be confused with GFlowNet's flow matching!

Federico Cassano (@ellev3n11) · 4 months
@SakanaAILabs Reminds me of Self-Taught Optimizer from @ericzelikman

Federico Cassano (@ellev3n11) · 5 months
@yacineMTB rust or bust

Federico Cassano (@ellev3n11) · 3 months
@justalexoki gotta keep the green garden thriving

Federico Cassano (@ellev3n11) · 5 months
@bneyshabur Note though that MATH is a dataset with all problems at roughly the same difficulty level. Not saying that we are hitting a wall, nor that this is not impressive! Just being nitpicky.

Federico Cassano (@ellev3n11) · 6 months
@deliprao everyone has a friend that watches their models train from start to finish

Federico Cassano (@ellev3n11) · 3 months
@emerywells The guys at @milliondotjs will break that number

Federico Cassano (@ellev3n11) · 4 months
dumped a whole water bottle on my @system76 lemur pro's keyboard. shut it down, dried it with a hair dryer. still kicking strong

Federico Cassano (@ellev3n11) · 1 year
@dhtikna @ArjunGuha Looks like the table got mixed up, thanks for pointing it out! The Lua 1B model performs much better: 17.3 pass@1, almost a 2× improvement. We will revise the PDF.

Federico Cassano (@ellev3n11) · 6 months
@jasontempborn @amanrsanger totally depends on the model and your definition of contamination. 87.8% of the issues and PRs in the benchmark are older than Jan 2023.

Federico Cassano (@ellev3n11) · 6 months
@SergioRocks Note how small the spread for Italy is. We really don't value talent back home.

Federico Cassano (@ellev3n11) · 10 months
@snagycs FYI, HotCRP says: "The site is not open for submissions at the moment." When will it go online?

Federico Cassano (@ellev3n11) · 4 months
@VJM0N @localghost aged like fine milk

Federico Cassano (@ellev3n11) · 6 months
@casper_hansen_ @amanrsanger cutoff needs to happen for repository creation, not issue/PR creation

Federico Cassano (@ellev3n11) · 1 year
@hardmaru @laion_ai grokking on chess moves!

Federico Cassano (@ellev3n11) · 6 months
@mSanterre @RekaAILabs most recent eval is 90. just reproduced myself. check the full leaderboard on evalplus:

Federico Cassano (@ellev3n11) · 28 days
@OfirPress @hughbzhang definitely not statistically significant after ~2022 due to the restricted sample size, but there is some effect.

Federico Cassano (@ellev3n11) · 5 months
@LoulyAdam @vikhyatk @LambdaAPI should be trainable with ZeRO 3

Federico Cassano (@ellev3n11) · 2 years
@wcrichton Also Principles of Abstract Interpretation has one! I think it's a very efficient way of learning one specific topic in a large book.

Federico Cassano (@ellev3n11) · 5 months
@dylanslack20 congratz!

Federico Cassano (@ellev3n11) · 4 months
@cloud11665 selling 8xh100 on pcie gotta be considered fraud

Federico Cassano (@ellev3n11) · 2 months
@TheZachMueller @cursor_ai I don't work on this part of the stack but I will certainly send this to the guys!

Federico Cassano (@ellev3n11) · 26 days
@brianryhuang openreview is the purgatory

Federico Cassano (@ellev3n11) · 7 months
@ylecun n=26, lol

Federico Cassano (@ellev3n11) · 4 months
@NeelNanda5 Yes, I suspect that these results apply mostly to PEFT and not full parameter finetuning

Federico Cassano (@ellev3n11) · 8 months
@fiveseveny noooooooooooooooooo. what is the pink thing T_T

Federico Cassano (@ellev3n11) · 2 months
@paulgauthier Paul, I love your work, but are you not worried that your evals may be contaminated? Exercism is in the training data of lots of models; I even used it as a fine-tuning set a while ago. Interested to hear your thoughts!

Federico Cassano (@ellev3n11) · 8 months
@GroqInc Internships? I know the right person for this

Federico Cassano (@ellev3n11) · 7 months
@Jason Pretty nuts to see people don't know how inflation works. If you adjust for CPI, you see a pretty flat relationship, I think. I've stared at many, many CPI curves...

Federico Cassano (@ellev3n11) · 3 months
@jwilson1717 @Calcolis @cremieuxrecueil Because they have learned a really good reward model from the environment. See actor-critic networks.

Federico Cassano (@ellev3n11) · 7 months
@StringChaos great stuff :)