lovish @louvishh profile

lovish

@louvishh

Followers

633

Following

693

Media

33

Statuses

180

phding @ucl and @aiatmeta (llama team). mostly random tweets here.

https://t.co/ym343FEkbc

london

Joined July 2021

Pinned Tweet

lovish

@louvishh

1 month

405b is out! working on llama 3 has been a truly rewarding experience and i'm super grateful to all my teammates! i'm excited to see how the llama models will be used by the community! p.s. - we wrote a paper and not just a tech report 😛

12

19

128

lovish

@louvishh

2 years

exactly 365 days ago, something snapped in my mind, and i finally bought a gym membership. happy then, happier now 🙂

43

32

1K

lovish

@louvishh

2 months

🚨 New Paper 🚨 Evaluations can have a lot of variance, throwing off model comparisons especially during pre-training. In our latest work, “Quantifying Variance in Evaluation Benchmarks”, we explore this phenomenon in depth. A thread [1/n]

3

23

152

lovish

@louvishh

11 months

life update ✨: i’ve moved to london and started my phd at @ucl_nlp and @AIatMeta ! looking forward to collaborating with folks in research and making new friends. if you’re around in the uk and would like to say hi, please feel free to reach out!

5

1

67

lovish

@louvishh

2 years

celebrating iclr acceptance with a swim

17

2

58

lovish

@louvishh

2 years

just remove sugar, cook in olive oil, bake/grill instead of fry. and of course, you can have cheat days too. and most importantly, don’t worry if there are off days. i had quite a few relapses where i was eating anything i wanted to. just pick yourself up and start again 😁

5

0

52

lovish

@louvishh

2 years

another one of my schoolmates is getting married, and here i am, still contemplating if i should apply for a phd.

5

1

35

lovish

@louvishh

2 years

and diet is also a very important part of this process. i tried a bunch of things - keto/intermittent/calorie deficit etc. i found that you don’t have to follow anything extreme and calorie deficit diets are just as good, and in fact more sustainable for long term.

1

0

36

lovish

@louvishh

2 years

since a lot of you’ve been asking, for the workouts, i focused on both strength training and cardio. cardio is important for burning calories and weights for building muscle. my average workout routine is 30 mins cardio + 35-45 mins strength training for 5/6 days a week.

1

0

34

lovish

@louvishh

2 years

working a full time job and still being handed 500 rupaye ka note forcefully by relatives whenever you meet them has to be the most desi thing ever ffs.

4

2

32

lovish

@louvishh

29 days

@giffmana we use our internal evals repository to run all the evals. we did release the inputs/outputs/metrics using our repo for most evaluation tasks (including mmlu) here:

meta-llama/Meta-Llama-3.1-405B-Instruct-evals · Datasets at Hugging Face

huggingface.co

1

28

lovish

@louvishh

1 year

in kigali, rwanda for #ICLR2023 ! hit me up if you would like to talk about nlp, large language models, and optimization! also stop by our poster:

0

22

lovish

@louvishh

1 year

can never do this in blr. 🚴🏼🚴🏼

0

22

lovish

@louvishh

2 years

used to blast loud music in my room during wfh. have to wear earphones in the office like some civilised guy now.

2

0

21

lovish

@louvishh

1 year

after a lot of socializing, information overload, and llm discussions at iclr, it’s time for a solo trip in cape town!

2

0

21

lovish

@louvishh

1 year

officially a xoogler now!

0

21

lovish

@louvishh

1 month

@_xjdr @AIatMeta ngl, the paper writing was too much fun!

1

0

18

lovish

@louvishh

4 months

fixed the fixed fix for llama3

Armand Joulin

@armandjoulin

4 months

Fixed the fix.

6

9

115

0

3

17

lovish

@louvishh

6 months

just got my bike stolen. i guess i am a true londoner now 😭

2

1

16

lovish

@louvishh

2 months

in the bay area this week, who should I meet?

2

0

15

lovish

@louvishh

2 years

saw an indian couple in barcelona eating rotis with fork and knife. i’ve seen everything now.

1

0

14

lovish

@louvishh

4 months

a preview of things to come from all things llama 😎 glad to be working with such an amazing team!

Mike Lewis

@ml_perception

4 months

Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come...

18

97

507

1

0

14

lovish

@louvishh

3 months

neurips deadline over, time for some yolo travel and pre-training runs 🙃

1

0

14

lovish

@louvishh

2 years

deleted instagram a month ago, and now i’ve an urge to shitpost/sadpost on the bird app. the thought of senior folks at work seeing those posts is holding me back 😬😂

2

1

12

lovish

@louvishh

1 year

i should save all the random papers, code, and memes in my bookmarks before this site blows up.

0

1

10

lovish

@louvishh

1 month

here's the paper link:

1

0

10

lovish

@louvishh

1 year

the funny thing is that sequoia india is going to give $10 million to ten teams of three engineers each, instead of just giving $100 million to a single company to train foundation models.

1

0

10

lovish

@louvishh

2 years

perfect sunday morning in blr

1

0

10

lovish

@louvishh

2 months

MMLU performance is at a chance level even after training for 210B tokens for the standard formulation (the model is presented with all the choices and asked to predict the most relevant choice). But MMLU-Cloze gives a better signal during the early stages of the training. [5/n]

1

9

lovish

@louvishh

1 year

the amount of weird looks you get when you ask for a table for one in a restaurant is crazyyy. bro why you judging me, i’m just enjoying my solo trip.

2

0

9

lovish

@louvishh

1 month

@NamanGoyal21 if we extrapolate, looks like it's gonna be 131k gpus 💀

0

9

lovish

@louvishh

2 years

palm, say-can, and dall-e 2. going through the ai research updates this week feels like …

1

0

9

lovish

@louvishh

2 years

anxiety hits on a different level when you see a senior author’s cursor in the paper section you are writing on overleaf.

0

8

lovish

@louvishh

2 years

my family is having bread pakoras with chai for their sunday breakfast and i’ve to be content with my egg white smoothie and oats!? .__. this healthy shit is hard fr 😭

3

0

8

lovish

@louvishh

1 month

what size 👀

Noam Brown

@polynoamial

1 month

GPT-4o mini is out! It's best in class for its size, especially at reasoning.

16

24

245

0

8

lovish

@louvishh

2 years

punjabis my age have only seen either badal or captain as the cm. it’s so refreshing to see bhagwant mann this time around. he used to do comedy in his past life and regularly visited my school in sangrur during annual events and performed standup comedy! fun times 😂

2

0

7

lovish

@louvishh

2 years

@ImZackAdams @OtherGu83695592 +1 to that. just focus on yourself and have fun.

1

0

6

lovish

@louvishh

2 months

We prune samples with low item discrimination and while we find modest improvements in both standard error (a decrease) and monotonicity (an increase), the drift in the estimated accuracy is mildly concerning. [7/n]

1

0

7

lovish

@louvishh

2 months

We track various metrics - seed mean/seed variance/95% CI/monotonicity for the 7B seed runs on both discrete and continuous metrics. We find that tracking continuous metrics is important as they have higher monotonicity and give higher signal compared to discrete metrics. [4/n]

1

0

7
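The metrics named in [4/n] can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names are hypothetical, the 95% CI uses a normal approximation (an assumption), and monotonicity is taken here to be Kendall's tau between checkpoint order and score, which the thread itself does not define.

```python
from math import sqrt
from statistics import mean, variance


def seed_stats(final_scores):
    """Mean, sample variance, and an approximate 95% CI across seeds.

    final_scores: one benchmark score per random seed (at least two).
    The CI uses the normal approximation mean +/- 1.96 * standard error,
    which is an assumption made for this sketch.
    """
    m = mean(final_scores)
    var = variance(final_scores)  # sample variance (n - 1 denominator)
    half_width = 1.96 * sqrt(var / len(final_scores))
    return m, var, (m - half_width, m + half_width)


def monotonicity(scores_over_training):
    """Kendall's tau-a between checkpoint order and score.

    scores_over_training: scores of one run, ordered by training step.
    Returns 1.0 if the score rises at every checkpoint, -1.0 if it
    always falls. Tied pairs count as neither concordant nor discordant.
    """
    n = len(scores_over_training)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if scores_over_training[j] > scores_over_training[i]:
                concordant += 1
            elif scores_over_training[j] < scores_over_training[i]:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs
```

A continuous metric that improves smoothly would score close to 1.0 under `monotonicity`, while a discrete metric that jumps around chance level would score lower.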

lovish

@louvishh

4 months

@deliprao @AIatMeta i would say the "doesn't challenge the frontier" is not entirely correct. yes, we don't release the 400B+ model for now but it's already on-par with opus/gpt-4 while it's still under training.

1

0

7

lovish

@louvishh

9 months

no offense to gemini, but what's this hokum with chain of thought MMLU, just report the 5-shot numbers lol

2

1

7

lovish

@louvishh

5 months

was planning to go for therapy today but the new mixtral is not gonna benchmark itself 🙃

Mistral AI

@MistralAI

5 months

magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%%3A1337%2Fannounce&tr=http%3A%2F%%3A1337%2Fannounce

274

826

6K

0

7

lovish

@louvishh

2 months

This work was done with my amazing collaborators - @Aaditya6284 , @RylanSchaeffer , Andrew Poulton, @sanmikoyejo , Pontus Stenetorp, @sharan0909 , and @_dieuwke_ . arXiv Link:

Quantifying Variance in Evaluation Benchmarks

Evaluation benchmarks are the cornerstone of measuring capabilities of large language models (LLMs), as well as driving progress in said capabilities. Originally designed to make claims about...

arxiv.org

1

0

7

lovish

@louvishh

1 year

when is this madness gonna end?

1

0

6

lovish

@louvishh

2 months

We try to reduce variance by taking inspiration from item analysis, where we define item difficulty (average score across models) and item discrimination (correlation b/w models’ score on a given point and models’ overall score) for each sample in the benchmark. [6/n]

1

0

6
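The two item-analysis quantities defined in [6/n] can be computed along these lines. This is a minimal sketch under stated assumptions: `item_analysis` and `pearson` are hypothetical helper names, scores are assumed to be 0/1 per model and sample, and an undefined correlation (every model scoring the same on a sample) is reported as 0.0 by choice here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant
    (an assumption of this sketch, since the correlation is undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_analysis(scores):
    """scores[m][i] is the score of model m on sample i (e.g. 0 or 1).

    Returns (difficulty, discrimination), one value per sample:
      difficulty[i]     = average score on sample i across models
      discrimination[i] = correlation between the models' scores on
                          sample i and the models' overall scores
    """
    n_models = len(scores)
    n_items = len(scores[0])
    overall = [sum(row) / n_items for row in scores]
    difficulty = []
    discrimination = []
    for i in range(n_items):
        column = [scores[m][i] for m in range(n_models)]
        difficulty.append(sum(column) / n_models)
        discrimination.append(pearson(column, overall))
    return difficulty, discrimination
```

Samples with low discrimination are the ones where a model's result says little about how strong the model is overall.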

lovish

@louvishh

2 months

Moreover, using tinyBenchmarks as a cheap evaluation measure during early stages of pre-training does not give an informative signal on 3/3 of the benchmarks we examined due to increased variance. [9/n]

1

0

6

lovish

@louvishh

1 year

quoting from : "we offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence." ... someday, i hope to have read all of noam shazeer's research.

1

5

lovish

@louvishh

2 months

We also analyse item response theory, as used by Polo et al. 2024 (tinyBenchmarks). We find that simply using the performance on the 100 samples selected by tinyBenchmarks can lead to large deviations in the mean (7% for ARC-C) and high variance. [8/n]

1

0

5

lovish

@louvishh

3 years

i always use the same first word for every wordle. i guess this was bound to happen someday 😂🤷🏻‍♂️ wordle 206 1/6 🟩🟩🟩🟩🟩

2

0

5

lovish

@louvishh

2 years

dealing with uber drivers in blr

0

5

lovish

@louvishh

2 years

just because ap dhillon dropped a new song doesn’t mean you have to put it in every one of your instagram stories.

1

0

4

lovish

@louvishh

2 months

Benchmark datasets are used for establishing progress with frontier AI models. Any major model release is accompanied by a slew of scores on these benchmarks. Yet, despite their importance, benchmark scores are often regarded as a one-dimensional number. [2/n]

1

0

5

lovish

@louvishh

2 months

We present a deep dive into variance in benchmark scores across 13 popular benchmarks using over 280 models, including fully trained public models as well as a set of 7B models that we trained from scratch, differing only in their initialisation random seed. [3/n]

1

0

4

lovish

@louvishh

9 months

@srush_nlp @sharakelyan probably because the same shot performance did not match gpt4. also the mmlu reporting is interesting (appendix 9.1): 5-shot mmlu did not match gpt4, and i guess that's why they developed the chain of thought based evaluation.

0

1

lovish

@louvishh

1 year

day 1 of iclr and got this notification

0

4

lovish

@louvishh

1 year

p.s. - i have met some incredible startup folks but none of them are working in ai. the current ai startup scene is dominated by people who all migrated from crypto/web3. complete chaos.

1

0

3

lovish

@louvishh

2 years

’twas a good weekend ✨

1

0

4

lovish

@louvishh

4 months

@Teknium1 sonnet is dead 🙂

0

4

lovish

@louvishh

1 month

@billyuchenlin @togethercompute mmlu redux seems low, can you inspect/share the inputs/raw model outputs?

2

0

4

lovish

@louvishh

4 months

the uk doesn't know how to name its holidays. what the hell is a bank holiday?!

0

4

lovish

@louvishh

1 month

@markchen90 patience mark, patience 😛

0

4

lovish

@louvishh

2 years

@Pankajstocks still have those for loose/baggy outfits 😛

1

0

4

lovish

@louvishh

1 year

the only silver lining about living in bangalore is the good people living in this pathetic city. everything else, from traffic to water to infra, is abysmal.

Rasagy Sharma

@rasagy

1 year

No picnics, no photography, no playing sports in Cubbon Park? Bangaloreans are being policed in our most beloved green space. Join the campaign on @Jhatkaadotorg to urge the Horticulture Department to roll back these bizarre rules! RT & sign please! 🙇

19

163

395

0

3

lovish

@louvishh

1 year

sorry, i guess it's peakxvpartners now lol

1

0

3

lovish

@louvishh

9 months

aditya’s the best out there - great researcher, and the most helpful and kind person!!

Aditya Kusupati

@adityakusupati

9 months

📢📢At the last minute, I decided to go on the job market this year!!! Grateful for RTs & promotion at your univ.😇 CV & Statements: Will be at #NeurIPS2023 ! presenting AdANNS, Priming, Objaverse & MADLAD. DM if you are around, would love to catch up👋

2

49

181

0

3

lovish

@louvishh

27 days

@khoomeik openreview is down 🥲

1

0

3

lovish

@louvishh

2 years

the cherry on the top is the badal family and the captain losing their respective seats. total decimation.

0

3

lovish

@louvishh

3 months

Leopold Aschenbrenner

@leopoldasch

3 months

Virtually nobody is pricing in what's coming in AI. I wrote an essay series on the AGI strategic picture: from the trendlines in deep learning and counting the OOMs, to the international situation and The Project. SITUATIONAL AWARENESS: The Decade Ahead

264

877

4K

0

3

lovish

@louvishh

2 years

i’m not crying. you are 🥺😭

Roger Federer

@rogerfederer

2 years

Tomorrow night. My last match. Doubles with @RafaelNadal 💪🏽❤️

2K

26K

217K

0

3

lovish

@louvishh

4 months

@osanseviero thanks for all your amazing work Omar! have a safe flight!

0

3

lovish

@louvishh

1 month

@robdadashi we use the mmlu number from the gemma report because we were getting a lower number using our internal evals. gemma was not following the instructions properly and was using a lot of ** ** to enclose text. and this is 5-shot mmlu, 0 shot was even lower.

1

0

3

lovish

@louvishh

2 years

craving a bhatura with some chhole rn.

0

3

lovish

@louvishh

1 year

@nsaphra @andrewgwils folks probably don’t know this but conference organizers have to submit a list of every single attendee to the ministry of external affairs for approval, which is a big pain. colt organizers in bangalore had to go through this.

0

3

lovish

@louvishh

2 years

@kritipraks @IJCAIconf @AyanMukhrjee @MilindTambe_AI Woohoo 🎉 Congrats!

1

0

3

lovish

@louvishh

2 years

there’s nothing worse than seeing someone with the same name as you doing idiotic stuff online.

0

3

lovish

@louvishh

1 year

looking to transfer volt gym membership in indiranagar. dm me if you want it.

2

0

2

lovish

@louvishh

2 years

@kritipraks very aesthetic ✨ want to borrow some of those books too 🤭

1

0

2

lovish

@louvishh

4 months

@shengs1123 didn't know you were on twitter lol

0

2

lovish

@louvishh

2 months

@LChoshen One obvious difference is reliability is looking at rankings of different "trained-out" models, while we compute the variance across seeds during pre-training of the same model, where there’s no obvious way to establish a ranking.

1

0

2

lovish

@louvishh

3 years

@sansiddh wow … did not expect my friday to start like this.

0

2

lovish

@louvishh

1 month

@jxmnop not everyone at meta uses slurm 😛

0

2

lovish

@louvishh

2 years

@move_4_7 here you go:

lovish

@louvishh

2 years

since a lot of you’ve been asking, for the workouts, i focused on both strength training and cardio. cardio is important for burning calories and weights for building muscle. my average workout routine is 30 mins cardio + 35-45 mins strength training for 5/6 days a week.

1

0

34

0

2

lovish

@louvishh

2 years

@Mizzling_Gaze the first 1/2 months are difficult, but wfh helped a lot in following a consistent routine during that period. and when i started seeing the progress after these initial couple of months, it motivated me to keep going.

1

0

2

lovish

@louvishh

1 month

@eugeneyan not just llama 2, we use synthetic data from intermediate and expert llama 3 models as detailed here:

0

2

lovish

@louvishh

1 year

also, it's quite funny seeing all the rage against sam. i've lived in blr for over a year now, and i have yet to meet someone truly passionate about solving some problem. so, there's a mindset problem, and folks in startups need to worry less about retiring with lots of money.

1

0

2

lovish

@louvishh

29 days

@giffmana ahh sorry about that, out of my hands 🙈

1

0

2

lovish

@louvishh

2 years

best twitter bot 🤌🏻

Gender Pay Gap Bot

@PayGapApp

2 years

In this organisation, women's median hourly pay is 45.1% lower than men's.

26

430

2K

1

0

2

lovish

@louvishh

1 year

@Eepsita my flatmates and i are moving out of a 3bhk in indiranagar (7th main near 100 ft road). let me know if you’d be interested in that. our landlord is quite sweet too!

2

0

2

lovish

@louvishh

2 years

@shaily99 101

0

2

lovish

@louvishh

5 months

@abhi_venigalla @BlancheMinerva can confirm this. the mixtral paper reports 59.7 for arc-c (which is using the normal eval setup) and 85.8 using 25-shot mmlu-style prompt. mistral-7b is 54.3 and 78.5 respectively.

0

1

lovish

@louvishh

3 months

@RylanSchaeffer sorry to hear that :/

0

1

lovish

@louvishh

2 years

@gudda1997 woohoo 🙌🏻