This is an account for my everyday life and potentially some fringe content about AI/tech.
Will tweet at a higher frequency compared to my main acct. So follow only if you want to know more about my personal life, rants, whatever..
Also thanks to @agikoala for the great idea.
Been pretty excited waiting for @MistralAI's new paper about how the model is able to beat (in all of our tests) models 3-10x the size.
Sliding Window Attention seems to be the main reason - and it's genius.
Let me explain why it's brilliant and what I understand.
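(Not from the paper itself; just a minimal numpy sketch of the idea as I understand it. Each token attends only to the previous `window` tokens, so per-layer attention cost scales with T*window instead of T^2, while stacked layers keep growing the effective receptive field.)

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query i may attend to key j: causal AND within the last `window` positions."""
    i = np.arange(seq_len)[:, None]   # query positions (column)
    j = np.arange(seq_len)[None, :]   # key positions (row)
    return (j <= i) & (j > i - window)

def sliding_window_attention(q, k, v, window):
    """Single-head attention under a sliding-window mask. Dense here for
    clarity; a real kernel would only materialize ~window keys per query."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                             # (T, T) logits
    scores = np.where(sliding_window_mask(len(q), window), scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))   # row-wise softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Toy check: 8 tokens, window 3 -> each query sees at most 3 keys.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 8, 4))
print(sliding_window_attention(q, k, v, window=3).shape)  # (8, 4)
```

The neat part: information still flows past the window through depth, since each layer's window sits on top of the previous layer's. If I read the paper right, a 4k window over 32 layers gives a theoretical attention span of ~131k tokens.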
A good summary thread but this is not the "AI research community". This falls under startup tech bro grifter club.
Legitimate people smell this stinky vibe from a mile away.
A story about fraud in the AI research community:
On September 5th, Matt Shumer, CEO of OthersideAI, announces to the world that they've made a breakthrough, allowing them to train a mid-size model to top-tier levels of performance. This is huge. If it's real.
It isn't.
Real AI people don't call themselves weird shit like "AI experts", "AI visionaries"..."AI thought leader"...
They call themselves "member of technical staff", "research scientist" or something like that.
Google was always leading and dominating in LLMs and AI. The only difference in the recent past is that so many clueless people entered the field and spammed their 50 IQ takes all over social media.
Occupational hazard of a ML researcher: I'm starting to track my calories (for better health) and I can't resist the urge to do a taste vs cost (calorie) plot and only eat the foods on the pareto frontier. 😶
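(The joke is computable, for what it's worth. A minimal sketch with made-up foods and numbers: keep only the items where no other item is at least as tasty for fewer calories.)

```python
# Hypothetical taste scores (higher = better) and calories (lower = better).
foods = {
    "salmon":  (9, 400),
    "oatmeal": (5, 300),
    "pizza":   (9, 800),
    "eggs":    (7, 210),
    "candy":   (6, 500),
}

def pareto_frontier(items):
    """Keep items that are not dominated: no other item is >= on taste,
    <= on calories, and strictly better on at least one."""
    keep = {}
    for name, (taste, cal) in items.items():
        dominated = any(
            t >= taste and c <= cal and (t > taste or c < cal)
            for n, (t, c) in items.items() if n != name
        )
        if not dominated:
            keep[name] = (taste, cal)
    return keep

print(pareto_frontier(foods))
# {'salmon': (9, 400), 'eggs': (7, 210)}: pizza is dominated by salmon,
# oatmeal and candy by eggs.
```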
The new meta for PhD students is to take a well known benchmark, distill gpt-3.5-turbo on it, fine-tune a model to beat some well known incumbents, and get an emnlp paper.
Despicable.
Stack ranking of the transformer infra that I have used over the past few years
1. T5x + seqio (2021-2023)
2. Mesh tensorflow + seqio (2019-2021)
3. Tensor2tensor (2018-2019)
4. Pax (2022-2023)
5. Fairseq (2019)
6. Megatron-LM
7. Random tf1/tf2 codebases 🥲
8. HF transformers
It sucks being an expert in this ai/LLM hype thing because u cringe so much at horridly misinformed bad takes. It's everywhere. Even VCs are writing their own surveys of AI now wtf??
Someone save me.
lmao. from a professor.
academia is like the worst place to be. no impact and boring AF.
bigtech is the place for impact, startup for upside. but academia? maybe for retirement.
PhD graduates in AI mostly take boring jobs at big tech companies due to short-term monetary incentives.
While understandable to some degree, it's also quite sad to see so many great researchers 'disappear' and give up their talent - join or do your own startup instead!
i have to say that from my time as a phd student to my time at Google and now at a startup, I have never felt like i worked a single day because research/ML is just so much fun.
i have a friend who left one of the LLM war companies to join finance/trading as a ML guy, and they told me it's like finding a nice little meadow to chill in while everyone else is fighting a major war.
The biggest flex of a senior person in AI research is: "I did not contribute so don't include me in the paper."
So many people in google behaved like this. Very respectable!
Meanwhile in other parts of the world....🤨 Folks discuss how to build large teams for many papers 🙃
I think most decent technical people can see through the noise and drama with Gemini image and recognise the technical feat of Gemini 1.5.
Those who can't, not sure if people should really care about what they think.
To all my Google friends: I know this week has been tough with a lot of criticism about Gemini's gaffes.
Just wanted to say I love all of you and am rooting for you. I know everyone means well, and am grateful for your work & eager to see where you next take this amazing tech!
Haha! The silver lining of all the LLM noise is that you get papers like this that you typically wouldn't see in academia. 😂
LLMs are entering the era of "natty or not".
It's really funny to see some senior professors get invited to give talks and they scramble to say something smart about LLMs and generative AI when they have absolutely no experience or clue about what's going on. 🤣
many people don't know but some high profile yet "not so senior" (think L5/L6) RSes can't really code or technically contribute to projects. they're literally only capable of editing papers and putting their name on them by getting involved in many projects.
research/engineering output is all about sequentially consistent & productive actions to move towards a goal.
what i learned is that people could be decent coders but fail to be productive because they have no macro sense of how to move forward at all.
It's actually crazy to think that the top 2 definitive & most incorporated transformer mods (swiglu & mqa/gqa) were both proposed by Noam.
Everything else lacks a definitive consensus yet, e.g., rope, parallel layers etc.
We're still waiting for the game changer relpos.
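(Quick refresher, since these come up constantly: SwiGLU gates the FFN's up-projection with a swish branch, and MQA/GQA shares key/value heads across query heads to shrink the KV cache. Here's a minimal numpy sketch of just the SwiGLU block, shapes simplified and not tied to any particular codebase.)

```python
import numpy as np

def swish(x):
    """Swish / SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward block: (swish(x @ W_gate) * (x @ W_up)) @ W_down.
    Three weight matrices instead of the vanilla FFN's two, so the hidden
    dim is usually shrunk by ~2/3 to keep parameter count comparable."""
    return (swish(x @ W_gate) * (x @ W_up)) @ W_down

d_model, d_ff = 16, 42  # toy sizes
rng = np.random.default_rng(0)
x = rng.normal(size=(4, d_model))            # 4 tokens
W_gate = rng.normal(size=(d_model, d_ff))
W_up = rng.normal(size=(d_model, d_ff))
W_down = rng.normal(size=(d_ff, d_model))
print(swiglu_ffn(x, W_gate, W_up, W_down).shape)  # (4, 16)
```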
Fwiw I wanted to add that I'm pretty proud of myself for today's milestone. You have no idea how painful it was to be tripping all over git, pip, conda, cuda, gpus, regular Linux stuff etc. after I left G. It was so painful. Like learning to walk again after a car accident.
Today marks my first year at Google (DeepMind).
One year ago today, I joined Google Brain as a student researcher and first started working on large language models. During my time as a student researcher, I investigated how larger language models can do in-context learning
peer review & submitting to conferences is kinda like bad RLHF for researchers.
it's like optimizing for strange artifacts/quirks in the review system which makes research inherently wonky.
just put it on arxiv & submit to a conf if you fancy a vacation in that location.
there is no way the 9-5, strict 40 hour 5-day week is a workable schedule for anyone training LLMs or being involved in this.
it's a sport. you have to be on call all the time.
I actually think technical staff deserves to fly business more than business staff. Airlines should rename or make a new class of flights called technical IC class that is more luxurious than current business.
We deserve it man.
2023 was the year I broke free. when i realised there is no end game to publishing papers and increasing citations. i didn't check my eoy citation count. it's pointless.
do research and savour it intrinsically. write code, build together & enjoy the lifestyle.
I have to break my healing mode nice hippo sabbatical because I feel compelled to talk about the grifter drama. 🥲 there's just so many good lessons there the common folk could learn from!
Q: what did you do for your phd?
A: i spent all my time analysing whenever a startup made some tweaks to their API and writing papers about it.
closed source orgs are so mysterious now that it seems like a research question altogether to figure out what they are doing.
How is ChatGPT’s behavior changing over time?
If you are developing with LLMs or in this case GPT-3.5 or GPT-4, it's definitely worth taking a look at this report.
There is suspicion in the AI community that models like GPT-4 are changing/degrading in performance and behavior.
Meet Reka Core, our best and most capable multimodal language model yet. 🔮
It’s been a busy few months training this model and we are glad to finally ship it! 💪
Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body
Managed to go down from 80kg to 72kg in <2 mo! 😄😄
1. Eating high protein meals. 3 eggs to replace 1 meal a day and spamming protein shake whenever hungry.
2. Some exercise. Hiked with @swyx and tried random shit like rowing
3. Unf*king my sleep cycle.
Feeling better!
realised i've gained so much weight ever since becoming a dad. late nights, increased workload, eating more and shifting priorities to care for the baby etc. need to start going back to a healthier life or else i'll "scale up" to no return 🫤😐
1/n Have you ever wondered why decoder-only Transformer models like GPT-4 have dominated other Transformer models like encoder-only (ex: BERT) or encoder-decoder models (ex: Flan T5)? What is the intuitive explanation for this?
To understand their supremacy, consider how
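(The thread gets cut off here, but the structural contrast it's setting up fits in a few lines. My sketch, not the original author's.)

```python
import numpy as np

T = 6  # toy sequence length

# Encoder-only (BERT-style): fully bidirectional. Every token attends to every
# token, which is great for understanding but gives no natural generation order.
bidirectional_mask = np.ones((T, T), dtype=bool)

# Decoder-only (GPT-style): causal. Token i attends only to tokens j <= i, so
# every prefix of every sequence is a training example, and generation can
# reuse the KV cache because past attention never changes.
causal_mask = np.tril(np.ones((T, T), dtype=bool))

print(causal_mask.astype(int))
```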
Our latest model Inflection-2.5 () is not bad. In fact, it was the ~4th best publicly "known" model when it was released in early March. And it was created by our pretraining team of <15 people!
2/
New paper from @RekaAILabs 🔥 (yes an actual paper).
This time we're releasing part of our internal evals, which we call Vibe-Eval 😃 This comprises a hard set which imo is pretty challenging for frontier models today.
The fun part here is that we constructed it by trying to
Random thought I had today: if only given model weights (i.e., runnable model) and no other details, is it possible to determine how many tokens the model has seen during pretraining?
People tout open source as some kind of counterbalance to the powerful frontier labs. This is romanticised and dramatized.
Open source has not made any real impact except wagging its tail and drooling, waiting for zuck to drop llama crumbs.
despite recent progress and endless cheerleading, open-source AI is a worsening investment for model builders, an inferior option for developers and consumers, and a national security risk. I wrote about the closed-source future of foundation models here
Sorry startup bros, hacking together a quick and dirty oai wrapper SaaS and screaming random things like ARR, MRR, product market fit is not something respectable at all. Even the rest and vest folks at big co deserve more respect.
many of the research breakthroughs come from google, whose codebases and infra are light years ahead of most other companies (and universities).
given this, i would think that a huge number of research breakthroughs are close to production-ready the moment they are born.
an underrated benefit of doing research is the freedom to create your own process, and to optimize or organize only where necessary; no one forces you to.
some of the biggest research breakthroughs have come from codebases so chaotic they would make a level 5 FAANG engineer faint
Of all LLM infra I've used both inside and outside Google... have to say t5x + seqio is still miles ahead of everything out there. I have a soft spot for the deprecated mesh tensorflow too.
sometimes i wonder to myself if i should buy like a console and start playing games or something then i remember: being an ai researcher / engineer is just like playing games for work all day long.
I got literally 100 pings today telling me an llm was named after me, or that I have become an llm. 🥲🥲🥲
Now my first name and last name have all been taken by chatbots. Happy now?
There's no such thing as seniority in this age of AI. It's either you contribute or you don't. No empire building, medal collecting, title amassing whatever. It's just all about what code did you write? And which LLM did you contribute to?
PSA: Seriously, the meta for PhD students at less prestigious/visible universities is to gain the respect/mentorship of just 1 semi-visible industry RS at a top lab & coauthor 1 good work together. Not publishing more papers with your unknown advisor that no one will read.
Whenever asked I still tell people my job is a research scientist (technically true)
I somehow cannot seem to identify as an entrepreneur.
Research scientist vibes well with me. Making money is cool only when you do it in a cool way.
thoughts:
1. no one cares about papers accepted at confs anymore. mainly just for fun/vacation.
2. citations are more important but not that important either.
the new meta is just being in the place to work on a frontier model. (e.g., gpt4, gemini etc).
trumps everything else.
human evaluation results just came in and today we hit a nice model performance milestone at reka. it's one of the model goals we've had since starting the company.
You can be doing objectively well on certain dimensions in life but if your internal reward model doesn't align you'll end up just hating yourself or feeling something is off.
Aside from incumbents (gdm, oai, anthropic) I think only a few teams managed to train strong models. Xai, inflection, character, mistral and us (reka). Meta will depend on how llama3 lands. Everything else is NPC tier.
Being new to the startup world i've always treated raising money as a measure of success.
Today I learned that you can raise 1.3B and still fail spectacularly. 🙃
Big conundrum of benchmark creators.
- make full private = tough adoption
- private and public = everyone reports public and ignores private
- fully public = dataset gets totally wrecked by researcher descent.
2nd option eventually becomes the 3rd.
What's the way out?
A sad truth about evaluation is that:
If you make a private test set for your benchmark, people just won't adopt it. We have our official MMMU private test set hosted on EvalAI (), but everyone is still reporting validation scores. I found it's similar for
people ask me why edge is trained on 4.5T tokens and flash on 5T tokens.
the simple reason could be that our 7b job died somehow at 4.5T tokens and I was too lazy to restart it.
could be as simple as that. don't over think stuff!
Google presents Training LLMs over Neurally Compressed Text
- Outperforms byte-level baselines by a wide margin
- Worse PPL than subword tokenizers but the benefit of shorter sequence lengths
Using up all your compute all the time and maximizing the information gain from each run is an effective strategy for becoming a productive ML researcher.
Maybe GPUs and cuda being bad is a feature not a bug 😶🙃. Definitely tons of suffering. Maybe by design.
I sure became more resilient after starting to use these. 🥹🥹
Jensen Huang, $NVDA, CEO: “Resilience matters in success. I don’t know how to teach it to you, except for: I hope suffering happens to you .. because .. greatness comes from character. And character isn’t formed out of smart people. It’s formed out of people who suffered.
I'm sick and tired of people pitching me alternative AI. The only AI you should care about are LLMs and generative models.
Everything else is frankly noise and mostly crap.
tech people: member of technical staff 🫡👌
non tech people: self-inflate titles into some random director/vp on linkedin or something. seen this self-appointment happening so much 😂🤦‍♂️
What kind of person opportunistically names their paper "Sora". It's a survey paper damnit! Did you hope to be cited or get attention by some mix up?
Do you have negative self respect?
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and
Overheard from a friend "it's funny that Yann lecun said he published 80 papers since 2022. I checked his google scholar and none of them are actually good". 🥴🙃😛
was i happier in 2017, as an unknown grad student. going to confs like an invisible entity. no expectations of career, money whatever.
i was making a grad student stipend then, poor AF but something felt so peaceful about the days back then.