Food $200
Data $150
Rent $800
Substack newsletters $3600
Utility $150
Somebody who is good at the economy please help me budget this. my family is dying
@JayaGup10
gonna start an "AI daycare" where sf girls drop off their bf's: they can all talk to each other and get the AI out of their system. it's free, we just take 5% of any company they start
This demo uses Gazelle, the world's first public LLM with direct audio input. By skipping transcription decoding, we save time and can operate directly on speech - with inflection, tone, emotion, etc.
🔊Excited to ship the latest iteration of Gazelle - we now process and reply to spoken commands, with <150ms latency. This model shows strong reasoning capability and effectively generalizes to new tasks.
Want to try the demo? Thread!
update on harvey:
they just announced $100m in new funding from insiders (including all the best firms in the valley).
that’s a much more powerful signal than anything i (or the harvey competitors who have run with my prior tweet) have to say.
I've built and worked on trillion scale infra. The number one performance lesson is always to _reduce variance_ first -- simpler architectures with fewer components will always win.
The ASR-LLM-TTS cascaded systems will never be viable.
one time I interviewed a Twitter ML influencer and they bombed a leetcode easy. for a long time, my takeaway was 'those who can do, those who can't poast' but perhaps it should have been 'publicity is the best way to get roles you're unqualified for'
Very excited to see OpenAI launch voice-to-voice: humanlike conversation is one of the most important problems of our time, and simpler architectures win.
Nowadays, few people publish how cutting-edge models work. Here’s my explanation for the end-to-end approach, some thoughts
Instruct tuned 405B sets SOTA for MMLU-Pro (!) Still seems to be a bit behind 3.5 Sonnet on the other hard evals, but very much in the same ballpark. Can't wait to vibe-check it.
Also notably -- new license () removes the prohibition on using Llama 3 to
Compared leaked Llama 3.1 benchmarks with other leading models, very excited for the release!
We can tier out models by price / 1M output tokens.
O($0.10): 4o-mini and <10B param models. I think 4o-mini will still be best but a strong local 8B will unlock lots of applications.
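The tiering is just unit conversion on $/1M output tokens; a toy cost helper (the ~$0.10 tier figure is from the tweet, the 500-token request size is an invented example):

```python
def request_cost(output_tokens: int, price_per_1m: float) -> float:
    """Output-token cost of one request at a given $/1M-output-token tier."""
    return output_tokens / 1_000_000 * price_per_1m

# A 500-output-token request at the ~$0.10 tier costs $0.00005.
print(request_cost(500, 0.10))
```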
👀👀found a hidden Microsoft page with a new TTS model, VALL-E 2, which claims to 'achieve human parity for the first time.'
AFAICT the paper isn't out yet, but the description/snippets were released 2024-05-20.
quick Llama-3 8B throughput testing with different GPUs on
@modal_labs
- cheaper than dedicated providers, YMMV, script/method in replies
TLDR: H100 is much more cost efficient than A100. fp8 kv-cache is ~5% more throughput.
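The H100-vs-A100 comparison boils down to tokens per dollar; a sketch with made-up throughput and hourly-price numbers (the actual measurements are in the thread's script, not here):

```python
def tokens_per_dollar(throughput_tok_s: float, price_per_hour: float) -> float:
    """Tokens generated per dollar of GPU rental time."""
    return throughput_tok_s * 3600 / price_per_hour

# Illustrative numbers only -- not the measured results from the thread.
h100 = tokens_per_dollar(throughput_tok_s=4000, price_per_hour=4.56)
a100 = tokens_per_dollar(throughput_tok_s=1800, price_per_hour=2.78)
print(h100 > a100)  # under these assumed numbers, H100 wins per dollar
```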
After that, it's bog-standard optimization work - our implementation is very close to, if not at, SOTA for multimodal LLM inference, and close to theoretical maximums.
With an H100, we expect this experience to be <300 ms - below human reaction time!
Obviously this particular model is undertrained and there's a lot of room for improvement, but I'm very confident this is the future of voice AI.
What would you do with truly real-time and empathetic chat?
I work in big tech. A name you have heard of and probably used before.
Instead of performance reviews, my boss handed me a business card with three shapes. I'm now on an island with the other engineers and only one of us can make staff. Wish me luck!
LLMs are officially in the web3 era
- undergrads prominently flaunting Stanford creds
- non technical guys writing white papers brought on to "promote and market"
- copy pasted code and weights
Only thing missing was exit liquidity
So sad to hear the news ()😰. The conclusion of our investigation:
1. Llama3-V can be run using MiniCPM-Llama3-V 2.5's code and config.json after changing param names
2. It behaves similarly to MiniCPM-Llama3-V 2.5 in unrevealed experimental features
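Point 1 amounts to a checkpoint rename: load one model's weights under another model's parameter names. A minimal sketch of that idea, with an invented name mapping (the actual Llama3-V/MiniCPM-Llama3-V 2.5 mapping isn't reproduced here):

```python
import re

def rename_params(state_dict: dict) -> dict:
    """Map one checkpoint's parameter names onto another naming scheme.
    The regex below is illustrative, not the real Llama3-V/MiniCPM mapping."""
    renamed = {}
    for name, tensor in state_dict.items():
        new_name = re.sub(r"^model\.vision_tower\.", "vpm.", name)
        renamed[new_name] = tensor
    return renamed

print(rename_params({"model.vision_tower.layer.0.weight": "W"}))
```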
🔊 I'm releasing a research preview of Gazelle, a unified speech-language model. It's not actually good (yet!) but already does tasks no other model can do - including even Gemini, Claude or GPT4. Thread!
@soumithchintala
@SkyLi0n
a googler's response (paraphrased): "nowadays we just copy-paste code. no need for unit tests, python typing, or readability review"
feels short term positive long term negative
I trained a joint speech-language model that you can *talk* to - for less than the price of a Chipotle bowl. Why I think this is the future of conversational AI and where we go from here: 🧵
I read the Thiel social network pitch deck so you don’t have to. Prediction: the
@deadspin
alumni blog this weekend will get more engagement than Column ever will
once I was a data scientist and got good at it and became a PM (playing google docs)
then I was an ML engineer and got good at it and became a tech lead (playing google docs)
now I am a founder and am getting better at it, and once again, I just play google docs (and linkedin)
Meta trained an E2E speech experience with Llama 3.1 - pretty cool! This should enable real-time speech responses.
Audio encoder + adapter + LLM = audio in, text out
Custom TTS model uses LLM embeddings to condition output - IMO elegant to stay in latent space and avoid phonemes.
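A minimal PyTorch sketch of the audio encoder → adapter → LLM glue: project (optionally frame-stacked) audio features into the LLM's embedding space. All dimensions and the frame-stacking choice are assumptions for illustration, not Meta's actual design:

```python
import torch
import torch.nn as nn

class AudioAdapter(nn.Module):
    """Projects audio-encoder frames into LLM embedding space.
    Layer sizes and stride are invented; the post gives no dimensions."""
    def __init__(self, audio_dim: int = 1280, llm_dim: int = 4096, stride: int = 4):
        super().__init__()
        self.stride = stride  # stack frames to shorten the sequence the LLM sees
        self.proj = nn.Sequential(
            nn.Linear(audio_dim * stride, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t, d = frames.shape
        t = t - t % self.stride  # drop the ragged tail
        stacked = frames[:, :t].reshape(b, t // self.stride, d * self.stride)
        return self.proj(stacked)  # (b, t // stride, llm_dim)
```

The LLM then consumes these projected embeddings in place of (or alongside) text-token embeddings, producing text out.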
Today I’m thrilled to announce
@Lux_Capital
's NYC AI Directory & NYC AI Map - 2 resources for the burgeoning AI talent ecosystem
READ MORE👇
NYC AI Directory:
NYC AI Map:
The best part of doing a startup is getting to choose the right thing over the prettiest thing.
The hardest part is saying no to everyone who just wants the pretty thing.
@HipCityReg
@pmarca
Every TMT MD is forwarding that article to their analysts and associates, saying “look, A16Z is becoming like us. Please don’t leave”
these dudes are so cool. "10M context Gemma" - but no eval results, and nobody on GitHub or HF has managed to run the code properly. key parts of the implementation are "left to the reader."
55k people downloaded this model and not a single positive thing to say?
Introducing Gemma with a 10M context window
We feature:
• 1250x context length of base Gemma
• Requires less than 32GB of memory
• Infini-attention + activation compression
Check us out on:
• 🤗:
• GitHub:
• Technical
"What's a nice girl like you still doing on the market?" - guy browsing streeteasy, trying to figure out what's wrong with an apartment "listed 3 days ago"
This is starting to get into weird territory: added a markdown renderer to the cells and more LLM backends
I intend to open source this (eventually), but come try it out:
Inference via Llama 3.1-8B or 4o-mini is included for free
real spreadsheet, streaming autofill, custom in-line commands, time to do some real work :)
I made this so I could generate and clean lots of high quality synthetic data quickly, and sheets are the nicest UX for that.
HuBERT operates at 50 Hz (tokens/sec); other labs have reported that high-quality audio reconstruction is very difficult below 50 Hz. This is an unusable token rate: just 1 minute of audio equates to 3k tokens, requiring tons of memory and slowing inference.
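The 3k-token figure is just rate times duration:

```python
def audio_tokens(seconds: float, rate_hz: float = 50.0) -> int:
    """Discrete audio tokens needed for a clip at a given token rate."""
    return int(seconds * rate_hz)

print(audio_tokens(60))  # 1 minute at 50 Hz -> 3000 tokens
```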
OpenAI’s new ‘head
My take on useful AI products:
The useful AI apps today are new interfaces or copilots to old interfaces. These can be great businesses but not great venture bets (too high risk or low cap). The "good" bets are full automation agentic plays, because of the upside, but the models
Was inspired by the
@AnthropicAI
test case generator, so I made my own AI spreadsheet. Given some existing values, can we 'fill in the blanks' in a new row? Yes!
This feels like a super intuitive experience to me - what do you think?
If you didn’t study 80 hrs a week from age 13 to 22, you won’t have a good enough GPA, from a good enough school, to have the privilege of working 80 hrs a week for a VC
The new models apparently run on HGX H200. In FP8, batch 1, perfect MBU, you can serve up to a 20B dense model at 200 tok/s. With MoE, maybe this is like 40B active params or 80B actives? (not as familiar with MoE inference math).
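The batch-1, perfect-MBU ceiling follows from decode being bandwidth-bound: every generated token streams all weights from HBM once. A sketch of that back-of-envelope (H200's ~4.8 TB/s HBM3e bandwidth is the only hardware number assumed; FP8 is 1 byte/param):

```python
def max_decode_tok_s(params_b: float, bytes_per_param: float,
                     bw_tb_s: float, mbu: float = 1.0) -> float:
    """Upper bound on batch-1 decode speed: tokens/s = usable bandwidth
    divided by bytes of weights read per token."""
    weight_bytes = params_b * 1e9 * bytes_per_param
    return mbu * bw_tb_s * 1e12 / weight_bytes

# 20B dense model, FP8, ~4.8 TB/s (H200), perfect MBU:
print(round(max_decode_tok_s(20, 1.0, 4.8)))  # 240 tok/s ceiling
```

So ~200 tok/s observed is consistent with a ~20B-dense-equivalent read per token at high MBU.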
the latency of the new gpt-4o model is insane - it's running at close to 180-200 tokens per second for text output. time to first token is also near-instantaneous with thousands of input tokens. openai did an amazing job - congrats to the team.
Broke: dunking on "an ml engineer turned VC" who doesn't understand basic math
Woke: pitching them your worst startup ideas because they don't understand basic math
I used to think the quintessential tech experience was working in SF, surrounded by hip startups, now I understand it to be not working, to be at Barry's midday, to write a blog for 20 readers. At least I got one of those down!
@maksym_andr
Unfortunately I believe the authors have been confused by the ChatGPT UI and simply saw the result of Whisper ASR, NOT anything to do with the audio input modality.
I've replicated figure 7 here using text only input. Speaking gets the same result because... it's just
Astrology doesn’t work, but machine learning might.
Suppose you are Facebook or LinkedIn. You have a massive database of life histories. So you could probably do a decent forecast of where a 30-year-old with X job in Y city is likely to be in 5 years, using similar profiles.
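A toy sketch of that similar-profiles idea: forecast the 5-year outcome by majority vote over matching profiles. The data, features, and matching rule are entirely invented:

```python
from collections import Counter

# (features, role 5 years later) -- invented toy data
profiles = [
    ((30, "engineer", "nyc"), "eng_manager"),
    ((30, "engineer", "nyc"), "staff_eng"),
    ((29, "engineer", "nyc"), "eng_manager"),
    ((31, "designer", "sf"), "design_lead"),
]

def forecast(age: int, job: str, city: str):
    """Majority outcome among profiles with the same job/city and similar age."""
    matches = [outcome for (a, j, c), outcome in profiles
               if abs(a - age) <= 2 and j == job and c == city]
    return Counter(matches).most_common(1)[0][0] if matches else None

print(forecast(30, "engineer", "nyc"))  # "eng_manager" in this toy dataset
```

A real system would use learned embeddings and far richer features, but the structure of the bet is the same.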