Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™

@MaximeRivest

Followers
360
Following
513
Statuses
1K

Distributor of Open WebUI at https://t.co/eOfBo6QJB7. | I build systems to run 1M prompts/day | 1M+ views on YT | AI understanding, experience, and inspiration for all

Joined January 2018
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
13 hours
Finetunes are everywhere.
@slow_developer
Haider.
17 hours
Google Chief Scientist Jeff Dean: "AI now generates 25% of Google's integrated code." Google has already trained a Gemini model on its internal codebase to help developers. This doesn't cover everything, but it improves AI-assisted coding by integrating that code into the model's parameters.
0
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
1 day
Wow, got to dive into this one
@itsPaulAi
Paul Couvert
3 days
Wow, that's very impressive. Zonos is a 100% open-source AI model that can clone any voice 🀯 You can basically run it anywhere as it's only 1.6B parameters. Link below
0
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
1 day
I am thinking of applications where one-off questions (prompts) are applied to databases of 1,000,000+ rows, interpolating many columns into the context of the prompt. So almost nothing seems overkill for my task πŸ˜… My dream final system would take a user prompt and: 1) find good example rows for the user to manually classify 10-100 of them, 2) optimize the prompt for a large model, 3) apply it to ~10,000 rows to create a synthetic dataset, 4) fine-tune a small model, and 5) apply the fine-tune to the whole dataframe.
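Roughly, the pipeline I have in mind looks like this (a minimal sketch; every function here is a hypothetical placeholder, not an existing library):

```python
import pandas as pd

def pick_representative_rows(df: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Step 1: surface rows for a human to label (placeholder: random sample)."""
    return df.sample(n=min(n, len(df)), random_state=0)

def optimize_prompt(labeled: pd.DataFrame, user_prompt: str) -> str:
    """Step 2: optimize the prompt for a large model (e.g. with a prompt optimizer); placeholder."""
    return user_prompt

def label_with_large_model(df: pd.DataFrame, prompt: str) -> pd.DataFrame:
    """Step 3: apply the optimized prompt to ~10k rows to build a synthetic dataset."""
    out = df.copy()
    out["label"] = "TODO: call the large model here"
    return out

def finetune_small_model(synthetic: pd.DataFrame) -> str:
    """Step 4: fine-tune a small model on the synthetic dataset; returns a model id."""
    return "my-small-finetune"

def apply_finetune(df: pd.DataFrame, model_id: str) -> pd.DataFrame:
    """Step 5: batch inference with the fine-tuned small model over the whole dataframe."""
    out = df.copy()
    out["label"] = f"TODO: batch inference with {model_id}"
    return out

def run_pipeline(df: pd.DataFrame, user_prompt: str) -> pd.DataFrame:
    examples = pick_representative_rows(df)                      # 1) human labels these
    prompt = optimize_prompt(examples, user_prompt)              # 2)
    synthetic = label_with_large_model(df.head(10_000), prompt)  # 3)
    model_id = finetune_small_model(synthetic)                   # 4)
    return apply_finetune(df, model_id)                          # 5)
```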
1
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
1 day
Hmmm, I see what you mean. vLLM announced some optimizations for DeepSeek (and other MoEs) last week. Could it be that we are still really under-optimizing it? What I want to see (or try) is a benchmark between SGLang and vLLM for DeepSeek V2.5, Llama 405B, and DeepSeek V3/R1. DeepSeek should strongly outperform Llama because it can process batches faster, given it is an MoE, right? Also, I would love for the MoE to be very modular between domains; it could mean we could load fewer experts into VRAM. I have not seen any study of the MoE distribution for each token. Do you know if the 'load balancing' of the MoE, done at training, kind of guarantees that it's all over the place?
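One way to eyeball that routing distribution (a hedged sketch, assuming a Mixtral-style checkpoint in transformers whose layers expose the router linear at `.block_sparse_moe.gate`; DeepSeek's module names differ, so the hook target would need adapting):

```python
from collections import Counter
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # any MoE checkpoint with a router linear
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

expert_counts = Counter()  # (layer, expert) -> number of tokens routed there

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # router logits have shape (tokens, n_experts); take the top-2 experts per token
        top2 = output.topk(2, dim=-1).indices.flatten().tolist()
        expert_counts.update((layer_idx, e) for e in top2)
    return hook

for i, layer in enumerate(model.model.layers):
    layer.block_sparse_moe.gate.register_forward_hook(make_hook(i))

prompt = "Explain how yeast fermentation works, step by step."
with torch.no_grad():
    model(**tok(prompt, return_tensors="pt").to(model.device))

# A heavily skewed count per domain would support the "load fewer experts" idea;
# a near-uniform one would suggest load balancing really does spread tokens everywhere.
print(expert_counts.most_common(10))
```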
0
0
1
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
1 day
The era of fine-tuned small models is now!!!
@AravSrinivas
Aravind Srinivas
2 days
We’ve post-trained some really good models on top of Llama 3.3 that far surpass 4o-mini and 3.5-Haiku and match 4o and 3.5-Sonnet for answer quality, despite being way cheaper and blazing fast! Users are loving it already based on retention metrics. This is thanks to our work on building our custom inference stack on top of NVIDIA GPUs as well as exploring new hardware like Cerebras!
0
0
2
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
1 day
Yes!! I love big inference. I am just starting to marry it to DSPy. From what I have seen in the docs, DSPy is more 1-to-1 designed. Makes sense for the model designing the prompts, but less for the model doing the task, especially when the task is one of those that you mention. Do you know if there is an easy 'big inference' mode?
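In the meantime, the workaround I have in mind is just fanning a DSPy predictor out over a thread pool (a rough sketch; the model id and signature are illustrative, and there may well be a built-in batch mode I'm missing):

```python
from concurrent.futures import ThreadPoolExecutor
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any LiteLLM-style model id

classify = dspy.Predict("text -> label")  # one-off signature for the task

rows = ["row 1 text ...", "row 2 text ...", "row 3 text ..."]  # stand-in for a dataframe column

with ThreadPoolExecutor(max_workers=32) as pool:
    labels = list(pool.map(lambda t: classify(text=t).label, rows))

print(labels)
```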
1
0
1
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
1 day
Fine-tune, fine-tune, fine-tune. It may be the year of Agents and Voice, but it is also the year of Finetunes.
@ivanfioravanti
Ivan Fioravanti α―…
1 day
Apple MLX fine-tuning of Qwen/Qwen2.5-7B-Instruct on Italian wine classification. Base model: 62% accuracy. Fine-tuned model: 82% accuracy. πŸ”₯ Fine-tuning works! πŸ”₯ I still remember 1 year ago when people were telling me prompting + few shots were enough πŸ˜‚
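For context, a before/after comparison along these lines (a hedged sketch assuming mlx-lm and a LoRA adapter trained separately, e.g. with `mlx_lm.lora`; the evaluation rows and labels below are made up, not Ivan's actual test set):

```python
from mlx_lm import load, generate

# Tiny stand-in eval set; the real dataset and label format are not shown here.
eval_set = [
    {"prompt": "Wine notes: tar, roses, firm tannins. Grape variety:", "label": "Nebbiolo"},
    {"prompt": "Wine notes: saline, citrus, light body. Grape variety:", "label": "Vermentino"},
]

def accuracy(model, tokenizer):
    hits = 0
    for ex in eval_set:
        out = generate(model, tokenizer, prompt=ex["prompt"], max_tokens=8)
        hits += ex["label"].lower() in out.lower()
    return hits / len(eval_set)

base_model, base_tok = load("Qwen/Qwen2.5-7B-Instruct")
tuned_model, tuned_tok = load("Qwen/Qwen2.5-7B-Instruct", adapter_path="adapters")  # LoRA weights dir

print("base:", accuracy(base_model, base_tok), "fine-tuned:", accuracy(tuned_model, tuned_tok))
```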
1
1
5
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
1 day
@TrelisResearch I was taking the number from this other post they made. Am I misinterpreting median throughput?
Tweet media one
1
0
1
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
Modularity is beautiful. Nobody needs help with cooking, mechanics, and coding all at once, in one response. We do not have one website that holds all knowledge. We did not write one textbook that covers all topics. Why do we try to make one LLM that can discuss everything?
0
0
1
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
Yet another example that the future is one of small fine-tuned models reached through a dispatcher.
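What I mean by a dispatcher, in the simplest terms (a toy sketch; the model names and routing heuristic are placeholders, and the real router would itself be a tiny classifier):

```python
SPECIALISTS = {
    "wine": "my-org/qwen2.5-3b-wine-ft",     # hypothetical fine-tune ids
    "code": "my-org/qwen2.5-3b-code-ft",
    "general": "my-org/qwen2.5-3b-instruct",
}

def route_domain(prompt: str) -> str:
    """Placeholder router; in practice this would be a small trained classifier."""
    p = prompt.lower()
    if "wine" in p or "vintage" in p:
        return "wine"
    if "def " in p or "bug" in p:
        return "code"
    return "general"

def dispatch(prompt: str) -> str:
    model_id = SPECIALISTS[route_domain(prompt)]
    # Swap in your actual inference client here (vLLM, an OpenAI-compatible server, etc.).
    return f"[would send to {model_id}] {prompt}"

print(dispatch("Which vintage of Barolo should I pick?"))
```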
@ivanfioravanti
Ivan Fioravanti α―…
2 days
I tested fine tuning of Qwen/Qwen2.5-3B-Instruct with my wine_classification tests and it rocks! From 54% to 70% after 0.5 epochs! I added a new wine_mlx_server_unstructured.py provider without structured output.
0
0
1
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
@ai_for_success And be free for months
0
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
Yes, my understanding is that it still comes out at about 50 tok/sec (no?) for a given sequence, based on my (imperfect) reading of DataCrunch's other tables. But my theory is not strong enough to predict this very precisely. That's why I'll run it soon (if I don't find someone doing it first) πŸ˜… I focus more on sheer token output, as that is more important to my application, and for things like 'deep research' latency does not matter anymore.
1
0
1
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
4 days
At full load, 8xH200 GPUs running DeepSeek R1 can output 2864 tokens/second. This is not quantized (compromised quality) because mixed-precision FP8 was the precision used at training.
Tweet media one
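Quick back-of-envelope tying this to the ~50 tok/sec per-sequence figure above (my own arithmetic, not from the cited table):

```python
aggregate_tps = 2864  # tokens/second across the whole 8xH200 node, from the table
per_seq_tps = 50      # rough per-sequence decode speed assumed above

# Implied concurrency if both numbers hold at the same time
print(aggregate_tps / per_seq_tps)  # β‰ˆ 57 sequences decoding in parallel
```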
0
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
Everything points to small specialized models.
@bclavie
Benjamin ClaviΓ©
3 days
What if a [MASK] was all you needed? ModernBERT is great, but we couldn't stop wondering if it could be greater than previous encoders in different ways. Maybe we don't need task-specific heads? Maybe it can do all sorts of tasks with only its generative head? Spoiler: yes.
Tweet media one
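The gist, as I understand it (a hedged sketch using the public ModernBERT base checkpoint and the stock fill-mask pipeline; the prompt template and label words are my own, not from the paper):

```python
from transformers import pipeline

# Use the MLM head itself as a zero-shot classifier: compare label-word scores at [MASK].
fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

text = "This movie was an absolute delight from start to finish."
preds = fill(f"Review: {text} Sentiment: [MASK]", targets=["great", "terrible"])
print({p["token_str"]: round(p["score"], 4) for p in preds})
```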
0
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
@DaveShapi The terminal will let you know
0
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
2 days
RT @ivanfioravanti: My family AI server has now access to o3-mini, web search through searxng and Flux images generation through Replicate…
0
2
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
3 days
@rasbt Should go for more complex sciences. AI is fuzzy, and I train on numerical ecology. It does not get more complex than life at that scale.
0
0
0
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
3 days
@DaveShapi But also, for inference, data centers are unnecessary. Each village, town, and city can host their own inference server, heating up the city hall at the same time :D
@MaximeRivest
Maxime Rivest πŸ§™β€β™‚οΈπŸ¦™
4 days
A remote village of 1,000 families could provide A(G?)I (for free) for all its citizens for 5-10 years for the same price as building a new 1/3 mile of road... The implications of every city doing that would be crazy!!! Here is the proof 🧡
0
0
0