![Maxime Rivest Profile](https://pbs.twimg.com/profile_images/1834591719356030976/m0v8shuf_x96.jpg)
Maxime Rivest
@MaximeRivest
Followers
360
Following
513
Statuses
1K
Distributor of Open WebUI at https://t.co/eOfBo6QJB7. | I build systems to run 1M prompts/day | 1M+ views on YT | AI understanding, experience, and inspiration for all
Joined January 2018
Finetunes are everywhere.
Google Chief Scientist Jeff Dean: "AI now generates 25% of Google's integrated code." Google has already trained a Gemini model on its internal codebase to help developers. This doesn't cover everything, but it improves AI-assisted coding by integrating the codebase into the model's parameters.
0
0
0
I am thinking of applications where one-off questions (prompts) are applied to databases of 1,000,000+ rows, interpolating many columns into the context of the prompt. So almost nothing seems overkill for my task.
My dream final system should take a user prompt and: 1) find good example rows for the user to manually classify 10-100 of them; 2) optimize the prompt for a large model; 3) apply it to ~10,000 rows to create a synthetic dataset; 4) fine-tune a small model; 5) apply the fine-tune to the whole dataframe.
1
0
0
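A runnable toy sketch of the five-step pipeline described in the tweet above, assuming a pandas dataframe; every helper here (pick_examples, optimize_prompt, label_with_large_model, finetune_small_model) is a hypothetical stand-in, not an existing library:

```python
"""Rough sketch of the 5-step flow above; every helper is a hypothetical stand-in."""
import pandas as pd

def pick_examples(df, n=50):
    # 1) pick candidate rows for manual labeling (here: a plain random sample)
    return df.sample(n=min(n, len(df)), random_state=0)

def optimize_prompt(user_prompt, labeled_examples):
    # 2) stand-in for a prompt optimizer (e.g. a DSPy-style compile step)
    shots = "\n".join(f"{r.text} -> {r.label}" for r in labeled_examples.itertuples())
    return f"{user_prompt}\n\nExamples:\n{shots}\n\nText: {{text}}\nLabel:"

def label_with_large_model(prompt, text):
    # 3) stand-in for a call to a large hosted model
    return "placeholder_label"

def finetune_small_model(synthetic_df):
    # 4) stand-in for a fine-tuning run; returns a callable "model"
    most_common = synthetic_df["label"].mode()[0]
    return lambda text: most_common

if __name__ == "__main__":
    df = pd.DataFrame({"text": ["row one", "row two", "row three"] * 4})
    labeled = pick_examples(df, n=5).assign(label="manual_label")        # 1) human labels
    prompt = optimize_prompt("Classify each row.", labeled)              # 2)
    synth = df.sample(n=6, random_state=1).copy()
    synth["label"] = [label_with_large_model(prompt, t) for t in synth.text]  # 3)
    small = finetune_small_model(synth)                                  # 4)
    df["label"] = [small(t) for t in df.text]                            # 5) whole dataframe
    print(df.head())
```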
Hmmm, I see what you mean. vLLM announced some optimizations for DeepSeek (and other MoEs) last week. Could it be that we are still really under-optimizing it? What I want to see (or try) is a benchmark between SGLang and vLLM for DeepSeek V2.5, Llama 405B, and DeepSeek V3/R1. DeepSeek should strongly outperform Llama because it can process batches faster, given it is an MoE, right? Also, I would love for the MoE to be very modular between domains; it could mean we could load fewer experts into VRAM. I have not seen any study of the expert distribution for each token. Do you know if the 'load balancing' of the MoE, done at training, kind of guarantees that it's all over the place?
0
0
1
The era of fine-tuned small models is now!!!
We've post-trained some really good models on top of Llama 3.3 that far surpass 4o-mini and 3.5-Haiku and match 4o and 3.5-Sonnet for answer quality, despite being way cheaper and blazing fast! Users are loving it already based on retention metrics. This is thanks to our work on building our custom inference stack on top of NVIDIA GPUs as well as exploring new hardware like Cerebras!
0
0
2
Yes!! I love big inference. I am just starting to marry it to dspy. From what I have seen in the docs, dspy is designed more 1-to-1. That makes sense for the model designing the prompts, but less for the model doing the task, especially when the task is one of those that you mention. Do you know if there is an easy 'big inference' mode?
1
0
1
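One way to approximate a do-it-yourself 'big inference' mode is to fan a single DSPy predictor out over rows with a thread pool; a minimal sketch, where the model id, signature, and rows are placeholders and an API key is assumed to be configured:

```python
# Sketch: fan a DSPy predictor out over many rows with a thread pool.
# Model id, signature, and rows are placeholders, not a built-in "big inference" mode.
from concurrent.futures import ThreadPoolExecutor
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))   # any LiteLLM-style model id
classify = dspy.Predict("text -> label")            # 1-to-1 module, reused per row

rows = ["review one", "review two", "review three"]  # imagine 1M+ of these

def run_one(text: str) -> str:
    return classify(text=text).label

with ThreadPoolExecutor(max_workers=16) as pool:
    labels = list(pool.map(run_one, rows))

print(list(zip(rows, labels)))
```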
Fine tune, fine tune, fine tune. It may be the year of Agents and Voice, but it is also the year of fine-tunes.
Apple MLX fine-tuning of Qwen/Qwen2.5-7B-Instruct on Italian wine classification. Base model: 62% accuracy. Fine-tuned model: 82% accuracy. Fine-tuning works! I still remember 1 year ago when people were telling me prompting + few-shot examples were enough.
1
1
5
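A minimal sketch of that base-vs-fine-tuned comparison with mlx_lm, assuming a LoRA adapter (e.g. trained with `mlx_lm.lora`) saved in ./adapters; the test pairs and prompt wording are placeholders, not the actual wine dataset:

```python
# Sketch of the base-vs-finetuned accuracy comparison with mlx_lm.
# Assumes a LoRA adapter saved in ./adapters; test pairs are placeholders.
from mlx_lm import load, generate

test_set = [("Nebbiolo, tar and roses, from Piedmont", "Barolo")]  # hypothetical pairs

def accuracy(model, tokenizer):
    hits = 0
    for text, gold in test_set:
        prompt = f"Classify this Italian wine description.\n{text}\nLabel:"
        out = generate(model, tokenizer, prompt=prompt, max_tokens=8)
        hits += gold.lower() in out.lower()
    return hits / len(test_set)

base = load("Qwen/Qwen2.5-7B-Instruct")
tuned = load("Qwen/Qwen2.5-7B-Instruct", adapter_path="adapters")
print("base:", accuracy(*base), "fine-tuned:", accuracy(*tuned))
```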
@TrelisResearch I was taking the number from this other post they made. Am I misinterpreting median throughput?
1
0
1
Yet another example that the future is one of small fine-tuned models reached through a dispatcher.
I tested fine-tuning of Qwen/Qwen2.5-3B-Instruct with my wine_classification tests and it rocks! From 54% to 70% after 0.5 epochs! I added a new wine_mlx_server_unstructured.py provider without structured output.
0
0
1
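The wine_mlx_server_unstructured.py provider itself isn't shown, but the general idea of "no structured output" can be sketched as a plain chat completion against a local OpenAI-compatible endpoint (for example one started with `mlx_lm.server`), with the label pulled out of free text; the endpoint, port, and label list below are assumptions:

```python
# Sketch of an "unstructured" provider: plain chat completion, label parsed from free text.
# Endpoint/port and the label list are assumptions; this is not the actual
# wine_mlx_server_unstructured.py from the tweet.
import requests

LABELS = ["Barolo", "Chianti", "Amarone"]                 # placeholder label set
URL = "http://localhost:8080/v1/chat/completions"         # e.g. an mlx_lm.server instance

def classify_unstructured(text: str) -> str:
    resp = requests.post(URL, json={
        "messages": [{"role": "user",
                      "content": f"Which wine is this? Answer with one name only.\n{text}"}],
        "max_tokens": 16,
    })
    answer = resp.json()["choices"][0]["message"]["content"]
    # No JSON schema / structured output: just look for a known label in the free text.
    return next((l for l in LABELS if l.lower() in answer.lower()), answer.strip())

print(classify_unstructured("Nebbiolo, tar and roses, from Piedmont"))
```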
Yes, my understanding is that it still comes out at about 50 tok/sec (no?) for a given sequence, based on my (imperfect) understanding of datacrunch's other tables. But my theory is not strong enough to predict this very precisely. That's why I'll run it soon (if I don't find someone doing it first).
I focus more on sheer token output, as this is more important to my application, and for things like 'deep research' latency does not matter anymore.
1
0
1
Everything points to small specialized models.
What if a [MASK] was all you needed? ModernBERT is great, but we couldn't stop wondering if it could be greater than previous encoders in different ways. Maybe we don't need task-specific heads? Maybe it can do all sorts of tasks with only its generative head? Spoiler: yes.
0
0
0
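The "[MASK] is all you need" idea can be tried directly with the standard fill-mask pipeline: phrase a task as a cloze and let the MLM head pick the word. A minimal sketch, where the prompt wording and label words are assumptions rather than the paper's recipe:

```python
# Sketch: classification through ModernBERT's MLM head alone, phrased as a [MASK] cloze.
# The prompt template and label-word set are assumptions.
from transformers import pipeline

fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")

review = "This movie was a complete waste of time."
preds = fill(f"Overall, this review is [MASK]: {review}", top_k=10)

label_words = {"positive", "negative", "good", "bad"}
for p in preds:
    if p["token_str"].strip().lower() in label_words:
        print(p["token_str"], p["score"])
        break
```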
RT @ivanfioravanti: My family AI server now has access to o3-mini, web search through searxng, and Flux image generation through Replicate…
0
2
0
@rasbt Should go for more complex sciences. AI is fuzzy, and I train on numerical ecology. It does not get more complex than life at that scale.
0
0
0
@DaveShapi But also, for inference, data centers are unnecessary. Each village, town, and city can host their own inference server, heating up the city hall at the same time :D
A remote village of 1,000 families could provide A(G?)I (for free) for all its citizens for 5-10 years for the same price as building a new 1/3 mile of road... the implications of every city doing that would be crazy!!! Here is the proof 🧵
0
0
0