Andi Marafioti Profile
Andi Marafioti

@andi_marafioti

Followers: 379 | Following: 118 | Media: 16 | Statuses: 147

🤖 Machine Learning R&D @huggingface | multimodal | 🎧 Sound Engineer | 🇦🇷 in 🇨🇭 | 🌍 ES, EN, FR, DE

Bern, Switzerland
Joined April 2022
Pinned Tweet
@andi_marafioti
Andi Marafioti
23 days
I'm quite proud of my first 🤗 Hugging Face blog!
6
33
173
@andi_marafioti
Andi Marafioti
1 month
Excited to share that I've joined @huggingface 🤗 as an ML Research Engineer on the multimodal team! Excited to see what amazing new models and datasets we can create for the community!
13
13
220
@andi_marafioti
Andi Marafioti
23 days
🚀 Fine-tune Florence-2 on any task! We are releasing fine-tuning scripts for Microsoft's Florence-2, alongside a walkthrough blog post, a Space demo, and a Colab notebook. @mervenoyann @skalskip92 🧵
1
45
192
@andi_marafioti
Andi Marafioti
9 days
GPUs going BRRRR at @huggingface 's open science cluster 🚀 Stay tuned: T-8 days 🥳
Tweet media one
1
10
70
@andi_marafioti
Andi Marafioti
1 month
We added idefics2 and idefics2-chatty to the Unsolvable Problem Detection Leaderboard. 🚀 This benchmark was developed to measure the robustness of VLMs by asking them questions about images that cannot be answered. #MachineLearning #AI #NLP 🧵
1
10
25
@andi_marafioti
Andi Marafioti
1 month
🚀 @argilla_io is joining @huggingface 🤗! Argilla is the leading company in dataset creation, with a ton of open-source contributions! Many of their efforts go toward enabling multilingual LLMs all over the world! Oh, and they're also co-authors of Zephyr ORPO!
0
4
24
@andi_marafioti
Andi Marafioti
22 days
Once you go async you can never go back
Tweet media one
2
1
17
@andi_marafioti
Andi Marafioti
20 days
I've been working on a synthetic dataset and every day I optimize the script. So far I've got:
Monday -> ETA 12 days
Tuesday -> ETA 6 days
Wednesday -> ETA 2 days
Today -> ETA 4 hours
My goal is to make it so efficient that I can 10x the original intended size 🚀🤗
1
1
16
@andi_marafioti
Andi Marafioti
23 days
@Ahmed_Masry97 @mervenoyann @skalskip92 That's awesome! Let me know how it goes! You can start here: You only need to run:
python distributed_train.py --dataset cauldron --epochs 5 --eval_steps 20000
Maybe playing a bit with the learning rate would also help :)
0
0
10
@andi_marafioti
Andi Marafioti
23 days
@Ahmed_Masry97 @mervenoyann @skalskip92 You need to consider that we only fine-tuned on DocVQA, a pretty small dataset. The code to fine-tune on The Cauldron is up there and would probably achieve a better performance, but I'm GPU poor :(
1
0
5
@andi_marafioti
Andi Marafioti
26 days
It’s such a privilege to finish a hard work day and go into my garden and gather flowers. I’m so lucky 💞
Tweet media one
0
0
4
@andi_marafioti
Andi Marafioti
1 month
@sabokrou This breaks my heart, I'm sorry you have to go through it.
0
0
1
@andi_marafioti
Andi Marafioti
26 days
Florence-2 fine-tuned on DocVQA excels at retrieving information from images! Just a few final tweaks, and I'll release the code for anyone to fine-tune for their own tasks. Stay tuned! 😊📚
0
2
4
@andi_marafioti
Andi Marafioti
1 month
Apple Intelligence is a huge tailwind for multimodal models. To do 'on-screen awareness', they need to understand images, text, audio, and video. I'm super excited to work on creating these models for the open-source community at @huggingface
0
0
4
@andi_marafioti
Andi Marafioti
15 days
@Laz4rz @pixqc @wateriscoding The use cases are particular: you don’t want a full database, but you need SQL-level speed on complex queries. I almost dropped it completely, but just a few days ago I thought about using it for the dataset I’m generating. In the end, I got more speedups out of other things.
1
0
3
@andi_marafioti
Andi Marafioti
15 days
@Laz4rz @pixqc @wateriscoding It allows for SQL queries (pandas-style as well) and it’s pretty fast at them. I don’t think it drops everything into memory, but I didn’t profile it.
0
0
3
@andi_marafioti
Andi Marafioti
20 days
@SatpalPatawat No links yet, I want to create a powerful instruct VQA dataset.
0
0
3
@andi_marafioti
Andi Marafioti
28 days
@elisagdelope @deep_chem Having done a PhD, I get that sentiment so much!
1
0
3
@andi_marafioti
Andi Marafioti
1 month
Overall, these results are fantastic for the open-source community, demonstrating that the gap between closed and open-source models is continuously shrinking! 🌐👏 #OpenSource #AI #Innovation
Tweet media one
1
0
3
@andi_marafioti
Andi Marafioti
1 month
For the Absent Answer Detection task, idefics2-chatty is the best model in its class! It even beats GPT-4o and Gemini Pro in the hardest setting! 🏆 #MachineLearning #AI #NLP
Tweet media one
1
0
3
@andi_marafioti
Andi Marafioti
23 days
code: model weights:
1
0
3
@andi_marafioti
Andi Marafioti
28 days
I'm creating question/answer pairs from documents using an LLM. It mostly works well, but the number of questions doesn't scale linearly if I feed the model several pages. I tend to split the pages, but the context becomes less rich. Any ideas?
Tweet media one
0
0
2
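One common way to handle this, not taken from the thread, is to prompt the LLM over overlapping windows of pages, so each chunk keeps some surrounding context while the number of questions still scales with the page count. A minimal sketch (the helper name and window sizes are illustrative):

```python
def page_windows(pages, window=2, overlap=1):
    """Group document pages into overlapping windows so each prompt
    keeps some surrounding context (hypothetical helper)."""
    step = window - overlap
    return [pages[i:i + window] for i in range(0, max(len(pages) - overlap, 1), step)]

# Each window is prompted separately, so the number of generated questions
# stays roughly proportional to the number of pages.
for chunk in page_windows(["page1", "page2", "page3", "page4", "page5"]):
    print(chunk)  # ['page1', 'page2'], ['page2', 'page3'], ...
```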
@andi_marafioti
Andi Marafioti
23 days
We fine-tuned Florence-2 with a small learning rate and froze the vision encoder. Our experiments ranged from a single A100 GPU to a powerful 8x H100 cluster, showing the potential for small setups 🚀💻
1
0
3
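A rough sketch of that setup, assuming the vision encoder is exposed as `vision_tower` (the attribute name may differ between Florence-2 revisions) and using an illustrative small learning rate; this is not the released training script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

checkpoint = "microsoft/Florence-2-base-ft"
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(checkpoint, trust_remote_code=True)

# Freeze the vision encoder so only the language side gets gradient updates
# (attribute name assumed to be `vision_tower`).
for param in model.vision_tower.parameters():
    param.requires_grad = False

# Small learning rate, as described above (exact value illustrative)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-6
)
```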
@andi_marafioti
Andi Marafioti
9 days
@PriNova75 @huggingface Right? It's crazy! I'm happy that at least we track it and show it on all jobs we are running to create some awareness among engineers. Also there is some incredible work by @SashaMTL that keeps us in check. But yes, the carbon emissions of ML training are nuts.
0
0
3
@andi_marafioti
Andi Marafioti
1 month
For the Incompatible Answer Set Detection task, idefics2-chatty performs exceptionally well, outperforming Gemini Pro and LLaVA-1.6-13B on base-type questions, though it falls behind on more complex queries. 📊 #AIResearch #Tech
Tweet media one
1
0
2
@andi_marafioti
Andi Marafioti
23 days
@xhinker Hi Andrew! There’s a bug in the original implementation that doesn’t allow the models to be fine-tuned. We opened PRs on all the models to fix them, but you need to point the Hub to our PRs or use the model I uploaded. Did you change the model you were fine-tuning?
2
0
1
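For context on "point the Hub to our PRs": `from_pretrained` can load directly from an open Hub pull request via the `revision` argument. The PR number below is a hypothetical placeholder for illustration, not the actual fix PR:

```python
from transformers import AutoModelForCausalLM

# revision="refs/pr/<number>" pulls the code/weights from an open Hub PR.
# "refs/pr/6" is a hypothetical PR number used only for illustration.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft",
    trust_remote_code=True,
    revision="refs/pr/6",
)
```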
@andi_marafioti
Andi Marafioti
1 month
@dingchilling @huggingface I’d love to work with music again sometime in the future :) This is a paper I wrote a few years ago:
1
0
1
@andi_marafioti
Andi Marafioti
2 months
Excited to share my journey in Machine Learning and Sound Engineering! 🌟 With 10 years of experience and a PhD, from Argentina to Switzerland, speaking four languages and always learning. Let's innovate together! 🚀🎧 #MachineLearning #SoundEngineering #TechForGood
0
0
2
@andi_marafioti
Andi Marafioti
23 days
Here’s the full blog:
@andi_marafioti
Andi Marafioti
23 days
I'm quite proud of my first 🤗 Hugging Face blog!
6
33
173
0
0
2
@andi_marafioti
Andi Marafioti
28 days
@SanhEstPasMoi @Grammarly I've been using it for two years now, and it has improved my messages and taught me to write better. It's super helpful when it says, 'Remove this for confidence.' My only two gripes: 1) It constantly wants to delete my emojis :'( 2) Only English :(
0
0
2
@andi_marafioti
Andi Marafioti
1 month
@bigemptyboulder @huggingface I had six rounds of interviews, but they were all pretty fair/on point and not very time-consuming. My tip for a successful application: build cool stuff you can showcase so that your profile stands out 🤗🚀
0
0
1
@andi_marafioti
Andi Marafioti
23 days
We fine-tuned the model on DocVQA, a hard benchmark, and it achieves 57% ANLS - outperforming Idefics2 (without fine-tuning) and DeepSeek-VL (with fine-tuning) 🤯
1
0
2
@andi_marafioti
Andi Marafioti
25 days
@mervenoyann @pebkac_roll And we’re open about putting them to good use! If you have an impactful open-source project and need some compute power, definitely jump into my DMs 🚀
0
0
2
@andi_marafioti
Andi Marafioti
28 days
Goal of the day: Set up a fine-tuning example for Florence-2. 🚀
1
1
2
@andi_marafioti
Andi Marafioti
15 days
I hate whenever I see a post that says “do x or you’ll be left behind”. No, you won’t. Relax, you can afford to lose a trend or two.
1
0
2
@andi_marafioti
Andi Marafioti
1 month
In the Incompatible Visual Question Detection task, idefics2-chatty matches the best model so far in the <10B class, Qwen-VL-Chat, but lags behind larger, closed-source models. Still, an impressive feat for open-source! 🎨 #ComputerVision #ML
Tweet media one
1
0
2
@andi_marafioti
Andi Marafioti
15 days
@Laz4rz @pixqc @wateriscoding I tried to use polars in production and it would be a bit buggy in unexpected ways. To me, polars shines when you need to process millions of rows, especially recurrently. It’s a use case that comes up in many ML pipelines.
1
0
2
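A tiny example of the kind of workload where polars tends to shine, assuming a parquet file of generated samples (file name and column names are made up). The lazy API scans the file, pushes the filter down, and only materializes the aggregate:

```python
import polars as pl

# Lazily scan the file: nothing is loaded into memory until .collect()
lf = pl.scan_parquet("samples.parquet")

result = (
    lf.filter(pl.col("score") > 0.8)
      .group_by("label")                       # `groupby` on older polars versions
      .agg(pl.col("score").mean().alias("mean_score"))
      .collect()                               # execution happens here
)
print(result)
```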
@andi_marafioti
Andi Marafioti
11 days
@effemanfredi Yes, the model has some custom code from Microsoft and at some point it sets the model_type to an empty string. It’s just a bug but hard to squash 😅
1
0
1
@andi_marafioti
Andi Marafioti
23 days
@Laz4rz @Nigh8w0lf @mervenoyann @skalskip92 They have this in the paper. Basically they asked annotators to provide three levels of detail
Tweet media one
2
0
2
@andi_marafioti
Andi Marafioti
14 days
I added LoRA training to the Florence-2 fine-tuning repo. I was expecting that with LoRA I would be able to fit a way larger batch size, but after reducing to 1% of trainable params I can only fit 25% more samples per batch. Is that normal?
1
1
2
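For reference, wiring LoRA in with the `peft` library usually looks roughly like this; the target module names below are a guess and would need to match Florence-2's actual layer names. As for the modest gain: fewer trainable params mainly shrinks gradient and optimizer-state memory, while activation memory per sample stays the same, which may explain why the batch size grows less than expected.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

# target_modules are illustrative; they must match the linear layer
# names actually used inside Florence-2.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows the ~1% trainable fraction
```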
@andi_marafioti
Andi Marafioti
23 days
0
0
2
@andi_marafioti
Andi Marafioti
1 month
🚀 Exciting news! The MM-UPD benchmark is now live on @huggingface Hub as a leaderboard 🏆. It evaluates how vision language models handle unsolvable problems 🤓. Currently, top VLMs like GPT-4V and LLaVA-Next-34B struggle with it. 🔗 Link in the next tweet!
Tweet media one
1
0
2
@andi_marafioti
Andi Marafioti
21 days
@LukaszBorchmann @mervenoyann @skalskip92 Hi 👋 I don't think a text-only model would achieve anything on this dataset. Where did you see a BERT base performing so well? I'm interested! We didn't target SOTA performance here; performance would improve by training on a larger dataset like The Cauldron.
1
0
2
@andi_marafioti
Andi Marafioti
1 month
Big thanks to @AtsuMiyaiAM for putting it together! I'm currently working on submitting idefics2 to it!
0
1
1
@andi_marafioti
Andi Marafioti
28 days
I'm going to build my first space on @huggingface today :D I want to visualize a dataset I'm generating and let other people explore it
1
0
2
@andi_marafioti
Andi Marafioti
26 days
Excellent summary on an incredible new multimodal model. The future is bright!
@mervenoyann
merve
26 days
EPFL and Apple just released 4M-21: single any-to-any model that can do anything from text-to-image generation to generating depth masks! 🙀 Let's unpack 🧶
Tweet media one
10
134
807
0
0
2
@andi_marafioti
Andi Marafioti
8 days
When you realize LLMs are smarter than you...
Tweet media one
1
0
3
@andi_marafioti
Andi Marafioti
1 month
@mervenoyann @huggingface Awesome leaderboard! Interesting to see that there is only one model <10B. Seems like there's still some room for improvement in the field :)
0
0
1
@andi_marafioti
Andi Marafioti
25 days
@siddhi12612 Python and rust 🚀
0
0
1
@andi_marafioti
Andi Marafioti
1 month
@MaziyarPanahi @osanseviero @huggingface You'll be the first one to know when we get there <3
1
0
1
@andi_marafioti
Andi Marafioti
26 days
1
0
1
@andi_marafioti
Andi Marafioti
27 days
For the tutorial on fine-tuning Florence-2: would you be more interested in seeing how to do it with a Colab notebook, or how to do it with a multi-GPU setup to get better performance?
0
1
1
@andi_marafioti
Andi Marafioti
16 days
Great opportunity to use the HF 🤗 cluster for free for distillation projects!
@osanseviero
Omar Sanseviero
17 days
Given the rise of big LLMs (100B+ params), it would be exciting to see more open-source experiments in distillation🥁 I'm happy to announce a 1000 A100 hours grant for open access work in the LLM distillation space 🤗 Examples: - Distilling an 8B model into <1B model - Writing
11
63
371
0
0
1
@andi_marafioti
Andi Marafioti
1 month
@eliebakouch Congratulations! The first paper can be so daunting, great job!
1
0
1
@andi_marafioti
Andi Marafioti
27 days
@Aniketu89741067 Let’s connect!
0
0
1
@andi_marafioti
Andi Marafioti
26 days
1
0
1
@andi_marafioti
Andi Marafioti
28 days
Here are a number of comparisons. For its size, it's great at captioning, but the larger models perform better. It does best in visual question answering: large models sometimes perform better, but not always. It's SOTA on Referring Expression Comprehension.
Tweet media one
1
0
1
@andi_marafioti
Andi Marafioti
23 days
@Laz4rz @Nigh8w0lf @mervenoyann @skalskip92 Yeah, they are super different annotations and results. But if you need to fine-tune for captioning, I would still start from one of those. Try to get a feeling for which one is closest to what you want.
1
0
1
@andi_marafioti
Andi Marafioti
1 month
@nicolayr_ @huggingface Thank you so much 🤗 I will do my best 🚀
0
0
1
@andi_marafioti
Andi Marafioti
27 days
@mervenoyann @huggingface Merve, could you save some for me? I’ll come to the Paris office in July 🤗
0
0
1
@andi_marafioti
Andi Marafioti
13 days
I launched the first ablations on the new synthetic dataset! Wish me luck 🍀
0
0
1
@andi_marafioti
Andi Marafioti
8 days
Have you tried to answer MMMU questions? They are frigging hard! At least the paper mentions that college experts achieve 76-82% accuracy, so it's not just me that's dumber than an LLM xD
0
0
2
@andi_marafioti
Andi Marafioti
21 days
@A_Sol_R @mervenoyann That’s where Florence comes in 🌈
1
0
1
@andi_marafioti
Andi Marafioti
15 days
The last one that really annoyed me was “if you’re not using these gmaps functions you’ll be left behind”… really?
0
0
1
@andi_marafioti
Andi Marafioti
19 days
That moment when you make your job so efficient that it runs at lightspeed, but it crashes the node so you need to artificially throttle it :(
1
0
1
@andi_marafioti
Andi Marafioti
19 days
@bytebelt Congrats man !
0
0
1
@andi_marafioti
Andi Marafioti
23 days
0
0
1
@andi_marafioti
Andi Marafioti
29 days
pip install uv
Let that be your last pip command
0
0
1
@andi_marafioti
Andi Marafioti
15 days
@ManuelFaysse Really cool work! Congrats!
0
0
2
@andi_marafioti
Andi Marafioti
26 days
As a software engineer with 10 yoe, I use LLMs to code complex systems all the time. It’s like I have a small army of juniors that code whatever I tell them to and then I only need to review the code and put the pieces together.
1
0
1
@andi_marafioti
Andi Marafioti
8 days
@NONDA30 Thank you !🙏
0
0
1
@andi_marafioti
Andi Marafioti
1 month
@berraksismann @ISCAInterspeech @BussoCarlos Super cool! Where can I find the dataset?
2
0
1
@andi_marafioti
Andi Marafioti
1 month
@_PrasannaLahoti @huggingface Thank you! Likewise 🤗
0
0
1
@andi_marafioti
Andi Marafioti
22 days
@Laz4rz Whenever you’re limited by I/O. Here it takes some time for me to load the images and submit them to the GPUs, so instead of waiting for the images to be loaded, I load them in parallel while I wait for the GPUs to finish their tasks
1
0
1
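In plain PyTorch, this overlap is the standard `DataLoader` worker pattern: background workers prepare upcoming batches while the GPU works on the current one. A self-contained toy version (the dataset just fabricates tensors):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class FakeImageDataset(Dataset):
    """Stands in for slow disk/network image loading (toy data)."""
    def __len__(self):
        return 256

    def __getitem__(self, idx):
        return torch.rand(3, 224, 224)  # pretend this is an expensive read + decode

if __name__ == "__main__":
    # num_workers > 0 prefetches upcoming batches in background processes;
    # pin_memory=True speeds up the later host-to-GPU copy.
    loader = DataLoader(FakeImageDataset(), batch_size=32, num_workers=4, pin_memory=True)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for batch in loader:
        batch = batch.to(device, non_blocking=True)
        # ... model forward/backward would run here while workers prefetch
```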
@andi_marafioti
Andi Marafioti
26 days
0
0
1
@andi_marafioti
Andi Marafioti
1 month
@pcuenq @huggingface Thank you Pedro! Glad to be here 🤗
0
0
1
@andi_marafioti
Andi Marafioti
22 days
@xhinker you're running into all the bugs we did 😅 This is pretty much the last one that actually breaks everything
1
0
1
@andi_marafioti
Andi Marafioti
29 days
@txhno So far it's been on par with pip. Like pip-compile, uv generates a platform-specific requirements.txt file (unlike, e.g., poetry and pdm, which generate platform-agnostic poetry.lock and pdm.lock files).
1
0
1
@andi_marafioti
Andi Marafioti
19 days
Dataset is cooked 🚀 Time to let the cluster rest 😴
0
0
1
@andi_marafioti
Andi Marafioti
26 days
1
0
1
@andi_marafioti
Andi Marafioti
1 month
@kazemi_sm @WenhuChen Do you have code for the implementation? I could evaluate idefics2 using our cluster :)
1
0
1
@andi_marafioti
Andi Marafioti
1 month
@realmrfakename @huggingface For now, multimodal as in images+text, but I believe the field will naturally evolve to integrate audio and video into the mix 🌈
0
0
1
@andi_marafioti
Andi Marafioti
29 days
Where was uv all these years ❤️? A drop-in replacement for pip that is 10-100x faster 🤯. Go ahead and try it, it's amazing:
Tweet media one
2
0
1
@andi_marafioti
Andi Marafioti
27 days
@BuckedUnicorn @Gradio yeah, that's funny
0
0
1
@andi_marafioti
Andi Marafioti
22 days
@teodor_io With async you get to control when each “thread” holds the processor, so race conditions are less likely.
1
0
1
@andi_marafioti
Andi Marafioti
2 months
@haeggee Congrats on finishing the paper !
0
0
1
@andi_marafioti
Andi Marafioti
1 month
Tomorrow I'm getting certified as a climbing guide with the Swiss alpine club! I'm going on a multi-pitch climb with a group and made this nice poster as marketing :)
Tweet media one
0
0
1
@andi_marafioti
Andi Marafioti
22 days
@Laz4rz Yes, if you use torch then this is already optimized in the DataLoader and Trainer loops. I'm using HF's llm-swarm () to spawn "servers" with LLMs where I submit tasks. Creating the tasks takes time, and I do it while I wait for the servers to respond.
1
0
1
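The general asyncio shape of that pattern, independent of llm-swarm (the endpoint call is simulated with a sleep; names are illustrative):

```python
import asyncio

async def query_llm_server(prompt: str) -> str:
    # Stand-in for an HTTP request to an inference endpoint
    await asyncio.sleep(0.1)  # simulated network latency
    return f"answer to: {prompt}"

def build_task(i: int) -> str:
    # CPU-side work: load an image, format the prompt, etc.
    return f"prompt {i}"

async def main() -> None:
    in_flight = []
    for i in range(100):
        prompt = build_task(i)  # prepare the next request...
        # ...while earlier requests are already awaiting their responses
        in_flight.append(asyncio.create_task(query_llm_server(prompt)))
    results = await asyncio.gather(*in_flight)
    print(f"collected {len(results)} responses")

asyncio.run(main())
```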
@andi_marafioti
Andi Marafioti
7 days
@m_chirculescu PaliGemma is a larger model, and it is already pre-trained for VQA, so it does better out of the box. However, you need more compute power to fine-tune PaliGemma on your particular dataset, so I would choose Florence if you need to fine-tune it.
0
0
1