![chansung Profile](https://pbs.twimg.com/profile_images/1337734515033174020/gyVhyP0O_x96.jpg)
chansung
@algo_diver
Followers: 4K · Following: 12K · Statuses: 4K
@GoogleDevExpert for ML and @googlecloud | @huggingface Fellow | @dstackai Ambassador | @MistralAI Ambassador | Researcher | Engineering | Open Source Lover
Daejeon, Republic of Korea
Joined August 2018
Gemini 2.0 Flash is your go-to model for almost every daily task, at the lowest price ever!
Introducing Arena-Price Plot! 💰📊 An interactive plot of price vs. performance trade-offs for LLMs. Frontier efficiency models: 🔹 Gemini-2.0-Flash/Lite by @GoogleDeepMind 🔹 DeepSeek-R1 by @deepseek_ai 🔹 GPT-4o by @OpenAI 🔹 Yi-Lightning by @01AI_Yi 🔹 Ministral 8B by @MistralAI LLM efficiency is accelerating—kudos to the labs driving the frontier!
All new @MistralAI Le Chat. Go to the official announcement blog post, and I am sure you will find many gems there, including:

Speed: ~1000 words/sec with Flash Answers.
High-quality image generation.

Multimodality is everywhere in the AI industry, but for more accurate analysis of documents and images, a proper OCR model understands them much better. This is what MistralAI delivers.

I am also really looking forward to playing with the upcoming "Data connectors and multi-step agents" feature. It's like GPTs and MCP, but it would be the first move toward introducing multi-agent workflows directly from a chat platform. Find more interesting news below!
Introducing the all new Le Chat: your ultimate AI sidekick for life and work! Now live on web and mobile!
.@Gradio is evolving from an AI demo framework into a production-ready full-stack framework! Let's build more realistic AI use cases with ease.
🎉 Massive Gradio Update Alert! Two new feature drops that will make building web apps even more fun and awesome 🤩 1️⃣ Starting with Multipage Apps - build complex ML interfaces using extremely simple syntax.
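For a concrete picture of that "extremely simple syntax", here is a minimal sketch of a two-page app, assuming Gradio 5's Blocks.route API; the page names, paths, and greeting logic are made up for illustration:

```python
# Minimal multipage sketch, assuming gradio>=5.0 where Blocks.route() landed.
import gradio as gr

with gr.Blocks() as demo:
    # First page: a tiny interactive greeter.
    gr.Markdown("# Home")
    name = gr.Textbox(label="Name")
    greeting = gr.Textbox(label="Greeting")
    name.submit(lambda n: f"Hello, {n}!", inputs=name, outputs=greeting)

with demo.route("About", "/about"):  # second page, served at /about
    gr.Markdown("# About\nEach page lives in the same app and server process.")

demo.launch()
```

The appeal is that pages share one Blocks app, so adding a page is just another `with demo.route(...)` block rather than a separate deployment.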
Big News: Gemini 2.0 Models are Here!

Exciting updates in the world of Gemini! The Gemini 2.0 Flash model is now officially GA (Generally Available), including for apps and APIs. We also have the release of Gemini 2.0 Flash Lite (Public Preview) and Gemini 2.0 Pro (Exp). Because Gemini is connected to so many services, it's difficult to cover everything in detail, so I'll focus on a quick summary of the performance improvements, plus the Gemini App and API aspects.

✦︎ Performance Boosts

We've got a new contender: Gemini 2.0 Flash Lite! I suspect this model is positioned to replace the previous 1.5 Flash 8B. However, its speed is reportedly closer to 1.5 Flash, so it might not be a direct 1:1 comparison. It's possible this is a completely different model architecture, simply using the Gemini name.

Looking at quality, the 2.0 series performs exceptionally well across well-known benchmarks and in arenas. One notable change is that Long Context Retrieval capabilities seem to have decreased compared to the 1.5 series. Aside from that, the 2.0 models appear to be a full upgrade from 1.5 in almost every other metric. Here's a rough migration guide:

1.5 Flash & 1.5 Flash 8B users => consider Gemini 2.0 Flash Lite
1.5 Pro users => consider Gemini 2.0 Flash
Need a significant boost beyond 1.5 Pro => check out Gemini 2.0 Pro (Exp)

The arena scores show all 2.0 series models within the top 10! Keep in mind that reasoning-specific arenas are not yet included, so this is primarily an evaluation of their chatting capabilities. Essentially, for general tasks, it feels like there isn't a model currently surpassing Gemini (at least in my experience). It's hard to definitively say "the best" since OpenAI's o3-mini isn't listed, but it's reasonable to compare the non-Thinking Gemini 2.0 models to GPT-4o.

Google seems heavily focused on developing cost-effective, high-performance, and extremely fast models. This might stem from a different approach to incorporating reasoning: instead of relying on RL training to make the model itself "think," they might be adding thinking capabilities after the foundation model is built (based on my interpretation of related Google research papers).

✦︎ API Usability Improvements

A new genai SDK has been released for API usage, and pricing has been overhauled for better usability. A key improvement is the unification of Vertex AI and AI Studio API access, along with a more consistent pricing structure. This addresses many previously reported user concerns.

Unified Pricing: instead of separate pricing for image, video, and text inputs, the 2.0 models use a single, integrated pricing model.
Token-Based Pricing: both Vertex AI and AI Studio now price by token length (previously, Vertex AI used string length).
Consistent Pricing Regardless of Token Length: the previous 128K-input-token pricing tier is gone; pricing is now the same at any token length (though it's not simply capped at the old 1.5 models' upper limit).

Grounding (Google Search) integration is also improved: instead of being a purely paid feature, there's now a free tier of up to 1,500 requests per day!

The separate existence of Vertex AI and AI Studio likely comes down to scale. AI Studio has some rate limits (with AI Studio's backend ultimately being Vertex AI), while Vertex AI removes those limits. Think of AI Studio as pre-defined policies on top of Vertex AI for a specific segment of users (and it seems to have slightly better pricing). For most situations, AI Studio should be more than sufficient.

✦︎ Finally, the Gemini App

All the newly announced Gemini 2.0 models are now available in the Gemini App, and you can also try out the Thinking models!

That's a quick overview of Gemini 2.0. For all the details, please refer to the official announcements!
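As a minimal sketch of the new genai SDK mentioned above (assuming the google-genai Python package and an AI Studio API key; the prompt is just an example):

```python
# Minimal sketch, assuming `pip install google-genai` and an AI Studio key.
from google import genai

# For Vertex AI instead: genai.Client(vertexai=True, project=..., location=...)
client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarize the Gemini 2.0 lineup in one sentence.",
)
print(response.text)
```

The same client surface covering both AI Studio and Vertex AI is exactly the unification the tweet is pointing at.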
Open-R1 Update (2025/02/03)

@huggingface has shared an update on the Open-R1 project, which aims to reproduce DeepSeek-R1 at the community level.

✦︎ Since building DeepSeek-R1 itself from scratch requires enormous resources and an astronomical amount of $$, the recipe carried out so far focuses on confirming the Distillation possibility mentioned in the DeepSeek-R1 report.

✦︎ What this requires is the "synthetic data" generated by DeepSeek-R1 and the "GRPO (Group Relative Policy Optimization)" technique used to train the target model via RL.
=> Since the GRPO implementation was officially merged into Hugging Face's TRL library a week ago, all the code for creating a distilled model from DeepSeek-R1 is now in place (a minimal GRPO sketch follows below).

✦︎ Applying the distillation recipe to the Qwen 1.5/7/14/32B and Llama 8/70B models reproduced results close to the MATH-500 benchmark numbers published in the DeepSeek-R1 report. As mentioned in the report, this is similar to o1-mini.
=> This Open-R1 reproduction recipe is available on the GitHub repository, so anyone can use it.

✦︎ Obtaining the "synthetic data" used to train the distilled models was reportedly harder than expected. The number of tokens DeepSeek-R1 generates for reasoning is very large, equivalent to 10 sheets of A4 paper on average, and the DeepSeek-R1 model itself is very large, so generating this many tokens requires a lot of GPU memory. For this experiment, 32 H100 GPUs were used solely for synthetic data generation.
=> Since token generation also takes a significant amount of time, it is better to minimize GPU idle time by streaming generated samples into training, rather than generating everything in batches first.
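Since the update hinges on GRPO landing in TRL, here is a minimal sketch of a GRPO run with TRL's GRPOTrainer. The dataset, the toy length-based reward, and the small Qwen model are placeholders for illustration, not the Open-R1 setup:

```python
# Hedged sketch of GRPO fine-tuning, assuming a TRL version with GRPOTrainer.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters long.
    return [-abs(50 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="qwen-grpo", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder target model
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In the actual Open-R1 recipe, the reward function and data would of course come from DeepSeek-R1's reasoning traces rather than a toy length heuristic.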
Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes," which compares SFT and RL in the post-training of LLMs/VLMs, from @HKUniversity, @UCBerkeley, @GoogleDeepMind, and @nyuniversity.

The conclusion suggests SFT excels at memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable: typically some SFT comes first so the model understands prompt formats, followed by RL to enhance generalization through trial and error (a rough sketch of this two-stage recipe follows below).

The study focused on one model, Llama-3.2-Vision-11B, using environments like General Points for arithmetic reasoning and V-IRL for spatial reasoning. The same training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as that effort will directly influence the outcomes.
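To make the "SFT first, then RL" recipe concrete, here is a rough sketch of the first stage using Hugging Face TRL; the model and dataset below are placeholders, not the paper's exact setup:

```python
# Hedged sketch of stage 1 (brief SFT for prompt-format familiarity),
# assuming a recent TRL with SFTTrainer/SFTConfig.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder; the paper uses Llama-3.2-Vision-11B
    args=SFTConfig(output_dir="sft-stage", max_steps=200),
    train_dataset=dataset,
)
trainer.train()

# Stage 2 (not shown): RL, e.g. GRPO/PPO, against a task reward computed by the
# simulation environment (General Points / V-IRL in the paper), which is where
# the generalization gains reported in the paper come from.
```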
A brief summary of o3-mini by @OpenAI

After looking through the system card, I don't think the o3-mini model is significantly better than o1. It is much better than o1-mini, though, and it seems the "mini" class has reached the o1 level of performance.

The system card contains a variety of interesting benchmark scores. I recommend reading it at least once, because you can see indirectly that releasing an ML model requires a lot of effort not only in training the model but also in testing it.

I know o3-mini is good, but since everyone is already saying so, it's pointless for me to repeat it; instead I looked at it from a different perspective. The system card records test scores for GPT-4o, o1-prev., o1, and o3-mini, and you can see that different models show strengths on different tasks. It is mostly o3-mini, but there are cases where o1 is better, where o1-prev. beats o1, and even where GPT-4o wins.

Of course, a new "generation" of a model does not mean it inherits the previous generation's ability on every benchmark. In other words, if you were relying on a particular strength of the previous generation, you should fully re-examine your workflow every time you swap models.

Since there are o3-mini (low) and (high), I first thought they had released two types of models, but it seems low is the base and medium/high are simply cases where the same model is told to think more (I'm guessing). The system card scores are all recorded with low (there are also model versions that are not publicly available).

Third-party reasoning models (e.g. Gemini 2.0 Thinking, DeepSeek-R1, etc.) could also perform significantly better than advertised if a means were provided to tell them to "think more". So a fair(?) comparison between o3-mini and other models would probably have to be based on low, at least until such a "think more" control is introduced for them as well.
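On the low/medium/high point: the public API exposes this as a reasoning-effort knob on one model rather than as separate models. A hedged sketch, assuming the OpenAI Python SDK's reasoning_effort parameter for o-series models; the prompt is just an example:

```python
# Sketch: probing o3-mini at the three reasoning-effort levels.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
for effort in ("low", "medium", "high"):
    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort=effort,
        messages=[{"role": "user", "content": "How many primes are below 100?"}],
    )
    print(effort, "->", response.choices[0].message.content)
```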
RT @UnslothAI: Run DeepSeek-R1 (671B) locally on @OpenWebUI - Full Guide No GPU required. Using our 1.58-bit Dynamic GGUF and llama.cpp.…
RT @fchollet: The Keras team at Google is looking for part-time contractors (note: the offer is from *Google*, not me personally, and not N…
Finally approaching 3K followers on the @huggingface Hub! Follow me if you are interested in playing with some cool Space apps and reading some poster-style paper summaries!