Wonmin Byeon Profile
Wonmin Byeon

@wonmin_byeon

Followers: 971
Following: 121
Statuses: 58

Researcher

California
Joined March 2020
@wonmin_byeon
Wonmin Byeon
8 months
Here is our new 8B Mamba-based hybrid LLM: higher MMLU than the 8B transformer and long-context extension up to 128K sequences.
@ctnzr
Bryan Catanzaro
8 months
An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less
[Image]
4
6
36
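To make the composition described above concrete (roughly 7% attention layers, with Mamba2 making up the rest), here is a minimal sketch of how such a hybrid layer schedule could be laid out. The layer count and exact spacing are illustrative assumptions, and the sketch omits the MLP layers that the actual hybrid model also interleaves.

# Illustrative sketch of a hybrid layer schedule: mostly Mamba2 blocks,
# with attention blocks inserted at a small, evenly spaced ratio.
# All numbers here are assumptions for illustration, not the real config.

def build_hybrid_schedule(num_layers: int = 56, attention_ratio: float = 0.07):
    """Return a list of block types, spacing attention layers evenly."""
    num_attention = max(1, round(num_layers * attention_ratio))
    stride = num_layers / num_attention
    attention_positions = {round(i * stride) for i in range(num_attention)}
    return [
        "attention" if i in attention_positions else "mamba2"
        for i in range(num_layers)
    ]

if __name__ == "__main__":
    schedule = build_hybrid_schedule()
    print(schedule.count("attention"), "attention layers out of", len(schedule))

Running the sketch with these assumed defaults yields 4 attention layers out of 56, i.e. about 7%.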
@wonmin_byeon
Wonmin Byeon
27 days
@BeeAGass It's open now. Sorry.
0
0
1
@wonmin_byeon
Wonmin Byeon
27 days
@JeffKarikariOs It's open now. Sorry.
0
0
0
@wonmin_byeon
Wonmin Byeon
27 days
@r69shabh No, it's for a PhD student. Sorry.
1
0
1
@wonmin_byeon
Wonmin Byeon
27 days
@Mannananba From any country.
0
0
0
@wonmin_byeon
Wonmin Byeon
27 days
@fk0804018 This position is for a PhD student.
0
0
0
@wonmin_byeon
Wonmin Byeon
27 days
Sorry. My DM is open now.
1
0
10
@wonmin_byeon
Wonmin Byeon
1 month
RT @rupspace: Wrote a post about Highway networks, ResNets and subtleties of architecture comparisons
[Image]
0
40
0
@wonmin_byeon
Wonmin Byeon
2 months
RT @PavloMolchanov: 🚀 Introducing Hymba-1.5B: a new hybrid architecture for efficient small language models! ✅ Outperforms Llama, Qwen, an…
0
56
0
@wonmin_byeon
Wonmin Byeon
3 months
Let's move to Bluesky!
[Image]
0
0
1
@wonmin_byeon
Wonmin Byeon
3 months
Our new hybrid model is out! Hymba-1.5B even outperforms LLaMA 3.2-3B. Check out the paper for more details.
@PavloMolchanov
Pavlo Molchanov
3 months
Sharing our team’s latest work on Hymba, an efficient small language model with a hybrid architecture. Tech report:

Discover the tradeoff between Mamba and attention, how they can be combined, how the attention-sink and forced-to-attend phenomena can be mitigated, and how the KV cache can be shared across layers. Learn how we built a model with an end-to-end ecosystem: data selection, architecture analysis and design, training Base and Instruct models, and opening them to the community.

Did I mention that our Hymba-1.5B Base model outperforms LLaMA 3.2-3B while being trained on 7× fewer tokens and achieving 12× higher throughput? More details and model links coming soon!
[Image]
0
1
9
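The tweet above mentions sharing the KV cache across layers. As a hedged illustration of the bookkeeping such sharing implies, the sketch below maps groups of attention layers to a single shared cache; the group size, layer count, and class names are assumptions for illustration, not Hymba's actual implementation.

# Minimal sketch of cross-layer KV cache sharing: several attention layers in a
# group read from and append to one shared cache instead of keeping their own.
# The grouping scheme and names are illustrative assumptions only.

from dataclasses import dataclass, field

@dataclass
class SharedKVCache:
    keys: list = field(default_factory=list)    # one entry per generated token
    values: list = field(default_factory=list)

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def make_layer_to_cache(num_attn_layers: int = 8, group_size: int = 4):
    """Map each attention layer to a cache shared by its group."""
    num_groups = (num_attn_layers + group_size - 1) // group_size
    caches = [SharedKVCache() for _ in range(num_groups)]
    return {layer: caches[layer // group_size] for layer in range(num_attn_layers)}

if __name__ == "__main__":
    mapping = make_layer_to_cache()
    # Layers 0-3 share one cache, layers 4-7 share another.
    print(mapping[0] is mapping[3], mapping[3] is mapping[4])  # True False
    mapping[0].append("k_t", "v_t")      # layer 0 writes to the shared cache
    print(len(mapping[3].keys))          # 1: layer 3 sees the same entry

The point of the design choice is memory: layers in a group store one set of keys and values instead of one per layer, shrinking inference-time cache size roughly in proportion to the group size.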
@wonmin_byeon
Wonmin Byeon
4 months
RT @rupspace: Interested in Discrete Diffusion? I've just released a Github repo where you can learn about and play with discrete diffusion…
0
20
0
@wonmin_byeon
Wonmin Byeon
7 months
I will give a talk at KAIST today (July 17th) at 5pm PDT. The talk is about Mamba-based models and the findings from our recent paper. Everyone is welcome to join! The Zoom link is below.
@aliceoh
Alice Oh
7 months
Excited to host a Zoom talk by Dr. Wonmin Byeon on her research with Nvidia colleagues on "An Alternative Architecture for Efficient Large Language Models (LLMs)". This will be on Zoom, July 17th 5 pm PDT (July 18th 9 am KST).

Abstract: Widely used Large Language Models (LLMs) are based on Transformer architectures. While Transformer-based language models are highly parallelizable and can model massive amounts of data, they introduce significant computational overhead due to the quadratic self-attention calculations, especially on longer sequences. They also have large inference-time memory requirements from the key-value cache. More recently, State Space Models (SSMs) like Mamba have been shown to have fast, parallelizable training and inference. Studies show that SSMs can match or exceed the language modeling capabilities of Transformers, making them an attractive alternative. In this talk, I present the strengths and weaknesses of Mamba, Mamba-2, and Transformer models at larger scales. I also introduce a hybrid architecture consisting of Mamba-2, attention, and MLP layers. While pure SSMs match or exceed Transformers on many tasks, they lag behind Transformers on tasks that require strong copying or in-context learning abilities. In contrast, the hybrid model closely matches or exceeds the Transformer on all standard and long-context tasks and is predicted to be up to 8x faster when generating tokens at inference time.

Bio: Wonmin Byeon () is a senior research scientist at NVIDIA Research in Santa Clara, US. She received her Ph.D. in Computer Science from the Technical University of Kaiserslautern, Germany. During her Ph.D., she was a visiting researcher at IDSIA, Switzerland, working with Juergen Schmidhuber. She then joined IDSIA and ETH Zurich as a post-doctoral researcher. Her research interests include Recurrent Neural Networks, State Space Models, and linear RNNs for temporal or spatio-temporal domains.
1
1
16
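To illustrate the memory argument in the abstract (a transformer's KV cache grows with sequence length, while an SSM layer keeps a fixed-size recurrent state), here is a back-of-the-envelope sketch. All model dimensions below are assumptions chosen only to show the scaling, not the configurations studied in the paper.

# Rough comparison of inference-time memory: KV cache grows linearly with
# sequence length, while an SSM layer's state size is constant.
# Every dimension here is an illustrative assumption.

def kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    # 2x for keys and values, stored for every token at every layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

def ssm_state_bytes(num_layers=32, d_model=4096, state_dim=128, bytes_per_elem=2):
    # A Mamba-style layer carries a recurrent state of roughly d_model * state_dim
    # elements, independent of how many tokens have been processed.
    return num_layers * d_model * state_dim * bytes_per_elem

if __name__ == "__main__":
    for seq_len in (4_096, 32_768, 131_072):
        print(f"seq {seq_len:>7}: KV cache ≈ {kv_cache_bytes(seq_len)/2**30:.2f} GiB, "
              f"SSM state ≈ {ssm_state_bytes()/2**30:.2f} GiB")

With these assumed dimensions, the KV cache reaches roughly 16 GiB at a 128K context, while the SSM state stays at a few tens of MiB regardless of sequence length.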
@wonmin_byeon
Wonmin Byeon
8 months
w/ @RWaleffe, @DuncanARiach, @BrandonNor90881, Vijay Korthikanti, @tri_dao, @_albertgu, @ahatamiz1, Sudhakar Singh, @deepakn94, Garvit Kulshreshtha, Vartika Singh, Jared Casper, @jankautz, @MohammadShoeybi, @ctnzr
0
0
2
@wonmin_byeon
Wonmin Byeon
8 months
5-shot MMLU.
[Image]
0
0
1
@wonmin_byeon
Wonmin Byeon
8 months
Our phonebook evaluation.
[Image]
0
0
1
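For readers unfamiliar with the task, a phonebook evaluation asks the model to retrieve a specific entry from a synthetic list of name/number pairs placed in its context, probing in-context copying and retrieval. The sketch below builds such a prompt; the exact format and sizes used in the paper may differ, so treat everything here as an assumption.

# Hedged illustration of a phonebook-style lookup prompt: a synthetic list of
# name/number pairs is placed in context and the model is asked for one entry.
# Names, number format, and sizes are assumptions for illustration only.

import random

def make_phonebook_prompt(num_entries: int = 20, seed: int = 0):
    rng = random.Random(seed)
    names = [f"Person_{i}" for i in range(num_entries)]
    numbers = [f"555-{rng.randint(0, 9999):04d}" for _ in range(num_entries)]
    book = "\n".join(f"{n}: {p}" for n, p in zip(names, numbers))
    target = rng.randrange(num_entries)
    prompt = f"{book}\n\nWhat is the phone number of {names[target]}?"
    return prompt, numbers[target]

if __name__ == "__main__":
    prompt, answer = make_phonebook_prompt()
    print(prompt.splitlines()[-1], "->", answer)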
@wonmin_byeon
Wonmin Byeon
8 months
The paper also includes an in-depth analysis of Mamba and Mamba-2 compared to Transformers and how to design a hybrid model.
0
0
1
@wonmin_byeon
Wonmin Byeon
1 year
ConvSSM: State Space Models for long videos 🎉 We finally released the code and the pretrained models. Code: Paper: @NVIDIAAI @jimmysmith1919
@wonmin_byeon
Wonmin Byeon
1 year
📢 Excited to share our work at #NeurIPS2023: ConvSSM, a powerful sequence model for long videos. poster: Tuesday at 5:15pm, Great Hall & Hall B1+B2 #705 (coming soon) Work done with @jimmysmith1919 @shalinidemello @jankautz 🧵👇
[Image]
1
2
15