![Wonmin Byeon Profile](https://pbs.twimg.com/profile_images/1242623128133459969/tzlJnA57_x96.jpg)
Wonmin Byeon (@wonmin_byeon) · 971 followers · 121 following · 58 statuses
Here is our new 8B Mamba-based hybrid LLM: higher MMLU than the 8B transformer baseline, plus long-context extension up to 128K sequence length.
An 8B hybrid SSM model trained on 3.5T tokens gets better accuracy than an 8B transformer trained on the same 3.5T-token dataset:
* 7% attention layers; the rest is Mamba-2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much lower
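For intuition, here is a minimal sketch of what a mostly-Mamba-2 stack with a ~7% attention budget could look like. `Mamba2Block` below is a stand-in for a real Mamba-2 mixer (e.g. from the `mamba_ssm` package); the layer count, width, and interleaving rule are illustrative assumptions, not the released model's config.

```python
import torch
import torch.nn as nn

class Mamba2Block(nn.Module):
    """Placeholder for a real Mamba-2 mixer layer (hypothetical stand-in)."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj = nn.Linear(d_model, d_model)  # stand-in computation
    def forward(self, x):
        return x + self.proj(self.norm(x))       # pre-norm residual block

class AttentionBlock(nn.Module):
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, x):
        h = self.norm(x)
        # Causal masking omitted for brevity.
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

def build_hybrid_stack(n_layers=54, d_model=512, attn_ratio=0.07):
    """Spread a ~7% share of attention layers evenly through a Mamba-2 stack."""
    n_attn = max(1, round(n_layers * attn_ratio))  # 54 layers -> 4 attention
    stride = n_layers // n_attn
    layers, placed = [], 0
    for i in range(n_layers):
        if i % stride == stride // 2 and placed < n_attn:
            layers.append(AttentionBlock(d_model))
            placed += 1
        else:
            layers.append(Mamba2Block(d_model))
    return nn.Sequential(*layers)

model = build_hybrid_stack()
x = torch.randn(2, 16, 512)    # (batch, seq, d_model); the real model is far wider
print(model(x).shape)          # torch.Size([2, 16, 512])
```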
RT @rupspace: Wrote a post about Highway networks, ResNets and subtleties of architecture comparisons
RT @PavloMolchanov: 🚀 Introducing Hymba-1.5B: a new hybrid architecture for efficient small language models! ✅ Outperforms Llama, Qwen, an…
Our new hybrid model is out! Hymba-1.5B even outperforms LLaMA 3.2-3B. Check out the paper for more details.
Sharing our team’s latest work on Hymba, an efficient small language model with a hybrid architecture. Tech report:

Discover the trade-off between Mamba and attention, how the two can be combined, how the attention-sink and forced-to-attend phenomena can be mitigated, and how the KV cache can be shared across layers. Learn how we built the model with an end-to-end ecosystem: data selection, architecture analysis and design, and training of Base and Instruct models, all opened to the community.

Did I mention that our Hymba-1.5B Base model outperforms LLaMA 3.2-3B while being trained on 7× fewer tokens and achieving 12× higher throughput? More details and model links coming soon!
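One concrete piece of the report is the cross-layer KV-cache sharing mentioned above. As a rough sketch of that idea, the toy code below lets two attention layers reuse a single K/V projection, so the cache is stored once per group of layers instead of once per layer. All names, dimensions, and the grouping scheme here are hypothetical; see the tech report for Hymba's actual mechanism.

```python
import torch
import torch.nn as nn

class SharedKVAttention(nn.Module):
    """Attention layer that can reuse another layer's K/V projection and cache."""
    def __init__(self, d_model, n_heads, kv_proj=None):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q = nn.Linear(d_model, d_model)
        # Layers in the same group are handed the same kv_proj module,
        # so their K/V tensors (and cache entries) can be shared.
        self.kv = kv_proj if kv_proj is not None else nn.Linear(d_model, 2 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, cache=None):
        B, T, D = x.shape
        q = self.q(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        if cache is None:                      # compute and cache K/V once per group
            k, v = self.kv(x).chunk(2, dim=-1)
            k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            cache = (k, v)
        k, v = cache
        # Plain softmax attention; causal masking omitted for brevity.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y), cache

# Two layers share one KV projection -> one cache entry serves both.
shared_kv = nn.Linear(512, 1024)
layer1 = SharedKVAttention(512, 8, kv_proj=shared_kv)
layer2 = SharedKVAttention(512, 8, kv_proj=shared_kv)
x = torch.randn(1, 32, 512)
y1, kv_cache = layer1(x)            # fills the shared cache
y2, _ = layer2(y1, cache=kv_cache)  # reuses it; no new K/V is stored
```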
RT @rupspace: Interested in Discrete Diffusion? I've just released a Github repo where you can learn about and play with discrete diffusion…
I will give a talk at KAIST today (July 17th) at 5pm PDT. The talk is about Mamba-based models and the findings from our recent paper. Everyone is welcome to join! The Zoom link is below.
Excited to host a Zoom talk by Dr. Wonmin Byeon on her research with NVIDIA colleagues, "An Alternative Architecture for Efficient Large Language Models (LLMs)". This will be on Zoom, July 17th 5pm PDT (July 18th 9am KST).

Abstract: Widely used Large Language Models (LLMs) are based on Transformer architectures. While Transformer-based language models are highly parallelizable and can model massive amounts of data, they introduce significant computational overhead due to the quadratic self-attention calculation, especially on longer sequences. They also have large inference-time memory requirements from the key-value cache. More recently, State Space Models (SSMs) like Mamba have been shown to offer fast, parallelizable training and inference. Studies show that SSMs can match or exceed the language modeling capabilities of Transformers, making them an attractive alternative. In this talk, I present the strengths and weaknesses of Mamba, Mamba-2, and Transformer models at larger scales. I also introduce a hybrid architecture consisting of Mamba-2, attention, and MLP layers. While pure SSMs match or exceed Transformers on many tasks, they lag behind on tasks that require strong copying or in-context learning abilities. In contrast, the hybrid model closely matches or exceeds the Transformer on all standard and long-context tasks and is predicted to be up to 8× faster when generating tokens at inference time.

Bio: Wonmin Byeon is a senior research scientist at NVIDIA Research in Santa Clara, US. She received her Ph.D. in Computer Science from the Technical University of Kaiserslautern, Germany. During her Ph.D., she was a visiting researcher at IDSIA, Switzerland, working with Juergen Schmidhuber. She then joined IDSIA and ETH Zurich as a post-doctoral researcher. Her research interests include recurrent neural networks, state space models, and linear RNNs for temporal and spatio-temporal domains.
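To make the KV-cache point in the abstract concrete, here is a back-of-the-envelope calculation for a hypothetical 8B-class transformer (32 layers, 32 KV heads, head dim 128, fp16; real configs vary). The cache grows linearly with context length, whereas a Mamba-2 layer keeps a fixed-size recurrent state no matter how long the context is.

```python
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len, batch=1, bytes_per=2):
    # 2x for keys and values, stored at every layer for every token.
    return 2 * n_layers * n_kv_heads * d_head * seq_len * batch * bytes_per

for seq in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(n_layers=32, n_kv_heads=32, d_head=128, seq_len=seq) / 2**30
    print(f"{seq:>7} tokens -> {gib:5.1f} GiB of KV cache")
# 4096 tokens -> 2.0 GiB; 131072 tokens -> 64.0 GiB, for a single sequence.
```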
w/ @RWaleffe, @DuncanARiach, @BrandonNor90881, Vijay Korthikanti, @tri_dao, @_albertgu, @ahatamiz1, Sudhakar Singh, @deepakn94, Garvit Kulshreshtha, Vartika Singh, Jared Casper, @jankautz, @MohammadShoeybi, @ctnzr
ConvSSM: State Space Models for long videos 🎉 We finally released the code and the pretrained models. Code: Paper: @NVIDIAAI @jimmysmith1919
📢 Excited to share our work at #NeurIPS2023: ConvSSM, a powerful sequence model for long videos. Poster: Tuesday at 5:15pm, Great Hall & Hall B1+B2 #705 (coming soon). Work done with @jimmysmith1919 @shalinidemello @jankautz 🧵👇
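For readers new to ConvSSM, here is a toy, sequential version of a convolutional state-space recurrence: the usual SSM state-update matrices become convolutions over a spatial state tensor. This sketch only illustrates the structure under my own simplifying assumptions; the paper's formulation includes a parallel scan that avoids this slow frame-by-frame loop during training.

```python
import torch
import torch.nn as nn

class ToyConvSSMCell(nn.Module):
    """Linear recurrence over frames, state_t = A*state_{t-1} + B*x_t,
    with A, B, C realized as convolutions instead of dense matrices."""
    def __init__(self, channels, k=3):
        super().__init__()
        pad = k // 2
        self.A = nn.Conv2d(channels, channels, k, padding=pad, bias=False)
        self.B = nn.Conv2d(channels, channels, k, padding=pad, bias=False)
        self.C = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, frames):               # frames: (T, B, C, H, W)
        state = torch.zeros_like(frames[0])
        outputs = []
        for x_t in frames:                    # sequential scan, for exposition only
            state = self.A(state) + self.B(x_t)
            outputs.append(self.C(state))
        return torch.stack(outputs)

cell = ToyConvSSMCell(channels=16)
video = torch.randn(8, 2, 16, 32, 32)         # 8 frames, batch of 2, 32x32 grid
print(cell(video).shape)                      # torch.Size([8, 2, 16, 32, 32])
```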