Srini Iyer

@sriniiyer88

1K Followers · 638 Following · 18 Media · 139 Statuses

Research Scientist at Facebook AI Research

Seattle, WA
Joined February 2012
@sriniiyer88
Srini Iyer
9 months
New paper! Byte-level models are finally competitive with tokenizer-based models, with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: https://t.co/GJSiFtugju (1/n)
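The "dynamic patching" referenced above is entropy-based: a small byte-level language model scores each position, and a new patch starts wherever the predicted next-byte entropy crosses a threshold, so compute is concentrated where the data is hard to predict. Below is a minimal sketch of that idea; the sliding-window entropy estimate and the threshold value are illustrative stand-ins, not the paper's trained entropy model.

```python
import math
from typing import List

def next_byte_entropies(data: bytes, context: int = 8) -> List[float]:
    """Toy stand-in for BLT's small byte LM: estimate per-position entropy
    from unigram frequencies in a sliding context window.
    (Illustrative only; the paper uses a trained byte-level transformer.)"""
    entropies = []
    for i in range(len(data)):
        window = data[max(0, i - context):i] or data[:1]
        counts = {}
        for b in window:
            counts[b] = counts.get(b, 0) + 1
        total = len(window)
        # Entropy of the empirical byte distribution over the window.
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies

def entropy_patches(data: bytes, threshold: float = 1.5) -> List[bytes]:
    """Group bytes into patches, opening a new patch whenever the estimated
    next-byte entropy exceeds the (assumed) threshold."""
    patches, current = [], bytearray()
    for byte, h in zip(data, next_byte_entropies(data)):
        if current and h > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(byte)
    if current:
        patches.append(bytes(current))
    return patches

if __name__ == "__main__":
    for p in entropy_patches(b"the quick brown fox jumps over the lazy dog"):
        print(p)
```

In the actual BLT setup the per-byte entropies come from a small autoregressive byte transformer, and patches average multiple bytes, which shrinks the number of steps the large latent model has to take.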
@sriniiyer88
Srini Iyer
3 months
Turns out, if you teach llamas how to self-reflect and backtrack from wrong reasoning paths, they do extra well on math reasoning!
- MATH 500: 65.8% ➡️ 81.8%
- AMC 23: 37.5% ➡️ 64.4%
- AIME 24: 10% ➡️ 30%
Amazing work by @danieljwkim, can be a nice long weekend read!
@danieljwkim
Joongwon Kim
3 months
Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches LLMs to perform in-context search and generate long CoT to solve math problems, via SFT and RL. Work done at @aiatmeta. 📄 Paper: https://t.co/PdzwNVqkJ2
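As a loose illustration of the self-reflection-and-backtracking idea in the tweets above (not ASTRO's actual pipeline), one way to build SFT data for this behavior is to search over candidate reasoning steps and linearize the whole trace, dead ends, explicit backtrack statements and all, into one long chain of thought. The toy step tree and step names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical search tree over reasoning steps: node -> (children, is_correct_leaf).
TREE: Dict[str, Tuple[List[str], bool]] = {
    "start":            (["try_factoring", "try_substitution"], False),
    "try_factoring":    (["factor_dead_end"], False),
    "factor_dead_end":  ([], False),             # wrong path
    "try_substitution": (["solve_for_x"], False),
    "solve_for_x":      ([], True),              # correct final step
}

def linearize_search(node: str, trace: List[str]) -> bool:
    """Depth-first search that records every step it takes; on a dead end it
    appends an explicit self-reflection/backtrack marker, producing a single
    long chain of thought usable as SFT text."""
    trace.append(f"Step: {node}")
    children, is_correct = TREE[node]
    if is_correct:
        trace.append("This works; writing the final answer.")
        return True
    for child in children:
        if linearize_search(child, trace):
            return True
        trace.append(f"Wait, {child} leads nowhere; backtracking to {node}.")
    return False

if __name__ == "__main__":
    trace: List[str] = []
    linearize_search("start", trace)
    print("\n".join(trace))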
@sriniiyer88
Srini Iyer
4 months
This is exciting! Check out our new step-by-step playbook that shows how to do MoT on top of your existing transformer implementation! Also, MoT is now in TMLR! Huge congrats to @liang_weixin, @VictoriaLinML and others!
@liang_weixin
Weixin Liang
4 months
🎉 Excited to share: "𝐌𝐢𝐱𝐭𝐮𝐫𝐞-𝐨𝐟-𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬 (𝐌𝐨𝐓)" has been officially accepted to TMLR (March 2025) and the code is now open-sourced! 📌 GitHub repo: https://t.co/KiDbxpDWt0 📄 Paper: https://t.co/KQoZ3cunEf How can we reduce pretraining costs for …
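For readers who want the gist before opening the repo: a Mixture-of-Transformers block keeps global self-attention over the full multimodal sequence but gives each modality its own copy of the non-embedding weights (attention projections, feed-forward layers, norms), selected per token by a modality index. The stripped-down PyTorch sketch below conveys that routing; single-head attention, no causal mask, and no batch dimension are simplifications, and it is not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoTBlock(nn.Module):
    """Mixture-of-Transformers-style block: per-modality Q/K/V/output
    projections, FFNs and layer norms, with one global attention shared
    across all tokens regardless of modality."""

    def __init__(self, d_model: int, n_modalities: int):
        super().__init__()
        def per_mod(make):
            return nn.ModuleList([make() for _ in range(n_modalities)])
        self.qkv = per_mod(lambda: nn.Linear(d_model, 3 * d_model))
        self.proj = per_mod(lambda: nn.Linear(d_model, d_model))
        self.ffn = per_mod(lambda: nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)))
        self.norm1 = per_mod(lambda: nn.LayerNorm(d_model))
        self.norm2 = per_mod(lambda: nn.LayerNorm(d_model))

    @staticmethod
    def _route(x, modality, modules, out_dim):
        """Apply each token's modality-specific module (x: (seq, d))."""
        out = x.new_zeros(x.size(0), out_dim)
        for m, module in enumerate(modules):
            mask = modality == m
            if mask.any():
                out[mask] = module(x[mask])
        return out

    def forward(self, x, modality):
        d = x.size(-1)
        h = self._route(x, modality, self.norm1, d)
        q, k, v = self._route(h, modality, self.qkv, 3 * d).chunk(3, dim=-1)
        # Global self-attention over the whole sequence (the shared part).
        attn = F.softmax(q @ k.t() / d ** 0.5, dim=-1) @ v
        x = x + self._route(attn, modality, self.proj, d)
        h = self._route(x, modality, self.norm2, d)
        return x + self._route(h, modality, self.ffn, d)

if __name__ == "__main__":
    block = MoTBlock(d_model=64, n_modalities=2)             # e.g. 0 = text, 1 = image
    tokens = torch.randn(10, 64)                             # (seq_len, d_model), batch dim omitted
    modality = torch.tensor([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
    print(block(tokens, modality).shape)                     # torch.Size([10, 64])
```

Routing by a fixed modality id, rather than a learned router as in mixture-of-experts, is what keeps the overhead small: every token still does exactly one attention and one FFN pass, just through modality-specific weights.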
@jffwng
Jeff Wang 👨‍🚀
5 months
We just released model weights for our 1B & 8B-parameter BLT (Byte Latent Transformer), a tokenizer-free model with significant improvements in inference efficiency and robustness. Model on @huggingface: https://t.co/vMyZOpZy3M Code: https://t.co/iKoyxKG40l Paper: https://t.co/FLBRnHLl5d
@EntilZhaPR
Dr. Pedro Rodriguez @[email protected]
5 months
By popular demand (see our GH issues 😅), we're releasing 1B and 8B weights for our BLT models! We're also hard at work adding BLT to HF transformers! Model weights: https://t.co/gfqg5ADYkg Code + instructions for loading weights:
Link card: github.com/facebookresearch/blt (code for the BLT research paper)
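If you just want the checkpoint files locally, the standard huggingface_hub call is enough. The repo id below is a placeholder assumption, so substitute whatever the facebookresearch/blt README actually points to.

```python
# Minimal sketch: download the released BLT checkpoint files with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="facebook/blt-1b",  # hypothetical repo id; check the facebookresearch/blt README
)
print("Checkpoint downloaded to:", local_dir)
```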
@AIatMeta
AI at Meta
5 months
🚀 Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest advancements are transforming our understanding of perception. 1️⃣ Meta Perception Encoder: A large-scale vision encoder that excels across several image & …
@sriniiyer88
Srini Iyer
5 months
Huge thanks to @EntilZhaPR, @gargighosh, @ArtidoroPagnoni, @LukeZettlemoyer for this release!
@sriniiyer88
Srini Iyer
5 months
BLT model weights are out! Responding to popular demand, we just open-sourced model weights for our 1B and 8B BLT models for the research community to play with! https://t.co/XQsYrM9GqK Hoping to see many new and improved BLT based architectures this year!
Link card: huggingface.co
@sriniiyer88
Srini Iyer
8 months
We're hiring PhD interns for Summer 2025 in Seattle to work with us on improving BLT even more! If this is something that excites you, reach out to me via DM or email ASAP!
@AIatMeta
AI at Meta
9 months
New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ https://t.co/0iamZCRnMN
@sriniiyer88
Srini Iyer
9 months
BLT-related post by Meta AI - eliminate all tokenization once and for all!
@AIatMeta
AI at Meta
9 months
New from Meta FAIR — Byte Latent Transformer: Patches Scale Better Than Tokens introduces BLT, which, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency & robustness. Paper ➡️ https://t.co/0iamZCRnMN
@dimitrizho
Dimitri Zhorzholiani
9 months
Meta's Byte Latent Transformer (BLT) paper looks like the real deal, outperforming tokenization-based models even up to their tested 8B-parameter model size. 2025 may be the year we say goodbye to tokenization.
@edkesuma
Edrick🕗
9 months
Gm. Woke up to a new paper on Byte Latent Transformers (BLT). Now you can increase model size without increasing inference compute by tweaking *patch sizes*. Great day for LLMs. Full article: https://t.co/ZLZIf0i7vb
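The "bigger model at the same inference compute" framing follows from how BLT amortizes its large latent transformer over patches: the latent model takes one step per patch, so its FLOPs per byte scale with parameter count divided by average patch size. A back-of-the-envelope sketch, using the standard ~2N FLOPs-per-forward-token rule of thumb and made-up parameter counts (not figures from the paper):

```python
def latent_flops_per_byte(n_params: float, avg_patch_size: float) -> float:
    """~2*N FLOPs per forward step of an N-parameter transformer, amortized
    over the bytes in a patch, since the latent model steps once per patch."""
    return 2 * n_params / avg_patch_size

base   = latent_flops_per_byte(n_params=4e9, avg_patch_size=4)  # hypothetical baseline
bigger = latent_flops_per_byte(n_params=8e9, avg_patch_size=8)  # 2x params, 2x patch size
print(base, bigger, base == bigger)  # same FLOPs per byte despite doubling the latent model
```

The small local byte-level encoder and decoder still run on every byte, so the trade is not perfectly free, but they are tiny relative to the latent model.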
@PowerSystemAuto
Power System Automation
9 months
Meta AI's Byte Latent Transformer (BLT) is revolutionizing the tokenization process, enhancing scalability and efficiency. This model could redefine how we approach natural language processing, paving the way for more streamlined AI applications. Exciting times ahead for tech
@Smol_AI
AI News by Smol AI
9 months
[13 Dec 2024] Meta BLT: Tokenizer-free, Byte-level LLM https://t.co/JyB3XgAkU3 A few months ago @karpathy noted that tokenizers are the root of all evil in LLM flaws. Could @AIatMeta have finally cracked the algorithm to process byte-level data directly (enabling all kinds of …
@scaling01
Lisan al Gaib
9 months
META JUST KILLED TOKENIZATION!!! A few hours ago they released "Byte Latent Transformer", a tokenizer-free architecture that dynamically encodes bytes into patches and achieves better inference efficiency and robustness! (I was just talking about how we need dynamic …
@ZainHasan6
Zain
9 months
Pretty cool work on a tokenization-free transformer from Meta!
> Byte Latent Transformer (BLT), a byte-level LLM architecture, matches tokenization-based LLM performance
> BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation.
> …
@AkshatS07
Akshat Shrivastava
9 months
Been waiting for this one, a strong step in removing tokenization from LLMs. Congrats to the team!
@sriniiyer88
Srini Iyer
9 months
New paper! Byte-level models are finally competitive with tokenizer-based models, with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: https://t.co/GJSiFtugju (1/n)
@jmbollenbacher
JMB
9 months
This could be one of the biggest AI papers of the year, if it really works as well as reported. It's hard to overstate how impactful ending the tyranny of tokenizers would be for AI. I'm very eager to see the open-source implementations and replications.
@ArtidoroPagnoni
Artidoro Pagnoni
9 months
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
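The architecture being quoted has three pieces: a lightweight local encoder that turns raw bytes into patch representations, a large latent transformer that runs over the much shorter patch sequence, and a lightweight local decoder that maps patch states back to byte-level predictions. The sketch below only mirrors that data flow; mean pooling stands in for the paper's cross-attention-based local modules, and all sizes are placeholders.

```python
import torch
import torch.nn as nn
from typing import List

class TinyBLT(nn.Module):
    """BLT data flow, drastically simplified:
    bytes -> patch representations -> latent transformer -> per-byte logits."""

    def __init__(self, d_local: int = 64, d_latent: int = 256):
        super().__init__()
        self.byte_emb = nn.Embedding(256, d_local)                        # local byte embeddings
        self.to_latent = nn.Linear(d_local, d_latent)
        layer = nn.TransformerEncoderLayer(d_latent, nhead=4, batch_first=True)
        self.latent = nn.TransformerEncoder(layer, num_layers=2)          # the big model: one step per patch
        self.to_bytes = nn.Linear(d_latent, d_local)
        self.lm_head = nn.Linear(d_local, 256)                            # next-byte logits

    def forward(self, patches: List[bytes]) -> List[torch.Tensor]:
        # Local encoding: mean-pool byte embeddings within each patch
        # (the real model uses small transformers with cross-attention here).
        pooled = torch.stack([
            self.byte_emb(torch.tensor(list(p))).mean(dim=0) for p in patches
        ])
        latent = self.latent(self.to_latent(pooled).unsqueeze(0)).squeeze(0)  # (n_patches, d_latent)
        # Local decoding: broadcast each patch state back over its bytes.
        return [
            self.lm_head(self.to_bytes(latent[i]).expand(len(p), -1))
            for i, p in enumerate(patches)
        ]

if __name__ == "__main__":
    model = TinyBLT()
    logits = model([b"hello ", b"world"])
    print([t.shape for t in logits])   # [torch.Size([6, 256]), torch.Size([5, 256])]
```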
@_xjdr
xjdr
9 months
Llamas ... Tokenizer Free?! USING ENTROPY STEERING?!?!! sometimes the universe conspires to make a paper just for you and it feels wonderful when it happens.
@ArtidoroPagnoni
Artidoro Pagnoni
9 months
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe
@scaling01
Lisan al Gaib
9 months
I can rest now🥲 I have gathered all the infinity stones. thanks @karpathy
@liliyu_lili
Lili Yu (ICLR2025)
9 months
We scaled up Megabyte and ended up with a BLT! A pure byte-level model, it has a steeper scaling law than BPE-based models. With up to 8B parameters, BLT matches Llama 3 on general NLP tasks—plus it excels on long-tail data and can manipulate substrings more effectively. The …
@ArtidoroPagnoni
Artidoro Pagnoni
9 months
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 https://t.co/5QGrlJdK0y Code 🛠️ https://t.co/jCdDI5BXwe