![Lin Zheng Profile](https://pbs.twimg.com/profile_images/1787279728002621440/09ojaByD_x96.jpg)
Lin Zheng
@linzhengisme
Followers: 262 · Following: 335 · Statuses: 74
RT @etash_guha: I’m excited to announce Open Thoughts. The DataComp team is working on the best reasoning datasets for math, code, and more…
0
7
0
RT @yihengxu_: Happy to teach Qwen2.5-VL agent capabilities to use computers and mobile devices :D Now Aguvis has become Master Shifu. Waiti…
0
11
0
RT @Alibaba_Qwen: 🎉 Wishing you wealth and prosperity! 🧧🐍 As we welcome the Chinese New Year, we're thrilled to announce the launch of Qwen2.5-VL, our latest flagship vi…
0
558
0
RT @taoyds: More big news from the open-source AI community! Thanks, Qwen and DeepSeek! Happy Lunar New Year 🧧! The year of the 🐍.
0
4
0
@YiTayML ...our eval was zero-/few-shot and primarily focused on English, while ByT5 was trained on mC4 with span corruption and typically needs task-specific fine-tuning. (2/2)
0
0
3
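For readers unfamiliar with the distinction drawn above, here is a minimal sketch of what zero-/few-shot prompting means in practice, as opposed to task-specific fine-tuning; the task, demonstrations, and function name are hypothetical illustrations, not part of the actual EvaByte evaluation harness.

```python
# Hypothetical illustration of zero-/few-shot prompt construction:
# k = 0 yields a zero-shot prompt, k > 0 prepends in-context demonstrations.
# No task-specific fine-tuning is involved; the model only sees this text.

def build_prompt(demos, query, k=0):
    """Concatenate k demonstrations followed by the unanswered test query."""
    parts = [f"Q: {q}\nA: {a}" for q, a in demos[:k]]
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

demos = [
    ("What is the capital of France?", "Paris"),  # made-up demonstrations
    ("What is 3 + 4?", "7"),
]

print(build_prompt(demos, "What is the capital of Japan?", k=2))
```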
@iamgrigorev Thanks a bunch! Yes, we'd love to scale up, but unfortunately we don't have the compute for it rn :( But this model only consumed ~0.5T tokens and scaled pretty well so far, so there's def room for improvement!
0
0
2
@janekm Haha yes, but we do standardize JPEG processing for easier training. Here's our HF-format image processor: There has been some work on JPEG image modeling, and our image pipeline is adapted from @XiaochuangHan's awesome JPEG-LM (a minimal byte-level sketch follows after this tweet):
👽Have you ever accidentally opened a .jpeg file with a text editor (or a hex editor)? Your language model can learn from these seemingly gibberish bytes and generate images with them! Introducing *JPEG-LM* - an image generator that uses exactly the same architecture as LLMs (e.g., Llama). It simply learns the language of canonical codecs rather than natural languages. 📖
1
0
2
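As a rough illustration of the JPEG-LM idea quoted above (raw codec bytes as the model's token stream), here is a minimal sketch assuming a plain 256-entry byte vocabulary plus one hypothetical special token; the file path, vocabulary layout, and helper names are assumptions for illustration, not EvaByte's or JPEG-LM's actual pipeline.

```python
# Minimal sketch: treat the raw bytes of a canonical JPEG file as the token
# sequence for a byte-level causal LM. The vocabulary layout, BOS id, and
# file path below are illustrative assumptions only.
from pathlib import Path

BYTE_VOCAB = 256   # one token id per possible byte value (0-255)
BOS_ID = BYTE_VOCAB  # hypothetical special token appended after the byte ids

def jpeg_to_token_ids(path):
    """Read a .jpg file and map each raw byte directly to a token id."""
    raw = Path(path).read_bytes()   # begins with the JPEG SOI marker FF D8
    return [BOS_ID] + list(raw)     # byte values already lie in [0, 255]

def token_ids_to_jpeg(ids, out_path):
    """Drop special tokens and write the remaining bytes back as a .jpg file."""
    Path(out_path).write_bytes(bytes(i for i in ids if i < BYTE_VOCAB))

ids = jpeg_to_token_ids("example.jpg")  # placeholder image path
print(ids[:8])                          # e.g. [256, 255, 216, 255, ...]
```

Decoding runs in reverse: any byte sequence the model generates can be written straight back to disk and opened as an image, which is the sense in which the LM "speaks" the codec rather than a natural language.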
@minosvasilias Thanks for the suggestion! We're compiling more eval results, including char-level understanding (like CUTE) & robustness tasks into our upcoming tech report. Excited to share more soon!
0
0
2
@A_Badr057 Thanks so much for the kind words! Our training was done on SambaNova's specialized software and hardware stack, so adapting the code is a bit tricky. That said, we're actively working on making EvaByte training and inference more accessible—stay tuned! xD
0
0
1
RT @ma_chang_nlp: 🥳 Introducing our #ICLR2025 paper: "Non-myopic Generation of Language Models for Reasoning and Planning" TLDR: We introd…
0
17
0
RT @FaZhou_998: 1⃣Obviously, in my AC's view, running 30+ small to medium-scale pre-training experiments shows no real-world value. 2⃣And o…
0
2
0
RT @ma_chang_nlp: Equally excited to share our ICLR rejected work: "Benchmarking and Enhancing Large Language Models for Biological Pathway…
0
4
0
RT @sansa19739319: Our paper has been accepted to @iclr_conf #ICLR2025! We hope this first 7B diffusion language model inspires the commun…
0
10
0
RT @UrmishThakker: Very excited to announce the availability of the best open-source tokenizer-free language model - EvaByte. A 6.5B byte-l…
0
3
0
RT @changran_hu: 🚀Excited to announce the best open-source tokenizer-free language model! EvaByte, our 6.5B byte-level LM developed by @HK…
0
4
0