Leandro von Werra

@lvwerra

Followers
8K
Following
5K
Media
298
Statuses
2K

Machine learning @huggingface: co-lead of @bigcodeproject and maintainer of TRL.

Bern, Switzerland
Joined March 2019
@lvwerra
Leandro von Werra
11 months
Super proud of what the @BigCodeProject community achieved. Building best-in-class code LLMs in an open and collaborative way is no easy feat and is the result of the hard work of many community members!
@BigCodeProject
BigCode
11 months
Introducing: StarCoder2 and The Stack v2 ⭐️. StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open!
Tweet media one
3
13
78
@lvwerra
Leandro von Werra
1 month
Jupyter Agents - LLMs running data analysis directly in a notebook! The agent can load data, execute code, plot results, and follow your guidance and ideas! A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!
44
438
2K
@lvwerra
Leandro von Werra
3 years
Evaluation is one of the most important aspects of ML, but today’s evaluation landscape is scattered and undocumented, which makes evaluation unnecessarily hard. For that reason we are excited to release 🤗 Evaluate! Let’s take a tour:
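For reference, a minimal sketch of what loading and computing a metric with 🤗 Evaluate looks like (the metric and the toy values are just an example):

```python
# Load a metric from the Hub by name and compute it on predictions vs. references.
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(
    predictions=[0, 1, 1, 0],
    references=[0, 1, 0, 0],
)
print(result)  # {'accuracy': 0.75}
```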
Tweet media one
12
330
2K
@lvwerra
Leandro von Werra
2 years
Did you know that you can train all Llama-2 models on your own data in just a few lines? The script even works with the 70B model on a single A100 GPU thanks to the magic of 4-bit quantization and PEFT! Learn more: Full script:
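For context, the recipe referenced above is roughly the following QLoRA-style setup. This is a sketch only: the dataset and hyperparameters are placeholders, the Llama-2 checkpoint is gated, and argument names have moved around between trl/peft versions.

```python
# Sketch: load the base model in 4-bit, attach LoRA adapters, and fine-tune with SFTTrainer.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"                      # gated model: request access first
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

bnb_config = BitsAndBytesConfig(                            # 4-bit quantization of the frozen base model
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # only small adapters are trained

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=peft_config,
)
trainer.train()
```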
Tweet media one
18
274
1K
@lvwerra
Leandro von Werra
26 days
Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases. Enter picotron: all 4D parallelism concepts implemented in separate, readable files, totaling just 1988 LoC!
Tweet media one
7
183
1K
@lvwerra
Leandro von Werra
3 years
Our book "Natural Language Processing with Transformers: Building Language Applications with Hugging Face" can now be preordered!. This thread gives an overview of what you can expect by summarizing the content of each chapter:
Tweet media one
19
173
1K
@lvwerra
Leandro von Werra
2 years
Excited to introduce: StackLlama🦙. An end-to-end tutorial for training Llama with RLHF on preference data such as StackExchange questions! Blog: Demo: Code: The resulting model is surprisingly fun! 🧵
Tweet media one
23
222
881
@lvwerra
Leandro von Werra
3 years
It finally arrived! 🎉 So I guess it is a real thing now. Thanks to everybody who ordered it. Because of all of you it is the #1 release on Amazon in NLP, #3 in ML&AI, and #4 in all of computer science! ❤️
Tweet media one
20
86
804
@lvwerra
Leandro von Werra
3 years
Can we create all the code for training GitHub Copilot in a (looong) tweet thread? Yes, see how to train CodeParrot🦜, a large GPT-2 model for code, from scratch in this thread! Ready - go!
Tweet media one
8
111
539
@lvwerra
Leandro von Werra
3 years
How do models like GPT-2 and BERT represent the position of tokens? When visualizing their positional encodings I found an interesting pattern. A short thread:
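A rough sketch of how such a visualization can be reproduced: pull the learned position-embedding matrices out of the pretrained models and plot them as heatmaps (the plotting details are my own, not taken from the thread):

```python
import matplotlib.pyplot as plt
from transformers import AutoModel

gpt2 = AutoModel.from_pretrained("gpt2")
bert = AutoModel.from_pretrained("bert-base-uncased")

# Learned absolute position embeddings, shape (max_positions, hidden_size).
gpt2_pos = gpt2.wpe.weight.detach().numpy()
bert_pos = bert.embeddings.position_embeddings.weight.detach().numpy()

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, (name, pos) in zip(axes, [("gpt2", gpt2_pos), ("bert-base-uncased", bert_pos)]):
    ax.imshow(pos, cmap="RdBu", aspect="auto")
    ax.set_title(name)
    ax.set_xlabel("hidden dimension")
    ax.set_ylabel("position")
plt.show()
```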
Tweet media one
Tweet media two
8
81
517
@lvwerra
Leandro von Werra
4 months
solving problems using BERT that can be solved by a RegEx is another level of skill issue.
@mervenoyann
merve
4 months
solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue.
14
33
508
@lvwerra
Leandro von Werra
3 years
Thanks to @_lewtun all chapters from our book "NLP with Transformers" are now just one click away! And you can run them on your favourite platform:
- @Google Colab
- @kaggle Notebooks
- @HelloPaperspace Gradient
- @awscloud StudioLab
#transformersbook
Tweet media one
3
111
480
@lvwerra
Leandro von Werra
2 months
All the things you need to know to pretrain an LLM at home*! Gave a workshop at @unibern: it starts with scaling laws, goes through web-scale data processing, and finishes with training with 4D parallelism and ZeRO. *assuming your home includes an H100 cluster
Tweet media one
Tweet media two
Tweet media three
Tweet media four
12
71
468
@lvwerra
Leandro von Werra
3 years
Training large (transformer) models can be challenging and sooner or later a big red "CUDA out of memory" slaps you in the face. But there are a few things one can do to scale model training. We updated the docs and in this thread I cover the main tricks:
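The usual knobs look roughly like this with the transformers Trainer (a sketch; which combination actually helps depends on the model and hardware):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,     # tiny micro-batch ...
    gradient_accumulation_steps=16,    # ... but a large effective batch size
    gradient_checkpointing=True,       # recompute activations to save memory
    fp16=True,                         # mixed precision
    optim="adafactor",                 # optimizer with a smaller memory footprint
)
```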
Tweet media one
7
75
428
@lvwerra
Leandro von Werra
1 month
The research team at Hugging Face shipped in 2024! 🚢 How? Small teams, big ambitions, and a sprinkle of compute. With the goal of doing great open science we released models, datasets, tools, educational blogs and more! A recap:
Tweet media one
5
91
440
@lvwerra
Leandro von Werra
5 years
I built my first Python package: jupyterplot! Plot real-time results in Jupyter notebooks. It is built on @andreas_madsen's excellent python-lrcurve library. Effortless development and publishing with #nbdev by @jeremyphoward and @GuggerSylvain.
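Basic usage is a couple of lines inside a notebook (a sketch from memory of the README; in practice you would stream a training loss instead of the sine wave):

```python
import numpy as np
from jupyterplot import ProgressPlot

pp = ProgressPlot()              # a live-updating line plot in the notebook output
for i in range(1000):
    pp.update(np.sin(i / 100))   # push one new value per step
pp.finalize()
```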
7
103
408
@lvwerra
Leandro von Werra
3 years
Recently, we have been working on preparing a large code dataset for pretraining a large language model - a project we call CodeParrot 🦜. This is a thread about the challenges we faced when creating such a large corpus.
Tweet media one
8
64
386
@lvwerra
Leandro von Werra
5 years
Excited to share my quarantine work of the past couple of weeks: The Transformer Reinforcement Learning (trl) library. Train @huggingface transformer language models with sparse rewards via reinforcement learning (PPO).
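The core loop is query → generate → score → PPO step. A pseudocode-level sketch of that shape (the TRL API has changed a lot across versions, so treat the exact names and arguments below as approximate):

```python
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
ppo_trainer = PPOTrainer(config=PPOConfig(batch_size=1, mini_batch_size=1),
                         model=model, tokenizer=tokenizer)

query = tokenizer("The movie was", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=20, return_prompt=False)[0]  # sample from the policy
reward = [torch.tensor(1.0)]                         # sparse reward, e.g. from a sentiment classifier
stats = ppo_trainer.step([query], [response], reward)  # one PPO optimization step
```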
Tweet media one
4
72
357
@lvwerra
Leandro von Werra
2 years
The Falcon 🦅 has landed in the 🤗 ecosystem! Make the most of these truly open-source models that stormed to the top of the LLM leaderboard. We show how to host them locally or via endpoints, and how you can fine-tune them on a Colab GPU! Check it out:
7
87
360
@lvwerra
Leandro von Werra
3 months
@EugeneVinitsky You could upload it to the Hugging Face Hub, which is free to host and download. It also adds a dataset viewer if it's any common format/modality, but it's git-lfs under the hood so you can upload whatever file format you want. You can either use the `datasets` or.
4
8
366
@lvwerra
Leandro von Werra
1 year
Today I gave myself a promotion:
Tweet media one
13
12
354
@lvwerra
Leandro von Werra
1 year
Companies are currently deciding whether to invest in fine-tuning their own open LLMs (e.g. Llama 2) or to use closed LLMs via an API (e.g. Claude or GPT4). Of course, you can build quick & cheap prototypes with closed LLMs, but fine-tuning on high quality domain data is cheaper
Tweet media one
8
64
340
@lvwerra
Leandro von Werra
2 years
A very underrated architecture tweak to GPT is multi-query attention (MQA): sharing key/value projections across attention heads saves a lot of memory in the kv-cache. Max generation batch size on a Colab GPU with a 1B model: ❗️512❗️ vs 32 (vanilla GPT). Test it here:
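Conceptually, MQA keeps one query projection per head but a single shared key/value head, so the kv-cache shrinks by roughly a factor of the number of heads. A minimal PyTorch sketch (no causal mask or kv-cache, just the shape of the idea):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)       # one query projection per head
        self.k_proj = nn.Linear(d_model, self.d_head)   # single shared key head
        self.v_proj = nn.Linear(d_model, self.d_head)   # single shared value head
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                # x: (batch, seq, d_model)
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)  # (B, H, T, d)
        k = self.k_proj(x).unsqueeze(1)                  # (B, 1, T, d), broadcast over heads
        v = self.v_proj(x).unsqueeze(1)                  # the kv-cache only needs this single head
        att = F.softmax((q @ k.transpose(-2, -1)) / self.d_head**0.5, dim=-1)
        out = (att @ v).transpose(1, 2).reshape(B, T, -1)
        return self.o_proj(out)
```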
7
70
337
@lvwerra
Leandro von Werra
3 years
Finally upgraded my MacBook to an M1 (Max). I was curious if I could use the GPU to train a Transformer model. TL;DR: It works, and training DistilBERT is roughly 2.5x faster than on the K80 you usually get on Colab and 50% slower than on a V100! (Installation steps below)
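On the PyTorch side this boils down to selecting the "mps" backend (a sketch; it assumes a recent PyTorch build with MPS support):

```python
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = torch.nn.Linear(768, 768).to(device)   # any nn.Module / transformers model moves the same way
x = torch.randn(8, 768, device=device)
print(model(x).shape, device)
```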
Tweet media one
6
33
331
@lvwerra
Leandro von Werra
4 years
Excited to finally share what kept us up at night for the past few months: Natural Language Processing with Transformers. A book about building real-world applications with 🤗 transformers in collaboration with @_lewtun, @Thom_Wolf, and @OReillyMedia.
Tweet media one
8
77
326
@lvwerra
Leandro von Werra
3 months
How does a company called Hugging Face make money? Your best wrong answers only, to celebrate the 1000th time the question was asked 🎉
103
15
322
@lvwerra
Leandro von Werra
1 year
Anybody can train a custom Mistral model on their own dataset in just a few lines of code with TRL! 🚀 The SFTTrainer supports DeepSpeed for distributed training or PEFT if you are limited by GPU resources. TRL: Full script:
Tweet media one
3
71
317
@lvwerra
Leandro von Werra
3 years
Recently, AlphaCode was released: a large language model pretrained on a GitHub dataset with a dozen programming languages. We release a similar (but larger!) dataset on the Hugging Face Hub! A thread on how to use it and two nice features of the Hub:
6
73
297
@lvwerra
Leandro von Werra
11 months
Text clustering at home? Yes, with text-clustering, a tiny smol repo. The pipeline is fully built on open tools:
1⃣ embed with sentence-transformers
2⃣ project to 2D with UMAP
3⃣ run DBSCAN clustering
4⃣ label clusters with Mixtral
Runs in 5-10 min and tada:
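A sketch of the first three (fully local) steps; the model names are examples and the LLM-labeling step is omitted:

```python
from sentence_transformers import SentenceTransformer
from umap import UMAP
from sklearn.cluster import DBSCAN

texts = ["first document", "second document", "..."]                 # your corpus

embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)   # 1⃣ embed
projected = UMAP(n_components=2).fit_transform(embeddings)           # 2⃣ project to 2D
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(projected)       # 3⃣ cluster (-1 = noise)
```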
Tweet media one
10
44
280
@lvwerra
Leandro von Werra
3 months
SmolLM2 is here! 💥 The strongest 1B model, matching models twice its size such as Gemma-2, Qwen2.5 or Llama3.2. Huge effort from a smol team going toe-to-toe with the biggest tech labs in the world. And fully open!
Tweet media one
4
43
280
@lvwerra
Leandro von Werra
3 years
Training language models with reinforcement learning and human feedback has gained more traction with InstructGPT, GopherCite, and the latest research from Anthropic. With TRL you can, too! Happy to release TRL 0.1.0 with some major improvements! 🎉🎉🎉
5
58
261
@lvwerra
Leandro von Werra
1 year
Great news: the Zephyr team releases the Alignment Handbook! 🎉 It includes the full code to fine-tune your own models with their success recipe and builds on top of TRL. Along with it, the required models and datasets are also released! 🔥 Check it out:
6
47
259
@lvwerra
Leandro von Werra
2 years
With the ChatGPT wave rolling over Twitter I also want to celebrate that the TRL library crossed 500 stars ⭐️🎉 If you want to play with reinforcement learning for language models (the tech behind ChatGPT) yourself, check out the repo & library:
Tweet media one
4
40
247
@lvwerra
Leandro von Werra
10 months
We released 🍷FineWeb: 15T high-quality tokens from the web. It's the best ready-to-use AND the largest pretraining dataset. It outperforms all other datasets in our 350B-token ablations but scales to much longer training runs due to its sheer size!
Tweet media one
@gui_penedo
Guilherme Penedo
10 months
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Tweet media one
3
52
240
@lvwerra
Leandro von Werra
1 year
Is the complicated RL part in RLHF really necessary for fine-tuning LLMs on human preference data? The new DPO method challenges that view! DPO is now integrated into TRL and we used it to train StackLlama-2! Read more about it in the blog post:
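DPO skips the reward model and PPO loop and optimizes directly on (prompt, chosen, rejected) triplets. A sketch of how this looks with TRL's DPOTrainer (the dataset name is a placeholder and argument names differ between TRL versions):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = load_dataset("my-org/my-preference-data", split="train")  # needs prompt/chosen/rejected columns

trainer = DPOTrainer(
    model=model,            # a frozen copy serves as the reference model if none is passed
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```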
7
55
203
@lvwerra
Leandro von Werra
1 year
Google releases Gemma: state-of-the-art 2B and 7B models trained for 2T and 6T tokens respectively! 🚀 They release both a base and an instruction-tuned model, which can be used for commercial applications. You can fine-tune it in just a few lines:
Tweet media one
2
30
193
@lvwerra
Leandro von Werra
2 years
Releasing TRL==0.2.0! 🚀 Training language models with RL (e.g. RLHF) is now easier than ever: new accelerate backend (multi-GPU support!), support for decoder and encoder-decoder models, weight sharing, shiny new docs (and logo), reworked examples! 🧵:
Tweet media one
5
44
189
@lvwerra
Leandro von Werra
1 year
Not much is known about the pretraining data of Code Llama, but there is some good evidence that @StackOverflow was part of it. Found some breadcrumbs while working on a demo with a hello world example: suddenly the model started generating a discussion between two users.
Tweet media one
8
42
179
@lvwerra
Leandro von Werra
4 years
Tokenizing 200k texts with @huggingface‘s FastTokenizers: <20s. Drawback: not enough time to get a fresh coffee 🙂.
4
19
171
@lvwerra
Leandro von Werra
10 months
Why would any company release a strong LLM for free? This is @ServiceNow's (mkt cap ~$160Bn) stock price since the release of StarCoder. It only required a small amount of compute and a handful of people while building up a lot of valuable know-how fast. It's not a zero-sum game!
Tweet media one
9
30
163
@lvwerra
Leandro von Werra
2 years
Finding the right code snippet to solve your problem is a super useful skill. @novoselrok made this much easier with a new tool: And on top he wrote an excellent in-depth guide on how to build a search model with transformers:
Tweet media one
0
40
161
@lvwerra
Leandro von Werra
5 years
Working with nbdev ( is such a great experience! Finally breaking the cycle of starting to code in notebooks, then exporting the code to a library, and finally writing documentation. Awesome job @jeremyphoward & @GuggerSylvain!
2
27
168
@lvwerra
Leandro von Werra
3 years
Sharing data with 🤗datasets has recently become a matter of a few lines. Here's an example where Alice shares some interesting data with Bob via the 🤗 Hub. 👩‍💻-->🤗-->👨‍💻. Since 🤗datasets also works with Pandas this is a great way to share your favourite DataFrames.
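The pattern is just two calls (the repo name here is a placeholder; pushing requires being logged in to the Hub):

```python
import pandas as pd
from datasets import Dataset, load_dataset

df = pd.DataFrame({"text": ["hello", "world"], "label": [0, 1]})

Dataset.from_pandas(df).push_to_hub("alice/my-interesting-data")   # Alice shares her DataFrame...
bobs_copy = load_dataset("alice/my-interesting-data")              # ...and Bob loads it anywhere
```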
Tweet media one
5
27
160
@lvwerra
Leandro von Werra
1 year
Introducing TextEnvironments in TRL 0.7.0! With TextEnvironments you can teach your language models to use tools to solve tasks more reliably. We trained models to use Wiki search and Python to answer trivia and math questions! Let's have a look at how 🧵
Tweet media one
7
36
166
@lvwerra
Leandro von Werra
8 months
Nemotron-340B was trained on 768x8 H100s. What this actually looks like: each square is a GPU and each color block highlights a model instance. The model was trained with TP=8, PP=12, DP=64 (shades indicate PP stages) such that one model instance requires 96 GPUs.
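(Unpacking the numbers: TP × PP = 8 × 12 = 96 GPUs per model instance, and DP = 64 replicas gives 64 × 96 = 6,144 GPUs, exactly the 768 × 8 H100s.)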
Tweet media one
4
26
147
@lvwerra
Leandro von Werra
9 months
Amazing release from @IBM: fully transparent Granite Code models! 🚀 They come in 3B, 8B, 20B and 34B sizes with base and instruction models and are pretrained on The Stack and high-quality sources 🔥 Models: Paper: Congrats team!
Tweet media one
3
38
130
@lvwerra
Leandro von Werra
3 years
One of the most interesting aspects of code generation models such as Codex or CodeParrot🦜 is the evaluation. Metrics such as BLEU or ROUGE don't work well for that task. Let's see why, and build an evaluation script in the process!
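The metric typically used instead is pass@k: sample n completions per problem, count the c that pass the unit tests, and estimate the chance that at least one of k samples passes. A sketch of the standard unbiased estimator (as introduced in the Codex paper):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples passes), given c of n samples passed."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=100, c=10, k=1))   # 0.1
```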
Tweet media one
2
19
129
@lvwerra
Leandro von Werra
1 year
With NEFTune you can boost the performance of supervised fine-tuning, especially on noisy instruction datasets! We added NEFTune to TRL so you can use it with just one additional line in the SFTTrainer:
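The "one additional line" is the noise parameter; a sketch of where it goes (in later TRL versions the argument moved into SFTConfig/TrainingArguments; the model and dataset below are just examples):

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model="facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    neftune_noise_alpha=5,     # <- the extra line: add noise to embeddings during SFT
)
trainer.train()
```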
Tweet media one
3
33
128
@lvwerra
Leandro von Werra
2 years
With the release of ChatGPT we thought it was a good time to write a blog post explaining how RL from human feedback works and what the current state of this exciting new field is. A collaboration with @natolambert and @carperai folks @lcastricato and @Dahoas1.
1
31
125
@lvwerra
Leandro von Werra
3 years
Training language models like CodeParrot🦜 requires large datasets. After the first training experiments with CodeParrot🦜 we realized that duplicates in the dataset had a strong impact on performance. Let's see what it takes to deduplicate a large (~200GB) dataset with 🤗datasets.
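The simplest pass is exact deduplication: hash each file's content and keep only the first occurrence of every hash. A sketch with a placeholder dataset name (near-duplicate detection needs more machinery, e.g. MinHash):

```python
from hashlib import sha256
from datasets import load_dataset

ds = load_dataset("my-org/raw-code-dataset", split="train")  # placeholder; expects a "content" column

ds = ds.map(lambda x: {"hash": sha256(x["content"].encode()).hexdigest()}, num_proc=16)

seen = set()
def keep_first(example):
    if example["hash"] in seen:
        return False
    seen.add(example["hash"])
    return True

deduped = ds.filter(keep_first)   # stateful filter: run with the default single process
```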
Tweet media one
2
21
125
@lvwerra
Leandro von Werra
26 days
Check out the code and the super detailed video walkthrough! Code: Video: Work led by @Haojun_Zhao14 and @FerdinandMom!
1
25
126
@lvwerra
Leandro von Werra
1 month
Or watch how the model solves the Lotka-Volterra equations, plots the results, and refines them. Try it out:
5
22
124
@lvwerra
Leandro von Werra
1 year
The full pipeline from processing web-scale datasets to pretraining and evaluating LLMs on large GPU clusters is now possible in probably <1000 LoC 🚀 In the last few weeks we released three libraries that we hope will enable everyone to build amazing datasets and strong LLMs!
Tweet media one
4
22
114
@lvwerra
Leandro von Werra
6 months
It's beautiful to see how far you can get with a 360M model (5x smaller than GPT-2!) with a few tricks:
- curate the pretraining data for educational content
- choose well-tuned hyperparameters
- and latest: add simple conversations to the SFT mix
Result: 🤏SmolLM
3
23
114
@lvwerra
Leandro von Werra
10 months
Meta Llama3 is out - almost identical architecture to Llama2 but trained for 15 trillion tokens! 🚀 Chatting with the instruct model or fine-tuning the base model on a custom dataset is as easy as one command! Read more about it in the blog post:
Tweet media one
2
24
109
@lvwerra
Leandro von Werra
2 years
A few weeks back @harmdevries77 released an interesting analysis (go smol, or go home!) of scaling laws, which @karpathy coined the Chinchilla trap. A quick thread on when to deviate left or right from the Chinchilla-optimal point and the implications. 🧵
Tweet media one
1
27
107
@lvwerra
Leandro von Werra
1 year
The Falcon 180B model was just released: it beats Llama 2 as well as ChatGPT and is on par with Palm2-L 💪 It was trained on 3.5T tokens on 4k GPUs, totalling 4x the compute of Llama 2 🔥 Blog: Model: Demo:
Tweet media one
3
17
106
@lvwerra
Leandro von Werra
3 years
With today's release of 🤗Evaluate (0.2.0) we are adding three more evaluators (QA, NER, image classification). This makes it a matter of a few lines of code to evaluate any (model, dataset, metric) triplet 🎉 Really excited about this! Let's take a tour!
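The (model, dataset, metric) triplet maps directly onto the evaluator API; a sketch with a text-classification example (the model and dataset names are just illustrations):

```python
from datasets import load_dataset
from evaluate import evaluator

task_evaluator = evaluator("text-classification")
data = load_dataset("imdb", split="test[:100]")

results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
    data=data,
    metric="accuracy",
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},
)
print(results)
```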
Tweet media one
2
20
104
@lvwerra
Leandro von Werra
1 year
It's been a wild year, part 1: TRL went from a pet project to train transformers with PPO to 7k stars with 100+ contributors! Now there is also SFT, DPO, IPO, DDPO (diffusion!), and with PEFT and accelerate you can train both on small hardware and at any large scale. 🚀🚀🚀
Tweet media one
5
17
107
@lvwerra
Leandro von Werra
3 years
One frequent piece of feedback we get on the paper version of our book is that color would add to the reading experience. We invested a lot of time in creating the figures, and thanks to @OReillyMedia we released ALL of them in the book's repository. A thread with a few of my favourites:
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
18
102
@lvwerra
Leandro von Werra
2 years
We are hosting a reading group on Discord about how to apply transformers to your use-cases and how to participate in open-source! 🤓 Every two weeks we go through chapters of the Transformers book and discuss how to contribute to open-source. Join here:
Tweet media one
2
18
105
@lvwerra
Leandro von Werra
5 years
I had a blast last Wednesday at the RL Meetup Zürich talking about leveraging the @huggingface transformer library in combination with reinforcement learning to fine-tune language models. In case you are interested, the talk is now online:
5
25
99
@lvwerra
Leandro von Werra
1 year
The code family is expanding extremely fast: welcome SQLCoder! Since I made this graphic two weeks ago there have been three new awesome code models: @deci_ai's small and powerful DeciCoder, @Muennighoff et al.'s OctoCoder trained on commits, and now @defogdata's SQLCoder 🚀
Tweet media one
@rishdotblog
Rishabh Srivastava
1 year
We just open-sourced SQL Coder, a 15B param text-to-SQL model that outperforms OpenAI's gpt-3.5! When fine-tuned on an individual schema, it outperforms gpt-4. The model is small enough to run on a single A100 40GB in 16 bit floats, or on a single
Tweet media one
1
17
96
@lvwerra
Leandro von Werra
1 year
The power of building tools, datasets, and models in the open: the community can build on top of it and everyone profits! Exhibit A: since the release of 📑The Stack and ⭐️StarCoder, research groups from academia and industry have trained models on top of BigCode's releases.
Tweet media one
2
25
86
@lvwerra
Leandro von Werra
4 years
Our book about NLP with Transformers is taking shape and we are working on the final chapters. We would love to know what you want to see covered in the book. If you had to choose one topic that the currently available resources don't cover well, what would it be? Let us know!
@lvwerra
Leandro von Werra
4 years
Excited to finally share what kept us up at night for the past few months: Natural Language Processing with Transformers. A book about building real-world applications with 🤗 transformers in collaboration with @_lewtun, @Thom_Wolf, and @OReillyMedia.
Tweet media one
5
14
85
@lvwerra
Leandro von Werra
2 years
There is an incredible Chrome extension by @JiaLi52524397 that lets you use StarCoder in Jupyter notebooks. It uses previous cells and their outputs to make predictions. As such it can infer your DataFrame's columns from the cell outputs or interpret your markdown text!
Tweet media one
@JiaLi52524397
Jia Li
2 years
Proud to be part of this incredible journey with @LoubnaBenAllal1, @lvwerra & the amazing team! 🎉 StarCoder has this unique capacity to incorporate Jupyter text/output. Experience it for yourself here: Or
2
21
78
@lvwerra
Leandro von Werra
2 years
We released evaluate==0.3.0 🎉🎉🎉 A few highlights:
- text2text evaluators (incl. translation and summarization)!
- new bias metrics!
- configure dataset splits and subsets in the evaluator!
Special thanks to contributors @mathemakitten & @SashaMTL ♥️
Tweet media one
0
17
75
@lvwerra
Leandro von Werra
8 months
The FineWeb tech report is out! 🍷 Find all the details involved in building a high-quality pretraining dataset for LLMs, including the new super-strong FineWeb-Edu subset with 1.3T tokens. [built with the beautiful @distillpub template by @ch402 et al.]
Tweet media one
2
13
77
@lvwerra
Leandro von Werra
1 year
Excited to announce that trl has a new home: 🚀 It started as a pet project a few years ago and recently became much more active with so many contributors. With the transfer we'll move full steam ahead to make RL and RLHF more accessible for everyone!
Tweet media one
0
14
73
@lvwerra
Leandro von Werra
2 years
This week's release of 💫StarCoder was the culmination of 6 months of incredibly hard work from a lot of collaborators in @BigCodeProject. It's great to see people are enjoying it and already using it for various things. It is also in fine company among trending models on the Hub!
Tweet media one
1
10
71
@lvwerra
Leandro von Werra
3 years
Tweet media one
0
4
69
@lvwerra
Leandro von Werra
11 months
Took some time to reflect on the past year+ of the @BigCodeProject: here are a few of my learnings from leading it during this time and some ingredients I think are important for a successful open collaboration in ML. What is BigCode? BigCode is an open scientific collaboration.
1
18
69
@lvwerra
Leandro von Werra
5 years
Turns out you can also use the trl library to tune GPT-2 to generate negative movie reviews 😈. Experiment by @mrm8488 and model available on @huggingface models.
2
15
66
@lvwerra
Leandro von Werra
1 year
Code Llama by @MetaAI is now available on the Hugging Face Hub, and more exciting stuff is coming soon!
1
13
64
@lvwerra
Leandro von Werra
1 year
The closed models via API vs. developing in-house models discussion is reminiscent of the situation with software a few decades ago: buy off-the-shelf products or hire developers to build custom software. These days every large company is either a software company or at least.
1
12
61
@lvwerra
Leandro von Werra
2 years
One consequence of the plot is that as you go left you need to pay with data but gain inference efficiency, and as you go right you gain data efficiency but pay with inference cost. For example, at the critical model size (1/3 model size, 100% overhead) you need 6x more data! 🧵
@Thom_Wolf
Thomas Wolf
2 years
There is a fascinating recent trend of training *smaller models for longer* w.r.t. Chinchilla optimal predictions. Best explanation I've seen of this? This new blog post by @harm_devries (with collaborators of the @BigCodeProject):. Clearly these are only
Tweet media one
3
7
64
@lvwerra
Leandro von Werra
3 years
We are releasing CodeParrot🦜 v1.1! We trained the original model for an additional 30k steps and observe continuous improvement on HumanEval. The model has now seen 40B code tokens which is still far off the 300B+100B of Codex and the 590B of AlphaCode.
Tweet media one
3
14
59
@lvwerra
Leandro von Werra
4 years
I am really excited to see that @OpenAI keeps pushing AI safety at the interface of NLP and RL using human feedback. If you want to take a @huggingface transformer model for a spin check out the TRL library.
@OpenAI
OpenAI
4 years
We've used reinforcement learning from human feedback to train language models for summarization. The resulting models produce better summaries than 10x larger models trained only with supervised learning:
1
20
61
@lvwerra
Leandro von Werra
2 months
Slides here: Inspired by the nice talk from @Thom_Wolf earlier this year and updated with some material we are working on right now:
0
9
60
@lvwerra
Leandro von Werra
3 years
In my opinion one of the most underappreciated features of the 🤗Hub is that you can add model performance metrics to the model card by simply adding tags to the model's README, which even links to @paperswithcode 🤯 Check out an example here:
Tweet media one
0
15
60
@lvwerra
Leandro von Werra
2 months
The wild part is seeing a Google tutorial for the project that you worked on in a public library to get into NLP a few years ago.
@GoogleCloudTech
Google Cloud Tech
2 months
Join Googler Wietse Venema for a deep dive into fine-tuning open models using Hugging Face Transformer Reinforcement Learning (TRL) using GKE. TRL is an #OpenSource library that provides all the tools you need to train LLMs, developed by @huggingface
4
3
58
@lvwerra
Leandro von Werra
3 years
Eventful day: Evaluation on the Hub launched, CodeGen released in Transformers, and BLOOM model training reaches 100% 🔥.
1
2
55
@lvwerra
Leandro von Werra
3 years
Working on the index of the upcoming Transformers for NLP book I've realised that we come across:
45 model architectures
29 datasets
11 evaluation metrics
I did not expect that many - maybe we went a bit overboard 😂 By the way, you can now preorder it at:
Tweet media one
2
7
57
@lvwerra
Leandro von Werra
5 years
Check out the new notebook in the trl repo for fine-tuning @OpenAI's GPT-2 to generate text with controlled sentiment: using a control token at the beginning of the input, one controls the sentiment of the language model output.
Tweet media one
1
10
54
@lvwerra
Leandro von Werra
5 years
Have you ever played Taboo but ran out of cards or wanted to create custom cards? Look no further! I have fine-tuned @OpenAI's GPT2 with the great @huggingface library to generate new Taboo cards. A few handpicked examples (none of them are in the training set):
Tweet media one
Tweet media two
2
15
53
@lvwerra
Leandro von Werra
3 years
Roughly 10% of BLOOM's training data was code, so it is pretty good at coding 🙂 Since it is multilingual you can even prompt it in Arabic!
Tweet media one
@andrew_n_carr
Andrew Carr (e/🤸)
3 years
Pretty cool!! 0.155 Pass@1 is also pretty cool for a non-explicit code model.
2
13
55
@lvwerra
Leandro von Werra
8 months
@LoubnaBenAllal1 test-sets-only.
0
0
52
@lvwerra
Leandro von Werra
2 years
Something oddly satisfying about running `…, num_proc=60)` and watching the CPU go brrr for 10 min (rather than waiting 10h)
Tweet media one
7
4
53
@lvwerra
Leandro von Werra
1 year
AI is nothing without open source #keepAIopen.
0
6
51
@lvwerra
Leandro von Werra
3 years
Just bumped the github-code dataset to v1.1! Changes:
- add Scala and TypeScript
- fix issue with deduplication pipeline
Also added some high-level statistics of the dataset. A short thread about the deduplication bug and Python hashing:
Tweet media one
5
6
48
@lvwerra
Leandro von Werra
3 years
Looking for a hands-on introduction to Transformers and the 🤗 ecosystem? @_lewtun and I are giving a workshop at @__MLT__ in 2h! It is free - join and say hi!
3
9
48
@lvwerra
Leandro von Werra
3 years
A side-effect of all the DALL·E 2 posts is you can see who on Twitter works at @OpenAI 🙂
3
1
48
@lvwerra
Leandro von Werra
4 years
I often think about this quote from Jonathan Ive: "When our tools are broken, we feel broken. And when somebody fixes one, we feel a tiny bit more whole." This is so true for software. Using great tools makes every project way more fun and feels like having superpowers.
Tweet media one
1
7
48
@lvwerra
Leandro von Werra
4 years
Who needs a Random Number Generator if you can just try to render a Jupyter notebook on GitHub?
2
1
45
@lvwerra
Leandro von Werra
1 year
Models trained on public code should be open! Software Heritage released a statement on training code LLMs on their database with 3 principles:
1. Resulting models should be open
2. Transparency and attribution for training data
3. Opt-out mechanism
0
11
45
@lvwerra
Leandro von Werra
1 year
It's been a wild year, part 2: just before the start of the year with @BigCodeProject we released The Stack and started training StarCoder. And this was just the starting point for many community projects that built cool models on top of it! There is more to come very soon! 👀
Tweet media one
1
9
45
@lvwerra
Leandro von Werra
3 years
If you are curious what it takes to train a large language model on such a dataset, check out the Transformers book. CodeParrot 🦜 has a dedicated chapter! It will come out in early 2022 but you can already access it in the early release (online):
1
3
45
@lvwerra
Leandro von Werra
2 years
With Reinforcement Learning from Human Feedback one can tune language models to better align with what humans expect. You can use the same technique to also condition language models to create positive or negative comments about movies (or switch)!
Tweet media one
5
9
46
@lvwerra
Leandro von Werra
5 years
"With four parameters I can fit an elephant, and with 175bn I can make him write front-end code." - John von Neumann (2020)
0
2
44
@lvwerra
Leandro von Werra
1 year
Next week I'll be at NeurIPS! It's the first time I'll go to a big ML conference - any tips and tricks? Also message me if you'd like to meet and grab a coffee!
6
1
45