William J.B. Mattingly

@wjb_mattingly

Followers
3K
Following
4K
Media
658
Statuses
4K

Digital Nomad · Historian · Data Scientist · NLP · Machine Learning · Cultural Heritage Data Scientist @Yale · Former @SIDataScience · @huggingface Fellow 🤗

Fort Myers, FL
Joined May 2020
@wjb_mattingly
William J.B. Mattingly
15 days
Want to do a full-finetune of Dots.OCR? I've got a fork working! It handles the conversion of data from PageXML (Transkribus) to Dots.OCR format for you! (Link down below.) The first models are already on @huggingface and working as expected. Still training them.
Tweet media one
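The actual fork is linked in the tweet; as a rough illustration of the core transformation, here is a minimal sketch that parses a (simplified) Transkribus PageXML export and emits one line-level record per TextLine. Treating the Dots.OCR target as plain records of bounding box plus transcription is an assumption for illustration, not the fork's actual output schema.

```python
import xml.etree.ElementTree as ET

# PageXML (Transkribus export) uses a versioned namespace; the 2013-07-15
# schema is common. Adjust the URI if your export differs.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def coords_to_bbox(points: str):
    """Convert a PageXML Coords points string ("x1,y1 x2,y2 ...")
    to an axis-aligned [x_min, y_min, x_max, y_max] box."""
    xs, ys = [], []
    for pair in points.split():
        x, y = pair.split(",")
        xs.append(int(x))
        ys.append(int(y))
    return [min(xs), min(ys), max(xs), max(ys)]

def pagexml_to_records(xml_text: str):
    """Extract one record per TextLine: its bounding box and transcription."""
    root = ET.fromstring(xml_text)
    records = []
    for line in root.iterfind(".//pc:TextLine", NS):
        coords = line.find("pc:Coords", NS)
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        if coords is None or unicode_el is None:
            continue  # skip lines without geometry or text
        records.append({
            "bbox": coords_to_bbox(coords.get("points")),
            "text": unicode_el.text or "",
        })
    return records
```

On a TextLine with `points="10,20 110,20 110,45 10,45"` and Unicode text `In principio`, this yields `{"bbox": [10, 20, 110, 45], "text": "In principio"}`.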
@wjb_mattingly
William J.B. Mattingly
3 days
Throwback to OpenAI 5 years ago. This appeared in my YouTube feed. By the way, @twominutepapers is a great channel to watch.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
3 days
I think Claude Sonnet 4 is annoyed with itself.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
3 days
Yessssss.
@NielsRogge
Niels Rogge
3 days
SAM 2 by @AIatMeta has finally been integrated into @huggingface Transformers! 🔥 It's a generalization of SAM 1 to video, allowing you to segment and track something you care about across a sequence of frames. SOTA performance, Apache 2.0 license.
@wjb_mattingly
William J.B. Mattingly
4 days
RT @vllm_project: 🚀 Amazing community project! vLLM CLI — a command-line tool for serving LLMs with vLLM: ✅ Interactive menu-driven UI & s…
@wjb_mattingly
William J.B. Mattingly
5 days
RT @Prince_Canuma: LFM2-VL is done ✅ M3 Max stats: full precision ~250 tok/s; 4-bit quant ~530 tok/s.
@wjb_mattingly
William J.B. Mattingly
5 days
Nice!
@Prince_Canuma
Prince Canuma
5 days
Just a tiny bump in URAM usage and we go brrr 🚀
@wjb_mattingly
William J.B. Mattingly
6 days
Rofl!
@VikParuchuri
Vik Paruchuri
7 days
I wanted to quantify exactly how insane each PDF is making me. So I made a site, WTF PDF, that scores how bad your PDFs are. My high score is 188 - I'm sure someone can beat it.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
This also auto-generates bounding boxes and PageXML so you can train Dots.OCR =)
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
Found an incredible font package for medieval handwriting by Peter Baker. Synthetic medieval HTR is getting closer.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
RT @osanseviero: Introducing Gemma 3 270M 🔥 🤏 A tiny model! Just 270 million parameters. 🧠 Very strong instruction following. 🤖 Fine-tune in…
@wjb_mattingly
William J.B. Mattingly
7 days
I'm working on a few finetunes for LFM2-VL for medieval texts. Any app developers interested in teaming up and building a prototype that would use these models to take pics of a manuscript and transcribe it/generate metadata about it? Also open to non-medieval stuff and general.
@LiquidAI_
Liquid AI
7 days
Two Weeks. $10K. No Excuses. Hack-01 is still live, and you've got until Aug 20, 2025 at 12 PM PST to ship your on-device AI build. ⚙️ Tools: LFM2 + LEAP. 💰 Prizes: $10K every 2 weeks. 📍 Where: Discord (yes, you need to join). Build private, real-time AI on the edge — or just keep
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
This looks really interesting.
@xeophon_
Xeophon
7 days
After thinking about this problem for months, I am so happy to finally introduce DetailBench! It answers a simple question: how good are current LLMs at finding small errors when they are *not* explicitly asked to do so? (Yes, the graph is right!)
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
Something I've realized over the last couple of weeks of finetuning various VLMs is that we just need more data. Unfortunately, that takes a lot of time. That's why I'm returning to my synthetic HTR package I originally designed for medieval manuscripts. This will be packaged now
Tweet media one
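The image-rendering side of a synthetic HTR pipeline needs fonts and an imaging library, but the text side can be sketched in pure Python: perturbing clean modern transcriptions toward period orthography to produce (rendered image, ground truth) pairs. The substitution rules below are illustrative assumptions loosely inspired by medieval Latin conventions, not the actual rules in the package the tweet describes.

```python
import random

# Illustrative orthographic substitutions (examples only, not the
# package's real rules): "v" written as "u", "j" normalized to "i",
# and the Tironian et sign for the word "et".
RULES = [
    ("v", "u"),
    ("j", "i"),
    ("et ", "⁊ "),
]

def medievalize(text: str, p: float = 1.0, seed: int = 0) -> str:
    """Apply each substitution with probability p, producing a synthetic
    'medieval' spelling to pair with the clean modern transcription."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    out = text.lower()
    for src, dst in RULES:
        if rng.random() < p:
            out = out.replace(src, dst)
    return out
```

With `p=1.0`, `medievalize("Veni et vidi")` returns `"ueni ⁊ uidi"`; varying `p` per sample gives a mix of spellings across the synthetic corpus.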
@wjb_mattingly
William J.B. Mattingly
8 days
I've been getting asked for training scripts whenever a new VLM drops. Instead of scripts, I'm going to start updating this new Python package. It's not fancy. It's for full finetunes. This was how I first trained Qwen 2 VL last year.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
8 days
First finetune of LFM2-VL 1.6B is now ready for testing. This model is at 10k steps while the 450M is at 40k. These are training on the same dataset. I'm curious whether we'll see the 1.6B start to do better.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
8 days
This is interesting! On difficult HTR/OCR tasks, LFM2-VL 450M seems to do a lot better than 1.6B. I dug in, and this is what I found: the smaller model tends to just go for it and predict on the image, while the 1.6B model is more restrained and explains why it can't.
Tweet media one
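Comparisons like the 450M-vs-1.6B one above are usually scored with character error rate (CER): the Levenshtein edit distance between hypothesis and reference, divided by reference length. The tweets don't say which metric is being used, so this is a generic sketch of the standard one, stdlib only.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance from reference
    to hypothesis, normalized by reference length."""
    m, n = len(reference), len(hypothesis)
    # Rolling-row DP: prev[j] = edits to turn reference[:i-1] into hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else float(n > 0)
```

For example, `cer("amen", "amon")` is 0.25 (one substitution over four reference characters); lower is better, and values above 1.0 are possible when the hypothesis is much longer than the reference.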
@wjb_mattingly
William J.B. Mattingly
8 days
@LiquidAI_ I will be posting my progress throughout the day.
@wjb_mattingly
William J.B. Mattingly
8 days
@LiquidAI_ 3) Speed and size. This is the real appeal of this model. It's tiny and fast. I see this potentially replacing TrOCR for some line-level transcription workflows. It's smaller, faster, and has a broader understanding of language (vibes only here).
Tweet media one
@wjb_mattingly
William J.B. Mattingly
8 days
@LiquidAI_ 2) Word-level accuracy is clearly improving. This will only get better as the text decoder learns medieval languages better. This is where I believe future checkpoints will improve.
Tweet media one