William J.B. Mattingly

@wjb_mattingly

Followers
3K
Following
4K
Media
658
Statuses
4K

Digital Nomad · Historian · Data Scientist · NLP · Machine Learning · Cultural Heritage Data Scientist @Yale · Former @SIDataScience · @huggingface Fellow 🤗

Fort Myers, FL
Joined May 2020
@wjb_mattingly
William J.B. Mattingly
15 days
Want to do a full-finetune of Dots.OCR? I've got a fork working! It handles the conversion of data from PageXML (Transkribus) to Dots.OCR format for you! (Link down below.) The first models are already on @huggingface and working as expected. Still training them.
Tweet media one
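The actual fork is linked in the tweet; as a rough illustration of the core transformation, here is a minimal sketch that parses a (simplified) Transkribus PageXML export and emits one line-level record per TextLine. Treating the Dots.OCR target as plain records of bounding box plus transcription is an assumption for illustration, not the fork's actual output schema.

```python
import xml.etree.ElementTree as ET

# PageXML (Transkribus export) uses a versioned namespace; the 2013-07-15
# schema is common. Adjust the URI if your export differs.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def coords_to_bbox(points: str):
    """Convert a PageXML Coords points string ("x1,y1 x2,y2 ...")
    to an axis-aligned [x_min, y_min, x_max, y_max] box."""
    xs, ys = [], []
    for pair in points.split():
        x, y = pair.split(",")
        xs.append(int(x))
        ys.append(int(y))
    return [min(xs), min(ys), max(xs), max(ys)]

def pagexml_to_records(xml_text: str):
    """Extract one record per TextLine: its bounding box and transcription."""
    root = ET.fromstring(xml_text)
    records = []
    for line in root.iterfind(".//pc:TextLine", NS):
        coords = line.find("pc:Coords", NS)
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        if coords is None or unicode_el is None:
            continue  # skip lines without geometry or text
        records.append({
            "bbox": coords_to_bbox(coords.get("points")),
            "text": unicode_el.text or "",
        })
    return records
```

On a TextLine with `points="10,20 110,20 110,45 10,45"` and Unicode text `In principio`, this yields `{"bbox": [10, 20, 110, 45], "text": "In principio"}`.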
@wjb_mattingly
William J.B. Mattingly
3 days
Throwback to OpenAI 5 years ago. This appeared in my YouTube feed. By the way, @twominutepapers is a great channel to watch.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
3 days
I think Claude Sonnet 4 is annoyed with itself.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
3 days
Yessssss.
@NielsRogge
Niels Rogge
3 days
SAM 2 by @AIatMeta has finally been integrated into @huggingface Transformers! 🔥 It's a generalization of SAM 1 to video, allowing you to segment and track something you care about across a sequence of frames. SOTA performance, Apache 2.0 license.
@wjb_mattingly
William J.B. Mattingly
4 days
RT @vllm_project: 🚀 Amazing community project! vLLM CLI — a command-line tool for serving LLMs with vLLM: ✅ Interactive menu-driven UI & s…
@wjb_mattingly
William J.B. Mattingly
5 days
RT @Prince_Canuma: LFM2-VL is done ✅ M3 Max stats: full precision ~250 tok/s; 4-bit quant ~530 tok/s.
@wjb_mattingly
William J.B. Mattingly
5 days
Nice!
@Prince_Canuma
Prince Canuma
5 days
Just a tiny bump in URAM usage and we go brrr 🚀
@wjb_mattingly
William J.B. Mattingly
6 days
Rofl!
@VikParuchuri
Vik Paruchuri
7 days
I wanted to quantify exactly how insane each PDF is making me. So I made a site, WTF PDF, that scores how bad your PDFs are. My high score is 188 - I'm sure someone can beat it.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
This also auto-generates bounding boxes and PageXML so you can train Dots.OCR =)
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
Found an incredible font package for medieval handwriting by Peter Baker. Synthetic medieval HTR is getting closer.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
RT @osanseviero: Introducing Gemma 3 270M 🔥 🤏 A tiny model! Just 270 million parameters. 🧠 Very strong instruction following. 🤖 Fine-tune in…
@wjb_mattingly
William J.B. Mattingly
7 days
I'm working on a few finetunes for LFM2-VL for medieval texts. Any app developers interested in teaming up and building a prototype that would use these models to take pics of a manuscript and transcribe it/generate metadata about it? Also open to non-medieval stuff and general.
@LiquidAI_
Liquid AI
7 days
Two Weeks. $10K. No Excuses. Hack-01 is still live, and you've got until Aug 20, 2025 at 12 PM PST to ship your on-device AI build. ⚙️ Tools: LFM2 + LEAP. 💰 Prizes: $10K every 2 weeks. 📍 Where: Discord (yes, you need to join). Build private, real-time AI on the edge — or just keep
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
This looks really interesting.
@xeophon_
Xeophon
7 days
After thinking about this problem for months, I am so happy to finally introduce DetailBench! It answers a simple question: how good are current LLMs at finding small errors when they are *not* explicitly asked to do so? (Yes, the graph is right!)
Tweet media one
@wjb_mattingly
William J.B. Mattingly
7 days
Something I've realized over the last couple of weeks of finetuning various VLMs is that we just need more data. Unfortunately, that takes a lot of time. That's why I'm returning to my synthetic HTR package I originally designed for medieval manuscripts. This will be packaged now
Tweet media one
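The image-rendering side of a synthetic HTR pipeline needs fonts and an imaging library, but the text side can be sketched in pure Python: perturbing clean modern transcriptions toward period orthography to produce (rendered image, ground truth) pairs. The substitution rules below are illustrative assumptions loosely inspired by medieval Latin conventions, not the actual rules in the package the tweet describes.

```python
import random

# Illustrative orthographic substitutions (examples only, not the
# package's real rules): "v" written as "u", "j" normalized to "i",
# and the Tironian et sign for the word "et".
RULES = [
    ("v", "u"),
    ("j", "i"),
    ("et ", "⁊ "),
]

def medievalize(text: str, p: float = 1.0, seed: int = 0) -> str:
    """Apply each substitution with probability p, producing a synthetic
    'medieval' spelling to pair with the clean modern transcription."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    out = text.lower()
    for src, dst in RULES:
        if rng.random() < p:
            out = out.replace(src, dst)
    return out
```

With `p=1.0`, `medievalize("Veni et vidi")` returns `"ueni ⁊ uidi"`; varying `p` per sample gives a mix of spellings across the synthetic corpus.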
@wjb_mattingly
William J.B. Mattingly
8 days
I've been getting asked for training scripts whenever a new VLM drops. Instead of scripts, I'm going to start updating this new Python package. It's not fancy. It's for full finetunes. This was how I first trained Qwen 2 VL last year.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
8 days
First finetune of LFM2-VL 1.6B is now ready for testing. This model is at 10k steps while the 450M is at 40k. These are training on the same dataset. I'm curious whether we'll see the 1.6B start to do better.
Tweet media one
@wjb_mattingly
William J.B. Mattingly
8 days
This is interesting! On difficult HTR/OCR tasks, LFM2-VL 450M seems to do a lot better than 1.6B. I dug in, and this is what I found: the smaller model tends to just go for it and predict on the image, while the 1.6B model is more restrained and explains why it can't.
Tweet media one
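Comparisons like the 450M-vs-1.6B one above are usually scored with character error rate (CER): the Levenshtein edit distance between hypothesis and reference, divided by reference length. The tweets don't say which metric is being used, so this is a generic sketch of the standard one, stdlib only.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance from reference
    to hypothesis, normalized by reference length."""
    m, n = len(reference), len(hypothesis)
    # Rolling-row DP: prev[j] = edits to turn reference[:i-1] into hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else float(n > 0)
```

For example, `cer("amen", "amon")` is 0.25 (one substitution over four reference characters); lower is better, and values above 1.0 are possible when the hypothesis is much longer than the reference.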
@wjb_mattingly
William J.B. Mattingly
8 days
@LiquidAI_ I will be posting my progress throughout the day.
@wjb_mattingly
William J.B. Mattingly
8 days
@LiquidAI_ 3) Speed and size. This is the real appeal of this model. It's tiny and fast. I see this potentially replacing TrOCR for some line-level transcription workflows. It's smaller, faster, and has a broader understanding of language (vibes only here).
Tweet media one
@wjb_mattingly
William J.B. Mattingly
8 days
@LiquidAI_ 2) Word-level accuracy is clearly improving. This will only get better as the text decoder learns medieval languages better. This is where I believe future checkpoints will improve.
Tweet media one