David Hershey @DavidSHershey profile

David Hershey

@DavidSHershey

Followers

941

Following

4K

Statuses

208

AI Generalist | Writer of https://t.co/l1jTizWyTv

Seattle, WA

Joined April 2017

David Hershey

@DavidSHershey

2 years

🧵 LLM evaluation is broken right now. There are no good objective measures of the "quality" of a model, so everyone is flying blind. (1/n)

2

1

19

David Hershey

@DavidSHershey

4 months

New Sonnet, same hobby

1

David Hershey

@DavidSHershey

9 months

Anthropic rules 🌉

Anthropic

@AnthropicAI

9 months

This week, we showed how altering internal "features" in our AI, Claude, could change its behavior. We found a feature that can make Claude focus intensely on the Golden Gate Bridge. Now, for a limited time, you can chat with Golden Gate Claude:

0

1

12

David Hershey

@DavidSHershey

9 months

RT @AnthropicAI: New Anthropic research paper: Scaling Monosemanticity. The first ever detailed look inside a leading large language model…

0

562

0

David Hershey

@DavidSHershey

9 months

RT @AnthropicAI: Introducing a new Team plan for Claude. Get increased usage for team members, easily manage users and billing, and tackle…

0

80

0

David Hershey

@DavidSHershey

1 year

RT @mgoblog: I never thought I'd actually get to write this

0

312

0

David Hershey

@DavidSHershey

1 year

@HamelHusain 🔥🔥🔥 your work is appreciated!

0

1

David Hershey

@DavidSHershey

1 year

RT @swyx: fascinating read on finetuning this am: a finetuned 7B model can beat GPT-4 on Magic the Gathering drafting but more importantl…

0

9

0

David Hershey

@DavidSHershey

1 year

@swyx Glad you enjoyed it! This OpenAI bill is the closest I've gotten yet to buying a 4090 for my home 😅

1

0

1

David Hershey

@DavidSHershey

1 year

How high do I have to get on hackernews to get my honorary engineer badge

1

0

9

David Hershey

@DavidSHershey

1 year

@HamelHusain @hugobowne Spent a lot of time fine-tuning models in the last few weeks, and oh boy did it feel like ML in all of the hard ways - mostly data work, lots of experiments to see what data was effective. Takeaway was it seems to depend on how important you think fine-tuning is going forward.

0

1

David Hershey

@DavidSHershey

1 year

What an awesome view into why training LLMs requires so much high-quality talent. "This level of perfection is like eight billion people copy[ing] the complete works of Shakespeare for the 14 billion years the universe has existed and not have a single person make a mistake!"

Adept

@AdeptAILabs

1 year

If your loss curves look sus, join the club! Giant LLM training runs are full of pitfalls. We learned the hard way. We wrote a deep dive for the community on silent data corruptions (SDCs). Problem and mitigations here:

0

1

David Hershey

@DavidSHershey

1 year

RT @jyotibansalsf: Excited to share that @Unusual_VC is opening the next round of Unusual Academy — a hands-on program to equip seed-stage…

0

5

0

David Hershey

@DavidSHershey

2 years

RT @MF_FOOM: mf trained a simple model to translate ada-002 embeddings back to text and found something interesting: sentence embeddings h…

0

167

0

David Hershey

@DavidSHershey

2 years

@MosaicML Awesome work! Really appreciate breaking out evaluation into more human-compatible categories; I think that's exactly what we need to be able to reason about new models.

0

David Hershey

@DavidSHershey

2 years

RT @MosaicML: How can the ML community measure LLM quality in a holistic and standardized manner? The Mosaic Model Gauntlet encompasses 34…

0

41

0

David Hershey

@DavidSHershey

2 years

@swyx @AdeptAILabs as well!

0

1

David Hershey

@DavidSHershey

2 years

RT @haroonchoudery: Last week, a paper talking about massive drops in GPT-4's performance went viral. - GPT-4's accuracy dropped from 97.6…

0

1

0

David Hershey

@DavidSHershey

2 years

RT @bio_bootloader: Introducing Mentat - an open source, GPT-4 powered coding assistant! Mentat runs in your command line, giving it the c…

0

154

0