Justin

@AlwaysUhhJustin

Followers
665
Following
39K
Statuses
10K

Founder/CEO of (funded) AI LegalTech startup. Interested in lots of stuff. Most of my followers are bots. Semi private so I can talk about what I want.

Dallas / Austin TX
Joined June 2011
@AlwaysUhhJustin
Justin
2 years
Want to make real progress? Here is a high-level blueprint for 18 big ideas that would substantially improve America, arguably none of which fall into either political party's partisan platform.
Tweet media one
2
0
22
@AlwaysUhhJustin
Justin
16 hours
RT @thomasahle: I find Meta’s original approach to hallucinations delightfully counter intuitive: 1. Extract factoid from training dataset…
0
21
0
@AlwaysUhhJustin
Justin
16 hours
@heyshrutimishra First 1-line test: Massive fail
Tweet media one
0
0
3
@AlwaysUhhJustin
Justin
16 hours
We have a few different evals, but the easy one for me to quickly test is basically: here is a complicated fact pattern; tell me all the relevant cases. We've already done the analysis to identify the most relevant cases, and we assign "points" for each case a model correctly identifies. The current frontier models (GPT-4o, o1, Gemini 2.0 Flash, Gemini 2.0 Pro, Sonnet 3.5) do reasonably well on Federal cases, okay on state-level cases in big states with newsworthy issues, and pretty poorly on smaller state-level issues. Ballpark, I'd say the group above scores 30%-50% and Chocolate is probably 60%. But I only tested 1 question of 1 eval to get a directional sense.
1
0
0
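The point-based scoring described above can be sketched in a few lines. This is a minimal illustration, not the actual eval harness: the function name, the weighting scheme, and the substring matching are all assumptions, and real case citations would need fuzzier matching than this.

```python
# Hypothetical sketch of a point-based case-identification scorer:
# a hand-built answer key maps each relevant case to the points it is
# worth, and a model earns points for each case it correctly names.

def score_answer(model_answer: str, relevant_cases: dict[str, int]) -> float:
    """Return the fraction of available points the answer earned.

    relevant_cases maps a case name to its point value (more central
    cases could be weighted higher). Matching here is naive substring
    matching; a real harness would normalize citations first.
    """
    answer = model_answer.lower()
    earned = sum(points for case, points in relevant_cases.items()
                 if case.lower() in answer)
    total = sum(relevant_cases.values())
    return earned / total if total else 0.0

# Example with made-up cases and weights:
cases = {"Smith v. Jones": 3, "Doe v. Roe": 2, "In re Acme": 1}
answer = "The most on-point authority is Smith v. Jones; see also In re Acme."
print(round(score_answer(answer, cases), 2))  # 4 of 6 points -> 0.67
```

A 30%-50% score in this scheme would mean a model surfaces the cases carrying roughly a third to half of the total points in the answer key.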
@AlwaysUhhJustin
Justin
17 hours
@iruletheworldmo I do think Chocolate probably *is* Grok 3
@AlwaysUhhJustin
Justin
17 hours
Rumor is that Grok 3 is "Chocolate" on Chatbot Arena. Elon Musk said he trained it with law (it was ambiguous whether that meant *all* laws from most countries, all US statutes, all US cases and statutes, etc.).

Anyway, I tried a complicated state-law eval we have. Got Chocolate eventually. It did okay. Better than anything else in market, but not amazing. The overall analysis was great. I asked for specific cases/statutes. The statutes were consistently right. The cases given were a mix of good/on-point, real but irrelevant, and 1 hallucinated case. It missed many of the best cases.

Most of the top models analyze pretty well (Grok was a bit better), miss a couple key statutes (Grok outperformed), and list ~2 real and useful cases (Grok slightly better).

Basically, I wouldn't be surprised either way whether this is Grok 3, but I'd bet 70/30 yes it is. Nothing else has this much pretraining data on law. (It kept crashing, so I shot a Loom and regenerated a couple times; the case summary is at the bottom.)
Tweet media one
0
0
7
@AlwaysUhhJustin
Justin
17 hours
Does it look like scaling pretraining data is basically not going to work anymore, and the value is all in RL? And if the latter, any indication of how much more scale is realistically possible before you need more compute? E.g., is o3 basically GPT-2 in scale, or is it > GPT-4?
0
0
1
@AlwaysUhhJustin
Justin
17 hours
RT @polynoamial: I’m excited to see academics pursuing radically different approaches to scaling inference compute. RL on CoT is one way, b…
0
51
0
@AlwaysUhhJustin
Justin
17 hours
@GaryMarcus @elonmusk You just mean that the US won't sign the Paris Treaty unless those statements are removed, right? Like, everyone can still comment on and talk about it on the platform Musk owns?
0
0
0
@AlwaysUhhJustin
Justin
17 hours
This is interesting. I just see it as very intuitive: if AI is at 70 IQ, it can't really do much of value. If it is 100 IQ, maybe there are some great uses. If it is 180 IQ, wow, everything is going to be done by AI. And at 300… and 3,000… But progress from fly to human was a lot more, in % terms, than 70 to 180 IQ.
0
0
2
@AlwaysUhhJustin
Justin
18 hours
@jaxgriot @slow_developer My CTO lives in SF and I'm out there a lot. I also trade DMs quite a bit with people at the labs. They're pretty secretive overall, but in 2023/2024 people actually believed OpenAI had achieved AGI internally, and it's just so ridiculous.
0
0
2
@AlwaysUhhJustin
Justin
1 day
@slow_developer Yeah, I remember the video. But then they didn't finish the pretraining until around Jan 1. Then everyone was like "it's Elon, so they'll do post-training in like 2 weeks!" But realistically I'd expect it's ready to go by EOM Feb. Maybe more delays from adding a Reasoner.
0
0
2
@AlwaysUhhJustin
Justin
1 day
@slow_developer Oh, I think xAI just needs to finish post-training. It took OpenAI like 7 months for GPT-4. xAI started around Jan 1. I think they'll finish around EOM. They might wait a bit longer to add a Reasoner before the release.
1
0
0
@AlwaysUhhJustin
Justin
5 days
RT @VisionaryxAI: Deep Research : Google vs OpenAI
Tweet media one
0
1
0
@AlwaysUhhJustin
Justin
5 days
@IamMr_das What is the best evidence it will solve the RH?
0
0
1
@AlwaysUhhJustin
Justin
5 days
@OfficialLoganK Is Pro likely to add Reasoning? We're excited to be adding Flash in several areas. Looking forward to adding Flash Thinking when available.
0
0
0
@AlwaysUhhJustin
Justin
7 days
@kimmonismus This is substantially unlikely. When these labs have a product that is ready to be released, they release it. They just had to do a shitload of post-training on Sonnet 3.5 (that is part of why it is so good).
0
0
2