Justin

@AlwaysUhhJustin

Followers
665
Following
39K
Statuses
10K

Founder/CEO of (funded) AI LegalTech startup. Interested in lots of stuff. Most of my followers are bots. Semi private so I can talk about what I want.

Dallas / Austin TX
Joined June 2011
@AlwaysUhhJustin
Justin
2 years
Want to make real progress? Here is a high-level blueprint for 18 big ideas that would substantially improve America, arguably none of which fall into either political party's partisan platform.
Tweet media one
2
0
22
@AlwaysUhhJustin
Justin
16 hours
RT @thomasahle: I find Meta’s original approach to hallucinations delightfully counter intuitive: 1. Extract factoid from training dataset…
0
21
0
@AlwaysUhhJustin
Justin
16 hours
@heyshrutimishra First 1-line test: Massive fail
Tweet media one
0
0
3
@AlwaysUhhJustin
Justin
16 hours
We have a few different evals, but the easy one for me to quickly test is basically: here is a complicated fact pattern; tell me all the relevant cases. We've already done the analysis to identify the most relevant cases, and we assign "points" for each case a model correctly identifies. The current frontier models (GPT-4o, o1, Gemini 2.0 Flash, Gemini 2.0 Pro, Sonnet 3.5) do reasonably well on Federal cases, okay on state-level cases in big states with newsworthy issues, and pretty poorly on smaller state-level issues. Ballpark, I'd say the group above scores 30%-50% and Chocolate is probably 60%. But I only tested 1 question of 1 eval to get a directional sense.
1
0
0
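The point-based scoring described above can be sketched in a few lines. This is a minimal illustration, not the actual eval harness: the function name, the weighting scheme, and the substring matching are all assumptions, and real case citations would need fuzzier matching than this.

```python
# Hypothetical sketch of a point-based case-identification scorer:
# a hand-built answer key maps each relevant case to the points it is
# worth, and a model earns points for each case it correctly names.

def score_answer(model_answer: str, relevant_cases: dict[str, int]) -> float:
    """Return the fraction of available points the answer earned.

    relevant_cases maps a case name to its point value (more central
    cases could be weighted higher). Matching here is naive substring
    matching; a real harness would normalize citations first.
    """
    answer = model_answer.lower()
    earned = sum(points for case, points in relevant_cases.items()
                 if case.lower() in answer)
    total = sum(relevant_cases.values())
    return earned / total if total else 0.0

# Example with made-up cases and weights:
cases = {"Smith v. Jones": 3, "Doe v. Roe": 2, "In re Acme": 1}
answer = "The most on-point authority is Smith v. Jones; see also In re Acme."
print(round(score_answer(answer, cases), 2))  # 4 of 6 points -> 0.67
```

A 30%-50% score in this scheme would mean a model surfaces the cases carrying roughly a third to half of the total points in the answer key.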
@AlwaysUhhJustin
Justin
17 hours
@iruletheworldmo I do think Chocolate probably *is* Grok 3
@AlwaysUhhJustin
Justin
17 hours
Rumor is that Grok 3 is "Chocolate" on Chatbot Arena. Elon Musk said he trained it with law (it was ambiguous whether that meant *all* laws from most countries, all US statutes, all US cases and statutes, etc.).

Anyway, I tried a complicated state-law eval we have. Got Chocolate eventually. It did okay. Better than anything else in market, but not amazing. The overall analysis was great. I asked for specific cases/statutes. The statutes were consistently right. The cases given were a mix of good/on-point, real but irrelevant, and 1 hallucinated case. It missed many of the best cases.

Most of the top models analyze pretty well (Grok was a bit better), miss a couple key statutes (Grok outperformed), and list ~2 real and useful cases (Grok slightly better).

Basically, I wouldn't be surprised either way whether this is Grok 3, but I'd bet 70/30 yes it is. Nothing else has this much pretraining data on law. (It kept crashing, so I shot a Loom and regenerated a couple times; the case summary is at the bottom.)
Tweet media one
0
0
7
@AlwaysUhhJustin
Justin
17 hours
Does it look like scaling pretraining data is basically not going to work anymore, and the value is all in RL? And if the latter, any indication of how much more scale is realistically possible before you need more compute? E.g., is o3 basically GPT-2 in scale, or is it > GPT-4?
0
0
1
@AlwaysUhhJustin
Justin
17 hours
RT @polynoamial: I’m excited to see academics pursuing radically different approaches to scaling inference compute. RL on CoT is one way, b…
0
51
0
@AlwaysUhhJustin
Justin
17 hours
@GaryMarcus @elonmusk You just mean that the US won't sign the Paris Treaty unless those statements are removed, right? Like, everyone can still comment on and talk about it on the platform Musk owns?
0
0
0
@AlwaysUhhJustin
Justin
17 hours
This is interesting. I just see it as very intuitive: if AI is at 70 IQ, it can't really do much of value. If it is 100 IQ, maybe there are some great uses. If it is 180 IQ, wow, everything is going to be done by AI. And at 300… and 3,000… But progress from fly to human was a lot more, in % terms, than 70 to 180 IQ.
0
0
2
@AlwaysUhhJustin
Justin
18 hours
@jaxgriot @slow_developer My CTO lives in SF and I'm out there a lot. I also trade DMs quite a bit with people at the labs. They're pretty secretive overall, but in 2023/2024 people actually believed OpenAI had achieved AGI internally, and it's just so ridiculous.
0
0
2
@AlwaysUhhJustin
Justin
1 day
@slow_developer Yeah, I remember the video. But then they didn't finish the pretraining until around Jan 1. Then everyone was like "it's Elon, so they'll do post-training in like 2 weeks!" But realistically I'd expect it's ready to go by EOM Feb. Maybe more delays from adding a Reasoner.
0
0
2
@AlwaysUhhJustin
Justin
1 day
@slow_developer Oh, I think xAI just needs to finish post-training. It took OpenAI like 7 months for GPT-4. xAI started around Jan 1. I think they'll finish around EOM. They might wait a bit longer to add a Reasoner before the release.
1
0
0
@AlwaysUhhJustin
Justin
5 days
RT @VisionaryxAI: Deep Research : Google vs OpenAI
Tweet media one
0
1
0
@AlwaysUhhJustin
Justin
5 days
@IamMr_das What is the best evidence it will solve the RH?
0
0
1
@AlwaysUhhJustin
Justin
5 days
@OfficialLoganK Is Pro likely to add Reasoning? We're excited to be adding Flash in several areas. Looking forward to adding Flash Thinking when available.
0
0
0
@AlwaysUhhJustin
Justin
7 days
@kimmonismus This is substantially unlikely. When these labs have a product that is ready to be released, they release it. They just had to do a shitload of post-training on Sonnet 3.5 (that is part of why it is so good).
0
0
2