Sasha Goldshtein

@goldshtn

Followers
4K
Following
124
Statuses
8K

Software Engineer at Google Research. I work on Gemini factuality. Opinions my own. He/him.

Israel
Joined March 2010
@goldshtn
Sasha Goldshtein
18 days
@tagir_valeev Null and not right 😂😂😂
0
0
0
@goldshtn
Sasha Goldshtein
2 months
Yeah. When I was first learning C#, I remember writing an interface called IAlgorithm which was basically a function pointer, and then another IAlgorithmProvider which had a single implementation and… you see where I’m going.
@GrantSlatton
Grant Slatton
2 months
when you teach a kid the rules of chess, they'll start by making legal but basically random moves. junior engineers are like this: if they ever read a book about "design patterns" too early, they'll just apply them randomly: singleton here, factory builder there, etc.
0
0
2
@goldshtn
Sasha Goldshtein
2 months
RT @shubadubadub: How do we ensure humans can still effectively oversee increasingly powerful AI systems? In our blog, we argue that achiev…
0
18
0
@goldshtn
Sasha Goldshtein
2 months
@yoavgo Much less.
1
0
1
@goldshtn
Sasha Goldshtein
2 months
@yoavgo Help me help you ;)
1
0
0
@goldshtn
Sasha Goldshtein
2 months
@levelsio LOL. I took a bus to the bank and got the bank check and took a bus back home. Then it was in my lawyer’s wallet while he was eating brunch with his kids 😂
0
0
0
@goldshtn
Sasha Goldshtein
2 months
@yoavgo What would you call “huge”?
1
0
0
@goldshtn
Sasha Goldshtein
2 months
@ZuckermanRoy Yes, yes, I completely agree with you that it's better to live in liberal countries 🫶
0
0
1
@goldshtn
Sasha Goldshtein
2 months
@ZuckermanRoy Not 2x. On the order of 400 thousand dollars, I think. I think that if you factor in cost of living etc., it's probably 30% more in real terms? You know more than I do on this.
0
0
1
@goldshtn
Sasha Goldshtein
2 months
@ZuckermanRoy Yes, I agree that half a million dollars is also a good life. By the way, for context, at the GAMFA companies in Israel total compensation is very high, and it doesn't represent what happens in most of high-tech. For example, a level 5 in Israel can easily get a total comp of a million shekels. And you can live really nicely in Israel on that salary 😇
2
0
1
@goldshtn
Sasha Goldshtein
2 months
Today we published FACTS Grounding, a benchmark and leaderboard for evaluating the factuality of LLMs when grounding to the input context. The leaderboard is on Kaggle and we plan to maintain it and track progress.
1
8
26
@goldshtn
Sasha Goldshtein
2 months
@guywiener Can you send me your CV in a DM? Or to the same username at Gmail.
0
0
0
@goldshtn
Sasha Goldshtein
2 months
Yoo-hoo ✨🏆
@ofermend
Ofer Mendelevitch
2 months
A big congrats to the Google Gemini team for the release of Gemini-2.0-Flash today - great quality metrics all around, including 1.3% hallucination rate on @vectara HHEM leaderboard
0
0
2
@goldshtn
Sasha Goldshtein
2 months
RT @dipanjand: Throughout this year, we have had a razor focus on improving the factual accuracy of Gemini models' responses in various sce…
0
24
0
@goldshtn
Sasha Goldshtein
2 months
And also, our new Gemini 2.0 Flash model is top of the Vectara hallucination leaderboard. This is a 2.5x drop in hallucination rate since our last release.
0
0
22
@goldshtn
Sasha Goldshtein
2 months
Such a good model.
@lmarena_ai
lmarena.ai (formerly lmsys.org)
2 months
Breaking News from Chatbot Arena⚡ @GoogleDeepMind Gemini-2.0-Flash debuts at #3 Overall - a massive leap from Flash-002! Highlights (improvement from Flash-002): - Overall: #11 -> #3 - Hard Prompts: #15 -> #2 - Coding: #22 -> #3 - Longer query: #8 -> #1 - Overall style-controlled: #19 -> #3 - Hard style-controlled: #25 -> #2 The pace of improvement is absolutely astounding! Excited to see the new wave of applications powered by Flash. More analysis below👇
0
0
4
@goldshtn
Sasha Goldshtein
2 months
We have made more improvements since the previous experimental model 🔥 honestly, it is hard to keep track even internally. But this is a really good model on many many things.
@lmarena_ai
lmarena.ai (formerly lmsys.org)
2 months
Big news on Chatbot Arena 🔥 The new @GoogleDeepMind model gemini-exp-1206 is crushing it, and the race is heating up. Google is back in the #1 spot 🏆overall and tied with O1 for the top coding model! Highlights (improvement since gemini-exp-1121 in parentheses) - First place overall (2->1) - Tied with GPT-4o-1120 after style control (4->1) - Tied with O1 on coding leaderboard (3->1) - First place on hard prompts (2->1) Keep it up @GoogleDeepMind! The rate of progress is crazy. For analysis and to test the model, see below 👇
0
0
3
@goldshtn
Sasha Goldshtein
3 months
@MBlumenblat Whoa, I did guard duty there too! And I brought my parents to a guest cabin on one of the weekends so that I'd have food 😂😂
0
0
1