Sasha Goldshtein @goldshtn profile

Sasha Goldshtein

@goldshtn

Followers

4K

Following

124

Statuses

8K

Software Engineer at Google Research. I work on Gemini factuality. Opinions my own. He/him.

Israel

Joined March 2010


Sasha Goldshtein

@goldshtn

18 days

@tagir_valeev Null and not right 😂😂😂

0

Sasha Goldshtein

@goldshtn

2 months

Yeah. When I was first learning C#, I remember writing an interface called IAlgorithm which was basically a function pointer, and then another IAlgorithmProvider which had a single implementation and… you see where I’m going.

Grant Slatton

@GrantSlatton

2 months

when you teach a kid the rules of chess, they'll start by making legal but basically random moves. junior engineers are like this: if they ever read a book about "design patterns" too early, they'll just apply them randomly, singleton here, factory builder there, etc

0

2

Sasha Goldshtein

@goldshtn

2 months

RT @shubadubadub: How do we ensure humans can still effectively oversee increasingly powerful AI systems? In our blog, we argue that achiev…

0

18

0

Sasha Goldshtein

@goldshtn

2 months

@yoavgo Much less.

1

0

1

Sasha Goldshtein

@goldshtn

2 months

@yoavgo Help me help you ;)

1

0

Sasha Goldshtein

@goldshtn

2 months

@levelsio LOL. I took a bus to the bank and got the bank check and took a bus back home. Then it was in my lawyer’s wallet while he was eating brunch with his kids 😂

0

Sasha Goldshtein

@goldshtn

2 months

@yoavgo What would you call “huge”?

1

0

Sasha Goldshtein

@goldshtn

2 months

@ZuckermanRoy Yes, yes, I completely agree with you that it's better to live in liberal countries 🫶

0

1

Sasha Goldshtein

@goldshtn

2 months

@ZuckermanRoy Not 2x. On the order of 400 thousand dollars, in my opinion. I think that if you factor in cost of living and so on, it's probably about 30% more in real terms? You know more than I do here.

0

1

Sasha Goldshtein

@goldshtn

2 months

@ZuckermanRoy Yes, I agree that half a million dollars is also a good life. By the way, for context: at the GAMFA companies in Israel, total compensation is very high and doesn't represent what happens in most of high-tech. For example, a level 5 in Israel can easily get a total comp of a million shekels. And you can live really nicely in Israel on that salary 😇

2

0

1

Sasha Goldshtein

@goldshtn

2 months

Today we published FACTS Grounding, a benchmark and leaderboard for evaluating the factuality of LLMs when grounding to the input context. The leaderboard is on Kaggle and we plan to maintain it and track progress.

1

8

26

Sasha Goldshtein

@goldshtn

2 months

@guywiener Can you send me your CV in a DM? Or to the same username at Gmail.

0

Sasha Goldshtein

@goldshtn

2 months

Yoo-hoo ✨🏆

Ofer Mendelevitch

@ofermend

2 months

A big congrats to the Google Gemini team for the release of Gemini-2.0-Flash today - great quality metrics all around, including 1.3% hallucination rate on @vectara HHEM leaderboard

0

2

Sasha Goldshtein

@goldshtn

2 months

RT @dipanjand: Throughout this year, we have had a razor focus on improving the factual accuracy of Gemini models' responses in various sce…

0

24

0

Sasha Goldshtein

@goldshtn

2 months

And also, our new Gemini 2.0 Flash model is top of the Vectara hallucination leaderboard. This is a 2.5x drop in hallucination rate since our last release.

0

22

Sasha Goldshtein

@goldshtn

2 months

Such a good model.

lmarena.ai (formerly lmsys.org)

@lmarena_ai

2 months

Breaking News from Chatbot Arena⚡ @GoogleDeepMind Gemini-2.0-Flash debuts at #3 Overall - a massive leap from Flash-002!
Highlights (improvement from Flash-002):
- Overall: #11 → #3
- Hard Prompts: #15 → #2
- Coding: #22 → #3
- Longer query: #8 → #1
- Overall style-controlled: #19 → #3
- Hard style-controlled: #25 → #2
The pace of improvement is absolutely astounding! Excited to see the new wave of applications powered by Flash. More analysis below👇

0

4

Sasha Goldshtein

@goldshtn

2 months

We have made more improvements since the previous experimental model 🔥 honestly, it is hard to keep track even internally. But this is a really good model on many many things.

lmarena.ai (formerly lmsys.org)

@lmarena_ai

2 months

Big news on Chatbot Arena 🔥 The new @GoogleDeepMind model gemini-exp-1206 is crushing it, and the race is heating up. Google is back in the #1 spot 🏆overall and tied with O1 for the top coding model!
Highlights (improvement since gemini-exp-1121 in parentheses)
- First place overall (2->1)
- Tied with GPT-4o-1120 after style control (4->1)
- Tied with O1 on coding leaderboard (3->1)
- First place on hard prompts (2->1)
Keep it up @GoogleDeepMind! The rate of progress is crazy. For analysis and to test the model, see below 👇

0

3

Sasha Goldshtein

@goldshtn

3 months

@MBlumenblat Wow, I did guard duty there too! And I brought my parents to a guesthouse there on one of the weekends so that I'd have food 😂😂

0

1