![Hadas Orgad Profile](https://pbs.twimg.com/profile_images/1529793011554603009/z4xfkv9W_x96.jpg)
Hadas Orgad
@OrgadHadas
Followers: 382 · Following: 468 · Statuses: 152
PhD student (Natural Language Processing) @ Technion, Israel. Interested in AI interpretability, robustness, and safety.
Joined April 2019
Our code for "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations" is now available! Use our implementation to probe the internal representations of LLMs and explore the insights from our work. Check it out here:
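For readers curious what such probing looks like in practice, here is a minimal, hypothetical sketch of the general technique (a linear probe trained on an LLM's intermediate hidden states). This is not the released implementation; the model name, layer index, and labeled data are all placeholder assumptions.

```python
# Hypothetical sketch of a truthfulness probe over LLM hidden states.
# NOT the paper's released code: model name, layer choice, and the
# labeled examples are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def hidden_state(text, layer=16):
    """Hidden state of the final token at one intermediate layer."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: tuple of (num_layers + 1) tensors, (batch, seq, dim)
    return out.hidden_states[layer][0, -1]

def train_probe(texts, labels, layer=16):
    """texts: model-generated answers; labels: 1 = correct, 0 = hallucinated."""
    X = torch.stack([hidden_state(t, layer) for t in texts]).float().numpy()
    return LogisticRegression(max_iter=1000).fit(X, labels)
```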
Hallucinations are a subject of much interest, but how much do we know about them? In our new paper, we found that the internals of LLMs contain far more information about truthfulness than we knew! 🧵 Project page >> Arxiv >>
This was the result of a fun collaboration with @michael_toker, @zorikgekhman, @roireichart, Idan Szpektor (Google), Hadas Kotek (Apple), and @boknilev
RT @jbhuang0604: Me to students: "Writing a research paper sharpens your mind, trains you to think critically, and helps you communicate ef…
At #EMNLP2024 and interested in LLM hallucinations? Make sure you catch Zorik's talk tomorrow: "Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?"
At #EMNLP2024? Join me in the Language Modeling 1 session tomorrow, 11:00-11:15, for a talk on how fine-tuning with new knowledge impacts hallucinations.
If you're at #EMNLP2024, I highly recommend catching Joe for a conversation to hear about his work on atomic inference.
Flying out tomorrow to #EMNLP2024! Please come and say hi sometime! I'd love to chat and hear about your research. I look a bit like my profile pic, except older now.
@abuchanlife I hope the community finds this implementation useful as a starting point for exploring how truthfulness is encoded in LLMs' internal representations.
RT @RoyiRassin: How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure base…
RT @mor_ventura95: Our paper "Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models" is accepted t…
RT @mariusmosbach: I'll be at #EMNLP2024 next week to present our work on the impact of interpretability and analysis research on NLP. If y…
RT @BenHagag20: Here's a thread on Responsible NLP papers. ACL was a couple of months ago, but these @aclmeeting papers on fairness and re…
RT @amuuueller: I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve…
@VentureBeat See full paper thread here
Hallucinations are a subject of much interest, but how much do we know about them? In our new paper, we found that the internals of LLMs contain far more information about truthfulness than we knew! 🧵 Project page >> Arxiv >>
Excited that @dair_ai liked our recent paper, "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations"
The Top ML Papers of the Week (Sep 30 - Oct 6):
- Movie Gen
- RATIONALYST
- An Analysis of o1-preview
- Were RNNs All We Needed?
- LLMs Know More Than They Show
- Not All LLM Reasoners Are Created Equal
Read on for more:
Thank you, @omarsar0, for this review of our paper!
LLMs Know More Than They Show

We know very little about how and why LLMs "hallucinate," but it's an important topic nonetheless.

This new paper finds that the "truthfulness" information in LLMs is concentrated in specific tokens. This insight can help enhance error detection performance and further mitigate some of these issues. They also claim that internal representations can be used to predict the types of errors the LLMs are likely to make.

Interesting quote: "We reveal a discrepancy between LLMs' internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one."
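To make the "concentrated in specific tokens" point concrete, here is a hedged sketch of reading the hidden state at the exact-answer token rather than at the final token. The token-alignment step is a simplifying assumption of mine, not the paper's method, and the model and layer are again placeholders.

```python
# Hedged illustration of the "specific tokens" idea: probe the position
# where the exact answer appears, not the sequence's final token.
# The alignment below is a simplifying assumption, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def exact_answer_state(prompt, answer, layer=16):
    """Hidden state at the first token of `answer` in `prompt + answer`."""
    inputs = tok(prompt + " " + answer, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # Assume the answer's tokens form the tail of the sequence; real
    # pipelines align answer spans via character offsets instead.
    n_answer = len(tok(answer, add_special_tokens=False)["input_ids"])
    first_pos = inputs["input_ids"].shape[1] - n_answer
    return out.hidden_states[layer][0, first_pos]
```

Comparing a probe trained on these states with one trained on last-token states is one way to check where the truthfulness signal sits.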
RT @RoyiRassin: 🚨🚨🚨 Cool new paper alert! We study the ability of text-to-image models (SD) to learn copyrighted concepts and find the Imit…
@SumoTail We did not touch CoT in this work. Interesting subject, though: errors in CoT have their own specific complications.
@junteng88716710 Thanks, I enjoyed reading your paper! I'll also cite it in a revision of our work. It's encouraging to see that diversifying the training tasks improves the robustness of error detectors. I wonder if we can characterize skill-specific features with these types of experiments.