![Hadas Orgad Profile](https://pbs.twimg.com/profile_images/1529793011554603009/z4xfkv9W_x96.jpg)
Hadas Orgad
@OrgadHadas
Followers: 382 · Following: 468 · Statuses: 152
PhD student (Natural Language Processing) @ Technion, Israel. Interested in AI interpretability, robustness, and safety.
Joined April 2019
Our code for "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations" is now available! Use our implementation to probe the internal representations of LLMs and explore the insights from our work. Check it out here:
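For readers curious what such probing looks like in practice, here is a minimal, hypothetical sketch of the general technique (a linear probe trained on an LLM's intermediate hidden states). This is not the released implementation; the model name, layer index, and labeled data are all placeholder assumptions.

```python
# Hypothetical sketch of a truthfulness probe over LLM hidden states.
# NOT the paper's released code: model name, layer choice, and the
# labeled examples are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def hidden_state(text, layer=16):
    """Hidden state of the final token at one intermediate layer."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: tuple of (num_layers + 1) tensors, (batch, seq, dim)
    return out.hidden_states[layer][0, -1]

def train_probe(texts, labels, layer=16):
    """texts: model-generated answers; labels: 1 = correct, 0 = hallucinated."""
    X = torch.stack([hidden_state(t, layer) for t in texts]).float().numpy()
    return LogisticRegression(max_iter=1000).fit(X, labels)
```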
Hallucinations are a subject of much interest, but how much do we know about them? In our new paper, we found that the internals of LLMs contain far more information about truthfulness than we knew! 🧵 Project page >> Arxiv >>
This was the result of a fun collaboration with @michael_toker, @zorikgekhman, @roireichart, Idan Szpektor (Google), Hadas Kotek (Apple), and @boknilev
RT @jbhuang0604: Me to students: "Writing a research paper sharpens your mind, trains you to think critically, and helps you communicate ef…
At #EMNLP2024 and interested in LLM hallucinations? Make sure you catch Zorik's talk tomorrow: "Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?"
At #EMNLP2024? Join me in the Language Modeling 1 session tomorrow, 11:00-11:15, for a talk on how fine-tuning with new knowledge impacts hallucinations.
If you're at #EMNLP2024, I highly recommend catching Joe for a conversation to hear about his work on atomic inference.
Flying out tomorrow to #EMNLP2024! Please come and say hi sometime! I'd love to chat and hear about your research. I look a bit like my profile pic, except older now.
@abuchanlife I hope the community finds this implementation useful as a starting point for exploring how truthfulness is encoded in LLMs' internal representations.
RT @RoyiRassin: How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure base…
RT @mor_ventura95: Our paper "Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models" is accepted t…
RT @mariusmosbach: I'll be at #EMNLP2024 next week to present our work on the impact of interpretability and analysis research on NLP. If y…
RT @BenHagag20: Here's a thread on Responsible NLP papers. ACL was a couple of months ago, but these @aclmeeting papers on fairness and re…
RT @amuuueller: I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve…
@VentureBeat See full paper thread here
Hallucinations are a subject of much interest, but how much do we know about them? In our new paper, we found that the internals of LLMs contain far more information about truthfulness than we knew! 🧵 Project page >> Arxiv >>
Excited that @dair_ai liked our recent paper, "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations"
The Top ML Papers of the Week (Sep 30 - Oct 6):
- Movie Gen
- RATIONALYST
- An Analysis of o1-preview
- Were RNNs All We Needed?
- LLMs Know More Than They Show
- Not All LLM Reasoners Are Created Equal
Read on for more:
Thank you, @omarsar0, for this review of our paper!
LLMs Know More Than They Show

We know very little about how and why LLMs "hallucinate," but it's an important topic nonetheless.

This new paper finds that the "truthfulness" information in LLMs is concentrated in specific tokens. This insight can help enhance error detection performance and further mitigate some of these issues. They also claim that internal representations can be used to predict the types of errors the LLMs are likely to make.

Interesting quote: "We reveal a discrepancy between LLMs' internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one."
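To make the "concentrated in specific tokens" point concrete, here is a hedged sketch of reading the hidden state at the exact-answer token rather than at the final token. The token-alignment step is a simplifying assumption of mine, not the paper's method, and the model and layer are again placeholders.

```python
# Hedged illustration of the "specific tokens" idea: probe the position
# where the exact answer appears, not the sequence's final token.
# The alignment below is a simplifying assumption, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def exact_answer_state(prompt, answer, layer=16):
    """Hidden state at the first token of `answer` in `prompt + answer`."""
    inputs = tok(prompt + " " + answer, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # Assume the answer's tokens form the tail of the sequence; real
    # pipelines align answer spans via character offsets instead.
    n_answer = len(tok(answer, add_special_tokens=False)["input_ids"])
    first_pos = inputs["input_ids"].shape[1] - n_answer
    return out.hidden_states[layer][0, first_pos]
```

Comparing a probe trained on these states with one trained on last-token states is one way to check where the truthfulness signal sits.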
RT @RoyiRassin: 🚨🚨🚨 Cool new paper alert! We study the ability of text-to-image models (SD) to learn copyrighted concepts and find the Imit…
@SumoTail We did not touch CoT in this work. Interesting subject, though: errors in CoT have their own specific complications.
@junteng88716710 Thanks, I enjoyed reading your paper! I'll also cite it in a revision of our work. It's encouraging to see that diversifying the training tasks improves the robustness of error detectors. I wonder if we can characterize skill-specific features with these types of experiments.