Hadas Orgad Profile
Hadas Orgad

@OrgadHadas

Followers: 382
Following: 468
Statuses: 152

PhD student (Natural Language Processing) @ Technion, Israel. Interested in AI interpretability, robustness, and safety.

Joined April 2019
@OrgadHadas
Hadas Orgad
3 months
Our code for "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations" is now available! Utilize our implementation to probe the internal representations of LLMs and explore the insights found in our work. Check it out here:
@OrgadHadas
Hadas Orgad
4 months
Hallucinations are a subject of much interest, but how much do we know about them? In our new paper, we found that the internals of LLMs contain far more information about truthfulness than we knew! 🧵 Project page >> arXiv >>
Tweet media one
2
10
39
@OrgadHadas
Hadas Orgad
5 days
@hsvgbkhgbv Thanks, much appreciated!
0
0
0
@OrgadHadas
Hadas Orgad
7 days
This was the result of a fun collaboration with @michael_toker @zorikgekhman @roireichart, Idan Szpektor (Google), Hadas Kotek (Apple), and @boknilev
0
1
4
@OrgadHadas
Hadas Orgad
3 months
RT @jbhuang0604: Me to students: "Writing a research paper sharpens your mind, trains you to think critically, and helps you communicate ef…
0
35
0
@OrgadHadas
Hadas Orgad
3 months
At #EMNLP2024 and interested in LLM hallucinations? Make sure you catch Zorik's talk tomorrow: "Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?"
@zorikgekhman
Zorik Gekhman
3 months
At #EMNLP2024? Join me in the Language Modeling 1 session tomorrow, 11:00-11:15, for a talk on how fine-tuning with new knowledge impacts hallucinations.
0
1
8
@OrgadHadas
Hadas Orgad
3 months
If you're at #EMNLP2024, I highly recommend catching Joe for a conversation and hearing about his work on atomic inference
@_joestacey_
Joe Stacey
3 months
Flying out tomorrow to #EMNLP2024 😊 please come and say hi sometime! I'd love to chat and hear about your research. I look a bit like my profile pic, except older now 😂😂
0
1
6
@OrgadHadas
Hadas Orgad
3 months
@abuchanlife I hope that the community will find this implementation useful as a starting point for exploring the truthfulness encoding in internal LLM representations
0
0
0
@OrgadHadas
Hadas Orgad
3 months
RT @RoyiRassin: How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure base…
0
30
0
@OrgadHadas
Hadas Orgad
3 months
RT @mor_ventura95: 🎉 Our paper "Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-to-Image Models" is accepted t…
0
12
0
@OrgadHadas
Hadas Orgad
3 months
RT @mariusmosbach: I'll be at #EMNLP2024 next week to present our work on the impact of interpretability and analysis research on NLP. If y…
0
11
0
@OrgadHadas
Hadas Orgad
3 months
RT @BenHagag20: Here's a thread on Responsible NLP papers. ACL was a couple of months ago, but these @aclmeeting papers on fairness and re…
0
2
0
@OrgadHadas
Hadas Orgad
3 months
RT @amuuueller: I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve…
0
192
0
@OrgadHadas
Hadas Orgad
3 months
@VentureBeat See full paper thread here
@OrgadHadas
Hadas Orgad
4 months
Hallucinations are a subject of much interest, but how much do we know about them? In our new paper, we found that the internals of LLMs contain far more information about truthfulness than we knew! 🧵 Project page >> arXiv >>
Tweet media one
0
0
0
@OrgadHadas
Hadas Orgad
4 months
Excited that @dair_ai liked our recent paper, "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations"
@dair_ai
DAIR.AI
4 months
The Top ML Papers of the Week (Sep 30 - Oct 6):
- Movie Gen
- RATIONALYST
- An Analysis of o1-preview
- Were RNNs All We Needed?
- LLMs Know More Than They Show
- Not All LLM Reasoners Are Created Equal
Read on for more:
1
2
10
@OrgadHadas
Hadas Orgad
4 months
Thank you, @omarsar0, for this review of our paper!
@omarsar0
elvis
4 months
LLMs Know More Than They Show

We know very little about how and why LLMs "hallucinate" but it's an important topic nonetheless.

This new paper finds that the "truthfulness" information in LLMs is concentrated in specific tokens. This insight can help enhance error detection performance and further mitigate some of these issues.

They also claim that internal representations can be used to predict the types of errors the LLMs are likely to make.

Interesting quote: "We reveal a discrepancy between LLMs' internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one."
Tweet media one
0
0
7
@OrgadHadas
Hadas Orgad
4 months
RT @RoyiRassin: 🚨🚨🚨 cool new paper alert 😎 We study the ability of text-to-image models (SD) to learn copyrighted concepts and find the Imit…
0
13
0
@OrgadHadas
Hadas Orgad
4 months
@SumoTail We did not touch CoT in this work. Interesting subject, though; errors in CoT have their own specific complications.
0
0
0
@OrgadHadas
Hadas Orgad
4 months
@junteng88716710 Thanks, I enjoyed reading your paper! I'll also cite it in a revision of our work. It's encouraging to see that diversifying the training tasks improves the robustness of error detectors. I wonder if we can characterize skill-specific features with these types of experiments.
1
0
1