Jie Huang

@jefffhj

Followers
4,998
Following
607
Media
56
Statuses
425

Building intelligence @xAI . PhD from UIUC CS

Joined July 2017
@jefffhj
Jie Huang
9 months
My research often begins with curiosity-driven questions: 1. Are LLMs Leaking Your Personal Information? () 2. Are LLMs Really Able to Reason? () 3. Can LLMs Self-Correct Their Reasoning? () 4. What is Missing
17
96
595
@jefffhj
Jie Huang
2 months
Life Update: joined @xai recently and witnessed the release of Grok-2. Amazed by the passion of the team and how fast we are moving. Join us at
9
18
459
@jefffhj
Jie Huang
2 years
Certainly, this paper deserves more attention.
Tweet media one
@OpenAI
OpenAI
2 years
We took ChatGPT offline Monday to fix a bug in an open source library that allowed some users to see titles from other users’ chat history. Our investigation has also found that 1.2% of ChatGPT Plus users might have had personal data revealed to another user. 1/2
859
1K
8K
4
78
405
@jefffhj
Jie Huang
1 year
Can LLMs Self-Correct Their Reasoning? Recent studies (self-refine, self-critique, etc.) suggest LLMs possess a great ability to self-correct their responses. However, our research indicates LLMs cannot self-correct their reasoning intrinsically. [1/n]
Tweet media one
5
69
391
@jefffhj
Jie Huang
2 years
New survey paper! We discuss "reasoning" in large language models. Reasoning is a fundamental aspect of human intelligence. We provide an overview of the current state of knowledge on reasoning in LLMs. Survey: Paperlist:
Tweet media one
8
75
340
@jefffhj
Jie Huang
1 year
Interesting papers that share many findings with our recent work (), in which we argue that "Large Language Models Cannot Self-Correct Reasoning Yet". I'm happy (as an honest researcher) and sad (as an AGI enthusiast) to see our conclusions confirmed
Tweet media one
@rao2z
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
1 year
Can LLMs really self-critique (and iteratively improve) their solutions, as claimed in the literature?🤔 Two new papers from our group investigate (and call into question) these claims in reasoning () and planning () tasks.🧵 1/
Tweet media one
Tweet media two
22
225
954
7
63
338
@jefffhj
Jie Huang
1 year
I'm on the job market and open to all sorts of fun stuff! Feel free to reach out and excite me 😀 Short intro: I work on various aspects of LLMs, including factuality, reasoning, safety, RAG, etc. (). I'm a time manipulator who handles both research &
23
30
324
@jefffhj
Jie Huang
8 months
Amazed by how fast Groq is? Want to make your LLM inference even faster? We propose Cascade Speculative Drafting, a speculative execution algorithm that chains multiple draft models in cascades, achieving up to an 81% additional speedup over speculative decoding in our
Tweet media one
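The idea can be sketched with a toy greedy version of speculative decoding (the cascading of multiple drafters in the paper is an extension of this verify-and-accept loop; the single-draft setup and all function names below are illustrative, not the paper's implementation):

```python
from typing import Callable, Tuple

Token = int
Model = Callable[[Tuple[Token, ...]], Token]  # prefix -> next token (greedy)

def greedy_decode(model: Model, prompt: Tuple[Token, ...], n: int):
    """Plain autoregressive decoding: one model call per token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(tuple(out)))
    return out[len(prompt):]

def speculative_decode(target: Model, draft: Model,
                       prompt: Tuple[Token, ...], n: int, k: int = 4):
    """The cheap draft proposes k tokens; the target verifies them and keeps
    the longest agreeing prefix, substituting its own token at the first
    disagreement. The output is identical to greedy decoding with the target
    alone; the speedup comes from verifying k draft tokens per target pass
    (simulated token-by-token here). Cascading would itself accelerate the
    draft with an even smaller drafter."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        spec = []
        for _ in range(k):
            spec.append(draft(tuple(out + spec)))
        for tok in spec:
            correct = target(tuple(out))
            out.append(correct)            # target's token is always kept
            if tok != correct:
                break                      # rest of the draft is discarded
    return out[len(prompt):len(prompt) + n]
```

Whatever the draft proposes, every emitted token equals the target's greedy choice, which is why speculative methods preserve output quality while cutting latency.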
8
41
249
@jefffhj
Jie Huang
1 year
📢We are excited to introduce RAVEN, a retrieval-augmented encoder-decoder language model. Despite having substantially fewer parameters than top models, RAVEN delivers competitive results in knowledge-intensive tasks through in-context learning. 👉 🧵⬇️
Tweet media one
5
60
247
@jefffhj
Jie Huang
6 months
Successfully defended my thesis today. It's been an incredible journey, filled with both fun and challenges beyond what I could have imagined as an undergraduate 4 years ago. Haven't decided on my next step yet, but I assume there will be plenty of fun and much to learn there:)
27
1
205
@jefffhj
Jie Huang
29 days
A reflection of the "reflection": - CoT distilled from larger model + Multiple Generations + Voting - CoT + Multiple Generations + Voting - CoT + Single Generation (But Multiple Responses in the Generation) + Voting I guess the "reflection" here is essentially CoT (or thinking
@xiangyue96
Xiang Yue✈️@COLM 2024
30 days
I personally think the key factor may not be the "reflection" technique itself, but rather the additional tokens it forces the model to generate. This concept could be connected to [PAUSE] tokens, as described in . By generating more tokens, the model
11
14
188
3
23
197
@jefffhj
Jie Huang
1 year
Universal Reasons to Reject NLP Papers If your paper is not based on LLMs: not meaningful in the era of LLMs If your paper is based on LLMs: no technical depth If your paper is about training LLMs themselves: no novelty
3
27
187
@jefffhj
Jie Huang
1 year
Just marked the end of my first month's internship at Google @DeepMind by submitting my first project, which also happens to be my first project with potential for direct application in enhancing a real product, #Bard . 😇 This experience differs greatly from merely writing a
Tweet media one
4
9
185
@jefffhj
Jie Huang
7 months
Xinyun ( @xinyun_chen_ ) and I have updated a version of our paper, which will appear at #ICLR2024 . In summary: 1. Current LLMs Cannot Self-Correct Reasoning Intrinsically (tested on ChatGPT/GPT-4 and Llama-2 using various prompts). 2. Multi-Agent Debate Does Not Outperform
@jefffhj
Jie Huang
1 year
Can LLMs Self-Correct Their Reasoning? Recent studies (self-refine, self-critique, etc.) suggest LLMs possess a great ability to self-correct their responses. However, our research indicates LLMs cannot self-correct their reasoning intrinsically. [1/n]
Tweet media one
5
69
391
3
25
177
@jefffhj
Jie Huang
1 year
I authored a critique paper titled "Large Language Models Cannot Self-Correct Reasoning Yet" () 20 days ago. I’ve observed two distinct groups misinterpreting the content in two different ways: For LLM Critics: "LLMs Cannot Self-Correct Reasoning" !=
@ylecun
Yann LeCun
1 year
Anyone who thinks Auto-Regressive LLMs are getting close to human-level AI, or merely need to be scaled up to get there, *must* read this. AR-LLMs have very limited reasoning and planning abilities. This will not be fixed by making them bigger and training them on more data.
73
312
2K
5
30
158
@jefffhj
Jie Huang
6 months
Last year, when I interned at @GoogleDeepMind , my first task was to improve the factuality of Bard (now called Gemini). I'm very happy to see that part of my internship work has now been refined as the SAFE system. To enhance factuality, the first step is to develop a method for
@JerryWeiAI
Jerry Wei
6 months
New @GoogleDeepMind + @Stanford paper! 📜 How can we benchmark long-form factuality in language models? We show that LLMs can generate a large dataset and are better annotators than humans, and we use this to rank Gemini, GPT, Claude, and PaLM-2 models.
Tweet media one
9
77
368
2
13
157
@jefffhj
Jie Huang
1 year
A typical 10 days for a PhD student at UIUC after returning from CA: • Conduct experiments for an ongoing project (2 days) • Draft thesis for preliminary exam from scratch (1 day) • Prepare slides for preliminary exam from scratch (1 day) • Write rebuttals (or assist in
3
5
149
@jefffhj
Jie Huang
1 year
Another typical 10 days for a PhD student after his typical 10 days: • Draft 80% of a paper (2 days) • Conduct experiments (3 days) • Draft a grant proposal with Kevin • Prepare for and pass the PhD preliminary exam • Attend 7 project meetings • Prepare and deliver two
@jefffhj
Jie Huang
1 year
A typical 10 days for a PhD student at UIUC after returning from CA: • Conduct experiments for an ongoing project (2 days) • Draft thesis for preliminary exam from scratch (1 day) • Prepare slides for preliminary exam from scratch (1 day) • Write rebuttals (or assist in
3
5
149
8
5
120
@jefffhj
Jie Huang
1 year
Today is my last day as a research intern at @GoogleDeepMind . It has truly been an incredible experience working with my hosts @xinying_song @denny_zhou & many other brilliant minds. I will be staying in the Bay Area for the next week and will soon be on the job market.
Tweet media one
4
0
118
@jefffhj
Jie Huang
1 year
In a conversation with my advisor: Me: May I graduate even earlier (< 4 years for PhD) Kevin: Absolutely, as long as it aligns with the university's policies. It's a good time for you to start your new journey. This inspired me to engage in more impactful research with Kevin :)
1
1
112
@jefffhj
Jie Huang
1 year
6 papers were accepted to #EMNLP2023 (including findings and demos). Congrats to all of my students and collaborators. Interestingly, all of my PhD work so far has been published at ACL/EMNLP. I rarely submitted papers to other venues, though I do more now :)
5
0
114
@jefffhj
Jie Huang
1 year
New position paper! 🔥 We position "citation" as the key to building responsible and accountable large language models - enhancing content transparency and verifiability, while mitigating IP and ethical dilemmas in LLMs such as #ChatGPT . 👉 🧵⬇️
Tweet media one
4
23
96
@jefffhj
Jie Huang
1 year
A well-written blog by @Francis_YAO_ ! Additionally, our survey on reasoning in LLMs has been accepted to #ACL2023 (Findings, though it received an average review score of 4 and a best paper recommendation 🫠). I will post an updated version soon :)
Tweet media one
@Francis_YAO_
Yao Fu
1 year
The core differences between GPT4 and 3.5 is the ability to perform complex tasks. In this post, we present a complete roadmap towards LLMs complex reasoning abilities, covering the full development stages: pretraining, SFT, RL, CoT prompting, and eval.
17
175
751
3
15
86
@jefffhj
Jie Huang
1 year
Descriptive Knowledge Graph in Biomedical Domain () We present a system for biomedical knowledge discovery that automatically extracts and generates descriptive sentences from the biomedical corpus, facilitating efficient search for relational knowledge.
Tweet media one
3
12
85
@jefffhj
Jie Huang
1 year
LLM Evaluation Research? Research on evaluating LLMs requires more than just selecting benchmarks, reporting numbers, and drawing obvious conclusions. One should delve deeper, uncovering truly valuable *insights* that progress our understanding and application of these models.
4
11
82
@jefffhj
Jie Huang
1 year
I'd like to reintroduce a paper I wrote last year (), which mirrors and helps explain a bit of the Reversal Curse (). It found that LLMs perform quite well in verbatim recovery (memorization) but struggle to associate relevant
Tweet media one
@OwainEvans_UK
Owain Evans
1 year
Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
Tweet media one
175
707
4K
4
10
81
@jefffhj
Jie Huang
1 year
Slides for my presentation today at @UofT . Towards Responsible and Accountable Large Language Models: Navigating Privacy Risks and Ethical Challenges
Tweet media one
1
13
75
@jefffhj
Jie Huang
1 year
Why Does ChatGPT Fall Short in Answering Questions Faithfully? - Insufficient knowledge? - Difficulty associating questions with internal knowledge? - Limited reasoning abilities? 👉 Explore more in our study:
3
13
74
@jefffhj
Jie Huang
1 year
🔥🔥🔥Exciting News!!! I'm thrilled to share that I'll be continuing into the 4th year of my PhD at @IllinoisCS this fall (after finishing the 3rd year of my PhD at @IllinoisCS 👍👍👍)!!!! I am so grateful for the support from my advisors, families, and friends!!!!! 🎉🎉🎉👏👏👏
2
2
69
@jefffhj
Jie Huang
9 months
Thanks for the invitation! Happy to share our work on self-correction & reasoning in LLMs. My presentation will mainly revolve around two questions: 1) Are LLMs really able to self-correct their responses? 2) Are LLMs really able to reason? Note: Both questions don't have simple
@CohereForAI
Cohere For AI
9 months
Our community-led Geo Regional Asia Group is excited to welcome @jefffhj on Wednesday, January 17th to present "Large Language Models Cannot Self-Correct Reasoning Yet." Learn more and add this event to your calendar:
Tweet media one
2
5
16
2
11
67
@jefffhj
Jie Huang
1 month
Grok-2 is now officially in the Chatbot Arena leaderboard and ranked at the #2 spot! Feel free to share your feedback. Stay tuned for what we have coming next ;)
@lmarena_ai
lmarena (formerly lmsys.org)
1 month
Chatbot Arena update❤️‍🔥 Exciting news— @xAI 's Grok-2 and Grok-mini are now officially on the leaderboard! With over 6000 community votes, Grok-2 has claimed the #2 spot, surpassing GPT-4o (May) and tying with the latest Gemini! Grok-2-mini also impresses at #5 . Grok-2 excels in
Tweet media one
194
530
2K
2
3
68
@jefffhj
Jie Huang
1 year
Three papers were accepted to #ACL2023 . See you in Toronto!🍁 👉Preprints: Reasoning in LLMs: Diversified Generative Commonsense Reasoning: Specificity in Language Models: #LLMs #Reasoning
Tweet media one
1
7
65
@jefffhj
Jie Huang
8 months
Recent advancements in the field have challenged the intrinsic value of most conventional academic pursuits. Regarding this, I feel a lack of motivation to submit papers, share on social media, or engage in paper reviewing, unless there is truly something important: • For
3
9
63
@jefffhj
Jie Huang
1 year
Unfortunately, all three LLM evaluation papers I reviewed for #EMNLP2023 simply selected several benchmarks, reported numbers, and drew obvious conclusions. They provided almost no "insights". To be honest, I could guess the conclusions just from the title of the paper...
@jefffhj
Jie Huang
1 year
LLM Evaluation Research? Research on evaluating LLMs requires more than just selecting benchmarks, reporting numbers, and drawing obvious conclusions. One should delve deeper, uncovering truly valuable *insights* that progress our understanding and application of these models.
4
11
82
2
7
62
@jefffhj
Jie Huang
1 year
A relevant discussion is "Are Large Language Models Really Able to Reason?" – a question I posed in our reasoning survey last year (). My opinion is not to treat this question as a binary yes/no. The model certainly has some ability to reason, though it's
Tweet media one
@jefffhj
Jie Huang
1 year
I authored a critique paper titled "Large Language Models Cannot Self-Correct Reasoning Yet" () 20 days ago. I’ve observed two distinct groups misinterpreting the content in two different ways: For LLM Critics: "LLMs Cannot Self-Correct Reasoning" !=
5
30
158
2
9
59
@jefffhj
Jie Huang
1 year
Have received 50+ emails/DMs and am surprised by everyone’s enthusiasm, lol. This is a crazy era with a lot of exciting things to explore. Thanks to everyone who reached out. I may not be able to reply to all messages, but I’ll read each one over the next two days (I’m
@jefffhj
Jie Huang
1 year
I'm on the job market and open to all sorts of fun stuff! Feel free to reach out and excite me 😀 Short intro: I work on various aspects of LLMs, including factuality, reasoning, safety, RAG, etc. (). I'm a time manipulator who handles both research &
23
30
324
1
1
56
@jefffhj
Jie Huang
9 months
2023 has been an exciting yet distracting year for me. One aspect is research. There are too many exciting things to explore. Just in the realm of LLMs, I've explored reasoning, factuality, hallucination, retrieval-augmentation, ethics, privacy, long-context, agents, fast
0
2
53
@jefffhj
Jie Huang
1 year
Although I am a researcher in large language models, and was an enthusiast for such models in the early days... I find it truly a pleasure to read a *valuable* paper that's NOT about LLMs these days.🤣 If you come across such papers, please do share them with me.
2
2
52
@jefffhj
Jie Huang
10 months
As an AC for #NAACL , I've noticed that half of the papers in my batch are categorized under "Resources and Evaluation". However, these papers are actually focused on LLM prompting😅 From what I recall, this was not a particularly popular track in the past, as it is typically
Tweet media one
3
1
49
@jefffhj
Jie Huang
2 months
Grok-2🚀
@xai
xAI
2 months
2K
2K
9K
0
0
49
@jefffhj
Jie Huang
1 year
LLMs can associate related entities/information, but this raises concerns when it comes to personal information such as phone numbers. We conduct an analysis of the association capabilities of LMs and study its implications on privacy leakage.🧵
Tweet media one
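The memorization-vs-association distinction can be sketched as a simple probe: check whether a model reproduces an email given the exact prefix it was trained on, versus whether it links the owner's name to the email through a natural-language template. This is a minimal sketch; the template wording and function names are assumptions of mine, not the paper's protocol.

```python
def leakage_rates(complete, records):
    """complete: stand-in for an LM's greedy continuation, prompt -> str.
    records: list of (owner, email, training_prefix) triples.
    Memorization: the model reproduces the email given the exact
    training prefix. Association: it recovers the email from the
    owner's name alone, via an illustrative prompt template."""
    n = len(records)
    memorized = sum(complete(prefix) == email
                    for owner, email, prefix in records)
    associated = sum(complete(f"The email address of {owner} is ") == email
                     for owner, email, prefix in records)
    return memorized / n, associated / n
```

A gap between the two rates (high memorization, low association) is the pattern that limits how much personal information an attacker can extract from a name alone.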
2
10
44
@jefffhj
Jie Huang
1 year
📢📢📢 New paper(s) alert! 🚨🚨🚨 We propose nothing but I believe it certainly warrants attention.🔥🔥🔥 Stay tuned for more updates!🥰🥰🥰 (These papers are truly "new papers" because I just got them from Google's office)
Tweet media one
5
3
44
@jefffhj
Jie Huang
2 years
Thank you, #Bard 🫡
Tweet media one
2
1
41
@jefffhj
Jie Huang
1 year
Gave my first in-person oral presentation at #ACL2023 (only remote oral or in-person poster presentations before). In this day and age, research on entity/concept relationships might not draw the general audience's attention. Nevertheless, it's a unique topic that I have
Tweet media one
2
3
42
@jefffhj
Jie Huang
1 year
🔥Safety is a top priority, and experts make great efforts to prevent LLMs from generating sensitive content. However, our research unveils that through ingeniously crafted Jailbreaking Attacks🕵️, #ChatGPT could potentially leak personal information.🧵 👉
Tweet media one
2
9
37
@jefffhj
Jie Huang
11 months
LLMs are trained to follow instructions and align with human preferences. If you give a wrong instruction or indicate an incorrect preference, and they follow the instruction to do something wrong or not meet your preference, it's your problem, not the problem of LLMs.😅
@AnthropicAI
Anthropic
1 year
AI assistants are trained to give responses that humans like. Our new paper shows that these systems frequently produce ‘sycophantic’ responses that appeal to users but are inaccurate. Our analysis suggests human feedback contributes to this behavior.
Tweet media one
42
209
1K
5
8
35
@jefffhj
Jie Huang
2 years
Welcome to @IllinoisCS !
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
1
36
@jefffhj
Jie Huang
2 years
Thrilled to kick off my internship at @nvidia ! 💻🎉 This summer, I'll also join @Google Brain as a research intern. 🧠💡 Can't wait to dive into solving cutting-edge problems with the power of the most advanced language models. 🤖💻💪 (emoji was added by #ChatGPT )
Tweet media one
1
1
36
@jefffhj
Jie Huang
2 years
Are Large Pre-Trained Language Models Leaking Your Personal Information? Welcome to check the answer in our recent #EMNLP2022 paper on LM *Memorization* vs *Association* Paper: Code:
Tweet media one
@Melissahei
Melissa Heikkilä
2 years
Large language models are trained on vast datasets scraped from the internet. This inevitably includes personal data such as addresses, phone numbers and emails. I wanted to know—what do these models have on me?
7
137
435
1
6
35
@jefffhj
Jie Huang
11 months
Another clarification is, although we claim "Large Language Models Cannot Self-Correct Reasoning Yet", we don't completely refute the concept of self-correction. In the discussion section (I highly recommend reading the full paper if you haven't yet :), we mention that
Tweet media one
@jefffhj
Jie Huang
1 year
I authored a critique paper titled "Large Language Models Cannot Self-Correct Reasoning Yet" () 20 days ago. I’ve observed two distinct groups misinterpreting the content in two different ways: For LLM Critics: "LLMs Cannot Self-Correct Reasoning" !=
5
30
158
3
6
32
@jefffhj
Jie Huang
2 years
Glad to have five papers accepted to #EMNLP2022 on descriptive knowledge graphs, privacy in LMs, definition modeling, topic modeling, and dialogue state tracking!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
5
31
@jefffhj
Jie Huang
1 year
@rao2z The common observation is that the performance on reasoning tasks degrades when LLMs attempt self-correction without external feedback (e.g., oracle labels). The main reason is that the LLM itself cannot reliably judge the correctness of its reasoning.
@jefffhj
Jie Huang
1 year
Can LLMs Self-Correct Their Reasoning? Recent studies (self-refine, self-critique, etc.) suggest LLMs possess a great ability to self-correct their responses. However, our research indicates LLMs cannot self-correct their reasoning intrinsically. [1/n]
Tweet media one
5
69
391
1
1
28
@jefffhj
Jie Huang
2 years
Read this paper today. Like the comprehensive analysis :) We also study LMs' memorization of long-tail entities in our #EMNLP2022 paper (specifically email addresses). Some conclusions are different, e.g., we found that larger LMs can memorize/associate more
@AkariAsai
Akari Asai @ COLM
2 years
Can we solely rely on LLMs’ memories (eg replace search w ChatGPT)? Probably not. Is retrieval a silver bullet? Probably not either. Our analysis shows how retrieval is complementary to LLMs’ parametric knowledge [1/N] 📝 💻
Tweet media one
15
95
534
0
2
28
@jefffhj
Jie Huang
1 month
Accel🚀
@elonmusk
Elon Musk
1 month
This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent
5K
8K
75K
1
0
27
@jefffhj
Jie Huang
1 year
Wore a Stanford T-shirt out today and several people smiled at me and gave thumbs up. Will buy another one on my next visit to Stanford.🙂
1
0
25
@jefffhj
Jie Huang
9 months
+ my two cents on proposing and answering a question: 1) Be critical about the question—assess whether it's meaningful enough to warrant solving. 2) Seldom answer a question with a simple binary of yes/no; otherwise, you are likely to become overly optimistic or overly critical.
@jefffhj
Jie Huang
9 months
My research often begins with curiosity-driven questions: 1. Are LLMs Leaking Your Personal Information? () 2. Are LLMs Really Able to Reason? () 3. Can LLMs Self-Correct Their Reasoning? () 4. What is Missing
17
96
595
0
2
26
@jefffhj
Jie Huang
1 year
+1. Sometimes, I even feel that humans hallucinate more than LLMs do, lol.
@denny_zhou
Denny Zhou
1 year
Although our work is about LLMs, I actually don't think humans do better than LLMs on self-correcting. Look at those countless mistakes in AI tweets.
2
11
60
5
1
24
@jefffhj
Jie Huang
2 years
"New Paradigm of Research"
Tweet media one
1
2
23
@jefffhj
Jie Huang
1 year
Found an interesting review comment on one of my 2021 paper submissions: "The demonstration of the effect appears to be quite clear and it's easy to imagine that if "prompt engineering" becomes a major part of modern NLP (I sincerely hope not, but who knows!) this result will be
Tweet media one
1
0
23
@jefffhj
Jie Huang
1 year
Next semester, Kevin and I will be launching a course on Large Language Models at @UofIllinois . Looking forward to having a lot of fun along the way!
Tweet media one
1
1
22
@jefffhj
Jie Huang
1 year
Will appear in @solarneurips . I still believe that "citation" is essential for LLMs from intelligence, transparency, and ethical standpoints, but it seems they still do not receive enough attention. However, my primary purpose here is to convey my guilt over missing the review
@jefffhj
Jie Huang
1 year
New position paper! 🔥 We position "citation" as the key to building responsible and accountable large language models - enhancing content transparency and verifiability, while mitigating IP and ethical dilemmas in LLMs such as #ChatGPT . 👉 🧵⬇️
Tweet media one
4
23
96
0
5
21
@jefffhj
Jie Huang
1 year
Niagara Falls
1
1
20
@jefffhj
Jie Huang
2 years
In academia, chasing SOTA performance isn't the ideal pursuit these days, especially for tasks solvable by #GPT4 or future versions. Small models boasting comparable performance to GPT-4 could simply be overfitting benchmarks. Achieving good results using GPT-4, e.g.,
2
4
20
@jefffhj
Jie Huang
11 months
Now I have observed three loud groups in LLMs: • LLM hype (overclaiming the ability) • LLM risk hype (overstating the risks) • LLM critique hype (exaggerating the inability) (Un)Fortunately, I am somewhere in between these three groups.
@jefffhj
Jie Huang
1 year
Two loud groups in AI: AI hype vs AI risk hype
0
0
1
1
0
19
@jefffhj
Jie Huang
1 year
Another interesting point to share is that in the paper, "Language Models (Mostly) Know What They Know" () from @AnthropicAI , it has already been observed that LLMs struggle to self-evaluate their generated answers without self-consistency (though this
Tweet media one
1
5
19
@jefffhj
Jie Huang
9 months
Have seen several instances where my friends' research work was overshadowed by similar works from bigger names or groups, but they could not release/advertise their work due to the anonymity deadline. Have also seen multiple instances where people rushed to submit a low-quality
@zacharylipton
Zachary Lipton
9 months
Just learned despite everyone voting down *CL's 🤡-y arxiv embargo policy, it's still firmly in place for ACL 2024. If *CL were a company, the board & leadership wd be fired, the talent wd've left 5 years ago, the common stock wd be worth $0, & WSB wd be taking an interest.
10
12
126
0
2
19
@jefffhj
Jie Huang
9 months
The happiest time for me was during my undergraduate years when I knew very little.
1
1
18
@jefffhj
Jie Huang
1 year
Heading to Toronto for #ACL2023 ! Don't hesitate to reach out! Looking forward to reconnecting with old friends and making new friends ;)
1
0
18
@jefffhj
Jie Huang
2 years
Finished my paper reviews for #ACL2023 . I think *excitement* is very subjective. I felt excited about everything when I started doing research. Now, I'm more critical and rarely get excited. To avoid bias and to give higher scores, I simply treat it as an overall assessment🙂
Tweet media one
0
0
17
@jefffhj
Jie Huang
1 year
Achieved a milestone of 1,234 followers on Twitter. I only really started using Twitter over the past few months, primarily because I felt that my work wasn't getting enough exposure. However, I've also realized that this environment is quite frenetic. Many people seem fixated
Tweet media one
0
0
17
@jefffhj
Jie Huang
2 years
Fun fact about LLM and related research: If your idea is truly groundbreaking, @OpenAI will perform better and faster than you. The rest: 1) The idea is too difficult, so you can't tackle it either. 2) The idea is too small to capture their attention. 🙃
@sama
Sam Altman
2 years
we are starting our rollout of ChatGPT plugins. you can install plugins to help with a wide variety of tasks. we are excited to see what developers create!
611
3K
18K
2
1
16
@jefffhj
Jie Huang
2 years
My boring life with large language model pretraining: debug => debug => debug => debug => watch GPUs melt => watch tensorboard => repeat 🫠
Tweet media one
2
0
15
@jefffhj
Jie Huang
6 months
I feel deeply fortunate and grateful for the encouragement and support from my advisor, Kevin. I am also honored and lucky to have received the support of my wonderful committee members @haopeng_nlp , @hanghangtong , @tianyin_xu , @Diyi_Yang
2
0
15
@jefffhj
Jie Huang
1 year
I propose that all AI conferences include an item titled "Honesty" in their submission & review forms. 📜 This would encourage authors to critically evaluate whether they are exaggerating or overclaiming their results, which in my opinion, is a form of academic irresponsibility.
3
1
15
@jefffhj
Jie Huang
2 years
The harshest comment to challenge one's research: "Interesting, but is ____ really necessary in the age of GPT-4?"
1
0
14
@jefffhj
Jie Huang
2 years
TBH, I feel very bored reading papers that focus on achieving marginal improvements on specific datasets and then try to tell a story (i.e., overfitting + storytelling). It's time to shift towards goals that can bring about real-world impact and advancements. 🫡 #GPT4
@OpenAI
OpenAI
2 years
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment:
2K
17K
63K
1
0
13
@jefffhj
Jie Huang
1 year
We first test self-correction methods that have been shown to significantly improve reasoning in existing literature. However, the results indicate that LLMs struggle to self-correct their reasoning by applying these methods, with performance even dropping post self-correction.
Tweet media one
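The evaluation protocol here amounts to comparing accuracy before and after a self-correction round with no external feedback. A minimal harness might look like the sketch below; the critique prompt wording and function names are illustrative assumptions, not the paper's exact prompts.

```python
def eval_self_correction(model, problems):
    """model: prompt -> answer string (stand-in for an LLM call).
    problems: list of (question, gold_answer) pairs.
    Returns (accuracy of the initial answer, accuracy after one round
    of intrinsic self-correction, i.e., no oracle feedback)."""
    initial = corrected = 0
    for question, gold in problems:
        first = model(question)
        revised = model(f"{question}\nPrevious answer: {first}\n"
                        "Review your previous answer and give your final answer.")
        initial += (first == gold)
        corrected += (revised == gold)
    n = len(problems)
    return initial / n, corrected / n
```

The finding in the thread is that, for reasoning tasks, the second number is often no higher (and sometimes lower) than the first, because the model may talk itself out of an initially correct answer.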
1
0
13
@jefffhj
Jie Huang
9 months
Depending on the questions and the computational resources available at the time, the research may be conducted as an engineering study, an analytical study, a position paper, a review paper, etc.
0
0
12
@jefffhj
Jie Huang
1 year
Transitioning to another facet of self-correction, we investigate the potential of multi-agent debate as a means to improve reasoning. However, our results reveal that its efficacy is no better than self-consistency when considering an equivalent number of responses.
Tweet media one
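The self-consistency baseline referenced above is just majority voting over independently sampled final answers; a minimal sketch (the function name is mine, not from any paper):

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over final answers sampled independently from one
    model (in CPython 3.7+, ties go to the earliest-seen answer, since
    Counter preserves first-insertion order among equal counts)."""
    return Counter(answers).most_common(1)[0][0]
```

A fair comparison with multi-agent debate gives this baseline the same budget: with m agents debating for r rounds, vote over m·r sampled responses.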
1
0
13
@jefffhj
Jie Huang
1 year
Why are there so many similar papers these days?😅
3
1
13
@jefffhj
Jie Huang
10 months
If you would like to discuss ethics in LLMs or need any suggestions about research and graduate studies, please find me at the SoLaR workshop (R06-R09) in the next hour.
0
0
13
@jefffhj
Jie Huang
1 year
This observation is in contrast to prior research such as Kim et al. (2023); Shinn et al. (2023). Upon closer examination, we find that their improvements result from using oracles to guide self-correction, and the improvements vanish when oracle labels are not available.
Tweet media one
3
0
12
@jefffhj
Jie Huang
9 months
The goal of the research is then to answer my questions and offer insights to the community.
1
0
11
@jefffhj
Jie Huang
1 year
@YiMaTweets It's not surprising to me either, lol. However, many people misunderstand it. That's why the critique paper we wrote is crucial: to urge researchers to approach this domain with a discerning and critical perspective and to offer suggestions for future research and practical
1
0
8
@jefffhj
Jie Huang
2 years
🤔Want to understand jargon? We propose to combine definition extraction and definition generation. Experiments demonstrate our method outperforms SOTA for definition modeling significantly, e.g., BLEU 8.76 -> 22.66. 👉 👉 #EMNLP2022
0
2
11
@jefffhj
Jie Huang
10 months
Though I haven't chatted with @hhsun1 , I really enjoy the discussions with the students in @osunlp and the atmosphere they have, and enjoy reading their recent work on LLMs.
@hhsun1
Huan Sun (OSU)
10 months
Hiring multi Ph.D. students this cycle in areas: #LLM train/eval, trustworthiness of LLMs incl. privacy & safety, LLM for biomedicine/chemistry. see below for representative work. I won't be #EMNLP23 , but pls talk to my (former) students there @xiangyue96 @BoshiWang2 @RonZiruChen
5
23
91
1
1
11
@jefffhj
Jie Huang
1 year
To clarify, this is not a critique of the specific tweet by @rao2z and @ylecun that I quoted (some summaries are really useful, as we also acknowledge in our paper). It's an interesting observation of a (not good) trend I've noticed in the general community. The interpretation
0
0
11
@jefffhj
Jie Huang
1 year
@XueFz Here is an example of my work: An insightful evaluation paper typically transcends mere evaluation. The following pieces can also be regarded as LLM "evaluation" papers: I don't mean
2
1
10
@jefffhj
Jie Huang
1 year
Beat Vincent today and became the co-author with the most joint papers with Kevin on DBLP 🫡
Tweet media one
2
0
11
@jefffhj
Jie Huang
9 months
New Year, Bigger Dreams.
Tweet media one
1
0
10
@jefffhj
Jie Huang
2 years
Awesome takeaways of GPT-4 shared by @JingfengY
@JingfengY
Jingfeng Yang
2 years
Some takeaways from GPT-4 main paper and blog:
Tweet media one
2
7
53
0
0
9
@jefffhj
Jie Huang
2 years
Want to create an open and informative knowledge graph without human annotation? In our #EMNLP2022 paper DEER🦌, we propose Descriptive Knowledge Graph where relations are represented by free-text relation descriptions 👉 🦌
Tweet media one
1
2
9
@jefffhj
Jie Huang
10 months
Congrats @xiangyue96 on the great benchmarking work! Truly impressive.
@xiangyue96
Xiang Yue✈️@COLM 2024
10 months
🚀 Introducing MMMU, a Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI. 🧐 Highlights of the MMMU benchmark: > 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks >
Tweet media one
Tweet media two
Tweet media three
Tweet media four
19
184
748
1
0
9
@jefffhj
Jie Huang
1 year
Before authors criticize low-quality reviews, take a moment to consider if your submissions are truly high-quality. Likewise, before reviewers point to low-quality submissions, reflect on the quality of your reviews. Together, we can improve the academic community 🫡
1
2
9