Wenhao Yu

@wyu_nd

Followers: 3K | Following: 880 | Media: 45 | Statuses: 286

Senior Research Scientist at @TencentGlobal AI Lab in Seattle | Bloomberg PhD Fellow | Previously @MSFTResearch, @TechAtBloomberg, and @allen_ai

Seattle
Joined December 2021
@wyu_nd
Wenhao Yu
3 months
🥳 We open-sourced Leopard-Instruct, a dataset containing 2M high-quality, text-rich, multi-image instruction-tuning examples. It significantly improves performance on multi-image understanding! GitHub:
Tweet media one
1
47
194
@wyu_nd
Wenhao Yu
5 months
When I tried OpenAI o1-preview on complex Chinese math problems, the model still thinks in English. This behavior aligns with the findings of our #ACL24 paper, "Leveraging Pivot Language in Cross-Lingual Problems". We found that answering non-English questions while thinking in
Tweet media one
34
50
476
@wyu_nd
Wenhao Yu
1 year
My 2023 summary: 🎓 PhD graduation. 🏆 EMNLP Outstanding Paper. 💯 Crossed 1,000+ citations. 🦙 Met a real alpaca in Peru 🇵🇪. I tried luring it with a 🥕 for a fun photo, but the alpaca had its own thoughts and did NOT follow my instructions at all 🤣. Embracing 2024 with fresh enthusiasm 🚀
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
8
335
@wyu_nd
Wenhao Yu
1 year
🚢 Introducing WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models. 📌 A GPT-4V-powered web agent that can complete user instructions end-to-end on real-world websites. 📌 Given [task instruction, trajectory], we show GPT-4V can be a good evaluator for web agent tasks.
Tweet media one
6
25
168
@wyu_nd
Wenhao Yu
2 years
๐—š๐—ฒ๐—ป๐—ฒ๐—ฟ๐—ฎ๐˜๐—ฒ ๐—ฅ๐—ฎ๐˜๐—ต๐—ฒ๐—ฟ ๐—ง๐—ต๐—ฎ๐—ป ๐—ฅ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฒ๐˜ƒ๐—ฒ is now ๐—ฎ๐—ฐ๐—ฐ๐—ฒ๐—ฝ๐˜๐—ฒ๐—ฑ to #๐—œ๐—–๐—Ÿ๐—ฅ๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฏ ๐ŸŽ‰๐ŸŽ‰ Without using DPR/Google, it achieved SoTA on multiple open-domain QA and knowledge-intensive benchmarks! Work done .@ms_knowledgenlp!.Code and paper:
Tweet media one
3
41
229
@wyu_nd
Wenhao Yu
1 year
📢 New paper: Chain-of-Note. Retrieval-augmented LMs are often misled by noisy, irrelevant documents; adding IR can even hurt performance in some scenarios 😅. Chain-of-Note improves +7.5 over the standard RALM on NQ when all retrieved documents are noisy! ArXiv:
Tweet media one
3
34
141
@wyu_nd
Wenhao Yu
4 months
🚀 Exciting opportunities at Tencent AI Seattle Lab! We're hiring interns for Summer 2025 and multiple FTEs for cutting-edge research across LLM agents, RAG, and multi-modal LLMs. Check out some of my (and my interns') recent work: 🔹 RAG: Generate rather than.
5
28
222
@wyu_nd
Wenhao Yu
1 year
🎉 Personal update: Successfully defended my PhD and now part of @TencentGlobal AI Lab Seattle. Huge thanks to my advisor @Meng_CS for unwavering support. I'll work on frontier NLP research, focusing on novel techniques in LLMs, IR & instruction tuning. Feel free to reach out for internships!
Tweet media one
24
2
203
@wyu_nd
Wenhao Yu
2 years
๐ŸŽ‰๐ŸŽ‰#๐—˜๐— ๐—ก๐—Ÿ๐—ฃ๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฎ ๐—” ๐—จ๐—ป๐—ถ๐—ณ๐—ถ๐—ฒ๐—ฑ ๐—˜๐—ป๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ-๐——๐—ฒ๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ ๐—™๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜„๐—ผ๐—ฟ๐—ธ ๐˜„๐—ถ๐˜๐—ต ๐—˜๐—ป๐˜๐—ถ๐˜๐˜† ๐— ๐—ฒ๐—บ๐—ผ๐—ฟ๐˜†: A close-book model with much better performance than ๐—˜๐—ฎ๐—˜, e.g. 47.2 EM on TriviaQA, and outperform open-book on ELI5!. ArXiv:
Tweet media one
1
38
170
@wyu_nd
Wenhao Yu
1 year
📢 I am actively looking for research interns to work with me in summer 2024 at @TencentGlobal AI Lab in Seattle. If you have a research background in IR & RAG, factuality, reasoning, or agents and are interested in working with me, feel free to DM me! 😊
Tweet media one
3
21
152
@wyu_nd
Wenhao Yu
2 years
#EMNLP2022 Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. A simple method that retrieves relevant information from commonsense corpora for reasoning tasks. #NLProc
Tweet media one
2
17
140
@wyu_nd
Wenhao Yu
5 months
💡 Introducing DSBench: a challenging benchmark for evaluating LLM systems on real-world data science problems. GPT-4o scores only 28% accuracy, while humans achieve 66%. A clear gap, but an exciting challenge for AI advancement! 🧐 Paper: Project led by our
Tweet media one
10
27
140
@wyu_nd
Wenhao Yu
2 years
🎉 New preprint! Generate rather than Retrieve: Large Language Models are Strong Context Generators. Our proposed method achieves a new SoTA on open-domain QA! (1/5) ArXiv link:
Tweet media one
1
20
134
@wyu_nd
Wenhao Yu
4 months
Understanding multiple text-rich images is a challenging task! Today, we are thrilled to introduce a new vision-language model capable of processing multiple visual documents, charts, and snapshots as input, outperforming SoTA models by a large margin. It's highly useful for
Tweet media one
1
33
134
@wyu_nd
Wenhao Yu
1 month
Our team at Tencent AI Seattle is looking for 1-2 summer research interns for 2025 to work on agent-related projects. These include building better GUI agents (e.g., planning, grounding), developing self-evolving agents (e.g., exploration, critique, reflection), and.
1
10
122
@wyu_nd
Wenhao Yu
3 months
🚀 Introducing OpenWebVoyager: a multi-modal, LLM-based web agent built on open-source models! It iteratively improves through real-world exploration, followed by visual critique and optimization using successful trajectories after each round! Paper: Code:
Tweet media one
2
19
109
@wyu_nd
Wenhao Yu
1 year
📢 New paper: "Sub-sentence Encoder" (led by @soshsihao), a contrastively learned contextual embedding model for fine-grained semantic representation of text. 🏆 It outperforms SimCSE, GTR, ST5, and other sentence-embedding methods by a large margin! ArXiv:
Tweet media one
@soshsihao
Sihao Chen
1 year
Text embeddings = one embedding for the entire text sequence. But what if the text is long and says many things? Can encoders produce a contextual embedding for an individual piece of meaning in one text sequence? ❗Check out: Sub-Sentence Embeddings. 1/6
Tweet media one
1
14
105
@wyu_nd
Wenhao Yu
1 year
๐ŸŽ‰๐ƒ๐ž๐ง๐ฌ๐ž ๐— ๐‘๐ž๐ญ๐ซ๐ข๐ž๐ฏ๐š๐ฅ: What Retrieval Granularity Should We Use?. Both passage and sentence level index are not optimal for dense retrieval. We introduce a novel retrieval unit, proposition, for dense retrieval. See details in this thread ~.
Tweet media one
2
16
99
@wyu_nd
Wenhao Yu
1 year
🎉 #EMNLP paper: LLMs are greatly influenced by the quality of instructions, and manually writing instructions for each task is laborious and unstable. We (led by @zhihz0535) introduce Auto-Instruct, which automatically improves the quality of instructions provided to LLMs.
Tweet media one
2
24
105
@wyu_nd
Wenhao Yu
2 years
#EMNLP #NLProc Wanted to share some of our new research directions on open-domain QA 😁: 1. Generate-then-Read: using GPT-3 to generate contexts. 2. Entity Memory: attending to knowledge from memory, no retrieval. 3. KG for QA: using Wikidata to better retrieve and read.
Tweet media one
1
15
98
@wyu_nd
Wenhao Yu
1 year
We (Tencent AI Seattle Lab) still have one summer internship position, focused on RAG, web agents, or multi-modal research. Please DM me if you are interested and have a relevant background. 😊
8
10
93
@wyu_nd
Wenhao Yu
2 years
🎉🎉 EMNLP 2022: Knowledge Graph Enhanced Passage Reader for Open-domain Question Answering. With the same retriever and the same set of retrieved passages, GRAPE outperforms the state-of-the-art reader FiD by a large margin. ArXiv:
Tweet media one
2
16
90
@wyu_nd
Wenhao Yu
5 months
📢 Introducing Cognitive Kernel: an open-source agent system towards generalist autopilots. The system can interact with real-world environments, handle user-provided files, access websites (e.g., Amazon), and manage long-term chat history. Our system is fully open-sourced and
1
2
76
@wyu_nd
Wenhao Yu
10 months
📢 Excited to share that we will organize the 3rd Workshop on Knowledge-Augmented NLP at ACL 2024. We will have six amazing speakers! We welcome your submissions and invite you to talk with our speakers and organizers at the workshop. Looking forward to seeing you in Thailand!
Tweet media one
1
17
76
@wyu_nd
Wenhao Yu
2 years
📢 Introducing ReFeed: a novel plug-and-play approach to enhancing the factuality of large language models via retrieval feedback! Together with @Meng_CS @zhihz0535 @LiangZhenwen @ai2_aristo. Read more:
Tweet media one
1
15
73
@wyu_nd
Wenhao Yu
4 months
Excited to see MLE-Bench out, and a big thanks to @OpenAI for highlighting our DSBench as a concurrent effort. Proud to contribute to advancing AI agents for real-world challenges alongside these initiatives! DSBench:
Tweet media one
@OpenAI
OpenAI
4 months
We're releasing a new benchmark, MLE-bench, to measure how well AI agents perform at machine learning engineering. The benchmark consists of 75 machine-learning-engineering-related competitions sourced from Kaggle.
3
10
72
@wyu_nd
Wenhao Yu
7 months
📌 Many LLM systems allow users to upload documents, such as GPT-4, Claude, and Kimi. Have you used any of these systems? 🤔 Have you ever wondered which system performs the best when you ask a question based on
Tweet media one
5
18
72
@wyu_nd
Wenhao Yu
2 years
📢 Introducing IfQA: the first large-scale open-domain question answering (ODQA) dataset centered around counterfactual reasoning. Together with @Meng_CS @ai2_aristo! Paper link:
Tweet media one
4
15
63
@wyu_nd
Wenhao Yu
3 months
Wrapping up a fantastic #EMNLP2024! Great to attend with friends from Tencent AI Seattle, connect with old & new colleagues, and engage with so many who share our interests! And always happy to chat about intern & job opportunities at @TencentGlobal AI Lab in Seattle!
Tweet media one
0
3
60
@wyu_nd
Wenhao Yu
2 years
Thanks @TechAtBloomberg! It is my great honor to receive the fellowship! Thanks also to my advisors @NDengineering @meng_cs for always giving me the best support!
@TechAtBloomberg
Tech At Bloomberg
2 years
Congratulations to @NotreDame + @ND_CSE's @wyu_nd on his being named one of the 2022-2023 @Bloomberg #DataScience Ph.D. Fellows!. Learn more about his research focus and the other new Fellows in our fifth cohort: #AI #ML #NLProc
Tweet media one
6
2
59
@wyu_nd
Wenhao Yu
10 months
📢 New paper: Compared to multi-modal CoT, we found that Describe (visual description generation)-then-Reason (generating multi-modal CoT with the assistance of descriptions) can greatly improve math reasoning on MathVista and MathVerse.
Tweet media one
0
10
58
@wyu_nd
Wenhao Yu
11 months
📢 Fall-semester internship at @TencentGlobal AI Lab in Seattle: We are actively looking for research interns working on IR & RAG, complex reasoning, multi-modal, and language agents. If you are interested in working with us, feel free to DM me!
2
5
51
@wyu_nd
Wenhao Yu
3 months
Just arrived in Miami for #EMNLP2024! Excited to present three papers on RAG, LLM Agents, and LLM Reflection. Happy to chat about research and discuss intern / full-time opportunities at Tencent AI Research Lab in Seattle!
Tweet media one
0
2
51
@wyu_nd
Wenhao Yu
1 year
I deeply appreciate the implementation of WebVoyager and the fantastic video that explains how to build it with LangGraph, as well as the comprehensive discussion surrounding LangGraph. Our team will provide more detailed information and release our source code.
@LangChainAI
LangChain
1 year
⛴️ WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models. WebVoyager is a new kind of web-browsing agent, developed by Hongliang He, @wyu_nd, et al. Powered by large multi-modal models, like GPT-4V, it uses browser screenshots to conduct research, analyze
Tweet media one
0
3
39
@wyu_nd
Wenhao Yu
2 years
Successful conclusion of the first Knowledge-Augmented NLP workshop at #AAAI23! With over 50 in-person attendees and 20 virtual participants, it was a huge success and one of the most well-attended events at #AAAI. Check out the blog and photos below!
2
7
41
@wyu_nd
Wenhao Yu
2 years
After 3 years, excited to attend my second #AAAI23 with a lot of friends from Notre Dame @ND_CSE!
Tweet media one
0
1
40
@wyu_nd
Wenhao Yu
2 years
Excited to share our #EMNLP2022 #NLProc paper on improving multi-task learning via a very simple but very effective task prefix tuning method!
@zhangzhuosheng
Zhuosheng Zhang
2 years
#EMNLP2022 🧭 Task Compass: Scaling Multi-task Pre-training with Task Prefix. 🤔 When pre-training on many tasks at scale, how can we explore task relationships? 💡 We find that task relationships can be probed by simply adding single-token task prefixes!
Tweet media one
0
4
32
@wyu_nd
Wenhao Yu
1 month
Excited to see our LongMemEval featured by LongBench v2! Tackling long-context challenges in real-world scenarios is key to enabling LLMs to retain user histories and preferences. Can't wait to see it spark innovation in long-context and memory mechanisms! Also directly check out.
@realYushiBai
Yushi Bai
2 months
Introducing 📚 LongBench v2: a benchmark to assess the ability of LLMs to handle long-context problems requiring deep understanding and reasoning across a variety of real-world tasks. 🧠 Do long-context LLMs truly "understand" the long text they process? Let's find out! 🧵 1/
Tweet media one
0
3
33
@wyu_nd
Wenhao Yu
2 years
My daily routine: star repos -> join waitlist 😶
Tweet media one
0
5
32
@wyu_nd
Wenhao Yu
2 months
An unforgettable night at NeurIPS 2024! 🎉 Our @TencentGlobal Starlit Tech Gala brought together 300+ attendees for an evening of innovation, networking, and fun in Vancouver. Thanks all for joining us! 🚀✨ #NeurIPS2024
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
0
30
@wyu_nd
Wenhao Yu
24 days
Excited to rank 3rd on HuggingFace Top Contributors for Dec. 2024! 🎉 Mainly driven by Leopard: 2M high-quality examples for text-rich, multi-image tasks like document VQA, charts, slides, and agents. So glad people are finding it useful! Check it out if
Tweet media one
1
1
29
@wyu_nd
Wenhao Yu
2 years
๐๐ž๐ฐ ๐’๐ฎ๐ซ๐ฏ๐ž๐ฒ ๐ฉ๐š๐ฉ๐ž๐ซ in #eacl2023! New perspectives to summarize multi-task learning in NLP from task relatedness and training methods! Also nice future work discussion. #NLProc.
@zhihz0535
Zhihan Zhang
2 years
Our paper "A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods" has been accepted to #eacl2023 main conference! Collaboration with @wyu_nd, @Meng_CS, @Zhichun5 and Mengxia Yu.
Tweet media one
0
3
27
@wyu_nd
Wenhao Yu
1 year
Thanks @_akhaliq for covering our work! WebVoyager 🚢 is a GPT-4V-powered web agent that can follow human instructions and complete tasks (e.g., ticket booking, shopping) on various real-world websites (e.g., Google Flights, Amazon)! The paper also presents a new benchmark dataset.
@_akhaliq
AK
1 year
Tencent presents WebVoyager. Building an End-to-End Web Agent with Large Multimodal Models. paper page: The advancement of large language models (LLMs) leads to a new era marked by the development of autonomous applications in the real world, which drives
Tweet media one
2
2
27
@wyu_nd
Wenhao Yu
2 years
Excited to announce four highly esteemed keynote speakers @amit_p, @boydgraber, @scottyih, Chandan at our upcoming @knowledgenlp #AAAI23 workshop on Feb 13th! Dive into the cutting-edge topics of neuro-symbolic AI, code understanding, retrieval-augmented LM, and advanced QA.
Tweet media one
1
8
25
@wyu_nd
Wenhao Yu
2 months
Many of us from the Tencent AI Lab Seattle team are at NeurIPS! 🎉 @hongming110, @LinfengSong1, and others (some not on Twitter) are here; feel free to say hi at the venue or at the posters! We have full-time research openings and summer 2025 research internships, so let's talk!
@hongming110
Hongming Zhang
2 months
Excited to revisit Vancouver 😆 Our team at Tencent AI Lab is actively hiring full-time researchers and research interns for next year. Let's chat if you are interested in conducting frontier research in multi-modal agents, RL, and model architectures. 😼😼😼
0
2
24
@wyu_nd
Wenhao Yu
11 months
📣 Our 3rd Workshop on Knowledge-Augmented NLP will take place at ACL 2024 this year! Submission deadline: May 17, 2024! Looking forward to seeing you in Thailand!
@knowledgenlp
KnowledgeNLP Workshop @NAACL 2025
11 months
🎉 Excited to announce the 3rd Workshop on Knowledge-Augmented NLP at ACL 2024 in Thailand! Submission deadline: May 17, 2024. Eager to reconnect with old friends and welcome new faces in the Knowledge NLP community! #ACL2024 #NLProc
Tweet media one
0
0
22
@wyu_nd
Wenhao Yu
2 years
๐Ÿ† Our work โ€œEmpowering Language Models with Knowledge Graph Reasoning for Question Answeringโ€ won the best paper award at #SoCalNLP 2022. Paper link:
@ucsbNLP
UC Santa Barbara NLP Group
2 years
Our @MegagonLabs Best Paper Award winner was "Empowering Language Models with Knowledge Graph Reasoning for Question Answering" by Ziniu Hu et al from UCLA!. Paper link: Thank you to award sponsor @MegagonLabs for supporting our event! (4/4)
Tweet media one
1
1
22
@wyu_nd
Wenhao Yu
1 year
PLUG is a novel cross-lingual instruction tuning method that makes LLaMA follow Chinese instructions (and other low-resource languages) very well! Check out our paper at
@zhihz0535
Zhihan Zhang
1 year
🤨 LLMs struggle to follow instructions in low-resource languages? ⚡️ Introducing PLUG: leveraging a pivot language in cross-lingual instruction tuning. 📈 Improved LLaMA-2 by 32% on 4 diverse languages! Check out our new preprint at ➡️
Tweet media one
0
2
19
@wyu_nd
Wenhao Yu
1 year
Thanks LangChain AI for covering and implementing a Chain-of-Note app as a LangChain template. Chain-of-Note improves performance when retrieved information contains noise. Check out our paper at
@LangChainAI
LangChain
1 year
๐Ÿ—’๏ธChain-of-Note Template . Chain-of-Note is a new prompting technique by @wyu_nd et al for RAG applications that helps improve performance when the retrieved information might be noisy. We implemented a Chain-of-Note app as a LangChain template. Given a question, query Wikipedia
Tweet media one
0
4
18
@wyu_nd
Wenhao Yu
1 year
Thanks @LangChainAI for finding our methods useful and putting them in your templates!
@LangChainAI
LangChain
1 year
🔎 Proposition-Based Retrieval. This new paper by @tomchen0 introduces a new retrieval method by changing 🎯what is indexed🎯 in the first place. This can easily use our 🌲multi-vector retriever🌲, and we've added a template to get started with it easily! 💡 How does it work? 👇
Tweet media one
0
2
17
@wyu_nd
Wenhao Yu
2 years
📢 Calling all #NLP enthusiasts! The 2nd Knowledge-Augmented Methods for NLP workshop at #KDD2023 is now accepting paper submissions 📝! Deadline: May 23rd. Accepted papers will be non-archival. For more info, check out 👉 #AI #MachineLearning #NLProc
Tweet media one
0
4
18
@wyu_nd
Wenhao Yu
6 months
I will be at #ACL2024, hosting our 3rd workshop on knowledge-augmented methods for NLP on August 16. We invited 6 keynote speakers and have 30 accepted oral and poster papers, covering diverse topics on RAG, KGs, agents, and more. See details at
@knowledgenlp
KnowledgeNLP Workshop @NAACL 2025
6 months
Thrilled to announce our finalized schedule at #ACL2024! We're excited to feature 6 keynote speakers and 30 accepted papers. Join us for an inspiring event!
Tweet media one
0
1
15
@wyu_nd
Wenhao Yu
2 years
Combining retrieval AND generation (in step 1) can further improve model performance, as shown in Figure 3. The choice between retrieval and generation is interesting, and their complementarity is worth exploring: use the retriever or generator only where it helps.
@johnjnay
John Nay
2 years
Right now we do: 1. retrieve docs; 2. LLM generates output with those. But this doesn't fully leverage LLM power for step 1. What if we directly generate contextual docs for a question, instead of retrieving external docs?! Paper Code
Tweet media one
0
2
14
@wyu_nd
Wenhao Yu
2 years
📣 Check out this awesome survey on mathematical reasoning at poster session 2 #ACL2023.
@lupantech
Pan Lu
2 years
🧲 Please stop by our poster on deep learning for math reasoning at Poster Session 2 @aclmeeting #ACL2023NLP. ❤️ Thanks to the co-authors for their great contributions: @liangqiu_1994, @wyu_nd, @wellecks, & @kaiwei_chang. abs: github:
Tweet media one
0
3
14
@wyu_nd
Wenhao Yu
7 months
๐ŸงWe introduce a new method: using reflective thoughts to improve the model's reasoning capability, just as we humans often do when we step back to question our assumptions, make analogies, and explore alternative solutions.
@zhihz0535
Zhihan Zhang
8 months
๐ŸงPrevious math augmentation focused on improving single-round QA.๐ŸŽฏWe introduce a new method that1โƒฃaugments standard math settings2โƒฃexcels in reflective thinking scenarios!.๐Ÿ‘‰Check our latest preprint at
Tweet media one
0
1
14
@wyu_nd
Wenhao Yu
1 year
The new paper from our Tencent AI Lab identifies 8 valuable insights into the current state of machine translation research in the LLM era and proposes potential avenues for future advances! Check out the paper below 😊
@wangly0229
Longyue Wang
1 year
💡 How are large language models reshaping the landscape of machine translation? 🎈 🚀 Check out our latest paper for interesting findings. We comprehensively revisited six classic challenges of MT in the context of LLMs. 🎉 👉 Dive in here: And
Tweet media one
0
0
12
@wyu_nd
Wenhao Yu
2 years
Please consider submitting your work to our Knowledge-Augmented NLP workshop at #AAAI2023! Looking forward to seeing you in Washington, DC next February 🎉
@knowledgenlp
KnowledgeNLP Workshop @NAACL 2025
2 years
Hello World! The first workshop on Knowledge-Augmented Methods for NLP at #AAAI2023 is welcoming submissions 🙌! Papers due by Nov. 8! Accepted papers will be non-archival! Details are available 👉
Tweet media one
0
2
13
@wyu_nd
Wenhao Yu
4 months
If you're attending #COLM2024, feel free to chat with my colleagues! We're jointly hiring interns for exciting projects on LLM agents, self-evolving systems, multimodal (vision-language) models, and RAG. 🚀
@hongming110
Hongming Zhang
4 months
Arriving at #COLM2024. Thrilled to meet old and new friends. Come find me to discuss LLM agents, AI systems, and all the exciting things beyond. 😆😆😆
Tweet media one
0
0
12
@wyu_nd
Wenhao Yu
2 years
"Retrieves non-parametric memories only when necessary." This is a very insightful conclusion, reached by asking how retrieval is complementary to LLM parametric knowledge. We showed the same observation in our paper but did not give a detailed analysis. Learned a lot!
@AkariAsai
Akari Asai
2 years
Can we solely rely on LLMs' memories (e.g., replace search with ChatGPT)? Probably not. Is retrieval a silver bullet? Probably not either. Our analysis shows how retrieval is complementary to LLMs' parametric knowledge. [1/N] 📝 💻
Tweet media one
0
2
12
@wyu_nd
Wenhao Yu
2 years
Welcome to our presentation today, 11:30-11:45 at Hall B #EMNLP2022! Our unified entity memory network has much stronger capabilities than EaE (first released by Google's @professorwcohen), as it is not restricted to entity-only outputs.
@wyu_nd
Wenhao Yu
2 years
๐ŸŽ‰๐ŸŽ‰#๐—˜๐— ๐—ก๐—Ÿ๐—ฃ๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฎ ๐—” ๐—จ๐—ป๐—ถ๐—ณ๐—ถ๐—ฒ๐—ฑ ๐—˜๐—ป๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ-๐——๐—ฒ๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฟ ๐—™๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜„๐—ผ๐—ฟ๐—ธ ๐˜„๐—ถ๐˜๐—ต ๐—˜๐—ป๐˜๐—ถ๐˜๐˜† ๐— ๐—ฒ๐—บ๐—ผ๐—ฟ๐˜†: A close-book model with much better performance than ๐—˜๐—ฎ๐—˜, e.g. 47.2 EM on TriviaQA, and outperform open-book on ELI5!. ArXiv:
Tweet media one
1
0
11
@wyu_nd
Wenhao Yu
2 years
Welcome paper submissions to our workshop at #AAAI2023. Please help to share it! 😁
@knowledgenlp
KnowledgeNLP Workshop @NAACL 2025
2 years
Call for papers! The first workshop on Knowledge-Augmented Methods for NLP (#NLProc) at #AAAI2023 is welcoming submissions 🙌! Papers due on Nov. 4! Papers will be non-archival, so published papers (e.g., #EMNLP2022) can also be presented at our workshop! Details 👉
Tweet media one
0
1
10
@wyu_nd
Wenhao Yu
1 year
If you are at #NeurIPS2023, feel free to talk with my colleagues about internship opportunities next summer!
@KaixinMa9
Kaixin Ma
1 year
Hello friends at #NeurIPS2023, our @TencentGlobal AI Lab in Seattle is actively looking for research interns for 2024. If you are interested in topics such as RAG, reasoning, LLM agents, and user interfaces, feel free to DM me for a chat! 😊
0
1
11
@wyu_nd
Wenhao Yu
1 year
Congratulations! Welcome back to Tencent AI Lab for an internship again!
@muhao_chen
๐ŸŒดMuhao Chen๐ŸŒด
1 year
My awesome student @JamesYHuang36 just received an outstanding paper award at #EMNLP2023! He is looking for summer research intern. Please interview him.
Tweet media one
0
0
10
@wyu_nd
Wenhao Yu
5 months
Asking LLMs to precisely follow complex instructions to program with function calls is still a challenging task.
@terryyuezhuo
Terry Yue Zhuo
5 months
o1-preview-2024-09-12 on BigCodeBench-Hard: Complete 34.5% (slightly better than Claude-3.5-Sonnet-20240620); Instruct 23.0% (far below other top models); Average 28.8%. o1-preview may follow detailed instructions reasonably well, but not the brief ones. Not sure how consistent
Tweet media one
0
3
10
@wyu_nd
Wenhao Yu
2 years
This is a great new benchmark dataset if you work on scientific QA problems!
@lupantech
Pan Lu
2 years
📢📢 Excited to have one paper accepted to #NeurIPS2022! We present a new dataset, ScienceQA, and develop large language models that learn to generate lectures and explanations as the chain of thought (CoT). Data and code are public now! Please check 👇👇
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
2
10
@wyu_nd
Wenhao Yu
7 months
In this paper, we introduce DocBench, a new benchmark designed to evaluate LLM-based document reading systems. Our benchmark involves a meticulously crafted process, including the recruitment of human annotators and the generation of synthetic questions. It includes 229 real
Tweet media one
1
4
9
@wyu_nd
Wenhao Yu
4 months
Dense X retrieval is accepted to #EMNLP2024! Discover how retrieval granularity can significantly affect your retriever performance -- it makes a big difference! Check it out!. Link:
@tomchen0
Tong Chen
1 year
โ—With dense retrieval, the unit in which you segment a retrieval corpus (passage, sentence, etc) may impact performance by more than you thought!. We introduce a novel retrieval unit, proposition, for dense retrieval. [1/7]
Tweet media one
1
0
9
@wyu_nd
Wenhao Yu
2 years
(2/3) Unified Encoder-Decoder Framework with Entity Memory (#EMNLP2022): The entity knowledge is stored in the memory as latent representations, and the memory is pre-trained on Wikipedia along with encoder-decoder parameters.
2
0
9
@wyu_nd
Wenhao Yu
2 years
In 2021, we wrote a survey ( to highlight a key LM challenge: augmenting with external knowledge via IR, tools, etc. The introduction of plugins in ChatGPT reaffirms the effectiveness of knowledge augmentation for infusing LLMs with up-to-date information.
@OpenAI
OpenAI
2 years
We are adding support for plugins to ChatGPT: extensions which integrate it with third-party services or allow it to access up-to-date information. We're starting small to study real-world use, impact, and safety and alignment challenges:
0
0
9
@wyu_nd
Wenhao Yu
1 year
Thank you, Jerry @jerryjliu0, for highlighting our proposition retrieval work in the llama-index. The LlamaPack truly demonstrates the practical application and effectiveness of proposition-based retrieval systems!.
@jerryjliu0
Jerry Liu
1 year
A big factor for building production RAG is deciding the "chunk" used for retrieval + synthesis: should it be a sentence? Paragraph? . In the "Dense X Retrieval" paper (@tomchen0 et al.), the authors propose a concept that we've advocated for a while: decouple the indexed chunk.
0
3
6
@wyu_nd
Wenhao Yu
4 months
LongMemEval can be used to evaluate five core long-term memory abilities of chat assistants: (1) information extraction, (2) multi-session reasoning, (3) temporal reasoning, (4) knowledge updates, and (5) abstention!
@DiWu0162
Di Wu
4 months
Introducing LongMemEval: a comprehensive, challenging, and scalable benchmark for testing the long-term memory of chat assistants. 📊 LongMemEval features: 📝 164 topics, 💡 5 core memory abilities, 🔍 500 manually created questions, ⏳ freely extensible chat history
Tweet media one
0
0
7
@wyu_nd
Wenhao Yu
2 years
(3/3) KG-enhanced DPR/FiD (#EMNLP2022): Using knowledge graph (Wikidata) to improve the retrieve-then-read pipeline, learn better document representation.
1
0
7
@wyu_nd
Wenhao Yu
2 years
(1/3) Generate-then-Read proposes a novel pipeline for solving open-domain QA tasks: instead of retrieving contextual documents from large-scale corpora such as Wikipedia, it prompts GPT-3 to generate relevant contextual documents.
1
1
7
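The Generate-then-Read pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the prompt wording and the `ask_llm` callable are my own assumptions, standing in for any real LLM client.

```python
def build_generate_prompt(question: str) -> str:
    # Step 1: ask the model to *generate* a background document,
    # replacing retrieval from a corpus such as Wikipedia.
    # (Hypothetical wording, not the paper's exact prompt.)
    return (f"Generate a background document to help answer the question.\n"
            f"Question: {question}\nDocument:")

def build_read_prompt(question: str, document: str) -> str:
    # Step 2: answer the question by "reading" the generated document.
    return (f"Document: {document}\n"
            f"Answer the question based on the document above.\n"
            f"Question: {question}\nAnswer:")

def generate_then_read(question: str, ask_llm) -> str:
    # ask_llm: any callable mapping a prompt string to a completion string.
    document = ask_llm(build_generate_prompt(question))
    return ask_llm(build_read_prompt(question, document))
```

Plugging a real completion call in as `ask_llm` gives the two-stage generate-then-read flow the tweet describes.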
@wyu_nd
Wenhao Yu
4 months
Training with reflective thinking is indeed an effective strategy, as shown in both math and coding. It may need further refinement, such as scaling the data and domains and improving the critic and verification steps. Check out our paper for more details:
@_philschmid
Philipp Schmid
5 months
Synthetic data for reflective thinking and chain of thought! @OpenAI o1 "produces a long internal chain of thought before responding to the user." A recent paper introduces RefAug, showing how we can augment existing data to embed problem reflection and "thinking" into the training
Tweet media one
0
0
6
@wyu_nd
Wenhao Yu
28 days
This is an impressive result! Congrats to your team on this milestone! 🚀
@harveyhucal
Harvey Hu
28 days
We just achieved a new state-of-the-art 93% accuracy on the WebVoyager benchmark with and here's how we are thinking about the AI Agent problem. @wyu_nd @ycombinator
Tweet media one
0
0
6
@wyu_nd
Wenhao Yu
1 month
Thrilled to see Google's Project Mariner hit 83.5% on our WebVoyager benchmark for real-world web tasks! Excited for more open-source efforts. Explore our benchmark and code here:
@GoogleDeepMind
Google DeepMind
2 months
When evaluated against the WebVoyager benchmark, which tests agent performance on end-to-end real world web tasks, Project Mariner achieved a state-of-the-art result of 83.5% working as a single agent setup.
0
2
6
@wyu_nd
Wenhao Yu
8 months
💡 New Math Benchmark: Unlike existing single-turn math QA datasets, MathChat is the first benchmark focusing on multi-turn conversations about math. 🔔 Existing LLMs exhibit a significant decline in math reasoning ability after multi-turn conversations!
@LiangZhenwen
Zhenwen Liang
8 months
🚀 Excited to share our latest research, MathChat! 📊 We explore new frontiers in interactive math problem-solving. Check it out! 🧵👇 MathChat is a benchmark designed to evaluate LLMs on mathematical multi-turn interaction and open-ended generation.
@wyu_nd
Wenhao Yu
5 months
DSBench requires LLM systems to read user-uploaded files, then write and execute code to solve data science problems. The benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions. The dataset is available at.
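A DSBench-style harness presumably loops over tasks, executes model-written code against the uploaded data, and grades the result. A minimal sketch, assuming a convention where generated code stores its answer in a `result` variable (`run_task` is a hypothetical helper, and a real harness would sandbox execution):

```python
# Minimal sketch of a DSBench-style harness: execute model-generated
# analysis code in an isolated namespace and read back its answer.
# The `result` variable convention and run_task helper are illustrative.

def run_task(generated_code, inputs):
    """Run generated code with task inputs; expect it to set `result`."""
    namespace = {"inputs": inputs}
    exec(generated_code, namespace)  # a real harness would sandbox this
    return namespace.get("result")

# Pretend the model answered a "compute the column mean" task:
code = "result = sum(inputs) / len(inputs)"
answer = run_task(code, [3, 5, 7])
```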
@wyu_nd
Wenhao Yu
1 year
We improve current RALM on two aspects: (1) Noise Robustness: the ability to discern and disregard noisy information present in irrelevant retrieved documents; (2) Unknown Robustness: the ability to acknowledge its limitations by responding with "unknown". (1/4)
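A minimal sketch of the two behaviors, using a naive keyword-overlap scorer as a stand-in for the learned relevance judgment (the function names and threshold are illustrative, not the paper's method):

```python
# Sketch of the two robustness behaviors: drop low-relevance retrieved
# documents (noise robustness) and answer "unknown" when nothing relevant
# survives (unknown robustness). The overlap scorer is a toy stand-in.

def keyword_overlap(query, doc):
    """Fraction of query words that also appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def answer_with_robustness(query, docs, threshold=0.3):
    relevant = [d for d in docs if keyword_overlap(query, d) >= threshold]
    if not relevant:
        return "unknown"          # unknown robustness
    return f"answer based on {len(relevant)} document(s)"

docs = ["the capital of France is Paris", "soccer rules explained"]
out = answer_with_robustness("capital of France", docs)
```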
@wyu_nd
Wenhao Yu
2 years
Huge shoutout to our fantastic organizers for making the #KnowledgeAugmented #NLP workshop a reality! Thank you @MS_KnowledgeNLP, @wyu_nd, @Meng_CS, @ChenguangZhu2, @shuohangw, @LuWang__, and @hhsun1.
@wyu_nd
Wenhao Yu
7 months
Try BigCodeBench! It is the next generation of HumanEval.
@terryyuezhuo
Terry Yue Zhuo
8 months
In the past few months, we've seen SOTA LLMs saturating basic coding benchmarks with short and simplified coding tasks. It's time to enter the next stage of coding challenges under comprehensive and realistic scenarios! Here comes BigCodeBench, benchmarking LLMs on solving
@wyu_nd
Wenhao Yu
1 year
@ZhiruoW This is great work! We also noticed irrelevant context could hurt model performance in industry applications. We just released a paper yesterday, with a similar goal, to improve noise robustness in RAG.
@wyu_nd
Wenhao Yu
2 years
@zhihz0535 just presented the work this morning. If you missed it but are interested in related research, DM us and we'd be happy to chat!
@wyu_nd
Wenhao Yu
12 days
#NAACL2025 notifications are out! Whether your paper was accepted or not, we'd love to see your work at our workshop. If your paper is accepted, you can submit it to our non-archival track and showcase it to a broader audience. Looking forward to your submissions! ✨
@knowledgenlp
KnowledgeNLP Workshop @NAACL 2025
26 days
🎉Excited to announce the 4th Workshop on Knowledge-Augmented NLP at NAACL 2025 in New Mexico, USA! Submission deadline: Feb 15, 2025. Eager to reconnect with old friends and welcome new faces in the Knowledge NLP community! #NAACL2025 #NLProc
@wyu_nd
Wenhao Yu
5 months
@nembal I think it's mainly due to the imbalance in the language distribution of the pre-training corpus. Knowledge embeddings aren't as well connected across different languages. I remember when I studied abroad, it took me longer to learn a new concept compared to taking a similar class.
@wyu_nd
Wenhao Yu
2 years
Thanks, ND Research, for writing this great article about me 😊
@Meng_CS
Meng Jiang
2 years
A shoutout to Notre Dame's iSURE program and CSE PhD program. You may get interested in them if you get a chance to read my student Wenhao's story. Wenhao Yu is a rising 4th-year PhD student with a Bloomberg Fellowship, working on NLP / QA.
@wyu_nd
Wenhao Yu
2 years
If you missed the workshop, you can still find the videos and slides at (AAAI will post the video in around two weeks) and
@wyu_nd
Wenhao Yu
2 years
Code will be at (4/n).
@wyu_nd
Wenhao Yu
2 years
We also present a novel clustering-based prompting approach to generate diverse contextual documents that increases the likelihood of generating a correct answer with more generations. This approach can significantly improve performance on downstream tasks. (3/5).
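The clustering-based prompting step might look roughly like this: cluster candidate documents by embedding, then take one exemplar per cluster so each prompt elicits a different kind of generated context. The toy 1-D embeddings and tiny k-means below are illustrative assumptions; a real system would use a sentence encoder:

```python
# Sketch of clustering-based prompting: group candidate documents by
# embedding, then draw one exemplar per cluster so each prompt encourages
# a different style of generated context. Embeddings are toy 1-D values.

def cluster_one_dim(embeddings, k=2, iters=10):
    """Tiny 1-D k-means; returns a cluster id per embedding."""
    centroids = embeddings[:k]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(e - centroids[c]))
                  for e in embeddings]
        centroids = [
            sum(e for e, l in zip(embeddings, labels) if l == c)
            / max(sum(1 for l in labels if l == c), 1)
            for c in range(k)
        ]
    return labels

def pick_exemplars(docs, labels, k=2):
    """One representative document per cluster."""
    return [next(d for d, l in zip(docs, labels) if l == c) for c in range(k)]

docs = ["doc about sports", "doc about politics", "doc about football"]
labels = cluster_one_dim([0.1, 0.9, 0.15], k=2)
exemplars = pick_exemplars(docs, labels, k=2)
```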
@wyu_nd
Wenhao Yu
2 years
Code will be at very soon! (4/4).
@wyu_nd
Wenhao Yu
2 years
New paper 🎉: Check out our new work on adaptive pretraining for logical reasoning, led by @ssanyal8!
@ssanyal8
Soumya Sanyal
2 years
Want to teach logical reasoning 💭 skills to LMs 🤖? Check out Apollo, our new adaptive pretraining strategy to improve logical reasoning in LMs. It (a) is simple to implement, (b) generalizes across task formats, and (c) needs minimal data processing. Paper:
@wyu_nd
Wenhao Yu
2 years
New paper 🎉: @lupantech Pan's survey is a good summary and analysis of recent work on language models for mathematical reasoning. If you are interested in mathematical reasoning, definitely check it out! Feedback welcome!
@lupantech
Pan Lu
2 years
🎉New paper! The survey of deep learning for mathematical reasoning (#DL4MATH) is now available. We've seen tremendous growth in this community since 2018, and this review covers the tasks, datasets, and methods from the past decade. Check it out now:
@wyu_nd
Wenhao Yu
1 year
Work done with my colleagues at the Tencent AI Lab in Seattle: Hongming Zhang (@hongming110), Kaixin Ma (@KaixinMa9), Xiaoman Pan, Hongwei Wang, and Dong Yu. (4/4)
@wyu_nd
Wenhao Yu
7 months
DocBench construction pipeline: (a) Document Collection: gathering PDF files from five different domains; (b) QA-pair Generation: creating diverse and comprehensive QA pairs through a combination of LLMs and human effort; (c) Quality Check: ensuring data quality through a
@wyu_nd
Wenhao Yu
2 years
👍 An impressive open-domain QA method that generalizes well in both single-hop and multi-hop settings!
@KaixinMa9
Kaixin Ma
2 years
I'm happy to share that our paper "Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge" is now online. We proposed a unified framework for solving single- and multi-hop questions that require reasoning over tables and/or text.
@wyu_nd
Wenhao Yu
2 years
[3/n] Empirical results: over +6.0% improvement under zero-shot settings and +2.5% under few-shot settings compared to baselines on multiple open-domain QA and dialogue benchmarks.
@wyu_nd
Wenhao Yu
1 year
Chain-of-Note generates a series of reading notes for retrieved documents, enabling a comprehensive assessment of their relevance to the input query. We employed ChatGPT to create training data for CoN, which was then used to train a LLaMA-2 7B model. (2/4)
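Prompt assembly for this scheme might look roughly like the sketch below, which asks for a per-document note before the final answer; the exact instruction wording is an assumption, not the paper's template:

```python
# Sketch of Chain-of-Note prompting: request a reading note per retrieved
# document before the final answer, so irrelevant passages can be flagged.
# The instruction wording is illustrative.

def build_con_prompt(question, documents):
    lines = [f"Question: {question}", ""]
    for i, doc in enumerate(documents, 1):
        lines.append(f"Document {i}: {doc}")
    lines += [
        "",
        "First write a short note on each document's relevance to the",
        "question, then give the final answer (or 'unknown' if none help).",
    ]
    return "\n".join(lines)

prompt = build_con_prompt(
    "Who wrote Hamlet?",
    ["Hamlet is a tragedy by William Shakespeare.", "Paris is in France."],
)
```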