Distinguished Assistant Professor @OhioState, Director @osunlp. I like to think about intelligence, artificial or biological, and manifest my thinking into language agents
Super excited to introduce HippoRAG, a method I enjoyed developing the most in 2024. It’s led by my amazing student Bernal @bernaaaljg and joint with @YihengShu, @yugu_nlp, and @michiyasunaga. Bernal’s thread gives a good technical account, so I’ll just share some personal thoughts
📣📣 Super proud to present the most exciting project of my PhD so far: “HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models”.
HippoRAG, as the title suggests, is a brain-inspired RAG framework that enables LLMs to effectively and efficiently
Hi @emilymbender, I'm one of the lead authors of MMMU. I can certify that 1) Google didn't fund this work, and 2) Google didn't have early access. They really liked the benchmark after our release and worked very hard to get the results. It doesn't take that long to eval on a
Returning to transparency, I see that they point to MMMU, which was published on arXiv (not peer reviewed) on November 27, 2023. Google must have had early access to this work, which I suspect means that Google funded it, but the paper doesn't acknowledge any funding source. /12
Generalist web agents may get here sooner than we thought---introducing SeeAct, a multimodal web agent built on GPT-4V(ision).
What's this all about?
> Back in June 2023, when we released Mind2Web () and envisioned generalist web agents, a language agent
@emilymbender (this will be the last response, just for the record; this type of engagement is not why I use this app)
1. The dataset was released along with the paper. Again, eval on a dataset of this scale really doesn't take long, especially for Google.
2. This was a one-off project
Introducing BioCLIP: A Vision Foundation Model for the Tree of Life
A foundation model that strongly generalizes on the tree of life (2M+ species), outperforming OpenAI CLIP by 18% in zero-shot classification, and supports open-ended classification over
Q* from OpenAI and tree-of-thought reasoning triggered a lot of enthusiasm on augmenting LLMs' reasoning/planning capabilities with search. But is search really the panacea for LLMs? Answer from our new study @osunlp: Not quite yet.
TLDR: For advanced planning methods like tree
LLM planning methods, such as tree search, are critical for complex problem solving, but their practical utility can depend on the discriminator used with them. Check out our new findings:
(1/6)
There are numerous environments LLMs are not trained for: enterprise DBs, proprietary knowledge graphs, etc. The de facto solution for applying LLMs in such scenarios is RAG. We envision a new framework: creating a Middleware for LLMs. Just like humans invented tools to help us
🧐Question: How to turn your LLM into a generalist agent interacting with complex real-world environments?
🙌Answer: Equip the LLM with specialized tools (Fig 1)! We found such tools boost GPT-4 by about 2.8x on database tasks and 2.2x on KB tasks. (1/n)
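The tool-equipped-agent setup described above can be sketched as a simple decide-call-observe loop. This is a minimal illustration, not the paper's actual API: the tool name, the toy KB, and the stub "LLM" are all hypothetical.

```python
def get_relations(entity):
    """Hypothetical KB tool: list relations attached to an entity."""
    kb = {"Ohio State": ["located_in", "founded_in"]}
    return kb.get(entity, [])

TOOLS = {"get_relations": get_relations}

def stub_llm(question, observations):
    # A real agent would prompt an LLM here; this stub hard-codes one
    # tool call followed by a final answer, just to show the control flow.
    if not observations:
        return ("call", "get_relations", "Ohio State")
    return ("answer", f"Relations found: {observations[-1]}")

def agent(question, max_steps=3):
    observations = []
    for _ in range(max_steps):
        decision = stub_llm(question, observations)
        if decision[0] == "call":
            _, tool, arg = decision
            observations.append(TOOLS[tool](arg))  # execute the chosen tool
        else:
            return decision[1]
    return "No answer within step budget."

print(agent("What relations does Ohio State have?"))
```

The key design point is that the model only decides *which* tool to call; the environment executes it and feeds the observation back, which is what lets a general LLM act in specialized environments like databases and KBs.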
Has OpenAI o1 solved ‘reasoning’? Surprisingly, it's just as bad as GPT-4 on some simple comparative reasoning tasks, while a grokked transformer can solve them near-perfectly.
Improving implicit/parametric reasoning is still necessary when you have a search space with an
Can OpenAI o1 tackle hard reasoning problems? We tested it on the complex reasoning task in our Grokked Transformers paper. It turns out that o1-preview also struggles a lot like earlier LLMs; on the other hand, a grokked transformer can nail it near-perfectly.
Honored to receive the Best Student Paper Award from #CVPR2024!! It’s @samstevens6860 and Lisa’s very first lead work in their PhD. Super glad for the recognition of their work! Also congrats to all the amazing collaborators and support from the NSF @imageomics institute!
❓How to ground language models like ChatGPT to real-world environments
📢📢We present Pangu, a generic neuro-symbolic framework for grounded language understanding, where a symbolic agent and a neural LM work in a concerted way.
➡️ SOTA on KBQA & few-shot KBQA with Codex (1/n)
🧠Expert AGI is still far away
Background
Candid and constructive discussions on AGI have been challenging due to a lack of shared operationalizable definitions. @merrierm et al. @GoogleDeepMind recently proposed a leveled taxonomy for AGI that centers around both generality
🚀 Introducing MMMU, a Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI.
🧐 Highlights of the MMMU benchmark:
> 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks
>
Thanks @_akhaliq for sharing our work. So, let's talk about planning.
Planning is a hallmark of human intelligence. It is an evolutionary feat built upon numerous other capacities:
> using various tools to iteratively collect information and make decisions
> recording
TravelPlanner
A Benchmark for Real-World Planning with Language Agents
paper page:
Planning has been part of the core pursuit for artificial intelligence since its conception, but earlier AI agents mostly focused on constrained settings because many of
It’s just the start of 2024, and we have already seen many important results from synthetic data. Personally I don’t think it’s just “data augmentation” rebranded. Previous data augmentation efforts rely heavily on “human engineering”, and now it’s more like LLMs “imagination”
🚨New paper!🚨
Self-Rewarding LMs
- LM itself provides its own rewards on own generations via LLM-as-a-Judge during Iterative DPO
- Reward modeling ability improves during training rather than staying fixed
...opens the door to superhuman feedback?
🧵(1/5)
🎉 Thrilled to share @osunlp has 8 papers accepted to #ACL2023 (out of 11 subs), and 3 of the papers received best paper nominations from reviewers. We don't normally submit this many papers and are grateful it turned out well 🥰 Equally proud of the papers that didn't get in this time!
Super honored and excited to be appointed as Distinguished Assistant Professor of Engineering Inclusive Excellence by @OSUengineering! It will support me in continuing to push towards the democratization of AI in research and teaching.
I will be attending #ACL2024nlp from the 12th to the 16th. Happy to chat about language agents.
I'll help present:
1. LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error (led by @BoshiWang2)
2. When is Tree Search Useful for LLM Planning? It
I’m looking for a summer research intern at Microsoft Semantic Machines to work on agent-related topics. If you are a current PhD student with a track record in this area and still looking for internship, apply here and let me know (DM):
Thanks @_akhaliq for sharing our work led by @BoshiWang2 from @osunlp, so let's chat about how LLMs should learn to use tools, a necessary capability of language agents.
Tools are essential for LLMs to transcend the confines of their static parametric knowledge and
For people interested in language agents (aka LLM agents, AI agents, autonomous agents), excited to share that @Diyi_Yang, @ShunyuYao12, @taoyds and I will present a tutorial on “Language Agents: Foundations, Prospects, and Risks” at #EMNLP 2024! Stay tuned for materials!
EMNLP 2024 (4/4)
@emnlpmeeting
- Language Agents: Foundations, Prospects, and Risks. Yu Su, Diyi Yang, Shunyu Yao and Tao Yu.
- Enhancing LLM Capabilities Beyond Scaling Up. Wenpeng Yin, Muhao Chen, Rui Zhang, Ben Zhou, Fei Wang and Dan Roth.
#NLProc
Microsoft presents LLMs in the Imaginarium
Tool Learning through Simulated Trial and Error
Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs
Excited that we have two papers selected as Orals at #CVPR2024 (90 orals in total, 0.8%). Congrats to all the students and collaborators, and see you in Seattle!
- BioCLIP: A Vision Foundation Model for the Tree of Life () led by @samstevens6860 and Lisa Wu
-
Can't agree more with #4 and #6. My personal prediction of AI keywords for 2024 (including ones that are obvious so @Thom_Wolf omitted them):
1. Language agents
We will see more robust language agents starting to be deployed for non-toy use cases. After the initial hype in 2023, we
Some predictions for 2024 – keeping only the more controversial ones. You certainly saw the non-controversial ones (multimodality, etc) already
1. At least 10 new unicorn companies building SOTA open foundation models in 2024
Stars are so aligned:
- a smart, small and dedicated
That figure was based on my thoughts from ~1 year ago. After working on language agents extensively in the past year, I have come to realize that I was a bit too naive back then. I have an updated conceptual framework for language agents that I've been talking about lately (at
This slide from @ysu_nlp's talk is extremely helpful to visualize all the different terminologies related to "language agents" as leveraged within a framework.
#NLProc #LLMs #AI
Excited to give a keynote on "Language agents: a critical evolutionary step of artificial intelligence" tomorrow at the LLM workshop () @IJCAIconf. This is the problem I cannot stop thinking about these days. Join us if you are attending IJCAI!
Thrilled to share that we @osunlp have 5 papers accepted to #ICLR2024 (2 spotlights/3 posters), covering LLM knowledge conflicts, math LLMs, language agents, interpretable transformers, and instruction tuning.
Interestingly, these are my first ICLR papers. Glad to get 5 firsts! 🧵
What would be the wildest environment for grounding & empowering LLMs? 👉The entire Internet!
📢 Mind2Web: Towards a Generalist Agent for the Web ()
Led by amazing @osunlp student @XiangDeng1 #NLProc
Mind2Web: Towards a Generalist Agent for the Web
paper page:
introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing
I will be at NeurIPS (12/10-16). Happy to chat about language agents, multimodality, and LLMs in general. w/ a drink, we can even chat about AGI.
I (and @osunlp more broadly) am also recruiting PhDs and postdocs. Some recent work for interest alignment 👇👇
Thanks @_akhaliq for sharing our work led by the amazing @BoshiWang2 from @osunlp. This is one of my favorite works so far. I'm excited that several important topics: 1) inductive learning of latent deduction rules, 2) implicit/parametric reasoning of neural networks, and 3)
Grokked Transformers are Implicit Reasoners
A Mechanistic Journey to the Edge of Generalization
We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two
Overleaf is down. It survived the ACL deadline (7 am ET today) but didn’t survive the CVPR deadline (EOD PT today). That says something about the size of the NLP community vs. the CV community?
GrailQA: A new large-scale, high-quality dataset for QA on knowledge bases
- 64K questions over 3.7K relations across 86 domains
- Support evaluation of i.i.d., compositional, and zero-shot generalization
paper:
leaderboard:
New #ACL2024 paper: LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error (@BoshiWang2's internship work at Microsoft Semantic Machines)
I like this work because it drives home an important insight: synthetic data + post-training is critical for agents.
As an AC for the first @COLM_conf, I had the pleasure of observing a good portion of the review and decision-making process. I’d say COLM’s review quality is on par with some of the best AI confs, and the PCs spent a ton of time discussing with ACs to make deliberate decisions.
Some stats: @COLM_conf 2024 (the first COLM ever!!!) received 1,036 submissions. We have accepted 299 submissions for presentation at the conference, an acceptance rate of 28.8%.
Honored that our paper Pangu received the Outstanding Paper Award at #ACL2023NLP! We propose a new framework for language agents based on search guided by LMs.
It's interesting to notice the conceptual similarity between Pangu and the recent Tree of Thought (great work by @Shunyu) except:
Booked the first in-person conference trip to
@CVPR
as faculty. Total cost easily exceeds $4,000 for a domestic trip. In my student days $2,000 would be more than enough. Is it normally this expensive or should I blame inflation? Surely need to change the budget in my proposals…
It's the last day of the conference, and Bangkok has so much to explore. i know, i know.. but if you still have some appetite for a (hopefully thought-provoking) talk, I will talk about web/GUI agents tomorrow at 2 pm in the SpLU-RoboNLP workshop. I will cover:
1. Why the
2024 is the year of synthetic data and multimodal LLMs. What if we combine the two?
The data on the Internet is just the tip of an iceberg. Lots of relations, structures, meaning are hidden between the lines (the 'dark matter'). We show that (multimodal) LLMs can uncover this
Proud to present 🔍MagicLens: image retrieval models following open-ended instructions.
🌟Highlights of 🔍MagicLens:
>🧐Novel Insights: Naturally occurring image pairs on the same web page contain diverse image relations (e.g., inside and outside views
Language played a critical role in the evolution of biological intelligence, and artificial intelligence may be following a similar evolutionary path.
👉New blog post: Language agents: a critical evolutionary step of artificial intelligence
What are
Advanced GPU compute is vital to cutting-edge AI. Thrilled that 128 H100 GPUs are being added to the Ohio Supercomputer Center, a major and timely investment by the State of Ohio and @OhioState. The new cluster will substantially propel the research at @osunlp and OSU at large.
OSC is excited to announce 'Cardinal', our major supercomputing cluster launching in 2024. This high performance computing cluster will provide vital resources to support the growing AI needs in Ohio.
New multimodal web agent paper to appear at
#CVPR2024
👇
While it's now well accepted that visual signals are critical for web agents, as shown in SeeAct and several recent papers, we actually have an even earlier work on multimodality that had been silently under review at
How can we effectively and efficiently encode semantically related and task-related contexts for web agents?
Check out our Dual-VCR accepted to
#CVPR2024
!
Summary: DUAL-VCR enhances the context of each HTML element in the document by leveraging its
Quoting @YiMaTweets: "It is industry's job to find how to do better, but academia is to find out how to do it right." While I think there's a lot of good industry research doing things right, when it comes to research on agents, I do think academia has unique freedom to explore how
Join us for the Workshop on Natural Language Interfaces #NLI2020 on July 10 at #ACL2020, featuring a stellar multi-disciplinary lineup of speakers: Joyce Chai, H V Jagadish, @MonicaSLam, @percyliang, @LukeZettlemoyer, and Imed Zitouni from NLP/DB/PL/Robotics.
#ACL2023NLP
We
@osunlp
will present 9 papers at ACL (7 main, 1 findings, 1 workshop), covering grounding LMs to real-world environments, chain-of-thought prompting, differential privacy, code generation, federated learning, and information extraction. Come chat w/ us! A thread 🧵
Every day I appreciate more how #ChatGPT helps with daily tasks and saves a ton of time:
So a certain application requires formatting all of my publications into a specific format. Normally I'd have to spend ~1 hour on copy&paste&reformat, but now ChatGPT just gets it done perfectly!
I don't normally write position papers, but safety of language agents is important enough to warrant an exception:
📜 "A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents"
People (including myself) have been enthusiastically developing language
🚀 Language agents fueled by LLMs are rapidly advancing. Their capability is being further capitalized by connecting to a wide range of external components such as databases, tools, the Internet, robotic embodiment, etc. However, our understanding of their safety risks lags much
HippoRAG just became our first repo from @osunlp to cross 1K 🌟!
If you are interested in (knowledge) graph + RAG and find GraphRAG costly to run, give HippoRAG a try! We are a small academic team but we will actively maintain and develop better versions. Kudos to @bernaaaljg
Slides here:
Highlights:
1. Analogous evolutionary paths of biological and artificial intelligence
2. A conceptual framework for language agents
3. Now is the most exciting time for NLP ever, but maybe for natural language programming, not just processing
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
paper page:
Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are
AI agents have great potential for assisting in sciences, but please don’t hype it up as if they will replace scientists. No, they are best working as copilots for scientists, if done right.
Everyone's talking about Sakana's AI scientist. But no-one's answering the big question: is its output good?
I spent hours reading its generated papers and research logs. Read on to find out
We have postdoc openings at @osunlp:
- (Multimodal) foundation models for sciences
- Large language models
- Grounding to environments (Web, DBs, KBs, physical world via embodied agents)
- Language agents and tool use
Email/DM or chat at #ACL2023NLP. Retweet appreciated!
“Just throw more data at it” has become a panacea in AI, but what if more data can actually hurt? In our
#EMNLP2022
paper, we found that, for conversational AI systems, the more data we have in total, the more data we need to learn a new capability -> a vicious cycle.
🚨 New paper alert 🚨
Now that it's accepted to EMNLP, I'm excited to share work from my internship last year at @MSFTResearch Semantic Machines!
📄:
🤖:
This all started with a troubling trend first noticed by @ysu_nlp 🧵 1/9
#NeurIPS2023
We will present at the 5-7 pm session tmr:
> MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
> Mind2Web: Towards a Generalist Agent for the Web
Come chat about agents, multimodality, and
I'm hiring a postdoc (ML/CV/NLP) for the @imageomics Institute. Topics: multimodal foundation models, interpretability, and neuro-symbolic learning. Also work with Profs. Berger-Wolf and Chao. Please help share! Job post:
#postdocjobs #postdocposition #NLProc
Check out Microsoft M365 Co-Pilot, which really shows the power of LLMs + Grounding! Really glad that the Semantic Machines team is a part of this exciting journey!
I'm at #CVPR2024 in Seattle until 06/22. Happy to share 3 papers on multimodality/LLMs, including two Oral presentations and best paper finalists. You can find me at the corresponding poster sessions. Happy to chat about everything around language agents and career opportunities
The absolute number is meaningless when it comes to publications, but this is a really proud advisor moment: We
@osunlp
started submitting to ML conferences last year. In the past cycle, we had 3 NeurIPS, 5 ICLR, and 4 ICML accepted. How lucky I am to work with so many amazing
First-time ICMLer here. Happy to share that OSU NLP group has 4 papers accepted to
#ICML2024
, spanning multimodal web agents (SeeAct), planning (TravelPlanner), open-ended image retrieval (MagicLens), and e-commerce LLMs (eCeLLM) 🧵
The first paper from Microsoft Semantic Machines is finally out! We introduce a new representation paradigm and model dialogues as dataflow graphs, which supports persistent context, compositional learning, and many more cool things! See more in blog post (leaderboard included)
Researchers at Microsoft Semantic Machines are taking a new approach to conversational AI—modeling dialogues with compositional dataflow graphs. Learn how the framework supports flexible, open-ended conversations, and explore the dataset and leaderboard:
A laudable effort from Meta AI (kudos to @LukeZettlemoyer and his team) on releasing the parameters and training logs of GPT-3-scale pre-trained LMs. The extensive evaluation and discussion on ethics and limitations are also impressive. #NLProc
New #ACL2024 paper that essentially argues that tree search is not the panacea for LLM planning. For tree search to be useful, it needs a strong (90%+) discriminator to rank the hypotheses in the search frontier. However, for many problems discrimination is no easier than
📢📢 Excited to share that our ArcaneQA paper (led by my fantastic student @yugu_nlp) for more generalizable and efficient QA over large knowledge graphs has won an Outstanding Paper Award at #COLING2022! Check thread for more details 👇👇 @osunlp #NLProc
I'll be presenting our latest work on KBQA (by me and
@ysu_nlp
) at COLING'22 virtually 😃
ArcaneQA is a generation-based model that handles both the large search space and schema linking challenges with a unified framework based on encoder-decoder (1/5)
Language agents are a critical evolutionary step of AI, but evaluation has been a bit challenging and ad hoc. Our @osunlp team is glad to contribute to AgentBench, led by @thukeg, for developing a comprehensive benchmark, which includes 3 of our datasets: Mind2Web, GrailQA, and
Thanks @arankomatsuzaki for sharing our paper #AgentBench!
🤯Static NLP datasets are not enough for evaluating existing LLMs
🌟We should test them in practical interactive environments for agents!
Find more videos for LLM-as-Agent in AgentBench at !
I always encourage students to try to first get “golden-path results” as early as possible in a project—preliminary results under the minimal idealized setting that could signal the viability of the core idea. I find this to be particularly important for bold, high-risk
An incredible skill that I have witnessed, especially at OpenAI, is the ability to make “yolo runs” work.
The traditional advice in academic research is, “change one thing at a time.” This approach forces you to understand the effect of each component in your model, and
I think 2024 will be when we see many language agents mature enough for practical deployment and generate real value, but research on fundamentally more capable agents is probably still at the start of an exponential curve
Agents will largely be solved this year. The ones that operate a desktop, browser or phone automatically when given a task. I think it’s just a matter of time
For people @EMNLP who don’t know this: Etihad Airlines requires check-in at least 2 hours before international flights, so get there early — from someone who just missed the flight
@DrJimFan
I would argue that even GPT-4V’s visual understanding still has many limitations for complex tasks like embodied or web agents. We need to rethink how we build multimodal LLMs.
Many people are enthusiastically waiting for the code release of SeeAct, and now it's finally here, as promised! Even better, we spent a lot of time making it as ergonomic as possible. A powerful web agent based on GPT-4V (and more) is only 1 click away. Have fun and let us know
🚀Releasing SeeAct v0.1.0, generalist web agents at your fingertips!
💻Github Repo:
SeeAct enables everyone to effortlessly run web agents based on GPT-4V (and more) with just one click. We hope it can help make the complex Internet a more accessible
As I reflect on my experience hosting benchmarks in the past year (MMMU, Mind2Web, TravelPlanner, etc.), I agree with @chrmanning's suggestion on how to split a dataset/benchmark in the LLM era. A few thoughts/elaborations:
1. It is important to maintain ‘iid’ across the splits,
I agree: private test sets sound good but end up a fail.¹ How should you make a dataset? Define train, dev & test sets. Make dev² & test 2x the size for good statistical power. Divide them iid; make half test the public official test set & keep half as private verification set.³
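Manning's recipe quoted above can be sketched in a few lines. The sizes and seed here are illustrative assumptions; the essential parts are the single shuffle (so every split is iid) and halving the test pool into a public official test set and a private verification set.

```python
import random

def split_dataset(examples, seed=0):
    """Carve iid splits per the recipe: train, dev, public test, private test.

    Illustrative sizing only: dev gets 2 'units' and the test pool 4 units,
    then the test pool is halved into public and private verification sets.
    """
    rng = random.Random(seed)
    examples = examples[:]
    rng.shuffle(examples)                 # one shuffle => all splits are iid
    n = len(examples)
    unit = n // 10                        # illustrative 10% unit
    dev = examples[:2 * unit]             # dev: 2 units for statistical power
    test_pool = examples[2 * unit:6 * unit]
    public_test = test_pool[:len(test_pool) // 2]   # released official test
    private_test = test_pool[len(test_pool) // 2:]  # kept for verification
    train = examples[6 * unit:]
    return train, dev, public_test, private_test

train, dev, pub, priv = split_dataset(list(range(1000)))
print(len(train), len(dev), len(pub), len(priv))
```

The private half lets you later verify whether leaderboard numbers on the public half still hold on data nobody could have trained on.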
Do you think there's more to conversational AI than cutting and pasting text into ChatGPT? Apply for a Summer '24 research internship at Microsoft Semantic Machines!
GPT-4o mini and Claude-3.5-sonnet are (surprisingly?) strong on a range of agent tasks. What is the minimal model size for a well-functioning generalist agent? Maybe it’s not as large as I thought.
🚨Thrilled to present VisualAgentBench (VAB) with @yugu_nlp and Tianjie, where we enable both TRAINING & TESTING of visual foundation agents across 5 different environments!
In all, 17 large multimodal models (LMMs) are tested. Find our paper, data, and more insights below 👇
#ICCV2023
LLM-Planner: LLM + hierarchical planning + grounding = robot brain🧠
Using LLMs for the right task is key
> Robot planning (and many similar planning tasks) is highly environment dependent: same instruction needs vastly different plans in
Glad to see MMMU being integrated into HELM. Gemini 1.5 Pro working (slightly) better than GPT-4V is aligned with our experience in using these models in various vision-language tasks
HELM is now fully multimodal! In addition to language models, text-to-image models (HEIM), we now evaluate vision-language models (made possible by MMMU, VQAv2, VizWiz - thanks to the authors!). As usual, the full predictions and prompts are available on the HELM website:
I whole-heartedly second @NandoDF's recommendation. This is THE book I recommend to my students @osunlp interested in getting a first conceptual framework about what intelligence is and how it comes about from evolution.
This is by far the best non-technical Natural and Artificial Intelligence book anyone could read. This comprehensive, well-researched, crisply clear, sharply focused and illuminating book is a thing of beauty. It is the book I wish I had had when I started my AI career 30 years
Can #LLMs excellently handle various table-based tasks?
📢Introducing TableLlama and TableInstruct: the FIRST open-source generalist #LLMs and instruction tuning dataset for tables.
🌟Strong performance on both in-domain & out-of-domain settings.
#NLProc
an underdiscussed gotcha behind the “search + LLM = AGI” narrative is that search is only valuable when state-wise improvements are *quantifiable*
this is the case in Go, and coding problems w/ tests, and this ARC benchmark. we can explore the (LLM-generated) state space and leverage
These folks @taoyds @TianbaoX etc. are serious when it comes to agent benchmarks. Excited to have an agent benchmark with an OS simulator to play with!
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
The first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating
A great pleasure to host @RishiBommasani's visit at OSU! Fantastic talk (Castles in the sky: Towards sturdy foundation models) and tons of fun discussion! Lots of people joined in person/online to engage in this intriguing topic.
#nlproc @osunlp @stanfordnlp w/ @hhsun1 and Harry Chao
Attention as used in Transformer has long been criticized for providing unfaithful interpretation ☹️. Do you know it's possible to make attention a faithful interpretation mechanism? 🤔
📢📢 Check out our new work on INterpretable TRansformer (INTR) 👇👇
Can we make a standard classification architecture interpretable and meaningful? We introduce a novel application of Transformers for interpretable image classification, which we refer to as the INterpretable TRansformer (INTR) and is available at . (1/5)
Congratulations on the long awaited release of Gemini, outperforming GPT-4V(ision) and setting a new SOTA on our MMMU multimodal reasoning benchmark. Now the game is on!
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks,
I will be at @emnlpmeeting 12/6-12/12. Looking forward to meeting old and new friends! Come chat if interested in language interfaces, grounding large LMs, embodied agents, or academic vs. industry research! #NLProc
Our new work shows that equipping (through instruction tuning) LLMs with the capability of thinking in both natural and programming languages is key to general math problem solving.
Awesome project led by @WenhuChen and our own @xiangyue96!
Excited to introduce our latest math generalist model MAmmoTH 🦣, built through instruction tuning. We proposed hybrid "chain-of-thought" & "program-of-thought" training to supercharge LLMs' math reasoning capabilities. 🦣 beats the open SoTA by 20+% on many datasets like MATH.
This year in my Intro to AI course I experimented with a virtual poster session for final project presentations in Gather and opened it up to the entire department. It was so much more interactive! Blown away by the breadth and depth of the projects. Multiple could be research pubs
Introducing WebLINX 🐯, a large benchmark for AI agents navigating real websites with multi-turn dialogue. 100K interactions across 2,300 demonstrations on 150 real-world websites. Includes HTML, screenshots and videos. Tests unseen sites, tasks, blind users
My student @yugu_nlp will present our work, "Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases", at The Web Conference Papers Room #62, 9:30 am, April 22 ET. Stop by if interested! Code, data, and leaderboard available.
While I may not fully agree with all of
@Francis_YAO_
’s opinions on RAG vs long context (mainly because of my view of how long-term memory should work), I applaud his approach of open debate with clearly articulated definitions (which is surprisingly lacking in many AI debates these
Over the last two days after my claim "long context will replace RAG", I have received quite a few criticisms (thanks, really appreciated!), and many of them make a reasonable point. Here I have gathered the major counterarguments and will try to address them one by one (feels like
So in the first lecture of my Intro to AI class I let students play with DALLE-2 and they all loved it! It's amazing how in just the past few years we've got so many great, easy-to-use tools. We should seriously consider how to best use them for education and outreach.
#dalle2
Making AI more accessible to everyone has always been my research goal. Super excited to see that NSF likes our idea of "plug-and-play AI"! I will co-lead the AI team with
@EricFos
. Also glad that the other proposal from OSU is also selected. Go Bucks!
@icicleai
@OSUengineering
OSU NLP Group is super excited to join ICICLE -- a new NSF AI Research Institute dedicated to democratizing AI through the development of "plug-and-play AI"! We will study knowledge graphs, conversational AI, adaptive AI, and more!
@icicleai
#NLProc
Jacob has some of the most thought-provoking takes on world models.
My fav quotes:
1. The map, the orrery, and the simulator are all models of the same underlying system. Where they differ is in their affordances—the set of questions they enable a user of the model to answer,
GPT-4V(ision) is a Generalist Web Agent, if Grounded
paper page:
The recent development on large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has been quickly expanding the capability boundaries of multimodal models beyond traditional
our intuitions from classic ML make it easy to believe that synthetic data is like interpolation and could lead to model collapse. But such intuitions may not hold and should be carefully re-examined with evidence in the LLM era. For another good example of full synthetic data
If you believe you can't exceed a teacher model at a task with synthetic data alone, then how is this SOTA?
Synthetic data is real and is not something that has to cause model collapse or top out at the previous SOTA
Thank you
@bindureddy
for the neat discussion on our work! We took a behaviorist approach to analyzing LLMs and our findings reveal new challenges in popular use cases of LLMs: tool use (easily deceived by malicious tools) and generative search engines (confirmation bias)
Science Behind Why LLMs Can Easily Be Tricked And Are Predictably Gullible
All the Gen AI hype has elevated LLMs to much more than what they really are - basically, they are not much more than big transformer neural networks trained on large amounts of data
@AndrewYNg
Simple in-context learning is unlikely to get tool use to the level of accuracy needed for practical use. LLMs should use simulated trial and error to truly master tools, just like tool-using animals do. We don't master a tool by solely looking at the 'user manual'
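The trial-and-error idea above can be sketched with a toy loop. This is an illustrative sketch only, not the paper's actual algorithm: the mock `weather_api` tool, the `memory` structure, and the `explore` helper are all hypothetical.

```python
# Toy sketch of "simulated trial and error" for tool mastery (illustrative).
# The agent explores a tool, records which calls worked and which failed,
# then exploits that experience instead of relying on the docstring alone.

def weather_api(city, unit):
    """Mock tool with a quirk the 'manual' doesn't convey: unit must be 'C' or 'F'."""
    if unit not in ("C", "F"):
        raise ValueError(f"unknown unit: {unit}")
    return f"22 degrees {unit} in {city}"

memory = []  # long-term memory of (args, status, detail) episodes

def explore(tool, trials):
    """Trial-and-error phase: attempt calls, remember successes and failures."""
    for args in trials:
        try:
            result = tool(**args)
            memory.append((args, "ok", result))
        except Exception as e:
            memory.append((args, "error", str(e)))

explore(weather_api, [
    {"city": "Columbus", "unit": "celsius"},  # plausible first guess; fails
    {"city": "Columbus", "unit": "C"},        # refined guess; succeeds
])

# Exploitation phase: reuse argument patterns that previously succeeded.
good = [args for args, status, _ in memory if status == "ok"]
print(good[0]["unit"])  # prints C -- learned from trials, not from the manual
```

The point of the sketch: the correct calling convention ends up in memory as grounded experience, which is exactly what one-shot in-context reading of a tool description fails to provide.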
Finally got a bit of time to introduce our recent work on learning to generate adversarial suffixes: Our generative model, named AmpleGCG, captures the distribution of adversarial suffixes given a harmful query and enables rapid generation of hundreds of
Biomedical texts are exploding (1M+ PubMed papers annually, tons of EHRs) and information extraction is needed everywhere. It'll be game-changing if foundation models (e.g., GPT-3) can do few-shot bioIE. Unfortunately, that's not quite the case yet. Check our new paper at EMNLP'22 (Findings)
Thinking about using GPT-3 in-context learning for biomedical information extraction?
Think again 🧠
Our work suggests that small PLM fine-tuning might be a better option and points to some general limitations of GPT-3 in-context learning. ()
#NLProc
[1/6]