An easy trick to improve your LLM results without fine-tuning. Many people know "Few-Shot prompting" or "Chain of Thought prompting". A new (better) method was presented by
@FangruLin99
at
#ICML2024
. It is called: Plan Like a Graph (PLaG)
The core idea is simple: instead of asking the model to plan directly from natural language, you restate the task's steps and their ordering constraints as an explicit graph in the prompt.
#icml
#icml24
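For readers who want the concrete shape of the trick, here is a minimal sketch of what a PLaG-style prompt can look like. Everything in it (task, durations, wording) is made up for illustration; the paper's exact prompt format may differ:

```python
# Sketch of a PLaG-style prompt: the planning task's steps and ordering
# constraints are written out as an explicit graph (here, an edge list)
# before the model is asked to plan. Task, durations, and wording are
# illustrative, not the paper's exact format.

steps = {            # hypothetical step durations in minutes
    "boil water": 10,
    "chop vegetables": 5,
    "cook noodles": 8,
    "serve": 2,
}

edges = [            # (a, b): step a must finish before step b starts
    ("boil water", "cook noodles"),
    ("cook noodles", "serve"),
    ("chop vegetables", "serve"),
]

def plag_prompt(steps, edges):
    lines = ["You must finish all steps in the shortest possible time.",
             "Steps and durations:"]
    lines += [f"- {s}: {d} min" for s, d in steps.items()]
    lines.append("The steps form a dependency graph with edges:")
    lines += [f"- '{a}' must precede '{b}'" for a, b in edges]
    lines.append("Steps with no path between them may run in parallel.")
    lines.append("How long does the optimal plan take? Think step by step.")
    return "\n".join(lines)

print(plag_prompt(steps, edges))
```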
Excited to share our paper with
@iperboreo_
@vjhofmann
@ellemichelley
Anthony Cohn, and Janet Pierrehumbert! We release *AsyncHow*, a benchmark for asynchronous planning. When prompted to *Plan Like a Graph*, GPT-4/3.5 get a consistent boost across all task complexities. 1/n
💥Our
#ICML2024
camera-ready paper *Graph-enhanced Large Language Models in Asynchronous Planning* is available on arXiv! The *off-the-shelf* method *Plan Like a Graph* gives GPT-3.5/4 a Pareto improvement on asynchronous planning tasks of all complexities!🧵
Going to
@icmlconf
London meetup this Friday to present our paper. I also have an oral presentation session in London at 15:00-16:00. Please stop by our poster or oral session, and DM me for a random research chat in London or Vienna!😆😆
My master's thesis, 'Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics', accepted at LREC-COLING 2024, is online! Many thanks to my supervisors Janet Pierrehumbert and
@Dr_Semantic
for their generous help!
I will be at ICML to present this poster! Happy to chat about anything related to LLMs/neuro-symbolic methods/agents in general!
And our poster is at 1:30-3 pm on Tuesday, Hall C 4-9, #700! Come check it out if you are interested in LLMs and planning!
What happens when you are trying to make yourself searchable on social media, but the conference venue is not completely indoors and it is pouring in the transition area 🥲
@LrecColing
Beyond their use in assisting human evaluation (e.g. CriticGPT), can critiques directly enhance preference learning? During my
@Cohere
internship, we explored using synthetic critiques from large language models to improve reward models.
📑Preprint:
I finally had my master's graduation ceremony at
@UniofOxford
! I’m honored to be awarded a distinction in MPhil Linguistics, and I want to thank everyone who offered me invaluable support. It’s a great pleasure to stay at Oxford for a DPhil degree and work with amazing researchers!
Join our global paper reading group on August 24 at 10 AM EST as we dive into "Plan Like a Graph (PLaG): Enhancing LLMs in Asynchronous Plan Reasoning" with
@FangruLin99
from
@UniofOxford
. Don’t miss out! 🌐👩‍💻 Link:
#AI
#LLMs
#MachineLearning
We propose a method *Plan Like a Graph* (PLaG), which we find can be applied to a wide variety of open- and closed-source models to boost their performance. It’s so easy that you can apply it off the shelf.🧵
Why is asynchronous planning so difficult? We consider three key skills required for this task: time summation, time comparison, and constraint analysis. We find that constraint analysis is the key difficulty for our task.🧵
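For intuition on why constraint analysis dominates: with unbounded parallelism, the optimal total time is the longest path through the dependency DAG, so solving the task means summing times along every path the constraints allow and comparing the results. A minimal reference computation on a hypothetical four-step task (standard longest-path code, not taken from the paper):

```python
from functools import lru_cache

# Hypothetical asynchronous planning instance: step durations plus
# precedence constraints. With unbounded parallelism, the optimal
# makespan equals the longest (critical) path through the dependency DAG.
duration = {"A": 10, "B": 5, "C": 8, "D": 2}
preds = {"A": [], "B": [], "C": ["A"], "D": ["B", "C"]}

@lru_cache(maxsize=None)
def finish(step: str) -> int:
    # Earliest finish = own duration + latest-finishing prerequisite
    # (time summation along a path, time comparison across paths).
    return duration[step] + max((finish(p) for p in preds[step]), default=0)

print(max(finish(s) for s in duration))  # 20: critical path A -> C -> D
```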
More interesting details are in the paper! This is a fantastic collaboration with
@iperboreo_
@vjhofmann
@ellemichelley
Anthony Cohn, and Janet Pierrehumbert. Looking forward to presenting in Vienna!🥳🥳🥳
We compare our task with prototypical graph search across a more diverse range of complexities. Performance on both tasks shows a similar downward trend, which suggests the naturalistic task will keep degrading on more complex instances, just like the prototypical one.🧵
Congrats to all the authors accepted at
#ICML24
(
@icmlconf
)!!
Now, it's time to apply for ICML Socials!
You can find more details:
Socials application form:
Deadline: 26 May 2024 (AoE)
Please share this~.
Last interesting bit, especially for linguists: we were inspired by Discourse Representation Theory when designing PLaG. It’s so interesting that LLMs can be improved so much by simply representing natural language prompts in a more structured way!🧵
@Yoann_Buzenet
@florianhoenicke
Details, including graph examples, can be found in the paper! Our latency is comparable to popular baseline methods but offers a significant performance boost (please see the appendix for details)!😊
We find LLMs are not robust to trivially different prompts: e.g. varying graph types, or changing expressions such as ‘Step 1 must precede step 2’ to ‘Step 2 must follow step 1’, results in different performance.🧵
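A quick way to probe this kind of brittleness yourself, sketched below (this is not the paper's evaluation harness, and `query_llm` is a hypothetical stand-in for a real model client):

```python
# Sketch of a robustness probe: both prompts state the same constraint, so
# a robust model should give the same answer to each. `query_llm` is a
# hypothetical placeholder for a real model client.

variants = [
    "Step 1 (5 min) must precede step 2 (3 min). Shortest total time?",
    "Step 2 (3 min) must follow step 1 (5 min). Shortest total time?",
]

def query_llm(prompt: str) -> str:
    return "8 minutes"  # dummy answer; substitute a real API call here

answers = [query_llm(p) for p in variants]
if len(set(answers)) > 1:
    print("Not robust, paraphrases gave different answers:", answers)
else:
    print("Consistent on this pair:", answers[0])
```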
We find that although our method improves model performance, models still suffer from drastic performance degradation as task complexity increases.🧵
I’m excited to announce that my master’s thesis is accepted at
@LrecColing
: Probing Large Language Models for Scalar Adjective Lexical Semantics and Scalar Diversity Pragmatics. See a preview here; the paper will follow after revision:
@shangbinfeng
Oh that’s very interesting! We find graph prompting to be quite helpful in naturalistic complex planning tasks, but I’m surprised that simple graph tuning does not improve LLM performance on related complex tasks.
We automatically generated and released *AsyncHow*, a benchmark for naturalistic asynchronous plan reasoning. It has wide topic coverage and diverse task complexities! Our generated data is of near-human quality!🧵
We (i) automatically generate and open-source a high-quality dataset of 1.6k datapoints for asynchronous planning, which requires efficient scheduling of both sequential and parallel steps. 2/n
4. Despite the performance boost, we still find that LLMs tend to suffer from severe degradation with increasing task complexity, which highlights the limitations of using LLMs to simulate digital devices.
@compthink
@florianhoenicke
It’s similar to CoT in that both can be applied off the shelf, but it’s more of an abstraction technique: it casts a natural language problem into a structured representation.
@philipcortes
@rohanpaul_ai
@florianhoenicke
Thanks for the comment! You can hand-craft several in-context learning examples (we used 3) for PLaG with BaG, and in our experience it works like a charm!
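A sketch of what such a few-shot setup might look like, assuming BaG means asking the model to build the dependency graph itself before answering, as described in this thread; all prompt wording here is hypothetical:

```python
# Sketch of few-shot in-context examples for PLaG with BaG ("Build a
# Graph"): each demonstration shows the graph being built from the task
# description before the answer is computed. Wording is hypothetical.

DEMO = """\
Task: Make tea. Boil water (10 min), fetch a cup (1 min), pour (1 min).
Pouring must follow both other steps.
Graph: boil water -> pour; fetch cup -> pour
Reasoning: boil water (10) and fetch cup (1) run in parallel; pour (1)
starts after the later of the two, so the total is 10 + 1 = 11.
Answer: 11 minutes.
"""

def few_shot_prompt(task: str, demos: list[str]) -> str:
    # The thread reports hand-crafting 3 demonstrations; one toy demo is
    # repeated here purely so the sketch runs end to end.
    return ("\n".join(demos)
            + "\nTask: " + task
            + "\nFirst build the dependency graph, then answer.")

print(few_shot_prompt(
    "Cook pasta. Boil water (10 min), chop garlic (3 min), "
    "cook pasta (8 min, after boiling), combine everything (2 min, last).",
    demos=[DEMO] * 3,
))
```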
If you are a linguist, you might have heard of Discourse Representation Theory. The PLaG idea was initially inspired by DRT, and we are just very excited to see it work so well and so neatly!