Tu Vu Profile
Tu Vu

@tuvllms

Followers: 2,817
Following: 892
Media: 40
Statuses: 966

Research Scientist @GoogleDeepMind & Assistant Professor @VT_CS. PhD from @UMass_NLP. Google FLAMe/FreshLLMs/Flan Collection/SPoT #NLProc

California, USA
Joined April 2017
Pinned Tweet
@tuvllms
Tu Vu
3 months
🚨 New @GoogleDeepMind paper 🚨 We trained Foundational Large Autorater Models (FLAMe) on extensive human evaluations, achieving the best RewardBench perf. among generative models trained solely on permissive data, surpassing both GPT-4 & 4o. 📰: 🧵:👇
25
99
567
@tuvllms
Tu Vu
1 year
🚨 New @GoogleAI paper: 🤖 LLMs are game-changers, but can they help us navigate a constantly changing world? 🤔 As of now, our work shows that LLMs, no matter their size, struggle when it comes to fast-changing knowledge & false premises. 📰: 👇
5
87
386
@tuvllms
Tu Vu
2 years
Enormous LMs like GPT-3 exhibit impressive few-shot performance, but w/ self-training, a BERT-base-sized model can achieve much better results! W/ a new implementation of STraTA, we were able to get ~93% acc on SciTail w/ 8 examples per class! Check out our recent work @GoogleAI 👇
3
51
355
@tuvllms
Tu Vu
10 months
📢 🌟PhD Openings🌟: I am recruiting PhD students this cycle at Virginia Tech. If you want to dive into:
- in-context learning & tool-use LLMs
- instruction tuning
- parameter-efficient transfer learning
- few-shot learning
please apply by Dec 15! 👉
5
78
321
@tuvllms
Tu Vu
3 years
Sharing my internship work @GoogleAI : 1) w/ Soft Prompt Transfer, Prompt Tuning matches or significantly outperforms Model Tuning across model sizes, 2) tasks can help each other via their prompts & task prompts can be used as task embeddings to formalize task similarity. 🧵 1/8
5
48
301
@tuvllms
Tu Vu
9 months
📢📢 I am looking for a student researcher to work with me and my colleagues at @GoogleAI Research on instruction-based text embedding representations and evaluation. Please apply () and reach out to me (ttvu@google.com) if interested.
6
40
274
@tuvllms
Tu Vu
7 months
Great advice for early-career PhD students from the awesome @mrdrozdov . Really liked the saying: “The typical PhD takes 5-7 years to complete, but if you really focus, ignore your friends and family, work late into the night, and dedicate your whole self to your work then it only
@mrdrozdov
Andrew Drozdov
7 months
🌟 PhD Thesis Defended 🌟
1️⃣ Title: Unlocking Natural Language Generalization through Adaptive Retrieval-based Methods
2️⃣ Joining Databricks as a Research Scientist w. focus on generative retrieval / RAG
3️⃣ New Blog Post: Advice for PhD Students
28
22
227
2
15
198
@tuvllms
Tu Vu
3 years
Excited to announce our #EMNLP2021 paper that shows how to turn a pre-trained language model or even a randomly initialized model into a strong few-shot learner. Paper: w/ amazing collaborators: @lmthang , @quocleix , @GradySimon , @MohitIyyer 1/9👇
5
37
186
@tuvllms
Tu Vu
1 year
I successfully defended my Ph.D. thesis. A special thank you to the members of my thesis committee: my wonderful advisor @MohitIyyer , @MajiSubhransu , @HamedZamani , @lmthang , and @colinraffel for their insightful feedback and advice on my research and career plans.
19
6
136
@tuvllms
Tu Vu
8 months
Based on our latest evaluation, LLMs today still struggle to dynamically adapt to our ever-changing world. Strikingly, open-source LLMs such as Mixtral 8x7B, when combined w/ FreshPrompt, can be competitive with closed-source models and commercial APIs on search-augmented QA.
@tuvllms
Tu Vu
1 year
🚨 New @GoogleAI paper: 🤖 LLMs are game-changers, but can they help us navigate a constantly changing world? 🤔 As of now, our work shows that LLMs, no matter their size, struggle when it comes to fast-changing knowledge & false premises. 📰: 👇
5
87
386
1
23
119
@tuvllms
Tu Vu
4 years
Excited to share our @emnlp2020 paper on task transferability: 1) a large-scale empirical study w/ over 3,000 combinations of NLP tasks and data regimes within and across different classes of problems 2) task embedding methods to predict task transferability 1/12👇
@colinraffel
Colin Raffel
4 years
I somehow missed this great paper by @tuvuumass et al.: They learn "task embeddings" (a la task2vec) for NLP tasks and show how they can be used to predict the effectiveness of intermediate-task transfer. Lots of experiments and a promising direction!
3
28
227
2
39
117
@tuvllms
Tu Vu
2 years
Q: As of today, what's the best “open-source” LLM for both few-shot prompting & fine-tuning? A: I’d recommend FLAN-T5 if it fits your budget. Q: What if I want to train my own model? A: You should fine-tune it on the FLAN dataset collection! Check out our new work @GoogleAI 👇
@ShayneRedford
Shayne Longpre
2 years
✨New Paper✨What’s the best completely public competitor to #ChatGPT ? Flan-T5 beats all public models we tested: Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B We release the @GoogleAI 🌟Flan Collection🌟data + methods for Instruction Tuning! 1/
24
249
1K
2
19
110
@tuvllms
Tu Vu
2 years
While parameter-efficient tuning methods were originally proposed to reduce computation & storage costs, it turns out they can also help overcome catastrophic forgetting and thus improve performance on zero-shot cross-lingual generation. Check out our work @GoogleAI @emnlpmeeting 👇1/10
1
30
108
@tuvllms
Tu Vu
8 months
I will also be co-hosting a summer research intern at Google Bard with @TsendeeMTS working on long-context modeling. Please reach out to me (ttvu@google.com) if interested.
@tuvllms
Tu Vu
9 months
📢📢 I am looking for a student researcher to work with me and my colleagues at @GoogleAI Research on instruction-based text embedding representations and evaluation. Please apply () and reach out to me (ttvu@google.com) if interested.
6
40
274
0
9
80
@tuvllms
Tu Vu
1 year
Please help repost! My team @GoogleAI is looking for a research scientist. Our focus areas are multimodal/multilingual/multipod models. Past projects incl. prompt tuning, mT5/ByT5/sentence-T5/longT5, universal sentence encoders. Email me (ttvu@google.com) if interested #NLProc
0
25
69
@tuvllms
Tu Vu
9 months
AlphaGeometry's results are groundbreaking, yet I find @thtrieu_ 's hard work and dedication to the project even more impressive. It's rare for a PhD student to persistently work on a single project for 4 years. AFAIK, Trieu is on the job market, so hire him before it's too late.
@thtrieu_
trieu
9 months
Proud of this work. Here's my 22min video explanation of the paper:
39
153
771
0
2
64
@tuvllms
Tu Vu
2 years
Paper: Pre-trained models & code: Thread: W/ awesome collaborators: @lmthang , @quocleix , @GradySimon , @MohitIyyer
@tuvllms
Tu Vu
3 years
Excited to announce our #EMNLP2021 paper that shows how to turn a pre-trained language model or even a randomly initialized model into a strong few-shot learner. Paper: w/ amazing collaborators: @lmthang , @quocleix , @GradySimon , @MohitIyyer 1/9👇
5
37
186
1
7
61
@tuvllms
Tu Vu
10 months
🌟LLMs' token-level probabilities are well-calibrated. So why not sample multiple responses from an LLM, label them (e.g., A, B, C), and ask it to choose a single letter? @jessierenjie found this method improved both performance & calibration for open-ended generation. Check out our work!👇
@jessierenjie
Jie Jessie Ren
10 months
Struggling with LLM calibration for open-ended generation? Check out our methods (Sample & Select / Sample & Eval) that reformulate open-ended generation into multiple choice or true/false evaluation to leverage LLMs’ better calibration at the token level.
1
20
142
1
15
57
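A minimal Python sketch of the reformulation idea in the tweet above: sample several candidate answers, relabel them as lettered options, and let the model pick a single letter so its better-calibrated token-level probability over letters doubles as a confidence score. The callables `sample_fn` and `score_letters_fn` are hypothetical stand-ins for an LLM API, not the paper's code.

```python
# Minimal sketch (not the paper's code) of a Sample & Select-style reformulation.
# `sample_fn` and `score_letters_fn` are hypothetical callables standing in for
# an actual LLM API: one samples free-form answers, the other returns the
# model's probability for each candidate letter given a prompt.
import string
from typing import Callable, Dict, List, Tuple

def sample_and_select(
    question: str,
    sample_fn: Callable[[str, int], List[str]],
    score_letters_fn: Callable[[str, List[str]], Dict[str, float]],
    n_samples: int = 4,
) -> Tuple[str, float]:
    """Sample candidates, relabel them A/B/C/..., and pick by letter probability."""
    candidates = sample_fn(question, n_samples)
    letters = list(string.ascii_uppercase[: len(candidates)])
    options = "\n".join(f"({l}) {c}" for l, c in zip(letters, candidates))
    mc_prompt = (
        f"Question: {question}\n"
        f"Candidate answers:\n{options}\n"
        "Which candidate is correct? Answer with a single letter."
    )
    letter_probs = score_letters_fn(mc_prompt, letters)
    best = max(letters, key=lambda l: letter_probs.get(l, 0.0))
    # The chosen letter's probability serves as the confidence for the answer.
    return candidates[letters.index(best)], letter_probs.get(best, 0.0)
```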
@tuvllms
Tu Vu
1 year
Moving forward, I will be splitting my time as a research scientist at @GoogleAI and an assistant professor @VT_CS . I will also be recruiting Ph.D. students starting in Fall 2024 to work on effective and efficient transfer learning in the era of LLMs, please come join me!
4
7
57
@tuvllms
Tu Vu
3 years
Happy to share our soft prompt transfer (SPoT) paper made it to #ACL2022 🎉. On the SuperGLUE leaderboard, SPoT is the first parameter-efficient approach that is competitive with methods that tune billions of parameters. w/ @blester125 , @noahconst , @aboSamoor , @daniel_m_cer
@tuvllms
Tu Vu
3 years
Sharing my internship work @GoogleAI : 1) w/ Soft Prompt Transfer, Prompt Tuning matches or significantly outperforms Model Tuning across model sizes, 2) tasks can help each other via their prompts & task prompts can be used as task embeddings to formalize task similarity. 🧵 1/8
5
48
301
2
9
57
@tuvllms
Tu Vu
5 years
Excited to share our #acl2019nlp paper () which improves paragraph classification by pretraining the encoder on unlabeled data using our sentence content objective. Work done with my advisor @MohitIyyer . Code: . Summary below [1/5]
1
10
45
@tuvllms
Tu Vu
11 months
📢 Want to adapt your outdated LLM to our ever-changing world? 🌏 Check out our code for FreshPrompt at . Colab: . 🙏 We are grateful to @serp_api for their generous sponsorship of 5000 searches for FreshPrompt's users.
@tuvllms
Tu Vu
1 year
🚨 New @GoogleAI paper: 🤖 LLMs are game-changers, but can they help us navigate a constantly changing world? 🤔 As of now, our work shows that LLMs, no matter their size, struggle when it comes to fast-changing knowledge & false premises. 📰: 👇
5
87
386
0
6
32
@tuvllms
Tu Vu
1 year
💡Let's raise the bar for LLM factuality! 🚀 We introduce FreshQA:
📚 a dynamic QA benchmark w/ 600 diverse questions, incl. those testing real-time knowledge and debunking false premises.
🔎 a two-mode eval procedure: relaxed & strict (no hallucinated or outdated info).
1
2
27
@tuvllms
Tu Vu
7 months
Check out @ContextualAI 's great work on RAG 2.0 that trains a RAG system end-to-end. I'm glad to see more and more work using freshness (w/ FreshQA) as one of the evaluation criteria.
@ContextualAI
Contextual AI
7 months
Our first set of RAG 2.0 models, Contextual Language Models (CLMs), significantly improve performance over current systems across axes critical for enterprise work: open-domain question answering, faithfulness, and freshness.
1
2
26
0
1
26
@tuvllms
Tu Vu
7 months
2024: - oh nooo! you can't keep language models up-to-date with real-time knowledge.
@jxmnop
jack morris
7 months
2022: - oh nooo!!! you can't run language models on cpu! you need an expensive nvidia GPU and special CUDA kernels and– - *one bulgarian alpha chad sits down and writes some c++ code to run LLMs on cpu* - code works fine (don't need a GPU), becomes llama.cpp 2023: - oh noo!!
71
328
4K
1
0
25
@tuvllms
Tu Vu
1 year
I would also like to thank all of my labmates @UMass_NLP and friends at @UMassAmherst , my mentors and collaborators at @GoogleAI and @MSFTResearch , and my family and friends all over the world who gave me support and encouragement throughout my Ph.D. journey.
1
1
22
@tuvllms
Tu Vu
10 months
Glad to see FreshLLMs/FreshQA got mentioned in @perplexity_ai & @youSearchEngine 's recent blogs (, ) To facilitate future work, we've developed FreshEval, a reliable automatic evaluation metric for FreshQA 👉
@tuvllms
Tu Vu
1 year
🚨 New @GoogleAI paper: 🤖 LLMs are game-changers, but can they help us navigate a constantly changing world? 🤔 As of now, our work shows that LLMs, no matter their size, struggle when it comes to fast-changing knowledge & false premises. 📰: 👇
5
87
386
0
2
18
@tuvllms
Tu Vu
1 year
We present FreshPrompt: Improving LLMs on FreshQA 🚀:
📚 Incorporates all relevant & up-to-date evidence from Google Search, incl. evidence from relevant questions
💡 Sorts the evidence chronologically
🧠 Reasons over the evidence to figure out the most relevant & current answer
1
3
15
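A minimal Python sketch of the prompt-assembly idea described in the thread above: sort retrieved evidence chronologically so the most recent items sit closest to the question, then ask the model to reason over it. The evidence fields, helper name, and instruction wording are illustrative assumptions, not the released FreshPrompt code.

```python
# Minimal sketch (not the released FreshPrompt code) of assembling a prompt from
# search evidence sorted chronologically. The evidence dict fields and the
# instruction wording are illustrative assumptions.
from datetime import date
from typing import Dict, List

def build_fresh_prompt(question: str, evidence: List[Dict], today: date) -> str:
    """Order evidence oldest-to-newest so the most recent items sit closest to
    the question, then ask the model to reason over them."""
    ordered = sorted(evidence, key=lambda e: e["date"])
    blocks = [
        f"source: {e['source']}\ndate: {e['date'].isoformat()}\nsnippet: {e['snippet']}"
        for e in ordered
    ]
    return (
        "\n\n".join(blocks)
        + f"\n\nquestion: {question} (answer as of {today.isoformat()})"
        + "\ninstruction: reason over the search results above, favor the most"
          " recent and most relevant evidence, then give a concise answer."
    )

# Toy usage with placeholder snippets:
print(build_fresh_prompt(
    "Who is the current CEO of Twitter/X?",
    [
        {"source": "news", "date": date(2023, 6, 5), "snippet": "..."},
        {"source": "news", "date": date(2022, 11, 1), "snippet": "..."},
    ],
    today=date(2024, 1, 1),
))
```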
@tuvllms
Tu Vu
2 years
@jeremyphoward Could be relevant (off the top of my head): Liu et al., 2019 (), Peters et al., 2018 (), Voita et al., 2019 ().
2
0
15
@tuvllms
Tu Vu
1 year
Large-scale instruction tuning is the key to unlocking the power of Mixture of Experts (MoEs) models. Check out our recent work led by the awesome @shengs1123 and @Hou_Le 👇
@shengs1123
Sheng Shen
1 year
A Winning Combination for Large Language Models TL;DR: Did you find MoE models generalize worse than dense models on downstream tasks? Not any more at the age of instruction tuning! Surprisingly, we see the “1 + 1 > 2” effect when it comes to MoE + Instruction Tuning. [1/4]
8
65
375
0
3
14
@tuvllms
Tu Vu
8 months
Note that FreshQA has been updated weekly since its release, and our new autorater FreshEval allows for quick evaluation and comparison.
1
1
13
@tuvllms
Tu Vu
1 year
🚨 50K human judgments to evaluate LLM factuality: 💥No surprise: bigger models ≠ reliable gains on fast-changing facts. 📉flat scaling curves on false-premise questions, though some LLMs can debunk false premises if prompted to VERIFY first 🤯🔍.
1
0
11
@tuvllms
Tu Vu
2 years
FYI: our self-training code has been merged into @huggingface 's Transformers library. Thanks, @GuggerSylvain for helping out!
1
2
10
@tuvllms
Tu Vu
1 year
@WenhuChen Relevant paper by Thomas Wang, @ada_rob et al.
0
0
10
@tuvllms
Tu Vu
3 years
This is joint work with awesome collaborators @blester125 , @noahconst , @aboSamoor , and @daniel_m_cer @GoogleAI . Preprint available at . 8/8
0
0
9
@tuvllms
Tu Vu
1 year
💥 Insights from FreshPrompt's analysis:
🧐 The number of retrieved evidence snippets and their order shape the correctness of the LLM's answers.
🧐 Encouraging concise answers = less hallucination (less is more for precision!)
1
0
8
@tuvllms
Tu Vu
3 years
We show that task prompts can be interpreted as task embeddings to construct a semantic space of tasks and formalize the similarity between tasks (see Figure 3 👇). 6/8
1
0
8
@tuvllms
Tu Vu
10 months
This is great!
@GoogleColab
Colaboratory
10 months
This is a small quality of life improvement, but also enables fun features like being able to read @huggingface datasets directly from Pandas!
6
60
481
0
0
8
@tuvllms
Tu Vu
1 year
🚀📊📈 Our experiments show that FreshPrompt substantially boosts the performance of an LLM on FreshQA, outperforming both competing search-engine-augmented prompting methods such as Self-Ask and commercial systems such as Perplexity AI.
1
1
8
@tuvllms
Tu Vu
3 years
STraTA starts with task augmentation that uses unlabeled texts from the target domain to synthesize a large amount of in-domain training data for an auxiliary task (i.e., natural language inference), which is then used for intermediate fine-tuning (see the figure below).
1
1
7
@tuvllms
Tu Vu
4 years
Finally, this work was done with a few hundred thousand GPU jobs in several months. We couldn’t have completed it without the awesome GPU cluster operating on renewable energy at @umasscs . So, please consider doing a Ph.D. here. 😀
1
0
6
@tuvllms
Tu Vu
3 years
Finally, we propose a simple yet efficient retrieval algorithm that measures task embedding similarity, allowing practitioners to identify source tasks that are likely to yield positive transferability for a given novel target task (see Figure 2 👆, right). 7/8
1
0
6
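A minimal numpy sketch of the retrieval idea in the tweet above, under two simplifying assumptions: each task's soft prompt is averaged into a single task embedding, and cosine similarity ranks candidate source tasks. This is not the paper's exact procedure.

```python
# Minimal numpy sketch of ranking source tasks by task-embedding similarity.
# Averaging a task's soft-prompt tokens into one vector and using cosine
# similarity are simplifying assumptions, not the paper's exact recipe.
import numpy as np

def task_embedding(prompt_matrix: np.ndarray) -> np.ndarray:
    """Collapse a (prompt_length, d_model) soft prompt into a single task vector."""
    return prompt_matrix.mean(axis=0)

def rank_source_tasks(target_prompt: np.ndarray, source_prompts: dict) -> list:
    """Return source-task names sorted by cosine similarity to the target task."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    target = task_embedding(target_prompt)
    scores = {name: cos(target, task_embedding(p)) for name, p in source_prompts.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy usage: 20-token prompts in a 64-dimensional model.
rng = np.random.default_rng(0)
sources = {"mnli": rng.normal(size=(20, 64)), "squad": rng.normal(size=(20, 64))}
print(rank_source_tasks(rng.normal(size=(20, 64)), sources))
```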
@tuvllms
Tu Vu
2 years
Paper: w/ awesome collaborators Aditya Barua, @blester125 , @daniel_m_cer , @MohitIyyer , and @noahconst . We also release LM-adapted mT5 checkpoints, which we hope will spur more research into multilingual prompt-based learning. 10/10
0
0
7
@tuvllms
Tu Vu
3 years
Scale is not necessary for Prompt Tuning to match Model Tuning's performance: Prompt Tuning w/ SPoT yields competitive or significantly better results than Model Tuning across all model sizes while being more parameter-efficient (up to 20Kx fewer task-specific parameters). 4/8
1
0
5
@tuvllms
Tu Vu
2 years
To explicitly tackle catastrophic forgetting, we present two approaches: (1) mixing in unlabeled multilingual data while learning the task, and (2) factoring soft prompts into “task” and “language” components that can be recombined in novel pairings at inference time. 7/10
1
0
6
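A minimal numpy sketch of the second approach in the tweet above: recombining factored soft prompts at inference time. Concatenating a language sub-prompt with a task sub-prompt is an illustrative assumption about the factorization, not the paper's exact parameterization.

```python
# Minimal numpy sketch of recombining factored soft prompts at inference time.
# Concatenating a language sub-prompt with a task sub-prompt is an illustrative
# assumption about the factorization, not the paper's exact recipe.
import numpy as np

D_MODEL = 64
rng = np.random.default_rng(0)

# Sub-prompts would be learned separately; random stand-ins here.
language_prompts = {
    "en": rng.normal(size=(10, D_MODEL)),
    "th": rng.normal(size=(10, D_MODEL)),
}
task_prompts = {"summarization": rng.normal(size=(20, D_MODEL))}

def compose_prompt(language: str, task: str) -> np.ndarray:
    """Build a prompt for a (language, task) pair never trained together."""
    return np.concatenate([language_prompts[language], task_prompts[task]], axis=0)

# Zero-shot cross-lingual generation: the task is learned from English data,
# then recombined with the Thai language component at inference time.
print(compose_prompt("th", "summarization").shape)  # (30, 64)
```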
@tuvllms
Tu Vu
3 years
Lester et al. (2021) show that, as model size increases, Prompt Tuning (which learns soft prompts to condition a frozen model to perform tasks) becomes competitive with Model Tuning (a.k.a fine-tuning). However, there are still large gaps between them at small model sizes. 2/8
1
2
5
@tuvllms
Tu Vu
5 years
SpanBERT: a new pre-training objective that predicts the content of masked spans of text, significantly outperforming BERT on span selection tasks e.g., question answering and coreference resolution
@arxiv_cs_cl
cs.CL Papers
5 years
SpanBERT: Improving Pre-training by Representing and Predicting Spans. (arXiv:1907.10529v1 []) #NLProc
0
2
19
0
1
5
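A minimal Python sketch of the span-masking idea summarized in the tweet above: mask contiguous spans and keep their original tokens as prediction targets. The span-length sampling and 15% masking budget are illustrative defaults, not the exact SpanBERT recipe.

```python
# Minimal sketch of masking contiguous spans for a SpanBERT-style objective.
# The span-length sampling and 15% masking budget are illustrative defaults,
# not the exact SpanBERT recipe.
import random

def mask_spans(tokens, mask_token="[MASK]", mask_ratio=0.15, max_span=10, seed=0):
    rng = random.Random(seed)
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    masked = set()
    while len(masked) < budget:
        length = min(rng.randint(1, max_span), budget - len(masked))
        start = rng.randrange(0, max(1, len(tokens) - length))
        masked.update(range(start, start + length))
    # The model must reconstruct the original token at each masked position.
    targets = {i: tokens[i] for i in sorted(masked)}
    for i in masked:
        tokens[i] = mask_token
    return tokens, targets

print(mask_spans("the quick brown fox jumps over the lazy dog".split()))
```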
@tuvllms
Tu Vu
2 years
We show that standard model fine-tuning (Model Tuning) and parameter-efficient Prompt Tuning methods suffer from catastrophic forgetting on a novel zero-shot cross-lingual summarization task, causing them to often generate text in the wrong language. 3/10
1
0
6
@tuvllms
Tu Vu
3 years
Additionally, we conduct a large-scale and systematic study on task transferability with 26 NLP tasks and 160 combinations of source-target tasks, which demonstrates that tasks can often benefit each other via prompt transfer. 5/8
1
0
5
@tuvllms
Tu Vu
1 year
Bard now helps you code @google
0
1
4
@tuvllms
Tu Vu
2 years
Can current transfer learning methods extend successfully to a zero-shot cross-lingual generation (XGen) setting that requires a multilingual model to learn a generative task from labeled data in one language and then perform this task in another language at inference time? 2/10
1
0
5
@tuvllms
Tu Vu
2 years
Through qualitative analysis, we find that Prompt Tuning tends to stay within the target language, whereas Model Tuning is more prone to code-switching between English and the target language. 9/10
1
0
5
@tuvllms
Tu Vu
1 year
@mmitchell_ai In Vietnam, the high school day starts at 7:00 AM.
1
0
4
@tuvllms
Tu Vu
3 years
We propose SPoT: Soft Prompt Transfer, a novel prompt-based transfer learning approach that first learns a prompt on one or more source tasks and then uses it to initialize the prompt for a target task (see Figure 2 👆, left). 3/8
1
0
4
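A minimal PyTorch sketch of the transfer step described in the tweet above: a soft prompt trained on a source task initializes the target task's prompt instead of a random vector. The prompt length, model width, and omitted training loop are placeholders, not the paper's implementation.

```python
# Minimal PyTorch sketch of Soft Prompt Transfer: a prompt trained on a source
# task initializes the target task's prompt. Prompt length, model width, and
# the omitted training loop are placeholders, not the paper's implementation.
from typing import Optional

import torch
import torch.nn as nn

PROMPT_LEN, D_MODEL = 100, 768

class SoftPrompt(nn.Module):
    """Trainable prompt embeddings prepended to a frozen model's input embeddings."""
    def __init__(self, init: Optional[torch.Tensor] = None):
        super().__init__()
        if init is None:
            init = torch.randn(PROMPT_LEN, D_MODEL) * 0.5
        self.embeddings = nn.Parameter(init.clone())

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        prompt = self.embeddings.unsqueeze(0).expand(input_embeds.shape[0], -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# 1) Learn a prompt on one or more source tasks (training loop not shown), then
source_prompt = SoftPrompt()
# 2) use it to initialize the target task's prompt instead of a random init.
target_prompt = SoftPrompt(init=source_prompt.embeddings.detach())
```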
@tuvllms
Tu Vu
2 years
We show that both of our approaches can help prevent catastrophic forgetting and provide substantially better results when there is severe catastrophic forgetting, suggesting that robust zero-shot cross-lingual generation is within reach. 8/10
1
0
5
@tuvllms
Tu Vu
1 year
@YiTayML Very impressive! Congrats, Yi and team!
1
0
3
@tuvllms
Tu Vu
6 years
@VeredShwartz @kaggle why not just the text on the cups? a training example: Verad, Vivi, Veri, Veaid, Vera, Vegan, Venda -> Vered :)
1
0
4
@tuvllms
Tu Vu
2 years
@mrdrozdov @stochasticdoggo Congrats, Andrew!!🎉
0
0
3
@tuvllms
Tu Vu
2 years
Starting from FLAN-T5 not only confers improved downstream performance but also provides significant speedup during fine-tuning. So, we would highly recommend FLAN-T5 as a starting point for your own specific task.
1
0
3
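A minimal sketch, assuming the public Hugging Face checkpoints, of what starting from Flan-T5 can look like in practice; fine-tuning on your own task (or the Flan collection) would replace the quick zero-shot generation check at the end.

```python
# Minimal sketch of starting from a public Flan-T5 checkpoint with Hugging Face
# Transformers; a quick zero-shot generation check stands in for actual
# fine-tuning on your own task or on the Flan collection.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "google/flan-t5-small"  # larger variants: -base, -large, -xl, -xxl
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer(
    "Premise: A soccer game with multiple males playing. "
    "Hypothesis: Some men are playing a sport. "
    "Does the premise entail the hypothesis?",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```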
@tuvllms
Tu Vu
2 years
Finally, if you want to use your own model (e.g., a smaller pre-trained LM), we would recommend fine-tuning it on the FLAN 2022 data collection with 1.8K datasets phrased as instructions:
1
0
3
@tuvllms
Tu Vu
2 years
We find an interesting “paradox of capacity” for Prompt Tuning. On the one hand, greater capacity (longer prompts) helps to better learn the summarization task. On the other hand, the greater the capacity to learn from English, the more the model forgets other languages. 6/10
1
0
4
@tuvllms
Tu Vu
1 year
@arankomatsuzaki Thanks a lot for sharing our work! More details can be found in this thread
@tuvllms
Tu Vu
1 year
🚨 New @GoogleAI paper: 🤖 LLMs are game-changers, but can they help us navigate a constantly changing world? 🤔 As of now, our work shows that LLMs, no matter their size, struggle when it comes to fast-changing knowledge & false premises. 📰: 👇
5
87
386
0
0
1
@tuvllms
Tu Vu
8 months
@zhansheng @kchonyc @hhexiy @JoaoSedoc @tallinzen @sleepinyourhat Big congrats, @zhansheng ! We had similar thesis titles; mine was "Effective and Efficient Transfer Learning in the Era of Large Language Models".
1
0
3
@tuvllms
Tu Vu
2 years
We demonstrate that increasing model size and decreasing tunable parameter capacity are key to overcoming catastrophic forgetting. 5/10
1
0
4
@tuvllms
Tu Vu
2 years
Our experiments show that for both held-in and held-out tasks, fine-tuning FLAN-T5 significantly outperforms fine-tuning the vanilla T5, and even FLAN-T5 without fine-tuning can confer improved performance at times.
1
0
3
@tuvllms
Tu Vu
3 years
We show that task augmentation alone can significantly improve downstream performance across different tasks, generally outperforming other competing fine-tuning approaches in both high- and low-data regimes.
1
0
3
@tuvllms
Tu Vu
3 years
Other interesting results: 1) randomly initialized model + STraTA outperforms BERT_BASE by a large margin on SST-2 while being competitive on SciTail. 2) BERT_BASE + STraTA substantially outperforms BERT_LARGE on both SST-2 and SciTail.
0
0
3
@tuvllms
Tu Vu
6 years
New paper on adapting pretrained language models to downstream tasks by @mattthemathman , @seb_ruder , and @nlpnoah , showing that the effectiveness of fine-tuning depends on the language model architecture and the similarity of the pretraining/target tasks.
2
1
3
@tuvllms
Tu Vu
2 years
Interestingly, Prompt Tuning can confer a significant boost in performance over Model Tuning during zero-shot inference on languages that are less related to English, e.g., non-Latin script languages like Russian and Thai. 4/10
1
0
4
@tuvllms
Tu Vu
4 years
Our experiments show that positive transfer can occur in a diverse array of settings. Contrary to the common wisdom, transfer gains are possible even when the source dataset is small. Also, out-of-class transfer succeeds in many cases, some of which are unintuitive.
1
0
2
@tuvllms
Tu Vu
3 years
We propose STraTA, which stands for Self-Training with Task Augmentation, an approach that combines two complementary methods, task augmentation and self-training, to effectively leverage task-specific unlabeled data, which is comparatively cheap to obtain.
1
0
2
@tuvllms
Tu Vu
11 months
@SongWang_SW Great survey!! Our recent work aligns with this theme. We inject factual and up-to-date knowledge into LLMs through few-shot in-context learning.
@tuvllms
Tu Vu
1 year
🚨 New @GoogleAI paper: 🤖 LLMs are game-changers, but can they help us navigate a constantly changing world? 🤔 As of now, our work shows that LLMs, no matter their size, struggle when it comes to fast-changing knowledge & false premises. 📰: 👇
5
87
386
1
0
2
@tuvllms
Tu Vu
5 years
Very impressive! On IMDB, unsupervised data augmentation + 20 labeled examples can beat the state-of-the-art model trained on 25,000 labeled examples.
@lmthang
Thang Luong
5 years
Introducing UDA, our new work on "Unsupervised data augmentation" for semi-supervised learning (SSL) with Qizhe Xie, Zihang Dai, Eduard Hovy, & @quocleix . SOTA results on IMDB (with just 20 labeled examples!), SSL Cifar10 & SVHN (30% error reduction)!
3
123
414
0
0
2
@tuvllms
Tu Vu
2 years
Work led by the awesome @ShayneRedford , please find more details in the thread below
@ShayneRedford
Shayne Longpre
2 years
✨New Paper✨What’s the best completely public competitor to #ChatGPT ? Flan-T5 beats all public models we tested: Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B We release the @GoogleAI 🌟Flan Collection🌟data + methods for Instruction Tuning! 1/
24
249
1K
0
0
2
@tuvllms
Tu Vu
4 years
@tuvllms
Tu Vu
4 years
Excited to share our @emnlp2020 paper on task transferability: 1) a large-scale empirical study w/ over 3,000 combinations of NLP tasks and data regimes within and across different classes of problems 2) task embedding methods to predict task transferability 1/12👇
2
39
117
0
0
2
@tuvllms
Tu Vu
3 years
Despite their strong performance on many tasks, large-scale pre-trained language models do not perform as well when limited labeled data is available (e.g., on small datasets or in few-shot settings). Collecting more labeled data can help but can also be prohibitively expensive.
1
0
2
@tuvllms
Tu Vu
4 years
@mariusmosbach @zhansheng @yadapruksachatk @sleepinyourhat Also relevant is our work on intermediate task transfer between NLP tasks
1
0
2
@tuvllms
Tu Vu
3 years
STraTA further uses the auxiliary-task model created by task augmentation as a base model for self-training, where it is fine-tuned on the available labeled data for the target task and is then used to infer predictions (pseudo labels) on unlabeled data for subsequent training.
1
0
2
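A minimal scikit-learn sketch of the self-training stage described in the tweet above: fit on labeled data, pseudo-label unlabeled data, keep confident predictions, and retrain. STraTA fine-tunes a pretrained LM and has its own selection strategy; the logistic-regression classifier and fixed confidence threshold here are simplifying stand-ins.

```python
# Minimal scikit-learn sketch of a self-training loop: fit on labeled data,
# pseudo-label unlabeled data, keep confident predictions, and retrain.
# STraTA fine-tunes a pretrained LM; the logistic-regression classifier and the
# fixed confidence threshold are simplifying stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(x_labeled, y_labeled, x_unlabeled, rounds=3, threshold=0.9):
    x_train, y_train = x_labeled.copy(), y_labeled.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(x_train, y_train)
        probs = model.predict_proba(x_unlabeled)
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break
        # Add confident pseudo-labeled examples to the training set.
        pseudo_labels = model.classes_[probs[confident].argmax(axis=1)]
        x_train = np.vstack([x_train, x_unlabeled[confident]])
        y_train = np.concatenate([y_train, pseudo_labels])
        x_unlabeled = x_unlabeled[~confident]
    return model

# Toy usage: 8 labeled examples per class plus 200 unlabeled examples.
rng = np.random.default_rng(0)
x_lab, y_lab = rng.normal(size=(16, 32)), np.array([0] * 8 + [1] * 8)
self_train(x_lab, y_lab, rng.normal(size=(200, 32)))
```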
@tuvllms
Tu Vu
5 years
@MohitIyyer Our sentence content objective substantially boosts accuracy and generalization: on Yelp, with only 500 labeled examples, it outperforms training from scratch on 200× more data, which we hope will spur more linguistically-informed research into paragraph embedding methods. [5/5]
0
0
2
@tuvllms
Tu Vu
3 years
With STraTA, we are able to substantially improve sample efficiency across 12 NLP benchmark datasets. Remarkably, when given only 8 labeled examples per class from the SST-2 sentiment dataset, our approach is competitive with standard fine-tuning on all 67K labeled examples.
1
0
2
@tuvllms
Tu Vu
2 years
@YiTayML Good luck moving forward, Yi!
1
0
2
@tuvllms
Tu Vu
4 years
@MShahrad @UBC Congrats, Mohammad! That's awesome!
0
0
1
@tuvllms
Tu Vu
2 years
@iamsteph @lmthang @quocleix @GradySimon @MohitIyyer Thanks a lot, @iamsteph ! Feel free to send me a message or email me!
1
0
1
@tuvllms
Tu Vu
2 years
@HamedZamani @NSF Great news!! Congrats, @HamedZamani !!
0
0
1
@tuvllms
Tu Vu
2 years
@naughtynates @lmthang @quocleix @GradySimon @MohitIyyer Thanks a lot, @naughtynates ! We observed degenerate results without filtering the synthetic data. For filtering, you could use our pre-trained models or a pre-trained NLI model on .
0
0
1
@tuvllms
Tu Vu
1 year
@khanhxuannguyen Congrats and good luck moving forward!
1
0
1
@tuvllms
Tu Vu
2 years
@alexjc Thanks for the question! If you are curious about the performance of larger models, here are the results on MMLU:
- Flan-T5 XL (3B): 52.4
- Flan-T5 XXL (11B): 55.1
- Flan-PaLM (540B): 73.5
- Flan-U-PaLM (540B): 74.1
0
0
1
@tuvllms
Tu Vu
3 years
0
0
1
@tuvllms
Tu Vu
4 years
@swartchris8 @umasscs Thanks, @swartchris8 ! Yeah, it could be a potential direction. Just found out that a follow-up work has tried out-of-class transfer from natural language inference to biomedical QA and observed a considerable boost in performance.
1
0
1
@tuvllms
Tu Vu
1 year
@najoungkim Thanks a lot, Najoung! Hope (QA)^2 and FreshQA will spur more research into this area.
0
0
1
@tuvllms
Tu Vu
5 years
@MohitIyyer How well do paragraph embeddings encode whether or not a given sentence appears in the paragraph? We extend the notion of probe tasks to the paragraph level and formulate a sentence content task to probe for this basic linguistic property. [2/5]
1
0
1
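A minimal Python sketch of how sentence-content probe examples could be constructed, per the tweet above: label 1 if the sentence comes from the paragraph, 0 if it comes from a different paragraph. The pairing scheme is an illustrative assumption, not the paper's exact setup.

```python
# Minimal sketch of constructing sentence-content probe examples: label 1 if
# the sentence appears in the paragraph, 0 if it is drawn from a different
# paragraph. The pairing scheme is an illustrative assumption.
import random

def sentence_content_examples(paragraphs, seed=0):
    rng = random.Random(seed)
    for i, para in enumerate(paragraphs):
        positive = rng.choice(para)
        other_idx = rng.choice([j for j in range(len(paragraphs)) if j != i])
        negative = rng.choice(paragraphs[other_idx])
        yield " ".join(para), positive, 1
        yield " ".join(para), negative, 0

paragraphs = [
    ["The cat sat on the mat.", "It purred quietly."],
    ["Stocks fell sharply today.", "Analysts blamed rate hikes."],
]
for example in sentence_content_examples(paragraphs):
    print(example)
```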
@tuvllms
Tu Vu
6 years
An Unassuming Genius: The Man behind Google’s AutoML
0
0
1