Edoardo Ponti

@PontiEdoardo

Followers: 2,264
Following: 421
Media: 51
Statuses: 370

Assistant Professor in #NLP at @EdinburghUni and visiting professor @nvidia | PhD @Cambridge_Uni | Humani nihil a me alienum puto ("Nothing human is alien to me")

Edinburgh
Joined August 2018
Pinned Tweet
@PontiEdoardo
Edoardo Ponti
8 days
2 papers accepted at NeurIPS!
SEA 🌊 Spectral Editing of Activations in LLMs ( @yifuqiu98 )
ZeTT ⛓️‍💥 Zero-shot Tokenizer Transfer ( @bminixhofer )
Also, stay tuned for our tutorial on dynamic sparsity with @andre_t_martins !
@bminixhofer
Benjamin Minixhofer
5 months
Introducing Zero-Shot Tokenizer Transfer (ZeTT) ⚡ ZeTT frees language models from their tokenizer, allowing you to use any model with any tokenizer, with little or no extra training. Super excited to (finally!) share the first project of my PhD🧵
Tweet media one
30
148
742
1
10
81
@PontiEdoardo
Edoardo Ponti
3 years
I am delighted to share that I will be joining @EdinburghNLP at @EdinburghUni from 2022 as a lecturer in Natural Language Processing. I am currently recruiting PhD students, so if you are passionate... (1/6)
26
55
399
@PontiEdoardo
Edoardo Ponti
8 months
We scaled sparse fine-tuning (SFT) to LLMs (such as Llama 2) by making it both parameter- and memory-efficient! (q)SFT instruction tuning performance is often better than (q)LoRA with comparable speed and memory load. Paper: Code:
2
70
251
@PontiEdoardo
Edoardo Ponti
4 months
Today I am joining @nvidia part-time as a visiting professor. I could not imagine a better place to explore new efficient architectures for LLMs and diffusion. I am looking forward to collaborating with so many talented researchers!
13
4
221
@PontiEdoardo
Edoardo Ponti
3 years
Multitask learning by decomposing tasks into sets of fine-grained skills (discrete, reusable, and autonomous facets of knowledge). New work with Yoshua Bengio @sivareddyg from @Mila_Quebec and @murefil from @MSFTResearch 📘: 💻:
Tweet media one
3
31
160
@PontiEdoardo
Edoardo Ponti
8 months
I am still looking for PhD students starting in September 2024! The deadline to apply for the CDT in NLP is the 11th of March. If you wish to do research in modular and efficient LLMs, here are some highlights of my lab's research from the past year ⬇️🧵
@EdinburghNLP
EdinburghNLP
8 months
Interested in training with future leaders in NLP to engage with the cutting edge of the technical, social, design, and legal aspects of these systems? Then apply for our new Centre for Doctoral Training in Designing Responsible NLP! Deadline 11 March 2024
0
17
52
11
52
153
@PontiEdoardo
Edoardo Ponti
11 months
We connect inaccuracies of merging fine-tuned models to the mismatch between their gradients (through a target model), minimising which directly improves the performance. New paper with @ndaheim_ @tmoellenhoff @IGurevych @EmtiyazKhan
Tweet media one
3
31
116
@PontiEdoardo
Edoardo Ponti
2 years
Large language models often generate hallucinated responses. We introduce Elastic Weight Removal (EWR), a novel method for faithful *and* abstractive dialogue. 📃 💻 +other methods! 🧑‍🔬 @ndaheim_ @nouhadziri @IGurevych @mrinmayasachan
1
19
108
@PontiEdoardo
Edoardo Ponti
2 years
I am looking for PhD students to join my group at @EdinburghNLP @EdinburghUni and work on modular NLP, grounding, and typology! The deadline for international applicants is Nov 25th for fully funded PhD programmes at CDT NLP and ILCC. For more info:
1
42
100
@PontiEdoardo
Edoardo Ponti
3 years
A new method for the adaptation of pre-trained models that is modular, expressive, and parameter-efficient: Lottery Ticket Sparse Fine-Tuning 👨‍🔬 Alan Ansell, me, @licwu , and @annalkorhonen 📄 👩‍💻
Tweet media one
2
25
96
@PontiEdoardo
Edoardo Ponti
2 years
Can we increase the efficiency *and* performance of auto-regressive models? We introduce dynamic-pooling Transformers, which jointly perform language modelling and token segmentation. @p_nawrot * @AdrianLancucki @JChorowski 📜 🧑‍💻
2
27
92
@PontiEdoardo
Edoardo Ponti
7 months
Can open-source LLMs execute *chains of instructions* in a single query? Not so well, we found. However, they can learn this ability by:
- augmenting examples from public SFT mixtures with chains of instructions automatically
- performing *sequential instruction tuning* on them.
Tweet media one
1
21
91
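To make the augmentation step concrete, here is a minimal sketch of chaining single-instruction SFT examples into one sequential-instruction example. The field names and the concatenation template are hypothetical, not the paper's actual pipeline.

# Minimal sketch: chain two single-instruction SFT examples into one
# sequential-instruction example. Field names and template are made up.
def chain_examples(examples):
    instructions = " Then, ".join(ex["instruction"] for ex in examples)
    # the model is trained to produce the intermediate and final outputs in order
    output = "\n".join(ex["output"] for ex in examples)
    return {"instruction": instructions, "input": examples[0].get("input", ""), "output": output}

sft_mixture = [
    {"instruction": "Summarise the passage.", "input": "…", "output": "A short summary."},
    {"instruction": "Translate the summary into French.", "input": "", "output": "Un court résumé."},
]
print(chain_examples(sft_mixture))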
@PontiEdoardo
Edoardo Ponti
2 months
I am attending #ACL2024 in Bangkok and I am giving a keynote talk at RepL4NLP on Thursday (15 Aug), "Efficiency as an Inductive Bias for Language Models" Here is a preview with some hot takes and ideas!
0
15
98
@PontiEdoardo
Edoardo Ponti
4 years
Just passed my viva with minor corrections! Many thanks to my examiners, my supervisors @annalkorhonen and @licwu , and all those who supported me throughout the PhD
Tweet media one
16
3
84
@PontiEdoardo
Edoardo Ponti
8 months
Polytropon is now available on the @huggingface peft library! Consider using it for better generalisation when instruction tuning your LLM. Minimal example here (multi-task learning): Many thanks to @taosunvoyage for the implementation!
@PontiEdoardo
Edoardo Ponti
3 years
Multitask learning by decomposing tasks into sets of fine-grained skills (discrete, reusable, and autonomous facets of knowledge). New work with Yoshua Bengio @sivareddyg from @Mila_Quebec and @murefil from @MSFTResearch 📘: 💻:
Tweet media one
3
31
160
4
13
79
@PontiEdoardo
Edoardo Ponti
4 months
Adapter parameters are all you need in modular LLMs! You can *build* inventories of experts by clustering tasks based on their LoRA params. You can *reuse* experts by routing zero-shot based on the right singular vectors of their LoRA params.
@_akhaliq
AK
5 months
Towards Modular LLMs by Building and Reusing a Library of LoRAs The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to
Tweet media one
3
66
275
0
10
73
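A rough sketch of the two ideas above: clustering experts by their LoRA parameters, and zero-shot routing via the right singular vectors of each expert's LoRA delta. The shapes, the similarity-based clustering, and the routing rule are illustrative stand-ins, not the paper's exact recipe.

import torch

# Hypothetical library of LoRA experts: each has factors A (r x d) and B (d x r),
# so its weight delta is B @ A with shape (d, d).
d, r, n_experts = 64, 4, 6
library = [{"A": torch.randn(r, d), "B": torch.randn(d, r)} for _ in range(n_experts)]

# 1) *Build*: cluster experts by their (flattened) LoRA parameters,
#    e.g. by running a clustering algorithm on this similarity matrix.
flat = torch.stack([torch.cat([e["A"].flatten(), e["B"].flatten()]) for e in library])
sim = torch.nn.functional.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)

# 2) *Reuse*: route a new input zero-shot by how strongly it aligns with each expert's
#    top right singular vector (the input direction the expert reacts to most).
x = torch.randn(d)
scores = []
for e in library:
    _, _, Vh = torch.linalg.svd(e["B"] @ e["A"])   # rows of Vh are right singular vectors
    scores.append(torch.abs(Vh[0] @ x))
weights = torch.softmax(torch.stack(scores), dim=0)  # soft routing weights over experts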
@PontiEdoardo
Edoardo Ponti
3 years
In our new paper, @KreutzerJulia @licwu @sivareddyg and I present a method to enhance translation-based cross-lingual transfer (gains up to 2.7 per task and 5.6 per language). Pdf: . Code: @Mila_Quebec @CambridgeLTL @GoogleAI
2
9
62
@PontiEdoardo
Edoardo Ponti
1 year
Our paper on multi-head routing in modular LLMs has now been accepted at @NeurIPSConf (), it was fun to work with @LucasPCaccia and @sordonia ! @EdinburghNLP @Mila_Quebec @MSFTResearch
@LucasPCaccia
Lucas Caccia
1 year
New preprint : To promote generalisation to new tasks, modular LLMs reuse and adapt previously acquired skills. We propose a more expressive “multi-head” routing strategy, which achieves consistent gains. Code: Paper:
1
14
58
0
14
55
@PontiEdoardo
Edoardo Ponti
5 months
We introduce the idea of zero-shot *tokenizer* transfer. Our vision is to combine your favourite LLM with an arbitrary tokenizer on the fly. This means:
- More efficient encoding for non-English text
- Mix experts with different tokenizers
Check @bminixhofer 's thread for details!
@bminixhofer
Benjamin Minixhofer
5 months
Introducing Zero-Shot Tokenizer Transfer (ZeTT) ⚡ ZeTT frees language models from their tokenizer, allowing you to use any model with any tokenizer, with little or no extra training. Super excited to (finally!) share the first project of my PhD🧵
Tweet media one
30
148
742
0
11
55
@PontiEdoardo
Edoardo Ponti
4 years
We have created XCOPA, a dataset for commonsense reasoning and knowledge transfer across 11 languages (including Quechua and Haitian Creole). @gg42554 O Majewska @qianchul @licwu @annalkorhonen Download: Paper:
1
13
53
@PontiEdoardo
Edoardo Ponti
1 year
We have re-opened 2 PhD studentships for *2023/24* at @EdinburghNLP (1 home, 1 international), please send me a message by tomorrow if you are interested in this opportunity!
4
22
50
@PontiEdoardo
Edoardo Ponti
2 years
Join us today at 9:20am (Irish time) for @MML_WKSP , the first Multilingual Multimodal Workshop at #acl2022nlp ! We have a fantastic line-up of speakers:
Tweet media one
0
10
49
@PontiEdoardo
Edoardo Ponti
4 months
During the workshop on efficient generative AI at @InfAtEd , we discussed methods to reduce AI's energy costs and environmental impact while fostering AI democratisation and scientific discovery. Here are some lessons I learned from the speakers: 🧵
Tweet media one
Tweet media two
1
5
47
@PontiEdoardo
Edoardo Ponti
1 year
Corpus-based measures reliably discriminate morphological inflection and derivation cross-linguistically! @colemanhaley22 is presenting today at @sig_typ the first large-scale computational study (26 languages from @unimorph_ ) on this topic
Tweet media one
@sig_typ
SIGTYP
1 year
🔥Language-Agnostic Measures Discriminate Inflection and Derivation 🖊️By Coleman Haley, Edoardo M. Ponti and Sharon Goldwater 📽️Talk: 📚Paper: #SIGTYP2023
0
4
7
2
16
45
@PontiEdoardo
Edoardo Ponti
1 year
The applications for the @ELLISforEurope PhD programme are now open! If you'd like to join @EdinburghNLP and do research on modular deep learning (parameter-efficient fine-tuning, routing in mixture-of-experts, model merging, ...) or computational typology, drop me a message!
@ELLISforEurope
ELLIS
1 year
The portal is open: Our #ELLISPhD Program is now accepting applications! Apply by November 15 to work with leading #AI labs across Europe and choose your advisors among 200 top #machinelearning researchers! #JoinELLISforEurope #PhD #PhDProgram #ML
6
175
418
4
15
44
@PontiEdoardo
Edoardo Ponti
7 months
We retrofit LLMs by learning to compress their memory dynamically. I find this idea very promising as it creates a middle ground between vanilla Transformers and SSMs in terms of memory/performance trade-offs. I'd like to give a shout-out to @p_nawrot and @AdrianLancucki for the
@p_nawrot
Piotr Nawrot
7 months
The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at the expense of performance. We introduce Dynamic Memory Compression (DMC) where we retrofit LLMs to compress their KV cache while preserving performance
Tweet media one
10
73
446
0
7
43
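For intuition, here is a toy sketch of the kind of per-token decision dynamic memory compression makes at inference time: either append the new key/value to the cache or accumulate it into the last slot. The decision rule below is a made-up placeholder; in the actual method these decisions are learned during retrofitting.

import torch

def update_cache(cache_k, cache_v, k_new, v_new, append: bool):
    """Toy KV-cache update: append a new slot, or merge into the last one (averaging).
    DMC learns the append/merge decision per head and layer; here it is just a flag."""
    if append or cache_k.numel() == 0:
        cache_k = torch.cat([cache_k, k_new.unsqueeze(0)])
        cache_v = torch.cat([cache_v, v_new.unsqueeze(0)])
    else:
        cache_k[-1] = 0.5 * (cache_k[-1] + k_new)  # crude stand-in for weighted accumulation
        cache_v[-1] = 0.5 * (cache_v[-1] + v_new)
    return cache_k, cache_v

d = 8
cache_k, cache_v = torch.empty(0, d), torch.empty(0, d)
for t in range(16):
    k, v = torch.randn(d), torch.randn(d)
    cache_k, cache_v = update_cache(cache_k, cache_v, k, v, append=(t % 2 == 0))
print(cache_k.shape)  # fewer slots than tokens -> compressed memory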
@PontiEdoardo
Edoardo Ponti
2 years
Very excited about this line of research! You can find the conclusions "in a nutshell" at the end of the survey, as well as a list of open challenges
@seb_ruder
Sebastian Ruder
2 years
In our new survey “Modular Deep Learning”, we provide a unified taxonomy of the building blocks of modular neural nets and connect disparate threads of research. 📄 📢 🌐 w/ @PfeiffJo @licwu @PontiEdoardo
Tweet media one
8
97
425
0
4
41
@PontiEdoardo
Edoardo Ponti
2 months
It was fun to meet so many people curious about dynamic memory compression!
Tweet media one
@p_nawrot
Piotr Nawrot
2 months
Tomorrow at @icmlconf , together with @PontiEdoardo and @AdrianLancucki , we'll present an updated version of "Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference". You can find an updated paper at . Among others - 1) We trained DMC to
Tweet media one
7
26
85
0
4
41
@PontiEdoardo
Edoardo Ponti
2 months
I am attending @icmlconf in Vienna this week, come find me and my co-authors (tagged below) to discuss ideas for efficient / modular / NeSy LLMs!
1
7
39
@PontiEdoardo
Edoardo Ponti
2 years
Really proud of my student Yifu Qiu (co-supervised with Shay Cohen and @annalkorhonen ), who won the 2023 Apple Scholars in AI ML PhD fellowship! He has a bright future ahead of him. @EdinburghNLP @EdinburghUni @CambridgeLTL @Cambridge_Uni
0
2
39
@PontiEdoardo
Edoardo Ponti
3 years
Multilingual task-oriented dialogue is authentic if it displays natural fluency 🌊 and familiar entities 🛥️. In Cross-Lingual Outline-based Dialogue (COD 🐟), we set out to achieve exactly this! 💻 📝
Tweet media one
3
10
38
@PontiEdoardo
Edoardo Ponti
4 years
"Differentiable Generative Phonology", in collaboration with @EzraWu and @ryandcotterell , is finally out! Tired: Asking linguists to posit discrete underlying forms Wired: learning continuous underlying forms end-to-end
1
12
35
@PontiEdoardo
Edoardo Ponti
5 months
I am crossing Adrian's wall to give a series of invited talks in England!
- 27/5 2 pm @OxUniMaths :
- 29/5 noon @KingsCollegeLon : Bush House (S) 2.01
- 30/5 5:30 pm @ucl_nlp :
- 4-5/6 @CambridgeLTL
1
4
31
@PontiEdoardo
Edoardo Ponti
2 years
Interested in integrating deep learning with symbolic algorithms, knowledge bases, and programmes? Apply for a 2-year postdoc position with me, @PMinervini , and @tetraduzione at ELIAI @EdinburghUni on gradient-based learning of complex latent structures.
0
15
31
@PontiEdoardo
Edoardo Ponti
3 years
This paper required a Herculean effort, but it was worth it! The aspect that I like the most is that it enables transfer learning along 3 different axes: languages, tasks, and modalities
@ebugliarello
Emanuele Bugliarello
3 years
Voilà IGLUE🧊 The Image-Grounded Language Understanding Evaluation benchmark 📈 IGLUE brings together 4 vision-and-language tasks across 20 languages And, brr, is it cold outside the Anglosphere 🥶 📄 👩‍💻 🌐
Tweet media one
4
42
167
0
4
29
@PontiEdoardo
Edoardo Ponti
7 months
If you are curious to discover more about Dynamic Memory Compression, I will give a preview during my keynote talk at the MOOMIN workshop @eaclmeeting See you on Thursday, March 21st at 9:30 AM!
@p_nawrot
Piotr Nawrot
7 months
The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at the expense of performance. We introduce Dynamic Memory Compression (DMC) where we retrofit LLMs to compress their KV cache while preserving performance
Tweet media one
10
73
446
0
4
27
@PontiEdoardo
Edoardo Ponti
3 years
The best part is: you can adapt models from @huggingface with our SFTs in just 3 lines of code:
from sft import SFT
sft_model = SFT(sft_model_name)
sft_model.apply(pretrained_model)
0
4
27
@PontiEdoardo
Edoardo Ponti
3 years
I am committed to selecting a diverse set of candidates with high potential, as the various communities of speakers around the world should also find representation in the NLP & ML scientific communities @Khipu_AI @DeepIndaba @MasakhaneNLP (5/6)
1
1
25
@PontiEdoardo
Edoardo Ponti
3 years
...about multilingual and low-resource NLP, sample-efficient and modular machine learning, computational typology, or grounded language learning, consider applying to my group! (2/6)
2
0
23
@PontiEdoardo
Edoardo Ponti
1 year
It was lovely to visit the other place again and talk about modular deep learning. Thanks @oxfordnlp for the invite!
Tweet media one
0
0
25
@PontiEdoardo
Edoardo Ponti
5 years
Given the paucity of annotated data, how can we perform sample-efficient generalization on unseen task-language combinations? Possible solution: a generative model of the neural parameter space, factorized into variables for several languages and tasks. 1/2
1
5
24
@PontiEdoardo
Edoardo Ponti
3 years
Many *fully funded* studentships (from September 2022) are available:
👩🏻‍🎓 12 for a 4-year PhD with integrated study from the NLP CDT:
👨🏿‍🎓 10 for a 3-year PhD from ILCC:
(3/6)
1
5
23
@PontiEdoardo
Edoardo Ponti
2 years
Grammatical markers are implicitly aligned in pre-trained multilingual encoders by encoding the same grammatical functions through the same subset of neurons across languages. This may help explain the "unreasonable" effectiveness of zero-shot cross-lingual transfer.
@karstanczak
Karolina Stanczak
2 years
Excited to share our new #NAACL2022 paper: "Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models". (1/4) In collaboration with @PontiEdoardo @ltorroba1 @ryandcotterell @IAugenstein #NLProc
Tweet media one
5
30
159
2
9
21
@PontiEdoardo
Edoardo Ponti
2 years
A little gem from my student @p_nawrot : nanoT5, or how to pre-train T5 on 1 GPU, in less than 1 day, in PyTorch. Now it is more important than ever to keep research accessible and reproducible. He conceived the idea and executed it all by himself, quite a remarkable feat!
@p_nawrot
Piotr Nawrot
2 years
Introducing *nanoT5* Inspired by @jonasgeiping 's Cramming and @karpathy 's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget (1xA100 GPU, ~20 hours) in PyTorch 🧑‍💻 @EdinburghNLP
Tweet media one
8
85
457
0
7
19
@PontiEdoardo
Edoardo Ponti
1 year
Fantastic work from my student @yifuqiu98 :
- the first metric to measure hallucinations in generated text for *any* language
- an empirical study of how cross-lingual transfer amplifies hallucinations
- a new method of "soft filtering" / loss weighting to promote faithfulness
@yifuqiu98
Yifu Qiu
1 year
[1/5] Our paper "Detecting and Mitigating Hallucinations for Multilingual Summarisation" is currently available on Arxiv! 📃 💻 🤝 @YftahZ @annalkorhonen @PontiEdoardo and Shay B. Cohen
0
12
36
0
3
19
@PontiEdoardo
Edoardo Ponti
3 years
For any enquiry, feel free to reach out to me via email or talk to me virtually at #EMNLP2021 (and attend our team's best paper award presentation!). I hope there will be a chance to meet some of you and discuss exciting research directions! (6/6)
0
0
13
@PontiEdoardo
Edoardo Ponti
6 years
Third (and last) paper at #EMNLP2018 (actually TACL): @dasgerz and @licwu carefully explaining our novel Language Modeling architecture with output matrix refinement
Tweet media one
0
2
15
@PontiEdoardo
Edoardo Ponti
6 years
Are you working on Natural Language Understanding? Then have a look here: @CambridgeLTL has just released the post-specialised word embeddings for GloVe, fastText, and SGNS. Pre-trained models to specialise new (cross-lingual) WEs are also available!
0
6
14
@PontiEdoardo
Edoardo Ponti
3 years
Do not hesitate to reach out if you are interested!
@YugeTen
Yuge Shi (Jimmy)
3 years
The school of informatics at the University of Edinburgh and DeepMind are offering an ML PhD scholarship for students who identify as gender/racial/ethnic minorities in 2022/23. See thread for details. (1/n)
4
77
235
3
2
13
@PontiEdoardo
Edoardo Ponti
8 months
By the way, @AlanAnsell5 (the first author) is graduating from @Cambridge_Uni and will be on the job market soon. He did amazing research on PEFT and multilingual NLP, make sure to reach out to him if you have a position open!
@PontiEdoardo
Edoardo Ponti
8 months
We scaled sparse fine-tuning (SFT) to LLMs (such as Llama 2) by making it both parameter- and memory-efficient! (q)SFT instruction tuning performance is often better than (q)LoRA with comparable speed and memory load. Paper: Code:
2
70
251
0
1
13
@PontiEdoardo
Edoardo Ponti
5 years
Don't miss the tutorial at @emnlp2019 with @licwu , @gg42554 , and me for the latest developments in semantic specialization (knowledgeable unsupervised pretraining, cross-lingual transfer, and more). Registration is now open:
0
3
13
@PontiEdoardo
Edoardo Ponti
2 years
Ensuring that language technologies are globally equitable is ever more important. I am looking forward to collaborating on this @ERC_Research grant!
@annalkorhonen
Anna Korhonen
2 years
Absolutely thrilled to receive an ERC Advanced Grant to study how to make language technologies globally more equitable. Thanks to the amazing team @roireichart @licwu @anna_barford @PontiEdoardo @CambridgeLTL - and to colleagues & friends for support! #ERCAdG @ERC_Research
5
4
42
0
0
13
@PontiEdoardo
Edoardo Ponti
2 years
Fantastic new work from @nouhadziri : data-centric + modelling solutions can remove most hallucinations from knowledge-grounded dialogue and increase its quality (e.g. abstractiveness)!
@nouhadziri
Nouha Dziri
2 years
📢 Excited to share our new work 💥 FaithDial: A Faithful Benchmark for Information-Seeking Dialogue 📄 🌐 👩‍💻 joint work w. @sivareddyg , @PontiEdoardo , @ehsk0 , @ozaiane , Mo Yu, Sivan Milton #NLProc
2
18
60
1
2
12
@PontiEdoardo
Edoardo Ponti
6 years
I have just accepted an offer for a position as an #ML / #NLP Engineering Intern at #Apple in Cupertino, California. Looking forward to this new adventure! (And curious to admire Norman Foster's #applepark )
1
0
12
@PontiEdoardo
Edoardo Ponti
4 years
You can easily load XCOPA from @huggingface 's dataset library:
from datasets import load_dataset
xcopa = load_dataset('xcopa')
It contains extremely under-documented languages like Southern Quechua and Haitian Creole.
1
1
11
@PontiEdoardo
Edoardo Ponti
4 years
Come and meet me at the #EMNLP2020 Q&A session (6B) about XCOPA, a novel multilingual dataset for common-sense reasoning. When? Nov 17, 9:00 UTC (tomorrow!) Data and leaderboard:
1
2
11
@PontiEdoardo
Edoardo Ponti
8 months
To keep the memory load proportional to the PEFT size instead, we alternate among:
1) updating deltas wrt LLM weights
2) dropping old indices based on their magnitude of change
3) growing new indices based on newly introduced criteria: AG and MA
This alternation is inspired
Tweet media one
1
1
10
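A schematic sketch of the update/drop/grow alternation described above. The gradient-magnitude grow criterion is a made-up stand-in for the paper's AG/MA criteria; it only illustrates the control flow, not the actual (q)SFT implementation.

import torch

def sft_step(deltas, idx, grads, k):
    """One illustrative round of sparse fine-tuning with a budget of k active (flattened) indices.
    `deltas` holds the current updates at the active indices `idx`; `grads` is the full gradient."""
    grads = grads.flatten()
    # 1) update the deltas w.r.t. the LLM weights (plain SGD here, purely for illustration)
    deltas = deltas - 0.01 * grads[idx]
    # 2) drop the active indices whose accumulated change is smallest in magnitude
    keep = torch.topk(deltas.abs(), k // 2).indices
    idx, deltas = idx[keep], deltas[keep]
    # 3) grow new indices by some criterion (largest gradient magnitude outside the
    #    active set here, standing in for AG/MA)
    candidate = grads.abs().clone()
    candidate[idx] = float("-inf")
    grown = torch.topk(candidate, k - idx.numel()).indices
    return torch.cat([idx, grown]), torch.cat([deltas, torch.zeros(k - idx.numel())])

idx = torch.randperm(1000)[:8]
deltas = torch.zeros(8)
idx, deltas = sft_step(deltas, idx, torch.randn(1000), k=8)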
@PontiEdoardo
Edoardo Ponti
6 months
This is the path that Italy should have followed, too. It's not too late to correct course.
@MaxCRoser
Max Roser
6 months
Until 50 years ago, CO₂ emissions developed in lockstep with economic growth in France. Since the early 1970s, the opposite has been true: emissions declined as people in France got richer.
Tweet media one
158
944
5K
0
1
10
@PontiEdoardo
Edoardo Ponti
8 months
Momentum Approximation (SFT-MA) for even higher memory efficiency:
- reuses approximate momenta from efficient optimizers like @_arohan_ 's SM3
- performs a dot product between row-wise and column-wise weight statistics
- selects the arg top-k subset of indices for growth
MA is
Tweet media one
1
1
9
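A tiny illustration of the idea above: SM3-style optimizers keep one statistic per row and one per column of a weight matrix, so an approximate per-entry momentum can be formed from their product, and the top-k entries are then chosen for growth. The shapes and the final selection rule below are simplified assumptions.

import torch

m, n, k = 6, 5, 4
row_stats = torch.rand(m)   # per-row accumulator, as kept by SM3-style optimizers
col_stats = torch.rand(n)   # per-column accumulator
approx_momentum = torch.outer(row_stats, col_stats)    # (m, n) approximate per-entry statistic
flat_topk = torch.topk(approx_momentum.flatten(), k).indices
rows, cols = flat_topk // n, flat_topk % n             # entries selected for growth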
@PontiEdoardo
Edoardo Ponti
2 years
We compare different methods to learn an auto-regressive boundary predictor:
- end-to-end (Gumbel)
- supervision from subword tokenizers (Unigram)
- data boundaries (Whitespaces)
We also propose a new segmentation method based on the entropy spikes of the model’s prediction.
Tweet media one
1
1
9
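To make the entropy-spike idea concrete, here is a small sketch: compute the model's next-token prediction entropy at each position and place a boundary wherever the entropy rises sharply relative to its neighbours. The spike test below is a made-up heuristic, not the paper's exact rule.

import torch

def entropy_spike_boundaries(logits, margin=0.1):
    """logits: (seq_len, vocab) next-token logits. Returns a boolean boundary mask."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)   # (seq_len,)
    boundaries = torch.zeros_like(entropy, dtype=torch.bool)
    for i in range(1, entropy.numel() - 1):
        # a "spike": noticeably higher entropy than both neighbours
        if entropy[i] > entropy[i - 1] + margin and entropy[i] > entropy[i + 1] + margin:
            boundaries[i] = True
    return boundaries

print(entropy_spike_boundaries(torch.randn(12, 50)))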
@PontiEdoardo
Edoardo Ponti
3 years
A group of researchers from @AIMS_Next AMMI has devised a promising research project on modelling text and speech in 10 Ghanaian languages. Are you aware of any source of funding (in addition to @LacunaFund ) they could apply to for this project?
1
5
9
@PontiEdoardo
Edoardo Ponti
5 months
@kroscoo @LrecColing Bonus: pay a visit to the Egyptian museum, it's considered the 2nd most important collection in the world after Cairo
0
0
9
@PontiEdoardo
Edoardo Ponti
4 years
@nlpnoah @yoavgo The very concepts of business / occupation in Latin and Ancient Greek are defined only by negation: "negotium" and "ἀσχολία" literally mean "not leisure" :)
0
0
9
@PontiEdoardo
Edoardo Ponti
3 years
How well do neural models generalise to new image domains, concepts, and languages? Check out MaRVL, a benchmark for grounded language learning created to better reflect the world's cultural and linguistic diversity. 🌐
@ebugliarello
Emanuele Bugliarello
3 years
Is multimodal technology mature enough to be used around the world? We introduce MaRVL, a multilingual and multicultural dataset for vision-and-language reasoning! @hardy_qr @PontiEdoardo @sivareddyg @nigelhcollier @delliott 🗣️ #EMNLP2021 🌐
Tweet media one
2
7
61
0
0
8
@PontiEdoardo
Edoardo Ponti
8 months
SFT (bottom) scatter-adds a sparse matrix to the LLM pre-trained weights; LoRA adds a low-rank matrix (top). While SFT is more expressive and composable, the memory needed for previous SFT methods (DiffPruning, FISH Mask, Lottery Ticket) scaled with the model size. This made SFT
Tweet media one
1
0
8
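The contrast between the two parameterisations can be written in a few lines; toy shapes, no training, just the shape of the update each method adds to the pre-trained weight.

import torch

d_out, d_in, r, k = 8, 8, 2, 10
W = torch.randn(d_out, d_in)                       # pre-trained weight

# LoRA: add a dense low-rank matrix B @ A
B, A = torch.randn(d_out, r), torch.randn(r, d_in)
W_lora = W + B @ A

# SFT: scatter-add a sparse set of k per-entry deltas at chosen indices
idx = torch.randperm(d_out * d_in)[:k]
delta = torch.randn(k)
W_sft = W.flatten().scatter_add(0, idx, delta).view(d_out, d_in)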
@PontiEdoardo
Edoardo Ponti
5 months
Let's talk research over a glass of wine if you're curious about my lab's recent work on dynamic memory compression, sparse fine-tuning, mixtures of adapters, and zero-shot tokenizer transfer! Thanks to @ShiweiLiu9 @zhengyuan_nlp @oanacamb @annalkorhonen for the invites!
0
0
6
@PontiEdoardo
Edoardo Ponti
4 months
@NandoDF My lab proposed an auto-regressive Transformer architecture that dynamically merges tokens in intermediate layers Promising for multimodal data as 1) tokenizer-free, 2) discards uninformative bits, 3) can learn abstractions at different granularities
@PontiEdoardo
Edoardo Ponti
2 years
Can we increase the efficiency *and* performance of auto-regressive models? We introduce dynamic-pooling Transformers, which jointly perform language modelling and token segmentation. @p_nawrot * @AdrianLancucki @JChorowski 📜 🧑‍💻
2
27
92
0
0
8
@PontiEdoardo
Edoardo Ponti
5 months
A fantastic lineup of speakers to discuss the future of efficient generative AI at @InfAtEd !
@EdinburghNLP
EdinburghNLP
5 months
We are excited to announce that on May 24th and 25th, @InfAtEd will host the *International Workshop on Efficient Generative AI* The event will feature invited talks, panels, posters, and networking sessions. Website and programme:
1
6
19
0
4
8
@PontiEdoardo
Edoardo Ponti
3 years
LT-SFT achieves large gains over adapters (such as MAD-X) in zero-shot transfer to unseen and low-resource languages, including African and American languages @MasakhaneNLP @AmericasNLP
Tweet media one
1
0
7
@PontiEdoardo
Edoardo Ponti
3 years
⚠️ The application deadlines are fast approaching!
📅 26 November 2021 for international / EU applicants
📅 28 January 2022 for UK applicants
(4/6)
1
0
7
@PontiEdoardo
Edoardo Ponti
3 years
Finally, our latent-skill model helps interpret the relationships among tasks, as the allocation matrix corresponds to an explicit hierarchy of tasks.
Tweet media one
0
1
6
@PontiEdoardo
Edoardo Ponti
11 months
In ordering events, even SOTA GPT-4 lags behind human performance *and* TemporalBART, a small-scale LM fine-tuned on abundant data for this task Still, conversational tuning of LLaMA 2, instruction tuning with Alpaca, and RLHF are broadly helpful to temporal reasoning
1
1
6
@PontiEdoardo
Edoardo Ponti
3 years
In particular, we learn end-to-end how to 1) allocate subsets of latent skills to multiple tasks; 2) specialise an inventory of parameter-efficient model sub-networks towards individual skills; 3) combine these to dense pre-trained or randomly initialised models.
Tweet media one
1
1
6
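A minimal sketch of the latent-skill idea: a task-by-skill allocation matrix selects which parameter-efficient modules are combined for each task. The shapes, the sigmoid relaxation, and the averaging rule are toy stand-ins for illustration, not the paper's exact formulation.

import torch

n_tasks, n_skills, d = 4, 6, 16
# inventory of skill-specific parameter-efficient modules (here: one parameter vector each)
skill_params = torch.nn.Parameter(torch.randn(n_skills, d))
# learnable task-to-skill allocation logits, relaxed to [0, 1] during training
alloc_logits = torch.nn.Parameter(torch.zeros(n_tasks, n_skills))

def params_for_task(task_id):
    alloc = torch.sigmoid(alloc_logits[task_id])            # soft subset of skills for this task
    # combine the selected modules (normalised average) into the adapter used for this task
    return (alloc.unsqueeze(-1) * skill_params).sum(0) / (alloc.sum() + 1e-9)

adapter = params_for_task(2)   # would then be composed with the dense pre-trained model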
@PontiEdoardo
Edoardo Ponti
5 years
I am in Hong Kong for @emnlp2019 , feel free to get in touch if you are interested in few-shot (multilingual) learning, Bayesian neural models, or semantic specialization: I'd be curious to hear your opinions! On a related note, I have a couple of talks on these topics tomorrow 👇
1
1
6
@PontiEdoardo
Edoardo Ponti
2 years
How to suppress negative (or encourage positive) behaviours with EWR?
- Create task vectors as the change between (anti)experts fine-tuned on behaviour exemplars and initialisation
- Subtract (or add) the task vectors from a pre-trained model, weighted by their Fisher Information
Tweet media one
1
0
6
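In symbols, the recipe above is roughly: build a task vector as the difference between an (anti)expert and its initialisation, then remove it from the pre-trained weights with per-parameter Fisher weighting. A toy sketch with a single flattened weight tensor and a placeholder Fisher estimate:

import torch

theta_init = torch.randn(100)                            # initialisation of the (anti)expert
theta_antiexpert = theta_init + 0.1 * torch.randn(100)   # fine-tuned on exemplars of the behaviour
theta_pretrained = torch.randn(100)                      # model we want to edit

task_vector = theta_antiexpert - theta_init              # direction encoding the behaviour
fisher = torch.rand(100)                                 # per-parameter Fisher information (placeholder)
lam = 0.5                                                # removal strength

# subtract the behaviour (add instead to encourage a positive behaviour)
theta_edited = theta_pretrained - lam * fisher * task_vector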
@PontiEdoardo
Edoardo Ponti
2 years
The gains from dynamic pooling Transformers do not vanish with higher numbers of layers. Hence, they hold promise to further facilitate scaling in language models.
Tweet media one
0
0
6
@PontiEdoardo
Edoardo Ponti
5 years
@TonyZador Brilliant paper! The core idea is reminiscent of Konrad Lorenz's 'Behind the Mirror'. In AI, the inductive bias can also be conceived as a prior over neural parameters . E.g. for learning languages (to appear at @emnlp2019 ):
0
2
6
@PontiEdoardo
Edoardo Ponti
1 year
Two outstanding papers at #ACL2023NLP from @EdinburghNLP , congratulations to all the authors!
@EdinburghNLP
EdinburghNLP
1 year
Congratulations to @nikita_moghe , @tomsherborne , Matthias, @alkoller , @iatitov , @alexandrabirch1 , and Mark for your ACL 2023 Outstanding Papers!! 🚀🧑‍🎓 Extrinsic Evaluation of MT Metrics () Compositional Generalization without Trees ()
1
15
81
0
0
6
@PontiEdoardo
Edoardo Ponti
3 years
Joint work from Olga Majewska*, @erazumovskaia *, me, @licwu , and @annalkorhonen . 🍒🍰 We are also presenting a tutorial on multilingual dialogue at #ACL2022 , don't miss it!
0
2
6
@PontiEdoardo
Edoardo Ponti
11 months
We can’t probe temporal grounding directly as LLMs are incapable of action or perception. So we probe LLMs on textual tasks that require an implicit temporal model:
- commonsense knowledge about events
- ordering events along a timeline
- self-consistency in the temporal model
Tweet media one
1
2
6
@PontiEdoardo
Edoardo Ponti
2 years
This is, I believe, one of our main contributions: most SOTA methods for removing hallucinations in a single repo!
@PontiEdoardo
Edoardo Ponti
2 years
As baselines, we adapt a series of techniques to faithful dialogue generation and we offer their first systematic comparison Task Arithmetic, CaPE, Quark, DExperts, and CTRL are all available in our repository! We welcome external contributions and plan to add more techniques
0
0
4
0
0
6
@PontiEdoardo
Edoardo Ponti
1 year
Inflection and derivation are crucial comparative concepts; yet, their definition is contentious. Linguists proposed several criteria: e.g., Plank (1994) lists 28, which yield contradictory results. @haspelmath even argued that their distinction carries no theoretical weight.
Tweet media one
1
2
5
@PontiEdoardo
Edoardo Ponti
2 months
[3/3] Towards Modular LLMs by Building and Reusing a Library of LoRAs @LucasPCaccia
@_akhaliq
AK
5 months
Towards Modular LLMs by Building and Reusing a Library of LoRAs The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to
Tweet media one
3
66
275
0
0
5
@PontiEdoardo
Edoardo Ponti
6 years
Tomorrow I am giving a talk on AI and language at my alma mater @unipv , in a conference hosted by #CollegiodelMaino . If you are in Pavia, hope to see you there!
@unipv
Università di Pavia
6 years
On Thursday, the conference "Prospettive dell'Intelligenza Artificiale" (Perspectives on Artificial Intelligence) will take place, with the aim of comparing different applied points of view on the same, by now extremely topical, phenomenon of #IntelligenzaArtificiale #CollegiodelMaino #unipv
0
2
4
0
0
5
@PontiEdoardo
Edoardo Ponti
2 years
In natural languages, units of meaning (such as words) vary in size. Our model predicts their boundaries, average-pools representations in the same unit, and processes them more efficiently. For a shortening factor K of the input length, attention complexity reduces by K^2.
1
0
6
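A toy sketch of the pooling step above: given predicted boundaries, tokens inside the same unit are average-pooled, so a shortening factor K cuts self-attention cost by roughly K^2 (attention being quadratic in sequence length). The pooling rule below is an illustrative assumption, not the exact model code.

import torch

def pool_by_boundaries(hidden, boundaries):
    """hidden: (seq_len, d); boundaries: bool (seq_len,), True where a new unit starts
    (the first position must be marked as a boundary)."""
    unit_ids = torch.cumsum(boundaries.long(), dim=0) - 1    # unit index of each token
    n_units = int(unit_ids.max().item()) + 1
    pooled = torch.zeros(n_units, hidden.size(1))
    counts = torch.zeros(n_units)
    pooled.index_add_(0, unit_ids, hidden)                   # sum token vectors per unit
    counts.index_add_(0, unit_ids, torch.ones(hidden.size(0)))
    return pooled / counts.unsqueeze(-1)                     # average-pool within each unit

h = torch.randn(12, 16)
b = torch.tensor([1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0], dtype=torch.bool)
print(pool_by_boundaries(h, b).shape)  # torch.Size([4, 16]): 12 tokens -> 4 units (K = 3)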
@PontiEdoardo
Edoardo Ponti
5 months
@karpathy A solution would be to swap tokenizer on the fly to avoid glitch tokens. We've just released our work on zero-shot tokenizer transfer which (coincidentally) does exactly this!
@bminixhofer
Benjamin Minixhofer
5 months
Introducing Zero-Shot Tokenizer Transfer (ZeTT) ⚡ ZeTT frees language models from their tokenizer, allowing you to use any model with any tokenizer, with little or no extra training. Super excited to (finally!) share the first project of my PhD🧵
Tweet media one
30
148
742
0
0
6
@PontiEdoardo
Edoardo Ponti
6 years
@VeredShwartz H is a legitimate note in the German notation, and is equivalent to B. This is how Bach wrote his name as a motif in the Art of Fugue 😉
1
0
5
@PontiEdoardo
Edoardo Ponti
2 years
@nouhadziri Also, it contains the largest-scale audit of gold-standard benchmarks to date, revealing that e.g. 71.4% of turns in Wizards of Wikipedia are hallucinated. Even worse, language models tend to not only 🦜 but even amplify this noise.
0
0
5
@PontiEdoardo
Edoardo Ponti
1 year
We train linear and MLP classifiers on these features and recover most (86% and 90%, respectively) of the classes of the constructions in @unimorph_ (which we take to reflect the intuitions of linguists on what constitutes inflection and derivation)
Tweet media one
1
0
5
@PontiEdoardo
Edoardo Ponti
2 months
@v4rmer @EdinburghUni @johannesbjerva @aautech @CompSciAAU @EdinburghNLP It was truly a pleasure to host you @v4rmer ! And thanks to @johannesbjerva for making your visit possible
0
0
5
@PontiEdoardo
Edoardo Ponti
3 years
A simple modification of @jefrankle and @mcarbin 's algorithm to find "winning tickets" allows for composing (rather than pruning) pre-trained models with sparse, real-valued masks that represent different facets of knowledge (languages, tasks, ...)
Tweet media one
1
0
4
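A condensed sketch of that idea (toy tensors, no actual training loop): fine-tune, keep only the top-k most-changed parameters as a sparse, real-valued difference, and compose several such differences by adding them to the pre-trained weights. The thresholding rule and shapes are simplifications for illustration.

import torch

def sparse_diff(theta_pretrained, theta_finetuned, k):
    """Keep only the k entries that changed the most (the 'winning ticket' of this fine-tuning)."""
    diff = theta_finetuned - theta_pretrained
    mask = torch.zeros_like(diff)
    mask[torch.topk(diff.abs(), k).indices] = 1.0
    return diff * mask

theta = torch.randn(1000)                      # pre-trained parameters (flattened)
lang_ft = theta + 0.05 * torch.randn(1000)     # e.g. fine-tuned on a target language
task_ft = theta + 0.05 * torch.randn(1000)     # e.g. fine-tuned on a target task

# compose facets of knowledge (language + task) by adding their sparse differences
theta_composed = theta + sparse_diff(theta, lang_ft, 50) + sparse_diff(theta, task_ft, 50)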