![Goran Glavaš Profile](https://pbs.twimg.com/profile_images/1803554625280200704/ZLdc_gHv_x96.jpg)
Goran Glavaš
@gg42554
Followers
1K
Following
2K
Statuses
405
Professor of #NLProc @Uni_WUE.
Würzburg, Germany
Joined September 2011
Great new work on multilingual news recommendation (NR) by @iana_andreea! New datasets for multilingual and cross-lingual NR, as well as a SotA NR model, newly domain-adapted from a multilingual sentence encoder!
⚠️Struggling with multilingual news recommendation? We introduce NaSE, a news-adapted sentence encoder!🙌 ✅No costly fine-tuning needed ✅Perfect for cold-start & few-shot scenarios #ecir2025 📰: Try it out @huggingface🤗: 👇
0
0
3
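For those who want to try it out: encoding multilingual news with such a sentence encoder looks roughly like the sketch below. This is a minimal illustration, assuming the checkpoint is published on the Hub under the (hypothetical) id used here and that simple mean pooling is appropriate; check the authors' Hugging Face page for the actual checkpoint and recommended pooling.

```python
# Minimal sketch: embedding news texts with a news-adapted sentence encoder.
# The model id is a hypothetical placeholder; verify against the announcement.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "aiana94/NaSE"  # hypothetical id, check the authors' HF page

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

news = [
    "Die Regierung kündigt neue Klimaziele an.",
    "The government announces new climate targets.",
]

batch = tokenizer(news, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one vector per news item
# (an illustrative choice; the encoder may prescribe different pooling).
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the parallel German/English headlines.
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cross-lingual similarity: {sim:.3f}")
```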
If you're looking for a good recipe for training a multilingual LVLM, or just a very strong multilingual LVLM to use, supporting 100 languages (built following the identified "optimal" recipe), check out our latest work! @GregorGeigle and Florian Schneider as lead authors!
Want to train a *multilingual* LVLM but not sure how? Or looking for a strong model to use? Presenting "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Models"! Arxiv: HF Collection:
0
0
2
Great work by @fdschmidt! Afaik, it's the first massively multilingual benchmark for spoken language understanding (and not just topical classification of speech utterances :). Ready "out-of-the-box" on HF datasets. Paper coming soon (but all the important details are already described).
📣Happy to (pre-)release my Fleurs-SLU benchmark to evaluate massively multilingual spoken language understanding on SIB & Belebele. Work done at @Mila_Quebec with @davlanade @gg42554 @licwu Datasets: Details to follow👇
1
0
4
Tired of work that probes LLMs or uses them as agents? @iana_andreea will present something cool and different: come check her great work on flexible news recommendation.
Excited to present MANNeR at @emnlpmeeting on Wednesday 4 pm! Drop by our poster to chat about recommender systems, personalization & beyond-accuracy objectives in news recommendation! #EMNLP2024
0
1
4
If you're into Vision-LLMs, come check @GregorGeigle's amazing work! See you in Miami ;)
The monkey's paw worked well, so I will present 2(!) posters at @emnlpmeeting Wednesday at 4pm. I will be easy to spot - just look for the guy with crutches🩼
0
0
1
Yes, come to @fdschmidt's poster on Tuesday! (even I will be there and I haven't been to a conference in 2.5 years :))
Excited to present NLLB-LLM2Vec at @emnlpmeeting Tuesday 2pm! Drop by our poster to chat about multilingual & multimodal research. NLLB-LLM2Vec can now easily be used with @huggingface AutoModels — try it esp. for embedding low-resource languages! 🌐
0
1
8
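As the tweet notes, the model is usable through @huggingface AutoModels. A minimal sketch of what that looks like follows; the checkpoint id is a placeholder (check @fdschmidt's Hub page), and `trust_remote_code=True` is assumed since the architecture fuses two model families.

```python
# Minimal sketch of using NLLB-LLM2Vec via the AutoModel interface.
# The repo id below is hypothetical; check the authors' Hugging Face page.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "fdschmidt93/NLLB-LLM2Vec"  # hypothetical placeholder id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

# Embed a sentence in a low-resource language (here: Luxembourgish).
batch = tokenizer("D'Wieder ass haut schéin.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state
embedding = hidden.mean(dim=1)  # illustrative mean pooling
print(embedding.shape)
```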
RT @iana_andreea: 🔎 What's beneath the surface of encoder architectures in news #recsys? 🤔 Our latest work w/ @gg42554 @heikopaulheim goes…
0
2
0
If you're looking for on-the-fly customization of your news recommendation function, then MANNeR is the framework for you! Great work by @iana_andreea!
🚀 Introducing MANNeR, our modular news recommendation 🤖📰 framework that uses ⚖️ metric-based learning to support on-the-fly customization over multiple aspects at inference time. #emnlp2024 findings: w/ @gg42554 @heikopaulheim @dwsunima (1/⏳️)
0
0
3
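The "on-the-fly customization at inference time" can be made concrete with a toy sketch. This is an illustration of the general idea, not MANNeR's actual API: each aspect module scores candidates against the user's click history in its own metric-learned embedding space, and the per-aspect scores are re-weighted freely per request, with no retraining.

```python
# Toy illustration of modular, metric-based news recommendation scoring.
# All names and the linear score mixing are assumptions for illustration.
import numpy as np

def aspect_score(history_embs: np.ndarray, cand_emb: np.ndarray) -> float:
    """Mean cosine similarity between a candidate and the user's clicked news."""
    h = history_embs / np.linalg.norm(history_embs, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb)
    return float((h @ c).mean())

def recommend_score(aspects: dict, weights: dict) -> float:
    """Weighted sum of per-aspect scores; weights can change per request."""
    return sum(weights[name] * aspect_score(*embs) for name, embs in aspects.items())

rng = np.random.default_rng(0)
aspects = {
    "content":   (rng.normal(size=(5, 64)), rng.normal(size=64)),
    "sentiment": (rng.normal(size=(5, 64)), rng.normal(size=64)),
}
# Re-weighting at inference time, e.g. to diversify sentiment, needs no retraining.
print(recommend_score(aspects, {"content": 1.0, "sentiment": -0.5}))
```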
Intermediate code representations like LLVM can indeed be a great facilitator of cross-programming-language transfer for Code-LLMs! A well-deserved Outstanding Paper Award for @androneil54 for this great work! It was a pleasure to be part of the effort!
Many interesting questions at @androneil54's poster on #IRCoder, an #ACL2024NLP collaboration with @gg42554 (@Uni_WUE) and @IGurevych (@UKPLab) that was just selected as one of this year's @aclmeeting's Outstanding Papers! 🎉🎉🎉
1
2
20
RT @UKPLab: Code LMs are improving fast 📈, but they are limited in low-resource programming languages (PLs). 😬 In this #ACL2024NLP paper,…
0
5
0
I really enjoyed working with @vjhofmann on this! The highlight of this work for me is Figure 6: plotting toponyms by the embeddings obtained from the LM after geoadaptation, we basically recovered the map (of the BCMS area)!
When we hear someone speak a dialect, we can often tell where they're from. Can LMs do the same? Our #TACL paper addresses this question and shows how to boost LMs' geolinguistic skills. 🌍 This paper has been in the making for almost three years, so glad it's finally out! 🧵
1
0
6
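A toy version of what that Figure 6 plot does, with random stand-in embeddings rather than the real geoadapted LM: project each toponym's embedding to 2D and plot it. With a well-geoadapted model, the resulting layout approximates the actual map.

```python
# Illustrative sketch: 2D projection of toponym embeddings.
# Embeddings here are random placeholders for the geoadapted LM's vectors.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

toponyms = ["Zagreb", "Split", "Sarajevo", "Belgrade", "Podgorica"]
embs = np.random.default_rng(1).normal(size=(len(toponyms), 768))  # stand-ins

xy = PCA(n_components=2).fit_transform(embs)  # reduce to two dimensions
fig, ax = plt.subplots()
ax.scatter(xy[:, 0], xy[:, 1])
for (x, y), name in zip(xy, toponyms):
    ax.annotate(name, (x, y))
ax.set_title("Toponym embeddings projected to 2D")
plt.show()
```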
You can now get our multilingual multi-parallel news recommendation dataset from HuggingFace!
🎉 Exciting news! xMIND is now also on @huggingface 🤗 Check it out if you need multi-parallel data for cross-lingual news recommendation or domain-specific text retrieval ⬇️ xMINDlarge: xMINDsmall: w/ @gg42554 @heikopaulheim
0
0
5
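Pulling the data down is a one-liner with the `datasets` library; the repo id below is a placeholder, so check the announcement for the exact one. The same pattern applies to PolyNews and PolyNewsParallel further down the feed.

```python
# Minimal sketch of loading the dataset from the Hugging Face Hub.
# The repo id is a hypothetical placeholder; verify against the announcement.
from datasets import load_dataset

ds = load_dataset("aiana94/xMIND")  # hypothetical repo id
print(ds)
```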
Can your Large Vision-Language Model tell a Keeshond from a Samoyed? We show that fine-grained object classification is a skill quite complementary to the image understanding tested by existing benchmarks, and that LVLMs don't excel at it, to say the least.
Could you use your Vision-LLM to help identify dogs, plants, dishes, or other things? We investigated and let's just say, do not rely on them when foraging mushrooms in the wild... Paper: Code: 🧵
0
1
1
Great effort by @GregorGeigle: we test whether explicit grounding objectives reduce hallucination in Large Vision-Language Models. We confirm that they yield better fine-grained image understanding performance, but this does not propagate to less hallucination in open captioning!
"Grounding tasks improve fine-grained image understanding which helps reduce visual hallucinations in Vision-LLMs" Intuitive claim and often repeated but is it *true*? We tested it in our recent paper: 🧵 (spoiler: no)
0
0
3
Great work by @iana_andreea, who put immense effort into collecting and cleaning such a massively multi-parallel news dataset. I reckon that such a domain-specific multi-parallel corpus is of quite some interest to the MT folks :)!
‼️ Desperately 👀 for multilingual parallel data for #machinetranslation or text retrieval? Look no further! 🙌 Check out PolyNewsParallel on @huggingface! 📰 w/ 833 language pairs over 64 languages & 17 scripts 🌍 🤗 #NLProc @dwsunima ⬇️⬇️
0
0
5
Check out our massively multilingual and (partially) multi-parallel news dataset PolyNews! Great work by @iana_andreea on compiling this massively multilingual domain-specific data as well as on using it to improve multilingual sentence encoders for news recommendation!
🤔 If you're interested in more #multilingual news data for other #NLProc tasks, check out PolyNews 📰 on @huggingface ! w/ 77 low & high-resource languages in 19 scripts 🌍 🤗 📃 w/ @fdschmidt @gg42554 @heikopaulheim @dwsunima
0
0
12
RT @iana_andreea: 🤔 If you're interested in more #multilingual news data for other #NLProc tasks, check out PolyNews 📰 on @huggingface !…
0
3
0
MT encoder+LLM in a single end-to-end multilingual model! More effective for cross-lingual transfer than discrete pipelining of the two as in "translate-test"! How do you make Llama "understand" NLLB's representations? Via cheap self-distillation on English data only :)!
Introducing NLLB-LLM2Vec! 🚀 We fuse the NLLB encoder & Llama 3 8B trained w/ LLM2Vec to create NLLB-LLM2Vec which supports cross-lingual NLU in 200+ languages🔥 Joint work w/ Philipp Borchert, @licwu, and @gg42554 during my great research stay at @cambridgeltl
0
0
9
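To make the self-distillation recipe concrete, here is a toy sketch with made-up stand-in modules (not the actual NLLB-LLM2Vec code): a frozen teacher produces sentence embeddings from English-only batches, and the student, standing in for the NLLB-encoder-into-Llama path, is trained with a simple MSE loss to reproduce them. No parallel data is required.

```python
# Conceptual sketch of self-distillation on English data only.
# Both "models" are tiny stand-ins; the real setup distills a frozen
# LLM2Vec teacher into the NLLB-encoder->Llama student pathway.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, vocab = 64, 1000

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)
    def forward(self, ids):                          # ids: (batch, seq_len)
        return self.proj(self.emb(ids).mean(dim=1))  # pooled sentence embedding

teacher, student = Encoder(), Encoder()
teacher.requires_grad_(False)                        # teacher stays frozen
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

for step in range(100):
    ids = torch.randint(0, vocab, (8, 16))           # stand-in for tokenized English text
    with torch.no_grad():
        target = teacher(ids)                        # teacher representations
    loss = F.mse_loss(student(ids), target)          # self-distillation objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```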
RT @fdschmidt: Introducing NLLB-LLM2Vec! 🚀 We fuse the NLLB encoder & Llama 3 8B trained w/ LLM2Vec to create NLLB-LLM2Vec which supports…
0
18
0