![Goran Glavaš Profile](https://pbs.twimg.com/profile_images/1803554625280200704/ZLdc_gHv_x96.jpg)
Goran Glavaš
@gg42554
Followers
1K
Following
2K
Statuses
405
Professor of #NLProc @Uni_WUE.
Würzburg, Germany
Joined September 2011
Great new work on multilingual news recommendation (NR) by @iana_andreea! New datasets for multilingual and cross-lingual NR, as well as a SotA NR model, newly domain-adapted from a multilingual sentence encoder!
⚠️Struggling with multilingual news recommendation? We introduce NaSE, a news-adapted sentence encoder!🙌 ✅No costly fine-tuning needed ✅Perfect for cold-start & few-shot scenarios #ecir2025 📰: Try it out @huggingface🤗: 👇
0
0
3
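For those who want to try it out: encoding multilingual news with such a sentence encoder looks roughly like the sketch below. This is a minimal illustration, assuming the checkpoint is published on the Hub under the (hypothetical) id used here and that simple mean pooling is appropriate; check the authors' Hugging Face page for the actual checkpoint and recommended pooling.

```python
# Minimal sketch: embedding news texts with a news-adapted sentence encoder.
# The model id is a hypothetical placeholder; verify against the announcement.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "aiana94/NaSE"  # hypothetical id, check the authors' HF page

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

news = [
    "Die Regierung kündigt neue Klimaziele an.",
    "The government announces new climate targets.",
]

batch = tokenizer(news, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one vector per news item
# (an illustrative choice; the encoder may prescribe different pooling).
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the parallel German/English headlines.
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cross-lingual similarity: {sim:.3f}")
```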
If you're looking for a good recipe for training a multilingual LVLM, or just a very strong multilingual LVLM to use, supporting 100 languages (built following the identified "optimal" recipe), check out our latest work! @GregorGeigle and Florian Schneider as lead authors!
Want to train a *multilingual* LVLM but not sure how? Or looking for a strong model to use? Presenting "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Models"! Arxiv: HF Collection:
0
0
2
Great work by @fdschmidt! Afaik, it's the first massively multilingual benchmark for spoken language understanding (and not just topical classification of speech utterances :). Ready "out-of-the-box" on HF datasets. Paper coming soon (but all the important details are already described).
📣Happy to (pre-)release my Fleurs-SLU benchmark to evaluate massively multilingual spoken language understanding on SIB & Belebele. Work done at @Mila_Quebec with @davlanade @gg42554 @licwu Datasets: Details to follow👇
1
0
4
Tired of work that probes LLMs or uses them as agents? @iana_andreea will present something cool and different: come check her great work on flexible news recommendation.
Excited to present MANNeR at @emnlpmeeting on Wednesday 4 pm! Drop by our poster to chat about recommender systems, personalization & beyond-accuracy objectives in news recommendation! #EMNLP2024
0
1
4
If you're into Vision-LLMs, come check @GregorGeigle's amazing work! See you in Miami ;)
The monkey's paw worked well, so I will present 2(!) posters at @emnlpmeeting Wednesday at 4pm. I will be easy to spot - just look for the guy with crutches🩼
0
0
1
Yes, come to @fdschmidt's poster on Tuesday! (even I will be there and I haven't been to a conference in 2.5 years :))
Excited to present NLLB-LLM2Vec at @emnlpmeeting Tuesday 2pm! Drop by our poster to chat about multilingual & multimodal research. NLLB-LLM2Vec can now easily be used with @huggingface AutoModels — try it esp. for embedding low-resource languages! 🌐
0
1
8
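As the tweet notes, the model is usable through @huggingface AutoModels. A minimal sketch of what that looks like follows; the checkpoint id is a placeholder (check @fdschmidt's Hub page), and `trust_remote_code=True` is assumed since the architecture fuses two model families.

```python
# Minimal sketch of using NLLB-LLM2Vec via the AutoModel interface.
# The repo id below is hypothetical; check the authors' Hugging Face page.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "fdschmidt93/NLLB-LLM2Vec"  # hypothetical placeholder id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

# Embed a sentence in a low-resource language (here: Luxembourgish).
batch = tokenizer("D'Wieder ass haut schéin.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state
embedding = hidden.mean(dim=1)  # illustrative mean pooling
print(embedding.shape)
```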
RT @iana_andreea: 🔎 What's beneath the surface of encoder architectures in news #recsys? 🤔 Our latest work w/ @gg42554 @heikopaulheim goes…
0
2
0
If you're looking for on-the-fly customization of your news recommendation function, then MANNeR is the framework for you! Great work by @iana_andreea!
🚀 Introducing MANNeR, our modular news recommendation 🤖📰 framework that uses ⚖️ metric-based learning to support on-the-fly customization over multiple aspects at inference time. #emnlp2024 findings: w/ @gg42554 @heikopaulheim @dwsunima (1/⏳️)
0
0
3
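The "on-the-fly customization at inference time" can be made concrete with a toy sketch. This is an illustration of the general idea, not MANNeR's actual API: each aspect module scores candidates against the user's click history in its own metric-learned embedding space, and the per-aspect scores are re-weighted freely per request, with no retraining.

```python
# Toy illustration of modular, metric-based news recommendation scoring.
# All names and the linear score mixing are assumptions for illustration.
import numpy as np

def aspect_score(history_embs: np.ndarray, cand_emb: np.ndarray) -> float:
    """Mean cosine similarity between a candidate and the user's clicked news."""
    h = history_embs / np.linalg.norm(history_embs, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb)
    return float((h @ c).mean())

def recommend_score(aspects: dict, weights: dict) -> float:
    """Weighted sum of per-aspect scores; weights can change per request."""
    return sum(weights[name] * aspect_score(*embs) for name, embs in aspects.items())

rng = np.random.default_rng(0)
aspects = {
    "content":   (rng.normal(size=(5, 64)), rng.normal(size=64)),
    "sentiment": (rng.normal(size=(5, 64)), rng.normal(size=64)),
}
# Re-weighting at inference time, e.g. to diversify sentiment, needs no retraining.
print(recommend_score(aspects, {"content": 1.0, "sentiment": -0.5}))
```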
Intermediate code representations like LLVM can indeed be a great facilitator of cross-programming-language transfer for Code-LLMs! A well-deserved Outstanding Paper Award for @androneil54 for this great work! It was a pleasure to be part of the effort!
Many interesting questions at @androneil54's poster on #IRCoder, an #ACL2024NLP collaboration with @gg42554 (@Uni_WUE) and @IGurevych (@UKPLab) that was just selected as one of this year's @aclmeeting's Outstanding Papers! 🎉🎉🎉
1
2
20
RT @UKPLab: Code LMs are improving fast 📈, but they are limited in low-resource programming languages (PLs). 😬 In this #ACL2024NLP paper,…
0
5
0
I really enjoyed working with @vjhofmann on this! The highlight of this work for me is Figure 6: plotting toponyms by the embeddings obtained from the LM after geoadaptation, we basically recovered the map (of the BCMS area)!
When we hear someone speak a dialect, we can often tell where they're from. Can LMs do the same? Our #TACL paper addresses this question and shows how to boost LMs' geolinguistic skills. 🌍 This paper has been in the making for almost three years, so glad it's finally out! 🧵
1
0
6
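A toy version of what that Figure 6 plot does, with random stand-in embeddings rather than the real geoadapted LM: project each toponym's embedding to 2D and plot it. With a well-geoadapted model, the resulting layout approximates the actual map.

```python
# Illustrative sketch: 2D projection of toponym embeddings.
# Embeddings here are random placeholders for the geoadapted LM's vectors.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

toponyms = ["Zagreb", "Split", "Sarajevo", "Belgrade", "Podgorica"]
embs = np.random.default_rng(1).normal(size=(len(toponyms), 768))  # stand-ins

xy = PCA(n_components=2).fit_transform(embs)  # reduce to two dimensions
fig, ax = plt.subplots()
ax.scatter(xy[:, 0], xy[:, 1])
for (x, y), name in zip(xy, toponyms):
    ax.annotate(name, (x, y))
ax.set_title("Toponym embeddings projected to 2D")
plt.show()
```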
You can now get our multilingual multi-parallel news recommendation dataset from HuggingFace!
🎉 Exciting news! xMIND is now also on @huggingface 🤗 Check it out if you need multi-parallel data for cross-lingual news recommendation or domain-specific text retrieval ⬇️ xMINDlarge: xMINDsmall: w/ @gg42554 @heikopaulheim
0
0
5
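Pulling the data down is a one-liner with the `datasets` library; the repo id below is a placeholder, so check the announcement for the exact one. The same pattern applies to PolyNews and PolyNewsParallel further down the feed.

```python
# Minimal sketch of loading the dataset from the Hugging Face Hub.
# The repo id is a hypothetical placeholder; verify against the announcement.
from datasets import load_dataset

ds = load_dataset("aiana94/xMIND")  # hypothetical repo id
print(ds)
```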
Can your Large Vision-Language Model tell a Keeshond from a Samoyed? We show that fine-grained object classification is a skill quite complementary to the image understanding tested by existing benchmarks, and that LVLMs don't excel at it, to say the least.
Could you use your Vision-LLM to help identify dogs, plants, dishes, or other things? We investigated and let's just say, do not rely on them when foraging mushrooms in the wild... Paper: Code: 🧵
0
1
1
Great effort by @GregorGeigle: we test whether explicit grounding objectives reduce hallucination in Large Vision-Language Models. We confirm that they yield better fine-grained image understanding performance, but this does not propagate to less hallucination in open captioning!
"Grounding tasks improve fine-grained image understanding which helps reduce visual hallucinations in Vision-LLMs" Intuitive claim and often repeated but is it *true*? We tested it in our recent paper: 🧵 (spoiler: no)
0
0
3
Great work by @iana_andreea, who put immense effort into collecting and cleaning such a massively multi-parallel news dataset. I reckon that such a domain-specific multi-parallel corpus is of quite some interest to the MT folks :)!
‼️ Desperately 👀 for multilingual parallel data for #machinetranslation or text retrieval? Look no further! 🙌 Check out PolyNewsParallel on @huggingface! 📰 w/ 833 language pairs over 64 languages & 17 scripts 🌍 🤗 #NLProc @dwsunima ⬇️⬇️
0
0
5
Check out our massively multilingual and (partially) multi-parallel news dataset PolyNews! Great work by @iana_andreea on compiling this massively multilingual domain-specific data as well as on using it to improve multilingual sentence encoders for news recommendation!
🤔 If you're interested in more #multilingual news data for other #NLProc tasks, check out PolyNews 📰 on @huggingface ! w/ 77 low & high-resource languages in 19 scripts 🌍 🤗 📃 w/ @fdschmidt @gg42554 @heikopaulheim @dwsunima
0
0
12
RT @iana_andreea: 🤔 If you're interested in more #multilingual news data for other #NLProc tasks, check out PolyNews 📰 on @huggingface !…
0
3
0
MT encoder+LLM in a single end-to-end multilingual model! More effective for cross-lingual transfer than discrete pipelining of the two as in "translate-test"! How do you make Llama "understand" NLLB's representations? Via cheap self-distillation on English data only :)!
Introducing NLLB-LLM2Vec! 🚀 We fuse the NLLB encoder & Llama 3 8B trained w/ LLM2Vec to create NLLB-LLM2Vec which supports cross-lingual NLU in 200+ languages🔥 Joint work w/ Philipp Borchert, @licwu, and @gg42554 during my great research stay at @cambridgeltl
0
0
9
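To make the self-distillation recipe concrete, here is a toy sketch with made-up stand-in modules (not the actual NLLB-LLM2Vec code): a frozen teacher produces sentence embeddings from English-only batches, and the student, standing in for the NLLB-encoder-into-Llama path, is trained with a simple MSE loss to reproduce them. No parallel data is required.

```python
# Conceptual sketch of self-distillation on English data only.
# Both "models" are tiny stand-ins; the real setup distills a frozen
# LLM2Vec teacher into the NLLB-encoder->Llama student pathway.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, vocab = 64, 1000

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)
    def forward(self, ids):                          # ids: (batch, seq_len)
        return self.proj(self.emb(ids).mean(dim=1))  # pooled sentence embedding

teacher, student = Encoder(), Encoder()
teacher.requires_grad_(False)                        # teacher stays frozen
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

for step in range(100):
    ids = torch.randint(0, vocab, (8, 16))           # stand-in for tokenized English text
    with torch.no_grad():
        target = teacher(ids)                        # teacher representations
    loss = F.mse_loss(student(ids), target)          # self-distillation objective
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```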
RT @fdschmidt: Introducing NLLB-LLM2Vec! 🚀 We fuse the NLLB encoder & Llama 3 8B trained w/ LLM2Vec to create NLLB-LLM2Vec which supports…
0
18
0