Marzena Karpinska @mar_kar_ profile

Marzena Karpinska

@mar_kar_

Followers

718

Following

2K

Statuses

356

nlp evaluation of long-form input/output, mt/multilingual nlp, creative text generation 🇵🇱 ➯ 🇯🇵 ➯ 🇺🇸 Former: Postdoc @UMASS_NLP

Amherst, MA

Joined October 2011

Don't wanna be here? Send us removal request.

Marzena Karpinska

@mar_kar_

8 months

Can #LLMs truly reason over loooong context? 🤔 NoCha asks LLMs to verify claims about *NEW* fictional books 🪄 📚 ⛔ LLMs that solve needle-in-the-haystack (~100%) struggle on NoCha! ⛔ None of 11 tested LLMs reach human performance → 97%. The best, #GPT-4o, gets only 55.8%.

30

95

454

Marzena Karpinska

@mar_kar_

1 day

RT @aclanthology: If you're working on an ACL paper and citing any papers with extremely long author lists, consider updating your acl_natb…

0

17

0

Marzena Karpinska

@mar_kar_

5 days

Test-time scaling has lead to seemingly better #llms such as #o1 or #r1. BUT the models remain vulnerable to prompt injection attacks. ➡️Injecting decoy tasks into the prompt significantly increases reasoning tokens without affecting answer accuracy. This means HIGHER cost💸& MORE computing resources 🌎😩& MORE time ⌛️to get THE SAME results!

Jaechul Roh

@JaechulRoh

6 days

🧠💸 "We made reasoning models overthink — and it's costing them big time." Meet 🤯 #OVERTHINK 🤯 — our new attack that forces reasoning LLMs to "overthink," slowing models like OpenAI's o1, o3-mini & DeepSeek-R1 by up to 46× by amplifying number of reasoning tokens.

0

2

12

Marzena Karpinska

@mar_kar_

6 days

RT @JaechulRoh: 🧠💸 "We made reasoning models overthink — and it's costing them big time." Meet 🤯 #OVERTHINK 🤯 — our new attack that forces…

0

1

0

Marzena Karpinska

@mar_kar_

6 days

RT @mykocyigit: Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 Models on…

0

20

0

Marzena Karpinska

@mar_kar_

16 days

Application link can be found at:

0

3

Marzena Karpinska

@mar_kar_

16 days

RT @mjpost: The Microsoft Translator group is looking for a Ph.D. student intern this summer to work with us in Redmond on machine translat…

0

7

0

Marzena Karpinska

@mar_kar_

16 days

RT @TuhinChakr: Really cool work on AI and writing with some great findings 1) Experts are infinitely better at detecting AI generated slop…

0

3

0

Marzena Karpinska

@mar_kar_

16 days

@srivatsamath @jxmnop seems so, check out the paper :)

0

1

Marzena Karpinska

@mar_kar_

16 days

Congrats @jennajrussell on the first paper!

0

1

Marzena Karpinska

@mar_kar_

20 days

@yoavgo i think i prefer benchmarks which tells us something about the models even if/when they saturate (other than we asked such a niche question that the models failed)

0

4

Marzena Karpinska

@mar_kar_

1 month

RT @MohitIyyer: Please consider submitting your research to our upcoming Workshop on Narrative Understanding! Deadline Feb 17.

0

3

0

Marzena Karpinska

@mar_kar_

1 month

RT @MaartenSap: CMU LTI is hosting predoc interns this summer, centered around "Language Technologies for All"! Please apply and circulate!…

0

18

0

Marzena Karpinska

@mar_kar_

1 month

RT @TuhinChakr: If you are looking to do a PhD in Generative AI , Creativity and Human Behavior , please apply to @sbucompsc PhD program by…

0

41

0

Marzena Karpinska

@mar_kar_

1 month

RT @jacobandreas: Are you an undergrad interested in NLP research? Intern with us through the MIT summer research program! Includes stipend…

0

32

0

Marzena Karpinska

@mar_kar_

1 month

@EhudReiter reminds me of this paper:

0

1

Marzena Karpinska

@mar_kar_

1 month

RT @EhudReiter: New blog: We need better LLM benchmarks Current benchmark (suites) for evaluating LLMs are disappointing. I describe the p…

0

18

0

Marzena Karpinska

@mar_kar_

1 month

@EhudReiter 3. also the pairwise design showed not only which llms have label bias, but also that a model can validate one claim but then totally contradict itself (hence shouldn't be given credit)

1