mar_kar_ Profile Banner
Marzena Karpinska Profile
Marzena Karpinska

@mar_kar_

Followers
718
Following
2K
Statuses
356

nlp evaluation of long-form input/output, mt/multilingual nlp, creative text generation 🇵🇱 ➯ 🇯🇵 ➯ 🇺🇸 Former: Postdoc @UMASS_NLP

Amherst, MA
Joined October 2011
Don't wanna be here? Send us removal request.
@mar_kar_
Marzena Karpinska
8 months
Can #LLMs truly reason over loooong context? 🤔 NoCha asks LLMs to verify claims about *NEW* fictional books 🪄 📚 ⛔ LLMs that solve needle-in-the-haystack (~100%) struggle on NoCha! ⛔ None of 11 tested LLMs reach human performance → 97%. The best, #GPT-4o, gets only 55.8%.
Tweet media one
30
95
454
@mar_kar_
Marzena Karpinska
1 day
RT @aclanthology: If you're working on an ACL paper and citing any papers with extremely long author lists, consider updating your acl_natb…
0
17
0
@mar_kar_
Marzena Karpinska
5 days
Test-time scaling has lead to seemingly better #llms such as #o1 or #r1. BUT the models remain vulnerable to prompt injection attacks. ➡️Injecting decoy tasks into the prompt significantly increases reasoning tokens without affecting answer accuracy. This means HIGHER cost💸& MORE computing resources 🌎😩& MORE time ⌛️to get THE SAME results!
@JaechulRoh
Jaechul Roh
6 days
🧠💸 "We made reasoning models overthink — and it's costing them big time." Meet 🤯 #OVERTHINK 🤯 — our new attack that forces reasoning LLMs to "overthink," slowing models like OpenAI's o1, o3-mini & DeepSeek-R1 by up to 46× by amplifying number of reasoning tokens.
Tweet media one
0
2
12
@mar_kar_
Marzena Karpinska
6 days
RT @JaechulRoh: 🧠💸 "We made reasoning models overthink — and it's costing them big time." Meet 🤯 #OVERTHINK 🤯 — our new attack that forces…
0
1
0
@mar_kar_
Marzena Karpinska
6 days
RT @mykocyigit: Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 Models on…
0
20
0
@mar_kar_
Marzena Karpinska
16 days
Application link can be found at:
0
0
3
@mar_kar_
Marzena Karpinska
16 days
RT @mjpost: The Microsoft Translator group is looking for a Ph.D. student intern this summer to work with us in Redmond on machine translat…
0
7
0
@mar_kar_
Marzena Karpinska
16 days
RT @TuhinChakr: Really cool work on AI and writing with some great findings 1) Experts are infinitely better at detecting AI generated slop…
0
3
0
@mar_kar_
Marzena Karpinska
16 days
@srivatsamath @jxmnop seems so, check out the paper :)
0
0
1
@mar_kar_
Marzena Karpinska
16 days
Congrats @jennajrussell on the first paper!
0
0
1
@mar_kar_
Marzena Karpinska
20 days
@yoavgo i think i prefer benchmarks which tells us something about the models even if/when they saturate (other than we asked such a niche question that the models failed)
0
0
4
@mar_kar_
Marzena Karpinska
1 month
RT @MohitIyyer: Please consider submitting your research to our upcoming Workshop on Narrative Understanding! Deadline Feb 17.
0
3
0
@mar_kar_
Marzena Karpinska
1 month
RT @MaartenSap: CMU LTI is hosting predoc interns this summer, centered around "Language Technologies for All"! Please apply and circulate!…
0
18
0
@mar_kar_
Marzena Karpinska
1 month
RT @TuhinChakr: If you are looking to do a PhD in Generative AI , Creativity and Human Behavior , please apply to @sbucompsc PhD program by…
0
41
0
@mar_kar_
Marzena Karpinska
1 month
RT @jacobandreas: Are you an undergrad interested in NLP research? Intern with us through the MIT summer research program! Includes stipend…
0
32
0
@mar_kar_
Marzena Karpinska
1 month
@EhudReiter reminds me of this paper:
0
0
1
@mar_kar_
Marzena Karpinska
1 month
RT @EhudReiter: New blog: We need better LLM benchmarks Current benchmark (suites) for evaluating LLMs are disappointing. I describe the p…
0
18
0
@mar_kar_
Marzena Karpinska
1 month
@EhudReiter 3. also the pairwise design showed not only which llms have label bias, but also that a model can validate one claim but then totally contradict itself (hence shouldn't be given credit)
Tweet media one
1
1
1