Marzena Karpinska
@mar_kar_
Followers: 718 · Following: 2K · Statuses: 356
nlp evaluation of long-form input/output, mt/multilingual nlp, creative text generation 🇵🇱 ➯ 🇯🇵 ➯ 🇺🇸 Former: Postdoc @UMASS_NLP
Amherst, MA
Joined October 2011
RT @aclanthology: If you're working on an ACL paper and citing any papers with extremely long author lists, consider updating your acl_natb…
Test-time scaling has led to seemingly better #llms such as #o1 or #r1. BUT the models remain vulnerable to prompt injection attacks. ➡️Injecting decoy tasks into the prompt significantly increases reasoning tokens without affecting answer accuracy. This means HIGHER cost💸 & MORE computing resources 🌎😩 & MORE time ⌛️ to get THE SAME results!
🧠💸 "We made reasoning models overthink — and it's costing them big time." Meet 🤯 #OVERTHINK 🤯 — our new attack that forces reasoning LLMs to "overthink," slowing models like OpenAI's o1, o3-mini & DeepSeek-R1 by up to 46× by amplifying the number of reasoning tokens.
RT @JaechulRoh: 🧠💸 "We made reasoning models overthink — and it's costing them big time." Meet 🤯 #OVERTHINK 🤯 — our new attack that forces…
RT @mykocyigit: Thrilled to share our latest findings on data contamination, from my internship at @Google! We trained almost 90 Models on…
RT @TuhinChakr: Really cool work on AI and writing with some great findings 1) Experts are infinitely better at detecting AI generated slop…
RT @MohitIyyer: Please consider submitting your research to our upcoming Workshop on Narrative Understanding! Deadline Feb 17.
RT @MaartenSap: CMU LTI is hosting predoc interns this summer, centered around "Language Technologies for All"! Please apply and circulate!…
RT @TuhinChakr: If you are looking to do a PhD in Generative AI, Creativity and Human Behavior, please apply to @sbucompsc PhD program by…
RT @jacobandreas: Are you an undergrad interested in NLP research? Intern with us through the MIT summer research program! Includes stipend…
RT @EhudReiter: New blog: We need better LLM benchmarks Current benchmark (suites) for evaluating LLMs are disappointing. I describe the p…
@EhudReiter 3. also the pairwise design showed not only which llms have label bias, but also that a model can validate one claim but then totally contradict itself (hence shouldn't be given credit)