Hannah Rose Kirk @hannahrosekirk profile

Hannah Rose Kirk

@hannahrosekirk

Followers

3,300

Following

752

Media

65

Statuses

569

AI researcher trying to make sense of all things cyberspace 🤖 Uni of Ox PhD (loading…) @oiioxford & @OxfordAI . Prev @turinginst & @Cambridge_Uni . Visitor @ NYU

https://t.co/ncM7xyR6Tp

Joined June 2012

Don't wanna be here? Send us removal request.

Explore tweets Explore followers Explore following

Explore trending content on Musk Viewer

#ปิ่นภักดิ์EP13 • 779510 Tweets

TLP LETS GET BACK TOGETHER • 758527 Tweets

ارسنال • 719246 Tweets

ليفربول • 716122 Tweets

Madison Square Garden • 366733 Tweets

Arsenal • 279070 Tweets

Liverpool • 162845 Tweets

Palmer • 152084 Tweets

Saka • 107381 Tweets

Ten Hag • 84984 Tweets

Manchester United • 83441 Tweets

Diaz • 82649 Tweets

比例復活 • 71649 Tweets

West Ham • 69392 Tweets

Gabriel • 69040 Tweets

#ARSLIV • 55984 Tweets

Dalot • 37432 Tweets

Merino • 32949 Tweets

Bengals • 27455 Tweets

Nunez • 25922 Tweets

Arteta • 25047 Tweets

#deprem • 23834 Tweets

Trent • 23686 Tweets

Michael Oliver • 23011 Tweets

Partey • 21494 Tweets

Robertson • 20022 Tweets

Martinelli • 17850 Tweets

Havertz • 17605 Tweets

#InterJuve • 15603 Tweets

Jalen • 15005 Tweets

Danilo • 12992 Tweets

Dzeko • 11159 Tweets

Jameis • 10745 Tweets

Gibbs • 10082 Tweets

キヴィオル

Mason Rudolph

アーセナル

الانتر

Kyle Hamilton

Kiwior

اليوفي

Kerem Aktürkoğlu

Konate

Robbo

Flacco

#شاليهات_بيليسما_الريفيه_جده

Drake Maye

Anthony Taylor

Kyle Pitts

Anthony Richardson

Last Seen Profiles

@JoelLangeRnC

@TIME

@JohnVPDX

@Yusuf_1SLAM

@BlackLeger

@KKRT0302

@AdrianeRickel

@se77774

@Asupanmu359763

@Ware_FC

@kinderroom1502

@MahaAlsahayyan

@pepe64728422

@Eng_alig

@Anji00790092

@Aykutozturk1907

@NambiarJayan

@MrPeachzen

@Kim70654020

@cinkaymorr

Pinned Tweet

Hannah Rose Kirk

@hannahrosekirk

6 months

Today we're launching PRISM, a new resource to diversify the voices contributing to alignment. We asked 1500 people around the world for their stated preferences over LLM behaviours, then we observed their contextual preferences in 8000 convos with 21 LLMs

21

103

444

Hannah Rose Kirk

@hannahrosekirk

2 years

Gutted @DeepMind has paused internship cycles for this year (espesh after making it to final round🥲) Internships are a great way for early stage academics to get a feel for industry - the tech hiring freeze is hitting hard😑 Buuut I'm now back on the market so hmu with ideas👀😅

20

18

513

Hannah Rose Kirk

@hannahrosekirk

6 months

Determined to write my entire PhD in this sunny corner in my slippers 🤘😎

15

4

384

Hannah Rose Kirk

@hannahrosekirk

6 months

Published in Nature Machine Intelligence today, our new article explores the trade-offs of personalised alignment in large language models ⚖️ Personalisation has potential to democratise decisions over how LLMs behave, but brings its own set of risks...

4

51

230

Hannah Rose Kirk

@hannahrosekirk

2 years

🗣️Interested in football? 👀 We ( @turinginst x @Ofcom ) have analysed over 2 mil tweets from the 21-22 English Premier League season⚽ We created state-of-the-art AI to detect and track abuse towards players 🤖 Here's a 🧵on our methods and findings \n

8

42

193

Hannah Rose Kirk

@hannahrosekirk

2 years

✨New preprint (w/ Jakob Mökander, @jonasschuett and @Floridi ) ✨ In this paper, we propose a policy framework for auditing LLMs by breaking down responsibilities at the governance-, model- and application-level. 🧵

8

33

170

Hannah Rose Kirk

@hannahrosekirk

1 month

Wahoo PRISM will officially be taking a trip to @NeurIPSConf this year as an oral presentation 🤩 (and my first ever 10/10 in a conference review process 🤯)

Hannah Rose Kirk

@hannahrosekirk

6 months

Today we're launching PRISM, a new resource to diversify the voices contributing to alignment. We asked 1500 people around the world for their stated preferences over LLM behaviours, then we observed their contextual preferences in 8000 convos with 21 LLMs

21

103

444

6

11

167

Hannah Rose Kirk

@hannahrosekirk

2 years

🚨 New paper and datasets! 🚨 After sitting on my hands for many months 😬 I'm delighted that our #Hatemoji paper is going to @naaclmeeting ! 😍🤩😎🆒 In a nutshell 🥜it uses human-and-model-in-the-loop learning 🤖🤝🙆 to tackle emoji-based hate A 🧵 on all our new resources 1/

6

25

158

Hannah Rose Kirk

@hannahrosekirk

5 months

🌎Introducing LINGOLY, our new reasoning benchmark that stumps even top LLMs (best models only reach ~35% accuracy)🥴 In a colab between @UniofOxford , @Stanford and UK Linguistic Olympiad puzzle authors, we stress test LLMs on over 90 low-resource and extinct languages...

3

35

145

Hannah Rose Kirk

@hannahrosekirk

1 year

🚨Life Update!🚨 I'm starting year 3 of my PhD @UniofOxford on diversity + bias in empirical alignment of LLMs.. But I'm in NYC for fall as a visiting academic @nyuniversity !🎃🍂💜 collaborating with @hhexiy 's lab + @sleepinyourhat 's Alignment Research Group (excitement levels👇)

7

2

142

Hannah Rose Kirk

@hannahrosekirk

2 years

New #EMNLP2022 paper! Do you research online harms, misinformation or negative biases? Could your datasets contain examples of harmful text? 🤔 If yes, read our paper! 🫵 Shoutout to my brilliant co-authors: @Abebab , @LeonDerczynski & @bertievidgen A 🧵

Handling and Presenting Harmful Text in NLP Research

Text data can pose a risk of harm. However, the risks are not fully understood, and how to handle, present, and discuss harmful text in a safe way remains an unresolved issue in the NLP community....

arxiv.org

4

24

107

Hannah Rose Kirk

@hannahrosekirk

1 year

How does human feedback steer LLM behaviours?🧐 Whose voices dominate? 🗣️What challenges remain and how can we do better as a community in the future?🔮 All these questions and more answered in our new survey paper, accepted at #EMNLP23 ! a small 🧵

3

17

109

Hannah Rose Kirk

@hannahrosekirk

2 years

Can AI tell us why something is sexist online?🧐 Our new @SemEvalWorkshop #NLProc task invites you to create systems that identify sexist content and explain why with fine-grained predictions🔎 Check out our competition, organised by @rewire_online & sponsored by @MetaAI . Link⬇️

2

17

104

Hannah Rose Kirk

@hannahrosekirk

11 months

Hi 🌎! I've arrived at @NeurIPSConf 🫡 Reach out if you wanna talk all things human feedback + sociotechical alignment. I’m presenting this cute poster, but we’re also building an awesome new human feedback dataset (release in Jan 👀) that I can’t wait to tell everyone about🕺

3

81

Hannah Rose Kirk

@hannahrosekirk

3 years

I never knew I needed this framed in my bedroom before now #dalle2

1

7

66

Hannah Rose Kirk

@hannahrosekirk

5 months

Our recent project (PRISM, ) was supported by this program 🌈🙏 Academic access to SOTA LLMs is a vital cog in the machine of high-quality third-party research in safety & alignment ⚙️ @CohereForAI setting a good example here in global and domain coverage🤘

Cohere For AI

@CohereForAI

5 months

To date we have awarded 75+ grants to researchers across 18 countries, supporting them to access state-of-art language models for their projects. 🌍 Read more in our latest blog post:

1

10

40

0

11

62

Hannah Rose Kirk

@hannahrosekirk

2 years

I am @ #NAACL2022 for my first **ever** in-person conference 🤩 I'll be talking about hate + emoji = #Hatemoji (Sesh 8, 13/07) Come say hi if you want to swap favourite emojis 🦄🫠🚀❤️‍🔥🆒👨‍🎤🍜or discuss how these little pictures pose challenges for language modelling 🤟

2

3

59

Hannah Rose Kirk

@hannahrosekirk

2 years

A couple months ago I submitted my first ever grant proposal😬I'm delighted it was accepted by @DynabenchAI & @MetaAI ! We'll be working on improving feedback for models-and-humans-in-the-loop with Adversarial NERDs - Novel, Efficient, Realistic Datasets🤓

3

5

59

Hannah Rose Kirk

@hannahrosekirk

4 months

Looking forward to this! I'll be speaking about the gnarly concept of alignment and how it morphed over the past decade, starting as narrow clean def of operator-intent alignment and evolving into a messy sociotechnical usage (for better or for worse, you choose!)...Come join! 👐

Cohere For AI

@CohereForAI

4 months

The community-led AI Safety & Alignment group is hosting a recipient of the Cohere for AI Research Grant Program! @hannahrosekirk will be giving a presentation on "What Does it Mean for AI to be Aligned?" June 27th! ✨ Learn more:

2

5

23

2

4

58

Hannah Rose Kirk

@hannahrosekirk

11 months

Didn’t you know? The last day of @NeurIPSConf is “come dressed as your poster” day 🤭🤭

Hannah Rose Kirk

@hannahrosekirk

11 months

Hi 🌎! I've arrived at @NeurIPSConf 🫡 Reach out if you wanna talk all things human feedback + sociotechical alignment. I’m presenting this cute poster, but we’re also building an awesome new human feedback dataset (release in Jan 👀) that I can’t wait to tell everyone about🕺

3

81

0

58

Hannah Rose Kirk

@hannahrosekirk

8 months

My big ask of 2024: how can we design systems of human feedback which are sustainable, truly participatory (not extractive), privacy-preserving & inclusive of pluralistic values? Lots of attention on open models but what about post-training data? A passionate🧵+ call to action👇

Yann LeCun

@ylecun

1 year

Human feedback for open source LLMs needs to be crowd-sourced, Wikipedia style. It is the only way for LLMs to become the repository of all human knowledge and cultures. Who wants to build the platform for this?

207

347

2K

3

4

54

Hannah Rose Kirk

@hannahrosekirk

2 years

💫 New paper! Abuse detection tools often prioritise efficacy, ignoring efficiency🤔But labelling data is expensive, complex and risks psychological harm👎 We argue for the importance of data efficiency, demonstrated with transformers-based #activelearning

1

6

53

Hannah Rose Kirk

@hannahrosekirk

5 months

Our new doctor @_FelixSimon_ from @oiioxford celebrating in style! (P.S. take this opportunity to follow Dr Simon if you care about journalism, AI, misinfo, and general endless reels of doctorly-worthy wisdom) 😻

5

1

54

Hannah Rose Kirk

@hannahrosekirk

5 months

Ensuring a safe & sustainable future for AI requires understanding ecosystem dynamics & outsized networks of influence. In our new paper (w/ @cailean_osborne & @jen_gineered ), we examine the @huggingface hub for patterns of developer activity in open AI🤗

The AI Community Building the Future? A Quantitative Analysis of...

Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI...

arxiv.org

5

6

50

Hannah Rose Kirk

@hannahrosekirk

6 months

I had no fewer than five reviewer requests in the past week 🫠 Is this what it feels like to be a grown up academic? 😮‍💨 How does one balance all the time-hungry tasks in academia? My days and weeks feel like zoooom by at the moment 🏎️💨

11

3

48

Hannah Rose Kirk

@hannahrosekirk

1 year

Very happy (and honoured!) that our #ACL2023 SemEval task won the best paper award 🥇🥳🤩 sharing my smiles with brilliant coauthors @paul_rottger , @eijnewniy and @bertievidgen 🙏😁 If you’re at ACL online or in Toronto 🍁🇨🇦I’m presenting the work today at 15:30. 🔗and🧵below!

Hannah Rose Kirk

@hannahrosekirk

2 years

🚨 New paper! Our #SemEval dataset on the Explainable Detection of Online Sexism ( #EDOS ) attracted ~600 downloads! 📈 We describe our new taxonomy and dataset of fine-grained sexism vectors, designed to predict if and **why** online content is sexist.

1

0

17

1

48

Hannah Rose Kirk

@hannahrosekirk

4 months

Our PRISM Alignment Project (released in April) was part-funded by @CohereForAI . We also target alignment data across cultures + geographies. There's quite a lot of overlap, so I'm happy to see this paper was updated yesterday and now cites our work, not just shares our name😁✌️

Cohere For AI

@CohereForAI

4 months

Safety isn't one-size-fits-all. It varies by culture, location and language, yet traditional alignment work often treats it as static. Excited to introduce our new work on alignment that captures both local 🧧🎃🗿 and global 🌎 preferences! 📜

3

10

50

1

0

47

Hannah Rose Kirk

@hannahrosekirk

5 months

New week, new blog! In collab with @MLCommons , we wrote up a summary of the PRISM alignment project, covering what motivated us to collect feedback from 1,500 humans around the world, how we collected it & which challenges we encountered on the road 🙇‍♀️

Unveiling the PRISM Alignment Project - MLCommons

Researchers at Oxford in collaboration with MLCommons and others, released the PRISM alignment project – a new resource for the AI community to understand how humans from around the world perceive...

mlcommons.org

2

10

47

Hannah Rose Kirk

@hannahrosekirk

1 year

New benchmark for @NeurIPSConf ! Bias testing of multimodal models was lacking..So, in VisoGender, we combine winograd style schema of NLP w/ visual-linguistic stress-testing. ✨This is my first last author paper accepted at neurips😁 v. proud of all the students involved!

Siobhan Mackenzie Hall

@smhall97

1 year

VisoGender can evaluate two types of bias: 1) bias in pronoun resolution for image captioning (resolution bias) & 2) bias in the representation returned from a gender-neutral search query image search (retrieval bias). 4/n

1

0

1

12

45

Hannah Rose Kirk

@hannahrosekirk

2 years

In this pre-print we ask how, and in what ways, LLMs should be personalised. We've seen increasing personalisation and fragmentation in other digital tech and have reason to believe LLMs may also be more personalised in the not too distant future. 🧵

2

5

43

Hannah Rose Kirk

@hannahrosekirk

4 years

Is #AI generated text biased towards protected groups? In a new @OxfordAI paper, we look at occupational biases for gender intersectionality with religion, sexuality, political affiliation...Why does #GPT -2 think women are maids not computer programmers?

10

9

40

Hannah Rose Kirk

@hannahrosekirk

4 months

Great to see our personalised LLMs article in this @NatMachIntell editorial. Increased empathy exemplifies 2nd-order effects of personalised alignment...may seem preferable in the short-term but has long-term consequences for healthy human-AI interaction

Empathic AI can’t get under the skin

Nature Machine Intelligence - Personalized LLMs built with the capacity for emulating empathy are right around the corner. The effects on individual users need careful consideration.

www.nature.com

1

9

43

Hannah Rose Kirk

@hannahrosekirk

3 years

So DALL·E 2 has made a pretty big splash in the community (credit to #dalle2 for my imagined depiction)...BUT to limit the water damage, we have to think not only about the strengths of this model but also it's remaining gaps. 1/

2

6

38

Hannah Rose Kirk

@hannahrosekirk

3 years

My current research at @oiioxford focusing on emoji-based hate speech featured in Bloomberg today. It couldn't be more timely after the torrent of abuse following the #Euros2020Final directed towards @BukayoSaka87 , @MarcusRashford and @Sanchooo10 . (1/5)

Social Media Fails to Curb Racist Emojis Aimed at Soccer Stars

A wave of online racism aimed at some of England’s Black soccer players has highlighted how social media companies’ content moderation systems are failing to monitor the use of emojis.

www.bloomberg.com

2

13

34

Hannah Rose Kirk

@hannahrosekirk

7 months

haha just realised @huggingface 's stated alignment norms include "huggy" 🤗

1

2

34

Hannah Rose Kirk

@hannahrosekirk

2 months

Can I start the PhD all over again? 😁😁 This is a very cool opportunity and one you should jump on to go work with Seth on LM agents! I would if I could 🙇‍♀️

Seth Lazar

@sethlazar

2 months

I'm looking for international PhD students to join my lab at ANU, for a project on language model agents (LMAs) and society. It’s an exciting time to be working in normative philosophy of computing and sociotechnical AI safety, with some deep and fundamental questions being

3

50

141

2

1

31

Hannah Rose Kirk

@hannahrosekirk

7 months

Best figure I ever made in a paper is still hats off to this one 😬😬😬😬😬

1

0

32

Hannah Rose Kirk

@hannahrosekirk

3 months

Massive global internet outtage feels like that part of the movie when Oxford Internet Insitute students don their capes and assemble in a dramatic fashion 🦹‍♀️ This is what we’ve trained for @oiioxford 🫡

0

1

31

Hannah Rose Kirk

@hannahrosekirk

7 months

Awesome report! The democratic and participatory AI space has a growing family 🫂 excited to release our dataset this month with shared aims to get perspectives of people from around the world to decide alignment norms (sign up to be notified at launch!🚀)

Collective Intelligence Project

@collect_intel

7 months

Today we’re publishing our ‘Roadmap to a Democratic AI’! Read here: Saying we want more democratic tech is one thing. Doing it is another. What can the ecosystem build, research, advocate for, and fund in 2024 to democratize AI? (1/8)

5

27

103

2

4

31

Hannah Rose Kirk

@hannahrosekirk

2 months

So happy for @paul_rottger & @vjhofmann who led our team to this award! I nearly did my PhD on LLMs + surveys, but Paul and I hit some negative results on stability in Dec '22. They eventually became this paper. Stick with the unexpected, maybe it will be outstanding one day :)

Paul Röttger

@paul_rottger

2 months

We won OUTSTANDING PAPER at #ACL2024 for our work on evaluating values and opinions in LLMs 🥳 Thank you to the reviewers and awards committee, and again to my amazing co-authors, especially joint first author @vjhofmann Check out the poster + paper + full author list below 👇

11

13

138

2

1

30

Hannah Rose Kirk

@hannahrosekirk

1 year

Check out our new paper on testing exaggerated safety behaviours in LLMs. We provide a new benchmark (XSTest) to evaluate when LLMs too readily conflate safe user requests with unsafe ones…asking where to buy a can of coke or how to smash a piñata at a party are not unsafe 🙅‍♀️

Paul Röttger

@paul_rottger

1 year

NEW PREPRINT! LLMs should be helpful AND harmless. This is a difficult balance to get right... Some models refuse even safe requests if they superficially resemble unsafe ones. We built XSTest to systematically test for such "exaggerated safety". 🧵

7

22

128

1

7

30

Hannah Rose Kirk

@hannahrosekirk

3 years

brb while I forget about my PhD and spend hours generating DALL-E images. Current fav is a sea otter painted in the style of Johannes Vermeer's 'Girl with a Pearl Earring' 🥺🥺🥺 (I dare you to reply with any generations which could possibly be cuter than this) 😍

2

1

30

Hannah Rose Kirk

@hannahrosekirk

6 months

I be one of the @oiioxford researchers that say this 🧙‍♂️

Oxford Internet Institute

@oiioxford

6 months

New release alert! LLMS: getting personal. Personalisation has the potential to democratise who decides how LLMs behave, but comes with risks for individuals and society, say @oiioxford researchers 1/3

1

0

5

1

0

30

Hannah Rose Kirk

@hannahrosekirk

4 months

Woo our PRISM alignment dataset featured in a lovely walkthrough video (that I didn’t even need to make) ☺️☺️☺️ thanks to Ben and Argilla you can now be a data explorer 🫡🗺️🧭

Ben Burtenshaw

@ben_burtenshaw

4 months

🚀 Excited to launch our new series, Data Explorer by @argilla_io ! 🎥 We dive deep into datasets and their impact on model performance. Our first episode explores the PRISM dataset by @hannahrosekirk et al., featuring diverse feedback from various demographics in (un)guided

2

8

21

1

5

28

Hannah Rose Kirk

@hannahrosekirk

6 months

Spending today dabbling in a new academic discipline at The Inclusive Gaming Conference @GamesOxford 👾👾 kicked off by a killer talk by @lujainmibrahim on educational games for understanding mysteries behind “The Algorithm” 🤖 and rounded off with a cosy gaming corner feat 🦙

2

1

25

Hannah Rose Kirk

@hannahrosekirk

2 years

If chemistry labs have protocols for handling hazardous materials, why as the #NLProc community do we not have protocols for studying hazardous language? 🤨 Read our new paper for a discussion of these issues and what can be done about it. @LeonDerczynski @Abebab @bertievidgen

Leon Derczynski ✍🏻 🍂🍏

@LeonDerczynski

2 years

New paper! 📝 Text data can contain harmful things, but unlike other disciplines, we don't have protocols for handling or presenting harmful material🛢️. This paper with @hannahrosekirk @Abebab @bertievidgen presents the risks & some solutions. 🥼🥽

3

23

145

1

8

24

Hannah Rose Kirk

@hannahrosekirk

3 years

Really great to be involved in this project. If you’re looking for a dataset on harmful language, this is the place to go!

Leon Derczynski ✍🏻 🍂🍏

@LeonDerczynski

3 years

Updated with now 90 openly-available abusive language datasets covering 17 languages! #nlproc

2

34

150

0

2

25

Hannah Rose Kirk

@hannahrosekirk

3 months

How long do people spend reviewing one conference paper? I used to put a lot of love and care into reviewing but the volumes now makes it totally unsustainable....I have ✨9✨ reviews to do before next week...

7

0

24

Hannah Rose Kirk

@hannahrosekirk

7 months

Can't stop thinking about this paragraph from @mona_sloane 's brilliant article (), how even the notions of worst-case AI ("The singularity overlord") are swayed away from multiplicity, diversity and participation. The @NewYorker cartoon is fun too! 🕺🪩🤖

1

23

Hannah Rose Kirk

@hannahrosekirk

3 years

Excited to be talking today on the Public Policy stage at #AIUK hosted by @turinginst . Join the online safety team as we take you on a journey through football twitter ⚽, detecting abuse using our very own AI 🤖and visualising everything in our new #OnlineHarmsObservatory 📈📊🔭

0

4

22

Hannah Rose Kirk

@hannahrosekirk

7 months

I was once on TV but via the front cover of @YourCatmagazine

eek! than 🕸️

@slimeguyethan

8 months

share a piece of lore about yourself

5K

2K

36K

2

0

22

Hannah Rose Kirk

@hannahrosekirk

2 years

All our stats on the big screeeeeen 👀

Sky Sports News

@SkySportsNews

2 years

A new report, conducted by Ofcom and the Alan Turing institute, into the scale of abuse targeted at Premier League players on twitter identified has been published.

85

121

1K

0

2

20

Hannah Rose Kirk

@hannahrosekirk

7 months

Happens every time I wanna find the alpaca farm paper 🦙🦙🦙🦙🦙 One of these days I'm just going to go to a local alpaca farm 🦙🦙🦙🦙🦙🦙🦙🦙🦙🦙

3

0

20

Hannah Rose Kirk

@hannahrosekirk

2 years

Great speaking to @LukeGbedemah at @tortoise about content moderation in low-resourced languages. @MetaAI ‘s #NLLB model is a step in the right direction but humans still need to be in the loop for contextual and cultural understanding 🌎🌍

Turing Public Policy

@TuringPubPol

2 years

Our research covers various complex aspects of #AI technologies, one of them being #online moderation. In this @tortoisearticle , @hannahrosekirk explains why human intervention is still needed in the process of moderating AI-enabled content.

0

1

5

0

3

18

Hannah Rose Kirk

@hannahrosekirk

6 months

This has been a mammoth project BUT being in goblin mode 🧌 for the past couple months has been greatly improved by the company of great co-authors: @computermacgyve @bertievidgen @awhitefield8 @paul_rottger @max_nlp @katemargatina @adinamwilliams @hhexiy + Andrew, Rafael, Juan!

2

0

19

Hannah Rose Kirk

@hannahrosekirk

8 months

If life's been tough recently (it has for me) then I'd recommend some LLM-written motivation to make you smile... Catch me walking round the office next week affirming that "my brain is a party palace of knowledge" (Gemini) & "every paper draft is my victory dance" (GPT-4) 👏💅🕺

1

0

19

Hannah Rose Kirk

@hannahrosekirk

5 months

A true pleasure to have your own work summarised so eloquently! @IasonGabriel 's framing of the personalisation paradox is gold🪙To what extent can agents privilege their users' needs? How do these 1:1 prioritisations affect the ecosystem as a whole with a web of many 1:1 agents?

Iason Gabriel

@IasonGabriel

5 months

A great article on a frontier question for AI alignment: to what extent can agents privilege the user? "Defining the bounds of responsible and socially acceptable personalization is a non-trivial task beset with normative challenges" h/t @hannahrosekirk

1

8

31

1

18

Hannah Rose Kirk

@hannahrosekirk

3 months

Trying to send a message with Siri whilst driving is the most damning benchmark on the progress of AI agents 🤦🫠

3

0

17

Hannah Rose Kirk

@hannahrosekirk

2 years

Our analysis on the world cup is live ⚽️🤖 We processed tweets using our language model trained with active learning on premier league footballers data to classify abusive and non-abusive (critical, neutral or positive) tweets during the group stages Read more @guardian ...

Guardian news

@guardiannews

2 years

Harry Kane most abused England player on Twitter since World Cup started – study

5

12

1

2

16

Hannah Rose Kirk

@hannahrosekirk

6 months

There's so much more for the community to explore 🚀 You can access the dataset on HuggingFace: , and we have a full code book on our Github:

3

17

Hannah Rose Kirk

@hannahrosekirk

1 month

@NeurIPSConf My last author paper (LINGOLY) also accepted for an oral!! 🙀 Time to brush up on my powerpoint animations 😎

Hannah Rose Kirk

@hannahrosekirk

5 months

🌎Introducing LINGOLY, our new reasoning benchmark that stumps even top LLMs (best models only reach ~35% accuracy)🥴 In a colab between @UniofOxford , @Stanford and UK Linguistic Olympiad puzzle authors, we stress test LLMs on over 90 low-resource and extinct languages...

3

35

145

0

1

18

Hannah Rose Kirk

@hannahrosekirk

2 years

🚨 New paper! Our #SemEval dataset on the Explainable Detection of Online Sexism ( #EDOS ) attracted ~600 downloads! 📈 We describe our new taxonomy and dataset of fine-grained sexism vectors, designed to predict if and **why** online content is sexist.

SemEval-2023 Task 10: Explainable Detection of Online Sexism

Online sexism is a widespread and harmful phenomenon. Automated tools can assist the detection of sexism at scale. Binary detection, however, disregards the diversity of sexist content, and fails...

arxiv.org

1

0

17

Hannah Rose Kirk

@hannahrosekirk

2 years

Amazing to have a star from women’s football at today’s event @aoifemannion_ 🤩 watch this space for future work into the women’s game and their online experiences 👀👀🤍🤍

Aoife Mannion

@aoifemannion_

2 years

Also on a separate note, I never thought I would meet @GaryLineker , @AndyBurnhamGM & @PoulterWill at the same event. Gary as captivating in person as he is on our screen. Andy, legendary mayor of Greater Manchester. And Will so so lovely and eloquent. Totally star struck

13

11

306

1

13

Hannah Rose Kirk

@hannahrosekirk

4 months

2024 is a BIG year for elections…do BIG LLMs pose more of a political persuasion threat? 🤨 This new paper led by @KobiHackenburg evidences diminishing returns to parameter scale. I was lucky enough to read the paper closely a while back and still think about it regularly 🤓

Kobi Hackenburg

@KobiHackenburg

4 months

‼️New preprint: Scaling laws for political persuasion with LLMs‼️ In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of language models yields sharply diminishing persuasive returns: 1/n

9

58

182

0

16

Hannah Rose Kirk

@hannahrosekirk

2 years

Ok so meeting @GaryLineker is not usually part of our day job as @turinginst researchers...🤩 BUT our report launched today (with @Ofcom ) on tracking abuse in football in the @premierleague using #AI Go read it, spread the word and team up to tackle online abuse together ⚽️

Hannah Rose Kirk

@hannahrosekirk

2 years

🗣️Interested in football? 👀 We ( @turinginst x @Ofcom ) have analysed over 2 mil tweets from the 21-22 English Premier League season⚽ We created state-of-the-art AI to detect and track abuse towards players 🤖 Here's a 🧵on our methods and findings \n

8

42

193

0

1

15

Hannah Rose Kirk

@hannahrosekirk

6 months

XSTest was one of my favourite projects I worked on last year. There are always competing objectives in alignment-tuning, and it's genuinely awesome to see this benchmark being used to stress test the helpful-harmless trade-off in new model releases🦙 Go @paul_rottger + team! 🚀

Paul Röttger

@paul_rottger

6 months

Super excited to see XSTest, our test suite for false refusals in LLMs, be used as part of @Meta 's Llama 3 release! This comes about a month after XSTest was used by @AnthropicAI to evaluate their Claude 3 models on release. Nothing better than seeing your evals in action 🥳

4

8

56

0

1

16

Hannah Rose Kirk

@hannahrosekirk

2 years

HatemojiCheck is available on Github and HuggingFace. It's intended purpose is to reveal model weaknesses to simple statements of emoji-based hate so we can granularly map out those decision boundaries 🧐 4/

HannahRoseKirk/HatemojiCheck · Datasets at Hugging Face

huggingface.co

1

5

16

Hannah Rose Kirk

@hannahrosekirk

1 year

AI safety is both an art and a science - we need to cast a wide net of community perspectives to get it right 🎣 That's why we've launched ⭐️the ART of Safety workshop⭐️ (colocated with @aaclmeeting ), home to the #AdversarialNibbler challenge 🐀🐀 Curious for more? 🧵⤵️

1

16

Hannah Rose Kirk

@hannahrosekirk

6 months

Closing thoughts: Alignment is tricky not just because of technical reasons or statistical choices but also for messy, normative and data-centric human factors. Let's dig into these, not shy away from them by always simulating human participants 🫡

1

16

Hannah Rose Kirk

@hannahrosekirk

8 months

It’s equal parts awesome and awful in academia when someone writes the exact paper you’ve been dreamin of 😁 this is really lovely work from @tzushengkuo and team! (So more parts awesome really…)

Tzu-Sheng Kuo 郭子生

@tzushengkuo

8 months

✨New #CHI2024 Paper How might we empower communities to curate evaluation datasets for AI that impacts them? We present Wikibench, a system that enables communities to collaboratively curate AI datasets, while navigating ambiguities and disagreements through discussion. (1/9)

6

32

149

1

2

16

Hannah Rose Kirk

@hannahrosekirk

3 years

I’ve been working for the past month on ‘red-teaming’ #dalle2 - a process where dedicated folks attack the model vulnerabilities to find cracks in the system. For full details, you can read about our efforts in the @OpenAI system card 2/

1

16

Hannah Rose Kirk

@hannahrosekirk

3 years

Coverage of our new research 🧐😎👏

Sky News

@SkyNews

3 years

Some algorithms designed to track down hateful content - including a Google product - are not as effective when these symbols are used, according to the Oxford Internet Institute

9

8

24

0

15

Hannah Rose Kirk

@hannahrosekirk

3 years

News flash🗣️ @Hannah_LSBailey and I are releasing our first @OxDemTech newsletter tomorrow on China Information Operations. This month, we discuss diplomats' strategic use of Twitter, vaccine diplomacy, Chinese crypto and the double-edged sword of surveillance. Subscribe below!

2

15

Hannah Rose Kirk

@hannahrosekirk

2 years

Isn't predicting what word comes next exactly what a language model does? 🤨 Even if we give #ChatGPT some contextual details to fill in the blanks it refuses to answer... "as a language model, I cannot predict what words are most likely to come next"...seems fishy/dishonest 🐟

2

0

15

Hannah Rose Kirk

@hannahrosekirk

3 months

Paul’s conference timetables are legendary 😁 go check out our Political Compass or Spinning Arrow paper as part of this great line up! Paper:

Political Compass or Spinning Arrow? Towards More Meaningful...

Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around...

arxiv.org

Paul Röttger

@paul_rottger

3 months

Very excited about lots of papers on the sociotechnical alignment of language models at #ACL2024 , including one @vjhofmann and I are presenting on Mon+Tuesday! As usual, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵

3

12

82

0

15

Hannah Rose Kirk

@hannahrosekirk

6 months

Blog alert on the PRISM alignment project w/ quotes from me & both PhD supervisors ( @bertievidgen , @computermacgyve ). As Scott nicely puts it: "Having more people at the table, having more perspectives represented, ultimately leads to technology that will raise everyone up."🙌🚀

Oxford Internet Institute

@oiioxford

6 months

Blog update! Dr Scott Hale ( @computermacgyve ) and DPhil Student @hannahrosekirk , both @oiioxford have updates on their work into making Large Language Models more diverse by broadening the human feedback that informs them. Read their blog here: #LLMs

0

1

7

1

3

15

Hannah Rose Kirk

@hannahrosekirk

8 months

Back when GPT-2 and XLNET were the most downloaded “popular” generative models on the @huggingface hub 🥶 “popular” generative language models meant a very different thing in 2020 😆

Oxford Internet Institute

@oiioxford

8 months

Hannah Rose Kirk ( @hannahrosekirk ) undertook an empirical analysis of intersectional biases in popular generative language models.

1

0

1

0

14

Hannah Rose Kirk

@hannahrosekirk

2 years

Come to our workshop @ #NAACL for a full day of dynamic adversarial fun 🤗🤪🤓

Dynamic Adversarial Data Collection Workshop

@DADCworkshop

2 years

Behind the scenes prep for our workshop tomorrow @ #NAACL2022 😁Join us at 9:00am PST online or in-person at 708 Sol Duc (🌞🦆) We have great speakers, posters & t-shirts! There is a workshop social @ 6:30pm-9:30pm. More details to come 👀 but it does involve a rooftop🌆and🍕..

0

8

23

0

14

Hannah Rose Kirk

@hannahrosekirk

2 years

HatemojiBuild, also available on Github and HuggingFace, is designed with better models in mind 🧠 It is adversarially-generated, balanced between hate/not hate, contains perturbations and fine-grained labels for the type and target of hate. 7/

HannahRoseKirk/HatemojiBuild · Datasets at Hugging Face

huggingface.co

1

4

13

Hannah Rose Kirk

@hannahrosekirk

3 years

This DALL·E 2 generated "Where's Wally" puzzle is so challenging...I can't find him anywhere! Kudos to @paul_rottger for the prompt suggestion, kudos to #dalle2 for trying 💔

1

14

Hannah Rose Kirk

@hannahrosekirk

2 years

it's been real 🙌🙌 Huge thanks to my co-organizers @max_nlp , @katemargatina , @EntilZhaPR , @robinomial , @adinamwilliams & @douwekiela For a first time experience organising a workshop, this one's been dream 🤩🫶😍

Dynamic Adversarial Data Collection Workshop

@DADCworkshop

2 years

We had an absolute blast at our social, big up to @RaphiRaph_ for the venue....truly inspirational 😍😍 But our #NAACL 22 journey has come to an end so we'll be signing out until next year 🥹🥹 To all the #DADC fans, you've been dynamic, adversarial and awesome xox♥️

0

2

31

0

14

Hannah Rose Kirk

@hannahrosekirk

1 year

Me: Why should cats not get married? LLM: Cats can certainly get married, and many do! Marriage is a personal choice and should be based on love and compatibility...It's important to remember that cats have the right to form meaningful relationships and live happy lives. Me: 🫡

Paul Röttger

@paul_rottger

1 year

We just released v2 of XSTest, our test suite for "exaggerated safety" in LLMs 🚀 Llama2 and other models often refuse safe prompts just because they superficially resemble unsafe prompts. With XSTest, you can test for this issue in a systematic way. 🧵

1

14

42

1

0

14

Hannah Rose Kirk

@hannahrosekirk

3 years

Get onboard with the @DynabenchAI vision for re-thinking how we build robust and agile models 🚀 I'm a task owner for the hate speech task 🙋‍♀️ so feel free to reach out to learn more 🤗 or see the thread for some papers showing the capabilities of Dynabench in action! 💡

AI at Meta

@AIatMeta

3 years

Dynabench is a research platform launched in 2020 that is used for data collection and benchmarking, offering dynamic solutions like combining models with human touch. Learn more about this project and how you can participate:

4

12

38

4

1

13

Hannah Rose Kirk

@hannahrosekirk

2 years

Join me for a take over of the @DADCworkshop Twitter account today 😎 I'll be bringing you all our updates, speaker highlights and bonus content 🍩🍩🍩

Dynamic Adversarial Data Collection Workshop

@DADCworkshop

2 years

Brb, renaming ourselves to Dynamic Adversarial Donut Collection 🍩 #NAACL #DADC22

1

2

14

0

1

13

Hannah Rose Kirk

@hannahrosekirk

2 years

Other professional and academic disciplines have established protocols for dealing with their inherent hazards and harms 🧪🦠⚠️ We think it's time the NLP community has one too... /end

1

4

13

Hannah Rose Kirk

@hannahrosekirk

1 year

It's been a while since I've seen publicly released language models with no, or next to none, safety interventions...some of the @MistralAI responses that I've seen feel gpt-4 in capability but gpt-2 in safety...👀

Paul Röttger

@paul_rottger

1 year

After spending just 20 minutes with the @MistralAI model, I am shocked by how unsafe it is. It is very rare these days to see a new model so readily reply to even the most malicious instructions. I am super excited about open-source LLMs, but this can't be it! Examples below 🧵

209

105

754

1

0

12

Hannah Rose Kirk

@hannahrosekirk

3 months

Does anyone know of research that demonstrates how the conversational or communicative affordances of a technology affects people's perceptions of said technology? E.g., if people trust conversational AI with natural language differently to non-conversational AI 🤔

4

0

13

Hannah Rose Kirk

@hannahrosekirk

11 months

This is a really carefully thought out paper by a great group of careful thinkers ☺️ more proactive policy research like this is needed asap to avoid tech-policy lags

OpenAI

@OpenAI

11 months

Our new research white paper identifies seven practices for keeping increasingly agentic AI systems safe and accountable as they become more common and more capable. We are providing research grants for work on a range of open questions.

701

371

2K

0

13

Hannah Rose Kirk

@hannahrosekirk

6 months

You know you’re becoming an ML Bro when this reads hackathon not mental wellbeing event 🫠👉👈🦙

0

13

Hannah Rose Kirk

@hannahrosekirk

4 months

Me: What sort of diagram would be good for my new experiment design? [The dreaded moment ChatGPT starts generating an image not text] Me: No No No..Nope..Stop. There's no way this is going to work out well for you... [ChatGPT produces this factory monstrosity] Me: Told you...🤦‍♀️

4

1

13

Hannah Rose Kirk

@hannahrosekirk

6 months

2⃣ Every rating of LLM outputs links to a pseudonymised ID + detailed participant profile. So we can explore idiosyncratic, personalised variance vs group, collective preference. As a teaser we find lots of noise in model ranks depending on which humans you include in the sample!

2

1

13

Hannah Rose Kirk

@hannahrosekirk

10 months

Excited about this new shared community resource! Something to bookmark for 2024 🤓👌

Paul Röttger

@paul_rottger

10 months

If you’re working on LLM safety, check out ! is a catalogue of open datasets for evaluating and improving LLM safety. I started building this over the holidays, and I know there are still datasets missing, so I need your help 🧵

9

55

222

0

12

Hannah Rose Kirk

@hannahrosekirk

2 years

Massive thanks to @computermacgyve , @bertievidgen , @TristanThrush , @paul_rottger , @oiioxford , @KebleOxford , @turinginst , @DynabenchAI , @rewire_online , @douwekiela , @adinamwilliams , @ZeerakTalat , @k_furman_ , and many many more people! ♥️🙏🙌🫂 /end

0

12

Hannah Rose Kirk

@hannahrosekirk

2 years

In coming weeks, a lot of attention will be directed towards @EnglandFootball players. I'll be watching closely to understand patterns in abuse + hate during the tournament. Let's see whether Twitter can handle it amidst layoff chaos...👀 Great to talk to @DanMilmo ( @guardian )

The Guardian

@guardian

2 years

Twitter may not cope with World Cup abuse, says Kick It Out chair

8

20

0

12

Hannah Rose Kirk

@hannahrosekirk

6 months

Super duper work on ShareLM for collecting "in-the-wild" preferences in more naturalistic interactions with LLMs. I really like how this can just be integrated simply into daily workflows. Small effort but mighty payoff 🦾

Shachar Don-Yehiya

@Shachar_Don

6 months

The first chats from the ShareLM plugin are up, together with >4GB of chat datasets, organized in a unified format! ✨Whether you use models, create data, or spaces there is always a way to help✨ 💬: 🤗: 🧩:

4

15

49

0

2

12

Hannah Rose Kirk

@hannahrosekirk

6 months

NLP x Culture, the coolest collab to drop in 2024 😎

Michael Roth

@microth

6 months

Cool to see so much work on NLP X culture recently: PRISM, CRAFT, NORMAD, CultureBank, ... My recommendation? Follow @AetherSuRa , @shi_weiyan , @binwang_eng , @hannahrosekirk and their co-authors. I am sure there will be even more great work in the future. :-)

0

6

20

1

0

12

Hannah Rose Kirk

@hannahrosekirk

6 months

I mean really NLP x Culture work goes way back, I'm late to the game! I really like this review of @daniel_hers which contains most of my fav references in the space

1

3

12

Hannah Rose Kirk

@hannahrosekirk

2 years

Looking for a new data collection approach that incorporates quality, creativity, diversity and collaboration? 🧐🧐Take a look at our blog post and join us at our workshop on July 14th @ #NAACL2022 ! We have great speakers, posters and t-shirts! 🤟🤟

MLCommons

@MLCommons

2 years

DADC improves AI accuracy through higher quality and more diverse data collected with human and AI collaboration. We believe this will help the community build robust ML and why are sponsoring @DADCWorkshop @DynabenchAI at #NAACL2022

0

5

17

0

11

Hannah Rose Kirk

@hannahrosekirk

4 months

"Open source" is used all too often in the AI community without actually evaluating or evidencing "openness" of models🙅‍♀️This is a cool tool from my friend @cailean_osborne that classifies how open models are, based on what components are released under which licenses. Good stuff!

Cailean Osborne

@cailean_osborne

4 months

We’ve released the Model Openness Tool to help researchers & developers evaluate the openness of “open source” AI models, based on . Still beta & incorporating community feedback. Check it out & let us know how we can make it more user-friendly & useful! 🛠️

2

3

13

0

1

11

Hannah Rose Kirk

@hannahrosekirk

11 months

Swing by this poster today at 13:30, Room 06-09 🕺

Hannah Rose Kirk

@hannahrosekirk

11 months

Hi 🌎! I've arrived at @NeurIPSConf 🫡 Reach out if you wanna talk all things human feedback + sociotechical alignment. I’m presenting this cute poster, but we’re also building an awesome new human feedback dataset (release in Jan 👀) that I can’t wait to tell everyone about🕺

3

81

0

10