Leon Yin
@LeonYin
Followers
5K
Following
14K
Media
306
Statuses
2K
investigative data journalist @technology how to: https://t.co/gxcIyhEvNU
Joined February 2015
NEW INVESTIGATION: Uber and Lyft locked drivers out of work to game a NYC minimum wage law. We crowdsourced screenshots, monitored surge pricing and ran financial models to reveal the devastating impact on drivers. w/ @natlungfy @A_W_Gordon @DeniseDSLu.
33
906
2K
New: Employers and HR vendors are using AI chatbots to interview and screen job applicants. We found that OpenAI's GPT discriminates against names based on race and gender when ranking resumes. W/ @daveyalba and @Leonardonclt gift link:.
31
558
954
In part 2 of @themarkup's investigation into YouTube's secret blocklist, @asankin and I report on YouTube's treatment of social and racial justice keywords.
1
89
396
I'm joining the major leagues at Bloomberg @Technology as a data journalist focused on AI and algorithm accountability. I'll work on hypothesis-driven investigations about AI's impact on society, building off a track record of great data work at Bloomberg.
Excited to welcome 3 all-stars to the @technology AI team: @shiringhaffary to cover AI ethics/policy and pen a newsletter that will guide readers into the AI age, @sfiegerman to help edit and pilot our AI coverage, and the incredible @LeonYin on data-driven investigations!
37
27
423
Almost a decade ago, Safiya Noble (@safiyanoble) wrote about how searching for one’s identity on Google can return a slew of pornographic results. In 2018, Noble wrote an entire book on the topic called “Algorithms of Oppression.”
2
83
315
Web scraping repurposes knowledge. It is essential for the public interest. #ScrapingIsNotACrime
11
41
264
Brandi Collins-Dexter (@BrandingBrandi) said this glimpse into how Google manages its platform suggests the company’s investments in building relationships with civil rights communities are “a lot of lip-service.”
1
64
223
I am resigning from my dream job as data science editor at @team_markup! It has been a pleasure to work with such amazing journalists and technologists. Please follow my colleagues at @MarkupReal.
5
25
210
We got the paywall removed! Please read and share:
New: Employers and HR vendors are using AI chatbots to interview and screen job applicants. We found that OpenAI's GPT discriminates against names based on race and gender when ranking resumes. W/ @daveyalba and @Leonardonclt gift link:.
2
105
204
See part one of our investigation into YouTube's keyword blocklist for ad buyers here + thread:.
So. we found YouTube's keyword blocklist for ad buyers. In the first of a two-part investigation for @themarkup, @asankin and I see how many well-known hate terms get 🚫'd or ✅'d.
3
33
169
Trouble spotting Amazon private label products? @TheMarkup's new browser extension 🟧Amazon Brand Detector🟧 finds and highlights them while you browse the site.
3
52
175
Last year @adrjeffries and I published a @TheMarkup investigation that found Google gave itself 41% of real estate on Search. In the months since, we designed a similar study. This time for Amazon. Today we're releasing it:.
2
70
156
So. we found YouTube's keyword blocklist for ad buyers. In the first of a two-part investigation for @themarkup, @asankin and I see how many well-known hate terms get 🚫'd or ✅'d.
5
93
153
@obiriders NYC has a min wage formula that penalizes companies for inefficiency. The busier drivers seem, the less each ride is worth. To keep wages low, Uber and Lyft inflate driver efficiency stats by erasing idle time. We estimate millions at stake monthly.
5
45
154
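A rough sketch of why a higher reported efficiency lowers per-ride pay. It assumes a formula in which per-mile and per-minute rates are divided by a utilization (efficiency) rate; the tweet does not spell out the formula, and every rate and trip value below is hypothetical, not a figure from the city or the investigation.

```python
def min_pay_per_trip(miles, minutes, per_mile_rate, per_minute_rate, utilization):
    """Minimum pay for one trip under an assumed utilization-adjusted formula.

    Assumption: distance and time rates are divided by the share of working
    time that drivers spend with passengers (the utilization rate).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    base = per_mile_rate * miles + per_minute_rate * minutes
    return base / utilization


# Hypothetical trip and rates, chosen only to make the arithmetic easy to follow.
pay_low_util = min_pay_per_trip(5, 20, per_mile_rate=1.0, per_minute_rate=0.5, utilization=0.50)
pay_high_util = min_pay_per_trip(5, 20, per_mile_rate=1.0, per_minute_rate=0.5, utilization=0.60)

# The same trip is worth less when drivers appear busier.
print(pay_low_util, pay_high_util)
```

Under this assumed shape, erasing idle time raises the utilization rate, which shrinks the minimum owed for each ride.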
New from @adrjeffries and myself. We develop a taxonomy for ~all the things~ found in Google Search results, and a novel method to estimate how much space Google gives its own properties on mobile.
1
51
132
We designated 80+ random addresses across Manhattan, BK, QNS and The BX to collect surge pricing every 10 mins using the rideshare aggregator @obiriders. We use this data to determine if a lockout occurred in the proximity of a surge zone.
1
17
113
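One way such a proximity check could look, as a minimal sketch and not the investigation's actual code. The record fields (`lat`, `lon`, `time`, `surge`), the 1 km radius, and the 10-minute window are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def lockout_near_surge(lockout, surge_samples, max_km=1.0, max_gap=timedelta(minutes=10)):
    """Return True if any surge sample is close to the lockout in space and time.

    A sample counts as surging when its multiplier is above 1.0 (assumption).
    """
    for sample in surge_samples:
        close_in_time = abs(sample["time"] - lockout["time"]) <= max_gap
        close_in_space = haversine_km(
            lockout["lat"], lockout["lon"], sample["lat"], sample["lon"]
        ) <= max_km
        if sample["surge"] > 1.0 and close_in_time and close_in_space:
            return True
    return False


# Hypothetical records: one logged lockout and two polled price samples.
t0 = datetime(2024, 7, 1, 18, 0)
lockout = {"lat": 40.7580, "lon": -73.9855, "time": t0}
samples = [
    {"lat": 40.7590, "lon": -73.9850, "time": t0 + timedelta(minutes=5), "surge": 1.5},
    {"lat": 40.6782, "lon": -73.9442, "time": t0, "surge": 2.0},
]
print(lockout_near_surge(lockout, samples))
```

The radius and time window drive the result, so they would need to be justified against the polling interval and the spacing of the sampled addresses.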
@UkyDuky8 @natlungfy @A_W_Gordon @DeniseDSLu The entire story is about how drivers are harmed. Please give it a read!.
8
4
103
@obiriders Our estimates are based on the city's pay formula and TLC trips from the first half of 2024. We calc min wage under hypothetical efficiency rates from the low/high monthly rates reported by each company. Uber & Lyft say our model is too simplistic.
1
12
98
NEW: @ASankin and I collected +850K internet plans across 38 major US cities and found widespread disparities between where fast affordable internet was available, and where it wasn’t. The latest @themarkup story in partnership with the @AP:
5
42
93
More details in this thread from my co-author @ASankin:.
New from @leonyin & me: We discovered a secret block list inside of Google’s ad portal. It blocked advertisers from placing ads on videos about “Black Lives Matter” but permitted placing ads on videos about “All Lives Matter” and “White Lives Matter”
1
16
88
@orientaljanedoe @daiwaka @themarkup We did a story about this and measured how much of the page is full of Google-y things vs non Google-y things.
0
12
92
When I started at @technology 6 months ago, I pitched my first data investigation on GPT bias. Wasn’t easy, but thanks to the unrelenting support of our team we published a strong piece we’re proud of. Expect more from @daveyalba, @Leonardonclt and me. In the meantime, plz read:
New: Employers and HR vendors are using AI chatbots to interview and screen job applicants. We found that OpenAI's GPT discriminates against names based on race and gender when ranking resumes. W/ @daveyalba and @Leonardonclt gift link:.
6
19
96
Stoked @ASankin and I got to accept the Philip Meyer Award together at NICAR Baltimore for Still Loading! We’ve vowed to stop presenting on that story hereafter, but will likely continue to wear the same clothes. Photo: @jonkeegan
1
2
94
Similarly, Latanya Sweeney (@LatanyaSweeney) found that Google AdWords was 80 percent more likely to advertise arrest records for searches of traditionally Black names compared to traditionally White names.
1
17
81
.@ASankin and I build off their foundational work in our latest @themarkup investigation into Google Ad’s Keyword Planner. The Keyword Planner is an essential tool for advertisers to build keyword lists for advertising campaigns.
1
16
71
Read our methodology and see our data on GitHub. Massive effort by @daveyalba @ashleyrcarman @byJuliaLove @acookiecrumbles @rachaeldottle @elena___mejia @YueQiu_ @sarahfrier.
1
15
80
Thinking about how influential W.E.B. Du Bois was for data viz and the social sciences. S/o to @DuBoisUMass.
3
9
76
Me and the homies back east be exchanging trade secrets and mathematical proofs all day. Def not cute animal pics and what we eat for lunch.
For the love of God, can the @nytimes please hire some people who have actually used WeChat or know people who use WeChat.
2
10
63
Showcasing a new tutorial on finding hidden/undocumented APIs today at @TowCenter. I created a @quarto_pub website to host the materials, including in-depth case studies that explain how and why they were used.
4
16
62
Still Loading was awarded a Philip Meyer Award! Meyer is a huge influence on me and all my colleagues, so this is special :) In good company! Congrats all, and big thanks to @IRE_NICAR.
8
7
65
Bloomberg analyzed hundreds of Community Notes and found several shortcomings in Twitter (X)’s volunteer content moderation system in the first 2 weeks of the Israel-Gaza conflict. Work by @daveyalba, @DeniseDSLu, @ericfan_journo, and myself.
3
19
59
Four presentations done at #datajconf. Big thanks to my co-presenters @dangerscarf @sapiezynski @IlicaMahajan @ASankin.
1
3
60
@SteveBellovin @themarkup @ASankin Hi Steven, we used logistic regression to adjust for some of those factors (like population density and broadband adoption rates) to see if disparities disappeared. In most cases, they did not. We explain the process here:.
2
4
50
I often think about how attending the FAT ML conference (now called @FAccTConference) at NYU in 2016 drastically changed my career path.
2
4
56
Today @themarkup is releasing a "Build Your Own Dataset" guide to enable citizen science experiments to test for internet disparities in the U.S. (without coding). Check it out:.
2
13
51
bad at threading (cont'd):
@obiriders To verify crowdsourced data, @A_W_Gordon led a team of reporters to inspect each screenshot and log the time and location. To hear drivers' experiences, @natlungfy interviewed 118 drivers who reached out over the tipline. More in the methodology:.
1
8
56
Check out @pewinternet's 2021 stats on U.S. social media use. YouTube is used by 81% of adults, with nearly half of 65+ yr olds saying they use YouTube. Not only has usage grown since 2019, its audience of old people rivals Facebook's.
4
16
56
Honored to receive a @LoebAwards with @adrjeffries for the last story we worked on as a duo. Big ups to @elarrubia, @JuliaAngwin, and everyone at @themarkup.
6
3
54
Looking at the device is useful for identifying automated accounts. In the 2016 U.S. election many of the Internet Research Agency accounts used free marketing tools to orchestrate their tweets. I wrote about this in a @CSMaP_NYU report.
0
24
47
@daveyalba @Leonardonclt We tested 4 jobs, asking GPT-3.5 to rank resumes 1000x for each job using 800 distinct names. Black Americans were the least likely to be the top-ranked candidate for finance and software engineer roles.
2
34
54
Some of the starkest internet disparities we saw in our national investigation were in Minneapolis. Hard hitting reporting by @bzosiad using the data we made public.
CenturyLink offers slower internet service to Black and brown neighborhoods in Minneapolis, new report claims. by @bzosiad with @Report4America.
3
25
44
@daveyalba @Leonardonclt @nberpubs Read our methodology at the end of the main article. Code and data to reproduce our findings are on GitHub.
4
8
51
Our methodology is inspired by Aaron and @dmehro’s Jalopnik piece on Uber and Lyft fares and @dcalacci’s Shipt calculator SMS bot that offers pay transparency to Target gig workers.
Really proud to work with @LeonYin, @natlungfy and @DeniseDSLu on this one. Imagine having to constantly tap a button in an app all day to find out if you're allowed to work. That's what Uber/Lyft drivers in NYC had to do all summer (and Lyft still does).
1
15
49
"When automated decision-making tools are not built to explicitly dismantle structural inequalities, their speed and scale intensify them." - Virginia Eubanks (@PopTechWorks), Automating Inequality
2
10
43
If you're interested in learning about the antitrust issues referenced in The American Innovation and Choice Online Act, here are 2 empirical studies I worked on with @adrjeffries. On Google Search:. On Amazon Search:.
1
18
46
@daveyalba @Leonardonclt Why does GPT rank equally-qualified resumes differently based on names? Looking at how GPT represents the 800 names in our experiment as embeddings provides a clue.
2
13
47
Stoked to see "Still Loading" honored by an international data journalism award! Thankful for all the people who put their hands (and heads) to work on this project. Hope we can continue doing impactful work on digital inclusion. Stay tuned!
Last year, we published an investigation about internet service disparities. Today, we are honored to announce that this work won a Sigma Award (@sigmaawards)! Congrats to reporters @LeonYin + @ASankin and the rest of the team who worked on the series.
4
4
46
@daveyalba @Leonardonclt Importantly, GPT isn't biased against one group across the board. GPT's bias differs based on the job description used to evaluate applicants. Resumes with women's names were the most likely to be top-ranked for an HR-role, adhering to gender stereotypes.
2
16
47
@daveyalba @Leonardonclt The diff in treatment across groups is substantial enough to surpass benchmarks for adverse impact -- a gov't standard used to test for discriminatory hiring practices. GPT's rankings would adversely impact at least one group in all 4 jobs we tested.
1
19
46
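A common benchmark for adverse impact is the four-fifths rule: a group is flagged when its selection rate falls below 80% of the highest group's rate. The tweet does not say which benchmark was applied, so the sketch below is illustrative only, and the group labels and top-ranked rates are hypothetical, not the investigation's data.

```python
def adverse_impact_ratios(selection_rates):
    """Divide each group's selection rate by the highest group's rate."""
    best = max(selection_rates.values())
    if best <= 0:
        raise ValueError("at least one group must have a positive selection rate")
    return {group: rate / best for group, rate in selection_rates.items()}


def flagged_groups(selection_rates, threshold=0.8):
    """Groups whose impact ratio falls below the threshold (four-fifths by default)."""
    ratios = adverse_impact_ratios(selection_rates)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical share of trials in which each group's resume was ranked first.
top_ranked_rates = {"group_a": 0.30, "group_b": 0.27, "group_c": 0.20}
print(flagged_groups(top_ranked_rates))
```

Here only `group_c` falls below four-fifths of the best-treated group's rate; `group_b`, at roughly 0.9 of that rate, does not.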
One more @themarkup article w/ @source about the tools and methods we learned to get “receipts from streets.” You can use magic spreadsheets for your next story, too.
1
14
43
@adrjeffries @SamMorrisDesign @elarrubia @ASankin @ben_tanen @JoelEastwood @ghongsdusit @mynameisfiber @jsvine @sisiwei I am grateful to my editors, colleagues, mentors, and everyone who voluntarily read our 30-page methodologies. I feel lucky to spend my time reporting on the social impacts of technology. I’ll continue to do so at my new gig, which I’ll share soon.
11
0
42
I love this WSJ infographic about the fragmentation and merging of Bell Systems into the big TeleCo's we know of today.
The latest in our @WSJ series, LEAD LEGACY: Telecom giants have long known about a sprawling network of toxic lead cables across the U.S. From @tgryta, @shalini, @coulterjones, Susan Pulliam, and me.
1
7
42
This Spring I'll be joining @team_markup! I am excited to learn from the Avengers team that they're building and help foster a culture where qualitative (case studies and interviews) and quantitative (data collection and analysis) approaches work in tandem.
The incomparable @LeonYin will be our Data Science Editor. This is an unusual title - we’ve never heard of a newsroom with a data science editor. But other newsrooms probably aren’t crunching as much data as we are! And we can’t think of anyone better suited to the job than Leon.
4
0
38