Dora Zhao

@dorazhao9

Followers 576 · Following 485 · Media 8 · Statuses 90

CS PhD @Stanford. Previously @SonyAI_global and @PrincetonCS @VisualAILab. (she/her)

Stanford, CA
Joined February 2018
@dorazhao9
Dora Zhao
7 months
New #ICML2024 position paper. Many ML datasets report to hold properties such as “diversity” but often fail to properly define or validate these claims. We propose drawing from measurement theory in the social sciences as a framework for diverse dataset collection.
7
23
129
@dorazhao9
Dora Zhao
6 months
excited that we won Best Paper at #ICML2024! come check out my talk tomorrow morning in the Data and Society session 😄.
@dorazhao9
Dora Zhao
7 months
New #ICML2024 position paper. Many ML datasets report to hold properties such as “diversity” but often fail to properly define or validate these claims. We propose drawing from measurement theory in the social sciences as a framework for diverse dataset collection.
21
26
205
@dorazhao9
Dora Zhao
1 year
What visual cues are correlated with gender in image datasets? Basically everything! In our #ICCV2023 work, we explore where gender artifacts arise in visual datasets, using COCO and OpenImages as a case study.
2
10
95
@dorazhao9
Dora Zhao
4 months
Excited to be presenting this work as an oral at #NeurIPS Datasets and Benchmarks in Vancouver!! ✌️.
@dorazhao9
Dora Zhao
8 months
There’s been a significant push to curate fairer and more responsible ML datasets, but what are the practical aspects of this process? 🤔. In our new study, we interviewed 30 ML dataset curators who have collected fair vision, language, or multi-modal datasets. 🧵
8
6
64
@dorazhao9
Dora Zhao
8 months
There’s been a significant push to curate fairer and more responsible ML datasets, but what are the practical aspects of this process? 🤔. In our new study, we interviewed 30 ML dataset curators who have collected fair vision, language, or multi-modal datasets. 🧵
1
9
42
@dorazhao9
Dora Zhao
1 year
Do you work with NLP or CV datasets? Are you actively and directly involved in collecting or maintaining fair datasets? If you answered "yes" and you’re interested in doing a paid ($75) 45-60 min interview, please fill out
3
3
29
@dorazhao9
Dora Zhao
2 years
Excited that our work on teenager perceptions and configurations of privacy on Instagram will appear at #CSCW2022! arXiv preprint: Short thread 🧵:
1
2
17
@dorazhao9
Dora Zhao
2 years
Excited to be sharing our paper "Men Also Do Laundry: Multi-Attribute Bias Amplification" that I worked on with Jerone Andrews and @alicexiang at #ICML2023. arXiv link:
@SonyAI_global
Sony AI
2 years
The Sony AI team has several papers accepted at this year's @icmlconf. Find our latest work, as well as information on recruitment sessions, here: #ICML2023.
1
1
15
@dorazhao9
Dora Zhao
7 months
This work was done in collaboration with Jerone Andrews, @SciOrestis, @alicexiang. I will be in Vienna presenting this paper and hope to chat with anyone interested in questions around data collection + sociotechnical systems. 📜:
1
1
11
@dorazhao9
Dora Zhao
8 months
This work is a culmination of an awesome team effort from @morganklauss, @Pooja_Chitre, Jerone Andrews, @geodotzip, @walkeroh, @khpine, and @alicexiang! Check out the full preprint here:
0
0
10
@dorazhao9
Dora Zhao
3 years
@nicole__meister and I presented our poster on "Gender Artifacts in Visual Datasets" @WiCVworkshop. The paper is available on arXiv (.
1
2
9
@dorazhao9
Dora Zhao
3 years
🧵 rounding up all of the exciting things from #CVPR22 this past week.
1
1
8
@dorazhao9
Dora Zhao
8 months
We developed a taxonomy of pervasive challenges throughout the dataset lifecycle and the broader landscape of fair ML dataset creation.
1
0
7
@dorazhao9
Dora Zhao
8 months
We identified 5 levels in the broader fairness landscape impacting dataset curation. One persistent challenge is the undervaluation of fair dataset work, which affects participants’ ability to collect and maintain datasets.
1
0
7
@dorazhao9
Dora Zhao
8 months
Our participants defined fairness in dataset curation across three dimensions:
1. The composition of the data (e.g., representation across groups)
2. The process in which the data was collected (e.g., labor practices)
3. The release of the dataset (e.g., documentation practices)
1
0
6
@dorazhao9
Dora Zhao
3 years
"Quantifying Societal Bias Amplification in Image Captioning" by @hirota_yusuke, Yuta Nakashima, @noagarciad . Really interesting work extending bias amp metrics to vision + language systems! (
0
1
6
@dorazhao9
Dora Zhao
8 months
We uncovered many challenges across the lifecycle. For example, creating a fair taxonomy is challenging due to the inherent unfairness of categorization. Practical constraints (e.g., domain norms) forced participants to make decisions conflicting with personal beliefs.
1
0
6
@dorazhao9
Dora Zhao
7 months
Finally, highlighting some awesome work applying measurement theory to ML in this thread. 1⃣ “Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets” Blodgett et al.
1
1
5
@dorazhao9
Dora Zhao
7 months
We survey 135 datasets and highlight key concerns:
1⃣ Lack of a concrete definition, conflating diversity with other constructs (e.g., scale)
2⃣ Documentation gaps around collection + quality make validation hard
3⃣ Downstream evals capture different constructs than the dataset's claims
1
0
5
@dorazhao9
Dora Zhao
8 months
A key takeaway is the need for systemic interventions at the disciplinary, organizational, and regulatory levels to support fair data curation. Currently, many suggestions / best practices focus on individuals, but they can only do so much when facing institutional obstacles.
1
1
5
@dorazhao9
Dora Zhao
8 months
What challenges and trade-offs happen behind the scenes? Participants discussed how power differentials shape datasets. This includes the influence of elite institutions / companies, the predominance of Western perspectives, and control curators have over data workers’ pay.
1
0
5
@dorazhao9
Dora Zhao
7 months
4⃣ “It takes two to tango: Navigating conceptualizations of NLP tasks and measurements of performance” Subramonian et al.
0
2
5
@dorazhao9
Dora Zhao
7 months
3⃣ “Evaluating evaluation metrics: A framework for analyzing NLG evaluation metrics using measurement theory” Xiao et al.
1
1
5
@dorazhao9
Dora Zhao
7 years
not 👏 your 👏 china 👏 doll
@princetonian
The Daily Princetonian
7 years
OPINION | Assistant Opinion Editor Dora Zhao urges us to reject stereotypical sexual tropes.
0
1
5
@dorazhao9
Dora Zhao
7 months
2⃣ “Measurement and Fairness” Jacobs and Wallach
1
1
5
@dorazhao9
Dora Zhao
1 year
These findings caution against the use of fairness-through-blindness mitigation approaches which attempt to “remove” gender artifacts, as they are likely to also remove information relevant for the downstream task.
1
1
4
@dorazhao9
Dora Zhao
3 years
"How Much More Data Do I Need?.Estimating Requirements for Downstream Tasks" by Mahmood et al. Some really awesome work on measuring what dataset size is needed for achieving performance on different downstream tasks. Paper:
1
0
4
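As a rough illustration of the idea in the tweet above (a generic baseline, not Mahmood et al.'s actual estimator): fit a saturating power-law learning curve to scores measured at a few training-subset sizes, then invert it to estimate the size needed for a target score. All sizes, scores, and the target below are made-up placeholders.

```python
# Hypothetical sketch: fit a power-law learning curve to a few (subset size, score)
# measurements and extrapolate the data needed to hit a target score.
# This is a generic baseline, NOT the estimator from Mahmood et al.; all numbers
# below are illustrative placeholders.
import numpy as np
from scipy.optimize import curve_fit

sizes = np.array([1_000, 2_000, 5_000, 10_000, 20_000])   # training-set sizes tried
scores = np.array([0.61, 0.66, 0.72, 0.76, 0.79])          # accuracy measured at each size

def power_law(n, a, b, c):
    # Saturating curve often used for learning curves: score(n) = c - a * n**(-b)
    return c - a * n ** (-b)

(a, b, c), _ = curve_fit(power_law, sizes, scores, p0=[1.0, 0.3, 0.85], maxfev=10_000)

target = 0.85
if c > target:
    # Invert the fitted curve: n = (a / (c - target)) ** (1 / b)
    needed = (a / (c - target)) ** (1 / b)
    print(f"Estimated examples needed for {target:.0%} accuracy: {needed:,.0f}")
else:
    print(f"Fitted asymptote {c:.2f} is below the target; more data alone may not get there.")
```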
@dorazhao9
Dora Zhao
1 year
Shout-out to my amazing co-authors @nicole__meister, @ang3linawang, Vikram, @ruthcfong, and @orussakovsky from the @VisualAILab -- some of whom will be presenting this work in Paris next month 🥐🥖!
1
1
3
@dorazhao9
Dora Zhao
1 year
Paper Link: We look at “gender artifacts,” visual cues correlated with gender, that (1) are learnable by an image classifier and (2) have an interpretable human corollary.
1
0
3
@dorazhao9
Dora Zhao
6 years
late to reposting about @CrazyRichMovie, but AAPI stories are always relevant so. 💁🏻‍♀️💁🏻‍♀️💁🏻‍♀️.
@princetonian
The Daily Princetonian
6 years
OPINION | ZHAO. For many Asian-Americans, the sense of statelessness and not belonging depicted in the movie "Crazy Rich Asians" is relatable, @dorazhao9 writes.
0
0
3
@dorazhao9
Dora Zhao
1 year
@VisualAILab Yay congrats!!! 🪩🍾.
0
0
1
@dorazhao9
Dora Zhao
8 months
@ang3linawang @CornellInfoSci @cornell_tech @StanfordHAI @sanmikoyejo Congrats Angelina 🎉🪩!!!! Excited to see you around Stanford next year.
1
0
2
@dorazhao9
Dora Zhao
5 years
And for anyone interested in technology but especially for computer scientists / developers, two more books to read are Race After Technology by Ruha Benjamin and Algorithms of Oppression by Safiya Noble!!.
@victoria_phd_
Victoria Alexander
5 years
I’ve been getting a lot of questions from my non-Black friends about how to be a better ally to Black people. I suggest unlearning and relearning through literature as just one good jumping off point, and have broken up my anti-racist reading list into sections:
0
0
2
@dorazhao9
Dora Zhao
1 year
Some artifacts that we surfaced: (1) Color: Using only the mean RGB values, a logistic regression model can learn to differentiate between images of males vs. females above random chance.
1
0
2
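For readers curious what such a probe looks like in practice, here is a minimal sketch of a mean-RGB logistic-regression classifier. It is not the paper's exact COCO/OpenImages pipeline; the `images` and `labels` below are synthetic stand-ins for a real data loader and real annotations.

```python
# Minimal sketch of a mean-RGB probe: reduce each image to its average (R, G, B)
# and fit a logistic regression on the dataset's binary gender-expression labels.
# The data below is random placeholder data; swap in your own dataset loader.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(500)]  # stand-in images
labels = rng.integers(0, 2, size=500)                                                  # stand-in labels

def mean_rgb(image: np.ndarray) -> np.ndarray:
    """Collapse an HxWx3 image into a single 3-dimensional mean-RGB feature."""
    return image.reshape(-1, 3).mean(axis=0)

X = np.stack([mean_rgb(img) for img in images])
y = np.asarray(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy meaningfully above chance would indicate a dataset-level color artifact
# correlated with the labels (on the random placeholders it should hover around 0.5).
print("Mean-RGB probe accuracy:", clf.score(X_test, y_test))
```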
@dorazhao9
Dora Zhao
1 year
@iamrashminagpal Yep! We have a GitHub repo ( that we are in the process of updating.
1
0
2
@dorazhao9
Dora Zhao
2 years
We will also be sharing "Principlism Guided Responsible Data Curation" which provides ethical considerations and recommendations for human-centric CV datasets at the DMLR workshop. arXiv link:
0
0
1
@dorazhao9
Dora Zhao
6 months
@HelenasResearch 🫶🫶.
0
0
1
@dorazhao9
Dora Zhao
2 years
Finally, thanks to my collaborator Mikako Inaba, advisor @andresmh, classmates in COS 597I: Social Computing, and of course all of our participants!.
0
0
1
@dorazhao9
Dora Zhao
7 years
Because saying “you’re pretty for an Asian girl” is not a compliment. A letter to Asian girls via @Honi_Soit.
0
0
1
@dorazhao9
Dora Zhao
1 year
This project aims to understand the properties of fair datasets used to train and/or evaluate machine learning algorithms and to describe approaches used to create such fair datasets.
1
0
1
@dorazhao9
Dora Zhao
9 months
@morganklauss Congrats Morgan!! Super excited to read 🤩.
0
0
1
@dorazhao9
Dora Zhao
1 year
(2) Size and location: Just the bounding box around the person occluded with white pixels against a black background (MaskRect NoBg) is sufficient info for a classifier to learn gender artifacts.
1
1
1
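As a rough illustration of what a "MaskRect NoBg" input could look like (my reading of the description above, not code from the paper): the person's bounding box is rendered as white pixels on an otherwise black canvas, so only size and location survive. The image dimensions and box coordinates below are invented; in the study they would come from the dataset's person annotations.

```python
# Sketch of a "MaskRect NoBg"-style input as described above: a white rectangle at
# the person's bounding box on a black background, leaving only size and location.
# The frame size and box below are illustrative placeholders.
import numpy as np

def mask_rect_no_bg(height: int, width: int, bbox: tuple) -> np.ndarray:
    """Return an HxWx3 uint8 image: white rectangle at bbox=(x, y, w, h), black elsewhere."""
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    x, y, w, h = bbox
    canvas[y:y + h, x:x + w] = 255
    return canvas

# Example: a 480x640 frame with a person box at (x=200, y=100) of size 150x300.
masked = mask_rect_no_bg(480, 640, (200, 100, 150, 300))
print(masked.shape, int(masked.max()), round(float(masked.mean()), 2))
```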
@dorazhao9
Dora Zhao
1 year
If you have any questions or for more information about the research study, please contact the research team at: Dr. Shawn Walker (shawn.w@asu.edu), Dr. Kathleen H. Pine (khpine@asu.edu) or Pooja Chitre (pnchitre@asu.edu).
0
0
1
@dorazhao9
Dora Zhao
4 years
@karen_ying_ great before but love it even more now!!.
0
0
1
@dorazhao9
Dora Zhao
2 years
Some interesting findings include teenagers using emojis in their bio as a form of social steganography, bookmarking to like public posts that may be more “controversial,” and navigating privacy across different social media platforms.
1
0
1
@dorazhao9
Dora Zhao
10 months
@cqzou @Stanford @NSF congrats!!! 🫶🫶🫶.
0
0
1