Life update: Absolutely thrilled to share that I'll be joining
@UCLA
as an Assistant Professor in summer 2025, with joint appts in Computational Medicine (
@CompMedUCLA
) and Computer Science (
@CS_UCLA
)! Immensely grateful to everyone who has been part of this journey.
🚀 It was awesome meeting up with old pals and making new connections at my first in-person
#NeurIPS
! Had some fantastic conversations and learned a ton. Already looking forward to next year!
(📸 with the legend
@ylecun
himself! 🌐)
#NeurIPS2023
#ML4H2023
Why could medical imaging AI be biased across different demographic groups?🤔
Excited to share our latest
@NatureMedicine
paper on revealing the potential cause of bias, and how to dissect & improve fairness both in-distribution (ID) & out-of-distribution (OOD).💡
📄
💻
(1/n)
When tested across tasks, diseases and imaging modalities, performance of
#AI
models depends on encoding of demographic shortcuts and correcting for them decreases their ability to generalize in new populations.
@MarzyehGhassemi
@yang_yuzhe
@MIT_CSAIL
Introducing our
#ICML2023
paper: 𝐒𝐮𝐛𝐩𝐨𝐩𝐁𝐞𝐧𝐜𝐡🧑🤝🧑
- a fine-grained analysis on subpopulation shift;
- a living PyTorch benchmark with datasets & algos for subpopulation shift!
paper:
code:
website:
I'll be at
#ICML2023
next week! More info🧵👇 Let’s☕️💬
I'm also on the faculty market in fall 2023! I work on fair & robust ML to advance health, disease & medicine.
My research:
If I could be a good fit for ur department, pls reach out! RT appreciated
🔥Happy to share that SimPer (simple self-supervised learning of *periodic* targets) has been accepted as a "notable-top-5%" paper (Oral presentation) to
#ICLR
!
Stay tuned for the paper + code updates :)
@iclr_conf
#ICLR2023
#ICLR
📢Check out our latest work on self-supervised learning of *periodic* information from data!
We present SimPer, a simple SSL regime for learning periodic targets. w/
@xliucs
, Jiang, Silviu, Dina, Ming,
@danmcduff
. Thanks
@_akhaliq
for sharing!
🗣️Excited to share our
#ICML2023
workshop on "Interpretable Machine Learning in Healthcare"!
Looking forward to exploring the potential and challenges of interpretable medical AI! We also provide *Travel* & *Best Paper Awards*! Join us in Hawaii 🏝️
CFP:
Excited to share our
#ICCV2023
workshop on "Computer Vision for Automated Medical Diagnosis"!
Looking forward to exploring the potentials in medical diagnosis with AI! We co-host the CXR-LT challenge () and invite submissions!
CFP:
Excited to share our latest publication in
@NatureMedicine
! 🎉 Proud to have been part of this incredible team effort.
We study the disparity and fairness in AI models for computational pathology, exploring a variety of modeling strategies. Check it out below! 👇
⚡️🔬📣Excited to share our new
@NatureMedicine
article, examining disparities in pathology AI models, assessing how modeling choices impact disparities, and evaluating the potential of self-supervised foundation models in mitigating these disparities.
See
How to learn effective representations for periodic targets in a self-supervised manner? 🌎🌍🌏
#ICLR
#ICLR2023
Wednesday 11am CAT,
@danmcduff
will present our paper SimPer at [Oral 5 Track 1]!
Poster session right afterwards at 11:30am at [MH1-2-3-4],
#146
. Check it out!
🚨Announcing CVPR 2023 workshop on Computer Vision for Physiological Measurement🚨
We hope to bring together the CV and health sensing communities, discuss the opportunities, challenges & latest advances for human health sensing with CV/ML.
@CVPR
#CVPR
Thanks
@MIT_CSAIL
@csail_alliances
for covering my research!
Feel free to check out if you are interested in how AI can help detect Parkinson’s before clinical diagnosis.
.
@MIT_CSAIL
PhD student
@yang_yuzhe
’s research lies at the intersection of machine learning and applications in human disease, health and medicine.
Learn more about Yang's current research in this month's Spotlight:
#IMLH
“Interpretable ML in Healthcare” workshop
@icmlconf
is happening now @ Ballroom C!
We have a great lineup of speakers and an exciting program today. Come and explore the advances in interpretable ML for health!
#ICML
#ICML2023
[
#ICML2021
Long Oral] 📢📢 Introducing "Delving into Deep Imbalanced Regression"
Imbalanced classification is well studied. But how about tasks with continuous targets? We formally study Deep Imbalanced Regression (DIR) arising in real-world settings.
📢 Call for Participants 📢
Join *CXR-LT*, a competition for multi-label, long-tailed diagnosis on chest X-rays! Top teams may present their solutions for publication at our
#ICCV2023
CVAMD workshop ()!
Join here:
#ICCV
CVAMD workshop
@ICCVConference
is happening now @ s01!
We have a great lineup of speakers and an exciting program. We also have a live Zoom for remote attendees:
Come and explore the advances in medical diagnosis with AI!
#ICCV2023
Change is hard, but necessary when building safe & equitable ML models.
MIT researchers analyze subpopulation shifts, showing how ML models have performed poorly w/underrepresented subgroups in health care & offering new strategies to mitigate the issue:
SimPer is a self-supervised contrastive framework for learning periodic information in data, while improving data efficiency and generalization to distribution shifts. Check it out and copy the code →
Learning from imbalanced / long-tailed datasets? Check out our
#NeurIPS2020
paper! We show theoretically and empirically that both *semi-supervised* & *self-supervised* learning can substantially improve performance on imbalanced datasets.
Wow! Glad to learn that our 2 recent papers on Parkinson's disease (detection, tracking progression & medication response) are featured in
@NatureMedicine
Year in Review 2022!
@AIHealthMIT
#Parkinsons
#Health
A thread 🧵 👇
How to learn *continuous* representations for regression tasks?
Check out our
#NeurIPS2023
Spotlight paper - Rank-N-Contrast (yes, as the name suggests, you first rank, then contrast!). More details👇
Excited to share our new work published in
@NatureMedicine
, which uses AI to advance health! We developed a machine learning system that detects Parkinson's disease and its severity from nocturnal breathing.
🎥 New Talk at
@MedaiStanford
on AI for Parkinson's disease! Check it out to see how equitable () & generalizable () ML algos enable this advance for human disease.
Learn more 👉:
Check out my recent talk on using AI to diagnose and assess Parkinson's disease! I also discussed the core ML breakthroughs in this project, DIR (ICML'21 long oral: ) & MDLT (ECCV'22: ).
More information:
I'm really grateful to my advisor
@dina_katabi
, mentors (
@MarzyehGhassemi
@HaoGarfield
and many others), collaborators, colleagues, friends, and family, for all your help and support along this journey!
We establish the fundamental tradeoff between worst-group accuracy (WGA) and important metrics such as worst-case precision - an "Acc. on the inverse line" phenomenon.
This highlights the need to rethink evaluation metrics in subpopulation shift beyond WGA.
8/n
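A minimal sketch of the two metrics contrasted above, assuming a "group" is a (label, attribute) pair and that worst-case precision is taken over predicted classes. The function names and the toy data are illustrative only and are not taken from the SubpopBench code.

```python
from collections import defaultdict

def worst_group_accuracy(y_true, y_pred, attrs):
    """Lowest accuracy over (label, attribute) groups."""
    hits, totals = defaultdict(int), defaultdict(int)
    for y, p, a in zip(y_true, y_pred, attrs):
        totals[(y, a)] += 1
        hits[(y, a)] += int(y == p)
    return min(hits[g] / totals[g] for g in totals)

def worst_class_precision(y_true, y_pred):
    """Lowest precision over classes that were predicted at least once."""
    true_pos, predicted = defaultdict(int), defaultdict(int)
    for y, p in zip(y_true, y_pred):
        predicted[p] += 1
        true_pos[p] += int(y == p)
    return min(true_pos[c] / predicted[c] for c in predicted)

# Toy data (hypothetical): 2 classes x 2 attribute values.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
attrs  = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

wga = worst_group_accuracy(y_true, y_pred, attrs)
wcp = worst_class_precision(y_true, y_pred)
print(f"worst-group accuracy: {wga:.2f}, worst-class precision: {wcp:.2f}")
```

Reporting both numbers side by side for each algorithm is one simple way to see whether a gain in WGA comes at the cost of precision.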
🧠Excited for
#ML4H
Research Roundtable! We want your input in brainstorming session topics. From clinician-AI interaction, foundational models, multimodal learning, privacy, security, and more. Help us shape these discussions by filling out this survey:
Interested in how neural network architectures would influence adversarial robustness? Check out our
#CVPR2020
paper on investigating and designing adversarially robust network architectures using NAS!
Paper:
Code:
Finally, pls also check out
@MarzyehGhassemi
's invited talk at
#ICML
, "Taking the Pulse Of Ethical ML in Health":
⏰ July 25 (Tue), 9:15 am - 10:30 am HST
📍 Exhibit Hall 2 & 3
--
DM/email/ping me to☕️💬research, opportunities and more! Hope to see you there! 🔥
#ICML
#ICML2023
How to learn from imbalanced data arising from multiple domains? Does in-domain data imbalance influence out-of-domain generalization?
Check out our
#ECCV2022
paper for in-depth analysis + intriguing property of multi-domain imbalance!
@tengyuma
Interesting paper! FYI, our NeurIPS 2020 paper () also demonstrated (theoretically + empirically) that self-supervision improves class-imbalanced learning. Glad to see more works on tackling this problem!
ML models often perform poorly on *subgroups* that are underrepresented in training. But what mechanisms cause subpopulation shifts, and how do algorithms generalize across such diverse shifts?
2/n
Excited to share that our
@ScienceTM
paper is selected by
@TheLancetNeuro
as one of 10 crucial advances in Parkinson's disease and other movement disorders among more than 14,000 published papers in 2022!
We first propose a unified framework that dissects and explains common shifts in subgroups. This leads to 4 basic types of subpopulation shift:
- spurious correlations,
- attribute imbalance,
- class imbalance, and
- attribute generalization
3/n
@ICCVConference
Shout out to our great lineup of speakers today!
@suchop
kicked off today's workshop with holistic views + insights in medical imaging diagnosis!
While successful algorithms rely on access to group information for model selection, a simple criterion based on *worst-class accuracy* is surprisingly effective even without any group-annotated validation data.
7/n
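A minimal sketch of model selection by worst-class accuracy, assuming each candidate checkpoint has already produced predictions on a validation set that carries class labels but no group annotations. The checkpoint names and helper functions are hypothetical.

```python
from collections import defaultdict

def worst_class_accuracy(y_true, y_pred):
    """Lowest per-class accuracy (recall); uses class labels only."""
    hits, totals = defaultdict(int), defaultdict(int)
    for y, p in zip(y_true, y_pred):
        totals[y] += 1
        hits[y] += int(y == p)
    return min(hits[c] / totals[c] for c in totals)

def select_checkpoint(val_labels, preds_by_checkpoint):
    """Return the checkpoint name with the highest worst-class accuracy."""
    return max(
        preds_by_checkpoint,
        key=lambda name: worst_class_accuracy(val_labels, preds_by_checkpoint[name]),
    )

# Toy validation set (hypothetical): both checkpoints have the same
# average accuracy (5/6) but differ on their weakest class.
val_labels = [0, 0, 0, 0, 1, 1]
candidates = {
    "ckpt_a": [0, 0, 0, 0, 0, 1],  # perfect on class 0, 50% on class 1
    "ckpt_b": [0, 0, 0, 1, 1, 1],  # 75% on class 0, perfect on class 1
}
best = select_checkpoint(val_labels, candidates)
print(best)
```

Average accuracy cannot separate the two candidates here, while the worst-class criterion prefers the one whose weakest class is stronger.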
Representation & classifier quality play different roles under different shifts. Methods that decouple rep. and clf. are more effective for spurious correlations (DFR) & class imbalance (CRT), but do not bring benefits for other shifts (which might require better rep.).
6/n
The second paper, published in
@ScienceTM
, shows that a wireless device could track Parkinson’s progression & medication response at home.
Learn more 👉:
More than 40% of people w/ Parkinson’s are never treated by a neurologist or specialist, often because they live too far from a city or have difficulty traveling. Now, a router-like device could change how we track Parkinson’s progression &... (1/3)
With Haoran,
@dina_katabi
&
@MarzyehGhassemi
, we'll present "Change is Hard: A Closer Look at Subpopulation Shift":
⏰ July 27 (Thu), 10:30 am - 12:00 pm HST
📍 Exhibit Hall 1, poster
#414
We then establish a comprehensive benchmark of 20 SOTA algorithms evaluated on 12 real-world datasets in vision, language, and healthcare.
W/ over 10K trained models, we make several intriguing observations:
4/n
This year we're giving out 4 best paper awards
#IMLH
!
2nd place best paper awards:
"An interpretable data augmentation framework for improving generative modeling of synthetic clinical trial data"
by
@afrahshafquat
, Jason Mezey, Mandis Beigi,
@jimeng
, Andy Gao &
@JacobAptekar
SOTA algorithms only improve subgroup robustness on *certain types* of shift (e.g., spurious correlations & class imbalance), but not others (attribute imbalance & generalization)!
5/n
The first paper, published in
@NatureMedicine
, on which I am the first author, demonstrates an AI-based system that can detect PD & its severity just from one's nocturnal breathing signal.
Learn more 👉:
Parkinson's is notoriously difficult to diagnose.
@MIT
#JameelClinic
PI Dina Katabi & her team developed an AI-powered, router-like device that can detect the severity & progression of someone's Parkinson's just from their breathing. MIT News: (1/3)
You can submit (1) long papers up to 8 pages, or (2) extended abstracts up to 4 pages. We provide best paper awards for both types of submissions!
Submission deadline: May 30, 2023.
CFP and submission instructions:
How to find "globally optimal" models that maintain performance & fairness in new domains?
We found that model selection could be crucial for OOD fairness. Choosing models with embeddings that contain the least attribute info could lead to a lower average OOD fairness gap. (5/n)
We have open-sourced our codebase and benchmarks for MDLT + Imbalanced DG. Make sure to check out our
@PyTorch
code, with pre-trained models and datasets available!
Interesting perspective based on Fourier analysis. I would also like to note another spectral view: the *low-rank* property, which is closely related and can also be effectively exploited for adversarial robustness: . (1/2)
This 2019 paper on Fourier analysis of adversarial robustness, by Dong Yin et al., is really worth a look. It gives a simple, intuitive way of understanding a wide variety of adversarial and robustness phenomena.
However, local fairness doesn't transfer under distribution shift. We provide methods to understand and quantify the types and degrees of these shifts.
Models on the Pareto front for ID do not guarantee optimality when deployed in a different OOD setting. (4/n)
Mitigating such shortcuts with debiasing methods like resampling or adversarial training creates **locally optimal** models --- these models consistently achieve high ID fairness without losing notable overall performance for disease prediction. (3/n)
We will present our
#NeurIPS2020
paper today
@NeurIPSConf
(C0-D1, 9pm-11pm PST)! Come to chat with us about improving imbalanced learning with semi-/self-supervision!
Poster session:
Curious about how to learn from imbalanced data arising from multiple domains? Check out our
#ECCV2022
paper! We also observe an intriguing phenomenon: addressing in-domain data imbalance improves out-of-domain generalization.
Thanks
@_akhaliq
for sharing!
We curate benchmark DIR datasets for common real-world tasks in computer vision, natural language processing, and healthcare. They range from single-value prediction such as age, text similarity score, health condition score, to dense-value prediction such as depth.
We explored unfairness through _demographic shortcuts_, and extended the observations that algorithmic encoding of attributes leads to fairness gaps.
We further investigated the degree to which demographic attribute encoding ‘shortcuts’ may impact model fairness. (2/n)
We formulate the problem of Multi-Domain Long-Tailed Recognition (MDLT) as learning from multi-domain imbalanced data, with each domain having its own imbalanced label distribution, and generalizing to a test set that is balanced over all domain-class pairs.
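A minimal sketch of what "balanced over all domain-class pairs" means at evaluation time, assuming we only have an imbalanced test set and reweight it so that every (domain, class) pair counts equally. This is an illustration of the evaluation idea, not the paper's evaluation code; the function name and toy data are hypothetical.

```python
from collections import defaultdict

def domain_class_balanced_accuracy(domains, y_true, y_pred):
    """Mean of per-(domain, class) accuracies, so each pair has equal weight."""
    hits, totals = defaultdict(int), defaultdict(int)
    for d, y, p in zip(domains, y_true, y_pred):
        totals[(d, y)] += 1
        hits[(d, y)] += int(y == p)
    per_pair = {pair: hits[pair] / totals[pair] for pair in totals}
    return sum(per_pair.values()) / len(per_pair), per_pair

# Toy data (hypothetical): class 1 is rare in "photo" and common in "sketch".
domains = ["photo"] * 5 + ["sketch"] * 3
y_true  = [0, 0, 0, 0, 1, 0, 1, 1]
y_pred  = [0, 0, 0, 0, 0, 0, 1, 0]

balanced_acc, per_pair = domain_class_balanced_accuracy(domains, y_true, y_pred)
plain_acc = sum(int(y == p) for y, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain: {plain_acc:.3f}, domain-class balanced: {balanced_acc:.3f}")
```

Plain accuracy is dominated by the majority pair, while the balanced score exposes the failure on the rare (photo, class 1) pair.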
Inspired by this, we design BoDA, a theoretically grounded loss function that tracks the upper-bound of transferability statistics to improve the model performance.
We first propose the domain-class transferability graph, which quantifies the transferability between different domain-class pairs under data imbalance. We show that the transferability graph dictates the performance of imbalanced learning across domains.
Early explorations () mainly focused on relating (lower-ranked) principal components to robustness. Recent advances () also provide certified robustness via exploiting low-rank structures. (2/2)
Existing methods for dealing with data imbalance target only a single domain, that is, they assume all data originates from the same domain. However, natural data can originate from distinct domains, where a minority class in one domain could have abundant instances in other domains.
Interestingly, we also found that addressing in-domain data imbalance improves out-of-domain generalization. Our analysis showed that data imbalance is an intrinsic problem in out-of-distribution generalization, but has so far been overlooked by past work.