Super excited to announce our latest work on modeling mRNA translation, reflecting a partnership between @sanofi and @UTAustin. Longer written description here of our 2 bioRxiv papers on LinkedIn (since I'm minimizing my use of X):
To the Cell editor who rejected >4 years of my work on miRNA target prediction because it didn't "[offer] a substantial advance for researchers who would use the approach": today I celebrate its 5000th citation published in the open access journal @eLife. Cheers 🍻
After 15 years in academia (1/2 my life), I'm off to continue my research in the land of biotech - call me when the incentive structures towards publishing are less perverse, science is fully open, and a healthy work-life balance is possible. Signing out in the meantime.🤘🏾🙂
Why is it that every time I attend an immunology talk the presenter assumes a general audience will know what a CD8+/CD28- T cell and Naive CD4+ T cells are? One of the most annoyingly jargon-filled fields I've ever encountered over 15 years of being a biologist 😱😱😱
In this post, I'd like to open up about one of the most traumatic experiences of my scientific career...one that played a major role in my decision to leave academia. I was triggered again by recent events. Before I begin, I want to reinforce that I accuse nobody of ill-intent...
Five years ago, I embarked on a journey as a bright-eyed postdoc to ask a theoretical question: how predictable are gene expression levels from DNA sequence alone? After ~2 years in review (incl. a world record >10mo to receive 1 round in Cell Reports)...
Super excited to finally release my work with @drklly @calico! mRNA degradation rate is a fundamental property of its metabolism. It has been notoriously difficult to predict mRNA half-life from its sequence, and characterize factors regulating it. 1/4🧵
@DrBenNeel @weldeiry @eLife Why care? Because I was a young impressionable trainee who was trained to think that success (career growth, grant/job opportunities) was tied to CNS pubs. And in 2022 flashy pubs still heavily influence those outcomes.
Reviewing a paper, I learned about something in RNA biology I'd never heard about in my entire life. Vault RNA (vtRNAs)! They form a ribonucleoprotein complex and are highly conserved across phyla. What the heck are these things and why haven't they been studied in 30 years???
Excited to announce our new behemoth of a model: the Enformer! Unified access to gene expression, TF binding, & chrom mark prediction; enhancer-promoter inference; & nc-variant effect prediction - all from a DNA sequence! Great collab with @Avsecz @drklly
Huge announcement! Later this month I'll be starting a new job @sanofi, leading a team to design therapeutic mRNA. I'll be hiring folks skilled in generative deep learning and adjacent fields -- please connect and send your CV. Full description here:
I've been working (longer than I care to admit!) on the challenging problem of predicting mammalian gene expression levels solely from genomic sequence. This work represents the furthest I could manage to get...so far!
Excited to announce one of our first works from team @sanofi: a large language model (LLM) called CodonBERT, trained on mRNA ORFs across 10M sequences from diverse organisms! Excellent collab between my team, Ziv Bar Joseph, Sven Jager, and others!
~2.5 yrs ago @drklly and I began to study the question of what is governing mRNA decay, and the degree to which it can be predicted from mRNA sequence. Extremely excited to announce that our pub is finally online! 1/5
in hopes that it will reform the review system and bring awareness to treat our colleagues with greater empathy by acknowledging their contributions. All of these things can have grave psychological consequences for one's feeling of self-worth, in ways that are easy to overlook.
Excited for the publication of my latest work @calico, joint with Sereno, @drklly, & @JShendure! Single cell data harbors quantitative info often overlooked. We mined it to examine the dynamics of alternative polyadenylation in the entire mouse embryo!
A lot of folks asked me about if/when we're going to release the CodonBERT model from our preprint. Well -- ask no more :) We finally passed legal review and can finally go public with the model: -- feel free to use it, hope it's helpful to your work!
Bittersweet news: After 8 failed fellowship applications in my early postdoc years, I won my first major grant (K99/R00 from NHGRI). Having waited nearly a year to hear back and transitioning jobs, today I finally declined it 🤕 on bright side, not writing another for a while 😬
Interesting that @NatureGenet, who editorially rejected our Enformer work because it wasn't an advance, now publishes 3 papers benchmarking Enformer (because presumably it was considered an advance?)🤔
Tried my best to stay out of the single cell field... But alas, I could no more. Excited to announce our latest work on the severely understudied phenomenon of alternative polyadenylation in scRNA data. Joint work with Sereno*, @drklly, and @JShendure.
Again, neither it nor the new study mentioned my work at all. The Nature paper is fantastic, but it was hard not feeling that the history of this field was trying to bury my role to make the article seem more novel. There is a huge amount of pressure in publishing to show novelty...
Convo I have privately with many postdocs: many are so depressed/anxious about uncertainty of the future, they lack a sense of self-worth. There are brilliant people here in industry eager to snag you up, who care about your skillset rather than publication count. #MentalHealth
Tired of having to dig up and process every MPRA dataset from the literature? Check out our newly released MPRAbase! Please contribute your new data too. Years of hard work to get to finish line by great colleagues Jingjing Z, @IGeoso and @NadavAhituv 🙂
Amazing discussion with international scientists at my poster at #RNA22 in Boulder, CO last night. For those of you for whom it was prohibitive to attend, I'm releasing my poster and a prior recorded video on the subject!
Are you a graduating PhD student looking for postdocs? Interested in gene regulation, genomics, single-cell genomics, deep learning, and/or comp bio? @drklly and I are beginning our search and will co-mentor the successful candidate -- great time to apply!
meeting presentations proving the development of my independent work, they were satisfied with our response and retracted their claim. Anyways, it was an honor to be highlighted by Nat Rev Genetics, though kind of ironic that their subjournals had rejected
and at times, leaving out relevant references sells an idea as more novel. Of course, there are also completely arbitrary limits on references which further disincentivize referencing everyone. I do not believe the above reviews and articles intended to wash my work away.
All are from colleagues I know and respect and have communicated with in the past, and I'm sure they were aware of my work. It's just painful to witness history writing a misleading narrative about the field's development. It's also disappointing working so hard and passionately
Happy to announce our team's latest work! An update of Enformer, now predicts RNAseq density + chromatin marks from DNA seq. Improved variant effect pred for personal genomes and splicing/polyA now incorporated in a unified model! Amazing work led by @jjohlin and @drklly @calico!
@anshulkundaje Thank you. On the bright side, my self-worth (and pressure of grants/career growth on the line) is less tied to high-profile journals than it once was. I'm trying to readjust my brain to work on cool topics, publish it wherever, & hope that it's valuable to society in some way.
I'm feeling most of the bioinformatics coursework I took in college/grad school is now outdated: we were trained in problems that are now largely solved. Many schools still educating new generation in these antiquated tools, and skills aren't that useful (in industry at least)
on a topic, feeling the world ingested my ideas without citing my work, and seeing these deep learning techniques now become more popularized and widespread in their use. I'm proud to see these methods used more widely in top universities & journals, but it's hard not to grow...
To me, #MLCB2023 represents what all conferences should be: showcasing some of the best research in the field and 100% free registration. Grassroots run and no $2K fees as a barrier to inclusivity going to undisclosed causes. Some can attend in-person, some can attend virtually!
who tells me to cite their work if it was relevant to mine. Hope we all grow to uplift one another moving forward, since we are in this game together as curious scientists who want to hopefully benefit the world through our work. I write this thread...
my work before. Eventually, I moved to industry to further extend my research ideas. I worked with excellent colleagues and we published an article in Nature Methods last year making progress on the problem together. Strangely, an article reviewing ours
somewhat cynical about the forces at play that have operated to de-emphasize my role in their development. Anyways, I try my absolute best to cite all relevant literature thoroughly & honestly. I probably make mistakes/oversights too, and welcome anyone ...
didn't cite my original paper, which contributed to us laying the framework for our newer study. A few weeks ago, there was another excellent article in Nature discussing the use of neural networks to understand gene expression.
There is a recent bizarre trend to invite someone famous in bio (Doudna & Lander) to AI summits to speak about AI in bio... How about instead, invite someone who regularly applies AI to biological problems, and pioneered those efforts?
Having interviewed in 8 biotechs in comp bio Bay Area/NYC, I can convincingly say most ask questions irrelevant to the job at hand, and select candidates for knowledge rather than problem-solving skills. But most knowledge can be found on the web in 30 sec and quickly learned...
Biotech market is hypercompetitive rn...we received 160+ applications for 1 MS-level position. Advice: don't hold back on CV at all, many ppl submitted 1 page CV. Terseness is counterproductive here vs those with detailed records...some cultural differences in USA vs abroad
Proud to see my latest publication on fly miRNA targeting is out! This proves: i) you can still publish things from your PhD that you should have 4 years ago, ii) a year in peer review can leave your work substantively unaltered from your bioRxiv preprint
@NatureComms
A very strange study! Income is also strongly associated with being born into wealth, which is in turn associated with the history of ethnic groups who oppressed other ethnic groups...many confounding sociological variables at play to bias the interpretation
to provide a single round of reviews. Month by month, they didn't respond to our numerous inquiries about its status, and finally coughed up that they hardly made progress on finding suitable reviewers. It was a nightmare of a review experience that I wish upon no-one.
Looking forward to Biology of Genomes! Here's a copy of our poster for those who can't attend, on the topic of evaluating alternative polyadenylation events in scRNA-seq data of the developing mouse embryo. Looking forward to meeting/discussing with those who can! #BOG21
the work to 4 journals. Most reviewed it, taking between 4-6 months to do so before rejecting it, each for a different reason than the last journal. Ultimately, we submitted to Cell Reports, transferring our Cell reviews in hopes of speeding things up. It took nearly 10 months..
My letter to the editorial office declining to review for a closed access journal. I am boycotting in hopes it inspires others to follow suit and pressure journals towards OA in the future. Absurd system to volunteer your time towards ultimately paywalled research #OpenAccess #OA
Ironically, I designed the strong baseline myself to demonstrate that the deep learning model was better...it was in itself a big leap in performance over existing models. The paper was ultimately rejected editorially. Over the next 2 years, I resubmitted...
Excited to announce that our computational R&D team @Sanofi has an open position!! We are looking for a recent BS/MS student specialized in the fields of Bioinformatics, Comp Bio, etc, with familiarity w/ NGS workflows and pipeline development.
several reviewers loved it...one even called it a "holy grail for the field". A single reviewer hated it, because they simply hated the idea of using deep learning due to its parameter complexity. Their criticism was that it didn't do much better than a simpler baseline...
Around this time, someone wrote my professor that I plagiarized their work...an extremely serious accusation I don't take lightly. IMO their preprint bore little resemblance to mine, with far worse results numerically. Upon providing them with years of my lab...
Check out our latest work on a property of gene evolution previously overlooked in genome-wide MPRA data! Great collaboration between @JShendure @RickMyers_PhD, & Gregory Cooper's lab.
Some labs are limited in generating data, others in analyzing & modeling data. What if we shared all raw data immediately worldwide upon generation, and analysts could process & benchmark models on data. Much more efficient communal use of tax-payer funded science.
The story begins in 2015. Back then, there were a handful of studies applying deep learning to genomics. The techniques to even work with such packages were extremely immature compared to today. Interested in this area and gene regulation, I embarked on a 3-year voyage...
If I ever want to sadistically torture academics, my plan is to become a journal editor and sit on their manuscripts for 9 months while their aspirations for grants and faculty positions languish into the ether; and all the while they are powerless to hold me accountable 😈😈
to apply these models to the study of transcriptional gene regulation. I could write a treatise about all of the things I tried that failed to bear fruit. I finally deposited my preprint on bioRxiv in 2018. It was in the top 1% of viewed articles. The paper was reviewed by Cell..
The funniest part of reading about scientists hatin' on the new eLife publishing model is how willfully they overlook how deeply broken, inside-and-out, the current publishing model truly is. Literally any new idea is better than the status quo.
Thanks to @n_skene, @sj_marzi, & @UKDRI for inviting me to present on our recent publication on predicting gene expression levels from genomic sequence! Here is a YouTube link of the Zoom recording in case it helps anyone better understand the work!!
Incredible, FREE conference for machine learning in comp bio ongoing, livestreaming via YouTube the whole day. I'll be discussing industry <> academic research as industry rep in panel discussion -- tune in 2p PST/5p EST later today!
Hilarious email to lab from @JShendure: "Hope everyone is enjoying the snow -- I snow-camped for the first time in my life in my backyard last night, and I cannot in good faith recommend it to anyone." #SeattleSnowDay #SeattleFreeze
Translating science talk to real talk:
"Congrats on your paper!"
"Congrats on overcoming the human suffering you endured associated with addressing the largely arbitrary opinions of an editor and 3 coin flippers upon whose desk your work sat collecting dust for 5 months!"
this is the most fun I've ever had at a virtual poster session in my entire life: running around as a Mario-like figure to attend posters interactively. props to the software engineers & designers!! #MLCB2020
Recent trend in AI/genomics/LLM space is just to show a model is bigger. Then the paper finishes w/o rigorous benchmarking to show the bigger model is functionally superior in important downstream applications vs smaller SOTA models. Laughably sad state of field, don't be fooled.
To augment my recent @NatureComms study on investigating alternative polyadenylation in single cell data, I prepped a 30 min YouTube talk as a gentle introduction to the paper. Thank you to Professor @JinWuNam1 for inviting me as a virtual guest speaker!
Philosophical Q to academic depts hiring: is a first author of a 30-author paper (spending $300K+ for data) more impressive than a first author of a 3-author paper in medium-tier journal that used modest resources? In short, are we simply elevating scientists from labs with $$$?
For anyone needing to rapidly process massively parallel reporter assay data, use our new open-source tool MPRAflow, and streamlined lentiMPRA experimental protocol! A collaboration amongst 5 research labs.
lentiMPRA protocol and MPRAflow, a user-friendly tool for MPRA analyses. Great work by Gracie Gordon, @TakaInoue5, @bethkarenmartin, Max Schubach, @vagar112, and many others from Nir Yosef, Jimmie Ye, Katie Pollard, @JShendure, Martin Kircher labs.
@YounesMedkour Biology is so vast that nobody can possibly know so much about every single branch comprehensively to this level of detail...at some point a presentation is just a poor presentation if it doesn't give sufficient background for more general audiences to understand
Looking forward to Biology of Genomes 2019 #BoG19 Please drop by my poster, would love to chat with anyone interested! For those who cannot make it to the conference, here is a free copy of the poster 🙂
We performed a difficult but critical technical comparison of MPRAs to clarify how experimental design choices influence the interpretation of enhancer activity. Excellent collaboration with @jasoncklein, Taka Inoue, Aidan Keith, @NadavAhituv, @JShendure!
Quite a significant finding if true... Would like to see more negative controls though: does everything in the proteome have some degree of residual RNA binding activity if you try to find it?
Delighted that our study on transcription factors binding #RNA is now out in @MolecularCell! We provide evidence that TF-RNA binding is pervasive, important for gene control, development, and disease.
Free paper access here for a limited time!
A 🧵:
Excited to announce a paid 6mo Co-op opportunity for Masters or PhD students in our team (Jun-Dec '23)!! Perfect for bioinformatics/cheminformatics students wanting exposure to industry style research in the RNA therapeutics space. Details here:
The problem with these papers is that they obfuscate the actual concentration of miRNA getting into new cell type. Just because biology encodes rules doesn't mean the rules are useful for any physiologically relevant downstream function....
Just got back the reviews for a paper I helped review. Why in the world did the editor send it to 7 reviewers?!? Felt like a gang of us beat up a helpless victim from all sides... Journals need more transparency about how many reviewers they're sending out to.
Just finished "The Dropout" on Hulu. Saddest part is not much has changed. Bay area biotech investors/VCs still throwing $200 million on grad school dropouts who are selling them a scientifically untenable dream -- one so obviously flawed it's jaw-dropping.
Manuscript V2 released with 2 new figs, 3 new supp tables, & expanded RBP analysis! Several RBP families associated with the regulation of alternative polyadenylation using mouse scRNA-seq data.
This might be the most unexpected study that has ever cited our TargetScan work in the last 5 years. Mindblowing the sorts of questions that people even think to ask about miRNAs! 🤯
Most miserable part of science is the publishing process....There is a part of me that wants to turn my back to the traditional publishing system and release my best research only on biorxiv; the culture needs a radical rethinking from the status quo
@GwyerFindlay To give you additional context, the talk was 1 hr long. The audience was software engineers, molecular & cell biologists, ppl in gene regulation, drug developers, and ppl in aging field. Hard to cater to everyone but if 80% of your audience has never taken an immuno class...
Our preprint, "Reporter gene assays and chromatin-level assays define substantially non-overlapping sets of enhancer sequences" is now online at
What started out as a pandemic project with a few undergrads turned out quite interesting (and disturbing)!
Day 1 of mRNA therapeutics conference in Boston was incredible... mind-blowing tech on the horizon, COVID-19 vaccines were just the beginning! Future seems bright for this nascent field 🙂👍🏾
@aaron_mckenna I don't think I "regret" it per se, just think I'd deprioritize. Big focus was on sequence alignment & Viterbi, RNA folding, HMMs... These are sometimes useful but bioinformatics workflows pretty streamlined now... Would prioritize big data mining/viz, ML & model interpretation
Working in the vaccine industry and getting a severe cold this week, keep hearing jokes of why we haven't invented a vaccine to beat rhinovirus yet...This inspired me to better understand why...The answer is because there are >250 A, B, and C strains whose epidemiology is...
Congrats to the incredible awardees! My job would not exist if it were not for their pioneering work, which laid the groundwork for the entire field to build upon! COVID vaccines were just the beginning of this burgeoning area 😌👍🏾
BREAKING NEWS
The 2023 #NobelPrize in Physiology or Medicine has been awarded to Katalin Karikó and Drew Weissman for their discoveries concerning nucleoside base modifications that enabled the development of effective mRNA vaccines against COVID-19.
Have a few more openings in my teams. If you have a background in ML/DL and domain expertise in either RNA biology or cheminformatics/lipid bio, please DM me for application instructions! Looking to diversify the team, so underrepresented groups especially encouraged to apply!
I learned the hard way how inefficiencies in the system can stifle the release of research for arbitrary reasons, rather than uplift and improve it. More important to me than the paper is the opportunity to release all of my work open source and open access...
My #1 reason for leaving academia: I disdained the culture of constantly feeling judged by scientific output... Publishing was usually a distraction to the deeper, most challenging questions that required years of focus. Things weren't always that way...
One of the things I detested most about academia: how much of a political chess game it had become to pad your resume with papers you contributed nothing towards intellectually 🙄 no enforced standards to judge contributions between labs