![Nadav Brandes Profile](https://pbs.twimg.com/profile_images/1753817574083629056/pwHYvjR9_x96.jpg)
Nadav Brandes
@BrandesNadav
Followers
615
Following
289
Statuses
131
#ComputationalBiology and #AI
New York, USA
Joined June 2016
I apologize for not giving him credit in my original post, but please follow @kirill_vish, the first author of this work.
0
0
0
@i000 It means that the loss will be very noisy. It's difficult to learn from noisy signal but not impossible.
1
0
1
@i000 I agree. But I think it's still possible to train good DNA language models, just more difficult.
1
0
6
@Throrf I don't think it's true. We have hundreds of thousands of species sequenced, and millions of genotyped people. I think we mostly just need to get better at training and evaluating these models.
0
0
1
@felixchin1 Good question! Evo was only trained on prokaryotes. ESM3 is a protein (not DNA) language model. Protein language models are still a much more powerful technology, but I hope it will eventually change.
1
0
11
Thank you Anthony for enriching the discussion! Counterexamples always exist, but from my experience it's rare to have a model that ranks samples well overall while providing no information at the tails of the distribution. I agree that your final metric should match the use case you truly care about, but ROC-AUC is usually the right place to start.
0
0
1