Dustin Tran Profile
Dustin Tran

@dustinvtran

Followers: 40,940 · Following: 657 · Media: 201 · Statuses: 2,545

Research Scientist at Google DeepMind. I lead evaluation at Gemini / Bard. AI, Bayesian statistics, deep learning.

San Francisco, CA
Joined June 2013
Pinned Tweet
@dustinvtran
Dustin Tran
10 months
We are actively hiring in the Bard research team (@quocleix, @HengTze). The team's leadership continues to be transparent and laser-focused.
8
33
276
@dustinvtran
Dustin Tran
1 year
For those keeping track
Tweet media one
@dustinvtran
Dustin Tran
1 year
2023 has already seen more advances in AI than any other year. This velocity will only increase
5
3
85
24
283
1K
@dustinvtran
Dustin Tran
3 years
My favorite oxymoron in machine learning: "empirically proven".
30
71
869
@dustinvtran
Dustin Tran
3 years
I'm so appreciative that ML is at a state where open-source code, freely available conference videos/proceedings, and now even open reviews are becoming the norm. That's not a luxury all research fields have.
12
68
800
@dustinvtran
Dustin Tran
6 years
"Simple, Distributed, and Accelerated Probabilistic Programming". The #NIPS2018 paper for Edward2. Scaling probabilistic programs to 512 TPUv2 cores and 100+ million parameter models.
Tweet media one
Tweet media two
8
214
740
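For flavor, here is a minimal sketch of an Edward2-style probabilistic program; it assumes the open-source `edward2` package and TensorFlow, and is an illustration rather than code from the paper:

```python
# Minimal sketch of an Edward2-style probabilistic program
# (assumes `pip install edward2` and TensorFlow; illustrative only).
import edward2 as ed
import tensorflow as tf

def linear_regression(features):
    # Priors over weights and bias; ed.Normal returns a random variable
    # whose value is a sample from the distribution.
    w = ed.Normal(loc=tf.zeros(features.shape[1]), scale=1., name="w")
    b = ed.Normal(loc=0., scale=1., name="b")
    # Likelihood: Gaussian observation noise around the linear predictor.
    return ed.Normal(loc=tf.linalg.matvec(features, w) + b, scale=0.1, name="y")
```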
@dustinvtran
Dustin Tran
7 years
Videos are now available for the 2017 Deep Learning (and RL) Summer Schools in Montreal
2
298
699
@dustinvtran
Dustin Tran
2 years
Recap of the trendiest conversation topic at NeurIPS in the past 7 years:
2015: Bayes / RL / OpenAI
2016: Deep RL & Autoregressive generative models
2017: "DL is alchemy"
2018: Glow & Neural ODEs
2019: Understanding DL
2020: GPT-3
2021: people had conversations..?
2022: ChatGPT
9
56
579
@dustinvtran
Dustin Tran
7 years
Starting today, I am at Google full-time as a Research Scientist. See everyone in the Bay Area!
34
11
575
@dustinvtran
Dustin Tran
3 years
Speaking of interesting facts from analyzing conferences: Google is 3-10x the size of any other AI research lab. That's not even including DeepMind, which ranks 3rd.
Tweet media one
17
96
551
@dustinvtran
Dustin Tran
3 years
What gripes do you have with LaTeX's defaults, and what do you always add to papers? Here are mine: 1. Cleveref. Don't use "Section \ref{sec:intro}". Use \Cref{sec:intro}. This makes writing less error-prone, and it makes "Section" part of the hyperlink!
6
99
543
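A minimal LaTeX sketch of the tip (cleveref must be loaded after hyperref):

```latex
% Sketch: cleveref must be loaded after hyperref.
\usepackage{hyperref}
\usepackage{cleveref}
% ...
\section{Introduction}\label{sec:intro}
% ...
% Instead of: see Section \ref{sec:intro}
see \Cref{sec:intro}  % renders "Section 1" with the word inside the hyperlink
```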
@dustinvtran
Dustin Tran
6 years
Excited to introduce TensorFlow Probability. Official tools for probabilistic reasoning and statistical analysis in the TF ecosystem.
@TensorFlow
TensorFlow
6 years
Introducing TensorFlow Probability: empowering ML researchers and practitioners to build sophisticated models quickly, leveraging state-of-the-art hardware Read about it on the TensorFlow blog ↓
2
343
746
6
159
470
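As a quick, minimal sketch of the distributions API the announcement describes (not code from the blog post):

```python
# Minimal sketch of the TensorFlow Probability distributions API.
import tensorflow_probability as tfp

tfd = tfp.distributions
dist = tfd.Normal(loc=0., scale=1.)   # a standard normal distribution
samples = dist.sample(5)              # draw 5 samples
log_probs = dist.log_prob(samples)    # log-density evaluated at each sample
```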
@dustinvtran
Dustin Tran
2 years
It’s surprising that in 2022, there remains little movement away from LaTeX toward a new language. It has some of the most unintuitive design and syntax you'll find in a language today.
57
25
463
@dustinvtran
Dustin Tran
4 years
How I spent this weekend: upgrading my battlestation.
Tweet media one
18
5
440
@dustinvtran
Dustin Tran
4 years
Excited to release rank-1 Bayesian neural nets, achieving new SOTA on uncertainty & robustness across ImageNet, CIFAR-10/100, and MIMIC. We do extensive ablations to disentangle BNN choices. @dusenberrymw @Ghassen_ML @JasperSnoek @kat_heller @balajiln et al
Tweet media one
Tweet media two
3
93
419
@dustinvtran
Dustin Tran
5 years
I got into the field by watching . The MLSS videos are all excellent (my favorite is Cambridge 2009 ). For a deeper dive, turn to textbooks. I recommend Bayesian Data Analysis and ML: A Probabilistic Perspective.
7
69
422
@dustinvtran
Dustin Tran
2 years
0 papers at NeurIPS and suddenly 12 publications at NeurIPS 2021 🤔 This may be the most pervasive cheating scandal I've seen in academia
@LeonDerczynski
Leon Derczynski ✍🏻🌹☀️
2 years
machine learning researchers learn to optimise their own best paper rate through collusion and other unregulated mechanisms
Tweet media one
42
205
1K
15
52
413
@dustinvtran
Dustin Tran
6 years
Interested in quickly experimenting with BNNs, GPs, and flows? Check out Bayesian Layers, a simple layer API for designing and scaling up architectures. #NeurIPS2018 Bayesian Deep Learning workshop, happening now.
Tweet media one
11
104
408
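To convey the flavor of a layer-level API for Bayesian neural nets, here is a sketch using TFP's analogous `tfp.layers` as a stand-in; the exact Bayesian Layers interface may differ:

```python
# Sketch of a Bayesian layer API, using tfp.layers as a stand-in
# for the Bayesian Layers interface described in the tweet.
import tensorflow as tf
import tensorflow_probability as tfp

model = tf.keras.Sequential([
    # Weights are distributions; each forward pass draws a
    # reparameterized sample of them.
    tfp.layers.DenseReparameterization(64, activation="relu"),
    tfp.layers.DenseReparameterization(10),
])
```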
@dustinvtran
Dustin Tran
7 years
Syllabus for my qualifying exam. It involves 29 papers representing the state of the art in Bayesian deep learning
13
107
414
@dustinvtran
Dustin Tran
7 years
Excited to be at Google for the rest of this year. Aside from basic ML research, expect Edward to officially merge into @tensorflow (contrib).
13
66
404
@dustinvtran
Dustin Tran
2 years
Just an appreciation tweet for the normalization of arXiv and freely available conference papers in CS. I'm trying to read science papers in Cog Sci and Psychology, and it's a nightmare to access.
13
20
401
@dustinvtran
Dustin Tran
2 years
In our work "Plex", we propose a framework for reliability in AI. We also introduce new models (ViT-Plex & T5-Plex) for reliable decision-making across a broad array of scenarios. Blog: Paper: Code:
12
77
378
@dustinvtran
Dustin Tran
4 years
Tomorrow @latentjasper @balajiln and I present a #NeurIPS2020 tutorial on "Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning". Whether you're new to the area or an expert, there is critically useful info! 8-10:30a PT
8
51
378
@dustinvtran
Dustin Tran
3 years
It's 2021, and we're still debugging functions by manually checking Tensor/np.ndarray shapes. Why aren't type systems for array dimensions a common standard yet?
21
21
348
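In the absence of shape types, one stopgap is asserting shapes at function boundaries. A purely illustrative sketch (the function and names are hypothetical, not from any library):

```python
# Illustrative stopgap: manual shape assertions at function boundaries.
import numpy as np

def attention_scores(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    assert q.ndim == 2 and k.ndim == 2, "expected rank-2 (seq, dim) arrays"
    assert q.shape[1] == k.shape[1], f"dim mismatch: {q.shape} vs {k.shape}"
    return q @ k.T  # shape: (seq_q, seq_k)
```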
@dustinvtran
Dustin Tran
7 years
Think in function space, not parameter space. @yeewhye 's talk on Bayesian deep learning at #NIPS2017
Tweet media one
3
90
322
@dustinvtran
Dustin Tran
8 years
Rajesh Ranganath, Dave Blei and I released "Deep and Hierarchical Implicit Models" on arXiv
Tweet media one
Tweet media two
3
137
322
@dustinvtran
Dustin Tran
5 years
daily reminder about jupyter notebooks
Tweet media one
Tweet media two
9
45
322
@dustinvtran
Dustin Tran
2 years
Wow, they finally did it. You can now render LaTeX equations in Markdown, all with MathJax under the hood.
5
46
315
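For illustration, a Markdown snippet that now renders (standard MathJax delimiters; the formulas are arbitrary examples):

```markdown
Inline math: $\log p(x) \ge \mathbb{E}_{q(z)}[\log p(x, z) - \log q(z)]$

Display math:

$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}$$
```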
@dustinvtran
Dustin Tran
2 years
One thing I find fascinating is that Parti is another data point suggesting that the key to large models is not diffusion, GANs, contrastive training, autoregressivity, or other more complex methods. What matters most is scale.
@JeffDean
Jeff Dean (@🏡)
2 years
"A photo of the back of a wombat wearing a backpack and holding a walking stick. It is next to a waterfall and is staring at a distant mountain." #parti
Tweet media one
14
167
1K
9
24
306
@dustinvtran
Dustin Tran
6 years
How do we specify priors for Bayesian neural networks? Check out our work on Noise Contrastive Priors at the ICML Deep Generative Models workshop, 11:40am+. @danijarh, @alexirpan, Timothy Lillicrap, James Davidson
Tweet media one
4
82
293
@dustinvtran
Dustin Tran
7 years
"A Research to Engineering Workflow". An outline of how I personally learn and do basic research.
3
80
289
@dustinvtran
Dustin Tran
4 years
Snippet 1 from the #NeurIPS2020 tutorial: @balajiln What do we mean by uncertainty and out-of-distribution robustness?
Tweet media one
Tweet media two
3
69
284
@dustinvtran
Dustin Tran
7 years
Excited to be joining @OpenAI today. (I am on leave from Columbia for the rest of this year.) Also: shout out to those in the Bay Area!
19
13
288
@dustinvtran
Dustin Tran
6 years
Talks for the Probabilistic Programming conference #PROBPROG2018 are now available! Includes, e.g., Zoubin Ghahramani, @djsyclik, Dave Blei, @roydanroy, @tom_rainforth, @migorinova, Stuart Russell, Josh Tenenbaum, and many others.
1
84
282
@dustinvtran
Dustin Tran
1 year
Yann is wrong that the issue is in autoregressive generation. In fact, you can make an autoregressive model generate a full sequence and refine through inverse CDF-like tricks. The result is exactly the same. 1/3
@ylecun
Yann LeCun
1 year
I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes. Here is the argument: Let e be the probability that any generated token exits the tree of "correct" answers. Then the probability that an answer of length n is correct is (1-e)^n 1/
Tweet media one
218
537
3K
12
23
271
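For concreteness, the arithmetic in the quoted argument is easy to tabulate (a quick sketch; the values of e and n are arbitrary):

```python
# Tabulate (1 - e)**n from the quoted argument: a per-token "exit"
# probability e compounds over an n-token answer.
for e in (0.001, 0.01, 0.05):
    for n in (10, 100, 1000):
        print(f"e={e:<6} n={n:<5} P(correct) = {(1 - e) ** n:.4f}")
```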
@dustinvtran
Dustin Tran
6 years
Highly recommend this augmentation, infrastructure, and human-centric perspective on "AI" by Mike Jordan
Tweet media one
6
70
261
@dustinvtran
Dustin Tran
2 years
Official Bard announcement! Team has been hard at work (myself humbly included). Excited to release and share more details soon.
@sundarpichai
Sundar Pichai
2 years
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational #GoogleAI service powered by LaMDA.
739
3K
15K
13
8
253
@dustinvtran
Dustin Tran
5 years
Check out Discrete Flows, a simple way to build flexible discrete distributions. @keyonV @kumarkagrawal @poolio @laurent_dinh Our poster's at the Generative Models for Structured Data workshop! Room R02, today 3:15p #iclr2019
Tweet media one
2
48
246
@dustinvtran
Dustin Tran
6 years
PyMC4 announces it will be based on @TensorFlow and TensorFlow Probability. This is exciting news for consolidating open-source efforts in machine learning!
@twiecki
Thomas Wiecki
6 years
Big announcement on #PyMC4 (it will be based on #TensorFlow probability) as well as #PyMC3 (we will take over #Theano maintenance)
8
122
354
2
69
239
@dustinvtran
Dustin Tran
2 years
I'm against GPT-4chan's unrestricted deployment. However, a condemnation letter against a single independent researcher smells of unnecessary pitchfork behavior. Surely there are more civil and actionable approaches. I'd love to hear what steps were taken leading up to this
@percyliang
Percy Liang
2 years
There are legitimate and scientifically valuable reasons to train a language model on toxic text, but the deployment of GPT-4chan lacks them. AI researchers: please look at this statement and see what you think:
73
138
502
17
10
233
@dustinvtran
Dustin Tran
5 years
With all the new ML frameworks lately and the short lifespan of old ones, I sometimes wonder if we'd all be better off if we had just stuck with Theano.
16
17
221
@dustinvtran
Dustin Tran
7 years
"The shift from AI being a research domain to it increasingly becoming a research + engineering domain, is a strong signal that we're not in a bubble this time." +100. Systems research advances science and is immediately useful :)
3
70
218
@dustinvtran
Dustin Tran
6 years
"Bayesian Layers: A Module for Neural Network Uncertainty" on arXiv: . With @dusenberrymw , @markvanderwilk , @danijarh .
@dustinvtran
Dustin Tran
6 years
Interested in quickly experimenting with BNNs, GPs, and flows? Check out Bayesian Layers, a simple layer API for designing and scaling up architectures. #NeurIPS2018 Bayesian Deep Learning workshop, happening now.
Tweet media one
11
104
408
2
61
207
@dustinvtran
Dustin Tran
7 years
TensorFlow Distributions. For researchers: learn about all its features. For PPL designers: learn how DL applications lead to new designs.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
73
203
@dustinvtran
Dustin Tran
3 years
There are three types of researchers: 1. those that only look at the paper's methods and ideas; 2. those that only look at experiments; and 3. those that no longer read papers.
8
11
207
@dustinvtran
Dustin Tran
4 years
I love how LaTeX font size names are so arbitrary. small, normalsize, large, Large, LARGE.
13
4
205
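For reference, the full ladder of size commands (a minimal LaTeX sketch):

```latex
% LaTeX's font size commands, smallest to largest:
{\tiny tiny} {\scriptsize scriptsize} {\footnotesize footnotesize}
{\small small} {\normalsize normalsize} {\large large}
{\Large Large} {\LARGE LARGE} {\huge huge} {\Huge Huge}
```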
@dustinvtran
Dustin Tran
5 years
See the arXiv version at . It includes character-level results (1.38 bpc on PTB; 1.23 bpc on text8) with a RealNVP-like flow model that's 100-1000x faster at generation than state-of-the-art autoregressive baselines.
@dustinvtran
Dustin Tran
5 years
Check out Discrete Flows, a simple way to build flexible discrete distributions. @keyonV @kumarkagrawal @poolio @laurent_dinh Our poster's at the Generative Models for Structured Data workshop! Room R02, today 3:15p #iclr2019
Tweet media one
2
48
246
4
61
200
@dustinvtran
Dustin Tran
4 years
Thanks for the opportunity to give a tutorial on "Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning"! With @latentjasper and @balajiln . Look forward to it. :-)
@DaniCMBelg
Danielle Belgrave
4 years
Thanks to everyone who submitted tutorial proposals for this year's @NeurIPSConf . It was amazing to see all the great work that went in to putting together these proposals. We are happy to announce this year's tutorials at #neurips2020
3
92
321
3
17
199
@dustinvtran
Dustin Tran
7 years
Presenting "Why Aren't You Using Probabilistic Programming?" tomorrow. 8:05-8:30 at Hall C #NIPS2017
Tweet media one
5
46
197
@dustinvtran
Dustin Tran
3 years
Every time I use matplotlib, I'm confused about whether to use set_xticks, set_xticklabels, or xticks. Why pyplot and the object-oriented API are inconsistent, and why matplotlib even supports two ways to do the same thing, is beyond me.
17
2
195
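A minimal sketch contrasting the two APIs; both produce the same ticks, only the entry point differs:

```python
# The two matplotlib APIs for the same task.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Object-oriented API: methods on the Axes object.
ax.set_xticks([0, 1, 2])
ax.set_xticklabels(["zero", "one", "two"])

# pyplot state-machine API: one call sets ticks and labels together.
plt.xticks([0, 1, 2], ["zero", "one", "two"])
```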
@dustinvtran
Dustin Tran
2 years
Interesting to observe that over the years, the cost and headcount involved in producing an AI research paper are probably closer to those of a film production than of a novel.
6
13
187
@dustinvtran
Dustin Tran
7 years
"Google and Others Are Building AI Systems That Doubt Themselves" by @willknight Edward, Pyro, and prob programming
7
79
187
@dustinvtran
Dustin Tran
6 years
Interested in causal models? Check out our work applying them to genomics. #ICLR2018 4:30-6:30p today @ East Meeting ( #9 ). With Dave Blei @blei_lab
Tweet media one
5
49
188
@dustinvtran
Dustin Tran
7 years
TensorFlow v1.3.0, including first official release of the `tf.distributions` library
6
65
185
@dustinvtran
Dustin Tran
5 years
Not so controversial take: I don’t think anyone seriously believes doing exact Bayes on a network tuned for SGD and not tuning the prior (or likelihood) will actually lead to better predictive results.
4
25
176
@dustinvtran
Dustin Tran
7 years
Theano stops development. Thanks for all the amazing, innovative work!
0
102
170
@dustinvtran
Dustin Tran
3 years
Check out the @Entropy_MDPI Special Issue on Probabilistic Methods for Deep Learning, w/ @eric_nalisnick. Submission deadline: October 1. It's a timely venue if you're looking to publish, say, a summer research project or a previous conference submission.
3
23
172
@dustinvtran
Dustin Tran
2 years
Spicy take against their take: This is the classic resistance whenever there's a paradigm shift in the field. The air in the room is the same as in 2013-14 when deep learning was on the rise.
@LeonDerczynski
Leon Derczynski ✍🏻🌹☀️
2 years
'I don't really trust papers out of "Top Labs" anymore'
Tweet media one
47
554
4K
10
6
170
@dustinvtran
Dustin Tran
4 years
Reviewers have fragile egos. Successful rebuttals are not only about who’s right but also about diplomacy.
6
5
170
@dustinvtran
Dustin Tran
2 years
@docmilanfar I plugged it into YOLOv3. not as bad as I thought
Tweet media one
5
6
164
@dustinvtran
Dustin Tran
7 years
Why probabilistic generative models? A great, concise description. (Found by stalking @DavidDuvenaud's courses. :)
Tweet media one
2
86
162
@dustinvtran
Dustin Tran
7 years
Come check out our poster on "Deep Probabilistic Programming" today at 10:30-12:30p, C3. #ICLR2017
Tweet media one
2
42
156
@dustinvtran
Dustin Tran
7 years
Thanks @lawrennd for inviting me to GPSS! My slides on "Probabilistic Programming with GPs":
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
38
151
@dustinvtran
Dustin Tran
6 years
The Chambers Statistical Software Award at #JSM2018 was graciously given to Edward. Check out the talk on Monday (10:30am). Will also be at the evening mixer—reach out if you're around!
9
18
153
@dustinvtran
Dustin Tran
7 years
Finally had time to watch this insightful talk on "AI impact on jobs" by Michael Osborne. Highly recommend it.
3
29
147
@dustinvtran
Dustin Tran
7 years
Tutorial on “Deep Probabilistic Programming: TensorFlow Distributions and Edward” w/ Rif Saurous, 2pm #POPL2018
2
36
143
@dustinvtran
Dustin Tran
7 years
"Deep Probabilistic Programming", at @iclr2017 . And a companion webpage to follow the code
Tweet media one
Tweet media two
1
54
143
@dustinvtran
Dustin Tran
8 years
“The Algorithms Behind Probabilistic Programming.” Great description of Bayesian inf, NUTS, ADVI by @FastForwardLabs
1
68
140
@dustinvtran
Dustin Tran
5 years
Check out our work analyzing (non)autoregressive models for NMT! Nonautoregressive latent variable models can achieve higher likelihood (lower perplexity) than autoregressive models.
@jaseleephd
Jason Lee
5 years
“On the Discrepancy between Density Estimation and Sequence Generation” Seq2seq models are optimized w.r.t log-likelihood. We investigate the correlation btw. LL and generation quality on machine translation. w/ @dustinvtran , @orf_bnw , @kchonyc . (1/n)
6
74
268
2
18
137
@dustinvtran
Dustin Tran
4 years
Check out our Hyperparameter Ensembles, #NeurIPS2020 camera-ready at . Hyper-deep ensembles expand on random init diversity by integrating over a larger space of hparams. Hyper-batch ensembles expand on efficient methods. @flwenz @RJenatton @latentjasper
Tweet media one
Tweet media two
4
30
138
@dustinvtran
Dustin Tran
7 years
Edward, now with Jupyter notebooks
1
44
136
@dustinvtran
Dustin Tran
7 years
67 accepted papers at #NIPS2017 Approximate Inference workshop. Titles at . Thanks to PCs!
4
42
137
@dustinvtran
Dustin Tran
5 years
I agree with this post about the TensorFlow user experience. Here's my response regarding the unfortunate lack of success in specifically getting Bayesian neural networks to work.
5
11
135
@dustinvtran
Dustin Tran
7 years
Our ADVI journal paper has been published. Available in Stan and PyMC3; partially in Edward and WebPPL.
Tweet media one
Tweet media two
0
42
126
@dustinvtran
Dustin Tran
2 years
@AlexGDimakis I actually argue one of deep learning's biggest flaws is the lack of modularity. Imagine a system where changing one component affects every other component. This is "end-to-end learning": an engineering nightmare of leaky abstractions that we're somehow OK with in modern ML.
7
9
126
@dustinvtran
Dustin Tran
5 years
We released the ICLR paper! BatchEnsemble includes SOTA on efficient lifelong learning across splitCIFAR and splitImageNet, improved accuracy+uncertainty across CIFAR and contextual bandits, WMT, and diversity analysis. Led by Yeming Wen and w/ Jimmy Ba.
@dustinvtran
Dustin Tran
5 years
Check out BatchEnsemble: Efficient Ensembling with Rank 1 Perturbations at the #NeurIPS2019 Bayesian DL workshop. Better accuracies and uncertainty than dropout and competitive with ensembles across a wide range of tasks. 1/-
Tweet media one
1
14
109
4
23
124
@dustinvtran
Dustin Tran
7 years
Excellent recorded talks from Cognitive Computational Neuroscience 2017. (Thanks @skrish_13 for pointing me to it.)
2
29
120
@dustinvtran
Dustin Tran
7 years
A key idea for merging probabilistic programming with deep learning: conditional independence. From Vikash Mansinghka's #NIPS2017 tutorial.
Tweet media one
2
21
120
@dustinvtran
Dustin Tran
7 years
My comments on "Zhusuan: A Library for Bayesian Deep Learning"
3
31
121
@dustinvtran
Dustin Tran
6 years
Check out the Image Transformer by @nikiparmar09, @ashVaswani, others, and me. Talk at 3:20p @ Victoria (Deep Learning). Visit our poster at 6:15-9:00p @ Hall B #217!
Tweet media one
0
31
119
@dustinvtran
Dustin Tran
7 years
Interesting work from Uber AI Labs. Probabilistic programming in PyTorch, led by Noah Goodman, Eli Bingham, and others.
@Tkaraletsos
Theofanis Karaletsos
7 years
Like Bayesian Inference and #pytorch ? Try our PPL, Pyro.
2
68
191
3
37
120
@dustinvtran
Dustin Tran
7 years
I think the more we learn about neural nets, the clearer it becomes that labels such as "uninterpretable", "data inefficient", or "noncausal" are wrong.
7
28
119
@dustinvtran
Dustin Tran
7 years
Thanks @yaringal @andrewgwils @ChrLouizos for organizing! Slides available at
@dustinvtran
Dustin Tran
7 years
Presenting "Why Aren't You Using Probabilistic Programming?" tomorrow. 8:05-8:30 at Hall C #NIPS2017
Tweet media one
5
46
197
0
34
117
@dustinvtran
Dustin Tran
7 years
TensorFlow Dataset API (). A cleaner, more streamlined input pipeline.
0
38
118
@dustinvtran
Dustin Tran
5 years
Highly recommend Emti's "Deep Learning with Bayesian Principles" tutorial. Emti has some unique perspectives on Bayesian analysis from optimization to structured inference.
@EmtiyazKhan
Emtiyaz Khan
5 years
Excited for the tutorial tomorrow (Dec 9) at 9am at #NeurIPS2019. If you are at the conference and would like to chat, please send me an email (also, if you are interested in a post-doc position in our group in Tokyo).
6
22
141
2
14
118
@dustinvtran
Dustin Tran
8 years
Recorded talks & panels are available for the NIPS 2016 Workshop on Advances in Approximate Bayesian Inference
2
48
117
@dustinvtran
Dustin Tran
7 years
An excellent intro to normalizing flows by @ericjang11 —for density estimation, variational inference, and RL.
@ericjang11
Eric Jang
7 years
I finally learned what a determinant was and wrote a blog post on it. Check out this 2-part tutorial on Normalizing Flows!
Tweet media one
5
172
610
0
24
115
@dustinvtran
Dustin Tran
6 years
"Formulating [RL] as inference provides a number of other appealing tools: a natural exploration strategy based on entropy maximization, effective tools for inverse reinforcement learning, and the ability to deploy powerful approximate inference algorithms to solve RL problems."
@svlevine
Sergey Levine
6 years
If you want to know how probabilistic inference can be tied to optimal control, I just put up a new tutorial on control as inference: This expands on the control as inference lecture in my class:
6
280
804
0
18
114
@dustinvtran
Dustin Tran
7 years
See our #NIPS2017 poster tonight on implicit models + variational inference, with Rajesh Ranganath, Dave Blei. #179
Tweet media one
1
21
112
@dustinvtran
Dustin Tran
8 years
Slides for the #NIPS2016 tutorial on Variational Inference are now online! By Dave Blei, @shakir_za, Rajesh Ranganath.
1
56
113
@dustinvtran
Dustin Tran
3 years
@DavidDuvenaud I'm in favor of the terms "model uncertainty" and "data uncertainty". @balajiln @latentjasper and I also used these terms in our #NeurIPS2020 tutorial. Model vs. data makes it extremely clear where in the pipeline the uncertainty appears. (Ir)reducibility (aleatoric/epistemic) is too subtle.
7
7
113
@dustinvtran
Dustin Tran
3 years
Virtual conferences are too convenient. You shouldn't be able to select the exact presentations you want to watch and when to watch them. Physical constraints—location, time zone, room size, walking—are all crucial elements of the experience.
8
3
111