Stop by our "Chain of Log-Concave Markov Chains" poster this afternoon at #ICLR2024! We present a sampling algorithm that generalizes walk-jump sampling. Wed 4:30pm, poster #148.
Paper alert🚨 Proteins are my new dark energy, but my latest paper employs hierarchical Bayesian inference, importance sampling, deep sets – all methods close to my heart! 🧵
Paper alert🚨 We apply latent stochastic differential equations (SDEs) to model the time variability of extreme astronomical objects called active galactic nuclei (AGN). Check out our spotlight at #ICLR2023 #Physics4ML! 🧵
We are excited to present PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design! 🧵
If you're at #SIAMUQ24, come to my talk on BOtied 🎀, a new CDF-based acquisition function for multi-objective BayesOpt (feat. copulas, tails) in MS226 (Fri 5-7pm)! Joint work with @tagasovska, @stephenrra, and @kchonyc.
You will join me, @m_kirchmeyer, and the @PrescientDesign team at Genentech in modeling dynamical systems with ML for drug discovery. We welcome diverse approaches, including those using causality, new solvers, or LLMs! 2/5
With this paper, we release the open-source @LSSTDESC Python package Node to Joy. It implements the structure-enhanced raytracing, the training and evaluation of the BGNN, the comparison with number-counts matching, and the hierarchical inference analysis.
By encapsulating our experimental and biological priors on the relationship between molecular properties, our method promises to accelerate computational drug discovery. ⏩💊
Check out my new paper, "Large-Scale Gravitational Lens Modeling with Bayesian Neural Networks for Accurate and Precise Inference of the Hubble Constant" (arXiv:2012.00042).
Being so luminous, AGNs can be observed out to great distances, close to the edge of the observable Universe. By inferring black hole physics from AGN light, we can gain insights about the origin and evolution of the Universe, including the nature of dark energy and dark matter.
We run multi-objective BO with PropertyDAG over multiple simulated active learning iterations on a variety of tasks, including an antibody design task based on a real-world dataset of expression and affinity measurements.
We modify the latent SDE (~an infinite-dimensional VAE with an SDE-induced process as the latent) to jointly (1) reconstruct the multivariate AGN time series from limited observations and (2) infer the posterior PDF over key black hole properties.
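To make the two-head design concrete, here is a purely conceptual numpy sketch (all shapes, weights, and dynamics are made up for illustration; this is not the actual model): an encoded initial state evolves under a latent SDE via Euler-Maruyama, one head reconstructs the light curve from the latent path, and another maps a path summary to black-hole parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_latent_sde_forward(obs, theta_dim=2, latent_dim=4, dt=0.1):
    """Conceptual forward pass mirroring the two heads described above:
    (1) a decoder reconstructing the time series from a latent SDE path,
    (2) a head mapping the path summary to black-hole parameters.
    All weights, shapes, and dynamics here are illustrative stand-ins."""
    n = len(obs)
    # 'encoder': crude context vector from the observations
    z0 = np.tanh(obs.mean()) * np.ones(latent_dim)
    # latent SDE path via Euler-Maruyama with a stand-in drift
    path = np.empty((n, latent_dim))
    path[0] = z0
    for t in range(1, n):
        drift = -0.5 * path[t - 1]
        path[t] = path[t - 1] + drift * dt + 0.1 * np.sqrt(dt) * rng.standard_normal(latent_dim)
    recon = path.sum(axis=1)                          # head (1): reconstruction
    theta = np.tanh(path.mean(axis=0))[:theta_dim]    # head (2): parameter estimate
    return recon, theta

obs = rng.standard_normal(50)
recon, theta = toy_latent_sde_forward(obs)
print(recon.shape, theta.shape)  # (50,) (2,)
```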
Bayesian optimization (BO) is a sample-efficient framework for navigating the exploration-exploitation trade-off in the vast design space of biological sequences. PropertyDAG operates on top of the traditional multi-objective BO to confer hierarchy to the objectives.
We recover the true input H0 at 0.7% precision from our test set of 200 simulated lenses. Our pipeline is implemented in the open-source Python package "H0rton":
Traditionally, the external convergence has been estimated by reducing the whole sightline of data to some summary statistics (such as galaxy number counts N) and matching them to a simulation with known convergence. BGNNs can instead take in all available info. No info wasted!
This is a 3-month full-time internship with a substantial ML research component. We will provide you with the data, computing resources, and mentorship to develop algorithms aimed at solving complex biological problems. 4/5
On realistic, physics-driven simulations of AGN time series, we outperform the Gaussian process baseline in reconstruction and can also precisely infer the black hole properties.
To impose a desired partial ordering on the objectives, e.g. expression --> affinity, our framework modifies the posterior inference procedure within standard BO in two ways.
For instance, when designing antibodies, we'd like to maximize the binding affinity to a target antigen only if expression happens in live cell culture -- modeling the experimental dependency in which affinity can only be measured for antibodies that express in viable quantities.
We propose the Bayesian graph neural network (BGNN), a deep set that estimates the posterior on convergence by updating local embeddings (modeling individual galaxies, whose number can vary) and a global embedding (modeling the whole line of sight) in a series of residual blocks.
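For intuition on the architecture, here is a toy numpy sketch of a deep-set forward pass (my own illustrative code, not the BGNN implementation; all weights and shapes are made up): per-galaxy local embeddings are updated alongside a permutation-invariant global embedding in a series of residual blocks, so the output does not depend on the ordering of the galaxies.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(x, 0.0)

def deep_set_forward(galaxies, W_local, W_global, n_blocks=3):
    """Toy deep-set pass: local (per-galaxy) embeddings and a global
    (whole-sightline) embedding are jointly refined in residual blocks.
    Mean pooling keeps the output permutation-invariant."""
    h = galaxies @ W_local                        # (n_gal, d) local embeddings
    g = h.mean(axis=0)                            # (d,) global embedding
    for _ in range(n_blocks):
        h = h + relu(h + g)                       # residual local update, conditioned on global
        g = g + relu(h.mean(axis=0) @ W_global)   # residual global update from pooled locals
    return g                                      # summary of the whole line of sight

# A sightline with a variable number of galaxies, each with 4 catalog features
gal = rng.standard_normal((7, 4))
W_l = rng.standard_normal((4, 8))
W_g = rng.standard_normal((8, 8))
out = deep_set_forward(gal, W_l, W_g)
print(out.shape)  # (8,)
```

Because pooling is a mean, shuffling the galaxy rows leaves the output unchanged, which is the key property for sets of variable size.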
Second, before samples from the posterior distribution inferred by the surrogate model enter the multi-objective acquisition function (like EHVI), we transform the samples such that they conform to the specified partial ordering of properties.
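For readers who want the mechanics, a minimal sketch of such a transform (my own illustrative code, not the PropertyDAG implementation) for the two-node DAG expression --> affinity: wherever a posterior sample predicts zero expression, the corresponding affinity sample is forced to zero before entering the acquisition function.

```python
import numpy as np

def enforce_partial_order(expr_samples, aff_samples):
    """Force the partial ordering expression -> affinity on posterior
    samples: wherever expression is zero, affinity is zeroed too.
    Illustrative two-node case; the real method handles arbitrary DAGs."""
    return np.where(expr_samples > 0.0, aff_samples, 0.0)

expr = np.array([0.0, 1.2, 0.0, 0.8])
aff = np.array([5.0, 3.1, 2.2, 4.4])
print(enforce_partial_order(expr, aff))  # -> [0.  3.1 0.  4.4]
```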
In the mood for a quick weekend read that packs a punch?📚👊 Check out my translation of a very socially relevant Korean short story, "Blind Spot" by Im Sol-A, published online in the latest issue of Nabillera.
AGN variability is believed to be stochastic and commonly modeled as a Gaussian process with a predefined kernel. Latent SDEs (Li et al 2020) offer a more flexible model of the stochastic dynamics.
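The standard predefined-kernel baseline is the damped random walk, i.e. an Ornstein-Uhlenbeck process. A minimal Euler-Maruyama simulation (parameter values made up for illustration) shows what such a light curve looks like:

```python
import numpy as np

def simulate_drw(n_steps=1000, dt=1.0, tau=100.0, sigma=0.2, mean_mag=19.0, seed=0):
    """Euler-Maruyama simulation of a damped random walk (an
    Ornstein-Uhlenbeck process), the classic predefined-kernel GP
    model of AGN light curves. Parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = mean_mag
    for t in range(1, n_steps):
        drift = -(x[t - 1] - mean_mag) / tau  # pull back toward the mean magnitude
        x[t] = x[t - 1] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

lc = simulate_drw()
print(lc.shape)  # (1000,)
```

A latent SDE replaces the fixed drift and diffusion above with learned functions, which is what buys the extra flexibility.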
We then combine the convergence constraints on individual sightlines from the BGNN in a hierarchical Bayesian model via importance weighting to obtain constraints on population-level hyperparameters of the test set.
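The importance-weighting step can be sketched on synthetic data (my own toy code, not the paper's pipeline; all distributions and numbers are made up): per-lens posterior samples obtained under a broad interim prior are reweighted by the ratio of a candidate population prior to that interim prior, and the weights are Monte Carlo-averaged per lens.

```python
import numpy as np

def log_norm_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def population_log_likelihood(mu, sigma, posteriors, interim_mu, interim_sigma):
    """Importance-weighted hierarchical log-likelihood: per-lens posterior
    samples (drawn under an interim prior) are reweighted by the ratio of
    the candidate population prior to the interim prior."""
    logL = 0.0
    for samples in posteriors:  # one array of convergence samples per lens
        log_w = log_norm_pdf(samples, mu, sigma) - log_norm_pdf(samples, interim_mu, interim_sigma)
        logL += np.log(np.mean(np.exp(log_w)))
    return logL

rng = np.random.default_rng(0)
true_kappas = rng.normal(0.05, 0.02, size=50)  # synthetic population of convergences
posteriors = [k + rng.normal(0.0, 0.01, size=200) for k in true_kappas]
logL_true = population_log_likelihood(0.05, 0.03, posteriors, 0.0, 0.1)
logL_off = population_log_likelihood(0.30, 0.03, posteriors, 0.0, 0.1)
print(logL_true > logL_off)  # True: the correct population mean is favored
```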
What is convergence? Say we observe a galaxy in the sky. It may appear to us distorted in shape or magnified/de-magnified in brightness due to gravitational lensing by foreground masses, both luminous and dark. Convergence captures the degree of magnification.
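For the quantitative picture (standard single-plane lensing, not spelled out in the thread): the magnification of a point source is set by the convergence kappa and the shear gamma.

```python
def magnification(kappa, gamma):
    """Point-source magnification in single-plane lensing:
    mu = 1 / ((1 - kappa)^2 - gamma^2). Standard formula, for intuition."""
    return 1.0 / ((1.0 - kappa) ** 2 - gamma ** 2)

print(magnification(0.0, 0.0))   # 1.0 -> no lensing, no (de)magnification
print(magnification(0.1, 0.05))  # > 1 -> mildly magnified
```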
We seek to robustly infer the weak lensing convergence from photometric catalog information (e.g., measured positions, colors) of galaxies along a line of sight.
It demonstrates that lens modeling with Bayesian neural networks (BNNs) is accurate and efficient enough to enable unbiased recovery of H0 over hundreds (!) of lenses.
Supermassive black holes reside at the centers of most galaxies, feeding on diffuse matter around them. Radiation from matter falling into their gravitational pull makes these galactic centers, or AGNs, some of the most luminous in the Universe.
In all tasks, PropertyDAG-BO identifies significantly more designs that are jointly positive (exceeding a chosen threshold in a given property and all its ancestral properties, like "expressing binders") than does standard BO.
The book "Foundations of Probabilistic Programming" has now been published, edited by @alexandra8silva, Barthe, and Katoen, and including the chapter I co-authored with @schrijvers_tom, Carroll Morgan, and Annabelle McIver. Best of all, it's Open Access!
Upcoming large-sky telescope surveys herald an unprecedented increase in AGN data volume. Our method -- equipped for multiple bands, long gaps of missing data, non-uniform sampling, and random/systematic noise -- promises to accelerate cosmology and AGN physics.
The BGNN extracts more convergence signal compared to matching number counts, particularly in the high- and low-convergence tails of the training set with sparse samples. Note that number counts matching is limited by sample variance.
The variability patterns of an AGN's light can reveal important information about the physical properties of the underlying black hole, such as its mass and age.
Why do we care about it? Reconstructing a map of convergence across the sky can inform the relationship between luminous matter (which we see in starlight) and dark matter (which we don't see but convergence captures), known as the galaxy-halo connection.
Bonus: We use raytracing in Lenstronomy to enhance the lensing scales of large-scale simulations. Starting with convergence computed on a coarse ~1' grid, we introduce finer fluctuations on galaxy lensing scales of ~1" to generate our training set.
The BNN method of H0 inference only takes ~10 minutes per lens, compared to ~3 hours for the traditional forward modeling method! It promises to become a key tool in exploring ensemble-level systematics in lens modeling for H0 inference ...
First, we model each objective as zero-inflated, i.e. as a mixture of zeros and a continuous distribution of non-zero values. The surrogate model consists of a binary classifier and a regressor inferring the zeros and non-zeros, respectively.
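A toy stand-in for that posterior predictive (my own illustrative code with made-up numbers, not the paper's surrogate): a Bernoulli gate plays the classifier head and a Gaussian plays the regressor head, and their product yields zero-inflated samples.

```python
import numpy as np

def sample_zero_inflated(p_nonzero, mu, sigma, n_samples, rng):
    """Posterior-predictive draws from a zero-inflated model: a Bernoulli
    gate (the classifier head) times a Gaussian value (the regressor
    head). Toy stand-in for the surrogate, with illustrative numbers."""
    gate = rng.random(n_samples) < p_nonzero
    values = rng.normal(mu, sigma, size=n_samples)
    return np.where(gate, values, 0.0)

rng = np.random.default_rng(1)
s = sample_zero_inflated(0.7, 2.0, 0.5, 10000, rng)
print((s == 0.0).mean())  # fraction of exact zeros, close to 0.3
```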
WFIRST --> the Nancy Grace Roman Space Telescope. I wouldn't have known about Roman's legacy with Hubble if it weren't for the naming. Loving this trend to honor trailblazing women in astrophysics, following the Vera Rubin Observatory!
If a dark matter halo happens to line up in front of a quasar, we have a *strong* gravitational lens from which we can measure the Hubble constant. The precision of this analysis depends on the total "external" convergence due to objects in the lens environment and line of sight.
On a particularly challenging test set with high convergence relative to the training set, BGNN recovers the population mean precisely and without bias, resulting in a sub-percent contribution to the hypothetical error budget on the Hubble constant.