Leo Zang

@LeoTZ03

Followers
3,323
Following
387
Media
310
Statuses
924

Alias: Leo Chen. Protein Designer | Share Daily Papers (AI+Protein/RNA/DNA) | @DukeUBME @pranamanam | @harvardmed @DeboraMarksLab

Joined March 2024
Pinned Tweet
@LeoTZ03
Leo Zang
3 months
Collection of Papers and Posts
0
16
57
@LeoTZ03
Leo Zang
3 months
Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review Link:
8
259
1K
@LeoTZ03
Leo Zang
3 months
Just found that Caltech has open-sourced this interesting Biological Circuit Design course
4
210
1K
@LeoTZ03
Leo Zang
3 months
List of Papers using Deep Learning for Protein Design (by @PeldomZ )
2
130
545
@LeoTZ03
Leo Zang
3 months
A brief tutorial on information theory Link:
2
107
501
@LeoTZ03
Leo Zang
3 months
Rapid protein evolution by few-shot learning with a protein language model - EVOLVEpro, PLMs with active learning, enables rapid and efficient evolution of protein activities. Shows promising optimization of Antibody, CRISPR Nuclease, Prime Editor, Integrase, and T7 RNA Polymerase
1
40
159
@LeoTZ03
Leo Zang
23 days
If you're looking for labs working on Protein Design, we've put together a list based on our best knowledge. The initial list came from @Zuricho_zbzt , and with a bit of help from me, we've now got it up on GitHub. Feel free to suggest any labs we might have missed!
21
85
386
@LeoTZ03
Leo Zang
3 months
An introduction to reinforcement learning for neuroscience Link:
2
80
345
@LeoTZ03
Leo Zang
2 months
Leveraging Deep Generative Model For Computational Protein Design And Optimization | Thesis @UChicago
5
64
332
@LeoTZ03
Leo Zang
3 months
Deep learning guided design of dynamic proteins - Design proteins with conformational changes, focusing on intra-domain reorientation of secondary structural elements - Use systematic physics-based conformational sampling and Rosetta Design to create a library of alternative
0
68
268
@LeoTZ03
Leo Zang
3 months
MM-LLMs: Recent Advances in MultiModal Large Language Models |ACL 24’
3
86
255
@LeoTZ03
Leo Zang
6 months
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments Use Agent "to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing
2
75
207
@LeoTZ03
Leo Zang
2 months
AlphaFold2 knows some protein folding principles - Use AF2 without MSAs/templates, mimicking an ab initio approach. The iterations show AF2's energy landscape and "local first, global later" folding mechanism. - Folded intermediates of six small proteins (protein G, protein L and
@liwei_chang_
Liwei Chang
2 months
Did AlphaFold solve the protein folding problem? ...Not yet! AF2 predicts static structures, usually the native state by default. However, we found AF2 can generate structures aligning well with known folding intermediates. @Al__Perez @UFChemistry 1/n
2
37
195
0
50
195
@LeoTZ03
Leo Zang
4 months
Simulating 500 million years of evolution with a language model | @EvoscaleAI - ESM3 (1.4B, 7B, and 98B), a multimodal protein language model on sequence, structure, and function tokens, using MLM for representation learning and generation. - Uses an MLM objective with diverse
1
42
142
@LeoTZ03
Leo Zang
5 months
Training Compute-Optimal Protein Language Models - Trained 300+ models with 3.5M to 10.7B parameters on 5 to 200B tokens, comparing CLM and MLM scaling behavior - Compiled a dataset of 939M protein sequences (194B tokens) to address overfitting - Observed a transfer phenomenon
1
37
142
@LeoTZ03
Leo Zang
3 months
Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space Version 5:
@DdelAlamo
Diego del Alamo
1 year
"Deep Generative Models of Protein Structure Uncover Distant Relationships Across a Continuous Fold Space" has been revised It "provides a sensitive approach to detect and thus explore distant protein relationships" paper: github:
1
24
95
0
50
182
@LeoTZ03
Leo Zang
3 months
interesting 🤔
2
36
171
@LeoTZ03
Leo Zang
4 months
A list of methods for tokenizing protein structures: - Foldseek - ProSST - FoldToken 1 & 2 - Learning the Language of Protein Structure Did I miss any works?
@LeoTZ03
Leo Zang
4 months
FoldToken2: Learning compact, invariant and generative protein structure language - Introduces an invariant structure encoder (BlockGAT), vector-quantized compressor (SoftCVQ), and equivariant structure decoder - Uses multiple loss terms (global, fragment, pair, neighbor,
2
23
141
5
37
164
@LeoTZ03
Leo Zang
2 months
Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures - Dynamic PDB contains 12.6K proteins subjected to 1 microsecond of molecular dynamics simulations with detailed physical properties such as
1
33
160
@LeoTZ03
Leo Zang
4 months
Antibody design using deep learning: from sequence and structure design to affinity maturation | @BriefingBioinfo "This survey highlights significant advancements in protein design and optimization, specifically focusing on antibodies. This includes various aspects such as
0
30
157
@LeoTZ03
Leo Zang
2 months
DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures | @BriefingBioinfo - Combine both sequence and 3D structural features of proteins to improve enzyme turnover number (kcat) prediction accuracy
0
41
150
@LeoTZ03
Leo Zang
3 months
Accurate prediction of protein function using statistics-informed graph networks | @NatureComms - PhiGnet, uses evolutionary couplings (EVCs) and residue communities (RCs) with Dual-Channel Graph Convolutional Networks. Embed sequences via ESM-1b - Use Grad-CAM method computes
1
42
143
@LeoTZ03
Leo Zang
4 months
FoldToken2: Learning compact, invariant and generative protein structure language - Introduces an invariant structure encoder (BlockGAT), vector-quantized compressor (SoftCVQ), and equivariant structure decoder - Uses multiple loss terms (global, fragment, pair, neighbor,
2
23
141
@LeoTZ03
Leo Zang
5 months
ProteinCLIP: enhancing protein language models with natural language - CLIP-like Model aligns embeddings from protein language model and language models describing protein functions - Excels in PPI, Homology, and Mutation identification
3
23
141
@LeoTZ03
Leo Zang
7 months
Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions Interesting PPI virtual screening work based on AlphaFold-Multimer. - SPOC (Structure Prediction and Omics informed Classifier) is a random forest-based classifier that accurately
3
43
138
@LeoTZ03
Leo Zang
7 months
RNA language models predict mutations that improve RNA function Great work from Jennifer Doudna and Jamie Cate's Lab! - Use Genome Taxonomy Database (GTDB) to build the GARNET (Gtdb Acquired RNa with Environmental Temperatures) database - Train a generative GNN model using a
3
27
139
@LeoTZ03
Leo Zang
4 months
Learning the Language of Protein Structure From @instadeepai - Maps protein backbone to continuous downsampled representations by MPNN - Discretizes representations into tokens using Finite Scalar Quantization - Reconstructs protein structures from tokens using a structure
1
21
134
@LeoTZ03
Leo Zang
4 months
FlowPacker: Protein side-chain packing with torsional flow matching - FlowPacker, a fast and accurate model for predicting side-chain conformations using Torsional Flow Matching and Equivariant Graph Attention Networks - Inference with an exponential schedule for the vector field
0
32
134
@LeoTZ03
Leo Zang
2 months
A bioactivity foundation model using pairwise meta-learning | @NatMachIntell - ActFound is trained on 1.6 million bioactivity data points across 35,644 assays to predict the bioactivity of compounds using pairwise meta-learning (predicting the relative difference) - Encode
0
30
135
@LeoTZ03
Leo Zang
2 months
Structure prediction of alternative protein conformations | @NatureComms - Cfold, a structure prediction model designed to predict alternative protein conformations. - Train AF2 without the Template Track and focus on MSAs and coevolutionary signals - Predict different
@Patrick18287926
Patrick Bryant
2 months
Check out my new work with @FrankNoeBerlin where we answer if the structure of different protein conformations really can be predicted. Also try the Colab to see if your protein has different conformations:
2
110
407
0
31
130
@LeoTZ03
Leo Zang
5 months
Accurate Conformation Sampling via Protein Structural Diffusion - Diffold, a diffusion-based model for robust sampling of diverse protein conformations using amino acid sequences - Transforms AlphaFold2 into a diffusion model, and applies hierarchical reweighting based on
1
27
125
@LeoTZ03
Leo Zang
6 months
Learning to design protein-protein interactions with enhanced generalization ICLR24' - PPIRef, the largest and non-redundant dataset of 3D protein–protein interactions, - PPIformer, a new SE(3)-equivariant model generalizing across diverse protein-binder variants. - Finetune
0
24
126
@LeoTZ03
Leo Zang
2 months
Fine-tuning protein language models boosts predictions across diverse tasks | @NatureComms - Finetune pLMs (ESM2, ProtT5, Ankh) on different tasks (GB1, GFP, AAV, Location, Meltome, Stability, Disorder Prediction, and Secondary Structure Prediction) - Explore various PEFT
1
32
125
@LeoTZ03
Leo Zang
28 days
Generative Modeling of Molecular Dynamics Trajectories | NeurIPS 24' |📸"molecular video" - MDGEN, a flow-based model for MD trajectories, with various capabilities (forward simulation, interpolation, upsampling, and molecular design) - Tokenize structure as Roto-Translation and
1
29
122
@LeoTZ03
Leo Zang
3 months
Toward De Novo Protein Design from Natural Language - T2struct, encoder-decoder architecture, uses PubMedBERT for encoding text and GPT-2 for decoding structural tokens - Retrain SaProt with projected text embeddings and structural tokens for sequence generation (Retrain ProGen
1
32
120
@LeoTZ03
Leo Zang
3 months
Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure - CHEAP, a novel method for compressing protein sequence and structure latent space (ESMFold), achieves up to 128x channel and 8x length compression from sequence input alone - Uses per-channel
2
34
120
@LeoTZ03
Leo Zang
2 months
Unsupervised learning of progress coordinates during weighted ensemble simulations: Application to millisecond protein folding - Improve rare events in protein folding (e.g., state transitions) through weighted ensemble simulation and an unsupervised deep learning model. - Use a
0
27
119
@LeoTZ03
Leo Zang
2 months
Large protein databases reveal structural complementarity and functional locality - Cluster AFDB with FoldSeek, Annotate with deepFRI, Generate embeddings with Geometricus and Use PaCMAP for dimension reduction Preprint:
1
36
119
@LeoTZ03
Leo Zang
4 months
Unsupervised evolution of protein and antibody complexes with a structure-informed language model | @ScienceMagazine - Train on millions of nonredundant pairs of protein sequences and backbone - Use autoregressive modeling to integrate sequence and structural information
1
22
117
@LeoTZ03
Leo Zang
5 months
PocketGen: Generating Full-Atom Ligand-Binding Protein Pockets - Co-designs the residue sequence and full-atom structure of protein pockets for binding - Uses a bilevel graph transformer to model multi-granularity (atom and residue/ligand level) and multi-aspect (intra-protein
1
35
118
@LeoTZ03
Leo Zang
2 months
De novo design of Ras isoform selective binders | Baker Lab - Use several methods to generate backbone for disordered Ras C-terminus: Amino Acid Recognition Pocket-Based Design, Scaffolded RFDiffusion, and Sequence Input RFDiffusion - For Pocket-Based Design: Build an initial
0
13
116
@LeoTZ03
Leo Zang
3 months
Geometric deep learning of protein–DNA binding specificity | @naturemethods - Represent DNA structures as symmetrized helices, and Proteins as atom-based graphs with (one-hot atom type, solvent-accessible surface feature, and Atchley factors etc.) - Use spatial Graph
0
34
113
@LeoTZ03
Leo Zang
6 days
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
@ChenyuW64562111
Chenyu Wang
6 days
Excited to share: "Fine-tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design" With my amazing coauthors Masatoshi Uehara, @yiyiyihe , @amywang01 , @tbyanc , @lal_avantika , Tommi Jaakkola, @svlevine , @hcwww_ , Aviv Regev
6
38
225
2
21
112
@LeoTZ03
Leo Zang
14 days
De novo protein design with a denoising diffusion network independent of pretrained structure prediction models | @naturemethods - SCUBA-D uses a two-step denoising process. First, generate an initial low-resolution backbone, then perform multiple steps of denoising to generate
0
21
108
@LeoTZ03
Leo Zang
5 months
Multi-Scale Protein Language Model for Unified Molecular Modeling | ICML 2024 - ESM-AA, a multi-scale protein language model that enables unified modeling at both the residue and atom scales - Pre-training on multi-scale code-switch protein sequences that randomly unzip residues
1
23
106
@LeoTZ03
Leo Zang
4 months
A Survey of Generative AI for de novo Drug Design: New Frontiers in Molecule and Protein Generation New Review 🙌 Link:
0
36
106
@LeoTZ03
Leo Zang
3 months
T cell receptor binding prediction: A machine learning revolution - Survey of ML/pLM in T cell receptor related problems Link:
2
24
104
@LeoTZ03
Leo Zang
3 months
Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering | @NatureComms - MODIFY, optimizes diversity at the residue level using Pareto optimization (Stochastic Gradient Ascent) to balance both fitness and
0
32
103
@LeoTZ03
Leo Zang
3 months
Models are open sourced!
@LeoTZ03
Leo Zang
5 months
Training Compute-Optimal Protein Language Models - Trained 300+ models with 3.5M to 10.7B parameters on 5 to 200B tokens, comparing CLM and MLM scaling behavior - Compiled a dataset of 939M protein sequences (194B tokens) to address overfitting - Observed a transfer phenomenon
1
37
142
1
29
102
@LeoTZ03
Leo Zang
7 months
A 5′ UTR language model for decoding untranslated regions of mRNA and function predictions A new 5′ untranslated region (UTR) Language Model! - Pretrain the model with mask prediction, 5' UTR secondary structure prediction, and minimum free energy prediction - Finetune [CLS]
1
19
102
@LeoTZ03
Leo Zang
2 months
Force-Guided Bridge Matching for Full-Atom Time-Coarsened Dynamics of Peptides - Force-Guided Bridge Matching (FBM) learns the dynamics between two states using a Brownian bridge process with the integration of an intermediate force field as guidance for a Boltzmann-like
0
28
101
@LeoTZ03
Leo Zang
2 months
A catalog of small proteins from the global microbiome | @NatureComms - "construct a global microbial smORFs catalog (GMSC) derived from 63,410 publicly available metagenomes across 75 distinct habitats and 87,920 high-quality isolate genomes. GMSC contains 965 million
0
21
97
@LeoTZ03
Leo Zang
3 months
Improving AlphaFlow for Efficient Protein Ensembles Generation | ICML 24' Workshop - AlphaFlow-Lit focuses on fine-tuning only the lightweight structure module for faster sampling - Treat AlphaFold as a sequence-conditioned denoising model, focusing on precomputed single and pair
0
25
96
@LeoTZ03
Leo Zang
4 months
Reinforcement Learning for Sequence Design Leveraging Protein Language Models - Investigate RL algorithms for protein sequence design using pLM as a reward function - Use ESMFold as the oracle pLM, and Distill it into a smaller model to serve as the proxy reward model - Train the
1
27
97
@LeoTZ03
Leo Zang
2 months
Mapping glycoprotein structure reveals Flaviviridae evolutionary history | @Nature - Collect 458 Flaviviridae genomes (11 novel) and build a family-wide phylogeny with three distinct clades, using conserved NS5 sequences encoding RdRp - Build a large-scale "protein foldome"
1
18
97
@LeoTZ03
Leo Zang
20 days
De novo design of ATPase based on the blueprint optimized for harboring the P-loop motif - Use Rosetta to design a stable conformal backbone harboring the P-loop (phosphate-binding loop) for ATPase activity - Conduct fragment assembly simulations with the β-(P-loop)-α-β motif,
1
20
96
@LeoTZ03
Leo Zang
2 months
Protein Language Models in Directed Evolution | ICML Workshop - Use MSA Transformer for guided directed evolution on enzyme variants (PET degradation) - Few-shot mode takes a small amount of experimental data to train a ridge regression model on top of the MSA Transformer -
0
28
95
@LeoTZ03
Leo Zang
5 months
Full-Atom Peptide Design based on Multi-modal Flow Matching | ICML 24' - PepFlow, uses a conditional flow-matching framework to model peptide binder structures and sequences - Position (Euclidean space), Orientation (SO(3)), Angles (Toric space), Type (Categorical space)
2
15
95
@LeoTZ03
Leo Zang
6 months
InstructPLM: Aligning Protein Language Models to Follow Protein Structure Instructions - Use an adapter to connect the Structure Encoder (ProteinMPNN and others) with pLM (ProGen2) - Outperforms ProteinMPNN in terms of Perplexity and Recovery Rate - Validate the model by
1
18
94
@LeoTZ03
Leo Zang
2 months
A conditional protein diffusion model generates artificial programmable endonuclease sequences with enhanced activity | @CellDiscovery - Train conditional protein diffusion model, CPDiffusion, an Equivariant Graph Denoising Network for Argonaute (Ago) proteins under the DDPM
1
31
94
@LeoTZ03
Leo Zang
29 days
Recent Papers from Baker Lab 1. Designed endocytosis-inducing proteins degrade targets and amplify signals | @Nature 2. Multistate and functional protein design using RoseTTAFold sequence space diffusion | @NatureBiotech ProteinGenerator Paper
0
23
92
@LeoTZ03
Leo Zang
1 month
ProteinBench: A Holistic Evaluation of Protein Foundation Models - Benchmark on Inverse Folding, Backbone Design, Sequence Design, Structure-Sequence Co-Design, Motif Scaffolding, Antibody Design, Protein Conformation Prediction Preprint: Project Page:
1
22
91
@LeoTZ03
Leo Zang
3 months
Fast, sensitive detection of protein homologs using deep dense retrieval | @NatureBiotech - Dense Homolog Retriever (DHR) employs a bi-encoder (ESM1b, first vector as fixed-length vector) architecture and a CLIP-like approach to train on homologous pairs with in-batch negatives -
0
17
92
@LeoTZ03
Leo Zang
6 months
Designing molecular RNA switches with Restricted Boltzmann machines - Use Restricted Boltzmann machines to design artificial SAM-I riboswitches, focusing on their aptamer domain - The designed sequences were validated through chemical probing, with approximately 30% demonstrating
0
21
89
@LeoTZ03
Leo Zang
5 months
RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching | ICML 24' - RNAFlow, a flow matching (FM) model for RNA sequences and structures generation, conditioned on protein interactions - Combines an RNA inverse folding model with a pretrained
0
13
90
@LeoTZ03
Leo Zang
6 months
Accurate structure prediction of biomolecular interactions with AlphaFold 3 | Nature They use Diffusion with Transformer😲. Replaced invariant point attention with a relatively standard non-equivariant point-cloud diffusion model over all atoms
1
19
89
@LeoTZ03
Leo Zang
2 months
Accelerating protein engineering with fitness landscape modeling and reinforcement learning - µFormer, pre-trained using a pairwise masked language model (next thread) on UniRef50. - Fine-tune and evaluate the model on FLIP and ProteinGym (random split 🤔) using residue (capable
1
16
89
@LeoTZ03
Leo Zang
5 months
ProTrek: Navigating the Protein Universe through Tri-Modal Contrastive Learning - Aligns sequence-structure, sequence-function, and structure-function pairs by ESM, BERT, and Foldseek - Leverages max-inner product search for rapid retrieval preprint:
2
20
88
@LeoTZ03
Leo Zang
2 months
Benchmarking text-integrated protein language model embeddings and embedding fusion on diverse downstream tasks - Benchmark six tpLMs (OntoProtein, ProteinDT, ProtST, ProteinCLIP, ProTrek, ESM3) against ESM2-3B on six tasks (GB1, GFP, AAV, Location, Meltome, Stability) - No tpLM
1
24
86
@LeoTZ03
Leo Zang
6 months
Preference optimization of protein language models as a multi-objective binder design paradigm Use DPO (Direct Preference Optimization) for Peptide Binder Design with ProtGPT2
1
17
87
@LeoTZ03
Leo Zang
1 month
On arXiv now
@GoogleDeepMind
Google DeepMind
2 months
We’re presenting AlphaProteo: an AI system for designing novel proteins that bind more successfully to target molecules. 🧬 It could help scientists better understand how biological systems function, save time in research, advance drug design and more. 🧵
76
853
3K
4
16
87
@LeoTZ03
Leo Zang
4 months
Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning | @NatureComms - Propose FSFP with model-agnostic meta-learning, Learning to Rank (ListMLE loss), and LoRA to enhance protein language models for few shot learning of fitness - In
1
14
85
@LeoTZ03
Leo Zang
2 months
PiNUI: A Dataset of Protein-Protein Interactions for Machine Learning - PiNUI, Protein interactions with Nearly Uniform Imbalance - Construct the negative set exclusively from positive sequence pairs, sampling two proteins that each interact with only one other protein in their
0
19
86
@LeoTZ03
Leo Zang
4 months
BulkRNABert: Cancer prognosis from bulk RNA-seq based language models - Transform gene expression values into tokens by binning Transcripts Per Million (TPM) values - Train Self-supervised MLM over expression data - Finetune MLP head for cancer type classification and survival
0
25
85
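To make the binning step in the BulkRNABert post above concrete, here is a minimal sketch of turning TPM values into token ids. It is an illustration only, not the paper's implementation: the function name `tpm_to_tokens`, the log10(1 + TPM) transform, the 64 bins, and the cap of 5.0 are all assumptions made for this example.

```python
import math
from bisect import bisect_right


def tpm_to_tokens(tpm_values, n_bins=64, max_log_tpm=5.0):
    """Map TPM values to integer token ids in [0, n_bins - 1].

    Assumption (not from the post): values are log10(1 + TPM) transformed
    and split into uniform bins between 0 and max_log_tpm.
    """
    # n_bins - 1 interior edges define n_bins bins.
    edges = [max_log_tpm * i / n_bins for i in range(1, n_bins)]
    tokens = []
    for tpm in tpm_values:
        if tpm < 0:
            raise ValueError("TPM values must be non-negative")
        log_value = math.log10(1.0 + tpm)
        # Number of edges <= log_value is the bin index; values above the
        # cap fall into the last bin.
        tokens.append(bisect_right(edges, log_value))
    return tokens


if __name__ == "__main__":
    # One hypothetical sample with four genes.
    print(tpm_to_tokens([0.0, 5.0, 50.0, 5000.0], n_bins=4, max_log_tpm=4.0))
```

Each sample then becomes a fixed-length sequence of token ids (one per gene), which is the kind of input a masked language model can be trained on.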
@LeoTZ03
Leo Zang
6 months
FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling - BLIP-2 for Protein Annotation - Combine ESM2 and Mistral-7B - Outperforms DeepGo series
1
30
86
@LeoTZ03
Leo Zang
2 months
De novo design of miniprotein antagonists of cytokine storm inducers | @NatureComms - Use Rosetta-based binder design approach (Cao, et al., 2024) against IL-6R, GP130, and IL-1R1 - Dock 40k de novo protein scaffolds to hotspot residues (Patchdock and Rifdock), and design 2.5
0
18
85
@LeoTZ03
Leo Zang
1 month
What has AlphaFold3 learned about antibody and nanobody docking, and what remains unsolved? - Evaluate models on Structural Antibody Database (SAbDab) - AF3 improves docking accuracy over AF2-M and AlphaRED, with a 38.4% success rate for antibodies and 36.1% for nanobodies
1
18
85
@LeoTZ03
Leo Zang
3 months
Adapting protein language models for structure-conditioned design - proseLM, enhances ProGen2 with Structural Adapter Layers - Causal Encoder uses message-passing (MPNN) and invariant-point message-passing (IPMP) layers to capture structural information of both protein and
1
16
84
@LeoTZ03
Leo Zang
6 months
Evolution-Inspired Loss Functions for Protein Representation Learning Evolutionary Ranking (EvoRank) incorporates evolutionary dynamics from MSA-based Soft Labels to learn more diverse protein representations
1
19
84
@LeoTZ03
Leo Zang
2 months
Protein isoform-centric therapeutics: expanding targets and increasing specificity | @NatRevDrugDisc - "highlight three modes of action for protein isoform-centric drugs: isoform switching, isoform introduction or depletion, and modulation of isoform activity. In addition, we
0
17
82
@LeoTZ03
Leo Zang
2 months
ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding Q: <Protein><Struct><Seq></Protein><QuestionPrompts> A: <Description> - Align sequence and structure modalities (frozen ESM-2 and ESM-IF) via projection layers - Use instruction tuning
0
25
81
@LeoTZ03
Leo Zang
4 months
General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design - Introduce BADGER, Binding Affinity Diffusion Guidance with Enhanced Refinement - Uses an Equivariant Graph Neural Network (EGNN) to approximate AutoDock Vina’s non-differentiable energy
1
14
80
@LeoTZ03
Leo Zang
3 months
Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design - Curated antibody dataset using SAbDab and the Walk Jump Sampler method, used a surrogate model (e.g., PyRosetta) to label binding energy, and split into 5 environment sets - Benchmarked SeqCNN,
1
16
81
@LeoTZ03
Leo Zang
3 months
Diffusing protein binders to intrinsically disordered proteins - Finetune RFdiffusion to accept secondary structure specifications along with sequence input. Add partially masked secondary structure and "block adjacency" information. - Input Target protein sequence (optionally
0
28
80
@LeoTZ03
Leo Zang
3 months
Peptipedia v2.0: A peptide sequence database and user-friendly web platform. A major update - Expand collection by over 45%, improved functional biological activity tree (managed by PostgreSQL) - Train over 90 binary classification models using protein language models
0
21
79
@LeoTZ03
Leo Zang
3 months
Contextual AI models for single-cell protein biology - PINNACLE, a geometric deep learning model for generating context-specific protein representations via link prediction and cell type classification pretraining - Construct context-sensitive protein interaction networks and a
0
25
78
@LeoTZ03
Leo Zang
5 months
EquiScore, a novel protein-ligand interaction scoring method integrating physical prior knowledge @NatMachIntell - Uses a heterogeneous graph neural network to evaluate interactions in equivariant geometric space - Constructs the PDBscreen dataset by combining redocking,
0
18
77
@LeoTZ03
Leo Zang
5 months
Mamba for Protein Design
@damiano_sga
Damiano Sgarbossa
5 months
🚀 Excited to introduce ProtMamba! Our novel protein language model designed to facilitate protein design.🔬📄 🔧 1/n
9
53
241
0
24
76
@LeoTZ03
Leo Zang
5 months
Aligning protein generative models with experimental fitness via Direct Preference Optimization @talaldotpdb - ProteinDPO, aligns ESM-IF1 with experimental stability fitness using Direct Preference Optimization - Capable of improved binding affinity prediction and stabilized
0
10
77
@LeoTZ03
Leo Zang
5 months
Proteus: pioneering protein structure generation for enhanced designability and efficiency | ICML '24 - Proteus, an unconditional protein backbone diffusion model without pre-training by utilizing graph-based triangle methods and a multi-track interaction network - Achieved
0
19
77
@LeoTZ03
Leo Zang
5 months
MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training - Uses 2D Evolutionary Positional Encoding by RoPE, and flattens MSA for a 1D decoding problem - MSA Generative Pre-Training, Rejective Fine-tuning, Reinforcement Learning from AlphaFold2
0
18
74
@LeoTZ03
Leo Zang
1 month
De novo Design of A Fusion Protein Tool for GPCR Research - Design Fusion Protein to facilitate GPCR cryo-EM study with enhanced stability and rigidity - Use RFdiffusion + AF2 workflow. Delete the third intracellular loop (ICL3) and generate 10-20 amino acids between ICL3 and
0
22
76
@LeoTZ03
Leo Zang
3 months
Pseudo-perplexity in One Fell Swoop for Protein Fitness Estimation - Use ESM-2-650M embeddings from an unmasked sequence to predict masked one-at-a-time probability vectors (by MLP), reducing the need for multiple forward passes - Combine OFS pseudo-perplexity technique within an
0
16
76
@LeoTZ03
Leo Zang
6 months
De novo generation of multi-target compounds using deep generative chemistry @NatureComms POLYGON, a VAE-based model with reinforcement learning for programmatic generation of new polypharmacology compounds that inhibit multiple protein targets
0
12
76
@LeoTZ03
Leo Zang
3 months
Protein Set Transformer: A protein-based genome language model to power high diversity viromics - Protein Set Transformer (PST) represents genomes as graphs, with proteins as nodes using ESM2 embeddings - Encoder contextualizes protein nodes with different attention weights;
0
18
76
@LeoTZ03
Leo Zang
4 months
Structures/Backbone using Deep Learning: - Foldseek - SWAMPNN - ProTokens - ProSST - Learning the Language of Protein Structure - FoldToken 1 & 2 - ESM3 Non DL methods before 🤔: - Geometricus and TERMs and More
@LeoTZ03
Leo Zang
4 months
Updated List of Recent Works on Tokenizing Protein Structures/Backbone using Deep Learning: - Foldseek - SWAMPNN - ProTokens - ProSST - Learning the Language of Protein Structure - FoldToken 1 & 2 Non DL methods before 🤔: - Geometricus and TERMs and More Thanks to everyone
1
9
57
2
16
76
@LeoTZ03
Leo Zang
2 months
Unsupervised domain classification of AlphaFold2-predicted protein structures - Use Foldseek for all-against-all local alignment, followed by density-based clustering, and merge into metaclusters of protein domains - Recover known folds from SCOP (94%) and CATH (86%) using a
1
19
76
@LeoTZ03
Leo Zang
3 months
ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction - Use Protein Chain of Thought (ProCoT) to simulate signaling pathways with ProtTrans embeddings and step-by-step reasoning chains. - Convert the Mol dataset into prompt-answer pairs for
0
20
75
@LeoTZ03
Leo Zang
4 months
Computational design of soluble and functional membrane protein analogues | @Nature - Uses an AlphaFold2-based pipeline coupled with ProteinMPNN for sequence optimization to design complex folds and soluble analogues - Key proteins designed included Ig-like folds (IGFs),
0
17
74