![Xubo Liu Profile](https://pbs.twimg.com/profile_images/1721647536732053504/ys2JTcB7_x96.jpg)
Xubo Liu
@LiuXub
Followers
2K
Following
503
Statuses
245
Research Scientist, Meta GenAI
Guildford, England
Joined October 2020
I am excited to share TAAE — the first Transformer-based Audio AutoEncoder scaled to 1B parameters for neural speech coding! 🔥 TAAE achieves state-of-the-art speech quality at ultra-low bitrates of 400 or 700 bits-per-second, delivering reconstruction quality remarkably close to real audio. It sets a new benchmark for efficient and high-quality speech tokenization. 📖 Paper: 👂 Demos: 💻 GitHub: Code and pre-trained models will be released to empower the community! Thank you to my collaborators at @StabilityAI : Julian Parker, Anton Smirnov, @jordiponsdotme , and the @harmonai_org Team, for their incredible contributions to this work!
7
10
96
RT @jordiponsdotme: Passionate about AI, music, and audio research? Join our team at Stability AI as a research intern. Application link:…
0
9
0
I'm thrilled to announce that our paper "Scaling Transformers for Low-Bitrate High-Quality Speech Coding" has been accepted to ICLR 2025 @iclr_conf ! Looking forward to seeing everyone in Singapore!
0
2
45
We are excited to release Stable Codec Speech! 🔥 This release builds on our recently published TAAE architecture, was trained on 100,000 hours of speech data, and includes additional improvements (details available in the GitHub repository). It demonstrates state-of-the-art capabilities in tokenizing 16 kHz speech signals at exceptionally low bitrates (e.g., 700 bps and 400 bps), while maintaining outstanding reconstructed audio quality! Paper, model, and code are all released for our community! ❤️ TAAE Paper: Demo page: Checkpoint: Code: We're looking forward to seeing what the community builds with this! A huge thank you to my collaborators at @StabilityAI : Julian Parker, Anton Smirnov, @jordiponsdotme , and the @harmonai_org Team, for their remarkable contributions to this work!
0
6
44
Model and code are officially released now! 🔥 Checkpoint: Code:
I am excited to share TAAE — the first Transformer-based Audio AutoEncoder scaled to 1B parameters for neural speech coding! 🔥 TAAE achieves state-of-the-art speech quality at ultra-low bitrates of 400 or 700 bits-per-second, delivering reconstruction quality remarkably close to real audio. It sets a new benchmark for efficient and high-quality speech tokenization. 📖 Paper: 👂 Demos: 💻 GitHub: Code and pre-trained models will be released to empower the community! Thank you to my collaborators at @StabilityAI : Julian Parker, Anton Smirnov, @jordiponsdotme , and the @harmonai_org Team, for their incredible contributions to this work!
0
0
13
More details (2/2) "Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Survey" We provide a comprehensive review of digital aquaculture technologies, examining three interconnected tasks - fish tracking, counting, and behavior analysis - through various methods including vision-based, acoustic-based, and biosensor-based approaches. Paper:
0
0
0
More details (1/2) "Multimodal Fish Feeding Intensity Assessment in Aquaculture" In this work, we introduce a novel multimodal dataset focused on fish feeding behavior and present a unified model capable of processing both single and multiple modalities. Our system leverages audio-visual observations during the fish feeding process to automatically predict whether the fish is satiated or not. Paper:
0
0
1
Training code and model weights of SD-Codec have been released! 🔥 GitHub:
I'm excited to introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a new codec model that can disentangle arbitrary audio sources into distinct latent codes for speech, music, and SFX. Check our paper below 👇 Paper:
1
1
11
All five (5/5) of our papers were accepted at ICASSP this year! Huge congrats to my co-authors and thanks for their efforts! 👏
- FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
- Learning Source Disentanglement in Neural Audio Codec
- Sound-VECaps: Improving Audio Generation With Visual Enhanced Captions
- Disentangling Hierarchical Features for Anomalous Sound Detection Under Domain Shift
- NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval
2
2
40