Ziyang Ma Profile
Ziyang Ma

@ddlbojack

Followers 242 · Following 68 · Statuses 57

PhD Candidate SJTU X-LANCE Lab | Focus on speech, language, audio and music processing | Ex @MSFTResearch NLC Group @AlibabaGroup Tongyi SpeechAI

Joined September 2022
@ddlbojack
Ziyang Ma
2 months
500 Citation Milestone 🥳
Tweet media one
Replies 5 · Retweets 2 · Likes 80
@ddlbojack
Ziyang Ma
4 months
RT @reach_vb: Let's goo! F5-TTS 🔊 > Trained on 100K hours of data > Zero-shot voice cloning > Speed control (based on total duration) > Em…
Replies 0 · Retweets 242 · Likes 0
@ddlbojack
Ziyang Ma
4 months
Marriage of BERT and LLaMA 😂
@vaibhav_adlakha
Vaibhav Adlakha
4 months
A little teaser for LLM2Vec @COLM_conf! Stop by Tuesday morning poster session to know how we officiated the marriage of BERTs and Llamas! 🦙
Tweet media one
Replies 0 · Retweets 0 · Likes 4
@ddlbojack
Ziyang Ma
4 months
RT @WilliamWangNLP: BREAKING: Taylor Swift's Eras Tour just did what AI couldn’t—pushed NeurIPS by a whole day! 🤖 🤣🤣🤣 #NeurIPS 2024 Confer…
Replies 0 · Retweets 55 · Likes 0
@ddlbojack
Ziyang Ma
5 months
@FeitengLi Too hardcore, big-shot Feiteng is setting the workplace straight 😂
Replies 1 · Retweets 0 · Likes 0
@ddlbojack
Ziyang Ma
5 months
RT @karpathy: It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They…
Replies 0 · Retweets 1K · Likes 0
@ddlbojack
Ziyang Ma
6 months
Glad that I will go to Kos, Greece🇬🇷 for #Interspeech2024 in person. We have 2 papers at oral sessions and 2 at poster sessions. Drop by if you are interested in SSL, LLM, emotion, and real-time interaction & generation!
Tweet media one
Tweet media two
Replies 0 · Retweets 0 · Likes 14
@ddlbojack
Ziyang Ma
6 months
RT @omarsar0: Foundation Models for Music: Provides a comprehensive overview of state-of-the-art pre-trained models and foundation models i…
Replies 0 · Retweets 116 · Likes 0
@ddlbojack
Ziyang Ma
6 months
RT @WenhuChen: I love simple yet effective things. However, reviewers never agree with me on that.
Replies 0 · Retweets 16 · Likes 0
@ddlbojack
Ziyang Ma
6 months
RT @arankomatsuzaki: Language Model Can Listen While Speaking: Explores full duplex modeling in interactive speech LMs, focusing on enhanci…
Replies 0 · Retweets 76 · Likes 0
@ddlbojack
Ziyang Ma
6 months
Check out our listening-while-speaking language model (LSLM), pushing interactive speech language models (iSLM) a step forward!
@_akhaliq
AK
6 months
Language Model Can Listen While Speaking

Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisfactory. To address these limitations, we explore full duplex modeling (FDM) in interactive speech language models (iSLM), focusing on enhancing real-time interaction and, more explicitly, exploring the quintessential ability of interruption. We introduce a novel model design, namely listening-while-speaking language model (LSLM), an end-to-end system equipped with both listening and speaking channels. Our LSLM employs a token-based decoder-only TTS for speech generation and a streaming self-supervised learning (SSL) encoder for real-time audio input. LSLM fuses both channels for autoregressive generation and detects turn-taking in real time. Three fusion strategies -- early fusion, middle fusion, and late fusion -- are explored, with middle fusion achieving an optimal balance between speech generation and real-time interaction. Two experimental settings, command-based FDM and voice-based FDM, demonstrate LSLM's robustness to noise and sensitivity to diverse instructions. Our results highlight LSLM's capability to achieve duplex communication with minimal impact on existing systems. This study aims to advance the development of interactive speech dialogue systems, enhancing their applicability in real-world contexts.
Tweet media one
Replies 0 · Retweets 2 · Likes 16
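The LSLM abstract above describes fusing a speaking channel (TTS token stream) with a listening channel (streaming SSL features) at three possible points: early, middle, and late. The toy NumPy sketch below is only an illustration of where each fusion point sits, not the paper's actual architecture; the 2-layer tanh "decoder", additive fusion, and all dimensions are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy embedding dimension

# One decoding step of toy per-channel features: the speaking channel
# (a TTS token embedding) and the listening channel (a streaming SSL frame).
speak = rng.normal(size=D)
listen = rng.normal(size=D)

# A hypothetical 2-layer "decoder": each layer is a linear map plus tanh.
W1 = rng.normal(size=(D, D))
W2 = rng.normal(size=(D, D))

def layer(x, W):
    return np.tanh(W @ x)

def early_fusion(speak, listen):
    # Fuse before the decoder: combine the channel embeddings at the input.
    x = speak + listen
    return layer(layer(x, W1), W2)

def middle_fusion(speak, listen):
    # Inject the listening channel between decoder layers; the abstract
    # reports this variant balances generation and interaction best.
    h = layer(speak, W1)
    return layer(h + listen, W2)

def late_fusion(speak, listen):
    # Run each channel through the decoder independently, combine at the output.
    return layer(layer(speak, W1), W2) + layer(layer(listen, W1), W2)

for name, fn in [("early", early_fusion),
                 ("middle", middle_fusion),
                 ("late", late_fusion)]:
    print(name, fn(speak, listen))
```

In a real duplex model the fused state would also drive turn-taking detection in real time; here the three functions only show that the fusion point is the single design axis distinguishing the strategies.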
@ddlbojack
Ziyang Ma
7 months
Chinese Tiny LLM was accepted at the 1st COLM conference. Congrats!
@arankomatsuzaki
Aran Komatsuzaki
10 months
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
- Presents CT-LLM, a 2B LLM
- Open-sourcing the full process of training, including a detailed data processing procedure
hf: abs:
Tweet media one
Replies 0 · Retweets 0 · Likes 0
@ddlbojack
Ziyang Ma
7 months
RT @TONGYI_SpeechAI: The Tongyi Speech Team has open-sourced two foundational speech models: SenseVoice and CosyVoice. 😄SenseVoice, a mult…
Replies 0 · Retweets 68 · Likes 0
@ddlbojack
Ziyang Ma
7 months
RT @Thom_Wolf: The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people missed in the room. Mostly irre…
Replies 0 · Retweets 367 · Likes 0
@ddlbojack
Ziyang Ma
8 months
RT @dr_cintas: Luma has released a new feature that connects start and end keyframes for more AI video control. Look at these 10 wild exam…
Replies 0 · Retweets 484 · Likes 0
@ddlbojack
Ziyang Ma
8 months
RT @billyuchenlin: M-A-P/Neo-7B-Instruct is the 1st 💎fully-open💎 LLM on WildBench leaderboard and its performance is awesome. "Fully open-…
Replies 0 · Retweets 18 · Likes 0
@ddlbojack
Ziyang Ma
8 months
@PuyuanPeng Actually there are (at least) three, and another one from PSU
Replies 1 · Retweets 0 · Likes 1
@ddlbojack
Ziyang Ma
8 months
What a cool guy, @jiatongshi
@jiatongshi
jiatongshi
8 months
This is the most enjoyable work I’ve done for Interspeech! We enhanced original data from my wife Kiki through ACE Studio, and our data appeared in this year’s SVDD and VoiceMOS challenges 😁 Although KiSing’s songs are niche, hope it becomes the LJSpeech of singing in research🤔
Replies 0 · Retweets 0 · Likes 3