![Ziyang Ma Profile](https://pbs.twimg.com/profile_images/1694496052663943168/JqsFqulC_x96.jpg)
Ziyang Ma (@ddlbojack)
242 Followers · 68 Following · 57 Statuses
PhD Candidate SJTU X-LANCE Lab | Focus on speech, language, audio and music processing | Ex @MSFTResearch NLC Group @AlibabaGroup Tongyi SpeechAI
Joined September 2022
RT @reach_vb: Let's goo! F5-TTS 🔊
> Trained on 100K hours of data
> Zero-shot voice cloning
> Speed control (based on total duration)
> Em…
Marriage of BERT and LLaMA 😂
A little teaser for LLM2Vec @COLM_conf! Stop by the Tuesday morning poster session to learn how we officiated the marriage of BERTs and Llamas! 🦙
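For readers wondering what "officiating the marriage" of a BERT and a Llama looks like in practice, here is a minimal sketch of the core LLM2Vec idea: run a decoder-only LLM with its causal mask switched off so attention becomes bidirectional (BERT-style), then mean-pool the token states into a single text embedding. The checkpoint name and the mask-disabling loop below are illustrative assumptions, not the paper's exact code; the full recipe also includes masked next token prediction and contrastive fine-tuning, and the LLM2Vec codebase ships properly patched model classes.

```python
# Hedged sketch of the LLM2Vec idea: bidirectional attention + mean pooling.
import torch
from transformers import AutoModel, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint, assumed for this sketch
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

# Hypothetical switch: real code must patch every attention layer so that no
# causal mask is applied; LLM2Vec provides such patched model classes.
for layer in model.layers:
    layer.self_attn.is_causal = False  # assumption, not a stable public API

batch = tok(["a llama wearing a BERT costume"], return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state    # (1, T, d): per-token states
mask = batch["attention_mask"].unsqueeze(-1)     # (1, T, 1): mask out padding
embedding = (hidden * mask).sum(1) / mask.sum(1) # mean pool -> (1, d) text vector
```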
RT @WilliamWangNLP: BREAKING: Taylor Swift's Eras Tour just did what AI couldn’t—pushed NeurIPS by a whole day! 🤖 🤣🤣🤣 #NeurIPS 2024 Confer…
RT @karpathy: It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They…
Glad that I will go to Kos, Greece🇬🇷 for #Interspeech2024 in person. We have 2 papers in oral sessions and 2 in poster sessions. Drop by if you are interested in SSL, LLM, emotion, and real-time interaction & generation!
RT @omarsar0: Foundation Models for Music
Provides a comprehensive overview of state-of-the-art pre-trained models and foundation models i…
RT @WenhuChen: I love simple yet effective things. However, reviewers never agree with me on that.
RT @arankomatsuzaki: Language Model Can Listen While Speaking
Explores full duplex modeling in interactive speech LMs, focusing on enhanci…
Check out our listening-while-speaking language model (LSLM), pushing interactive speech language models (iSLM) a step forward!
Language Model Can Listen While Speaking

Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisfactory. To address these limitations, we explore full duplex modeling (FDM) in interactive speech language models (iSLM), focusing on enhancing real-time interaction and, more explicitly, exploring the quintessential ability of interruption. We introduce a novel model design, namely listening-while-speaking language model (LSLM), an end-to-end system equipped with both listening and speaking channels. Our LSLM employs a token-based decoder-only TTS for speech generation and a streaming self-supervised learning (SSL) encoder for real-time audio input. LSLM fuses both channels for autoregressive generation and detects turn-taking in real time. Three fusion strategies -- early fusion, middle fusion, and late fusion -- are explored, with middle fusion achieving an optimal balance between speech generation and real-time interaction. Two experimental settings, command-based FDM and voice-based FDM, demonstrate LSLM's robustness to noise and sensitivity to diverse instructions. Our results highlight LSLM's capability to achieve duplex communication with minimal impact on existing systems. This study aims to advance the development of interactive speech dialogue systems, enhancing their applicability in real-world contexts.
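To make "fuses both channels" concrete, here is a toy PyTorch sketch of the middle-fusion idea the abstract singles out: features from a streaming listener are injected into the hidden states of an autoregressive speech-token decoder at an intermediate layer. All dimensions, the fusion layer index, the additive fusion, and the turn-taking comment are illustrative assumptions, not the paper's actual architecture.

```python
# Toy sketch of middle fusion (assumed shapes and fusion rule),
# not the LSLM paper's implementation.
import torch
import torch.nn as nn

class MiddleFusionDecoder(nn.Module):
    def __init__(self, vocab=1024, d_model=256, n_layers=6, fuse_at=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers))
        self.fuse_at = fuse_at                          # layer that fuses listening
        self.listen_proj = nn.Linear(d_model, d_model)  # projects streaming SSL features
        self.head = nn.Linear(d_model, vocab)           # predicts next speech token

    def forward(self, speech_tokens, listen_feats):
        # speech_tokens: (B, T) tokens generated so far (speaking channel)
        # listen_feats:  (B, T, d_model) time-aligned features (listening channel)
        T = speech_tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.embed(speech_tokens)
        for i, layer in enumerate(self.layers):
            if i == self.fuse_at:                       # middle fusion: add listener
                h = h + self.listen_proj(listen_feats)
            h = layer(h, src_mask=causal)
        return self.head(h)  # logits; a special token could flag turn-taking

model = MiddleFusionDecoder()
logits = model(torch.randint(0, 1024, (1, 50)), torch.randn(1, 50, 256))
```

Early and late fusion would move the same addition before the embedding stack or after the final layer; placing it mid-stack is what the abstract reports as the best trade-off between generation quality and real-time interaction.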
RT @TONGYI_SpeechAI: The Tongyi Speech Team has open-sourced two foundational speech models: SenseVoice and CosyVoice. 😄SenseVoice, a mult…
RT @Thom_Wolf: The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people missed in the room. Mostly irre…
RT @dr_cintas: Luma has released a new feature that connects start and end keyframes for more AI video control. Look at these 10 wild exam…
RT @billyuchenlin: M-A-P/Neo-7B-Instruct is the 1st 💎fully-open💎 LLM on WildBench leaderboard and its performance is awesome. "Fully open-…
So cool, man @jiatongshi
This is the most enjoyable work I've done for Interspeech! We enhanced the original data from my wife Kiki through ACE Studio, and our data appeared in this year's SVDD and VoiceMOS challenges 😁 Although KiSing's songs are niche, I hope it becomes the LJSpeech of singing in research 🤔