![Dong Zhang Profile](https://pbs.twimg.com/profile_images/1761418287651713024/H86swle6_x96.jpg)
Dong Zhang
@dongzha35524835
Followers: 521 · Following: 612 · Statuses: 64
MS Student at FudanNLP Lab @FudanUniv | Developing SpeechGPT-Series
Joined September 2022
Introducing SpeechGPT 2.0-preview: a GPT-4o-level, real-time spoken dialogue system! (Currently Chinese only; English support is coming soon.)
Highlights:
- Real-time speech-to-speech dialogue with latency under 200 ms
- Rich in emotion and diverse in style, with strong speech-style generalization
- Strong role-playing capabilities
Try it out:
Online system:
GitHub:
More demos:
RT @Open_MOSS: Introducing SpeechGPT 2.0-preview: A GPT-4o-level, real-time spoken dialogue system! (Only Chinese for now) Highlights:…
We introduce an ultra-low-bitrate streaming speech codec with joint semantic-acoustic modeling, together with a speech-text LLM architecture based on Codec-Patchify; this design proved effective in reducing the modality gap between speech and text sequences. During our experiments, we also observed several interesting phenomena. For example, after extensive pre-training on speech-text alignment, the model showed an "emergent" ability to generalize speech styles. This includes controlling speech rate even though it was never trained on dialogue data with explicit speech-rate adjustments, and adopting the tones and styles of characters it had never seen before.
Happy to share that our SpeechAlign, which applies RLHF to speech LLMs, has been accepted to #NeurIPS2024. Many voice agents have emerged recently, but almost none of them consider speech LLM post-training. Let's explore this direction further! arXiv:
@hingeloss @andersonbcdefg We conducted some analysis in our SpeechTokenizer and SpeechGPT-Gen papers. You can also refer to
Thrilled to see Moshi draw inspiration from our SpeechTokenizer and SpeechGPT! Honored to contribute to advancing the spoken dialogue field! Check out more of our work on end-to-end spoken dialogue on
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust and MLX. More details below. Paper: Repo: HuggingFace:
Off to Bangkok to attend #ACL2024. Glad to chat w/ folks interested in end-to-end speech-to-speech dialogue chatbots.