![Xipeng Qiu Profile](https://pbs.twimg.com/profile_images/1080649914118373376/NXfCYuhs_x96.jpg)
Xipeng Qiu
@xpqiu
Followers: 294
Following: 52
Statuses: 20
Natural Language Processing, Machine Learning
Shanghai
Joined April 2013
🥳 Introducing SpeechGPT 2.0-preview: a GPT-4o-level, real-time spoken dialogue system! (Chinese only for now)
🎆 Highlights:
⚡️ Real-time speech-to-speech dialogue with latency under 200ms
😊 Rich in emotion and diverse in style, with strong speech-style generalization
🦁 Strong role-playing capabilities
🤖️ Try it out: Online system: Github: More demos:
0
1
5
RT @TMarczew: Unraveling the Mystery of OpenAI's o1 #o1 #ReinforcementLearning #ReverseEngineering #AI #LLM A new paper, "Scaling of Sear…
0
1
0
RT @simulately12492: #SimulatelyPapers | December 25, 2024 📄 VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulat…
0
1
0
RT @yinzhangyue: A Technical Roadmap of o1 from a Reinforcement Learning Perspective Arxiv Link:
0
4
0
AnyGPT: The Any-to-Any Multimodal LLM - Audio, Text, and Image! Each modality is a different foreign language.
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT can be trained stably without any alterations to the current large language model (LLM) architecture or training paradigms. Instead, it relies exclusively on data-level preprocessing, facilitating the seamless integration of new modalities into LLMs, akin to the incorporation of new languages. We build a multimodal text-centric dataset for multimodal alignment pre-training. Utilizing generative models, we synthesize the first large-scale any-to-any multimodal instruction dataset, consisting of 108k samples of multi-turn conversations that intricately interweave various modalities, thus equipping the model to handle arbitrary combinations of multimodal inputs and outputs. Experimental results demonstrate that AnyGPT is capable of facilitating any-to-any multimodal conversation while achieving performance comparable to specialized models across all modalities, proving that discrete representations can effectively and conveniently unify multiple modalities within a language model.
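The abstract's "new modality as a new foreign language" framing boils down to discretizing every modality and splicing the resulting IDs into one token stream for an otherwise unmodified LLM. The sketch below is a minimal, hypothetical illustration of that data-level preprocessing step; the class name, vocabulary sizes, and boundary-token scheme are assumptions for illustration, not the released AnyGPT code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MultimodalSerializer:
    """Flattens (modality, discrete codes) segments into one LLM token sequence.

    Illustrative only: vocabulary sizes and boundary tokens are assumed, not
    taken from the AnyGPT paper or code.
    """

    text_vocab_size: int = 32_000
    codebook_sizes: Dict[str, int] = field(
        default_factory=lambda: {"speech": 1024, "image": 8192, "music": 4096}
    )

    def __post_init__(self) -> None:
        # Assign each modality a contiguous ID range above the text vocabulary,
        # plus a pair of boundary tokens, so the LLM architecture is untouched
        # and only the embedding/output layers grow.
        self.offsets: Dict[str, int] = {}
        self.boundaries: Dict[str, Tuple[int, int]] = {}
        cursor = self.text_vocab_size
        for name, size in self.codebook_sizes.items():
            self.offsets[name] = cursor
            cursor += size
            self.boundaries[name] = (cursor, cursor + 1)  # (<mod>, </mod>)
            cursor += 2
        self.total_vocab = cursor

    def serialize(self, segments: List[Tuple[str, List[int]]]) -> List[int]:
        """segments: list of ("text", [token ids]) or (modality, [codebook indices])."""
        out: List[int] = []
        for modality, codes in segments:
            if modality == "text":
                out.extend(codes)  # text IDs already live in [0, text_vocab_size)
            else:
                start, end = self.boundaries[modality]
                offset = self.offsets[modality]
                out.append(start)
                out.extend(offset + c for c in codes)
                out.append(end)
        return out


if __name__ == "__main__":
    ser = MultimodalSerializer()
    # Toy "describe this clip" turn: text prompt, speech codes, text answer.
    tokens = ser.serialize(
        [
            ("text", [17, 942, 5]),
            ("speech", [3, 880, 12, 12, 407]),
            ("text", [88, 310]),
        ]
    )
    print(ser.total_vocab, tokens)
```

In the paper's setup the per-modality codes would come from pretrained discrete tokenizers (for example a speech tokenizer or an image VQ codebook); here they are toy integers that only show how a shared vocabulary could be laid out.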
0
2
20
RT @jure: Sharing slides from my @textgraphs @NAACLHLT workshop keynote: Reasoning with Language and Knowledge Graphs
0
44
0
RT @mohitban47: VALUE = "Video-And-Language Understanding Evaluation"! Strong effort led by @linjiefun & @jielei for this fun collaboration…
0
4
0
RT @WilliamWangNLP: The most comprehensive survey I’ve seen on pretrained language models for #NLProc: https://t.co…
0
122
0
@MSFTResearch @murefil @iclr2019 Congrats! But the basic idea of this paper looks like one of our EMNLP 2016 papers, "Cached Long Short-Term Memory Neural Networks", and the citation is missing.
0
0
3
Congrats! But the basic idea of this paper looks like one of our EMNLP 2016 papers, "Cached Long Short-Term Memory Neural Networks", and the citation is missing.
We're excited to announce that Yikang Shen, Shawn Tan, Alessandro Sordoni @murefil and Aaron Courville received the Best Paper Award at @ICLR2019. Discover their work on Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks: #ICLR2019
0
0
2