Wei Ping

@_weiping

Followers: 1,993 · Following: 261 · Media: 9 · Statuses: 222

Principal research scientist @NVIDIA. Working hard on building LLMs and multimodal LLMs. Views my own.

San Francisco, CA
Joined June 2020
Pinned Tweet
@_weiping
Wei Ping
17 days
Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only
11
112
439
@_weiping
Wei Ping
5 months
Introducing ChatQA-1.5, a family of models that surpasses GPT-4-0613 and Command-R-Plus on RAG and conversational QA. ChatQA-1.5 has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. We also open-source our instruction
14
137
628
@_weiping
Wei Ping
2 years
🔥Our BigVGAN is a universal audio synthesizer!🔥 Trained only on speech data, it shows extraordinary zero-shot generalization to non-speech vocalizations (laughter, applause), singing voices, music, and instrumental audio, even when recorded in varied noisy environments!
@_akhaliq
AK
2 years
BigVGAN: A Universal Neural Vocoder with Large-Scale Training abs: project page: sota zero-shot performance for various out-of-distribution scenarios (new speakers, novel languages, singing voices, music/instrumental audio)
2
43
168
7
39
209
@_weiping
Wei Ping
2 months
Introducing ChatQA 2, a Llama3-based model with a 128K context window, designed to close the gap between open LLMs and leading proprietary models like GPT-4-Turbo in both long-context and RAG capabilities. The long-context capability of LLMs is sometimes viewed as a rival to
3
42
156
@_weiping
Wei Ping
3 months
Introducing RankRAG, a novel RAG framework that instruction-tunes a single LLM for the dual purposes of top-k context ranking and answer generation in RAG. For context ranking, it performs exceptionally well by incorporating a small fraction of ranking data into the training
2
39
150
@_weiping
Wei Ping
2 years
BigVGAN is accepted at ICLR 2023. Listen audio samples: A universal audio synthesis model, trained on speech only, works for out-of-distribution scenarios, e.g., unseen singing voices and music audio! Code and models are released!
0
34
127
@_weiping
Wei Ping
1 year
🔥Introducing InstructRetro 48B, which largely outperforms the instruction-tuned GPT🔥 Surprisingly, we find that one can ablate the encoder from the Retro architecture and directly use it as a GPT decoder, while still obtaining comparable results on zero-shot QA tasks!
@omarsar0
elvis
1 year
Instruction Tuning the Largest Pretrained Retrieval-Augmented LLM This exciting new paper from NVIDIA introduces Retro 48B, the largest LLM pretrained with retrieval. Continues pretraining a 43B parameter GPT model on additional 100B tokens by retrieving from 1.2T tokens (using
4
94
396
2
22
118
@_weiping
Wei Ping
3 years
📢 The source code and models are now available at:
@_akhaliq
AK
3 years
Long-Short Transformer: Efficient Transformers for Language and Vision pdf: abs: On ImageNet, sota results (e.g., Top-1 accuracy 84.1% trained on 224 × 224 ImageNet-1K only), while being more scalable on high-resolution images
1
58
288
0
35
108
@_weiping
Wei Ping
3 years
Our model also achieves SOTA results on NLP tasks, including Long Range Arena and character-level language modeling. The models and code will be released soon!!
@_akhaliq
AK
3 years
Long-Short Transformer: Efficient Transformers for Language and Vision pdf: abs: On ImageNet, sota results (e.g., Top-1 accuracy 84.1% trained on 224 × 224 ImageNet-1K only), while being more scalable on high-resolution images
1
58
288
0
17
105
@_weiping
Wei Ping
4 years
We release DiffWave, a versatile diffusion model for conditional & unconditional audio generation. It readily matches the SOTA neural vocoder in terms of quality. More interestingly, it can generate abundant realistic voices in time-domain without any conditional information!
@ArxivSound
arXiv Sound
4 years
"DiffWave: A Versatile Diffusion Model for Audio Synthesis" (arXiv:2009.09761v1), Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
0
3
17
2
39
102
@_weiping
Wei Ping
9 months
ChatQA can outperform GPT-4 on a wide range of conversational QA tasks: - ChatQA and GPT-4 take the same top-5 chunks from our best retriever, when long documents are involved. - ChatQA performs very well on tabular data, arithmetic calculation, and “unanswerable” cases!
@_akhaliq
AK
9 months
Nvidia presents ChatQA Building GPT-4 Level Conversational QA Models paper page: introduce ChatQA, a family of conversational question answering (QA) models, that obtain GPT-4 level accuracies. Specifically, we propose a two-stage instruction tuning
6
98
485
1
21
88
@_weiping
Wei Ping
4 months
Introducing NV-Embed, a generalist embedding model that ranks No. 1 on the MTEB Benchmark, which includes 56 diverse tasks, using only publicly available data. Notably, our model also achieves the highest score of 59.36 on 15 retrieval tasks within this benchmark. NV-Embed
1
19
85
@_weiping
Wei Ping
2 years
No matter what people think for image synthesis, GANs firmly hold the SOTA for speech/audio: 😉
@sedielem
Sander Dieleman
2 years
Rumours of GANs' demise have been greatly exaggerated, part 2
3
5
80
2
7
73
@_weiping
Wei Ping
24 days
The model checkpoints and instruction-tuning data are now available!
@_akhaliq
AK
2 months
Nvidia presents ChatQA 2 Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-
3
62
259
0
9
46
@_weiping
Wei Ping
4 years
Accepted as an oral presentation at #ICLR2021
@_weiping
Wei Ping
4 years
We release DiffWave, a versatile diffusion model for conditional & unconditional audio generation. It readily matches the SOTA neural vocoder in terms of quality. More interestingly, it can generate abundant realistic voices in time-domain without any conditional information!
2
39
102
0
5
42
@_weiping
Wei Ping
4 months
The model checkpoint has been uploaded. Enjoy!
@_weiping
Wei Ping
4 months
Introducing NV-Embed, a generalist embedding model that ranks No. 1 on the MTEB Benchmark, which includes 56 diverse tasks, using only publicly available data. Notably, our model also achieves the highest score of 59.36 on 15 retrieval tasks within this benchmark. NV-Embed
1
19
85
1
5
40
@_weiping
Wei Ping
1 month
Our NV-Embed-v2 has achieved a record-breaking score of 72.31 across 56 text embedding / retrieval tasks, reclaiming the No. 1 spot on the Massive Text Embedding Benchmark (MTEB) leaderboard! It also holds No. 1 in the retrieval sub-category (15 tasks) of the leaderboard, which
0
7
40
@_weiping
Wei Ping
2 years
📢 Language models simply generate way more factual text using our techniques: 🔥 i) Factual-nucleus sampling -- outperforms top-p by a large margin! 🔥 ii) Continued training with TopicPrefix & sentence completion loss -- much more effective than next-token-prediction loss!
@_akhaliq
AK
2 years
Factuality Enhanced Language Models for Open-Ended Text Generation abs: factual-nucleus sampling improves generation factuality at inference, combination of sentence completion loss & TOPICPREFIX pre-processing improves factuality with continued training
0
0
31
0
7
40
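The factual-nucleus idea in the tweet above can be sketched in a few lines. This is a hedged illustration of the decay schedule plus plain top-p sampling, not the paper's implementation; the defaults (p=0.9, lam=0.9, omega=0.3) are assumed values for the decay rate and lower bound.

```python
import numpy as np

def factual_nucleus_p(step, p=0.9, lam=0.9, omega=0.3):
    # Decay the nucleus mass within a sentence (step resets to 0 at each
    # new sentence), with a lower bound omega to preserve some diversity.
    return max(omega, p * lam ** step)

def nucleus_sample(logits, p, rng):
    # Plain top-p sampling: keep the smallest prefix of tokens whose
    # cumulative probability reaches p, then renormalize and sample.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```

Sampling with `factual_nucleus_p(step)` instead of a fixed p makes generation greedier the deeper it gets into a sentence, which is the claimed source of the factuality gain.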
@_weiping
Wei Ping
2 months
Long Context and RAG : Better Together
@_akhaliq
AK
2 months
Nvidia presents ChatQA 2 Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-
3
62
259
1
7
39
@_weiping
Wei Ping
1 year
🔥Retrieval meets Long Context LLMs🔥 We demonstrate that LLaMA2-70B with simple retrieval augmentation outperforms its 32K long context base model and GPT-3.5-turbo-16k in terms of average score on 7 long context tasks!
@omarsar0
elvis
1 year
Retrieval meets Long Context LLMs This is an important and timely paper investigating two important trends in LLMs: RAGs and long-context LLMs. It compares retrieval augmentation and long-context windows for downstream tasks. Also investigates if the methods can be combined to
1
145
602
0
10
37
@_weiping
Wei Ping
2 years
I will be giving a talk tomorrow at GTC 2023 on Retrieval meets Large Language Model !! This talk covers applications to open-domain QA, and 🔥Re-ViLM🔥, our latest retrieval-augmented visual language model. Welcome to join:
1
10
34
@_weiping
Wei Ping
1 year
🔥We should pretrain LLMs with retrieval🔥 We find RETRO outperforms GPT on generation (less repetition, better factuality, lower toxicity) and knowledge-intensive tasks. Our RETRO++ largely improves over the original RETRO (54.1 vs. 45.5 on NQ) and GPT w/ retrieval (50.9) on open-domain QA.
@arankomatsuzaki
Aran Komatsuzaki
1 year
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study Presents RETRO++, which significantly outperforms retrieval-augmented GPT across different model sizes. repo: abs:
5
72
327
1
6
31
@_weiping
Wei Ping
4 years
Have started our ICML poster session on Non-Autoregressive Neural Text-to-Speech. Please join if you are interested! Session link: Paper:
0
5
28
@_weiping
Wei Ping
4 years
Have started our ICML poster session on "WaveFlow: A Compact Flow-based Model for Raw Audio". Please join us if you are interested! Session link: Paper: Code:
0
2
25
@_weiping
Wei Ping
3 years
📢Interesting findings: 1) Fine-tuning on self-generated data is more effective at detoxifying LMs, as it mitigates exposure bias 2) Large LMs have the same toxicity level as smaller ones 3) Training on adapter layers achieves a much better trade-off between toxicity & perplexity
@_akhaliq
AK
3 years
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models abs:
0
19
59
0
9
25
@_weiping
Wei Ping
4 years
📢📢📢 Our unconditional DiffWave can do zero-shot speech denoising now!! YES, unconditional generation of speech is useful as in language and vision! Check it out in Section V on our demo website:
@_weiping
Wei Ping
4 years
We release DiffWave, a versatile diffusion model for conditional & unconditional audio generation. It readily matches the SOTA neural vocoder in terms of quality. More interestingly, it can generate abundant realistic voices in time-domain without any conditional information!
2
39
102
0
7
18
@_weiping
Wei Ping
2 years
🔥BigVGAN rocks🔥
@L0SG
Sang-gil Lee (L0SG)
2 years
🔥BigVGAN demo with @Gradio is now live on @huggingface Spaces!🤗 BigVGAN is a universal waveform synthesizer with strong zero-shot robustness for every sound of the world, which is built towards generative AI for audio 🔊 👉
0
16
73
0
2
17
@_weiping
Wei Ping
3 years
Want to share our ICLR oral paper DiffWave: A Versatile Diffusion Model for Audio Synthesis. It introduced a fast sampling method for diffusion models (trained with 200 diffusion steps, sampling with only 6 steps) Project: Talk:
0
5
15
@_weiping
Wei Ping
4 years
To appear at ICML 2020! Another paper from my team @BaiduResearch
@arxiv_cscl
arXiv CS-CL
4 years
Non-Autoregressive Neural Text-to-Speech
0
0
1
1
3
12
@_weiping
Wei Ping
1 year
Clearly, researchers/engineers who are building LLMs need such therapy first. A lot is going on & no work-life balance😂
@ilyasut
Ilya Sutskever
1 year
In the future, once the robustness of our models will exceed some threshold, we will have *wildly effective* and dirt cheap AI therapy. Will lead to a radical improvement in people’s experience of life. One of the applications I’m most eagerly awaiting.
283
233
2K
0
0
13
@_weiping
Wei Ping
5 months
Welcome to our ICLR poster session on Friday, May 10th, from 4:30 to 6:30 PM. Peng Xu ( @PengXu51108979 ) will present our findings on "RAG meets Long Context LLMs!" Our RAG-enhanced long-context LLM outperforms both the long-context baseline Llama2-70B-32k, and GPT-3.5-turbo-16k
1
1
13
@_weiping
Wei Ping
3 months
🔥 BigVGAN-v2 offers top audio fidelity, fast CUDA inference, and commercial checkpoints! 🤯
@L0SG
Sang-gil Lee (L0SG)
3 months
🚀 BigVGAN-v2 is here! It is our latest update of the universal vocoder to benefit all audio generative AI. It features: 🎵 State-of-the-art audio quality ⚡ Custom CUDA kernel with fast inference speed 🔊 New commercial-friendly checkpoints up to 44kHz
4
37
156
0
5
12
@_weiping
Wei Ping
9 days
Thank you, Nando! We are really glad you enjoyed our paper, especially after the relentless hard work and dedication we put into it. 😃
@NandoDF
Nando de Freitas
9 days
The NVLM paper is outstanding. It is full of remarkable findings: (1) "dataset quality and task diversity are more important than scale", (2) positive transfer from multimodal datasets to text-only on math benchmarks, (3) model and data ablations, etc. Congrats to the authors on
3
19
220
0
0
14
@_weiping
Wei Ping
24 days
We are excited to release ChatQA-2 (and its training data!), 128K long-context models that also have exceptional RAG capabilities for efficient inference or to handle inputs significantly longer than 128K tokens. The ChatQA-2 70B model outperforms GPT-4-Turbo-2024-04-09,
0
9
12
@_weiping
Wei Ping
2 years
We release our code and model at:
@_akhaliq
AK
3 years
Speech Denoising in the Waveform Domain with Self-Attention abs: project page:
0
13
76
0
2
11
@_weiping
Wei Ping
6 months
Can't wait to test its RAG capability 👏
@CohereForAI
Cohere For AI
6 months
Announcing C4AI Command R+ open weights, a state-of-the-art 104B LLM with RAG, tooling and multilingual in 10 languages.  This release builds on our 35B and is a part of our commitment to make AI breakthroughs accessible to the research community. 🎉
2
61
231
0
1
11
@_weiping
Wei Ping
17 days
We shared valuable insights on how to build cutting-edge multimodal LLMs, covering aspects such as architectural design, data curation, tagging high-resolution image tiles, and pushing toward state-of-the-art results in vision-language tasks, all while maintaining or even
@_weiping
Wei Ping
17 days
Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only
11
112
439
0
1
11
@_weiping
Wei Ping
6 months
This is why it's important to value the publication of negative results. The Megatron-LM paper investigates the scalability of the BERT model, as detailed in Tables 4 and 5. It empirically demonstrates that accuracies for downstream tasks tend to plateau.
@srush_nlp
Sasha Rush
6 months
Lazy twitter: A common question in NLP class is "if xBERT worked well, why didn't people make it bigger?" but I realize I just don't know the answer. I assume people tried but that a lot of that is unpublished. Is the theory that denoising gets too easy for big models?
46
41
481
1
0
11
@_weiping
Wei Ping
4 years
We use hyperlinks from online conversations as a source of knowledge, augmenting dialog history to create higher quality text generation. The output from our 8.3B parameter model is difficult to distinguish from true conversation in human evaluations!
@_weiping
Wei Ping
4 years
Introducing Local Knowledge Powered Conversational Agents Paper: By @sazoo_nlp @wpingnet @TheRealRPuri @MohammadShoeybi @MostofaPatwary @ctnzr
0
4
13
0
3
10
@_weiping
Wei Ping
6 months
Knowing the right thing to do but not adhering to it makes us no different from those who are clueless, in the eyes of the world.
0
0
8
@_weiping
Wei Ping
6 months
Skip pre-training and go directly to SFT.
@thegautamkamath
Gautam Kamath
6 months
NeurIPS 2024 will have a track for papers from high schoolers.
78
90
599
0
1
8
@_weiping
Wei Ping
5 months
@JefferyTatsuya Yes, the maximum context is 8k for now. If your documents are longer than 8k, you can use the uploaded retriever to obtain the top-5 or top-10 retrieved contexts, which have proven to be highly effective.
1
1
6
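The workflow described in the reply above, retrieving top-5 or top-10 chunks when a document exceeds the 8k context window, can be sketched as a cosine-similarity ranking over pre-computed chunk embeddings. This is an illustrative sketch, not the released retriever; `top_k_chunks` and its arguments are hypothetical names.

```python
import numpy as np

def top_k_chunks(chunk_embs, query_emb, k=5):
    # Rank pre-embedded document chunks by cosine similarity to the
    # query embedding and return the indices of the top-k, best first.
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    return np.argsort(c @ q)[::-1][:k].tolist()
```

The selected chunks are then concatenated into the model's prompt in place of the full document.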
@_weiping
Wei Ping
3 months
As an open model, DeepSeek-Coder-V2 by @deepseek_ai is truly amazing at coding and math. I really enjoyed reading their series of reports, which provide extensive technical detail 🫡
@lmsysorg
lmsys.org
3 months
[Chatbot Arena Update] We are excited to launch Math Arena and Instruction-Following (IF) Arena! Math/IF are the two key domains testing models’ logical skills & real-world tasks. Key findings: - Stats: 500K IF votes (35%), 180K Math votes (13%) - Claude 3.5 Sonnet is now #1
8
87
425
0
0
6
@_weiping
Wei Ping
2 years
I tend to think both RL and the Gumbel trick can work well if they are well executed w/ a reward function trained on massive human annotations. However, straight-through/Gumbel gives a less noisy but biased gradient, so RL could win in the long run (my guess)
@zdhnarsil
Dinghuai Zhang 张鼎怀
2 years
I don't understand why "RLHF" even needs RL? The reward function is a learned neural network and thus white-box. This means we could simply use straight through estimater (or Gumbel trick) to obtain a much better gradient. (context: my understanding is from InstructGPT paper)
27
75
760
0
0
6
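To make the straight-through/Gumbel point above concrete, here is a hedged numpy sketch of a Gumbel-softmax sample with a hard (straight-through) forward pass. In an autograd framework the backward pass would use the soft sample's gradient, which is exactly what makes the estimator low-variance but biased; numpy has no autograd, so that part is only noted in comments.

```python
import numpy as np

def gumbel_softmax_st(logits, rng, tau=1.0):
    # Sample Gumbel(0, 1) noise and relax the categorical draw via softmax.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + g) / tau
    y_soft = np.exp(z - z.max())
    y_soft /= y_soft.sum()
    # Straight-through: the forward pass emits the one-hot argmax, while an
    # autograd framework would route gradients through y_soft instead.
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0
    return y_hard, y_soft
```

Lowering `tau` makes the soft sample closer to one-hot (less bias, more variance), which is the knob being traded off against RL's unbiased but noisy gradient.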
@_weiping
Wei Ping
6 months
Flowers will eventually bloom as people work hard and maintain great patience. 静待花开终有时 (wait patiently and the flowers will bloom in time). 🫡
0
0
5
@_weiping
Wei Ping
2 years
NO forms! Apache license!👍
@YiTayML
Yi Tay
2 years
New open source Flan-UL2 20B checkpoints :) - Truly open source 😎 No forms! 🤭 Apache license 🔥 - Best OS model on MMLU/Big-Bench hard 🤩 - Better than Flan-T5 XXL & competitive to Flan-PaLM 62B. - Size ceiling of Flan family just got higher! Blog:
51
343
2K
0
0
5
@_weiping
Wei Ping
6 months
@ravithejads
Ravi Theja
9 months
💡ChatQA: Building GPT-4 Level Conversational QA Models Builds ChatQA conversational QA models, that obtain OpenAI GPT-4 level accuracies. 🌟 Key Contributions: 1️⃣ Two-Stage Instruction Tuning Method: - Stage 1: Supervised Fine-Tuning on a mix of instruction-following and
3
15
63
0
0
4
@_weiping
Wei Ping
6 months
8x22b, mixtral strikes again 🤯
@MistralAI
Mistral AI
6 months
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%%3A1337%2Fannounce&tr=http%3A%2F%%3A1337%2Fannounce
272
821
6K
0
0
3
@_weiping
Wei Ping
4 years
@lucidrains 1) Architecture. DiffWave uses compact WaveNet-like architecture to support both conditional & uncond generation. WaveGrad uses Upsample and Downsample blocks for mel-spectrogram inputs. 2) In addition to neural vocoding, we tackle the challenging unconditional generation task.
0
0
3
@_weiping
Wei Ping
2 years
ChatGPT vs. its retrieval-augmented version? 😄
@DrJimFan
Jim Fan
2 years
This is exactly why I like ChatGPT much more than Bing/Sydney. ChatGPT tells the truth. 🤣
22
17
353
0
0
3
@_weiping
Wei Ping
1 year
This can make us feel better: Teaching an LLM to say "I don't know" when it wants to hallucinate is quite a bit of work too
@ProfFeynman
Prof. Feynman
1 year
Illusion of knowledge is more dangerous than ignorance: It's Okay to say "I don't know" and admit that you don't know it. It's shameful to pretend that you know everything.
90
1K
5K
0
1
3
@_weiping
Wei Ping
2 months
@_akhaliq Thanks for sharing our work!
0
0
2
@_weiping
Wei Ping
2 years
Link of the recorded talk:
0
0
0
@_weiping
Wei Ping
4 years
@sedielem I like this table! It helps me sort out all this work👍
0
0
2
@_weiping
Wei Ping
3 months
@AkariAsai Looking forward to it! We retrieved 1.2T tokens in InstructRetro and are eager to see the results of further scaling. Do you have a timeline :)
1
0
1
@_weiping
Wei Ping
16 days
@Yuchenj_UW @DrJimFan We didn't find the checkpoint of Qwen2 VL 72B when we were trying to evaluate and compare last month. It seems it was just released now:
1
0
1
@_weiping
Wei Ping
4 years
@TomKenter @manish1765 Is the preprint available? Can't wait :)
2
0
1
@_weiping
Wei Ping
3 months
1
0
1
@_weiping
Wei Ping
4 years
@TechRonic9876 @BaiduResearch The audio samples are in:
0
0
1
@_weiping
Wei Ping
6 months
@srush_nlp BERT is trained using a denoising auto-encoding objective. I vaguely recall there being some established connections between denoising auto-encoding and lossy compression e.g.,
1
0
1
@_weiping
Wei Ping
6 months
@Francis_YAO_ Not to mention, some open weights models are released with benchmark numbers but without prompts😅
0
0
1
@_weiping
Wei Ping
6 months
@WenhuChen good design!
1
0
1
@_weiping
Wei Ping
6 months
@jefffhj Congratulations, Dr. Huang!
1
0
1
@_weiping
Wei Ping
4 years
@heiga_zen Same here! I really like the theme of the story: family. My son is only 2 yrs old. I guess I still need to wait a couple of years before he can enjoy the movie :)
0
0
1
@_weiping
Wei Ping
7 months
@mkamp The query and context encoders are initialized from an embedding model, e.g., E5 or the Dragon retriever, then fine-tuned on a conversational QA dataset. This is separate from LLM instruction tuning.
0
0
1
@_weiping
Wei Ping
6 months
@xiangyue96 interesting work!
0
0
1
@_weiping
Wei Ping
2 years
@r9y9 Congratulations!!
1
0
1
@_weiping
Wei Ping
6 months
@sharan0909 can't wait to read the research paper
0
0
1
@_weiping
Wei Ping
16 days
0
0
1
@_weiping
Wei Ping
3 months
@huybery Thanks for sharing👍
0
0
1
@_weiping
Wei Ping
4 years
@r9y9 I see. 論文 could be paper or dissertation. Google always translates "論文を書" to "write a dissertation". This is a case where NMT needs to infer the intention behind the text :D
0
0
1
@_weiping
Wei Ping
6 months
0
0
1
@_weiping
Wei Ping
4 months
@raulkite_ @JagersbergKnut we didn't test multilingual performance. Will work on it
0
0
1
@_weiping
Wei Ping
4 years
@heiga_zen 鬼滅の刃 is one of my favorite anime. I will watch it with my son when he grows up😀
1
0
1
@_weiping
Wei Ping
2 months
@aaron_lou congrats Aaron!
1
0
1
@_weiping
Wei Ping
2 years
@heiga_zen Thank you Heiga!
0
0
1