Wei Ping

@_weiping

Followers: 1,993 · Following: 261 · Media: 9 · Statuses: 222

Principal research scientist @NVIDIA. Working hard on building LLMs and multimodal LLMs. Views my own.

San Francisco, CA
Joined June 2020
Pinned Tweet
@_weiping
Wei Ping
17 days
Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only
11
112
439
@_weiping
Wei Ping
5 months
Introducing ChatQA-1.5, a family of models that surpasses GPT-4-0613 and Command-R-Plus on RAG and conversational QA. ChatQA-1.5 has two variants: Llama3-ChatQA-1.5-8B and Llama3-ChatQA-1.5-70B. We also open-source our instruction
14
137
628
@_weiping
Wei Ping
2 years
🔥Our BigVGAN is a universal audio synthesizer!🔥 Trained only on speech data, it shows extraordinary zero-shot generalization to non-speech vocalizations (laughter, applause), singing voices, music, and instrumental audio, even when recorded in varied noisy environments!
@_akhaliq
AK
2 years
BigVGAN: A Universal Neural Vocoder with Large-Scale Training abs: project page: sota zero-shot performance for various out-of-distribution scenarios (new speakers, novel languages, singing voices, music/instrumental audio)
2
43
168
7
39
209
@_weiping
Wei Ping
2 months
Introducing ChatQA 2, a Llama3-based model with a 128K context window, designed to close the gap between open LLMs and leading proprietary models like GPT-4-Turbo in both long-context and RAG capabilities. The long-context capability of LLMs is sometimes viewed as a rival to
3
42
156
@_weiping
Wei Ping
3 months
Introducing RankRAG, a novel RAG framework that instruction-tunes a single LLM for the dual purposes of top-k context ranking and answer generation in RAG. For context ranking, it performs exceptionally well by incorporating a small fraction of ranking data into the training
2
39
150
@_weiping
Wei Ping
2 years
BigVGAN is accepted at ICLR 2023. Listen audio samples: A universal audio synthesis model, trained on speech only, works for out-of-distribution scenarios, e.g., unseen singing voices and music audio! Code and models are released!
0
34
127
@_weiping
Wei Ping
1 year
🔥Introducing InstructRetro 48B, which largely outperforms the instruction-tuned GPT🔥 Surprisingly, we find that one can ablate the encoder from the Retro architecture and directly use it as a GPT decoder, while still obtaining comparable results on zero-shot QA tasks!
@omarsar0
elvis
1 year
Instruction Tuning the Largest Pretrained Retrieval-Augmented LLM This exciting new paper from NVIDIA introduces Retro 48B, the largest LLM pretrained with retrieval. Continues pretraining a 43B parameter GPT model on additional 100B tokens by retrieving from 1.2T tokens (using
4
94
396
2
22
118
@_weiping
Wei Ping
3 years
📢 The source code and models are now available at:
@_akhaliq
AK
3 years
Long-Short Transformer: Efficient Transformers for Language and Vision pdf: abs: On ImageNet, sota results (e.g., Top-1 accuracy 84.1% trained on 224 × 224 ImageNet-1K only), while being more scalable on high-resolution images
1
58
288
0
35
108
@_weiping
Wei Ping
3 years
Our model also achieves SOTA results on NLP tasks, including Long Range Arena and character-level language modeling. The models and code will be released soon!!
@_akhaliq
AK
3 years
Long-Short Transformer: Efficient Transformers for Language and Vision pdf: abs: On ImageNet, sota results (e.g., Top-1 accuracy 84.1% trained on 224 × 224 ImageNet-1K only), while being more scalable on high-resolution images
1
58
288
0
17
105
@_weiping
Wei Ping
4 years
We release DiffWave, a versatile diffusion model for conditional & unconditional audio generation. It readily matches the SOTA neural vocoder in terms of quality. More interestingly, it can generate abundant realistic voices in time-domain without any conditional information!
@ArxivSound
arXiv Sound
4 years
"DiffWave: A Versatile Diffusion Model for Audio Synthesis" (arXiv:2009.09761v1), Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
0
3
17
2
39
102
@_weiping
Wei Ping
9 months
ChatQA can outperform GPT-4 on a wide range of conversational QA tasks: - ChatQA and GPT-4 take the same top-5 chunks from our best retriever, when long documents are involved. - ChatQA performs very well on tabular data, arithmetic calculation, and “unanswerable” cases!
@_akhaliq
AK
9 months
Nvidia presents ChatQA Building GPT-4 Level Conversational QA Models paper page: introduce ChatQA, a family of conversational question answering (QA) models, that obtain GPT-4 level accuracies. Specifically, we propose a two-stage instruction tuning
6
98
485
1
21
88
@_weiping
Wei Ping
4 months
Introducing NV-Embed, a generalist embedding model that ranks No. 1 on the MTEB Benchmark, which includes 56 diverse tasks, using only publicly available data. Notably, our model also achieves the highest score of 59.36 on 15 retrieval tasks within this benchmark. NV-Embed
1
19
85
@_weiping
Wei Ping
2 years
No matter what people think for image synthesis, GANs firmly hold the SOTA for speech/audio: 😉
@sedielem
Sander Dieleman
2 years
Rumours of GANs' demise have been greatly exaggerated, part 2
3
5
80
2
7
73
@_weiping
Wei Ping
24 days
The model checkpoints and instruction-tuning data are now available!
@_akhaliq
AK
2 months
Nvidia presents ChatQA 2 Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-
3
62
259
0
9
46
@_weiping
Wei Ping
4 years
Accepted as an oral presentation at #ICLR2021
@_weiping
Wei Ping
4 years
We release DiffWave, a versatile diffusion model for conditional & unconditional audio generation. It readily matches the SOTA neural vocoder in terms of quality. More interestingly, it can generate abundant realistic voices in time-domain without any conditional information!
2
39
102
0
5
42
@_weiping
Wei Ping
4 months
The model checkpoint has been uploaded. Enjoy!
@_weiping
Wei Ping
4 months
Introducing NV-Embed, a generalist embedding model that ranks No. 1 on the MTEB Benchmark, which includes 56 diverse tasks, using only publicly available data. Notably, our model also achieves the highest score of 59.36 on 15 retrieval tasks within this benchmark. NV-Embed
1
19
85
1
5
40
@_weiping
Wei Ping
1 month
Our NV-Embed-v2 has achieved a record-breaking score of 72.31 across 56 text embedding / retrieval tasks, reclaiming the No. 1 spot on the Massive Text Embedding Benchmark (MTEB) leaderboard! It also holds No. 1 in the retrieval sub-category (15 tasks) of the leaderboard, which
0
7
40
@_weiping
Wei Ping
2 years
📢 Language models simply generate way more factual text using our techniques: 🔥 i) Factual-nucleus sampling -- outperforms top-p by a large margin! 🔥 ii) Continued training with TopicPrefix & sentence completion loss -- much more effective than next-token-prediction loss!
@_akhaliq
AK
2 years
Factuality Enhanced Language Models for Open-Ended Text Generation abs: factual-nucleus sampling improves generation factuality at inference, combination of sentence completion loss & TOPICPREFIX pre-processing improves factuality with continued training
0
0
31
0
7
40
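The factual-nucleus idea in the tweet above can be sketched in a few lines. This is a hedged illustration of the decay schedule plus plain top-p sampling, not the paper's implementation; the defaults (p=0.9, lam=0.9, omega=0.3) are assumed values for the decay rate and lower bound.

```python
import numpy as np

def factual_nucleus_p(step, p=0.9, lam=0.9, omega=0.3):
    # Decay the nucleus mass within a sentence (step resets to 0 at each
    # new sentence), with a lower bound omega to preserve some diversity.
    return max(omega, p * lam ** step)

def nucleus_sample(logits, p, rng):
    # Plain top-p sampling: keep the smallest prefix of tokens whose
    # cumulative probability reaches p, then renormalize and sample.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```

Sampling with `factual_nucleus_p(step)` instead of a fixed p makes generation greedier the deeper it gets into a sentence, which is the claimed source of the factuality gain.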
@_weiping
Wei Ping
2 months
Long Context and RAG : Better Together
@_akhaliq
AK
2 months
Nvidia presents ChatQA 2 Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-
3
62
259
1
7
39
@_weiping
Wei Ping
1 year
🔥Retrieval meets Long Context LLMs🔥 We demonstrate that LLaMA2-70B with simple retrieval augmentation outperforms its 32K long context base model and GPT-3.5-turbo-16k in terms of average score on 7 long context tasks!
@omarsar0
elvis
1 year
Retrieval meets Long Context LLMs This is an important and timely paper investigating two important trends in LLMs: RAGs and long-context LLMs. It compares retrieval augmentation and long-context windows for downstream tasks. Also investigates if the methods can be combined to
1
145
602
0
10
37
@_weiping
Wei Ping
2 years
I will be giving a talk tomorrow at GTC 2023 on Retrieval meets Large Language Model !! This talk covers applications to open-domain QA, and 🔥Re-ViLM🔥, our latest retrieval-augmented visual language model. Welcome to join:
1
10
34
@_weiping
Wei Ping
1 year
🔥We should pretrain LLMs with retrieval🔥 We find RETRO outperforms GPT on generation (less repetition, better factuality, lower toxicity) and knowledge-intensive tasks. Our RETRO++ largely improves over the original RETRO (54.1 vs. 45.5 on NQ) and GPT w/ retrieval (50.9) on open-domain QA.
@arankomatsuzaki
Aran Komatsuzaki
1 year
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study Presents RETRO++, which significantly outperforms retrieval-augmented GPT across different model sizes. repo: abs:
5
72
327
1
6
31
@_weiping
Wei Ping
4 years
Have started our ICML poster session on Non-Autoregressive Neural Text-to-Speech. Please join if you are interested! Session link: Paper:
0
5
28
@_weiping
Wei Ping
4 years
Have started our ICML poster session on "WaveFlow: A Compact Flow-based Model for Raw Audio". Please join us if you are interested! Session link: Paper: Code:
0
2
25
@_weiping
Wei Ping
3 years
📢Interesting findings: 1) Fine-tuning on self-generated data is more effective at detoxifying LMs, as it mitigates exposure bias 2) Large LMs have the same toxicity level as smaller ones 3) Training on adapter layers achieves a much better trade-off between toxicity & perplexity
@_akhaliq
AK
3 years
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models abs:
0
19
59
0
9
25
@_weiping
Wei Ping
4 years
📢📢📢 Our unconditional DiffWave can do zero-shot speech denoising now!! YES, unconditional generation of speech is useful as in language and vision! Check it out in Section V on our demo website:
@_weiping
Wei Ping
4 years
We release DiffWave, a versatile diffusion model for conditional & unconditional audio generation. It readily matches the SOTA neural vocoder in terms of quality. More interestingly, it can generate abundant realistic voices in time-domain without any conditional information!
2
39
102
0
7
18
@_weiping
Wei Ping
2 years
🔥BigVGAN rocks🔥
@L0SG
Sang-gil Lee (L0SG)
2 years
🔥BigVGAN demo with @Gradio is now live on @huggingface Spaces!🤗 BigVGAN is a universal waveform synthesizer with strong zero-shot robustness for every sound of the world, which is built towards generative AI for audio 🔊 👉
0
16
73
0
2
17
@_weiping
Wei Ping
3 years
Want to share our ICLR oral paper DiffWave: A Versatile Diffusion Model for Audio Synthesis. It introduced a fast sampling method for diffusion models (trained with 200 diffusion steps, sampling with only 6 steps) Project: Talk:
0
5
15
@_weiping
Wei Ping
4 years
To appear at ICML 2020! Another paper from my team @BaiduResearch
@arxiv_cscl
arXiv CS-CL
4 years
Non-Autoregressive Neural Text-to-Speech
0
0
1
1
3
12
@_weiping
Wei Ping
1 year
Clearly, researchers/engineers who are building LLMs need such therapy first. A lot is going on & no work-life balance😂
@ilyasut
Ilya Sutskever
1 year
In the future, once the robustness of our models will exceed some threshold, we will have *wildly effective* and dirt cheap AI therapy. Will lead to a radical improvement in people’s experience of life. One of the applications I’m most eagerly awaiting.
283
233
2K
0
0
13
@_weiping
Wei Ping
5 months
Welcome to our ICLR poster session on Friday, May 10th, from 4:30 to 6:30 PM. Peng Xu ( @PengXu51108979 ) will present our findings on "RAG meets Long Context LLMs!" Our RAG-enhanced long-context LLM outperforms both the long-context baseline Llama2-70B-32k, and GPT-3.5-turbo-16k
1
1
13
@_weiping
Wei Ping
3 months
🔥 BigVGAN-v2 offers top audio fidelity, fast CUDA inference, and commercial checkpoints! 🤯
@L0SG
Sang-gil Lee (L0SG)
3 months
🚀 BigVGAN-v2 is here! It is our latest update of the universal vocoder to benefit all audio generative AI. It features: 🎵 State-of-the-art audio quality ⚡ Custom CUDA kernel with fast inference speed 🔊 New commercial-friendly checkpoints up to 44kHz
4
37
156
0
5
12
@_weiping
Wei Ping
9 days
Thank you, Nando! We are really glad you enjoyed our paper, especially after the relentless hard work and dedication we put into it. 😃
@NandoDF
Nando de Freitas
9 days
The NVLM paper is outstanding. It is full of remarkable findings: (1) "dataset quality and task diversity are more important than scale", (2) positive transfer from multimodal datasets to text-only on math benchmarks, (3) model and data ablations, etc. Congrats to the authors on
3
19
220
0
0
14
@_weiping
Wei Ping
24 days
We are excited to release ChatQA-2 (and its training data!), 128K long-context models that also have exceptional RAG capabilities for efficient inference or to handle inputs significantly longer than 128K tokens. The ChatQA-2 70B model outperforms GPT-4-Turbo-2024-04-09,
0
9
12
@_weiping
Wei Ping
2 years
We release our code and model at:
@_akhaliq
AK
3 years
Speech Denoising in the Waveform Domain with Self-Attention abs: project page:
0
13
76
0
2
11
@_weiping
Wei Ping
6 months
Can't wait to test its RAG capability 👏
@CohereForAI
Cohere For AI
6 months
Announcing C4AI Command R+ open weights, a state-of-the-art 104B LLM with RAG, tooling and multilingual in 10 languages.  This release builds on our 35B and is a part of our commitment to make AI breakthroughs accessible to the research community. 🎉
2
61
231
0
1
11
@_weiping
Wei Ping
17 days
We shared valuable insights on how to build cutting-edge multimodal LLMs, covering aspects such as architectural design, data curation, tagging high-resolution image tiles, and pushing toward state-of-the-art results in vision-language tasks, all while maintaining or even
@_weiping
Wei Ping
17 days
Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only
11
112
439
0
1
11
@_weiping
Wei Ping
6 months
This is why it's important to value the publication of negative results. The Megatron-LM paper investigates the scalability of the BERT model, as detailed in Tables 4 and 5. It empirically demonstrates that accuracies for downstream tasks tend to plateau.
@srush_nlp
Sasha Rush
6 months
Lazy twitter: A common question in NLP class is "if xBERT worked well, why didn't people make it bigger?" but I realize I just don't know the answer. I assume people tried but that a lot of that is unpublished. Is the theory that denoising gets too easy for big models?
46
41
481
1
0
11
@_weiping
Wei Ping
4 years
We use hyperlinks from online conversations as a source of knowledge, augmenting dialog history to create higher quality text generation. The output from our 8.3B parameter model is difficult to distinguish from true conversation in human evaluations!
@_weiping
Wei Ping
4 years
Introducing Local Knowledge Powered Conversational Agents Paper: By @sazoo_nlp @wpingnet @TheRealRPuri @MohammadShoeybi @MostofaPatwary @ctnzr
0
4
13
0
3
10
@_weiping
Wei Ping
6 months
Knowing the right thing to do but not adhering to it makes us no different from those who are clueless, in the eyes of the world.
0
0
8
@_weiping
Wei Ping
6 months
Skip pre-training and go directly to SFT.
@thegautamkamath
Gautam Kamath
6 months
NeurIPS 2024 will have a track for papers from high schoolers.
78
90
599
0
1
8
@_weiping
Wei Ping
5 months
@JefferyTatsuya Yes, the maximum context is 8k for now. If your documents are longer than 8k, you can use the uploaded retriever to obtain the top-5 or top-10 retrieved contexts, which have proven to be highly effective.
1
1
6
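The workflow described in the reply above, retrieving top-5 or top-10 chunks when a document exceeds the 8k context window, can be sketched as a cosine-similarity ranking over pre-computed chunk embeddings. This is an illustrative sketch, not the released retriever; `top_k_chunks` and its arguments are hypothetical names.

```python
import numpy as np

def top_k_chunks(chunk_embs, query_emb, k=5):
    # Rank pre-embedded document chunks by cosine similarity to the
    # query embedding and return the indices of the top-k, best first.
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    return np.argsort(c @ q)[::-1][:k].tolist()
```

The selected chunks are then concatenated into the model's prompt in place of the full document.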
@_weiping
Wei Ping
3 months
As an open model, DeepSeek-Coder-V2 by @deepseek_ai is truly amazing at coding and math. I really enjoyed reading their series of reports, which provide extensive technical detail 🫡
@lmsysorg
lmsys.org
3 months
[Chatbot Arena Update] We are excited to launch Math Arena and Instruction-Following (IF) Arena! Math/IF are the two key domains testing models’ logical skills & real-world tasks. Key findings: - Stats: 500K IF votes (35%), 180K Math votes (13%) - Claude 3.5 Sonnet is now #1
8
87
425
0
0
6
@_weiping
Wei Ping
2 years
I tend to think both RL and the Gumbel trick can work well if they are well executed w/ a reward function trained on massive human annotations. However, straight-through/Gumbel gives a less noisy but biased gradient, so RL could win in the long run (my guess)
@zdhnarsil
Dinghuai Zhang 张鼎怀
2 years
I don't understand why "RLHF" even needs RL? The reward function is a learned neural network and thus white-box. This means we could simply use straight through estimater (or Gumbel trick) to obtain a much better gradient. (context: my understanding is from InstructGPT paper)
27
75
760
0
0
6
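To make the straight-through/Gumbel point above concrete, here is a hedged numpy sketch of a Gumbel-softmax sample with a hard (straight-through) forward pass. In an autograd framework the backward pass would use the soft sample's gradient, which is exactly what makes the estimator low-variance but biased; numpy has no autograd, so that part is only noted in comments.

```python
import numpy as np

def gumbel_softmax_st(logits, rng, tau=1.0):
    # Sample Gumbel(0, 1) noise and relax the categorical draw via softmax.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + g) / tau
    y_soft = np.exp(z - z.max())
    y_soft /= y_soft.sum()
    # Straight-through: the forward pass emits the one-hot argmax, while an
    # autograd framework would route gradients through y_soft instead.
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0
    return y_hard, y_soft
```

Lowering `tau` makes the soft sample closer to one-hot (less bias, more variance), which is the knob being traded off against RL's unbiased but noisy gradient.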
@_weiping
Wei Ping
6 months
Flowers will eventually bloom as people work hard and maintain great patience. 静待花开终有时 (wait patiently and the flowers will bloom in time). 🫡
0
0
5
@_weiping
Wei Ping
2 years
NO forms! Apache license!👍
@YiTayML
Yi Tay
2 years
New open source Flan-UL2 20B checkpoints :) - Truly open source 😎 No forms! 🤭 Apache license 🔥 - Best OS model on MMLU/Big-Bench hard 🤩 - Better than Flan-T5 XXL & competitive to Flan-PaLM 62B. - Size ceiling of Flan family just got higher! Blog:
51
343
2K
0
0
5
@_weiping
Wei Ping
6 months
@ravithejads
Ravi Theja
9 months
💡ChatQA: Building GPT-4 Level Conversational QA Models Builds ChatQA conversational QA models, that obtain OpenAI GPT-4 level accuracies. 🌟 Key Contributions: 1️⃣ Two-Stage Instruction Tuning Method: - Stage 1: Supervised Fine-Tuning on a mix of instruction-following and
3
15
63
0
0
4
@_weiping
Wei Ping
6 months
8x22b, mixtral strikes again 🤯
@MistralAI
Mistral AI
6 months
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%%3A1337%2Fannounce&tr=http%3A%2F%%3A1337%2Fannounce
272
821
6K
0
0
3
@_weiping
Wei Ping
4 years
@lucidrains 1) Architecture. DiffWave uses compact WaveNet-like architecture to support both conditional & uncond generation. WaveGrad uses Upsample and Downsample blocks for mel-spectrogram inputs. 2) In addition to neural vocoding, we tackle the challenging unconditional generation task.
0
0
3
@_weiping
Wei Ping
2 years
ChatGPT vs. its retrieval-augmented version? 😄
@DrJimFan
Jim Fan
2 years
This is exactly why I like ChatGPT much more than Bing/Sydney. ChatGPT tells the truth. 🤣
22
17
353
0
0
3
@_weiping
Wei Ping
1 year
This can make us feel better: Teaching an LLM to say "I don't know" when it wants to hallucinate is quite a bit of work too
@ProfFeynman
Prof. Feynman
1 year
Illusion of knowledge is more dangerous than ignorance: It's Okay to say "I don't know" and admit that you don't know it. It's shameful to pretend that you know everything.
90
1K
5K
0
1
3
@_weiping
Wei Ping
2 months
@_akhaliq Thanks for sharing our work!
0
0
2
@_weiping
Wei Ping
2 years
Link of the recorded talk:
0
0
0
@_weiping
Wei Ping
4 years
@sedielem I like this table! It helps me sort out all this work👍
0
0
2
@_weiping
Wei Ping
3 months
@AkariAsai Looking forward to it! We retrieved 1.2T tokens in InstructRetro and are eager to see the results of further scaling. Do you have a timeline :)
1
0
1
@_weiping
Wei Ping
16 days
@Yuchenj_UW @DrJimFan We didn't find the checkpoint of Qwen2 VL 72B when we were trying to evaluate and compare last month. It seems it was just released now:
1
0
1
@_weiping
Wei Ping
4 years
@TomKenter @manish1765 Is the preprint available? Can't wait :)
2
0
1
@_weiping
Wei Ping
3 months
1
0
1
@_weiping
Wei Ping
4 years
@TechRonic9876 @BaiduResearch The audio samples are in:
0
0
1
@_weiping
Wei Ping
6 months
@srush_nlp BERT is trained using a denoising auto-encoding objective. I vaguely recall there being some established connections between denoising auto-encoding and lossy compression e.g.,
1
0
1
@_weiping
Wei Ping
6 months
@Francis_YAO_ Not to mention, some open weights models are released with benchmark numbers but without prompts😅
0
0
1
@_weiping
Wei Ping
6 months
@WenhuChen good design!
1
0
1
@_weiping
Wei Ping
6 months
@jefffhj Congratulations, Dr. Huang!
1
0
1
@_weiping
Wei Ping
4 years
@heiga_zen Same here! I really like the theme of the story: family. My son is only 2 yrs old. I guess I still need to wait a couple of years before he can enjoy the movie :)
0
0
1
@_weiping
Wei Ping
7 months
@mkamp The query and context encoders are initialized from an embedding model, e.g., E5 or the Dragon retriever, then fine-tuned on a conversational QA dataset. This is separate from LLM instruction tuning.
0
0
1
@_weiping
Wei Ping
6 months
@xiangyue96 interesting work!
0
0
1
@_weiping
Wei Ping
2 years
@r9y9 Congratulations!!
1
0
1
@_weiping
Wei Ping
6 months
@sharan0909 can't wait to read the research paper
0
0
1
@_weiping
Wei Ping
16 days
0
0
1
@_weiping
Wei Ping
3 months
@huybery Thanks for sharing👍
0
0
1
@_weiping
Wei Ping
4 years
@r9y9 I see. 論文 could be paper or dissertation. Google always translates "論文を書" to "write a dissertation". This is a case where NMT needs to infer the intention behind the text :D
0
0
1
@_weiping
Wei Ping
6 months
0
0
1
@_weiping
Wei Ping
4 months
@raulkite_ @JagersbergKnut we didn't test multilingual performance. Will work on it
0
0
1
@_weiping
Wei Ping
4 years
@heiga_zen 鬼滅の刃 is one of my favorite anime. I will watch it with my son when he grows up😀
1
0
1
@_weiping
Wei Ping
2 months
@aaron_lou congrats Aaron!
1
0
1
@_weiping
Wei Ping
2 years
@heiga_zen Thank you Heiga!
0
0
1