Qinqing Zheng @qqyuzu profile

Followers: 472

Following: 45

Statuses: 47

Reinforcement Learning, Generative Modeling @ FAIR (@AIatMeta). PhD @UChicago.

New York, NY

Joined February 2022

Qinqing Zheng

@qqyuzu

4 months

Introducing Dualformer: a new model that integrates fast and slow thinking! By learning with randomized reasoning traces, Dualformer offers both quick response and enhanced performance with more succinct CoTs. w/ Andy Mike @tesatory @tydsh

5

28

174

Qinqing Zheng

@qqyuzu

7 days

Hanlin's Twitter account: @zhuhl98

0

Qinqing Zheng

@qqyuzu

7 days

RT @_akhaliq: Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

0

16

0

Qinqing Zheng

@qqyuzu

2 months

RT @HenaffMikael: We share our code - excited to see what people build with this! Many thanks to @qqyuzu @adityagrover_ @yayitsamyzhang @br…

0

1

0

Qinqing Zheng

@qqyuzu

2 months

ONI offers concurrent policy training & reward synthesis, a good fit for long-horizon, sparse-reward problems! I also believe it has great potential to be extended to multimodal inputs and complex planning/reasoning environments!

Brandon Amos

@brandondamos

2 months

🤔 How to extract knowledge from LLMs to train better RL agents? 📚 Our new paper (with @qqyuzu @HenaffMikael @yayitsamyzhang @adityagrover_ ) studies LLM-driven rewards for NetHack! Paper: Code:

0

2

15

Qinqing Zheng

@qqyuzu

4 months

RT @sirbayes: Excited to share our new paper on "Diffusion Model Predictive Control" (D-MPC). Key idea: leverage diffusion models to learn…

0

72

0

Qinqing Zheng

@qqyuzu

4 months

RT @tydsh: 🚀🎯Dualformer, our simple yet novel training paradigm that leads to 1️⃣ Emergent behaviors of automatic switching between system…

0

41

0

Qinqing Zheng

@qqyuzu

8 months

RT @tungnd_13: Introducing LICO, a new black-box optimizer for arbitrary domains (including non-textual) based on large language models. A…

0

15

0

Qinqing Zheng

@qqyuzu

1 year

These works, including ours, are motivated by different perspectives and arrive at different algorithms. The surge of work applying diffusion models (DM) to world modeling (most of the works above appeared in the last 3 months) shows the great potential of this research direction. (8/n)

1

0

2

Qinqing Zheng

@qqyuzu

1 year

Our method is on par with SOTA methods, eliminating the performance gap between model-based and model-free methods for offline RL. [4/n]

0

2

Qinqing Zheng

@qqyuzu

1 year

Takeaway: Sequence modeling tools like diffusion models and Transformers are better alternatives to one-step dynamics models, and diffusion models are even better than Transformers because they eliminate the autoregressive structure. [3/n]

0

4

Qinqing Zheng

@qqyuzu

1 year

In particular, we propose Diffusion-Model-Based Value Expansion, which uses DWM-generated sequences to facilitate value estimation. [3/n]

1

0

3