Shitian Zhao

@zst96687522

Followers
472
Following
627
Statuses
386

Looking for a CS PhD position in Fall 2025. Researcher @ Shanghai AI Lab @opengvlab. Bachelor @ ECNU @ECNUER. Previously intern @ CCVL @JohnsHopkins

Shanghai, China
Joined April 2021
@zst96687522
Shitian Zhao
6 months
Thanks AK for posting our work!
@_akhaliq
AK
6 months
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

paper page:

We present Lumina-mGPT, a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions. Unlike existing autoregressive image generation approaches, Lumina-mGPT employs a pretrained decoder-only transformer as a unified framework for modeling multimodal token sequences. Our key insight is that a simple decoder-only transformer with multimodal Generative PreTraining (mGPT), utilizing the next-token prediction objective on massive interleaved text-image sequences, can learn broad and general multimodal capabilities, thereby illuminating photorealistic text-to-image generation. Building on these pretrained models, we propose Flexible Progressive Supervised Finetuning (FP-SFT) on high-quality image-text pairs to fully unlock their potential for high-aesthetic image synthesis at any resolution while maintaining their general multimodal capabilities. Furthermore, we introduce Omnipotent Supervised Finetuning (Omni-SFT), transforming Lumina-mGPT into a foundation model that seamlessly achieves omnipotent task unification. The resulting model demonstrates versatile multimodal capabilities, including visual generation tasks like flexible text-to-image generation and controllable generation, visual recognition tasks like segmentation and depth estimation, and vision-language tasks like multiturn visual question answering. Additionally, we analyze the differences and similarities between diffusion-based and autoregressive methods in a direct comparison.
0
0
6
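The mGPT idea above can be sketched in a few lines: image codes (e.g. from a VQ tokenizer) are shifted into the same vocabulary as text tokens, the two streams are interleaved into one sequence, and a decoder-only transformer trains with plain next-token prediction. This is an illustrative toy, not the authors' code; the token ids, vocabulary offset, and `BOI`/`EOI` markers are assumptions.

```python
# Hedged sketch of multimodal Generative PreTraining (mGPT) data layout.
# All ids below are illustrative assumptions, not the paper's actual values.

TEXT = {"a": 1, "cat": 2}   # toy text vocabulary
IMG_OFFSET = 1000           # assumed offset shifting image codes into the shared vocab
BOI, EOI = 998, 999         # assumed begin-/end-of-image marker tokens

def interleave(text_ids, image_codes):
    """Build one shared-vocabulary sequence: text tokens, then the image block."""
    return text_ids + [BOI] + [c + IMG_OFFSET for c in image_codes] + [EOI]

def ntp_pairs(seq):
    """Next-token prediction targets: predict token t+1 from prefix ending at t."""
    return list(zip(seq[:-1], seq[1:]))

seq = interleave([1, 2], [0, 5])   # "a cat" followed by a 2-code image
pairs = ntp_pairs(seq)             # training pairs for a decoder-only transformer
```

Because text and image tokens share one sequence and one objective, the same model can be finetuned for generation, recognition, or dialogue without architectural changes, which is what FP-SFT and Omni-SFT exploit.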
@zst96687522
Shitian Zhao
1 hour
RT @DanHendrycks: We're releasing EnigmaEval, a collection of long, complex reasoning challenges that take groups of people many hours or d…
0
115
0
@zst96687522
Shitian Zhao
12 hours
RT @_jasonwei: We do not rise to the power of our RL optimization algorithms—we fall to the hackability of our RL environment
0
21
0
@zst96687522
Shitian Zhao
2 days
RT @largemodelgame: [1/N] LLM evaluations can be done while you are playing live computer games. 🤯 We are excited to announce our game: AI…
0
19
0
@zst96687522
Shitian Zhao
2 days
RT @_akhaliq: ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
0
27
0
@zst96687522
Shitian Zhao
2 days
RT @lockonlvange: Introducing CodeI/O, a systematic way to condense diverse reasoning patterns via code input-out…
0
48
0
@zst96687522
Shitian Zhao
2 days
RT @PointsCoder: Can Vision-Language Models (VLMs) truly understand the physical world? 🌍🔬 Introducing PhysBench – the first benchmark to…
0
71
0
@zst96687522
Shitian Zhao
2 days
RT @aclmentorship: 📢 Join us for the ACL Mentorship Session on Zoom! Session Link: Questions:
0
11
0
@zst96687522
Shitian Zhao
4 days
RT @_akhaliq: DeepScaleR-1.5B-Preview, an open-source 1.5B-parameter model trained with RL to surpass o1-preview for general math reasoning…
0
132
0
@zst96687522
Shitian Zhao
13 days
RT @SharonYixuanLi: How should we assign rewards to intermediate steps in reasoning? DeepSeek-R1 paper highlights it as an open challenge.…
0
59
0
@zst96687522
Shitian Zhao
13 days
RT @GaoyueZhou: Can we extend the power of world models beyond just online model-based learning? Absolutely! We believe the true potential…
0
100
0
@zst96687522
Shitian Zhao
14 days
Tülu 3 gives so many details about RL with Verifiable Rewards in its technical report. You can't miss it.
0
0
2
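The "Verifiable Reward RL" the tweet refers to can be sketched simply: instead of a learned reward model, the policy earns reward 1.0 only when its final answer can be checked programmatically (e.g. exact match against a ground-truth math answer), else 0.0. This is a minimal illustration of the general idea, not the Tülu 3 implementation; `extract_final_answer` and its "Answer:" marker are hypothetical helpers.

```python
# Minimal sketch of a binary verifiable reward (illustrative, not Tülu 3's code).

def extract_final_answer(completion: str) -> str:
    """Hypothetical helper: take the text after the last 'Answer:' marker."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    return completion[idx + len(marker):].strip() if idx != -1 else ""

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Reward 1.0 iff the extracted answer exactly matches the ground truth."""
    return 1.0 if extract_final_answer(completion) == gold_answer else 0.0

# A completion with a checkable correct answer earns full reward.
r = verifiable_reward("Think step by step... 2+2=4. Answer: 4", "4")
```

The appeal is that the reward cannot be gamed the way a learned reward model can: the checker either verifies the answer or it does not.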
@zst96687522
Shitian Zhao
14 days
The Real-Time "Canvas"
@ii_posts
Intelligent Internet
15 days
DeepSeek R1 is great. How do humans think with reasoning machines like R1? CoT-Lab represents @ii_posts's latest exploration at UBAI into cognitive partnership — where human intuition and AI reasoning become co-evolving thought partners. We enable collaboration with reasoning models: An interface to guide humans through AI's reasoning flow, actively reshape thought trajectories, and collectively elevate cognitive outcomes. True intelligence emerges not from artificial or human minds — but through their symbiotic dance. We're exploring the choreography for this new cognitive ballet. Like skilled dance partners, human and machine refine each other's moves.
0
0
0
@zst96687522
Shitian Zhao
14 days
RT @ideogram_ai: The Ideogram Text Tool is here. Add text, choose fonts, and customize colors. All within Ideogram Canvas. Premium graphic…
0
42
0
@zst96687522
Shitian Zhao
14 days
RT @allen_ai: Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of t…
0
391
0
@zst96687522
Shitian Zhao
15 days
RT @shaneguML: Iterated synthetic data + filtering + distillation is RL. Check for the data quality to avoid reward hacking. It's called re…
0
11
0
@zst96687522
Shitian Zhao
15 days
RT @Alibaba_Qwen: Announcing Qwen2.5-VL Cookbooks! 🧑‍🍳A collection of notebooks showcasing use cases of Qwen2.5-VL, include local model a…
0
525
0
@zst96687522
Shitian Zhao
16 days
RT @karpathy: For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments t…
0
835
0
@zst96687522
Shitian Zhao
16 days
RT @alsuhr: Check out our SWE-Gym RL environment: @jiayi_pirate @xingyaow_ @hengjinlp @YizheZhangNLP @gneubig
0
10
0
@zst96687522
Shitian Zhao
16 days
@bboczeng I can't even.
0
0
0