Shitian Zhao

@zst96687522

Followers
472
Following
627
Statuses
386

Looking for a CS PhD position in Fall 2025. Researcher @ Shanghai AI Lab @opengvlab. Bachelor @ ECNU @ECNUER. Previously intern @ CCVL @JohnsHopkins

Shanghai, China
Joined April 2021
@zst96687522
Shitian Zhao
6 months
Thanks AK for posting our work!
@_akhaliq
AK
6 months
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

paper page:

We present Lumina-mGPT, a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions. Unlike existing autoregressive image generation approaches, Lumina-mGPT employs a pretrained decoder-only transformer as a unified framework for modeling multimodal token sequences. Our key insight is that a simple decoder-only transformer with multimodal Generative PreTraining (mGPT), utilizing the next-token prediction objective on massive interleaved text-image sequences, can learn broad and general multimodal capabilities, thereby illuminating photorealistic text-to-image generation. Building on these pretrained models, we propose Flexible Progressive Supervised Finetuning (FP-SFT) on high-quality image-text pairs to fully unlock their potential for high-aesthetic image synthesis at any resolution while maintaining their general multimodal capabilities. Furthermore, we introduce Omnipotent Supervised Finetuning (Omni-SFT), transforming Lumina-mGPT into a foundation model that seamlessly achieves omnipotent task unification. The resulting model demonstrates versatile multimodal capabilities, including visual generation tasks like flexible text-to-image generation and controllable generation, visual recognition tasks like segmentation and depth estimation, and vision-language tasks like multiturn visual question answering. Additionally, we analyze the differences and similarities between diffusion-based and autoregressive methods in a direct comparison.
0
0
6
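The mGPT idea above can be sketched in a few lines: image codes (e.g. from a VQ tokenizer) are shifted into the same vocabulary as text tokens, the two streams are interleaved into one sequence, and a decoder-only transformer trains with plain next-token prediction. This is an illustrative toy, not the authors' code; the token ids, vocabulary offset, and `BOI`/`EOI` markers are assumptions.

```python
# Hedged sketch of multimodal Generative PreTraining (mGPT) data layout.
# All ids below are illustrative assumptions, not the paper's actual values.

TEXT = {"a": 1, "cat": 2}   # toy text vocabulary
IMG_OFFSET = 1000           # assumed offset shifting image codes into the shared vocab
BOI, EOI = 998, 999         # assumed begin-/end-of-image marker tokens

def interleave(text_ids, image_codes):
    """Build one shared-vocabulary sequence: text tokens, then the image block."""
    return text_ids + [BOI] + [c + IMG_OFFSET for c in image_codes] + [EOI]

def ntp_pairs(seq):
    """Next-token prediction targets: predict token t+1 from prefix ending at t."""
    return list(zip(seq[:-1], seq[1:]))

seq = interleave([1, 2], [0, 5])   # "a cat" followed by a 2-code image
pairs = ntp_pairs(seq)             # training pairs for a decoder-only transformer
```

Because text and image tokens share one sequence and one objective, the same model can be finetuned for generation, recognition, or dialogue without architectural changes, which is what FP-SFT and Omni-SFT exploit.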
@zst96687522
Shitian Zhao
1 hour
RT @DanHendrycks: We're releasing EnigmaEval, a collection of long, complex reasoning challenges that take groups of people many hours or d…
0
115
0
@zst96687522
Shitian Zhao
12 hours
RT @_jasonwei: We do not rise to the power of our RL optimization algorithms—we fall to the hackability of our RL environment
0
21
0
@zst96687522
Shitian Zhao
2 days
RT @largemodelgame: [1/N] LLM evaluations can be done while you are playing live computer games. 🤯 We are excited to announce our game: AI…
0
19
0
@zst96687522
Shitian Zhao
2 days
RT @_akhaliq: ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
0
27
0
@zst96687522
Shitian Zhao
2 days
RT @lockonlvange: Introducing CodeI/O, a systematic way to condense diverse reasoning patterns via code input-out…
0
48
0
@zst96687522
Shitian Zhao
2 days
RT @PointsCoder: Can Vision-Language Models (VLMs) truly understand the physical world? 🌍🔬 Introducing PhysBench – the first benchmark to…
0
71
0
@zst96687522
Shitian Zhao
2 days
RT @aclmentorship: 📢 Join us for the ACL Mentorship Session on Zoom! Session Link: Questions:
0
11
0
@zst96687522
Shitian Zhao
4 days
RT @_akhaliq: DeepScaleR-1.5B-Preview, an open-source 1.5B-parameter model trained with RL to surpass o1-preview for general math reasoning…
0
132
0
@zst96687522
Shitian Zhao
13 days
RT @SharonYixuanLi: How should we assign rewards to intermediate steps in reasoning? DeepSeek-R1 paper highlights it as an open challenge.…
0
59
0
@zst96687522
Shitian Zhao
13 days
RT @GaoyueZhou: Can we extend the power of world models beyond just online model-based learning? Absolutely! We believe the true potential…
0
100
0
@zst96687522
Shitian Zhao
14 days
Tülu 3 gives so many details about RL with Verifiable Rewards in its technical report. You can't miss it.
0
0
2
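The "Verifiable Reward RL" the tweet refers to can be sketched simply: instead of a learned reward model, the policy earns reward 1.0 only when its final answer can be checked programmatically (e.g. exact match against a ground-truth math answer), else 0.0. This is a minimal illustration of the general idea, not the Tülu 3 implementation; `extract_final_answer` and its "Answer:" marker are hypothetical helpers.

```python
# Minimal sketch of a binary verifiable reward (illustrative, not Tülu 3's code).

def extract_final_answer(completion: str) -> str:
    """Hypothetical helper: take the text after the last 'Answer:' marker."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    return completion[idx + len(marker):].strip() if idx != -1 else ""

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Reward 1.0 iff the extracted answer exactly matches the ground truth."""
    return 1.0 if extract_final_answer(completion) == gold_answer else 0.0

# A completion with a checkable correct answer earns full reward.
r = verifiable_reward("Think step by step... 2+2=4. Answer: 4", "4")
```

The appeal is that the reward cannot be gamed the way a learned reward model can: the checker either verifies the answer or it does not.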
@zst96687522
Shitian Zhao
14 days
The Real-Time "Canvas"
@ii_posts
Intelligent Internet
15 days
DeepSeek R1 is great. How do humans think with reasoning machines like R1? CoT-Lab represents @ii_posts's latest exploration at UBAI into cognitive partnership — where human intuition and AI reasoning become co-evolving thought partners. We enable collaboration with reasoning models: An interface to guide humans through AI's reasoning flow, actively reshape thought trajectories, and collectively elevate cognitive outcomes. True intelligence emerges not from artificial or human minds — but through their symbiotic dance. We're exploring the choreography for this new cognitive ballet. Like skilled dance partners, human and machine refine each other's moves.
0
0
0
@zst96687522
Shitian Zhao
14 days
RT @ideogram_ai: The Ideogram Text Tool is here. Add text, choose fonts, and customize colors. All within Ideogram Canvas. Premium graphic…
0
42
0
@zst96687522
Shitian Zhao
14 days
RT @allen_ai: Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of t…
0
391
0
@zst96687522
Shitian Zhao
15 days
RT @shaneguML: Iterated synthetic data + filtering + distillation is RL. Check for the data quality to avoid reward hacking. It's called re…
0
11
0
@zst96687522
Shitian Zhao
15 days
RT @Alibaba_Qwen: Announcing Qwen2.5-VL Cookbooks! 🧑‍🍳A collection of notebooks showcasing use cases of Qwen2.5-VL, include local model a…
0
525
0
@zst96687522
Shitian Zhao
16 days
RT @karpathy: For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments t…
0
835
0
@zst96687522
Shitian Zhao
16 days
RT @alsuhr: Check out our SWE-Gym RL environment: @jiayi_pirate @xingyaow_ @hengjinlp @YizheZhangNLP @gneubig
0
10
0
@zst96687522
Shitian Zhao
16 days
@bboczeng I can't even.
0
0
0