Ying Shan Profile
Ying Shan

@yshan2u

Followers: 1,530
Following: 608
Media: 190
Statuses: 950

Distinguished Scientist @TencentGlobal, Founder of PCG ARC Lab, Director of AI Lab Visual Computing. Formerly @Microsoft, @MSFTResearch. Views are my own.

Joined June 2014
Pinned Tweet
@yshan2u
Ying Shan
25 days
I’m attending CVPR next week! The ARC and AI Lab teams will be presenting quite a few papers on-site, including the ones listed below. Feel free to DM me if you'd like to chat. Looking forward to engaging discussions! 🤝✨🎉
@yshan2u
Ying Shan
3 months
Our CVPR24 highlights: SmartEdit: Exploring Complex Instruction-based Image Editing with LLMs; Programmable Motion Generation for Open-set Motion Control Tasks; HumanGaussian: Text-Driven 3D Human Generation with GS. Turns out there's no overlap with the ones listed earlier😆😆
0
4
28
1
4
34
@yshan2u
Ying Shan
3 months
Announcing Mira: A glimpse into the world of Sora, providing insights through open-sourced resources including MiraData (training samples), MiraDiT (the model), and code, all aimed at fostering collaboration and accelerating innovation in this promising field. 🩷🩷 Project Page:
@xinntao
Xintao Wang
3 months
🎉We are exploring the #Mira project~
- Built a long video dataset #MiraData with structured captions.
- Trained #MiraDiT to explore the consistency in long video generation.
Hope it will be a supplement to existing text-to-video methods. Project Page:
3
28
121
3
24
116
@yshan2u
Ying Shan
3 months
CustomNet demo is live: identity-preserving object placement with controllable viewpoints, location, and background. Feel free to give it a try!
@Gradio
Gradio
3 months
🎉Tencent's CustomNet official demo is on Spaces
🌟A unified encoder-based framework for object customization in text-to-image diffusion models
🌟Incorporates 3D view synthesis capabilities
🌟Adjusts spatial positions and viewpoints
🌟Preserves the object's identity effectively
1
14
60
3
18
111
@yshan2u
Ying Shan
4 months
Excited that our PhotoMaker, YOLO-World, VideoCrafter2, SEED-Bench, DreamAvatar, EvalCrafter, & GS-IR etc. made it into #CVPR2024! An affirmation of unity across academia, open source, and industry. Big thanks to our teams & collaborators! 🎉🎉
9
8
95
@yshan2u
Ying Shan
2 months
An image-to-texture dreamer, clearly explained🚀🚀
@jbhuang0604
Jia-Bin Huang
2 months
Showcasing the capabilities of TextureDreamer (#CVPR2024)! ... include my favorite results of transferring Rirakkuma 🧸 to all types of shapes. Full explainer video:
0
20
124
0
12
75
@yshan2u
Ying Shan
1 month
Introducing Mani-GS: 3DGS editing made easy, through a triangular mesh with self-adaptation🚀🚀 Project page: Paper:
@zhenjun_zhao
Zhenjun Zhao
1 month
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, @yshan2u, Long Quan tl;dr: use a triangular mesh to manipulate 3DGS directly with self-adaptation
0
6
36
0
18
68
@yshan2u
Ying Shan
6 months
@zipengfu The hardware is definitely impressive! However, it's essential to state upfront in both the main post and video that the robot is teleoperated (operated by a human). This will prevent unrealistic expectations and avoid confusion with autonomous robots.
4
4
69
@yshan2u
Ying Shan
3 months
LiDAR to Gaussian Splatting: claimed centimeter-level accuracy for both indoor and outdoor scanning
@RadianceFields
Radiance Fields
3 months
LiDAR to Gaussian Splatting, Lixel CyberColor, from @XGRIDS2023 has been announced. ✨ LiDAR to Gaussian Splatting 📏 CM level precision 🥽 Compatible with Apple Vision Pro 🤝 Compatible with XGRIDs scanning suite 🔗
2
33
145
0
9
66
@yshan2u
Ying Shan
1 month
A new benchmark for Video MLLMs📏📐
@_akhaliq
AK
1 month
Video-MME The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus
2
38
121
0
17
68
@yshan2u
Ying Shan
5 months
Thanks @_akhaliq for featuring! The survey on 3D Model Generation encompasses 436 papers on the latest advancements. Hope it's helpful!🌟📷 Thanks to the team: Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, @yanpei_cao, @yshan2u
@_akhaliq
AK
5 months
Advances in 3D Generation: A Survey paper page: Generating 3D models lies at the core of computer graphics and has been the focus of decades of research. With the emergence of advanced neural representations and generative models, the field of 3D content
1
64
281
1
9
64
@yshan2u
Ying Shan
1 month
🚀🚀 ARC Lab is hiring Junior Researchers with:
🩷a recent Ph.D. in 2D/3D generative AI
🩷3-5+ top conference/journal papers
🩷500+ GitHub stars
🩷a "make it happen" mindset
Feel free to DM me! Website: Recent work: links below
7
5
65
@yshan2u
Ying Shan
3 months
Introducing BrushNet: it fills in anywhere with precision and coherence, and works with any frozen DM (diffusion model)! Code, paper, and demo available! cc: @juxuan_27, @AlvinLiu27, @xinntao, Yuxuan Bian, @yshan2u, Qiang Xu
@xinntao
Xintao Wang
3 months
#BrushNet A plug-and-play inpainting/outpainting model Please try it out: Codes and models are released: Thanks to co-authors @juxuan_27, @AlvinLiu27, Yuxuan Bian, @yshan2u, Qiang Xu
1
8
41
1
14
65
@yshan2u
Ying Shan
3 months
We've just launched InstantMesh, our latest addition to the Image-to-3D family, arguably one of the best open-source models to date based on our tests😉. Feel free to try it out.🚀 CC: Jiale Xu, Weihao Cheng, Yiming Gao, @xinntao, Shenghua Gao, @yshan2u
@xinntao
Xintao Wang
3 months
#InstantMesh 🎉, an image-to-3D mesh generation method that works from a single image within 10 seconds. It incorporates mesh-based optimization, better training efficiency, and scalability, allowing explicit geometric supervision. Codes: Demo:
3
33
178
1
6
58
@yshan2u
Ying Shan
6 months
Thanks @_akhaliq for sharing! #PhotoMaker creates images with a customized person and style in just seconds.
@_akhaliq
AK
6 months
Tencent just released PhotoMaker Customizing Realistic Human Photos via Stacked ID Embedding demo realistic: demo style:
23
187
982
5
4
50
@yshan2u
Ying Shan
5 months
Thanks @_akhaliq for sharing! This is for 3D scene editing with better control. Thanks to the team: Jingyu Zhuang, Di Kang, @yanpei_cao, Guanbin Li, Liang Lin, @yshan2u
@_akhaliq
AK
5 months
Tencent presents TIP-Editor An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts paper page: Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still
2
85
391
2
8
49
@yshan2u
Ying Shan
3 months
SAM is able to simplify complex low-level vision tasks. In this case, adapting SAM to take flow as input, or using flow as a segmentation prompt, outperforms all previous approaches by a significant margin on both single- and multi-object benchmarks.
@dreamingtulpa
Dreaming Tulpa 🥓👑
3 months
SAM + Optical Flow = FlowSAM FlowSAM can discover and segment moving objects in a video and outperforms all previous approaches by a considerable margin in both single and multi-object benchmarks 🔥
10
227
1K
0
4
47
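The "flow as segmentation prompt" idea in the tweets above can be illustrated with off-the-shelf parts. Below is a minimal sketch, assuming the official segment-anything package and OpenCV: it computes dense optical flow between two frames and prompts a vanilla SAM at the point of strongest motion. FlowSAM itself goes further (e.g., adapting SAM to take flow as input), so this only illustrates the prompting half, and the checkpoint path is a placeholder.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

def segment_strongest_mover(prev_rgb, next_rgb, checkpoint="sam_vit_h.pth"):
    # Dense optical flow between consecutive frames (Farneback, OpenCV).
    prev_gray = cv2.cvtColor(prev_rgb, cv2.COLOR_RGB2GRAY)
    next_gray = cv2.cvtColor(next_rgb, cv2.COLOR_RGB2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    # Take the pixel with the largest motion as a positive point prompt.
    magnitude = np.linalg.norm(flow, axis=-1)
    y, x = np.unravel_index(np.argmax(magnitude), magnitude.shape)

    # Prompt an unmodified SAM at that point; keep the best-scoring mask.
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(next_rgb)
    masks, scores, _ = predictor.predict(point_coords=np.array([[x, y]]),
                                         point_labels=np.array([1]))
    return masks[np.argmax(scores)]
```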
@yshan2u
Ying Shan
6 months
Thanks to @_akhaliq for sharing! #VideoCrafter2 is NOW open-sourced - featuring improved visual quality, motion and concept combos! Feel free to try it out!🌐✨ Demo: Project Page: Thanks to the team: @haoxin_chen @Norris29973102
@_akhaliq
AK
6 months
Tencent presents VideoCrafter2 Overcoming Data Limitations for High-Quality Video Diffusion Models paper page: Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate
1
30
149
0
8
46
@yshan2u
Ying Shan
6 months
Thanks to @_akhaliq for sharing. M2UGen is our first attempt to unify music understanding and generation with LLMs. #GenerativeAI
@_akhaliq
AK
6 months
Tencent and NUS release M2UGen Multi-modal Music Understanding and Generation with the Power of Large Language Models demo: The M2UGen model is a Music Understanding and Generation model that is capable of Music Question Answering and also Music
6
69
300
0
15
44
@yshan2u
Ying Shan
5 months
Cool video editing demo! Thanks @ptsi for putting this together! cc: @skalskip92 @tiahch @ge_yixiao @XinggangWang @yshan2u
@ptsi
Philipp Tsipman
5 months
So.. full video editing using AI 📹🎨 is still a bit away. But we are now closer! 🔥🔥 We just built a demo for @CamcorderAI that lets you crop (rotobrush) any object out of a video just by using your words! 😁 What do you think - should we make it into a full tool? 🧵
24
33
256
1
10
42
@yshan2u
Ying Shan
20 days
Introducing Open-MAGVIT2: an open-source effort investigating and advancing the lookup-free visual tokenizer with large codebooks. From SEED (discrete) to SEED-X (continuous), we keep exploring the frontier of MLLMs and sharing our progress along the way.
@ge_yixiao
Yixiao Ge
21 days
MAGVIT2 is a leading visual tokenizer, but hasn't been officially open-sourced. Existing reproductions lack complete codes and checkpoints. We did this! 🔥 We keep iterating on the codebase and welcome collaboration on the Open-MAGVIT2 plan. 🤗
1
29
146
0
9
42
@yshan2u
Ying Shan
3 months
Run BrushNet locally on your machine with 1 click!
@cocktailpeanut
cocktail peanut
3 months
Segment and Edit Anything, on your Local Computer. The BrushNet Gradio app lets you select some points in an image to segment items, and replace them with ANYTHING you want. Pure magic. And now, run locally on your machine with 1 click. Works on all OS (Windows, Mac, Linux)
13
41
197
0
3
40
@yshan2u
Ying Shan
3 months
TIP-Editor accepted as a SIGGRAPH-2024 Journal paper: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts🎉🎉 Congrats: @JjZhuang26958, Di Kang, @yanpei_cao, Guanbin Li, Liang Lin, @yshan2u
@_akhaliq
AK
5 months
Tencent presents TIP-Editor An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts paper page: Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still
2
85
391
1
6
38
@yshan2u
Ying Shan
4 months
Study reveals Sora's stunning geometrical consistency!
@_akhaliq
AK
4 months
Sora Generates Videos with Stunning Geometrical Consistency The recently developed Sora model [1] has exhibited remarkable capabilities in video generation, sparking intense discussions regarding its ability to simulate real-world phenomena. Despite its growing popularity, there
6
49
268
0
6
33
@yshan2u
Ying Shan
3 months
Thanks @_akhaliq, @liuziwei7! Glad to see our I2V DynamiCrafter at Top-1 and T2V VideoCrafter2 at Top-3 on your VBench leaderboard.
@liuziwei7
Ziwei Liu
3 months
📢VBench now Supports I2V Eval📢 📊 #VBench now supports the multi-dimensional evaluation of Image-to-Video (I2V) models 🏆 #DynamiCrafter and #SVD are among the top models - Code: - Leaderboard @huggingface : . Thanks to @_akhaliq !
2
13
41
0
5
33
@yshan2u
Ying Shan
6 months
🚀 A stellar 2023 at ARC Lab & AI Lab Visual Computing! Proud of our impactful work in T2IAdapter, VideoCrafter, Tune-a-video, FateZero, SEED-Bench, Dream3D, SadTalker etc. [see links], all open sourced. Big thanks to our teams and collaborators. Looking forward to an even more
0
4
34
@yshan2u
Ying Shan
1 month
Adding "parts" to 3D generation🚀🚀
@AnySyn3D
AnySyn3D
1 month
✨The rapid progress in 3D generation is impressive, but generated meshes often lack structure. We integrate *parts* into the reconstruction process, enhancing segmentation, structural distinction, and shape editing! Project Page: #SIGGRAPH2024 #AIGC
0
14
68
1
5
34
@yshan2u
Ying Shan
2 months
Text-to-Video is in the GenAI Arena! Collecting votes at:
@WenhuChen
Wenhu Chen
2 months
We are happy to integrate "text-to-video" into GenAI arena. Currently, we support six open-source video generation models. Please help us vote to create the video leaderboard! For the "text-to-image" arena, Playground V2 and V2.5 @playground_ai are leading
3
6
52
0
5
32
@yshan2u
Ying Shan
3 months
Our CVPR24 highlights: SmartEdit: Exploring Complex Instruction-based Image Editing with LLMs; Programmable Motion Generation for Open-set Motion Control Tasks; HumanGaussian: Text-Driven 3D Human Generation with GS. Turns out there's no overlap with the ones listed earlier😆😆
@yshan2u
Ying Shan
4 months
Excited that our PhotoMaker, YOLO-World, VideoCrafter2, SEED-Bench, DreamAvatar, EvalCrafter, & GS-IR etc. made it into #CVPR2024! An affirmation of unity across academia, open source, and industry. Big thanks to our teams & collaborators! 🎉🎉
9
8
95
0
4
28
@yshan2u
Ying Shan
3 months
Thanks @_akhaliq for featuring! SEED-X is a unified MLLM designed for both real-world understanding and generation tasks, with competitive results. Feel free to try it out! Project page: CC: @tttoaster_, Sijie Zhao, Jinguo Zhu, @ge_yixiao, Kun Yi, Lin
@_akhaliq
AK
3 months
SEED-X Multimodal Models with Unified Multi-granularity Comprehension and Generation The rapid evolution of multimodal foundation models has demonstrated significant progress in vision-language understanding and generation, e.g., our previous work SEED-LLaMA. However,
3
13
72
0
2
26
@yshan2u
Ying Shan
5 months
Thanks @_akhaliq for featuring! DynamiCrafter is a major upgrade to our image-to-video model.🚀 Echoing recent improvements in our text-to-video model, VideoCrafter2, the new model significantly improves motion, resolution, and coherence. 💡 Team credit: @Double47685693,
@_akhaliq
AK
5 months
DynamiCrafter Demo: model: Animating Open-domain Images with Video Diffusion Priors
9
74
314
1
2
23
@yshan2u
Ying Shan
5 months
Egocentric multimodal open dataset!
@_akhaliq
AK
5 months
Meta announces Aria Everyday Activities Dataset present Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor
11
90
469
0
1
19
@yshan2u
Ying Shan
6 months
@_akhaliq
AK
6 months
Tencent released MotionCtrl for Stable Diffusion Video A Unified and Flexible Motion Controller for Video Generation demo: API docs: MotionCtrl can independently control complex camera motion and object motion of generated
5
94
436
0
4
20
@yshan2u
Ying Shan
4 months
Multi-Concept Composition made easy!
@_akhaliq
AK
4 months
Gen4Gen Generative Data Pipeline for Generative Multi-Concept Composition Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This
5
65
241
0
2
21
@yshan2u
Ying Shan
6 months
Thanks to @_akhaliq for sharing! #VideoCrafter2 is NOW open-sourced - featuring improved visual quality, motion and concept combos! Feel free to try it out!🌐📷 Project Page: Thanks to the team: @haoxin_chen @Norris29973102 @shadocun @RichardXia101
@_akhaliq
AK
6 months
Tencent just released VideoCrafter2 demo on Hugging Face high quality text to video model demo: Overcoming Data Limitations for High-Quality Video Diffusion Models code, models and data are distributed under Apache 2.0 License
1
22
98
0
3
20
@yshan2u
Ying Shan
4 months
Thanks @_akhaliq for the update! DynamiCrafter applied to frame interpolation and looping video generation, check it out! cc: @Double47685693, @Norris29973102, @xinntao
@_akhaliq
AK
4 months
Tencent announces DynamiCrafter update 𝐠𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐟𝐫𝐚𝐦𝐞 𝐢𝐧𝐭𝐞𝐫𝐩𝐨𝐥𝐚𝐭𝐢𝐨𝐧 and 𝐥𝐨𝐨𝐩𝐢𝐧𝐠 video generation model weights (320x512) released
2
59
308
1
3
19
@yshan2u
Ying Shan
4 months
Appreciate the insights from the paper, but starting with Sora in the title might be misleading. 🤔😊
@_akhaliq
AK
4 months
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and
16
217
1K
1
0
17
@yshan2u
Ying Shan
3 months
My view of the world model, world simulator, etc., based on the original 'World Model' paper. Hoping this sheds some light on the subject, though it might cause more confusion. 😆😆
2
1
18
@yshan2u
Ying Shan
4 months
Claude-3 had 20,000+ votes in three days on Arena!
@lmsysorg
lmsys.org
4 months
🔥Exciting news from Arena @Anthropic 's Claude-3 Ranking is here!📈 Claude-3 has ignited immense community interest, propelling Arena to unprecedented traffic with over 20,000 votes in just three days! We're amazed by Claude-3's extraordinary performance. Opus is making history
26
111
657
0
3
18
@yshan2u
Ying Shan
1 month
A new T2V model ranked top-1 on VBench🚀🚀
@WenhuChen
Wenhu Chen
1 month
Thrilled to work with @JiachenLi11 to release T2V-Turbo, which is a very fast yet high-quality consistency model. With only 4 diffusion steps (5 seconds), it can obtain high-quality video. T2V-Turbo currently ranks first on VBench (), beating other
3
42
169
0
7
17
@yshan2u
Ying Shan
2 months
We will be presenting ScaleCrafter (spotlight), SEED, TapMo, DragonDiffusion (spotlight), and FreeNoise at ICLR-2024. You are very welcome to come by and chat with our presenters! #ICLR2024 #iclr
1
0
17
@yshan2u
Ying Shan
5 months
Thanks @_akhaliq for featuring. YOLO-World is for real-time open world detection! Thanks to the team and collaborators: Tianheng Cheng, Lin Song, @ge_yixiao, Wenyu Liu, @XinggangWang, @yshan2u
@_akhaliq
AK
5 months
Tencent presents YOLO-World Real-Time Open-Vocabulary Object Detection paper page: On LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed.
3
58
284
0
2
16
@yshan2u
Ying Shan
3 months
The Keynote on 𝐏𝐡𝐨𝐭𝐨𝐫𝐞𝐚𝐥𝐢𝐬𝐭𝐢𝐜 𝐀𝐈 𝐀𝐯𝐚𝐭𝐚𝐫𝐬 was well received🎉👍
@MattNiessner
Matthias Niessner
3 months
I gave a keynote at China3DV about our research on 𝐏𝐡𝐨𝐭𝐨𝐫𝐞𝐚𝐥𝐢𝐬𝐭𝐢𝐜 𝐀𝐈 𝐀𝐯𝐚𝐭𝐚𝐫𝐬. Since many people asked, I have uploaded the slides of my talk here (PDF version):
3
20
156
0
2
16
@yshan2u
Ying Shan
6 months
Thanks to @_akhaliq for sharing! EvalCrafter is our step towards tackling the challenge of video generation evaluation. It's designed to streamline the process for faster iterations, benefiting both our own development and hopefully the broader community. It's very much a work in
@_akhaliq
AK
6 months
Tencent released EvalCrafter Leaderboard on Hugging Face demo: Benchmarking and Evaluating Large Video Generation Models
3
16
81
0
2
16
@yshan2u
Ying Shan
6 months
Explore the magic of #PhotoMaker by ARC! ✨ Create images with a customized person and style in just seconds. 🎨 Try the Hugging Face demo NOW! 🚀 Thanks to @xinntao, @zhenli1031, and the team for making this happen, and to @osanseviero for sharing. 🙌
@xinntao
Xintao Wang
6 months
🥳 #PhotoMaker HuggingFace Gradio demo is ready. Try it out! Realistic version: Stylization version: Project Page: GitHub: Grateful to co-authors @zhenli1031 @yshan2u
6
56
229
1
1
16
@yshan2u
Ying Shan
6 months
Thrilled to witness the waves of ICLR acceptance posts! Great insights from each paper's crisp summary. Feels like folks will have a lot of fun in Vienna! @iclr_conf #ICLR2024 #ICLR 🌊📚🚀
1
0
15
@yshan2u
Ying Shan
3 months
InstantMesh in action!
@baptadn
Baptiste Adrien
3 months
Having fun with InstantMesh model on @replicate and @pmndrs react-three-rapier / @threejs 🪄 #react #r3f #threejs
15
144
1K
1
1
14
@yshan2u
Ying Shan
3 months
This is how plants move in 24 hours. A great source of training data for image-to-video models😆😆
@gunsnrosesgirl3
Science girl
3 months
This is how much plants move in 24 hours
72
481
3K
2
0
14
@yshan2u
Ying Shan
2 months
A super stable, high-res flying camera that responds to your gaze and head movements!
@BetterCallMedhi
Mehdi (e/flλ)
2 months
The quality of DJI drones is more and more impressive! This is a state-of-the-art FPV drone, where DJI seems to have integrated Rocksteady and Horizon stabilization to minimize camera vibrations and guarantee smooth footage even during
27
233
1K
0
3
13
@yshan2u
Ying Shan
2 months
As we promised, SEED-X is now open-sourced with the model checkpoint, training code for instruction tuning, and newly collected data for instructional image editing! Feel free to check out this link for more details:
@ge_yixiao
Yixiao Ge
2 months
Our model checkpoints, training code for instruction tuning, online demo, and newly collected data for instructional image editing have been fully open-sourced! 🔥 Welcome to cook with SEED-X models and data. 🤗
1
5
16
2
0
13
@yshan2u
Ying Shan
5 months
Thanks @_akhaliq for sharing. YOLO-World is for real-time open world detection! Thanks to the team and collaborators: Tianheng Cheng, Lin Song, @ge_yixiao, Wenyu Liu, @XinggangWang, @yshan2u
@_akhaliq
AK
5 months
Tencent releases YOLO-World Real-Time Open-Vocabulary Object Detection demo: method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on
1
65
345
0
3
13
@yshan2u
Ying Shan
3 months
Image-to-image material transfer.
@ChengTim0708
Ta-Ying Cheng
3 months
Today, with my collaborators @prafull7 (MIT CSAIL), @jampani_varun (@StabilityAI), and my supervisors Niki Trigoni and Andrew Markham, we share with you ZeST, a zero-shot, training-free method for image-to-image material transfer! Project Page: 1/8
5
65
271
0
1
13
@yshan2u
Ying Shan
2 months
An annotation framework that produces hyper-detailed descriptions and "performs better than GPT-4V outputs (+48%) on readability, comprehensiveness etc."
@roopalgarg
Roopal Garg
2 months
📢 Excited to unveil our latest research, ImageInWords (IIW)! 🚀We're pushing the boundaries of image descriptions with a new seeded, sequential, human-in-the-loop approach producing SOTA, articulate, hyper-detailed descriptions. arXiv: 🧵1/12
5
32
128
2
0
13
@yshan2u
Ying Shan
6 months
ECCV 2024 CFP.
@eccvconf
European Conference on Computer Vision #ECCV2024
6 months
Check out the #ECCV2024 call for papers:
0
10
45
0
1
13
@yshan2u
Ying Shan
2 months
The latest survey on text-to-video generation. Paper:
2
4
13
@yshan2u
Ying Shan
2 months
A prototype of AR glasses that is compact (without a projector-based light engine) and 3D! 🕶️✨
@StanfordEng
Stanford Engineering
2 months
Stanford engineers have developed a prototype augmented reality headset that uses holographic imaging to overlay full-color, 3D moving images on the lenses of what would appear to be an ordinary pair of glasses. @stanford_ee @GordonWetzstein
0
5
17
0
2
13
@yshan2u
Ying Shan
7 months
Interesting work.
@elliottszwu
Elliott / Shangzhe Wu
7 months
🐎 Let the hooves pound! Our new method Ponymation learns a generative model of 3D articulated animal motions from raw unlabeled Internet videos. Page: Paper: Led by @skq719 & Dor Litvak, w/ @zhang_yunzhi Hongsheng Li @jiajunwu_cs
3
21
122
0
0
12
@yshan2u
Ying Shan
2 months
CVPR-2024 Seattle is around the corner!
@CVPR
#CVPR2024
2 months
#CVPR2024 Seattle is just around the corner. What’s there to do in Seattle? Any favourite coffee shops? Share your favourites in the thread.
2
5
60
0
0
12
@yshan2u
Ying Shan
6 months
Thanks to @camenduru for sharing!
@camenduru
camenduru
6 months
🎞 FreeNoise + AnimateDiff 🔥 FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling 🐰 Colab 🥳 Thanks to Haonan Qiu ❤ Menghan Xia ❤ Yong Zhang ❤ Yingqing He ❤ @xinntao @yshan2u @liuziwei7 ❤ 🌐page: 📄paper:
1
42
168
0
0
12
@yshan2u
Ying Shan
1 month
Once a model is trained, there is a fun phase of discovering its capabilities. I've been experimenting with our SEED-X-I model by blending two images, which I call A Tale of Two Images. Here are some interesting results, with details in the thread below!
1
2
12
@yshan2u
Ying Shan
4 months
Given the renewed interest in binarization, here is our earlier work (KDD23) focused on binary embedding for retrieval. It achieves a 16x reduction in memory footprint and has been rigorously tested in production with billions of vectors! Code is available! cc: Yukang Gan,
0
3
12
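For readers wondering what binary embedding retrieval looks like mechanically, here is a minimal sketch of the generic recipe (sign-quantize float embeddings, then rank by Hamming distance). It is not the KDD23 paper's actual method or API, just the textbook baseline such work builds on; sizes and names are illustrative.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Sign-quantize to 1 bit per dimension, packed 8 bits per byte."""
    return np.packbits(embeddings > 0, axis=-1)

def hamming(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Hamming distance from one packed query to every packed database row."""
    xor = np.bitwise_xor(db_codes, query_code)       # differing bits
    return np.unpackbits(xor, axis=-1).sum(axis=-1)  # popcount per row

# Toy retrieval over 100k 256-d vectors: each vector shrinks from
# 1024 bytes (float32) to 32 bytes of packed bits.
rng = np.random.default_rng(0)
codes = binarize(rng.standard_normal((100_000, 256)))
query = binarize(rng.standard_normal((1, 256)))
top10 = np.argsort(hamming(query, codes))[:10]       # nearest 10 by Hamming
```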
@yshan2u
Ying Shan
3 months
Neocognitron, the model that directly inspired CNNs, was invented 44 years ago by Kunihiko Fukushima.
@MIT_CSAIL
MIT CSAIL
3 months
This month in 1980: a Japanese computer scientist published a paper proposing the “Neocognitron,” the neural net that directly inspired CNNs. Kunihiko Fukushima’s paper:
5
125
483
1
2
12
@yshan2u
Ying Shan
5 months
More to come!😊🌟
@BrianRoemmele
Brian Roemmele
5 months
MultiModal Large Language Models have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs. The models not only preserve the reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks.
10
29
173
1
1
11
@yshan2u
Ying Shan
5 months
While Gemini 1.5 Pro may take some time to explain, it marked a breakthrough in multimodal content understanding!
@JeffDean
Jeff Dean (@🏡)
5 months
A nice video made by @Sam_Witteveen , an external developer with early access to the long context capabilities of Gemini 1.5 Pro, sharing some of the things this model can do. 🎉
14
76
408
0
0
11
@yshan2u
Ying Shan
5 months
We've just released a survey on 3D Model Generation, encompassing 436 papers on the latest advancements. Hope it's helpful!🌟📈🚀 Thanks to the team: Xiaoyu Li, Qi Zhang, Di Kang, Weihao Cheng, Yiming Gao, Jingbo Zhang, Zhihao Liang, Jing Liao, @yanpei_cao, @yshan2u Paper:
0
1
11
@yshan2u
Ying Shan
6 months
Impressive results for tough prompts!🌟🚀
@pika_research
Pika Research
6 months
Excited to announce RPG-DiffusionMaster, a joint work with Peking University and Stanford University. RPG harnesses multi-modal LLMs to master diffusion models in complex and compositional text-to-image generation/editing, achieving state-of-the-art performance.
8
31
160
1
1
11
@yshan2u
Ying Shan
2 months
ICLR 2024 (Oral): an unsupervised RL method that learns diverse locomotion skills purely from pixels.
@seohong_park
Seohong Park
2 months
METRA is the *first* unsupervised RL method that can learn diverse locomotion skills purely from pixels, and is one of my favorite works! METRA got accepted to ICLR 2024 (Oral); come to the sessions this Wednesday! Oral: Wed 4p, Halle A 2. Poster: Wed 4:30-6:30, Halle B #161
1
14
66
0
0
11
@yshan2u
Ying Shan
6 months
🌟 Harnessing Tech for Good: ARC Lab is thrilled to be a part of the team integrating cutting-edge AI to restore a stunning 4,500-year-old statue. #TechForGood #Innovation #tencent #ARCLab
0
3
11
@yshan2u
Ying Shan
6 months
A full year to prepare for Vision innovation and beach vibes! 🤖🌴 #ICCV2023 #ICCV2025
@ICCVConference
#ICCV2023
6 months
Our next #ICCV2025 meeting will be in Honolulu, Hawaii 🌴
1
29
204
0
0
10
@yshan2u
Ying Shan
2 months
Glad that MotionCtrl has been accepted to SIGGRAPH-2024. Thank you all for featuring and following this work! Congrats to: Z Wang, Z Yuan, @xinntao, Y Li, T Chen, M Xia, P Luo, @yshan2u
@dreamingtulpa
Dreaming Tulpa 🥓👑
7 months
The future of AI video generation is gonna be so cool! MotionCtrl is a motion controller that can manage both camera and object motions with video generation models like VideoCrafter1, AnimateDiff and Stable Video Diffusion 🤯
30
124
628
2
1
10
@yshan2u
Ying Shan
2 months
One way of looking at the Kolmogorov-Arnold Network.
@bozavlado
Vlado Boza
2 months
Kolmogorov-Arnold Network is just an ordinary MLP. Here is the Colab, which explains: The main point is that if we consider a KAN interaction as a piece-wise linear function, it can be rewritten like this: 1/n
21
212
1K
4
0
10
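The thread's claim is easy to check concretely. Below is a minimal sketch under its stated assumption, namely that each learned KAN edge function is piecewise linear: any such function is a linear combination of shifted ReLUs, i.e. exactly a small two-layer MLP. Function and parameter names are illustrative, not from the linked Colab.

```python
import numpy as np

def pwl_edge_as_mlp(x, knots, base_slope, slope_jumps, intercept):
    """A piecewise-linear 'KAN edge' rewritten as a tiny ReLU MLP:
    f(x) = intercept + base_slope*x + sum_i slope_jumps[i]*relu(x - knots[i])."""
    hidden = np.maximum(0.0, x[:, None] - knots[None, :])     # layer 1: shifted ReLUs
    return intercept + base_slope * x + hidden @ slope_jumps  # layer 2: linear combo

# Sanity check: |x| is piecewise linear with one knot at 0,
# and f(x) = -x + 2*relu(x) reproduces it exactly.
x = np.linspace(-2.0, 2.0, 9)
out = pwl_edge_as_mlp(x, knots=np.array([0.0]), base_slope=-1.0,
                      slope_jumps=np.array([2.0]), intercept=0.0)
assert np.allclose(out, np.abs(x))
```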
@yshan2u
Ying Shan
1 month
Portrait video generated from a single image. In the same category as EMO and VASA-1, but open-sourced.
@camenduru
camenduru
1 month
🗣️ V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation 🔥 Jupyter Notebook 🥳 Thanks to Cong Wang ❤ Kuan Tian ❤ Jun Zhang ❤ Yonghang Guan ❤ Feng Luo ❤ Fei Shen ❤ Zhiwei Jiang ❤ Qing Gu ❤ Xiao Han ❤ Wei Yang ❤ 🌐page:
4
42
157
1
2
10
@yshan2u
Ying Shan
4 months
Probably the first in-situ generation of multiple 3D objects from a single image. 📸✨
@_akhaliq
AK
4 months
ComboVerse Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models
2
14
97
0
1
10
@yshan2u
Ying Shan
5 months
YOLO-World + EfficientSAM!
@skalskip92
SkalskiP
5 months
working on YOLO-World + EfficientSAM @huggingface space. Last touches, and you will be able to run zero-shot image segmentation on short videos. space:
5
50
287
1
2
9
@yshan2u
Ying Shan
3 months
Diffusion-DPO: an example of connecting open source with closed source.
@SFResearch
Salesforce AI Research
3 months
Check out Diffusion-DPO🌟 Bridging the gap between StableDiffusion & closed models like Midjourney v5. Our #TextToImage model uses human feedback for state-of-the-art alignment, marking a new era in AI creativity! Code: Blog:
2
16
80
0
1
8
@yshan2u
Ying Shan
3 months
Wearable MLLMs (Multimodal LLMs) have a chance of going mainstream this time! 🚀✨
@Ahmad_Al_Dahle
Ahmad Al-Dahle
3 months
Multimodal Meta AI is rolling out widely on Ray-Ban Meta starting today! It's a huge advancement for wearables & makes using AI more interactive & intuitive. Excited to share more on our multimodal work w/ Meta AI (& Llama 3), stay tuned for more updates coming soon.
31
101
596
0
0
9
@yshan2u
Ying Shan
5 months
Thanks @camenduru for sharing! DynamiCrafter is a major upgrade to our image-to-video model.🚀 Echoing recent improvements in our text-to-video model, VideoCrafter2, the new model significantly improves motion, resolution, and coherence.
@camenduru
camenduru
5 months
🎬 576x1024 👀 DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors 🔥 Jupyter Notebook + @replicate 🥳 #image2video Thanks to @Double47685693 ❤ Menghan Xia ❤ Yong Zhang ❤ Haoxin Chen ❤ Wangbo Yu ❤ @_hanyuan @xinntao ❤ Tien-Tsin Wong ❤ Ying Shan ❤
10
30
189
0
6
9
@yshan2u
Ying Shan
7 months
In DALL-E 3's vision, living in a post-labor economy following the advent of AGI looks like this: a world where advanced robotics and AI are seamlessly integrated into the environment, and humans are engaged in leisure and artistic activities.😊
1
0
9
@yshan2u
Ying Shan
6 months
Self-nominating ECCV2024 reviewers. #ECCV2024
@eccvconf
European Conference on Computer Vision #ECCV2024
6 months
#ECCV2024 is encouraging potential reviewers to self-nominate. Know a great reviewer? Encourage them to self-nominate. Reviewer nomination form: Please do not send an email to the ECCV organizing committee, we cannot reply to all the individual emails.
1
18
52
0
0
9
@yshan2u
Ying Shan
5 months
The most comprehensive and enlightening tutorial on YOLO-World! Huge thanks to @skalskip92 for the effort! cc: @tiahch @ge_yixiao @XinggangWang @yshan2u
@skalskip92
SkalskiP
5 months
The YOLO-World YouTube tutorial is out! Please let us know what you think!
- model architecture
- processing images and video in Colab
- prompt engineering and detection refinement
- pros and cons of the model
watch here: ↓ more resources
12
137
804
1
2
9
@yshan2u
Ying Shan
3 months
The best world model is the world itself. 🌍😊✨
@bilawalsidhu
Bilawal Sidhu
3 months
Caught this fleeting beauty on my sleeve today 🦋 The design of ecological systems is a marvel indeed!
4
2
33
0
0
9
@yshan2u
Ying Shan
3 months
Generative creativity with lightning-fast inference.
@jfischoff
Jonathan Fischoff
3 months
New home page is pretty and functional
0
2
24
0
0
8
@yshan2u
Ying Shan
5 months
Immersive memory!
@LumaLabsAI
Luma AI
5 months
Here's a peek into the future of immersive memories with Luma #AppleVisionPro #LumaAI
32
167
1K
0
1
7
@yshan2u
Ying Shan
3 months
AI-generated games.
@emmanuel_2m
Emm
3 months
Would you play a game like this? 🕹️ AI-generated using: - Platformer backgrounds: #Scenario ❤️‍🔥 - Retro music: #Udio 🎶👾 - Animated video sequences: #Runway 📽️ Details provided below. 👇👇
14
24
158
0
0
8
@yshan2u
Ying Shan
2 months
Looks like folks are having lots of fun testing the upcoming @Simulon 😉😉
@XRarchitect
I▲N CURTIS
2 months
Wow @Simulon has me acting like a little kid on Christmas again. Say hello to my new pet 🐲
46
111
710
1
0
8
@yshan2u
Ying Shan
5 months
Interestingly, the World Model bears some similarities to the I Ching (易经). Cast in @ylecun's formulation, it categorizes all life situations x into 384 categories s, each with a suggested action a. The mystical mapping from s(t) to a(t) is sometimes referred to as a controller.
@ylecun
Yann LeCun
5 months
Lots of confusion about what a world model is. Here is my definition:
Given:
- an observation x(t)
- a previous estimate of the state of the world s(t)
- an action proposal a(t)
- a latent variable proposal z(t)
A world model computes:
- representation: h(t) = Enc(x(t)) -
133
442
3K
1
3
8
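The quoted definition is cut off after the representation step. As a reading aid, here is a minimal sketch of the interface it describes. The prediction step, s(t+1) = Pred(h(t), s(t), a(t), z(t)), is filled in from the commonly quoted continuation of that tweet, so treat it as an assumption rather than part of the text above; all names are illustrative.

```python
from typing import Callable, NamedTuple

class WorldModel(NamedTuple):
    enc: Callable   # representation: h(t) = Enc(x(t))
    pred: Callable  # prediction: s(t+1) = Pred(h(t), s(t), a(t), z(t))  [assumed]

def world_model_step(wm: WorldModel, x_t, s_t, a_t, z_t):
    """One step in the notation above: encode the observation, then
    predict the next estimate of the world state from (h, s, a, z)."""
    h_t = wm.enc(x_t)                   # representation of the observation
    return wm.pred(h_t, s_t, a_t, z_t)  # next world-state estimate

# Trivial instantiation, just to show the plumbing runs end to end.
wm = WorldModel(enc=lambda x: x, pred=lambda h, s, a, z: s + a)
assert world_model_step(wm, x_t=0.0, s_t=1.0, a_t=0.5, z_t=None) == 1.5
```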
@yshan2u
Ying Shan
4 months
Wow, auto-rigged 3D characters in one click!
@tripoai
Tripo
4 months
💥Generate auto-rigged 3D characters in one click, only with Tripo AI💥 👇 The auto-rigging feature for humanoid models is available in our Discord for beta testing. #Tripo #ImageTo3D #TextTo3D #3D #AI #Autorigging
5
20
79
0
0
8
@yshan2u
Ying Shan
1 month
And it also seems to be connected with the World Model (see the link below)😊
@samim
samim
4 months
A diagram of I Ching hexagrams sent by Joachim Bouvet to the German mathematician Gottfried Wilhelm Leibniz in 1701.
2
1
22
1
1
8
@yshan2u
Ying Shan
4 months
Upgraded body model with anatomically accurate skeleton rig and mesh!
@Marilyn59846278
Marilyn Keller
4 months
Working with body models 💃 but need to track bones 🦴? We released the SKEL model (SIGGRAPH Asia 2023) 🧵(1/6) Project Page: Code:
6
49
276
0
2
8
@yshan2u
Ying Shan
3 months
One of the highlights from the 2023 report.
@StanfordHAI
Stanford HAI
3 months
This year’s AI Index report offers a deep dive into the evolving landscape of AI. Covering key trends from technical performance to geopolitical dynamics, it's a must-read for industry leaders, policymakers, and anyone interested in the state of AI.
9
207
533
0
0
8
@yshan2u
Ying Shan
4 months
A timely refresh of geometric constraints in the era of deep learning!
@zhenjun_zhao
Zhenjun Zhao
4 months
Geometric Constraints in Deep Learning Frameworks: A Survey Vibhas K Vats, David J Crandall tl;dr: in title
0
28
91
0
0
8
@yshan2u
Ying Shan
4 months
Research and innovation used to run mostly on burning brain power. In the era of deep learning, a significant part of the thinking process has shifted to burning GPUs. This presents both challenges and opportunities for academia. 🔥💻🎓
@xiaolonw
Xiaolong Wang
4 months
Since Sora is out, I have been thinking about our role in academia. One thing we can do at school is fast prototyping with very talented students, showing the potential, the possibility. Of course, the future will always be scaling up.
3
15
198
0
0
7
@yshan2u
Ying Shan
3 months
Stanford's AI Index-2024 is now live.
@StanfordHAI
Stanford HAI
3 months
📢 The #AIIndex2024 is now live! This year’s report presents new estimates on AI training costs, a thorough analysis of the responsible AI landscape, and a new chapter about AI's impact on medicine and scientific discovery. Read the full report here:
14
368
723
1
1
7
@yshan2u
Ying Shan
2 months
I'm increasingly convinced there's an "impossible trinity" in content creation tools: controllability, usability, and versatility. No tool excels in all three, and none seems able to.
0
0
7
@yshan2u
Ying Shan
3 months
A compositional world model for multi-agent planning!
@gan_chuang
Chuang Gan
3 months
The ability to infer others' actions and outcomes is central to human social intelligence. Can we leverage GenAI to build cooperative embodied agents with such capabilities? Introducing 🌎COMBO🌎, a compositional world model for multi-agent planning!
5
17
103
1
1
7
@yshan2u
Ying Shan
2 months
A challenge for image-to-video models 😊🚀🎥
@gunsnrosesgirl3
Science girl
2 months
Sunflowers bloom 📹 mandalaexpert
221
4K
16K
0
0
7
@yshan2u
Ying Shan
4 months
Generative graphic design with solid results!
@RainbowYuhui
Yuhui Yuan
7 months
Introducing COLE: an effective hierarchical generation framework that can convert a simple intention prompt into a high-quality graphic design, while also supporting flexible editing based on user input.🤗🤗🤗 Paper: Project page:
6
19
69
0
0
7
@yshan2u
Ying Shan
4 months
Great effort assembling the map! The history of AI is essentially a history of massive tensor computation gradually taking center stage.
@IntuitMachine
Carlos E. Perez
4 months
History of AI Poster by @okdaniellle Original:
5
45
163
0
0
7
@yshan2u
Ying Shan
1 month
First screenless laptop: no screen means more screens✨✨
@sightful
Sightful
1 month
When you want to turn vision into action, #Spacetop is here.🦾 #Sightful #VR
125
377
1K
0
0
7
@yshan2u
Ying Shan
4 months
Text-to-music generation with specific controls on chords, tempo, and dynamics.
@nicolasguozixun
Zixun Nicolas Guo
4 months
I am excited to announce that Mustango has been accepted at #NAACL2024! Mustango is a controllable text-to-music generative system that can generate music audio from text prompts that contain music-specific descriptions (e.g., chords, tempo, dynamics, etc).
1
17
64
1
0
7
@yshan2u
Ying Shan
3 months
@ClementDelangue, this is our V0 data; we will push it to HF once V1 is ready. @zzachzhang @xinntao
1
0
7