![Fan-Yun Sun Profile](https://pbs.twimg.com/profile_images/1734168871576440832/IRKcwubO_x96.jpg)
Fan-Yun Sun
@sunfanyun
Followers: 824 · Following: 125 · Statuses: 107
cs phd candidate @StanfordAILab @stanfordsvl @NVIDIAAI (3D) vision/graphics, embodied AI
Stanford, CA
Joined October 2018
Training RL/robot policies requires extensive experience in the target environment, which is often difficult to obtain. How can we “distill” embodied policies from foundation models? Introducing FactorSim! #NeurIPS2024 We show that by generating prompt-aligned simulations and training a policy on them, without collecting any experience in the target environment, we can achieve zero-shot performance close to that of policies trained on millions of target-environment experiences in many classic RL environments. You can generate RL simulations on our project website: More in 🧵 1/7
2 · 44 · 212
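For readers who want the shape of the idea in code: a minimal sketch of the generate-a-simulation-then-train-zero-shot loop described in the tweet. The `generate_sim_code` / `load_env_from_code` helpers and the Breakout target are placeholder assumptions of mine, not FactorSim's released interface; only the gymnasium/stable-baselines3 calls are real APIs.

```python
# Sketch of a FactorSim-style pipeline (illustrative only; the two helpers below
# are hypothetical stand-ins for the LLM-driven simulation generator).
import gymnasium as gym
from stable_baselines3 import PPO

def generate_sim_code(prompt: str) -> str:
    """Hypothetical: ask an LLM to emit a gymnasium-compatible environment whose
    observations/dynamics/rewards are factored to match the prompt."""
    raise NotImplementedError("stand-in for the LLM simulation generator")

def load_env_from_code(env_code: str) -> gym.Env:
    """Hypothetical: exec the generated source and instantiate the environment."""
    raise NotImplementedError

prompt = "A paddle at the bottom of the screen bounces a ball to break bricks."
sim_env = load_env_from_code(generate_sim_code(prompt))

# Train entirely inside the generated simulation -- zero target-env experience.
policy = PPO("MlpPolicy", sim_env, verbose=0)
policy.learn(total_timesteps=1_000_000)

# Deploy zero-shot in the real target environment
# (assumes the generated sim matches its observation/action spaces).
target_env = gym.make("ALE/Breakout-v5")
obs, _ = target_env.reset()
done = False
while not done:
    action, _ = policy.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = target_env.step(action)
    done = terminated or truncated
```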
@io_nathaniel @cbames @io_nathaniel I have some ideas for a cool colab but can't DM you -- shoot me a message
0 · 0 · 1
RT @heyyalexwang: did you know you've been doing test-time learning this whole time? transformers, SSMs, RNNs, are all test-time regressor…
0 · 109 · 0
I think an intuitive way to explain o1/o3 is that the models are taught to be "self-consistent" through RL. Humans are not self-consistent, often jumping to contradictory conclusions (especially on the internet). LLMs end up suboptimal after being trained on data with these incomplete or flawed reasoning paths.

Can this sort of test-time compute scale beyond the data we have today? My best guess is that it can, especially in domains where "being a verifier is easier than being a solver/generator" (e.g., code, ARC). If a model can verify its own hypotheses, it can be trained to maintain self-consistency, enabling it to generate more accurate answers.

This reminds me of those neuroscience/biomedical studies suggesting that our brains stop developing after age 30. If that's true, our intellectual growth after 30 doesn't come from an improvement over the "base model", but from learning how to think more rigorously and coherently.
0 · 0 · 9
RT @jiaman01: 🤖 Introducing Human-Object Interaction from Human-Level Instructions! First complete system that generates physically plausib…
0 · 111 · 0
Check us out at NeurIPS tomorrow! Unfortunately I can’t be there, but @locross and Jonathan will present at East Exhibit Hall A-C.
[Quoted tweet: the FactorSim #NeurIPS2024 announcement above]
0 · 2 · 5
RT @nickhaber: At #NeurIPS! Anyone who’d like to chat, please reach out! I like curiosity and exploration, reasoning and self-improvement,…
0 · 3 · 0
It’s widely believed that most pixels will be generated in a few years. I think it may be more accurate to say that most pixels will be generatively rendered because high-quality content almost always requires a "graphics" representation/control layer for precision. Here are some of my favorite examples by @MartinNebelong along this line of thought:
0 · 0 · 1
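One concrete reading of "generatively rendered": a graphics pass (here a depth map exported from a 3D scene) acts as the precise control layer, and a diffusion model produces the final pixels. A minimal sketch using public ControlNet/Stable Diffusion checkpoints as illustrative choices; the file path and prompt are placeholders, not anything from the examples linked above.

```python
# The graphics layer supplies the precise control signal (a rendered depth pass);
# the diffusion model fills in the final pixels. Checkpoints are illustrative.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Depth pass exported from a 3D tool (Blender, a game engine, etc.) -- placeholder path.
depth_pass = Image.open("scene_depth.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The prompt decides appearance; the rendered depth pass pins down geometry,
# layout, and camera -- the "graphics representation/control layer".
image = pipe(
    "a cozy wooden cabin interior at golden hour, photorealistic",
    image=depth_pass,
    num_inference_steps=30,
).images[0]
image.save("generatively_rendered.png")
```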
RT @ItzSuds: I stopped tweeting 4 years ago because I had to build a company and Twitter wasn’t the real world. Turns out Twitter is the r…
0 · 22 · 0