![Xingyu Fu Profile](https://pbs.twimg.com/profile_images/1809726334693961728/VVKp9G6H_x96.jpg)
Xingyu Fu
@XingyuFu2
Followers
834
Following
773
Statuses
111
PhD student @Penn @cogcomp. | Focused on Vision+Language | Previous: @MSFTResearch @AmazonScience B.S. @UofIllinois
Philadelphia, PA
Joined September 2020
Teach GPT-4o to edit charts and tables to ReFocus and facilitate reasoning! We introduce ReFocus, which edits input table and chart images to better reason visually. Can we teach smaller models to learn such visual CoT reasoning? Yes -- and this data works better than QA and CoT data! ReFocus + GPT-4o brings +11.0% on tables and +6.8% on charts without using any tools! We release a 14K Visual CoT Reasoning *Training Dataset* that provides intermediate refocusing supervision. ReFocus VCoT is 8.0% better than QA data and 2.6% better than CoT data with supervised finetuning on Phi3.5v. The trained model is also released. This work was done during an internship at @Microsoft with amazing coauthors @minqian_liu @zhengyuan_yang @JCorring36990 @YijuanLu @jw2yang4ai @DanRothNLP @DineiFlorencio @ChaZhang. A huge shoutout to everyone!
5
32
148
RT @fwang_nlp: MuirBench is officially accepted at #ICLR2025! Recent VLMs/MLLMs such as LLaVA-OneVision, MM1.5, and MAmmoTH-VL have demo…
0
8
0
RT @XingyuFu2: Teach GPT-4o to edit on charts and tables to ReFocus and facilitate reasoning! We introduce ReFocus, which edits inpu…
0
32
0
@gabrielchua_ This is a hard problem for models and cannot be solved by Visual Sketchpad! I think it’s a really exciting direction, so please keep me posted.
0
0
0
@gabrielchua_ Unfortunately, we finished the project before 4o could be finetuned with images. But we have released all the training data with intermediate visual outputs, so feel free to try it!
1
0
0
@astro_nolan Interesting problem! To be honest, I think that, as with the problems in ReFocus, Python code + low-level vision tools can be very helpful, e.g. use cv2 tools to find curve coordinates and provide them to GPT models.
1
0
2
ReFocus is inspired by many brilliant prior works, especially Visual SketchPad from @huyushi98 @WeijiaShi2 @LukeZettlemoyer @nlpnoah @RanjayKrishna, Visprog from @tanmay2099 @anikembhavi, ViperGPT from @SachitMenon @Surisdi @cvondrick, and many more!
0
0
5
RT @weichiuma: How to build an AI system that can generate 3D worlds from a single image? All you need is the **RIGHT** data! By training…
0
114
0
RT @WeijiaShi2: Introducing LlamaFusion: empowering Llama with diffusion to understand and generate text and images in arbitrary sequen…
0
178
0
RT @Xiaodong_Yu_126: Life update: I defended my Ph.D. thesis today and have joined @AMD GenAI as a research scientist. #UPenn #AMD http…
0
18
0
RT @thoma_gu: Life update: Excited to share that I will be joining @CIS_Penn @PennEngineers as an Assistant Professor in Fall 2025! I’m…
0
52
0
RT @cmalaviya11: Excited to share Contextualized Evaluations! Benchmarks like Chatbot Arena contain underspecified queries, which can…
0
28
0
RT @xiangyue96: I’ve always had a dream of making AI accessible to everyone, regardless of location or language. However, current open ML…
0
78
0
RT @yuntiandeng: How many reasoning tokens does OpenAI o1 use? It turns out they are almost always multiples of 64 (99+% of the time in 100…
0
47
0