![Xianjun Yang Profile](https://pbs.twimg.com/profile_images/1808285037177327616/ttu4jsOW_x96.jpg)
Xianjun Yang
@Qnolan4
Followers: 727 · Following: 579 · Statuses: 254
GenAI safety, data-centric AI. PhD @ucsbnlp, BEng @tsinghua_uni. Open to collaboration. Research scientist on AI safety @AIatMeta. Opinions are my own.
Santa Barbara
Joined February 2020
📢New Paper📢 Happy to introduce our new work on whether multimodal LLMs have achieved PhD-level intelligence across diverse scientific disciplines! It turns out that even the most advanced MLLMs still lag far behind! #AI4Science
🚨 Introducing "MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension"

🧐 Have current multimodal LLMs achieved PhD-level intelligence across diverse scientific disciplines? Are they ready to become AI scientific assistants?

📢 We announce MMSci, a multimodal, multidisciplinary dataset sourced from published articles spanning 72 disciplines in Nature Communications journals #NatureComms, to evaluate and enhance models' comprehension of PhD-level scientific knowledge.

✨ Highlights of MMSci:
- 72 diverse advanced scientific disciplines, including physics, chemistry, materials science, nanoscience, optics and photonics, biochemistry, energy science, ecology, climate science, ocean science, genetics, immunology, social sciences, agriculture, etc.
- 131k articles and 742k figures crawled directly from the web rather than extracted from PDFs, ensuring diversity and quality.
- Heterogeneous, complex multi-panel scientific figures, including charts/graphs, schematic diagrams, macroscopic/microscopic photographs, simulated images, geographical maps, and more.
- Benchmarks of LMMs' understanding of scientific figures and content across varying settings (a scoring sketch follows below).
- Visual instruction-tuning data and interleaved article-figure data for LMM visual pre-training.

📊 Results and takeaways from evaluating open-source models, #GPT4V, and #GPT4o:
> Open-source LMMs showed limited capability in understanding scientific figures, performing near random guessing. #GPT4V and #GPT4o also struggled in the challenging settings, achieving only 50%-70% accuracy.
> Writing relevant, concise captions for scientific figures requires conditioning on the article content, especially the full text, to reach reasonable quality.
> Our visual instruction-tuning data lifted a 7B LLaVA-NeXT (v1.6) model to performance comparable to GPT-4V/o on our benchmark.
> The interleaved article-and-figure data can be used for LMM pre-training to infuse scientific knowledge, showing improvements on materials science tasks.
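The benchmark's figure-understanding setting amounts to picking the correct caption for a figure from several options and scoring accuracy. Below is a minimal, hypothetical scoring harness in Python: the file name `mmsci_mcq.json`, its field names, and the `ask_model` stub are illustrative assumptions, not the actual MMSci release format or evaluation code; the random-guess stub simply reproduces the near-chance baseline reported for weaker open-source LMMs.

```python
# Hedged sketch of a multiple-choice figure-caption evaluation loop.
# File layout and field names are assumptions for illustration only.
import json
import random


def load_items(path: str) -> list[dict]:
    """Assumed item schema: {"figure": <image path>,
    "options": [<caption strings>], "answer": <correct index>}."""
    with open(path) as f:
        return json.load(f)


def ask_model(figure_path: str, options: list[str]) -> int:
    """Placeholder for an LMM call (e.g., GPT-4V/GPT-4o or LLaVA-NeXT).
    Random guessing here mirrors the near-chance baseline the thread
    describes for weaker open-source LMMs."""
    return random.randrange(len(options))


def accuracy(items: list[dict]) -> float:
    # Count items where the model's chosen option index matches the key.
    correct = sum(
        ask_model(it["figure"], it["options"]) == it["answer"] for it in items
    )
    return correct / len(items)


if __name__ == "__main__":
    items = load_items("mmsci_mcq.json")
    print(f"accuracy: {accuracy(items):.3f}")
```

In practice, `ask_model` would wrap a real multimodal model call that sees the figure image and the candidate captions; the harness itself stays the same across the benchmark's settings.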
RT @ZhiyuChen4: Our CBT-Bench paper has been accepted to #NAACL2025 Main! Congrats to the lead @_Guuuuuuuu_ and @Qnolan4. See you in Albu…
RT @GoodfireAI: We're open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B! These are, to the best of our knowledge,…
RT @peterbhase: Anthropic Alignment Science is sharing a list of research directions we are interested in seeing more work on! Blog post…
RT @AlbalakAlon: If you're interested in SoTA for reasoning in LLMs, I highly highly recommend reading @rm_rafailov 's thread on Meta Chain…
RT @_zifan_wang: (1/7) Excited to share our new red teaming work at Scale, Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents. We…
RT @MLamparth: Want to learn more about safe AI and the challenges of creating it? Check out the public syllabus (slides and recordings)…