Ning Ding

@stingning

Followers
1K
Following
660
Statuses
221

Researcher

Joined May 2015
@stingning
Ning Ding
2 days
RT @JiaLi52524397: 🚀 NuminaMath 1.5 is here! 🚀 900k+ high-quality competition math problems with CoT solutions, new problem metadata, manua…
0
68
0
@stingning
Ning Ding
4 days
RT @lindsayttsq: 📝We've released the MedXpertQA dataset! 📚Check out more details: Preprint:
0
2
0
@stingning
Ning Ding
5 days
RT @yang_zonghan: Timely reminder for me to be grateful of how fortunate I am to have started my research journey from NLP, and what an hon…
0
1
0
@stingning
Ning Ding
8 days
RT @lifan__yuan: 1/ PRIME is alive on arXiv💡! Building on our blog, we've added extensive experiments exploring: - Implicit PRM design ch…
0
14
0
@stingning
Ning Ding
8 days
RT @stingning: 📜 We are releasing the PRIME paper: Let's be clear at first, dense rewards are not dead. And they…
0
15
0
@stingning
Ning Ding
8 days
RT @PhysInHistory: Euler's identity combines these five numbers in a simple and elegant equation. Despite each of the constants representin…
0
201
0
@stingning
Ning Ding
8 days
GitHub:
0
0
3
@stingning
Ning Ding
9 days
RT @_akhaliq: Process Reinforcement through Implicit Rewards
0
19
0
@stingning
Ning Ding
9 days
RT @lindsayttsq: We improve clinical relevance through ⭐️Medical specialty coverage: MedXpertQA includes questions from 20+ exams of medica…
0
1
0
@stingning
Ning Ding
9 days
"Reasoning" encompasses much more than just mathematics and coding.
@lindsayttsq
Shang Qu
9 days
📈How far are leading models from mastering realistic medical tasks? MedXpertQA, our new text & multimodal medical benchmark, reveals existing gaps in model abilities. Compared with rapidly saturating benchmarks like MedQA, we raise the bar with harder questions and a sharper focus on medical reasoning.
📌Percentage scores on our Text subset:
o3-mini: 37.30
R1: 37.76 - the clear frontrunner among open-source models
o1: 44.67 - highest performance, but still much room for improvement!
Preprint:
Data files will be released shortly at:
Key insights in 🧵
0
0
9
@stingning
Ning Ding
9 days
RT @lindsayttsq: 📈How far are leading models from mastering realistic medical tasks? MedXpertQA, our new text & multimodal medical benchmar…
0
6
0
@stingning
Ning Ding
16 days
In 2023, I noticed that DeepSeek posted job advertisements recruiting a group of "Data Virtuosos." The job requirements described individuals with broad knowledge, proficient in literature, history, culture, science, anime, films, and more, quick on their feet and full of imagination, who could help DeepSeek build its own data moat. The recruitment targeted people in mainland China. Although I'm not sure how this recruitment turned out, it likely played a role. By the way, I also think DeepSeek's general reasoning ability in English is excellent.
1
1
7
@stingning
Ning Ding
17 days
RT @sama: fun watching people react to operator. reminds me of the chatgpt launch!
0
403
0