OpenAI's o1-preview is the new king of DevQualityEval 👑: the best overall, the most reliable, and the best at Ruby, but also the slowest and most expensive model.
- o1-preview is #1 with score 40806 (98.61%), o1-mini #2 with 40089 (96.88%) and Sonnet