🚀Introducing OpenChat 3.6
🌟Surpassed the official Llama3-Instruct with only 1-2M synthetic training examples, versus ~10M human labels
🤫GPTs are close to their limits: they excel at generation but fall short on complex tasks
🎯We are training the next generation, capable of deterministic reasoning and planning
Introducing the 𝗪𝗼𝗿𝗹𝗱’𝘀 𝗕𝗲𝘀𝘁 𝗢𝗽𝗲𝗻 𝗦𝗼𝘂𝗿𝗰𝗲 𝟳𝗕 𝗟𝗟𝗠 - OpenChat-3.5-1210, further surpassing ChatGPT and Grok models.
This upgrade to the widely adopted OpenChat-3.5 is focused on increasing performance in one of the most important areas for LLMs: coding.
🚀Announcing OpenChat-3.5 Update 0106: 𝗪𝗼𝗿𝗹𝗱’𝘀 𝗕𝗲𝘀𝘁 𝗢𝗽𝗲𝗻 𝗦𝗼𝘂𝗿𝗰𝗲 𝟳𝗕 𝗟𝗟𝗠!
Experience ChatGPT & Grok-level AI locally 💿!
Surpassing Grok-0 (33B) on all 4 benchmarks, and Grok-1 (???B) on average and on 3 of 4 benchmarks 🔥.
🎯 This update mainly enhanced coding and overall performance.
🚀 The World's First Gemma fine-tune based on openchat-3.5-0106 data and method (C-RLFT). Almost the same performance as the Mistral-based version.
6T tokens = secret recipe?
🎉OpenChat 3.2 SUPER is here!
🚀 Built with innovative fine-tuning techniques, it outperforms all Llama-2-based 13B models, even with the same 80K mixed-quality ShareGPT data set.
🥇 Ranking #1 on AgentBench, MT-bench, and AlpacaEval among 13B models.
🚀 OpenChat: our new paper on enhancing open-source language models!
C-RLFT utilizes mixed-quality data without any preference labels!
OpenChat-13B excels, using only ShareGPT data (like Vicuna). Discover more!👇
#NLP
#AI
#OpenSource
It is also available on our hosted demo:
If you’d like to deploy it yourself, you can follow the instructions on our GitHub to serve OpenChat models with a vLLM backend, API keys and more:
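As a rough illustration of what such a self-hosted deployment exposes, here is a minimal sketch of querying an OpenChat server through an OpenAI-compatible chat completions endpoint; the address, model name, and API key below are placeholders, not guaranteed defaults.

```python
# Minimal sketch: query a self-hosted OpenChat server over an
# OpenAI-compatible chat completions endpoint.
# The address (localhost:18888), model name, and API key are assumptions.
import requests

resp = requests.post(
    "http://localhost:18888/v1/chat/completions",
    headers={"Authorization": "Bearer sk-example-key"},  # only needed if API keys are enabled
    json={
        "model": "openchat_3.5",  # placeholder model identifier
        "messages": [{"role": "user", "content": "Hello, OpenChat!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```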
Magic recipe:
Set eps = 1e-5 in AdamW and you will get a very smooth loss curve.
Tested with betas = (0.9, 0.95), weight decay 0.1, and many different learning rates.
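As a minimal sketch, those settings map onto PyTorch's AdamW as follows; the model and learning rate are placeholders, since the recipe only says many learning rates were tried.

```python
# Minimal sketch of the optimizer recipe above with PyTorch AdamW.
# The model and lr are placeholders; eps, betas, and weight_decay are
# the values from the recipe.
import torch

model = torch.nn.Linear(4096, 4096)  # stand-in for the actual LLM

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-5,              # placeholder; many learning rates were tested
    betas=(0.9, 0.95),
    eps=1e-5,             # the "magic" epsilon for a smoother loss curve
    weight_decay=0.1,
)
```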
@_philschmid
Thank you! We added this feature to advocate for reproducible evaluations with open-source LLMs. It should behave similarly to Prometheus. We're testing it using the methodology in their paper 😀
@andersonbcdefg
Conditioning: use a different prompt for GPT-4 and GPT-3.5 data, so the model can tell the data sources apart.
Token-wise loss: the same as the HF loss calculation; the total loss is the average of all token losses.
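A minimal sketch of both points, assuming a Hugging Face-style causal LM; the prompt strings are illustrative placeholders, not OpenChat's exact chat templates.

```python
# Minimal sketch: source-conditioned prompts plus HF-style token-averaged loss.
import torch
import torch.nn.functional as F

SOURCE_PROMPTS = {
    "gpt4": "GPT4 User: ",   # assumed prefix for GPT-4-generated data
    "gpt35": "GPT3 User: ",  # assumed prefix for GPT-3.5-generated data
}

def condition(source: str, text: str) -> str:
    """Prepend a source-specific prompt so the model learns the quality signal."""
    return SOURCE_PROMPTS[source] + text

def token_wise_loss(logits: torch.Tensor, labels: torch.Tensor,
                    ignore_index: int = -100) -> torch.Tensor:
    """HF-style causal LM loss: shift by one, then average over all unmasked tokens."""
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # reduction="mean" averages over every unmasked token in the batch,
    # i.e. the total loss is the mean of all token losses.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )
```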