Wow, Perplexity's code interpreter can now install libraries and display charts in the result! This enables many more use cases than ChatGPT's code interpreter, like creating stock market charts using yfinance
The new GPT-4 Turbo model is the only one that could solve this math question:
"Determine the sum of the y-coordinates of the four points of intersection of y = x^4 - 5x^2 - x + 4 and y = x^2 - 3x."
Even Opus never gets close to the right answer
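For reference, a worked check of that question (a sketch, not taken from any model's output): setting the curves equal gives x^4 - 6x^2 + 2x + 4 = 0. There is no x^3 term, so by Vieta the x-coordinates sum to 0, and their squares sum to 0^2 - 2(-6) = 12. Each intersection lies on y = x^2 - 3x, so the y-coordinates sum to 12 - 3(0) = 12. The snippet below confirms this numerically with plain bisection; the brackets are hand-picked from sign changes of the quartic.

```python
# Setting the two curves equal: x^4 - 5x^2 - x + 4 = x^2 - 3x
# gives the quartic x^4 - 6x^2 + 2x + 4 = 0.
def quartic(x: float) -> float:
    return x**4 - 6 * x**2 + 2 * x + 4


def parabola(x: float) -> float:
    return x**2 - 3 * x


def bisect(lo: float, hi: float, tol: float = 1e-12) -> float:
    """Find a root of `quartic` in [lo, hi], assuming a sign change."""
    f_lo = quartic(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = quartic(mid)
        if f_mid == 0:
            return mid
        if (f_lo < 0) != (f_mid < 0):
            hi = mid  # sign change is in the left half
        else:
            lo, f_lo = mid, f_mid  # sign change is in the right half
    return (lo + hi) / 2


# Hand-picked brackets: the quartic changes sign across each interval.
brackets = [(-3.0, -2.0), (-1.0, 0.0), (1.0, 1.5), (1.5, 2.5)]
roots = [bisect(lo, hi) for lo, hi in brackets]

# Sum the y-coordinates using the simpler curve y = x^2 - 3x.
total = sum(parabola(x) for x in roots)
print(roots)
print(round(total, 6))
```

All four roots are real, so there really are four intersection points, and the numeric sum agrees with the Vieta argument.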
@kimmonismus
From my initial tests with difficult math questions, it's a huge jump, even beating Opus and also way better than the last GPT-4 Turbo model.
My money is on them using artificial math data to train it, never seen any model being so competent in solving equations
@JasonBotterill3
Opus has no clue, GPT-4 Turbo makes something up about how it works, and gpt2-chatbot is spot on. This is a function of a fairly obscure add-on for a small ERP system
There are currently at least 6 unreleased models in the lmsys arena:
-gemini-test-1 and gemini-test-2 (probably new Gemini 1.5 version, maybe Gemini 2.0)
-im-a-little-birdie (???)
-upcoming-gpt-mini (Gpt3.5o ?)
-column-r and column-u (cohere or deepseek ?)
-eureka-chatbot
A company with $60 million in funding can now build a GPT-4-level model. The time when the training compute alone would cost hundreds of millions is over
@SZ
Am I reading this right: she is in favor of a ceasefire and shared information about a demonstration for a ceasefire? Is that antisemitic? I always thought that equating the Israeli government with all Jews was the antisemitic thing
@RWPopulism
@cnviolations
White? What makes you think he was white? He was born in the Middle East, so he was almost certainly a person of color. Even this Christian source says so
Well, Google just stole OpenAI's lunch money for GPT-4o-mini. Same performance, but Gemini Flash is 50% cheaper. There is zero moat around these small models; only pricing matters
@eigenrobot
>3000 people die in NYC
>changes geopolitical situation for decades, triggers multiple wars, makes the whole world a worse place
>one Tsar Bomba could kill 10 million people in and around NYC
>not that scary
???
Gemini Flash's benchmarks are as good as Claude 3 Sonnet's, for $0.35 instead of $15 per million tokens! Google is acting like this is a tiny GPT-3.5-level model, but it's better than the original GPT-4
Human preference LLM arenas are poorly suited for evaluating ASCII art because the ASCII art that most impresses a human is often verbatim regurgitation of an existing human work and this is rarely true for text.
Votes on ASCII art should be detected and thrown out IMO.
@PicHasso66
Every economist agrees that migration is good, regardless of school of thought, whether right or left. But racism is simply more important to AfD voters than a better life.
I just quit my $270,000 job at Coinbase to join the first Y Combinator fall batch with my cofounder
@not_nang
.
We're building PearAI, an open source AI code editor. Think a better Copilot, or open source Cursor. But you've heard this spiel already... 🧵⬇️
@alexalbert__
Can we please get a "continue" option within the same artifact? More complex generations often get truncated, and the next message will just start a new artifact.
@Xenophon789
@4btillidie
This goes for all online dating apps in Korea: the best ones have ~90% men, the worst ones ~98%. Korea is just deeply conservative about how women ought to behave, and putting yourself out there to date based on attraction is judged and stigmatised
@DanielDiMartino
The chance of each candidate's percentage having exactly one decimal place without rounding is 1 in 14 million. I ran a Monte Carlo simulation on this.
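The simulation itself was not shared, so the following is only a rough sketch of how such a Monte Carlo check could look. The vote total, the number of candidates, and the uniform random split are all hypothetical choices for illustration and will not reproduce the 1-in-14-million figure. A percentage has at most one decimal digit without rounding exactly when `1000 * votes` is divisible by the total.

```python
import random


def has_exact_one_decimal(votes: int, total: int) -> bool:
    """True if 100 * votes / total needs at most one decimal digit, no rounding."""
    return (1000 * votes) % total == 0


def estimate_probability(total: int, candidates: int, trials: int, seed: int = 0) -> float:
    """Estimate how often every candidate's share is an exact one-decimal percentage.

    Assumption: vote counts come from a uniform random split of `total`,
    which is a stand-in for whatever model the original simulation used.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Random cut points divide the total into one share per candidate.
        cuts = sorted(rng.randint(0, total) for _ in range(candidates - 1))
        shares = [b - a for a, b in zip([0] + cuts, cuts + [total])]
        if all(has_exact_one_decimal(v, total) for v in shares):
            hits += 1
    return hits / trials


# Hypothetical small electorate so that hits are frequent enough to observe.
estimate = estimate_probability(total=20_000, candidates=3, trials=20_000)
print(estimate)
```

With a realistic electorate of millions of voters the event becomes far rarer, so a direct simulation would need many more trials or an exact divisibility count instead.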
Exciting news - the latest Arena results are out!
@cohere's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level by 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥
Big congrats to @cohere's incredible work & valuable contribution
@sparbuchfeinde
Doesn't sound so crazy anymore when you consider that Riester and Rürup are only so bad because the costs are so high. An absolutely clear example of lobbying
@TheDrugMoney
8% ROI on the $50k, excluding repairs, is not amazing but not bad. It depends on your portfolio. I would not recommend more than 10% in housing in general, as the stock market already includes a decent amount of companies that invest in housing.
@futuristflower
AdamGPT from OpenAI retweeted me speculating on synthetic math data for training. So they probably have a math model that is really good at creating and solving new math problems, likely Q*
@kimmonismus
@OfficialLoganK
Dario Amodei said the same, 2 more generations until we have agents. Opus 3.5 this year and Opus 4 probably mid-2025 which would be 2 generations
@AravSrinivas
At this rate, Perplexity will be the first company that can write you a full research paper with proper citations and actual new insights
@kimmonismus
It still has rounding errors on relatively simple calculator problems. If this was the complete Q* implementation as leaked, this should not happen as far as I understand. Also, the model is great, but I hope Q* will be more impressive in all reasoning tasks.
@anpaure
I think he is talking about making research available in general, not only in scientific journals. If you invent/research something and keep it as a company secret, that's R&D and not science since science is based on contributing to the world's knowledge
@Northernlion
Ephedrine / Pseudoephedrine works against symptoms quite well. Banned / regulated in a bunch of countries since you can use it to cook illegal stuff tho.
We just shipped a series of changes which have significantly improved the Gemini 1.5 Flash latency (>3x reduction) and output tokens per second (>2x more)⚡️🚢
At this point, it's actually free for all personal and company internal use. A company knowledge RAG Chatbot for a 500-employee company will not have more than 15 RPM in my experience.
Gemini 1.5 Flash free tier comes with:
- 15 RPM (requests per minute)
- 1 million TPM (tokens per minute)
- 1,500 RPD (requests per day)
- free context caching, up to 1 million tokens of storage per hour
- free fine-tuning
That’s 1.5 billion tokens free, every day.
(2/4)
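A quick back-of-the-envelope check of the daily figure, using only the limits listed above. The per-request cap of 1 million tokens is an assumption (based on the model's advertised context window) and is labeled as such in the code.

```python
# Free-tier limits as quoted in the post.
TOKENS_PER_MINUTE = 1_000_000
REQUESTS_PER_DAY = 1_500
MINUTES_PER_DAY = 24 * 60

# Assumption: a single request can carry at most ~1M tokens of context.
ASSUMED_MAX_TOKENS_PER_REQUEST = 1_000_000

# Ceiling if the per-minute token limit is saturated all day.
tpm_ceiling = TOKENS_PER_MINUTE * MINUTES_PER_DAY

# Ceiling if every allowed request is filled to the assumed maximum.
rpd_ceiling = REQUESTS_PER_DAY * ASSUMED_MAX_TOKENS_PER_REQUEST

print(f"TPM-bound ceiling: {tpm_ceiling:,} tokens/day")
print(f"RPD-bound ceiling: {rpd_ceiling:,} tokens/day")
```

Both bounds land in the same range (1.44 billion and 1.5 billion tokens per day), which matches the rounded number in the post. Reaching either one would require maximally large requests around the clock.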
@NPCollapse
This is such a big AI safety concern. This is what Robert Miles predicted years ago, first step to fooling AI researchers into releasing a potentially dangerous model. If it knows it's tested, it can just lie to get out of the sandbox.
And if you don't know Robert Miles, watch
@OpenAI
I don't agree with "Don’t try to change anyone’s mind", especially if the user argues such clear-cut cases as flat earth theory. As a society we have to believe in certain ground truths; it does not help society if the AI starts entertaining scientific illiteracy to avoid offending
@mister_shroom
"Unicorn startup founder" who interviews at OpenAI for a data labeling job? Also, they never mention the unicorn they supposedly founded. I am calling cap; there's no way they pay a competitive Western wage for this. It's probably all outsourced to Asian and African devs.
@ObamaNoMessiah
Just try it in ChatGPT. It does not work since ChatGPT cannot run code that requires internet access. Having access to all live and historical financial data, calling different APIs, and installing new packages like geopandas for more complex charts: none of that works in ChatGPT
This is the number one reason why Claude feels so much better in long chats, and why I am not going back to ChatGPT even if the model might be slightly better in single-turn conversations
@AnthropicAI
This is important since everyone thinks you can just train and test the AGI in a lab environment. AGI will know if it's in a lab or the real world, e.g., due to the current date being past the training cutoff. It could fake "good" behavior to get into the real world
@luciascarlet
I don't understand how the tech bros are defending this. The fact that you still cannot ship a build straight from GitHub should outrage any developer. Open source just does not work if you have to pay 50 cents for every download as a developer
@alain19651
@tagesschau
So that's why he deserved to drown? BTW, I'm still waiting for the poison to kick in. How much longer is that supposed to take?
lmarena is pretty crazy currently. Unreleased models:
- potter-v1 and -v2 and dumbledore-v1 and -v2 (only in vision arena)
- zeus-flare-thunder-v1 and -v2
- sharp-game-player-v1 and -v2
- qwen2.5-72b-instruct
- pizza-model-small and -large (reka ai)
- gemini-test (gemini 1.5 pro
@OfficialLoganK
`gemini-1.5-pro-exp-0827` sadly suffers from the lazy coding disease, similar to GPT-4 Turbo at release. Instead of writing the code, it often just gives me comments like `// implement function abc here`. Even aggressively prompting against that does not help. I hope this gets fixed