Llama-3 based SQLCoder 8b is out! Open weights with a commercially friendly cc-by-sa license. Probably the best <10B param model for Postgres text to SQL right now.
Slightly better than gpt-4-turbo and claude opus for 0-shot text to SQL generation. Also approaches their
We just open-sourced SQL Coder, a 15B param text-to-SQL model that outperforms OpenAI's gpt-3.5! When fine-tuned on an individual schema, it outperforms gpt-4.
The model is small enough to run on a single A100 40GB in 16 bit floats, or on a single
We finally beat GPT-4 for SQL generation, after 3 months of trying! 🤓
SQLCoder now writes better Postgres SQL than GPT-4. Benchmarks aside, I'm amazed at how well it works even without fine-tuning.
When further fine-tuned on a particular schema, it's ridiculously good. We've
Welp, just finished training and evaluating CodeLlama-70B for SQL. This thing is a beast when fine-tuned.
Miles ahead of anything else (including GPT-4). Open-sourcing the weights either today or tomorrow!
We just open-sourced SQLCoder-70B! It outperforms all publicly accessible LLMs for Postgres text-to-SQL generation by a very wide margin.
SQLCoder is finetuned on
@AIatMeta
's CodeLlama-70B model (released yesterday), using fewer than 20,000 hand-curated prompt completion
Apple Silicon is seriously impressive!
Just got a refurbished M2 Max with 64GB RAM. It does ~50 tok/s on our q4 quantized 7b mistral fine-tune, with comparable speeds to GPT-4
Will run tests on our quantized 34B model soon 🤓
Been sitting on this for a while – we raised a $2.2M round led by
@ajhodls
and
@ycombinator
, with participation from some incredible angels (including
@dharmesh
, who I have looked up to for many years).
Excited to continue building – back to work now!
Running our new 7B model 100% locally on an M1 Mac 🤓
76% accuracy on sql-eval with GGUF. For reference, GPT-4 is 82.5% and SQLCoder-34B-v2 is at 85%
Pretty wild that this works locally *on a laptop*! Super excited about getting this on a Mac app soon.
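For anyone curious how a number like 76% comes about: eval accuracy is just the fraction of questions whose generated query matches the gold query's results. A minimal sketch with hypothetical query pairs (the real sql-eval harness executes queries against a database and compares result sets; the naive string comparison here is only a stand-in):

```python
# Minimal sketch of how an eval accuracy number is computed.
# Real harnesses compare *execution results*; we fake that with a
# whitespace/case-normalizing string comparison.

def accuracy(predictions, golds, match):
    """Fraction of predictions judged equivalent to their gold query."""
    assert len(predictions) == len(golds)
    hits = sum(1 for p, g in zip(predictions, golds) if match(p, g))
    return hits / len(predictions)

def naive_match(pred_sql, gold_sql):
    norm = lambda s: " ".join(s.lower().split())
    return norm(pred_sql) == norm(gold_sql)

# Hypothetical predictions vs gold queries: 2 of 3 match.
preds = ["SELECT id FROM users", "select name from  orders", "SELECT 1"]
golds = ["SELECT id FROM users", "SELECT name FROM orders", "SELECT 2"]
print(accuracy(preds, golds, naive_match))
```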
We just got our 15B parameter SQLCoder2 model to match (and *slightly* beat) GPT-4 for complex SQL generation on out-of-training-set schemas. Releasing the weights (hopefully) next week!
Our previous model – SQLCoder – beat GPT-3.5 but lagged behind GPT-4. The new model
We just open-sourced SQLCoder2 and SQLCoder-7B! They outperform GPT-4 when fine-tuned on a specific database schema, and outperform GPT-3.5 on out-of-training-set schemas
SQLCoder2 is a 15B parameter model that uses the excellent Starcoder model by
@BigCodeProject
as a base
We now have a 15B parameter text=>SQL model that outperforms gpt-3 and is competitive with gpt-3.5-turbo!
Will open-source our evals framework and the model weights later this month.
Been a hermit the last month and a half as we tried to get this up and running! Worth it! :D
Incredibly excited to launch Agents today! Agents automate complex, repetitive work in SQL, Python and R - all while keeping data scientists in the loop for feedback and clarification.
We built Agents to help with the many tedious trial-and-error tasks involved in statistical
Launching the second generation of SQLCoder-7b on
@huggingface
today!
This is distilled from our 70B model, and performs around as well* as GPT-4 for text-to-SQL generation. Finetuned on
@AIatMeta
's CodeLlama-7b.
*To be more precise – this model is much better at ratios and
Since social media mostly has wins: got rejected by
@ycombinator
. But onwards and upwards!
@narratives_data
is a search+collaboration engine to help creators analyse the world better. As every co becomes a media co, this’ll become a massive market that few are building for rn
Happy Diwali! You will hear tons of emotional chatter about air quality today.
I have been tracking air quality in India for 6 years, and it’s annoying to see the debate reduced to “are crackers the main cause of pollution”. Here’s some nuance
The
@ycombinator
speed multiplier is real! Have shipped a major feature every day since the batch started, and I'll have more commits in the first 6 weeks of 2023 than in all of 2022 🤯
User feedback and fast iterations create a pavlovian loop that's all kinds of awesome!
Phew. Just pitched
@defogdata
at
@ycombinator
's demo day. My heart rate was at 112 bpm 😅
YC has been amazing. Excited about going back to building product (and a team) now!
Two big updates today!
1) We updated the weights for sqlcoder-7b-2, and it now outperforms GPT-4 for most SQL queries – especially if you give it the right instructions and prompt well
@huggingface
link here:
2) We've added basic instruction following
I made an Indian name to ethnicity, affluence, gender, and age classifier! All data used in making this was obtained from publicly available sources, and cleaning it was a total pain. Hope I never have to parse PDFs again :/
Won the GPU lottery and got an 8x H100 SXM on runpod. I don't think I can ever go back
A run that was going to take 8 hours on 4x RTX 6000 Ada took 15 minutes on the H100s with FSDP
15 minutes of compute to go to Llama3 => GPT-4ish performance on specific tasks. 15 minutes!
Swapped out our old model with
@OpenAI
's GPT3.5 (used for ChatGPT).
@defogdata
works *way* better now!
Very cool – especially because it's easy to fine-tune GPT3.5 on custom data. Even without fine-tuning, it's able to understand how to calculate COGS, gross profit ratio etc!
Just open-sourced a
#dataviz
library that lets you convert maps like the one on the left to the one on the right. Also works with Excel files and CSVs. Would love feedback!
Got to chat with
@sama
and OpenAI folks in SG today!
Pretty cool that they're doing this world tour thing – and are actually taking complaints and feedback seriously. *Really* hoping for more stability and increased capacity in the near future 🤞🏼
Reflecting on our decision to shut down last year. The product consistently served 100M+ unique monthly IPs, and produced content that was always on the first page of Google
But it failed as a business, largely because of my management failures. A long🧵
We recently onboarded a large British customer. They are... extremely polite when talking to LLMs 😅
Instead of questions like "what is X", most questions are phrased like "I am hoping to get X", or "could you please provide recommendations on how to get X". Cool to see!
Useful learning over the weekend – the RTX4090 is a beast. If you can get your model to fit in memory, it is almost as fast as an H100 SXM under low load⚡️
Realizing that number of CUDA Cores is often more important than memory bandwidth for inference. The 4090 really shines
Copilot converted an entire library from Python to NodeJS for me, and it worked nearly perfectly (I had to make just 2 edits)
It also wrote ~30% of the Python library
The more I work on
@databricks
, the more I'm amazed at how "complete" their product is
Probably the best positioned large co for enterprise adoption of AI and LLMs right now
- Fantastic semantic layer to understand data
- SQL warehouse that integrates with everything
-
Aight we have a pretty good Llama3-8B based SQLcoder brewing!
- GPT-4-turbo level performance on 0-shot text to SQL
- Almost GPT-4-turbo performance on instruction following and in-context k-shot learning for SQL gen
Lack of instruction following and in-context learning was the
Excited to finally share this!
Super proud of what we are building at
@defogdata
, and YC has helped us be a lot more ambitious about our vision.
Our group partners
@_puneetKumar
and
@bradflora
have already added so much value and pushed us to think bigger!
Welcome to YC,
@rishdotblog
,
@medha_basu
, and team
@defogdata
!
Defog is like ChatGPT for data - right within your app. It enables users to query data in seconds, using natural language.
Evaluated Claude-3 on SQL-Eval. Much better than Claude 2, but some way to go until GPT-4
For SQL generation, Opus has GPT-4 turbo level performance. Sonnet has similar performance as 3.5-turbo, but is also roughly 4x slower. GPT-4 is still significantly better
Llama3 (8b) performs much better than the code-focused CodeLlama 7b in my tests so far – both for SQL generation and general programming tasks
Amazing to see this in a model that also has world knowledge. Makes it a great base for both agent planning and SQL-generation tasks.
Quantization (and AWQ) is amazing. Just got our 34B model running – with almost no accuracy loss – on a single RTX 4090!
Next up, GGUF and making it work inside a Mac app 🤓
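An illustration of why quantization loses so little accuracy: weights are stored as low-bit integers plus a scale factor, and dequantization recovers values close to the originals. A toy symmetric int4 round-trip (this is not AWQ itself – AWQ additionally rescales salient channels using activation statistics, and GGUF uses per-group scales):

```python
# Toy symmetric 4-bit quantization round-trip. Real schemes (AWQ,
# GGUF q4/q5) layer per-group scales and activation-aware tricks on top.

def quantize_int4(weights):
    """Map floats to integers in [-7, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.53, 0.97, -0.08, 0.44]
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
# Reconstruction error is bounded by half the quantization step.
assert max_err <= s / 2 + 1e-9
```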
I'll likely combine this data with step count, heart rate, and sleep data from Fitbit and then publish it to Github as a weekend project
Many startups (like ) are emerging in the wearable-enabled health space. Super excited about the future! [/fin]
FactGPT is pretty nuts. Possibly a preview of what Bing would look like with the ChatGPT integration
Has accurate information, works across geographies. Have not been able to get it to return "false" news
Take a bow
@AnkurPandey
,
@averma12
, and the
@LongShot_ai
team!
Just finished running evals for Postgres text-to-SQL on the new Llama 3 models
TLDR
- Unfinetuned llama models not (yet) as good as OpenAI and Claude models, but will easily outperform with finetuning on domain specific tasks
- Llama 3.1 8B is faaaar better than the Llama 3 8b
Important things I forgot to mention in the original tweet!
- SQLCoder is fine-tuned on StarCoder, an awesome initiative of the
@BigCodeProject
- We used a slightly novel training approach. We first trained the model on "easy" questions, and then trained the result of that on
We're *finally* SOC-2 Type II compliant 🤓
Took a fair bit of work, but getting our data security controls in place was so worth it – especially as we start serving more enterprise customers!
Got the beginnings of an AI data analyst that works 100% on the Macbook up!
Amazed at how fast it was (not sped up at all – literally just took the app 2 seconds to generate a query!)
llama-cpp and llama-cpp-python made dealing with Apple metal ridiculously easy :D
At an event with Jensen Huang right now — some notes
1. Automated production of intelligence at scale = new kinds of productivity. Ability to harness data into intelligence at scale will enable humans to do so much more
2. Much of this productivity will be independent of
You can now run SQLCoder with a GUI on Apple Silicon or any NVIDIA GPU-enabled device! On Apple Silicon, just run
CMAKE_ARGS="-DLLAMA_METAL=on" pip install "sqlcoder[llama-cpp]"
sqlcoder launch
The Apple Silicon version is not super accurate, but works great for simple
Wow,
@OpenAI
announcements today are 🔥
- chatgpt api, 10x cheaper than davinci and with generally better performance
- whisper-large as an api, cheaper and faster than anything else out there
- much better terms around data privacy and logging
Nuts!
Finished running SQL-Eval (200 text to SQL questions) on both the new GPT-4 turbo, and Gemini Pro 1.5
Caveat: this is for 0-shot responses only. Results might be different with k-shot prompting, and it's entirely possible that I used suboptimal prompts.
GPT-4 turbo reasons
Took an afternoon off for the first time in forever today to explore SF.
Such a beautiful city! So many people playing music, walking their dogs, exercising, and reading in the sun. Hoping to do more of this in the next few weekends :D
Low carb, high-fat foods led to almost no rise in blood sugar while being super satiating
For instance, the eggs & avocado toast below (cooked in olive oil, with some feta cheese) was around 600 calories and led to no blood sugar spike whatsoever! [5/]
Got SQLCoder-34B running on a Macbook (with minimal accuracy loss), using GGUF q5_k_m quantization!
@ggerganov
has opened so many doors for normies to experience AI and Apple Metal!
Quantized accuracy was 80%, compared to 84% for an unquantized model
Mean latency for SQL
Ashris is one of the best in the business and has a massive, living resume that speaks to his expertise. Can’t recommend this enough if you’re trying to learn more about data viz!
Data need not be boring! Let's learn to make data fun and insightful with IIP's course Introduction to Data Visualization on Unacademy!
Link:
Hurry! Limited discount period – students can avail an extra discount!
A surprising learning for me – carb heavy meals after not eating for a while cause a huge sugar spike!
Rajma + a whole meal wrap after 22 hours of fasting led to this. When intermittent fasting, will avoid a carb-heavy lunch from now [2/]
The impact of refined carbs was stark, too. Rotis (made with wheat atta) led to a really bad spike here
It would've likely been worse if not for a short (~10min) walk right after eating [3/]
Sigh I'm so glad we're moving to self-hosted LLMs for code gen
OpenAI keeps changing the underlying model without any notice. So frustrating to deal with
Back to traditional software engineering today after many days in fine tuning land and was super productive 🤓
Open-source app that makes LLM powered data analysis easy (and possible on a macbook!) coming this week
Cloudflare's new AI announcements look fun! Check out sqlcoder-7b-2 on their playground :D
Unfortunately it only allows chat-style inference right now (which we are not optimized for) – but it still outperforms other models for text to SQL tasks!
Air Quality will be bad tonight (though not as bad as last year). But average air quality over the next 3-4 months will be bad too, and causes more harm than just one day of bad air
Hoping that public angst around the issue won’t be restricted to just one cultural flash point!
Firecrackers (obviously) affect short-term air quality, but it's more complicated than that
AQI in Chennai (generally a very clean city) spiked because of firecrackers. But it will rapidly improve tomorrow because of the city's geography
Delhi is a different story [1/]
Pushed this yesterday!
Still crappy, but Data Narratives now supports video creation with AI voiceovers! Users make reports from charts they've saved, and a single click converts those reports into videos
Loads to improve (animations, titles, transitions) – but it's a start :D
Will definitely get a CGM for my parents so they can see what food items lead to a sugar spike for them. Continuous measurement will help identify dietary culprits
Also hope that the Apple Watch 7 has an optical glucose monitor – will help diabetics save so much money! [7/]
After a bit more testing, gpt-4o is a remarkable model for programming, tool use, and planning. Much better than gpt-4-turbo in ways that aren’t always captured by evals.
Also relies far less on prompt engineering, and tends to “just work” most of the time. Excited to see what
Such a great time to be building AI apps right now. GPT-4 is super promising. Google Cloud announcements are great. Fine-tuned Llama and Flan-UL2 are amazing for self-hosted models.
So many great options to choose from. *Amazing* time to be a builder!
Equally importantly, it is 2x faster than GPT-4 and GPT-4-Turbo when deployed on a single A100 80GB GPU. And as fast when deployed on 4x A10 GPUs (using vLLM)
We haven't had a chance to play with Nvidia's TensorRT LLM, but might get more speed gains with that.
Huge props to
Useful finding about fine-tuning today – training on the same dataset and hyperparameters can still give you dramatically different end results, even if your train and eval loss are pretty much the same
Consider this. I did 3 finetuning runs back to back on the same machine,
PSA: if you use GPT-4 prompts for other models (Gemini/Mistral/open-source), they won't work as well.
Spend the time to play around with different prompts for different models – what works for one is rarely what works for others
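One concrete reason prompts don't transfer: each model family expects a different chat template, so a raw GPT-4-style prompt ignores the conventions other models were trained on. A rough sketch of the differences (the template strings below follow the published formats, but double-check against each model's docs before relying on them):

```python
# Different model families wrap the same instruction in very different
# templates. Reusing one family's raw prompt with another usually
# bypasses these conventions and degrades output quality.

def format_prompt(model_family, system, user):
    if model_family == "openai":
        # OpenAI chat API takes structured messages, not one string.
        return [{"role": "system", "content": system},
                {"role": "user", "content": user}]
    if model_family == "mistral-instruct":
        # Mistral's instruct format folds everything into [INST] tags.
        return f"<s>[INST] {system}\n\n{user} [/INST]"
    if model_family == "llama3-instruct":
        # Llama 3 uses header tokens around each role.
        return ("<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
                f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
                f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n")
    raise ValueError(f"unknown family: {model_family}")

prompt = format_prompt("mistral-instruct", "You write SQL.", "Count all users.")
```

In practice, tokenizer-provided chat templates (e.g. Hugging Face's `apply_chat_template`) handle this for you, but prompt *content* still needs per-model tuning.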
Gah my GitHub commits went to near 0 in the last two weeks. Spent most of my time in investor calls
Super hungry to make up for lost time 🤓 Starting off with onboarding and efficiency improvements. Then more "fun" features!
But going for long walks after eating high-carb, high-calorie meals can lead to a less extreme response
In this graph, I had the same amount of paneer as in the graph above and the wraps had the same amt of carbs as the rotis. But a long (~5km) walk meant no sugar spike [4/]
Aight starting another 1x/day shipping challenge for the month of May – with the added constraint of exercise and sleep
The plan:
- push 1 feature live every day
- do one of a 4km run OR strength training OR a 10km hike every day
- sleep at least 7 hours every day
Should be a
At a Singapore x TechCrunch event in SF today and
@AndrewYNg
gave an awesome talk around where he sees AI going.
I got my start with ML on Coursera 10 years ago! Pretty cool to see him talk about ML all these years later!
As winter sets in, winds slow down, temperatures are lower, stubble burning increases as farmers clear their fields… and Diwali coincides with all these things
Lol I love the RELEASE file in
@MistralAI
's torrent!
Also, look at how tiny that team is. Amazing what a small group of smart, motivated people can do
Epiphany today: thinking about the same stuff over & over can feel like going around in circles. But it’s an upwards spiral
Quality of insights compounds over time. Engaging with novel things *feels* great, but meditating on the same stuff over years can lead to better outcomes
So happy to see Data Narratives being adopted by giants like
@timesofindia
!
Excited that our collaboration workflows are coming together. Massive thanks to
@indianeconomy
&
@drindrajeetrai
for trying an initially buggy product & making it more useful :D
Just had a 10th grader reach out who runs a freemium newsletter for value investors, has built a SaaS app, runs a podcast, and is trying to get better at machine learning right now
Love the drive. The next gen is alright!
If you want traction, don't say 'I have a category defining product'. Instead, say 'the world is broken in this way'. The former is narcissistic, the latter empathetic
Great
#MastersofSaaS
session by
@dharmesh
and
@MohapatraHemant
. Notes at
🧵below [1/]
Lastly, it's not just about what you eat. Portion size is as important. If I overeat healthy things (like chicken breast+feta+capsicum wraps, or wholemeal oats), my blood sugar still spikes a lot
Though overeating high carb things (rajma, pizza) is much worse [6/]
We just enabled self-adaptive learning! When Defog gets a query wrong, it now debugs itself automatically.
In this example, the initial query missed an edge case and hit a divide-by-0 error. Defog saw the error and fixed it - like a human would.
Day 13/30 in our feature-a-day streak!
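The self-repair loop described above can be sketched as a simple retry: execute the generated SQL, and on failure feed the error message back to the model for a fix. A minimal sketch with a stubbed model and database (illustrative only, not Defog's actual implementation):

```python
# Minimal self-correction loop: run generated SQL, and on failure hand
# the error back to the generator to produce a fixed query.

def run_with_self_repair(question, generate, execute, max_retries=2):
    query = generate(question, error=None, prior_query=None)
    for _ in range(max_retries + 1):
        try:
            return query, execute(query)
        except Exception as e:
            query = generate(question, error=str(e), prior_query=query)
    raise RuntimeError("could not produce a working query")

# Stubbed "model": first emits a query that divides by zero, then,
# when shown the error, guards the denominator with NULLIF.
def fake_model(question, error, prior_query):
    if error is None:
        return "SELECT revenue / orders FROM stats"
    return "SELECT revenue / NULLIF(orders, 0) FROM stats"

# Stubbed "database" that fails on the unguarded division.
def fake_db(query):
    if "NULLIF" not in query:
        raise ZeroDivisionError("division by zero")
    return [(None,)]  # NULLIF turned the bad row into NULL

q, rows = run_with_self_repair("avg revenue per order?", fake_model, fake_db)
```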
TIL about Google's Deplot – a chart-to-table VLM that works surprisingly well!
Just 282M parameters – quite fast even on CPUs! Can probably fine-tune this to also give good results for statistical charts (like boxplots etc). Will play around!
Exactly 5 years ago, we started making data-driven election videos that would eventually get 4 million+ views on a shoestring budget – and powered production across YouTube, FB live, and TV!
@nalinmehta
,
@sanjeevrsingh
and I worked our butts off, but had a ton of fun!
Can't believe this works :D This model is all of 12 minutes old at this point (and has been in the works for a month). Will improve over time, but super happy with where it is rn!
Defog now has (rudimentary) reasoning abilities!
You can now ask broad questions, like whether higher prices lead to lower sales, and get human-interpretable answers!
This is Day 9 of our daily product pushes. We like moving fast, and we are just getting started 🛠️
Trained an LLM to answer questions based on my book notes (around a million words in total) – works *really* well!
Inspired by
@NirantK
's Roam model. Getting it to stop hallucinating was a fun challenge. As was getting it to return "I don't know" answers
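A common way to get that "I don't know" behavior is a retrieval-confidence gate: if no note scores above a cutoff for the question, refuse instead of letting the model guess. A toy sketch with bag-of-words overlap standing in for real embedding similarity (not the actual system):

```python
# Toy retrieval gate: answer only when some note overlaps the question
# strongly enough; otherwise return "I don't know" instead of guessing.

def overlap_score(question, note):
    q, n = set(question.lower().split()), set(note.lower().split())
    return len(q & n) / max(len(q), 1)

def answer(question, notes, threshold=0.5):
    best = max(notes, key=lambda note: overlap_score(question, note))
    if overlap_score(question, best) < threshold:
        return "I don't know"
    return f"Based on my notes: {best}"

notes = ["Kahneman: system 1 is fast and intuitive",
         "Taleb: fragility is sensitivity to volatility"]
print(answer("what is system 1", notes))          # grounded answer
print(answer("who won the 1998 world cup", notes))  # nothing relevant
```

Real systems use embedding cosine similarity and tune the threshold on held-out questions, but the refuse-below-threshold shape is the same.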
So
@medha_basu
and I A/B tested our elevator pitch on
@collision
today!
Incredibly kind of John to talk to 2 no-name early stage founders. Very struck by how carefully and thoughtfully he listened
Thanks
@caitbhri
,
@42piyush
and the Stripe folks for making this happen!
Pretty cool to see this on HF trending today :D
Also, building some fun MLX integrations, thanks to
@Ubunta
's awesome MLX port. Already a part of sql-eval in this PR:
ChatGPT has replaced Google as my primary go-to place for technical questions
Doesn't always get it right. But WAY faster to try + debug ChatGPT code than go through SEO clickbait
Me this morning, two americanos in, manically banging out code and headbanging to great music, thinking "it's such a wonderful day!"
Barista, 30 minutes later: Sir, I love that you're having a great time, but could you relax a bit? A patron thinks you're high on drugs
😅
Some nerd stuff! 🤓
- If you want a very simple way to play with it, check out our Github repo:
- You'll need a TON of VRAM to run this fast. We've found AWQ quantization a really good way to keep accuracy high while keeping latency and VRAM low. Would
Wow.
@OpenAI
's product velocity is incredibly inspiring. Amazed to see them move as quickly as they have in the last 12 months. Something for all builders to aspire to!