🧠 Data-centric AI platform
⚡ 10-100x faster dev
🤖 Programmatic labeling + models
🚀 Powering Fortune 500 & gov't AI
🗽 Don’t miss
#SnorkelCon24
: Oct 16-17, NY
[1/5] **Spoiler alert** We trained a model with the same accuracy as GPT-3 (fine-tuned) that was 1400x smaller with 0.1% of the inference cost. How? With Data-centric Foundation Model (FM) Development in Snorkel Flow. Highlights in the thread 👇:
Very excited to announce Snorkel v0.9, the biggest update to our open source framework for programmatically labeling, transforming & structuring training datasets for
#ML
. We add new core ops, algs, tutorials, and a full redesign of the core lib
#snorkelML
This week we brought the
#AI
community together to share transformative ideas, practical applications, and new research on
#DataCentricAI
. If you weren't able to join us or want to view the insightful talks again, check out ↓
How to Use Snorkel to Build AI Applications: The why, what, and how of Snorkel’s programmatic data labeling approach and the state-of-the-art
#SnorkelFlow
platform by our Head of Technology and Co-founder,
@bradenjhancock
↓
A big part of the ML workflow is in debugging. However, debugging for ML is hard!
In this post,
@chipro
analyzes major sources of errors & their solutions at the four steps:
* labeling
* feature engineering
* model training
* model evaluation
We're excited to announce Snorkel Flow, a new data-first ML development platform based on the core ideas of Snorkel! After years of research, deployments, and user conversations, we saw that Snorkel was just the first step. Read about our path forward here
We are starting a new vodcast called Snorkel
#ScienceTalks
, exploring some of the best ideas to make AI practical. In the 1st episode,
@bradenjhancock
talks to
@Thom_Wolf
about
@HuggingFace
Datasets & Transformers, and taking ML research into production.
Most organizations are in early phases of machine learning adoption, and there are many misperceptions of ML production.
@chipro
explained the 6 common myths in her recent talk at Stanford MLSys Seminar.
What other myths have you encountered?
[1/5] Today, we’re excited to introduce Data-centric Foundation Model Development, a new paradigm for enterprises to use foundation models to solve complex, real-world problems.
Interested in becoming the newest Snorkeler? We have open roles across engineering, sales, marketing, and more to accelerate
#DataCentricAI
for the enterprise. Come join one of the most talented, passionate, and supportive teams in tech! ↓
Design principles for iteratively building
#AI
applications with Snorkel Flow's Application Studio, by Founding Engineer and
#MachineLearning
Engineering Lead,
@vincentsunnchen
↓
In the latest episode of Snorkel
#ScienceTalks
,
@seb_ruder
,
@DeepMind
researcher and
@bradenjhancock
discuss
- NLP repositories of datasets and models
- New benchmarks (GLUE, SuperGLUE,...)
- NLP for low-resource languages
- Emerging trends in NLP
Enjoy!
Team
#Snorkel
is made up of some of the most talented people in ML. Meet
@aarti_bagul
, a machine learning engineer who loves working at the intersection of state-of-the-art research and product management. Learn more about Aarti →
The next episode of Snorkel
#ScienceTalks
is out. Tune in to learn more about
@spacy_io
, industrial-strength NLP, & the importance of bringing together different stakeholders in the ML dev process, from our chat with
@_inesmontani
, founder of
@explosion_ai
In case you missed our
#MLWhiteboard
, where
@HiromuHota
and
@realDanFu
reviewed: "Multi-Resolution Weak Supervision for Sequential Data," presented at NeurIPS 2019, check it out ↓
Team Snorkel is growing fast with some of the brightest minds in AI. Meet Hiromu Hota, a
#MachineLearningEngineer
who enjoys brainstorming, designing, and implementing solutions with fellow Snorkelers to make AI practical.
.
@4shub
from our UX Engineering team dives into some frontend best practices for working with lots of data in this blog post on web virtualization to optimize data-intensive app performance. Check it out ↓
#frontend
#react
Two Snorkel papers at
@NeurIPSconf
this year!
(1) *slicing functions* for monitoring and modeling critical data subsets;
(2) handling multi-resolution weak supervision for sequential data
@vincentsunnchen
@paroma_varma
@HazyResearch
. Blog posts soon!
In the latest episode of Snorkel
#ScienceTalks
,
@GreylockVC
's Partner
@SaamMotamedi
and our VP of Marketing,
@DevangSachdev
, discuss how data scientists and machine learning engineers can get started with their startup journey. Check it out ↓
In the latest episode of Snorkel
#ScienceTalks
, Snorkel's
@bradenjhancock
chats with
@abigail_e_see
on
#AI
's facts and myths, the challenges of natural language generation (NLG), and the path to large-scale NLG deployment. Check it out ↓
In this paper at
#ICLR2022
, Chris Ré & team at
@Stanford
outline a new principled evaluation framework for comparing slice detection methods, & introduce a new technique motivated by their discoveries that outperforms existing methods by double digits ↓
Team
#Snorkel
is a cross-functional, growing team. Meet Priyal Aggarwal
@priyal_aggarwal
, a
#MachineLearningEngineer
who enjoys building the next generation of AI applications and making memes about software bugs. Learn more about Priyal →
We need to move beyond manual
#datalabeling
for AI to live up to the hype. Make training data creation and management part of the development process with Snorkel Flow, the first
#datacentricAI
platform powered by a programmatic approach ↓
We are honored by Snorkel's selection in the
#EnterpriseTech30
List, a list of the top 30 most promising companies voted by 100+ leading VCs. Excited to propel the data-first revolution for enterprise AI.
Thank you,
@Wing_VC
.
Thank you
@IgorBosilkovski
@Forbes
for the great overview of what we're building at Snorkel AI, tackling the training data problem with a new data-first ML platform, Snorkel Flow. It was great chatting!
Meet Ryan Smith, an ML Research Engineer at Snorkel AI. He is passionate about discovering new ways to solve NLP problems, football, softball, and reading sci-fi and fantasy literature. Learn more about Ryan →
Team Snorkel is growing fast! Meet
@robiriondo
, our head of content, who is in charge of spreading the word about Snorkel AI. He loves family time, movies, reading, and playing World of Warcraft. Learn more about Roberto →
Chris Ré,
@SnorkelAI
co-founder, and
@Stanford
associate professor, talks about Snorkel's research-focused journey to
#DataCentricAI
, from current bottlenecks in ML to tackling these with SotA programmatic labeling and weak supervision approaches ↓
Congrats to
@paroma_varma
, Co-founder and Head of Solutions at Snorkel, for being recognized on
@GVteam
's Impact List, highlighting 25 exceptional women for
#IWD2021
. We are proud and privileged to work alongside you.
#GVImpact
Last week
@SnorkelAI
was featured on
@Wing_VC
's
#EnterpriseTech30
list and on
@Nasdaq
. Read more about
@ajratner
's insights into how the shift to data-first software development shapes the enterprise AI roadmap in an interview with Nasdaq.
Team Snorkel is one of a kind! Meet Aubrea Stone, a talented executive operations manager who is passionate about improving efficiency and structure for the executive team and loves family time and the Arizona sunshine. Learn more about Aubrea →
We’re at
#NeurIPS2019
!
Come say 👋 at the poster session on Thu Dec 12 (10:45am - 12:45pm) to chat about *slicing functions* (poster
#67
) and weak supervision over *sequential data* (poster
#110
)!
If you want to learn about:
- weak supervision
- programmatic labeling for creating massive training datasets
Check out
@paroma_varma
's webinar this morning with
@aicampai
. Thank you for having us!
Tune in to
@HazyResearch
and
@StanfordAILab
hosted MLSys Seminar Series on Nov 5th to hear from
@ajratner
on real-world challenges faced by enterprises when deploying ML systems and how to solve them using programmatic training data creation:
Meet
@charli3_w
, a talented full-stack
#softwareengineer
who is helping us build Snorkel Flow across the stack. Loves gaming, yoga, reading, swimming, and playing the piano. Learn more about Charlie ↓
The second episode of Snorkel
#ScienceTalks
will be available on March 10th. In this episode,
@seb_ruder
from
@DeepMind
discusses advances in natural language processing.
An exciting chat between
@ajratner
and
@simran_s_arora
about new research from
@HazyResearch
on how prompting methods enable a 6B parameter model to outperform the 175B parameter GPT-3. Join us on Jan 17, where Simran will dive deeper into her research:
In case you missed our
#MLwhiteboard
, where
@bradenjhancock
talked about his
#NLP
research paper: "Training Classifiers with Natural Language Explanations," presented at ACL 2018, check it out ↓
Team Snorkel has some of the brightest minds in
#softwareengineering
. Meet David Hao, a platform engineer who enjoys solving infrastructure and reliability engineering problems and loves traveling, hiking, and playing indie games. Learn more about David →
📢
@OpenAI
just released their guide to model selection, and it’s music to our data-centric AI hearts! 🎶
💡"By switching from GPT-4o to GPT-4o-mini with fine-tuning, we achieved equivalent performance for less than 2% of the cost using only 1,000 labeled examples."
Read more
We are starting a new thing: ML Whiteboard - an informal session where data scientists, ML engineers, and developers along with Snorkel AI team members join to discuss the latest research and new techniques for machine learning, deep learning, NLP, and more.
AI is having its Linux moment 💥 Models are open-sourced like never before. We are excited to have
@huggingface
as a partner at on Jun 7-8. Join us to learn how to build predictive &
#genAI
apps using the latest open-source models, datasets, & tools.
Team
#Snorkel
is made of some of the most talented people in
#softwareengineering
. Meet Sakshi Gupta, a backend software engineer who works on the ML foundations team and loves reading science fiction, improv, and snorkeling! Learn more about Sakshi →
We used weak supervision to programmatically curate instruction tuning data for open-source LLMs like Llama 2 and RedPajama, enabling more granular error analysis and higher quality—without an army of manual annotators. Links to data and models on the blog!
We had a fantastic time meeting you all!
In case you missed us, check out the write-ups about the work we presented:
* Slice-based Learning:
* WS for Sequential Data:
Meet Molly Friederich, head of solutions marketing at Snorkel AI. She is a product marketing champion. Before Snorkel, she spent six years at
@SendGrid
, followed by
@Twilio
. Learn why Molly joined the Snorkel team ↓
We’re thrilled to partner with Together AI to enable any enterprise to build proprietary LLMs on their data tailored to their specific needs. Learn how both data-centric and model-centric operations are needed to build GPT-You for your business.
We are excited to attend
@NeurIPSConf
#NeurIPS2022
on 11/28 in New Orleans where the
@SnorkelAI
Research team and our academic partners will present five peer-reviewed papers on the latest data-centric AI approaches listed in the 🧵 [1/7]
(1/2) The doors to our new HQ in Redwood City have been opened! We were thrilled to see our Snorkelers, most for the very first time in person, as our team has grown many times over in the past 24 months.
Roshni Malani, from Snorkel AI's engineering leadership, discusses how collaboration is one of the fundamental pillars of data-centric AI in this blog post on building AI applications using
#datacentricAI
. Learn more ↓
At 11:15 AM Pacific,
@ananyaku
, ML Researcher
@StanfordAILab
will walk us through a tutorial on FMs and fine-tuning by selectively tuning parts of the model to preserve pretrained information & deliver better out-of-distribution performance. Sign up here:
Drawing from her experiences at
#Netflix
,
#NVIDIA
, and
@SnorkelML
,
@chipro
will talk about bridging the gap between research and production for ML at
@Ai4Conferences
. The talk is online and Ai4 has free passes. Join us today at 2:50 PM EST!
Another Snorkel-driven
@NatureComms
paper published this week, led by Stanford Researcher and Snorkel Research member
@jasonafries
: weakly supervised
#NER
, combining large medical ontologies for state-of-the-art performance and rapid COVID-19 research!
#SnorkelAI
will be at
#MLOps2021
!
@vincentsunnchen
, a founding engineer and MLE lead, and
@priyal_aggarwal
, ML engineer, will discuss "Iterative Development Workflows for Building AI Applications." Join us on June 17 at 11:45 AM PT ↓
It is nearly impossible to get perfect data in this imperfect world. Labels can be inexact/indirect, incomplete/limited, inaccurate/noisy, multimodal, sparse, sequential, or flawed in other ways.
@AnimaAnandkumar
discusses overcoming data imperfections →
We are honored to be named one of America's Best Startup Employers by
@Forbes
😊. This award recognizes startup companies that carry out groundbreaking work, invest in employees, and demonstrate strong growth. Read more 👇
(1/2) Large language models embed a lot of useful knowledge in their pre-trained weights, but they are typically insufficient solutions on their own, either due to knowledge gaps or the inability to transfer what they know. But there’s another way →
#TBT
to 2016 when
@SnorkelAI
team and researchers at
@StanfordAILab
introduced Data Programming - a new paradigm in which users express weak supervision strategies or domain heuristics as labeling functions at
@NeurIPSConf
Meet Tim Sedwitz, head of revenue operations at Snorkel AI. He has deep strategy experience in enterprise software and is passionate about working out, hiking, and golfing. Learn more about Tim →
Meet Shenell Glover, our federal business development and capture manager, who helps us identify and establish federal relationships. Loves helping organizations scale financially, cooking, and being a tech enthusiast. Learn more about Shenell →
🌟 Thrilled to announce: "MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records" just won the "Best Findings Paper in Generative AI for Health" at the ML4H Symposium! 🏥💡 Authored by Scott Fleming, Alejandro Lozano, Snorkel AI
AI has so much potential to improve healthcare. However, there are still many practical and ethical challenges to be overcome for AI to deliver value.
In this post, one of our engineers,
@bclyang
, analyzes these challenges and possible solutions.
We’re excited to welcome Devang Sachdev as VP of Marketing to help us bridge the gap between machine learning developers and business leaders looking to make AI their edge.
Check out
@vincentsunnchen
presenting Snorkel-based weak supervision methods for scene graph prediction at
#iccv19
, today at the SGRL Workshop and tomorrow afternoon at the conference (Poster 1.2,
#130
)!
Structured prediction requires large training sets, but crowdsourcing is ineffective, so existing models ignore visual relationships without sufficient labels.
Our method uses 10 relationship labels to generate training data for any scene graph model!
At Snorkel AI, we believe in data, diversity, and democracy. Our team is excited to
#vote
and we want you to vote too. Retweet to receive a custom designed t-shirt featuring our very own Dr. Bubbles.
#EveryVoteCounts
At 11:45 AM Pacific,
@simran_s_arora
, ML researcher
@HazyResearch
, will dive deeper into how a 6B parameter open-sourced model performed better than 175B parameter GPT-3 on 15 tasks. Check out other talks and sign up here:
Finance has so much potential use for AI.
@ManasJoglekar
shared challenges & solutions for AI in finance, drawn from his experience working with the US's top banks:
* Unstructured, multimodal, long-tail data
* Regulation
* Changing business objectives
* From electrical engineering to ML
* From MATLAB to Python
* From PhD to building one hell of a startup
* From OSS to a commercial platform
Catch
@bradenjhancock
sharing his journey into ML and challenges of ML in production in
@PracticalAIFM
!
Check out
@paroma_varma
talking about Snuba, a system for automating generation of labeling functions for Snorkel, at
#VLDB2019
today! Chat with her at the conference (Wed. poster 1.3) if interested in Snuba or weak supervision more broadly!
Honored to sponsor this effort to:
- bring more researchers from underrepresented groups to
#ICLR2021
- advance theory, methods, and tools for weak supervision!
🌟Diversity Funding🌟
To increase diversity,
#WeaSuL2021
will offer subsidies to researchers from underrepresented groups to facilitate their participation in the workshop and in
#ICLR2021
.
Please fill out this form to apply:
Also, please retweet!
(1/5) In this post,
@MayeeChen
discusses Liger: a simple method that provides theoretical and empirical improvements over standard weak supervision methods and empirically outperforms KNN and adapter baselines on FM embeddings ↓
Meet Victoria Lo, a frontend software engineer at Snorkel AI. She is a talented and versatile engineer, passionate about CSS, memes, and trying new things. Learn more about Victoria →
ML models are becoming commodities you can pip install. The hard part is in the long tail, fine details that confound even the large, powerful models.
Our co-founder Chris Ré discussed how Software 2.0 is an approach to building high-quality AI-based software!