Matei Zaharia

@matei_zaharia

Followers: 42,108
Following: 1,199
Media: 164
Statuses: 2,638

CTO at @Databricks and CS prof at @UCBerkeley . Working on data+AI, including @ApacheSpark , @DeltaLakeOSS , @MLflow , .

Berkeley, CA
Joined October 2010
@matei_zaharia
Matei Zaharia
1 year
Lots of people are wondering whether #GPT4 and #ChatGPT's performance has been changing over time, so Lingjiao Chen, @james_y_zou and I measured it. We found big changes, including large decreases in some problem-solving tasks:
122
789
3K
@matei_zaharia
Matei Zaharia
2 years
Building a ChatGPT-like LLM might be easier than anyone thought. At @Databricks , we tuned a 2-year-old open source model to follow instructions in just 3 hours, and are open sourcing the code. We think this tech will quickly be democratized.
43
507
3K
@matei_zaharia
Matei Zaharia
7 months
Interesting trend in AI: the best results are increasingly obtained by compound systems, not monolithic models. AlphaCode, ChatGPT+, Gemini are examples. In this post, we discuss why this is and emerging research on designing & optimizing such systems.
30
262
1K
@matei_zaharia
Matei Zaharia
2 years
ChaatGPT: More real than anyone thought?
29
75
1K
@matei_zaharia
Matei Zaharia
1 year
Very excited to return to UC Berkeley as a professor starting this week. I’ll be collaborating with the Sky Lab, @UCBEPIC , @berkeley_ai and others!
@Berkeley_EECS
UC Berkeley EECS
1 year
@Berkeley_EECS welcomes @matei_zaharia , who returns to Berkeley EECS as an Associate Professor. Matei’s research interests include computer systems and machine learning. He is also the co-founder and Chief Technologist of Databricks. Welcome back, Matei!
0
6
147
51
44
868
@matei_zaharia
Matei Zaharia
4 years
Pretty sure I've seen people driving with only 19 neurons too!
@MIT_CSAIL
MIT CSAIL
4 years
This autonomous car can drive itself using only 19 control neurons. Video: More: (work w/ @ISTAustria @tuvienna ) #SelfDrivingCars #Autonomy #ML #DL #MachineLearning
11
273
698
6
124
783
@matei_zaharia
Matei Zaharia
1 year
We're launching two comprehensive online courses on building and using Large Language Models! The first is on using LLMs in applications, covering topics like prompt engineering, embeddings, chains, and MLOps. The second teaches you to build your own LLMs.
9
147
674
@matei_zaharia
Matei Zaharia
6 months
At Databricks, we've built an awesome model training and tuning stack. We've now used it to release DBRX, the best open source LLM on standard benchmarks to date, exceeding GPT-3.5 while running 2x faster than Llama-70B.
13
134
668
@matei_zaharia
Matei Zaharia
2 years
We've just launched a version of Dolly on HuggingFace, with new examples showing its capabilities. This is all with just 50k training examples. Stay tuned for new versions with other datasets soon.
9
93
566
@matei_zaharia
Matei Zaharia
2 years
Who are the World Cup champions? I knew ChatGPT would get it wrong when it launched, but it's surprising that all the new search+LLM engines do too. Combining retrieval+LMs won't just be a matter of prompting. That's why we've been building tools like DSP at Stanford to do it.
22
58
531
@matei_zaharia
Matei Zaharia
11 months
Thrilled to receive this award; the credit is due to my students, my mentors, my collaborators in academia and open source, and my colleagues at Databricks for making all this work happen!
@ACMSIGOPS
ACM SIGOPS
11 months
The 2023 @ACMSIGOPS Mark Weiser Award was presented to @matei_zaharia for innovation and impact in large-scale data processing. The award was announced at @sospconf. From next year, awards will be announced annually as @sospconf is now an annual conference. See you in #austin in 2024
0
9
59
55
38
462
@matei_zaharia
Matei Zaharia
2 months
Not a problem with Lakehouse.
@roshanpateI
Roshan Patel
2 months
my friend works in fashion. i set her up with one of my tech homies. this is how it went.
4K
5K
273K
17
25
434
@matei_zaharia
Matei Zaharia
4 years
Thanks @pbailis , hope you let me graduate soon!
22
10
433
@matei_zaharia
Matei Zaharia
6 years
Super excited to announce MLflow, a new open source Machine Learning platform from Databricks to manage the complete machine learning lifecycle:
6
208
424
@matei_zaharia
Matei Zaharia
1 year
For example, GPT-4's success rate on "is this number prime? think step by step" fell from 97.6% to 2.4% from March to June, while GPT-3.5 improved. Behavior on sensitive inputs also changed. Other tasks changed less, but there are definitely significant changes in LLM behavior.
11
60
420
@matei_zaharia
Matei Zaharia
1 year
One of my favorite announcements: English SDK for @ApacheSpark ! No more need to remember weird syntax, just chain transformations in natural language with the familiar Spark API. So many fun examples.
12
73
416
@matei_zaharia
Matei Zaharia
4 years
We've started a great collaboration between @PyTorch and @MLflow, to bring a rich set of #MLOps functionality to PyTorch users. We've been working on this with the PyTorch team for a while and we're super excited to release a first wave of integrations today:
3
85
380
@matei_zaharia
Matei Zaharia
4 years
Due to COVID19, we decided to make #SparkAISummit virtual and also *free* for anyone to attend this year! We still have the same great program with over 200 talks and keynotes from @NateSilver538 , @jenniferchayes , @apaszke and more. Tune in for the largest data & AI summit ever.
@Data_AI_Summit
#DataAISummit
4 years
We can’t wait to solve the world’s toughest problems — and it starts with #SparkAISummit , the world’s largest data and machine learning conference. As a global virtual event, we'll converge to shape the future of big data, analytics and AI. Join us:
0
25
57
10
165
370
@matei_zaharia
Matei Zaharia
1 year
Our MOOC on Large Language Models: Application through Production started today! Join me, Sam Raymond, Chengyin Eng and Joseph Bradley from Databricks as we cover how to build end-to-end apps with LLMs, including components like vector DBs and chains.
4
64
367
@matei_zaharia
Matei Zaharia
1 year
Our new MOOC on #LLM Foundation Models from the Ground Up is now available! Join me, Chengyin Eng, @sjraymond , Joseph Bradley and @abhi_venigalla for a detailed look at how LLMs are built, how to improve them, and where the field is going.
5
86
360
@matei_zaharia
Matei Zaharia
3 years
Congrats to my student @codyaustun (with @pbailis ) on defending his PhD today! Cody did amazing work improving the resource and data efficiency of deep learning, including widely used benchmarks (DAWNBench/MLPerf), perf analysis, and new 10-1000x faster algorithms (SVP & SEALS).
14
36
352
@matei_zaharia
Matei Zaharia
2 months
Does long context solve RAG? We found that many long-context models fail in specific and weird ways as you grow context length, making the optimal system design non-obvious. Some models tend to say there's a copyright issue, some tend to summarize, etc.
12
78
339
@matei_zaharia
Matei Zaharia
1 year
How can you efficiently evaluate RAG-based LLM applications like document question answering? We've tested several methods on our internal question answering applications at Databricks and found some effective ways to do this using LLMs.
1
61
326
@matei_zaharia
Matei Zaharia
5 years
I'm super honored to have received a #PECASE award this year. Percy Liang from @StanfordNLP also got one, which is great news for Stanford CS. Congrats to everyone else who received one!
19
26
316
@matei_zaharia
Matei Zaharia
10 months
This thread highlights a point we've been seeing for a while: you can't meaningfully talk about capabilities of a *language model*, you have to talk about capabilities of a *system*, including the inference algorithm. 32-CoT is not the same as 5-shot.
@AravSrinivas
Aravind Srinivas
10 months
. @JeffDean why the need to do 32-CoT Gemini Ultra vs 5-shot GPT-4? Why not just report 5-shot vs 5-shot?
20
20
724
6
41
316
@matei_zaharia
Matei Zaharia
4 years
For @VLDB2020 , we wrote a paper on @DeltaLakeOSS , one of the most exciting new technologies from Databricks. By adding ACID transactions over cloud object stores, we can provide data-warehouse-like capabilities & performance on low-cost, HA cloud storage.
5
91
306
@matei_zaharia
Matei Zaharia
4 years
AI research today
5
46
298
@matei_zaharia
Matei Zaharia
3 years
Databricks just set a new record on the official TPC-DS data warehousing benchmark, showing that a lakehouse system based on open data formats can outperform previous DW systems. Don't listen to folks who say open means bad performance!
4
69
289
@matei_zaharia
Matei Zaharia
4 years
Excited to share our #Lakehouse technical paper published at #CIDR21 . We describe a new class of data platforms that are (1) completely open, (2) efficiently support #MachineLearning , and (3) provide all traditional #DataWarehouse capabilities+performance.
6
89
288
@matei_zaharia
Matei Zaharia
4 years
#ApacheSpark 3.0 greatly simplifies writing Python user-defined functions through type hints, and makes it easier for your functions to process data efficiently in batches via Pandas and Apache Arrow. Check out how to use them:
2
77
277
@matei_zaharia
Matei Zaharia
4 years
To be fair, if you're asking someone who worked on Windows, "shut down and restart" worked pretty well there.
5
31
277
@matei_zaharia
Matei Zaharia
1 year
This is a big release: we've spent the past 3 years working on LLM pipelines and retrieval-augmented apps in my group, and came up with this rich programming model based on our learnings. It not only defines but *automatically optimizes* pipelines for you to get great results.
@lateinteraction
Omar Khattab
1 year
🚨Announcing 𝗗𝗦𝗣𝘆, the framework for solving advanced tasks w/ LMs. Express *any* pipeline as clean, Pythonic control flow. Just ask DSPy to 𝗰𝗼𝗺𝗽𝗶𝗹𝗲 your modular code into auto-tuned chains of prompts or finetunes for GPT, Llama, and/or T5.🧵
24
138
644
1
50
279
@matei_zaharia
Matei Zaharia
6 years
#ApacheSpark 2.4 is out today! This release has tons of new features including barrier execution mode for ML applications, higher-order functions in SQL, optional eager evaluation for previewing DataFrames in Jupyter, Scala 2.12 support and more.
0
126
263
@matei_zaharia
Matei Zaharia
7 years
I'm co-organizing a new conference on Systems for Machine Learning starting in February; our first call for papers is up at , so submit your interesting SysML work by Jan 5th!
7
147
257
@matei_zaharia
Matei Zaharia
1 year
As we worked with customers using LLMs, a common pattern we saw was that everyone wanted to add a layer in front of the LLM API to manage credentials, rate limits, etc., and to easily swap between models. We've built this as the open source @MLflow AI Gateway:
4
52
248
@matei_zaharia
Matei Zaharia
1 year
We're very excited to be one of the launch partners for Meta's Llama 2 🦙! We got to test Llama 2 in advance and were very impressed. The new version also has a much more permissive license. We've set everything up so you can run it on Databricks today.
1
44
241
@matei_zaharia
Matei Zaharia
1 year
Cool to see this model from @MosaicML being trained on RedPajama and Dolly data. Fully open source AI is becoming a reality -- open source efficient training, curated web dataset, and instruction data. Still early and small model but it will get better.
1
45
247
@matei_zaharia
Matei Zaharia
4 months
Super excited about the new Agent Framework, Tool Catalog, Vector Search, Evaluation and Training capabilities we launched today in Mosaic AI. We see more companies building compound AI systems, and we have created an end-to-end environment to do this.
2
49
239
@matei_zaharia
Matei Zaharia
3 years
We just posted ColBERTv2, which dramatically reduces the space usage of ColBERT and gets state-of-the-art information retrieval quality on MS MARCO as well as out-of-domain on BEIR🍺, Open-QA retrieval, and our new long-tail task benchmark LoTTE☕️.
5
52
239
@matei_zaharia
Matei Zaharia
7 months
Want to efficiently query a vector DB while filtering on structured attributes? My student Liana Patel, together with @petereliaskraft and @guestrin , modified HNSW to do this efficiently in ACORN, to appear at SIGMOD:
4
41
236
@matei_zaharia
Matei Zaharia
1 year
Databricks just published our #StateofDataAI report, with interesting trends at our enterprise customers: 1. Adoption of LLMs is booming, with use of SaaS LLM APIs exploding since #ChatGPT launched, but the largest use (and growth) still in custom LLMs.
2
57
232
@matei_zaharia
Matei Zaharia
1 year
Very cool to see Dolly-v2 hit #1 trending on HuggingFace Hub today. Stay tuned for a lot more LLM infra coming from Databricks soon. And register for our @Data_AI_Summit conference to hear the biggest things as they launch -- online attendance is free.
2
39
228
@matei_zaharia
Matei Zaharia
3 years
Large NLP models are expensive and opaque, but maybe it doesn't have to be that way. This exciting work with Omar Khattab and @ChrisGPotts uses retrieval to set SotA results in hard NLP tasks at low cost. Our Baleen paper will be a spotlight at NeurIPS.
3
43
233
@matei_zaharia
Matei Zaharia
1 year
Want to build your own chat AI from scratch? We're launching a Building LLMs course at @Data_AI_Summit to teach everyone how to build a Dolly clone: . Tiny model, big attitude, for anyone. #DemocratizeAI
6
40
216
@matei_zaharia
Matei Zaharia
4 years
It's hard to believe that #ApacheSpark was first released as a research project 10 years ago! My @SparkAISummit keynote (live now) goes through the lessons in the past 10 years and what's new in #ApacheSpark 3.0.
6
42
212
@matei_zaharia
Matei Zaharia
11 months
As good a time to say this as any: if you’re on the AI research job market, Databricks is hiring, with the mission to democratize AI. We power amazing customer use cases and we publish. Check or reach out.
5
29
212
@matei_zaharia
Matei Zaharia
4 years
Databricks is now available on @googlecloud ! We've also built great integrations with BigQuery, Looker, GCS and Google AI services across the product.
@databricks
Databricks
4 years
Open #lakehouse platform meets open #cloud with unified data engineering, data science and analytics. Learn more about Databricks on @GoogleCloud :
0
20
48
7
42
211
@matei_zaharia
Matei Zaharia
2 years
Very excited that @ApacheSpark won the SIGMOD System Award this year. Congrats to the whole community behind the project!
@sigmod
ACM SIGMOD
2 years
2022 ACM SIGMOD Awards Edgar F. Codd Innovations Award goes to Dan Suciu. Contributions Award goes to Christian S. Jensen. Test-of-Time Award goes to “NoDB: Efficient Query Execution on Raw Data Files”. Systems Award goes to “Apache Spark”. Congrats!
2
33
142
5
24
207
@matei_zaharia
Matei Zaharia
2 years
We updated the code for Dolly so it only trains in 30 minutes now. It’s nice to be able to experiment quickly with instruction tuning.
@vagabondjack
Mike Conover
2 years
We’re actively updating the Dolly repo with model improvements! Make sure to pull the latest changes. At $30 / 30min per training run it’s dead simple to run multiple experiments. Also, 688 stars in 20 hours! Neat!
8
42
238
2
45
205
@matei_zaharia
Matei Zaharia
5 years
I gave a keynote at @ACMSoCC about lessons from building a large-scale cloud service at @Databricks . Did you know that Databricks runs millions of VMs/day to process exabytes of data with <200 engineers? Slides here:
2
55
203
@matei_zaharia
Matei Zaharia
4 years
Congrats to the #ApacheSpark community on the 3.0 release! Over 440 developers contributed 3400 patches to this release, with big improvements in SQL performance, ANSI SQL support, Python usability and management features.
@ApacheSpark
Apache Spark
4 years
[ANNOUNCEMENT] Congrats to the Apache Spark community and all the contributors! The Apache Spark 3.0 is here. Try it out!
9
302
617
1
62
194
@matei_zaharia
Matei Zaharia
4 months
Congratulations and so well deserved, Omar! It's been fantastic working together.
@lateinteraction
Omar Khattab
4 months
I'm excited to share that I will be joining MIT EECS as an assistant professor in Fall 2025! I'll be recruiting PhD students from the December 2024 application pool. Indicate interest if you'd like to work with me on NLP, IR, or ML Systems! Stay tuned for more about my new lab.
251
93
2K
3
8
197
@matei_zaharia
Matei Zaharia
4 years
Exciting times at @Databricks . We're hiring in all departments, so take a look if you want to help shape the next generation of infrastructure for data and AI.
@TechCrunch
TechCrunch
4 years
Databricks raises $1B at $28B valuation as it reaches $425M ARR by @alex and @ron_miller
0
22
78
3
25
196
@matei_zaharia
Matei Zaharia
1 year
Meet #LakehouseIQ : a knowledge engine from your enterprise that understands your business & data to power AI apps. Every platform is adding an AI assistant, but in data, LLMs don't just work out of the box, because every org has its own jargon, data, etc.
11
88
185
@matei_zaharia
Matei Zaharia
5 months
I'm co-organizing the inaugural research workshop on Compound AI Systems on June 13th: . Send in your work on designing & optimizing such systems! Thrilled to have @RichardSocher , @MonicaSLam and @polynoamial as speakers, and host this at @Data_AI_Summit .
2
35
194
@matei_zaharia
Matei Zaharia
5 years
Presentation videos from #SysML19 are now up. Find all the talks on YouTube here:
0
81
186
@matei_zaharia
Matei Zaharia
4 years
We also have a big announcement for @MLflow today: it's joining the @linuxfoundation as a long-term vendor-neutral home to host the project! We've been blown away with how fast MLflow has grown and hope this leads to even more contributors.
1
79
184
@matei_zaharia
Matei Zaharia
16 days
Really cool to see OpenAI o1 launched today. It's another example of the trend towards compound AI systems, not models, getting the best AI results. I'm sure that future versions will not only scale inference, but also use tools (coding, search, etc) for better results.
5
25
188
@matei_zaharia
Matei Zaharia
5 years
Second big announcement is open sourcing Databricks Delta as Delta Lake. Delta dramatically simplifies building reliable data lakes on HDFS and cloud storage through ACID transactions, indexes and scalable metadata handling. More info here:
5
108
176
@matei_zaharia
Matei Zaharia
3 years
Really cool to see @MLflow as the second-most popular ML tracking tool in this year's @kaggle survey (only behind TensorBoard), given that it started in 2018! We're excited to bring easy, open source observability to all ML workflows.
2
25
180
@matei_zaharia
Matei Zaharia
6 months
The great thing is that for customers wishing to build such models that natively understand their data, the cost could be even less. We have the checkpoints, data cleaning pipeline, instruction tuning pipeline, etc from DBRX — just apply these to your data.
@ClementDelangue
clem 🤗
6 months
Just $10M and two months to train from scratch a GPT3.5 - Llama2 level model. For context, it probably cost 10-20x more to OAI just a year ago! The more we improve as a field thanks to open-source, the cheaper & more efficient it gets! All companies should now train their own
11
56
502
1
21
164
@matei_zaharia
Matei Zaharia
4 months
We just posted the first release of open source Unity Catalog! It supports tables, unstructured data, and AI, and we have a great set of partners across data and AI integrating with it. Read more at
@databricks
Databricks
4 months
. @matei_zaharia just open sourced Unity Catalog LIVE at #DataAISummit !
2
8
60
3
32
180
@matei_zaharia
Matei Zaharia
6 months
Probably the thing I’m most excited about with DBRX, it’s super fast! Easily 150 tokens/s for quality comparable to much slower closed models.
@natolambert
Nathan Lambert
6 months
Okay @databricks what're you cooking behind this space its so fast lmao
7
18
174
6
30
174
@matei_zaharia
Matei Zaharia
2 months
How can you make LLM-as-judge reliable in specialized domains? Our applied AI team developed a simple but effective approach called Grading Notes that we've been using in Databricks Assistant. We think this can help anyone doing domain-specific AI!
4
30
171
@matei_zaharia
Matei Zaharia
3 years
Congrats to my student @deepakn94 for defending his PhD! Deepak worked on a ton of exciting systems and ML research, including Weld, DAWNBench/MLPerf, and most recently pipelining methods for efficient DNN training, including PipeDream-2BW (ICML'21) and Megatron's 1T param model.
5
7
168
@matei_zaharia
Matei Zaharia
11 months
MLflow 2.8 is out today, with new support for LLM-based eval metrics among other features. Read about how we've been using it to improve our RAG apps at Databricks, like our docs assistant:
2
29
169
@matei_zaharia
Matei Zaharia
10 months
Everyone is doing RAG on unstructured docs, but what if you want to mix in structured business data? Databricks RAG can connect to feature tables & functions to query the latest data in your catalog, all with centralized governance, security and MLOps.
3
29
162
@matei_zaharia
Matei Zaharia
4 years
Really proud of my student @sppalkia who passed his (online) PhD defense today! He's the first of my students to graduate, and he did awesome work accelerating data applications with Weld, Mozart and other systems. You can see his talk and slides here:
1
26
164
@matei_zaharia
Matei Zaharia
4 months
Thrilled that Forrester named Databricks a Leader in their report on AI Foundation Models in enterprise! We help organizations build the best AI for *their* domain and data, using the best techniques available, with a world-class research team to back it.
5
39
165
@matei_zaharia
Matei Zaharia
1 year
Apache Spark (and Databricks) are getting first-class support in @HuggingFace ! You can now rapidly load data from these engines for HuggingFace training and inference, giving up to 40% speedups.
2
20
160
@matei_zaharia
Matei Zaharia
4 years
One of my favorite features in the upcoming #ApacheSpark 3.0 is Adaptive Query Execution (AQE), which tunes number of reduce tasks, join algorithms and skew joins automatically. Learn how it works and how it speeds up TPC-DS queries by up to 8x:
0
42
159
@matei_zaharia
Matei Zaharia
1 year
Everyone’s excited about vector DBs, but there’s a lot to do to get truly high quality retrieval systems! Check out this paper benchmarking quality, latency and cost.
@aviaviavi__
Avi Sil
1 year
#acl2023 findings paper for folks working on retrieval leaderboards- Read on: ✅ We show multi-dimensional tradeoffs e.g. quality , latency & cost (instead of just F1) ✅ Metrics that include concrete efforts e.g. DynaScore. -- Code in PrimeQA:
1
28
87
2
26
160
@matei_zaharia
Matei Zaharia
7 years
The new research group I'm part of at Stanford, DAWN, is building infrastructure for usable machine learning:
1
71
154
@matei_zaharia
Matei Zaharia
6 months
We’re hiring for the RAG / AG research team at Databricks. Come help make AI even better at incorporating real-time data and external tools.
@mcarbin
Michael Carbin
6 months
“How’s your sabbatical?” Well…DBRX is GREAT at RAG! If you’ve been using Mixtral/Llama2/GPT3.5, then try DBRX! The combination of RAG with its SoTA capabilities on knowledge/code/reasoning will unlock new CompoundAI opportunities.
5
21
149
2
21
154
@matei_zaharia
Matei Zaharia
1 year
So excited about this -- bringing amazing platforms for data and AI together. @NaveenGRao , @hanlintang and @jefrankle have built an amazing team that has steadily reduced the cost of AI training and released breakthroughs like the first open source LLMs with >64K context.
@NaveenGRao
Naveen Rao
1 year
Today we’re announcing plans for @MosaicML to join forces with @databricks ! We are excited at the possibilities for this deal including serving the growing number of enterprises interested in LLMs and diffusion models.
58
66
659
4
16
152
@matei_zaharia
Matei Zaharia
1 year
Want to build your own conversational AI from open datasets and your own data? Join this free webinar on April 25th with some of the Dolly authors:
2
44
153
@matei_zaharia
Matei Zaharia
5 years
Just in time for my lecture on data quality at Stanford.
5
12
154
@matei_zaharia
Matei Zaharia
10 months
Sad about the chaos around OpenAI, which was crazier than anyone imagined, and how it’s affecting people, especially those on visas. I hope everyone lands on their feet!
4
13
152
@matei_zaharia
Matei Zaharia
1 year
We want to run a longer study on this and would love your input on what behaviors to test!
21
9
148
@matei_zaharia
Matei Zaharia
5 years
Congrats to the whole team at Databricks for the continued ultra-fast growth! We're hiring in all roles to continue simplifying how organizations work with data through technologies such as @DeltaLakeOSS , @MLflow , @ApacheSpark and more.
@databricks
Databricks
5 years
We're excited to announce that we've raised $400 million to continue our rapid global growth and engineering expansion, an investment that brings our valuation to $6.2 billion. Learn more:
3
61
217
1
27
146
@matei_zaharia
Matei Zaharia
1 month
Welcome Omar, and really excited to keep working together on research along with the DSPy community.
@lateinteraction
Omar Khattab
1 month
Some personal news: I'm thrilled to have joined @Databricks @DbrxMosaicAI as a Research Scientist last month, before I start as MIT faculty in July 2025! Expect increased investment into the open-source DSPy community, new research, & strong emphasis on production concerns 🧵.
49
28
638
5
6
146
@matei_zaharia
Matei Zaharia
10 months
We've just released a suite of awesome features for building high-quality RAG apps on Databricks: . In talking with enterprises, we found quality was often the top concern with RAG, so we help teams monitor and improve it at all levels of the stack.
3
30
140
@matei_zaharia
Matei Zaharia
4 years
#PySpark downloads are growing 3x year-on-year. As a result, the @ApacheSpark community is investing a lot in making its Python APIs easier as part of "Project Zen". Read about some of the work currently in progress, including type hints, viz and docs:
3
42
138
@matei_zaharia
Matei Zaharia
2 years
Pretty accurate!
@vijayv500
Vijay Vankayalapati
2 years
3
25
110
5
18
138
@matei_zaharia
Matei Zaharia
4 months
We're serious about an open, compatible foundation for all enterprise data. Very excited to work with the @tabulario team to make the open source data ecosystem even better.
@alighodsi
Ali Ghodsi
4 months
Databricks to acquire @tabulario , a data platform from the original creators of Apache Iceberg. Together, we will bring format compatibility to the lakehouse for @DeltaLakeOSS and @ApacheIceberg
11
84
376
4
21
130
@matei_zaharia
Matei Zaharia
2 years
Super excited about this work, and it's open source! One of the coolest open source frameworks from my research group. It lets developers use language-based models (including retrievers) in a composable way to build complex apps.
@lateinteraction
Omar Khattab
2 years
Introducing Demonstrate–Search–Predict (𝗗𝗦𝗣), a framework for composing search and LMs w/ up to 120% gains over GPT-3.5. No more prompt engineering.❌ Describe a high-level strategy as imperative code and let 𝗗𝗦𝗣 deal with prompts and queries.🧵
32
197
986
2
18
138
@matei_zaharia
Matei Zaharia
5 months
Tweet media one
2
7
140
@matei_zaharia
Matei Zaharia
1 year
I'm excited to participate in the LLMs in Production virtual conference on June 15-16! I will be speaking about "The Emerging Toolkit for Reliable, High-quality LLM Applications". Register here to join:
4
26
135
@matei_zaharia
Matei Zaharia
9 months
Proud to see Databricks named a leader in the Gartner CDBMS MQ for the 3rd year, advancing in both dimensions! We’ve made so many improvements to the platform this year and we’re just getting started with data intelligence, marketplace, cleanrooms & more.
6
22
136
@matei_zaharia
Matei Zaharia
8 months
A lot happened in Databricks SQL in 2023 -- no wonder it's one of the fastest growing data warehouse platforms. Read how we improved latency and concurrency, made it serverless, and began automatically optimizing most workloads with AI:
3
21
133
@matei_zaharia
Matei Zaharia
4 years
I'll be opening up @SparkAISummit tomorrow with a talk on what's new in @ApacheSpark 3.0. This release greatly improves SQL & Python support, including 2x speedup on TPC-DS, adaptive execution to reduce tuning, ANSI SQL, and new Python APIs. Short summary:
1
39
131
@matei_zaharia
Matei Zaharia
7 years
All the videos from #SparkSummit 2017 are now up online for free! Check them out at
0
89
128