🔥 It's finally here! I'm excited to announce that @ShadowTrafficIO is now available. Head to the home page to get started for free.
For my entire career, I've been baffled by how long it takes to build demos, load tests, and proof-of-concept projects. Everyone's built little
I've been working with @apachekafka for 7+ years. I find myself doing the same set of activities every time. Here's a set of Kafka productivity hacks for doing a few things way faster than you're probably doing them now.
@confluentinc
🔥
Things I wish I knew as a 22-year-old engineer:
1. Software engineering is mainly about people, not code. Technical expertise will only take you so far.
2. There's a time to hack together a house of cards and a time to build a masterpiece. Learn the difference.
3. Prioritize
🔥 I can't believe it: 8 months in, and I've just closed my 3rd enterprise customer.
Will I make it long term? Who knows.
But so far, my revenue per employee is higher than many seed-funded startups, and all of my existing customers have expanded 3-4x their initial use cases.
Thinking about writing something on what’s changed since @martinkl’s turning the database inside out talk (spoiler: a lot. The practical side is way different now.) Anyone interested?
Since leaving Confluent, I've been asked by so many people how their company can win at stream processing.
Honestly, I don't know, but one thing's been apparent: latency isn't the killer feature.
Whoever wins is going to figure out what it really is.
I’m incredibly excited to share that Distributed Masonry will be joining @confluentinc! It’s been an amazing journey, and I couldn’t ask for a better exit. 1/
I’ve scratched an itch I’ve had for ages.
Introducing Voluble, an intelligent data generator for @apachekafka. Voluble generates streams of realistic events with support for cross-topic relationships, tombstoning, configurable rates, and more.
Many teams have created best practices for how they should name & describe their @apachekafka events. When they do this, a kind of murky categorization emerges, but no one can quite make out where the boundaries are. As it turns out, linguists beat us to it a long time ago. 1/
Every now and again I see criticism for using @apachekafka for small/low volume projects. If you host it yourself, sure, there’s an ops burden. But I think this misses the point: commit logs are a *hugely* useful primitive for any volume. At least as useful as database tables.
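To make the commit log idea concrete, here's a minimal in-memory sketch. `CommitLog`, `append`, and `read_from` are hypothetical names made up for illustration, not Kafka's API. It only shows the shape of the primitive: records are appended in order, each gets an offset, and any number of readers can replay from wherever they like.

```python
class CommitLog:
    """Toy append-only log (hypothetical; not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return self._records[offset:]


log = CommitLog()
log.append(("ann", 10))
log.append(("bob", 5))
log.append(("ann", -3))

# Reader 1 replays the whole log to derive a table of balances.
balances = {}
for user, delta in log.read_from(0):
    balances[user] = balances.get(user, 0) + delta

# Reader 2 tracks its own offset and only sees records from there on.
latest = log.read_from(2)
```

Even at three records, the log gives you an ordered history you can rebuild a table from, which is the point about usefulness at any volume.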
Serious question: are we done with open source?
Projects like Linux aside, just look at what's happened in the data ecosystem.
1. HashiCorp hard-prioritized their commercial offering, creating an irreparable rift with the OpenTofu community.
2. Confluent and Databricks tried
This weekend, I learned that @NOAA is just about ready to take @OnyxPlatform to production as the backing streaming engine for their automated surface observing system (ASOS). ASOS serves as a primary climatological observing network in the US.
What if you could have an instantaneous @GraphQL API over your @apachekafka topics? I wondered just that a few months ago. Kafka would get so much more reach almost for free. Subscriptions over streams, and queries + subscriptions over tables of state — right to your front-end.
🔥 Can't believe I get to post this: two days after I closed my first enterprise customer, I've just closed my second!
I just want to say thanks to everyone who's been following along and encouraging me. It's scary trying something high risk, but I feel like I'm doing exactly
💥 3 months, 3 weeks, and 2 days into my journey, I've closed my first enterprise customer.
When I started my company, I had no users, no product, no code. All I had was a problem statement that I thought might be true.
Since then, I spent nearly all my time doing two things:
Property-based testing is one of those things that gives me childhood excitement every time I use it. I know how it works, but it still feels like magic.
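For anyone who hasn't seen it, here's a hand-rolled sketch of the idea using only the standard library. Real property-based testing libraries add smarter input generation and shrinking of failing cases; the function names and the run-length encoding example below are my own assumptions, picked only for illustration. The property: decoding an encoded string always gives back the original.

```python
import random


def run_length_encode(text):
    """Collapse runs of repeated characters into (char, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def run_length_decode(pairs):
    """Expand (char, count) pairs back into a string."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials=500, seed=0):
    """Try many random inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        if run_length_decode(run_length_encode(text)) != text:
            return text
    return None


counterexample = check_round_trip()
```

Instead of writing a handful of example cases by hand, you state one rule and let random inputs hunt for a violation.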
In other @confluentinc news, I'm happy to share that I'm now leading up the stream processing product team! Want to work on Kafka Streams and KSQL with me? My group is hiring. DMs open!
My post on streaming+ChatGPT is dense (it’s a complex subject to unpack!), so here’s a quick thread on just the essentials of what is going on here.
🧵…
Just got my copy of Mastering @kafkastreams and @ksqlDB. If you’re looking for a complete guide to the practical aspects of stream processing, look no further. Brilliant writing @kafka_book!
Introducing Pyrostore: a new streaming storage product that complements Kafka with inexpensive, virtually limitless storage. Replay your entire dataset seamlessly, losslessly, on demand.
So happy to finally take all the wraps off what my team has been working on at @confluentinc. @ksqlDB is the start of a really powerful idea to make a substantial dent in the complexity of building stream processing applications. 1/
Announcing ksqlDB: the event streaming database for stream processing apps. ksqlDB combines the power of a stream processor with the familiarity of a relational database. Integrates natively with @apachekafka. Learn more from @jaykreps:
Regular programming is like chess; Stream processing is like poker.
With the former, all the data you need is visible. With the latter, you never know when the next piece of data will arrive—if at all. It's a game of hidden information.
That’s what makes SP so fun to work on.
Today we added 8 new @apachekafka tutorials to learn stream processing, including my favorite: rekeying a stream by a function (easy to pick the wrong key when starting and hard to change it later) @confluentinc
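Here's a rough sketch of what rekeying by a function means, in plain Python rather than the tutorial's actual code. `rekey` and `group_by_key` are hypothetical helpers, and the movie records are made-up sample data. In Kafka the rekeyed records would be written to a new topic so that records sharing a key land on the same partition; the dictionary below only stands in for that grouping.

```python
from collections import defaultdict


def rekey(records, key_fn):
    """Yield (new_key, value) pairs, deriving each new key from the value."""
    for _old_key, value in records:
        yield key_fn(value), value


def group_by_key(records):
    """Group values by key, standing in for key-based partitioning."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


# Made-up sample stream, originally keyed by an arbitrary id.
movies = [
    (1, {"title": "Movie A", "year": 2011}),
    (2, {"title": "Movie B", "year": 1999}),
    (3, {"title": "Movie C", "year": 2011}),
]

# Rekey by release year so related records end up together.
by_year = group_by_key(rekey(movies, lambda movie: movie["year"]))
```

Picking the key up front decides which records can be processed together, which is why the wrong choice is painful to undo later.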
In all my time working with @apachekafka, the hardest part has always been step 1: getting something working end-to-end. We're fixing that. Here's a growing collection of use cases with idiomatic development/automated test/prod deploy patterns.⚡️
We’re excited to announce Tutorials for @apachekafka, a collection of common event streaming use cases, with each tutorial featuring an example scenario and several complete code solutions. Learn more in our latest blog post by @MichaelDrogalis:
Right on. My #KafkaSummit Europe talk "How ksqlDB works" was accepted. If you liked my visualization blog series going through the basics, you'll love what I have in store for this one.
Classic database queries vs. streaming database queries.
Processing a query is like solving a puzzle. With a classic DB, you can only see the face of the puzzle (the results) after all the pieces (the data) fit together. With a streaming DB, you can see the face as it evolves.
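A small sketch of the puzzle analogy, using a per-user event count as the query. Both functions are hypothetical illustrations, not how any particular database implements it. The classic version returns one answer after reading all the data; the streaming version emits an updated answer after every event.

```python
def classic_count(events):
    """Read all the events, then return the finished result once."""
    counts = {}
    for user in events:
        counts[user] = counts.get(user, 0) + 1
    return counts


def streaming_count(events):
    """Emit a snapshot of the evolving result after each event."""
    counts = {}
    for user in events:
        counts[user] = counts.get(user, 0) + 1
        yield dict(counts)


events = ["ann", "bob", "ann"]

final = classic_count(events)
snapshots = list(streaming_count(events))
```

The last snapshot matches the classic result; the difference is that you get to watch the answer take shape along the way.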
🔥 Enterprise customer number 4 has entered the chat.
I'll post updated revenue numbers after a few more closes, but right now I want to tell you guys a story.
From 2020 to mid-2023, I'd been puttering on a completely different start up idea. I probably spent 200+ hours of my
If you blinked, you could be excused for missing the herding effect to Apache Iceberg.
In less than a year, all the major players—Snowflake, Confluent, Dremio, and of course Databricks—have professed Iceberg as the table format of choice.
While public sentiment changed almost
People say nothing can prepare you to have a child or lose a parent, and it's true.
I love you mom. Everything good in my life started with you.
Honestly, going back to work, much less doing any of this in public, is the last thing I want to do. But my mom always encouraged me.
Huge update to just landed! The team put a ton of work into making this the best place to learn @apachekafka, and it really shows. Neat animations, too. :)
People sometimes perceive the fact that @ksqlDB only runs on @apachekafka as a limitation, but that's backward. Because connectors are a dedicated part of Kafka, all the problems of talking to external systems are centrally managed.
The result: 200+ ksqlDB compatible connectors.
This is cool: in about 20 lines of Python, I have a public AWS-hosted URL that can stream synthetic data to any Kafka cluster.
I think if I can build some traction around this, it would be a game-changer for tutorials.
No more messing with setting up tedious sample data. Just
Building a new kind of database at Confluent has been fun, but a few of us wanted to talk about why it's not just a slight deviation on the norm.
Here comes a 3-part blog series over the next couple of days about the architecture, SQL layer, and runtime.
🔥 Just released: orchestrate an entire series of streaming, synthetic data generators.
In the video, I show how I populate 4 Kafka topics in precise order:
🌱 First, seed a users stream with base information
💰 Second, generate overlapping login and transaction events
📎
Want to work on something that matters for almost every @apachekafka installation? I’m hiring a Product Manager for the stream processing team at @confluentinc. Come work on a team with two start-up founders and a founding PM.
DMs are open. :)
For all the good things databases can do, there’s something critical they can’t. I wrote an essay about the query your database can’t answer, and what it might take to get there.
🚕 This is cool: I converted the New York City taxi data set, which is a set of batch files, into a stream of infinite rides using ShadowTraffic. Works for @apachekafka and @PostgreSQL.
All I had to do was swap in the right generators for geolocation, ride cost, and a few
One of the most impressive parts of @confluentinc's tiered storage is that it's completely seamless - something that's been historically hard to achieve. No background jobs, no special clients, no hacky segment copying - simply Kafka with infinite storage.
Infrastructure projects come in all shapes & sizes, but for nearly all of them, it's hard to explain how they work. This is a trend we want to buck with @ksqlDB because you cannot rely on what you do not understand. In that vein, here's something new from me.
A little personal news—I'm joining @benstopford's team at Confluent in the Office of the CTO! I'll be working on one of my favorite things, making our tech easier to learn/use.
Feels delightfully full circle as Ben was a big reason @PyrostoreIO joined Confluent 4 years ago. :)
If you’re a completionist like me, here’s something you can’t hear often enough: give up on poorly written books. The achievement of finishing it isn’t worth the time. Use it on something better.
One of my favorite interview techniques is asking people to spend a meaningful amount of time talking through tradeoffs. You learn so much watching someone scan the surface area of a problem.
If you follow me because you like reading about my open, solo start-up journey, this post is for you. Everyone else, keep scrolling, because this one is raw.
Having done this for almost a year now, here are a few things I now know to be true:
1. Imposter syndrome doesn't go
Building a successful open-source project is hard, but the formula is straightforward:
1. Make something people find useful
2. Make a website that clearly explains what it is
3. Write A+ documentation
4. Spend a lot of time helping people when they get stuck
That's it.
Gonna log off till Monday.
I've dramatically underestimated how much rest I've needed after a difficult life event, so time to go take a bunch of naps and get away from the computer.
While I'm at it, I get asked once in a while about how to make an open source project go big. A lot of it is blunt effort, but there are a few things that will help your chances *a lot*:
The animation on @ShadowTrafficIO's home page is uniquely cool, but here's the secret: it's not a toy purpose-built for the website.
It's powered by ShadowTraffic ACTUALLY running in the browser.
Why? I wanted a marketing asset that's a true purple cow. No one else has anything
❤️ As I spend today putting the finishing touches on things, I just want to say thank you to everyone who's supported me over the last 3 months.
Building in public is fun, but it doesn't come naturally to me. I'm just some guy cranking out software alone in my home office.
Transformation complete!
One year ago I decided to put my ~20 year competitive distance running career on pause to see what I could do on a bike. I bought a Specialized road bike, an old Kinetic trainer, and got on @GoZwift.
Jan: FTP <200
Dec: FTP 260, hanging with cat A races
Happy 4th birthday @OnyxPlatform! You got a major reliability and perf upgrade this year, and tons of usability improvements. This is also the first year that I spent more time as a (happy) user than a contributor. More in store for you in 2018!
🔥 It's alive! @ShadowTrafficIO can now connect directly to Kafka or Postgres and generate matching synthetic streams—without you having to learn anything.
Over the last week, I've been hacking with Meta's CodeLlama-instruct models.
I put together a set of prompts teaching
Incredible @apachekafka scale at Tencent. 10 trillion messages/day (4 million/second) using federated clustering. Neat to look inside the mechanics of a giant workload.
After days of profiling and grinding, new ShadowTraffic write speeds:
- ~100K events/s to @apachekafka, up from 8K/s
- ~150K rows/s to @PostgreSQL, up from 3K/s
It's now a heck of a load more suitable for load/stress testing!
If you liked the stream processing animations that I built out earlier in the year, good news—I'm back at it. Some new content coming up at #KafkaSummit Europe, and fresh material for an upcoming video series.
Just wanted to say thanks to everyone who's supported us over the last few years. Starting a company is the second hardest thing I've ever done, and getting Pyro off the ground would've been impossible without everyone's help and encouragement.
Traits I've learned from the best engineers I've worked with:
1. Obsess about details when it matters and ignore them when it doesn't.
2. Understand the business reason for everything you work on. The best technical plan might not be the best overall plan.
3. Prioritize the
💰I made an easy way to test Debezium change data capture against Postgres.
The problem: to get interesting data out of Debezium, you need to orchestrate a streaming series of table inserts, updates, and deletes.
The solution: ShadowTraffic's `stateMachine` generator. Tell it
If you're an engineer who's always wanted to start a company, read this.
1. You will never feel ready—just begin. No amount of schooling or other experience will put you at ease.
2. You don't need investors to start. Find what problem you want to solve by talking to as many
To push myself as an entrepreneur, I’m going to try something really hard:
I’m launching 4 startups, with 4 products, in 4 quarters—each one run by just me, as a solopreneur.
I wrote a little about why I’m doing this.
Wish me luck. 🔥
My most prized skill is being able to clearly communicate complex ideas. How did I get it?
By having to continually defend why on earth I program in Lisp.
🧐 With all the new @ApacheFlink services coming out, I wonder if people would find it useful to have a consistent set of streams to test their perf/reliability/usability head to head.
I know lots of people are baking off @RisingWaveLabs vs. @Decodableco vs. @DeltaStreamInc vs.
Haven't talked much about what I've been working on since I joined @confluentinc, but @jaykreps is starting to take the wraps off it today. Check out his #KafkaSummit keynote, live streamed at 9:30 AM PST today:
One of the best things programming teaches is how to think in trees. Trees are everywhere.
In coding, modules combine to form trees of execution. In writing, paragraphs, sentences, and words form a tree of ideas. In visual design, objects create a tree of directed attention.