Jack Vanlightly

@vanlightly

Followers
3,594
Following
219
Media
99
Statuses
1,396

@confluentinc thinking about event streaming. Previously @Splunk, @VMware. @vanlightly@discuss.systems. Credit: ESO/B. Tafreshi

Barcelona, Spain
Joined November 2016
@vanlightly
Jack Vanlightly
1 year
Then there's the Jepsen tests for humbling you into the right frame of mind for building these systems. Learn how to break things so you can build better things.
3
88
649
@vanlightly
Jack Vanlightly
11 months
Queue semantics are coming to Apache Kafka (KIP-932) and in fact there are many advantages to building queues on top of logs rather than opting for a more queue-native design.
2
45
183
@vanlightly
Jack Vanlightly
2 months
I'm digging into Apache Iceberg internals for the final table format consistency model blog post. Part of my process of understanding a project from its code is making a (throwaway) map of the important classes and functions. This is especially important in the early hours of
3
21
144
@vanlightly
Jack Vanlightly
1 year
As promised, I have written a complete Kafka replication protocol description (with KIP-966 changes applied) which is inspired by the precise but accessible style and language of the Raft paper.
2
30
128
@vanlightly
Jack Vanlightly
6 months
Chapter 6 of The Architecture of Serverless Data Systems is out. This chapter focuses on commonalities in how these systems scale according to tenant load. Despite the varied workloads, patterns emerge that we can learn from.
1
22
123
@vanlightly
Jack Vanlightly
1 month
The final part of my Apache Iceberg consistency model series is out, covering the formal verification work. It's also the end of the table format consistency model series, but not the end of my writing about the table formats. More still to come...
0
14
120
@vanlightly
Jack Vanlightly
1 month
The first post in my “Understanding Apache Iceberg’s Consistency Model” is out. This post covers the internals of the read and write path, details on metadata manipulation, concurrency control, and so on. If you ever wanted a more in-depth post about Iceberg internals, this is
1
19
119
@vanlightly
Jack Vanlightly
1 month
Understanding Apache Iceberg's Consistency Model Part 2 - details of concurrency control and data conflict checks to allow Apache Iceberg to handle multiple concurrent writers correctly.
1
12
115
@vanlightly
Jack Vanlightly
4 years
I have open sourced my TLA+ specification for Apache BookKeeper here: It detected a data loss scenario that could occur during the ledger recovery process. Check out the details here:
3
34
112
@vanlightly
Jack Vanlightly
11 months
I've written a primer on formal verification and TLA+ so I can refer to that whenever I write posts about specific aspects of TLA+. I have a two-parter on liveness that is ready to go after this.
1
35
110
@vanlightly
Jack Vanlightly
3 months
How do the costs compare of implementing a low-latency write-ahead-log (WAL) on S3 Express One Zone and one implemented as a State-Machine-Replication (SMR) system (such as Paxos/Raft/Kafka)? I built a cost model for both to find out.
1
28
106
@vanlightly
Jack Vanlightly
8 months
Do we need new columnar file formats in the era of cloud object storage?
3
12
102
@vanlightly
Jack Vanlightly
1 month
I'm working on a set of blog posts that compare the internals of Apache Iceberg, Delta Lake, Apache Hudi and Apache Paimon. No benchmarking, no judgments etc, just a comparison of internal mechanics.
5
4
92
@vanlightly
Jack Vanlightly
8 months
Chapter 5 of the Architecture of Serverless Data Systems: Serverless ClickHouse Cloud. Part 1 looks at OSS ClickHouse. Part 2 looks at the serverless architecture of ClickHouse Cloud.
4
17
79
@vanlightly
Jack Vanlightly
3 years
The Splunk Messaging-as-a-Service team now has a team engineering blog 😄. First posts, our work to formally verify Apache BookKeeper with TLA+.
3
23
78
@vanlightly
Jack Vanlightly
1 year
Kora: A Cloud-Native Event Streaming Platform For Kafka paper won Best Industry Paper at the VLDB conference. There's so much work going on behind the scenes on the Kora engine that I think we'll need another paper this time next year 😀
1
10
73
@vanlightly
Jack Vanlightly
4 months
Object storage direct-write Kafka topics are coming to Confluent. In data system design, cost is a kind of unstoppable force that rewrites architectures every decade or so, and that is happening right now for high-volume streams.
6
17
72
@vanlightly
Jack Vanlightly
8 months
I'd never really looked into OLAP database internals before, but my research into ClickHouse for chapter 5 of the Architecture of Serverless Data Systems is fascinating. I also get the feeling that the most insights will come when I look at my second OLAP system (probably Pinot)
5
6
71
@vanlightly
Jack Vanlightly
5 years
I have a nice Apache Kafka vs Apache Pulsar post in the works regarding time-travel and multi-topic subscriptions. In case you're wondering, Pulsar got it right and Kafka leaves a lot to be desired.
4
17
69
@vanlightly
Jack Vanlightly
3 years
There is so much to say about the BookKeeper protocol. I have as promised started to write about the details in a new series:
2
22
66
@vanlightly
Jack Vanlightly
1 month
My previous work looked at each table format in isolation. Now it's time to compare them. How do Apache Iceberg, Delta Lake, Apache Hudi and Apache Paimon store the canonical set of files that make up a table? What are the similarities and differences?
2
14
62
@vanlightly
Jack Vanlightly
24 days
The 2nd post in my table format comparison series is out. Append-only tables and incremental reads are one of the pillars of streaming in the table-format space. This post looks at how each of the four table formats supports this workload.
0
16
61
@vanlightly
Jack Vanlightly
6 years
Finally completed part 1 of putting @apache_pulsar 's durability claims to the test. Spoiler alert, I couldn't make it lose data, or deliver any messages out-of-order. Still, I think it makes for interesting reading for those interested in how it works.
0
35
57
@vanlightly
Jack Vanlightly
4 months
With the announcement of S3-native-streams (Freight clusters), here is a commentary on Confluent strategy regarding object storage, streaming and an open data architecture.
1
15
55
@vanlightly
Jack Vanlightly
15 days
The thing is, systems have traditionally been designed around paying for capacity of network and disks, not the number of IO requests. That fundamentally changes with S3, and right now, I don't think that cost model is compatible with S3 being a universal storage layer.
2
4
57
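The point about paying per IO request rather than per unit of capacity can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name, the 30-day month, and the price of 0.005 per 1,000 write requests are assumptions chosen for the example, not figures from the post.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def monthly_write_request_cost(records_per_sec, records_per_object, price_per_1k_requests):
    """Request-only cost of writing a stream to object storage for one month.

    Hypothetical helper: it ignores storage and transfer charges and looks
    only at the per-request fee, which is the part that scales with IO count.
    """
    requests = records_per_sec * SECONDS_PER_MONTH / records_per_object
    return requests / 1000 * price_per_1k_requests


# Illustrative price only (assumption): 0.005 per 1,000 write requests.
unbatched = monthly_write_request_cost(1000, 1, 0.005)
batched = monthly_write_request_cost(1000, 100, 0.005)
print(f"one object per record: {unbatched:,.2f}")
print(f"100 records per object: {batched:,.2f}")
```

Under these assumed numbers the request bill shrinks in direct proportion to the batch size, which is why systems built on object storage tend to batch writes and accept the extra latency.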
@vanlightly
Jack Vanlightly
5 years
The RabbitMQ that people know is changing: - bit by bit we're replacing the old consensus protocols with Raft - we have new queue types - better upgrade features - better observability - Kubernetes support and more in the works.
3
27
55
@vanlightly
Jack Vanlightly
9 months
Model checking KIP-848 (Kafka's next consumer group balancing protocol), 18 hours in on 32 CPU threads, I've used 800GB of my 2TB NVMe. I'd say it's 50-50 whether my workstation can handle this one, might have to go distributed mode.
2
3
51
@vanlightly
Jack Vanlightly
4 years
Using I can clearly explain how Apache Pulsar supports a replication factor of 2 safely, providing continued read/write availability with the loss of one bookie. Systems like Apache Kafka simply cannot do this, minimum for safety and availability is 3.
1
12
53
@vanlightly
Jack Vanlightly
11 months
Queues for Kafka (KIP-932), my biggest ask for Apache Kafka since joining Confluent. Really excited about this one. It's not just queue semantics, but queues which can do replay.
2
10
52
@vanlightly
Jack Vanlightly
2 years
Tweet media one
1
10
51
@vanlightly
Jack Vanlightly
16 days
With conditional writes, S3 takes one more step towards the universal storage layer. But can it go the next level? Is it possible to make a (profitable) storage service that hits all three needs of latency, cost, and durability?
3
3
50
@vanlightly
Jack Vanlightly
3 months
GCP is continuing to innovate its cloud storage, with hierarchical namespace buckets which have a directory structure like an actual filesystem. Can rename, list directories etc. Very cool.
1
19
48
@vanlightly
Jack Vanlightly
2 years
I've created a model checking optimized TLA+ spec of Raft plus a couple of variants such as exploring when to fsync and also Flexible Raft (flexible quorums of Flexible Paxos).
1
9
44
@vanlightly
Jack Vanlightly
3 years
I'm moving again! Today I joined @confluent in a role more focused on product, leaving behind the pager and hands on engineering.
8
1
44
@vanlightly
Jack Vanlightly
2 years
Broken the first ground on a TLA+ spec based on "Viewstamped Replication Revisited"
2
7
42
@vanlightly
Jack Vanlightly
4 months
When I first joined Confluent in Feb 2022, looking for a place to make an impact, it was this proposal that excited me most. I had come from Splunk, running Apache Pulsar as a service internally. We wanted to scale up Pulsar to take on Splunk's big ingestion workloads but ran the
@maheshb
Mahesh Balakrishnan
4 months
In Feb 2022, @ghaz and I wrote an internal proposal at Confluent arguing for a "cost-saving design (e.g., writing to S3 directly) that can eliminate cross-AZ traffic costs for high-rate elephant workloads". I was hoping to apply my research on Corfu and
8
19
110
4
8
43
@vanlightly
Jack Vanlightly
8 months
I think I'm done with chapter 5 - serverless ClickHouse. If there are any engineers with deep understanding of CH internals, I'm open to reviews before I publish. Half is dedicated to open-source CH, so you don't need to work for CH Cloud to be a helpful reviewer :)
4
4
41
@vanlightly
Jack Vanlightly
5 months
Apache Hudi is pretty difficult to understand compared to Iceberg. I'm writing a high-level TLA+ spec to try and nail down the behavior.
1
1
40
@vanlightly
Jack Vanlightly
2 years
17 hours on 28 cores and TLC finds its first defect with VSR revisited. State transfer as described in the paper can cause data loss.
3
6
40
@vanlightly
Jack Vanlightly
1 year
I'm writing a complete Kafka replication protocol description with a style which lies between the formal style of a paper and the informal style of a blog post. I'm curious if people think it would be best deployed as a single post or spread over multiple posts?
7
3
38
@vanlightly
Jack Vanlightly
10 months
The Architecture of Serverless Data Systems will resume in January with chapter 5. Right now I have to dedicate myself to formally verifying KIP-848, the new Kafka consumer rebalancing protocol before the holidays. Also need to move all my Kafka TLA+ into the Apache Kafka repo.
0
2
36
@vanlightly
Jack Vanlightly
9 months
One thing I'm seeing from my series on serverless data systems, is that an extremely diverse set of systems, APIs and workloads can make use of S3 as the primary storage layer. S3 is not a threat to these systems but an opportunity, if it can be integrated well.
@vanlightly
Jack Vanlightly
9 months
@criccomini There are the APIs (Kafka API, Pulsar API, Flink API) and then there are the distributed systems behind those APIs. S3 can become the default storage layer for everything, without deprecating the APIs (although I think some APIs will need to evolve).
0
0
6
2
2
38
@vanlightly
Jack Vanlightly
10 months
Chapter 1 - Amazon DynamoDB. I made this chapter 1 because it really epitomizes the large-scale multi-tenant data architecture I've been talking about recently and it has been evolving for 11 years already!
1
7
38
@vanlightly
Jack Vanlightly
2 years
Tweet media one
1
9
36
@vanlightly
Jack Vanlightly
5 years
Some things I've learned about distributed data systems design in the last 6 months:
1
9
35
@vanlightly
Jack Vanlightly
9 months
I think the new S3 Express One Zone looks interesting for an S3 WAL (given the pricing model). Typically the WAL needs faster but smaller disks, then the actual data disks need to be bigger and cheaper. This maps well to Express for WAL and standard for long term data.
3
2
35
@vanlightly
Jack Vanlightly
13 days
Such a familiar story, I was always an average student but did enough to get by, it was only after I read a book on SQL Server locking internals that I found something that got me hooked - a healthy obsession as Phil calls it. For me that has been how distributed/data systems
@eatonphil
Phil Eaton
13 days
I wrote an essay on my mistakes trying to convince people to do something, on doing what you want to do, and on obsession. Ended with a personal note on developing healthy discipline, and having fun. :)
17
20
244
0
3
34
@vanlightly
Jack Vanlightly
6 years
Seriously, I was ready to throw my PC out the window two days ago due to TLA+. Today is my moment of triumph as my spec detected a defect in my protocol. I've never been so ecstatic to find a defect in my own design that escaped all randomized testing of my implementation.
3
5
33
@vanlightly
Jack Vanlightly
1 year
I love these reading lists. Of special interest to me this time round is the Model Checking Guided Testing for Distributed Systems paper which maps TLA+ specifications to implementations and generates tests from the state space. /1
@AlekseyCharapko
Aleksey Charapko
1 year
After a longer summer break, we will resume the distributed systems reading group next week. Here is the list of papers for our Fall'23 term:
1
39
114
1
8
33
@vanlightly
Jack Vanlightly
4 years
Here's how Pulsar avoids split-brain when ZK loses visibility of the owner broker of a topic and a 2nd broker takes on leadership (meaning for a short moment we have 2 brokers thinking they are the owner of the topic).
4
7
32
@vanlightly
Jack Vanlightly
14 days
An interesting paper on building a virtual disk on top of S3-like storage. The main downer is that it's still only prefix consistency (given high latency of S3 and the need to do batching to avoid small objects and keep the request rate in check for economics).
0
7
34
@vanlightly
Jack Vanlightly
4 years
Next week I start a new role at Splunk working on Apache BookKeeper primarily. The Apache Pulsar/BookKeeper stack is amazing and I am really excited to be contributing to it full time from next week.
4
4
33
@vanlightly
Jack Vanlightly
11 days
The semantics section of this Snowflake paper nicely describes some of the core concepts of incrementally generating/consuming change streams. It's helped me write more cohesively about CDC support in the table formats (which is WIP).
1
6
49
@vanlightly
Jack Vanlightly
21 days
My next table format comparison post is on native support for CDC. So far, I've been digging around the code of Delta and Hudi. As always, it's fascinating to see their similarities and differences.
1
3
33
@vanlightly
Jack Vanlightly
2 years
The VSR state transfer data loss invariant violation detected by TLC. I posited one that required 4 view changes when I read the paper, but TLC surprised me with this one with 3 view changes.
5
5
33
@vanlightly
Jack Vanlightly
3 years
My stages of writing a non-trivial TLA+ spec
Tweet media one
2
6
32
@vanlightly
Jack Vanlightly
5 years
Tweet media one
1
10
31
@vanlightly
Jack Vanlightly
3 years
The BookKeeper replication protocol is so nuanced I'm still having aha moments. BK devs need to tread very carefully. I should write a blog post about all the surprising and non-intuitive things about it.
5
3
29
@vanlightly
Jack Vanlightly
1 year
This looks really juicy. I was planning on reading the OmniPaxos next but this one is too interesting.
0
10
29
@vanlightly
Jack Vanlightly
6 years
I've finished my first real TLA+ spec and I have to say that it feels like a real achievement. It took 18 days of evenings and weekends with lots of mind bending and mental sweat but it was all worth it.
2
1
28
@vanlightly
Jack Vanlightly
1 year
Something I love about TLA+ is the liveness checking. I can assert that state A leads to state B (A ~> B). Then the model checker will find histories where the protocol is unable to reach state B from state A.
1
6
29
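The leads-to property (A ~> B) mentioned above can be illustrated on a tiny explicit state graph. This is a minimal sketch, not how TLC works internally: the function names and the example graph are hypothetical, fairness is ignored, and a state with no successors is treated as stuttering forever. A state violates A ~> B if it satisfies A and some behavior starting from it never reaches a B state.

```python
from collections import deque


def reachable(init, succ):
    """All states reachable from the initial states (breadth-first search)."""
    seen = set(init)
    queue = deque(init)
    while queue:
        s = queue.popleft()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def leads_to_violations(init, succ, is_a, is_b):
    """Reachable A-states from which some behavior avoids B forever."""
    # Start with every reachable non-B state, then repeatedly drop states
    # whose successors have all left the set. What remains can avoid B forever.
    avoid = {s for s in reachable(init, succ) if not is_b(s)}
    changed = True
    while changed:
        changed = False
        for s in list(avoid):
            nxt = succ.get(s, ())
            if nxt and not any(t in avoid for t in nxt):
                avoid.discard(s)
                changed = True
    return sorted(s for s in avoid if is_a(s))


# Hypothetical protocol: a request may retry forever without being granted.
buggy = {"idle": ["requested"], "requested": ["granted", "retrying"],
         "retrying": ["requested"], "granted": ["idle"]}
fixed = {"idle": ["requested"], "requested": ["granted"], "granted": ["idle"]}

is_requested = lambda s: s == "requested"
is_granted = lambda s: s == "granted"
print(leads_to_violations(["idle"], buggy, is_requested, is_granted))  # ['requested']
print(leads_to_violations(["idle"], fixed, is_requested, is_granted))  # []
```

In the first graph the retry loop gives a history that never reaches "granted", so the property fails; a real model checker would also take fairness assumptions into account before reporting such a loop.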
@vanlightly
Jack Vanlightly
2 years
Here's my guide to benchmarketing:
Tweet media one
3
12
28
@vanlightly
Jack Vanlightly
2 years
I'm a big fan of Apache Flink which is why I am ecstatic that Immerok are joining us. Really amazing news 😍😍😍
@confluentinc
Confluent
2 years
🎉 We’re excited to share our intent to acquire @Immerokcom ! Together, we’ll build a cloud-native service for @apacheflink that delivers the same simplicity, security, & scalability that you expect from Confluent for Kafka. Learn more →
2
69
200
0
1
29
@vanlightly
Jack Vanlightly
2 months
I'm putting the final touches on my deep dive into Apache Paimon blog post, and a formal specification written using Fizzbee. This one was both a lot of hard work and a lot of fun. Posting tomorrow hopefully 🤞
3
1
27
@vanlightly
Jack Vanlightly
2 years
Just got back from #tlaconf at #StrangeLoop where @lemmster and I had our "Obtaining statistical properties by simulating specs with TLC" talk. Slides are available but I'll probably get around to writing a blog series about the technique.
1
3
27
@vanlightly
Jack Vanlightly
4 months
I'm working on a Delta Lake TLA+ specification today. Mostly done. I suppose I should write a Delta Lake consistency blog post next.
0
0
27
@vanlightly
Jack Vanlightly
2 years
I'm working on a new replication protocol. If Flexible Paxos and Cheap Paxos had a kid, and also Raft and Apache BookKeeper had a kid and then those kids grew up and they had a baby, that baby would be my protocol.
2
3
26
@vanlightly
Jack Vanlightly
10 months
I'm working on a series that analyzes real-world serverless data system architectures, with a particular focus on how different systems do multi-tenancy with good tenant isolation.
3
2
27
@vanlightly
Jack Vanlightly
2 years
1\ One thing that neither the Raft thesis nor the paper discusses is reusing the same server identity, specifically in the context of reconfiguration.
3
5
27
@vanlightly
Jack Vanlightly
4 years
At Vueling Airlines I built an architecture mapping tool that maintained the whole graph of services, databases, queues, object storage, etc and their relationships in Neo4j, all built programmatically from config files and other sources. It was the only way to understand it all.
@QuinnyPig
Corey Quinn
4 years
When I go to a new cloud environment, people apologize because the architecture diagram is out of date. Spoiler: Everyone's architecture diagram is out of date. That is the nature of the universe. Smile, nod, and accept it. Have a listen:
3
10
32
3
7
25
@vanlightly
Jack Vanlightly
2 years
Low-code is a fancy way of saying high-yaml.
0
0
26
@vanlightly
Jack Vanlightly
4 months
I'm digging into Apache Paimon now. Unfortunately, there is no spec/protocol doc, so diving into the code. I'm still in the "groping in the dark" phase of code reading. Initial impressions are that if Apache Iceberg, Apache Hudi and Delta had a baby, it might be Paimon.
3
3
26