Jack Vanlightly @vanlightly profile

Jack Vanlightly

@vanlightly

Followers

3,594

Following

219

Media

99

Statuses

1,396

@confluentinc thinking about event streaming. Previously @Splunk, @VMware. @vanlightly@discuss.systems. Credit: ESO/B. Tafreshi

https://t.co/XB9GX653sy

Barcelona, Spain

Joined November 2016

Jack Vanlightly

@vanlightly

1 year

Then there's the Jepsen tests for humbling you into the right frame of mind for building these systems. Learn how to break things so you can build better things.

3

88

649

Jack Vanlightly

@vanlightly

1 year

Redpanda bring out benchmark after benchmark claiming performance superiority over Apache Kafka. I decided to run my own tests to see if any of it was true.

Kafka vs Redpanda Performance - Do the claims add up? — Jack Vanlightly

Apache Kafka has been the most popular open source event streaming system for many years and it continues to grow in popularity. Within the wider ecosystem there are other open source and source...

jack-vanlightly.com

17

118

520

Jack Vanlightly

@vanlightly

10 months

Chapter 4 of The Architecture of Serverless Data Systems: CockroachDB (serverless).

Serverless CockroachDB - ASDS Chapter 4 (part 1) — Jack Vanlightly

CockroachDB is a distributed SQL database that aims to be Postgres-compatible. Over the years, the Postgres wire protocol has become a standard of sorts with many database products implementing its...

jack-vanlightly.com

1

33

241

Jack Vanlightly

@vanlightly

10 months

Introducing "The Architecture of Serverless Data Systems". An ongoing review of real-world serverless, multi-tenant data systems.

The Architecture of Serverless Data Systems — Jack Vanlightly

I recently blogged about why I believe the future of cloud data services is large-scale and multi-tenant, citing, among others, S3. “Top tier SaaS services like S3 are able to deliver amazing...

jack-vanlightly.com

2

55

239

Jack Vanlightly

@vanlightly

11 months

Queue semantics are coming to Apache Kafka (KIP-932) and in fact there are many advantages to building queues on top of logs rather than opting for a more queue-native design.

The advantages of queues on logs — Jack Vanlightly

With the announcement of KIP-932, Queues for Kafka, I thought it was worthwhile to revisit the subject of queues vs logs and how we can actually build better queues on top of logs.

jack-vanlightly.com

2

45

183

Jack Vanlightly

@vanlightly

2 months

I'm digging into Apache Iceberg internals for the final table format consistency model blog post. Part of my process of understanding a project from its code is making a (throwaway) map of the important classes and functions. This is especially important in the early hours of

3

21

144

Jack Vanlightly

@vanlightly

4 months

I sometimes get asked for advice about how to learn complex distributed systems. I thought about it and wrote this piece.

Learning and reviewing system internals: tactics and psychology — Jack Vanlightly

Every now and then I get asked for advice on how to learn about distributed system internals and protocols. Over the course of my career I've picked up a learning and reviewing style that works...

jack-vanlightly.com

3

42

145

Jack Vanlightly

@vanlightly

16 days

S3 finally supports conditional writes:

Amazon S3 now supports conditional writes - AWS

Discover more about what's new at AWS with Amazon S3 now supports conditional writes

aws.amazon.com

4

36

136

Jack Vanlightly

@vanlightly

1 year

BYOC is something I’ve been thinking about recently so I decided to write down the thoughts I have on it and where I think cloud services are going in general.

On the future of cloud services and BYOC — Jack Vanlightly

My job at Confluent involves a mixture of research, engineering and helping us figure out the best technical strategy to follow. BYOC is something I’ve been thinking about recently so I decided to...

jack-vanlightly.com

6

25

129

Jack Vanlightly

@vanlightly

1 year

As promised, I have written a complete Kafka replication protocol description (with KIP-966 changes applied) which is inspired by the precise but accessible style and language of the Raft paper.

2

30

128

Jack Vanlightly

@vanlightly

6 months

Chapter 6 of The Architecture of Serverless Data Systems is out. This chapter focuses on commonalities in how these systems scale according to tenant load. Despite the varied workloads, patterns emerge that we can learn from.

Scaling models and multi-tenant data systems - ASDS Chapter 6 — Jack Vanlightly

What is scaling in large-scale multi-tenant data systems, and how does that compare to single-tenant data systems? How does per-tenant scaling relate to system-wide scaling? How do scale-to-zero and...

jack-vanlightly.com

1

22

123

Jack Vanlightly

@vanlightly

1 month

The final part of my Apache Iceberg consistency model series is out, covering the formal verification work. It's also the end of the table format consistency model series, but not the end of my writing about the table formats. More still to come...

Understanding Apache Iceberg’s Consistency Model Part 3 — Jack Vanlightly

In this final part of my Apache Iceberg series on its consistency model, I’ll cover the formal specification I wrote for it and the results of model checking.

jack-vanlightly.com

0

14

120

Jack Vanlightly

@vanlightly

1 month

The first post in my “Understanding Apache Iceberg’s Consistency Model” is out. This post covers the internals of the read and write path, details on metadata manipulation, concurrency control, and so on. If you ever wanted a more in-depth post about Iceberg internals, this is

1

19

119

Jack Vanlightly

@vanlightly

1 year

Here's a post that explains how Kafka uses recovery in its replication protocol so it can avoid the need for fsyncs.

Why Apache Kafka doesn't need fsync to be safe — Jack Vanlightly

TLDR: Apache Kafka doesn’t need fsyncs to be safe because it includes recovery in its replication protocol. It is a real-world distributed system that uses asynchronous log writing + recovery with...

jack-vanlightly.com

9

25

118

Jack Vanlightly

@vanlightly

10 months

Chapter 3 of The Architecture of Serverless Data Systems: Neon - Serverless PostgreSQL.

Neon - Serverless PostgreSQL - ASDS Chapter 3 — Jack Vanlightly

Neon is a serverless Postgres service based on an architecture similar to Amazon Aurora. It separates the Postgres monolith into disaggregated storage and compute. The motivation behind this...

jack-vanlightly.com

1

25

116

Jack Vanlightly

@vanlightly

1 month

Understanding Apache Iceberg's Consistency Model Part 2 - details of concurrency control and data conflict checks to allow Apache Iceberg to handle multiple concurrent writers correctly.

Understanding Apache Iceberg’s Consistency Model Part 2 — Jack Vanlightly

In this post, we will explore Apache Iceberg's concurrency control and data conflict checks which provide compute engines with support for offering transactions with Serializable and Snapshot...

jack-vanlightly.com

1

12

115

Jack Vanlightly

@vanlightly

6 months

New blog post! Tableflow: The stream/table, Kafka/Iceberg duality.

Tableflow: the stream/table, Kafka/Iceberg duality — Jack Vanlightly

Confluent just announced Tableflow, the seamless materialization of Apache Kafka topics as Apache Iceberg tables. This announcement has to be the most impactful announcement I’ve witnessed while at...

jack-vanlightly.com

3

34

113

Jack Vanlightly

@vanlightly

4 years

I have open-sourced my TLA+ specification for Apache BookKeeper here: It detected a data loss scenario that could occur during the ledger recovery process. Check out the details here:

Existing fencing not enough to prevent data loss · Issue #2614 · apache/bookkeeper

I have implemented the BookKeeper replication protocol in TLA+. You can find the specification and a readme here: https://github.com/Vanlightly/bookkeeper-tlaplus The TLA+ model checker found a dat...

github.com

3

34

112

Jack Vanlightly

@vanlightly

11 months

I've written a primer on formal verification and TLA+ so I can refer to that whenever I write posts about specific aspects of TLA+. I have a two-parter on liveness that is ready to go after this.

A primer on formal verification and TLA+ — Jack Vanlightly

The aim of this post is to give the reader an understanding of why formal methods exist and an introduction to TLA+ including the conceptual model of how it represents data and time. From here, you...

jack-vanlightly.com

1

35

110

Jack Vanlightly

@vanlightly

3 months

How do the costs of implementing a low-latency write-ahead log (WAL) on S3 Express One Zone compare with those of one implemented as a State-Machine-Replication (SMR) system (such as Paxos/Raft/Kafka)? I built a cost model for both to find out.

A Cost Analysis of Replication vs S3 Express One Zone in Transactional Data Systems — Jack Vanlig...

Is it economical to build fault-tolerant transactional data systems directly on S3 Express One Zone, instead of using replication? Read on for an analysis. Cloud object storage is becoming the...

jack-vanlightly.com

1

28

106

Jack Vanlightly

@vanlightly

8 months

Do we need new columnar file formats in the era of cloud object storage?

3

12

102

Jack Vanlightly

@vanlightly

4 months

New blog post on analyzing Delta Lake's consistency model using TLA+.

Understanding Delta Lake's consistency model — Jack Vanlightly

A few days ago I released my analysis of Apache Hudi’s consistency model, with the help of a TLA+ specification. This post will do the same for Delta Lake. Just like the Hudi post, I will not...

jack-vanlightly.com

0

21

96

Jack Vanlightly

@vanlightly

4 months

New blog post on understanding Apache Hudi's consistency model using TLA+, with special focus on multi-writer scenarios.

Understanding Apache Hudi's Consistency Model Part 1 — Jack Vanlightly

Apache Hudi is one of the leading three table formats (Apache Iceberg and Delta Lake being the other two). Whereas Apache Iceberg internals are relatively easy to understand, I found that Apache Hudi...

jack-vanlightly.com

0

26

93

Jack Vanlightly

@vanlightly

2 months

Understanding Apache Paimon's Consistency Model is out. One more table format down in this series.

Understanding Apache Paimon's Consistency Model Part 1 — Jack Vanlightly

Apache Paimon is an open-source table format that has come after the more established Apache Iceberg, Delta Lake and Apache Hudi projects. It was born in the Apache Flink project where it was known...

jack-vanlightly.com

1

22

91

Jack Vanlightly

@vanlightly

1 month

I'm working on a set of blog posts that compare the internals of Apache Iceberg, Delta Lake, Apache Hudi and Apache Paimon. No benchmarking, no judgments etc, just a comparison of internal mechanics.

5

4

92

Jack Vanlightly

@vanlightly

11 months

Liveness properties are often overlooked in TLA+ but they are actually surprisingly important. This is part 1 of a two part post on liveness properties by example.

The importance of liveness properties (with TLA+ Part 1) — Jack Vanlightly

Invariants get most of the attention because they are easy to write, easy to check and find those histories which lead to really bad outcomes, such as lost data. But liveness properties are really...

jack-vanlightly.com

2

19

86

Jack Vanlightly

@vanlightly

9 months

S3 Express One Zone - still not the storage primitive I've been waiting for.

S3 Express One Zone, not quite what I hoped for — Jack Vanlightly

AWS just announced a new lower-latency S3 storage class and for those of us in the data infrastructure business, this is big news. It’s not a secret that a low-latency object storage primitive has...

jack-vanlightly.com

2

18

82

Jack Vanlightly

@vanlightly

8 months

Chapter 5 of the Architecture of Serverless Data Systems: Serverless ClickHouse Cloud. Part 1 looks at OSS ClickHouse. Part 2 looks at the serverless architecture of ClickHouse Cloud.

Serverless ClickHouse Cloud - ASDS Chapter 5 (part 1) — Jack Vanlightly

See The Architecture of Serverless Data Systems introduction chapter to find other serverless data systems. This chapter is the first system of group 3 - the analytics database group. ClickHouse is...

jack-vanlightly.com

4

17

79

Jack Vanlightly

@vanlightly

3 years

The Splunk Messaging-as-a-Service team now has a team engineering blog 😄. First posts, our work to formally verify Apache BookKeeper with TLA+.

Detecting Bugs in Data Infrastructure using Formal Methods (TLA+ Series Part 1)

How the Splunk Messaging as a Service team uses formal methods to improve system reliability and resilience.

medium.com

3

23

78

Jack Vanlightly

@vanlightly

1 year

Kora: A Cloud-Native Event Streaming Platform For Kafka paper won Best Industry Paper at the VLDB conference. There's so much work going on behind the scenes on the Kora engine that I think we'll need another paper this time next year 😀

1

10

73

Jack Vanlightly

@vanlightly

4 months

Object storage direct-write Kafka topics are coming to Confluent. In data system design, cost is a kind of unstoppable force that rewrites architectures every decade or so, and that is happening right now for high-volume streams.

Introducing Confluent Cloud Freight Clusters

Confluent Cloud Freight clusters are now available in Early Access. In this blog, learn how Freight clusters can save you up to 90% at GBps+ scale.

www.confluent.io

6

17

72

Jack Vanlightly

@vanlightly

1 year

We've published a new KIP that improves the durability of Apache Kafka with its asynchronous storage engine configured. See the high level description in my blog post.

Kafka KIP-966 - Fixing the Last Replica Standing issue — Jack Vanlightly

The Kafka replication protocol just got a new KIP that improves its durability when running without fsync. As I previously blogged, Why Kafka Doesn’t Need Fsync to be Safe, there are distributed...

jack-vanlightly.com

2

15

70

Jack Vanlightly

@vanlightly

2 years

TLA+ is more than just for specifying systems and model checking - understand statistical properties of your systems and algorithms too.

Obtaining Statistical Properties by Simulating Specs with TLC - Jack...

In this talk, we describe how simulation can be used to obtain statistical properties of algorithms and how we can apply this technique using TLA+ with TLC in...

www.youtube.com

1

14

69

Jack Vanlightly

@vanlightly

8 months

I'd never really looked into OLAP database internals before, but my research into ClickHouse for chapter 5 of the Architecture of Serverless Data Systems is fascinating. I also get the feeling that the most insights will come when I look at my second OLAP system (probably Pinot)

5

6

71

Jack Vanlightly

@vanlightly

5 years

I have a nice Apache Kafka vs Apache Pulsar post in the works regarding time-travel and multi-topic subscriptions. In case you're wondering, Pulsar got it right and Kafka leaves a lot to be desired.

4

17

69

Jack Vanlightly

@vanlightly

3 years

There is so much to say about the BookKeeper protocol. I have as promised started to write about the details in a new series:

Apache BookKeeper Insights Part 1 — External Consensus and Dynamic Membership

Series Introduction

medium.com

2

22

66

Jack Vanlightly

@vanlightly

3 months

This is a big deal. No cross-AZ networking charges in Azure is official. Will the other cloud providers follow suit?

Update on Inter-Availability Zone Data Transfer Pricing | Azure up...

Azure will not charge for the data transfer across availability zones to help customers build more resilient and efficient applications on the cloud by leveraging availability zones. This will enab...

azure.microsoft.com

3

13

66

Jack Vanlightly

@vanlightly

1 month

My previous work looked at each table format in isolation. Now it's time to compare them. How do Apache Iceberg, Delta Lake, Apache Hudi and Apache Paimon store the canonical set of files that make up a table? What are the similarities and differences?

Table format comparisons - How do the table formats represent the canonical set of files? — Jack...

This is the first in a series of short comparisons of table format internals. While I have written in some detail about each, I think it’s interesting to look at what is the same or similar and what...

jack-vanlightly.com

2

14

62

Jack Vanlightly

@vanlightly

24 days

The 2nd post in my table format comparison series is out. Append-only tables and incremental reads are one of the pillars of streaming in the table-format space. This post looks at how each of the four table formats supports this workload.

Table format comparisons - Append-only tables and incremental reads — Jack Vanlightly

This post is about how the table formats support append-only tables and incremental reads. Streaming is becoming more and more important in the data analytics stack and the table formats all have...

jack-vanlightly.com

0

16

61

Jack Vanlightly

@vanlightly

1 year

Is sequential IO dead in the era of the NVMe drive? Spoiler: I don't think so.

Is sequential IO dead in the era of the NVMe drive? — Jack Vanlightly

Two systems I know pretty well, Apache BookKeeper and Apache Kafka, were designed in the era of the spinning disk, the hard-drive or HDD. Hard-drives are good at sequential IO but not so good at...

jack-vanlightly.com

0

15

57

Jack Vanlightly

@vanlightly

6 years

Finally completed part 1 of putting @apache_pulsar's durability claims to the test. Spoiler alert, I couldn't make it lose data, or deliver any messages out-of-order. Still, I think it makes for interesting reading for those interested in how it works.

How to (not) Lose Messages on an Apache Pulsar Cluster — Jack Vanlightly

In this post we’ll put the protocols we covered in the Understanding How Apache Pulsar Works post to the test. As in previous tests of How to Lose Messages on a RabbitMQ Cluster and How to Lose...

jack-vanlightly.com

0

35

57

Jack Vanlightly

@vanlightly

3 years

My talk on my experience of modelling a complex system in both TLA+ and Maelstrom is out.

Jack Vanlightly — Distributed systems showdown — TLA + vs real code

Hydra 2022, June 2-3. Info and tickets: https://bit.ly/3ni5Hem. When we design a distributed system we typically care about certain properties: availability,...

www.youtube.com

4

11

55

Jack Vanlightly

@vanlightly

4 months

With the announcement of S3-native-streams (Freight clusters), here is a commentary on Confluent strategy regarding object storage, streaming and an open data architecture.

Hybrid Transactional/Analytical Storage — Jack Vanlightly

Confluent has made two key feature announcements in the spring of 2024: Freight Clusters, a new cluster type that writes directly to object storage. It is aimed at the “freight” of data streaming...

jack-vanlightly.com

1

15

55

Jack Vanlightly

@vanlightly

15 days

The thing is, systems have traditionally been designed around paying for capacity of network and disks, not the number of IO requests. That fundamentally changes with S3, and right now, I don't think that cost model is compatible with S3 being a universal storage layer.

2

4

57

Jack Vanlightly

@vanlightly

5 years

The RabbitMQ that people know is changing:
- bit by bit we're replacing the old consensus protocols with Raft
- we have new queue types
- better upgrade features
- better observability
- Kubernetes support and more in the works.

3

27

55

Jack Vanlightly

@vanlightly

2 years

Part 4 of my Viewstamped Replication Revisited paper analysis with TLA+.

Paper: VR Revisited - Application state and commit-number monotonicity (part 4) — Jack Vanlightly

Part 4 was going to be focused on the replica recovery sub-protocol but while writing the replica recovery specification I discovered that I had failed to enforce a critical property - that of commit...

jack-vanlightly.com

0

13

52

Jack Vanlightly

@vanlightly

9 months

Model checking KIP-848 (Kafka's next consumer group balancing protocol), 18 hours in on 32 CPU threads, I've used 800GB of my 2TB NVMe. I'd say it's 50-50 whether my workstation can handle this one, might have to go distributed mode.

2

3

51

Jack Vanlightly

@vanlightly

3 years

Today I posted a series of posts that look at Apache BookKeeper internals (as configured for Apache Pulsar).

Apache BookKeeper Internals — Part 1 — High Level

The primary objective of this internals series is to help people build a mental model of how BookKeeper server works internally. This…

medium.com

0

20

53

Jack Vanlightly

@vanlightly

4 years

Using I can clearly explain how Apache Pulsar supports a replication factor of 2 safely, providing continued read/write availability with the loss of one bookie. Systems like Apache Kafka simply cannot do this, minimum for safety and availability is 3.

1

12

53

Jack Vanlightly

@vanlightly

6 months

Another post on writing as a software engineer.

The beauty of writing — Jack Vanlightly

I woke up this morning, sleep deprived after my cat woke me up repeatedly last night and discovered I needed to write something about writing. Perhaps it's because I'm reading "Bird by bird" again...

jack-vanlightly.com

3

10

51

Jack Vanlightly

@vanlightly

11 months

Queues for Kafka (KIP-932), my biggest ask for Apache Kafka since joining Confluent. Really excited about this one. It's not just queue semantics, but queues which can do replay.

2

10

52

Jack Vanlightly

@vanlightly

2 years

1

10

51

Jack Vanlightly

@vanlightly

16 days

With conditional writes, S3 takes one more step towards the universal storage layer. But can it go the next level? Is it possible to make a (profitable) storage service that hits all three needs of latency, cost, and durability?

3

50

Jack Vanlightly

@vanlightly

3 months

GCP is continuing to innovate its cloud storage, with hierarchical namespace buckets which have a directory structure like an actual filesystem. Can rename, list directories etc. Very cool.

Understanding new Cloud Storage hierarchical namespace | Google Cloud Blog

The new hierarchical namespace capabilities bring file system optimizations to Cloud Storage buckets.

cloud.google.com

1

19

48

Jack Vanlightly

@vanlightly

2 years

I've created a model checking optimized TLA+ spec of Raft plus a couple of variants such as exploring when to fsync and also Flexible Raft (flexible quorums of Flexible Paxos).

GitHub - Vanlightly/raft-tlaplus: TLA+ specifications for Raft and variants

TLA+ specifications for Raft and variants. Contribute to Vanlightly/raft-tlaplus development by creating an account on GitHub.

github.com

1

9

44

Jack Vanlightly

@vanlightly

3 years

I'm moving again! Today I joined @confluent in a role more focused on product, leaving behind the pager and hands on engineering.

8

1

44

Jack Vanlightly

@vanlightly

2 years

Broken the first ground on a TLA+ spec based on "Viewstamped Replication Revisited"

2

7

42

Jack Vanlightly

@vanlightly

28 days

I just updated my first table format comparison blog post to fix some missing details of Apache Hudi.

Table format comparisons - How do the table formats represent the canonical set of files? — Jack...

This is the first in a series of short comparisons of table format internals. While I have written in some detail about each, I think it’s interesting to look at what is the same or similar and what...

jack-vanlightly.com

0

1

41

Jack Vanlightly

@vanlightly

4 months

When I first joined Confluent in Feb 2022, looking for a place to make an impact, it was this proposal that excited me most. I had come from Splunk, running Apache Pulsar as a service internally. We wanted to scale up Pulsar to take on Splunk's big ingestion workloads but ran the

Mahesh Balakrishnan

@maheshb

4 months

In Feb 2022, @ghaz and I wrote an internal proposal at Confluent arguing for a "cost-saving design (e.g., writing to S3 directly) that can eliminate cross-AZ traffic costs for high-rate elephant workloads". I was hoping to apply my research on Corfu and

8

19

110

4

8

43

Jack Vanlightly

@vanlightly

8 months

I think I'm done with chapter 5 - serverless ClickHouse. If there are any engineers with deep understanding of CH internals, I'm open to reviews before I publish. Half is dedicated to open-source CH, so you don't need to work for CH Cloud to be a helpful reviewer :)

4

41

Jack Vanlightly

@vanlightly

6 years

My write-up on producer deduplication, aka idempotent producer, with @apache_pulsar and @apachekafka is complete.

Testing Producer Deduplication in Apache Kafka and Apache Pulsar — Jack Vanlightly

Failures can induce message duplication on both the producer and consumer side. In this post we’ll focus solely on producer side duplication, looking at how the deduplication feature works in Apache...

jack-vanlightly.com

1

15

39

Jack Vanlightly

@vanlightly

5 months

Apache Hudi is pretty difficult to understand compared to Iceberg. I'm writing a high-level TLA+ spec to try and nail down the behavior.

1

40

Jack Vanlightly

@vanlightly

2 years

17 hours on 28 cores and TLC finds its first defect with VSR revisited. State transfer as described in the paper can cause data loss.

3

6

40

Jack Vanlightly

@vanlightly

1 year

I'm writing a complete Kafka replication protocol description with a style which lies between the formal style of a paper and the informal style of a blog post. I'm curious if people think it would be best deployed as a single post or spread over multiple posts?

7

3

38

Jack Vanlightly

@vanlightly

10 months

The Architecture of Serverless Data Systems will resume in January with chapter 5. Right now I have to dedicate myself to formally verifying KIP-848, the new Kafka consumer rebalancing protocol before the holidays. Also need to move all my Kafka TLA+ into the Apache Kafka repo.

0

2

36

Jack Vanlightly

@vanlightly

9 months

One thing I'm seeing from my series on serverless data systems, is that an extremely diverse set of systems, APIs and workloads can make use of S3 as the primary storage layer. S3 is not a threat to these systems but an opportunity, if it can be integrated well.

Jack Vanlightly

@vanlightly

9 months

@criccomini There are the APIs (Kafka API, Pulsar API, Flink API) and then there are the distributed systems behind those APIs. S3 can become the default storage layer for everything, without deprecating the APIs (although I think some APIs will need to evolve).

0

6

2

38

Jack Vanlightly

@vanlightly

15 days

A new table-format comparison post is out. How do the table formats support the streaming ingestion of row-level operations (insert/update/delete), such as ingesting CDC streams?

Table format comparisons - Streaming ingest of row-level operations — Jack Vanlightly

In the previous post, I covered append-only tables, a common table type in analytics used often for ingesting data into a data lake or modeling streams between stream processor jobs. I had promised...

jack-vanlightly.com

1

5

38

Jack Vanlightly

@vanlightly

10 months

Chapter 1 - Amazon DynamoDB. I made this chapter 1 because it really epitomizes the large-scale multi-tenant data architecture I've been talking about recently and it has been evolving for 11 years already!

Amazon DynamoDB - ASDS Chapter 1 — Jack Vanlightly

DynamoDB is a serverless, distributed, multi-tenant NoSQL KV store that was designed and implemented from day one as a disaggregated cloud-native data system. The goals were to build a multi-tenant...

jack-vanlightly.com

1

7

38

Jack Vanlightly

@vanlightly

5 months

A new blog post on commodity, competition and the role of strategy and vision in the future of data infrastructure.

The Sisyphean struggle and the new era of data infrastructure — Jack Vanlightly

I just started re-reading Start With Why by Simon Sinek, which is a fantastic book on leadership and business strategy. The book’s core argument is that great companies don’t focus on what they do or...

jack-vanlightly.com

0

10

36

Jack Vanlightly

@vanlightly

2 years

1

9

36

Jack Vanlightly

@vanlightly

5 years

Some things I've learned about distributed data systems design in the last 6 months:

1

9

35

Jack Vanlightly

@vanlightly

9 months

I think the new S3 Express One Zone looks interesting for an S3 WAL (given the pricing model). Typically the WAL needs faster but smaller disks, then the actual data disks need to be bigger and cheaper. This maps well to Express for WAL and standard for long term data.

3

2

35

Jack Vanlightly

@vanlightly

13 days

Such a familiar story, I was always an average student but did enough to get by, it was only after I read a book on SQL Server locking internals that I found something that got me hooked - a healthy obsession as Phil calls it. For me that has been how distributed/data systems

Phil Eaton

@eatonphil

13 days

I wrote an essay on my mistakes trying to convince people to do something, on doing what you want to do, and on obsession. Ended with a personal note on developing healthy discipline, and having fun. :)

17

20

244

0

3

34

Jack Vanlightly

@vanlightly

6 years

Seriously, I was ready to throw my PC out the window two days ago due to TLA+. Today is my moment of triumph as my spec detected a defect in my protocol. I've never been so ecstatic to find a defect in my own design that escaped all randomized testing of my implementation.

3

5

33

Jack Vanlightly

@vanlightly

1 year

I love these reading lists. Of special interest to me this time round is the Model Checking Guided Testing for Distributed Systems paper which maps TLA+ specifications to implementations and generates tests from the state space. /1

Aleksey Charapko

@AlekseyCharapko

1 year

After a longer summer break, we will resume the distributed systems reading group next week. Here is the list of papers for our Fall'23 term:

1

39

114

1

8

33

Jack Vanlightly

@vanlightly

2 years

Part 5 of my Viewstamped Replication Revisited analysis with TLA+. Would you use an SMR protocol with asynchronous log flushing + recovery protocol?

Paper: VR Revisited - Log-Based Replica Recovery (part 5) — Jack Vanlightly

One of the selling points of VR Revisited is that replicas do not need to write anything to stable storage, or can choose to write to storage asynchronously which can give this protocol a latency...

jack-vanlightly.com

0

6

34

Jack Vanlightly

@vanlightly

4 years

Here's how Pulsar avoids split-brain when ZK loses visibility of the owner broker of a topic and a 2nd broker takes on leadership (meaning for a short moment we have 2 brokers thinking they are the owner of the topic).

4

7

32

Jack Vanlightly

@vanlightly

14 days

An interesting paper on building a virtual disk on top of S3-like storage. The main downer is that it's still only prefix consistency (given high latency of S3 and the need to do batching to avoid small objects and keep the request rate in check for economics).

0

7

34

Jack Vanlightly

@vanlightly

4 years

Next week I start a new role at Splunk working on Apache BookKeeper primarily. The Apache Pulsar/BookKeeper stack is amazing and I am really excited to be contributing to it full time from next week.

4

33

Jack Vanlightly

@vanlightly

11 days

The semantics section of this Snowflake paper nicely describes some of the core concepts of incrementally generating/consuming change streams. It's helped me write more cohesively about CDC support in the table formats (which is WIP).

1

6

49

Jack Vanlightly

@vanlightly

21 days

My next table format comparison post is on native support for CDC. So far, I've been digging around the code of Delta and Hudi. As always, it's fascinating to see their similarities and differences.

1

3

33

Jack Vanlightly

@vanlightly

2 years

The VSR state transfer data loss invariant violation detected by TLC. I posited one that required 4 view changes when I read the paper, but TLC surprised me with this one with 3 view changes.

5

33

Jack Vanlightly

@vanlightly

3 years

My stages of writing a non-trivial TLA+ spec

2

6

32

Jack Vanlightly

@vanlightly

5 years

1

10

31

Jack Vanlightly

@vanlightly

3 years

The BookKeeper replication protocol is so nuanced I'm still having aha moments. BK devs need to tread very carefully. I should write a blog post about all the surprising and non-intuitive things about it.

5

3

29

Jack Vanlightly

@vanlightly

6 years

With the support of @streamlio I have done a write-up on how @apache_pulsar works and how it uses @asfbookkeeper to achieve its durability and latency guarantees. Quite a remarkable system I have to say.

Understanding How Apache Pulsar Works — Jack Vanlightly

I will be writing a series of blog posts about Apache Pulsar, including some Kafka vs Pulsar posts. First up though I will be running some chaos tests on a Pulsar cluster like I have done with...

jack-vanlightly.com

0

17

31

Jack Vanlightly

@vanlightly

1 year

This looks really juicy. I was planning on reading the OmniPaxos paper next but this one is too interesting.

0

10

29

Jack Vanlightly

@vanlightly

6 years

I've finished my first real TLA+ spec and I have to say that it feels like a real achievement. It took 18 days of evenings and weekends with lots of mind bending and mental sweat but it was all worth it.

2

1

28

Jack Vanlightly

@vanlightly

1 year

Something I love about TLA+ is the liveness checking. I can assert that state A leads to state B (A ~> B). Then the model checker will find histories where the protocol is unable to reach state B from state A.

1

6

29
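To make the leads-to idea in the tweet above (A ~> B) concrete, here is a toy sketch in Python. It is not how TLC works internally. It assumes a small, finite state graph given as a dictionary, and it assumes that an enabled step is always eventually taken, so the only ways to avoid B forever are a cycle of non-B states or a state with no successors. All names (`leads_to_counterexample`, `good`, `buggy`, the state labels) are hypothetical and chosen for illustration.

```python
from collections import deque


def reachable(init, succ):
    """Breadth-first search for every state reachable from the initial states."""
    seen = set(init)
    queue = deque(init)
    while queue:
        state = queue.popleft()
        for nxt in succ.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def _avoid_b_forever(state, succ, is_b, on_path, done):
    """Depth-first search for a path that never reaches B: a cycle or a dead end."""
    if is_b(state) or state in done:
        return None
    if state in on_path:
        return [state]  # closed a cycle made only of non-B states
    successors = succ.get(state, ())
    if not successors:
        return [state]  # dead end: the system is stuck here without reaching B
    on_path.add(state)
    for nxt in successors:
        tail = _avoid_b_forever(nxt, succ, is_b, on_path, done)
        if tail is not None:
            return [state] + tail
    on_path.discard(state)
    done.add(state)  # no B-avoiding behaviour starts from this state
    return None


def leads_to_counterexample(init, succ, is_a, is_b):
    """Return a history that starts in an A-state and avoids B forever, else None."""
    done = set()
    for start in reachable(init, succ):
        if is_a(start) and not is_b(start):
            path = _avoid_b_forever(start, succ, is_b, set(), done)
            if path is not None:
                return path
    return None


# Hypothetical protocol: every request is eventually granted.
good = {"idle": ["requested"], "requested": ["granted"], "granted": ["idle"]}
# Hypothetical defect: a retry loop lets a request spin without being granted.
buggy = {"idle": ["requested"], "requested": ["granted", "retry"],
         "retry": ["requested"], "granted": ["idle"]}

is_requested = lambda s: s == "requested"
is_granted = lambda s: s == "granted"

print(leads_to_counterexample(["idle"], good, is_requested, is_granted))
print(leads_to_counterexample(["idle"], buggy, is_requested, is_granted))
```

On the `good` graph the check finds nothing, while on the `buggy` graph it returns the retry loop as the offending history. A real model checker also has to handle fairness assumptions and far larger state spaces, which this sketch deliberately ignores.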

Jack Vanlightly

@vanlightly

2 years

Here's my guide to benchmarketing:

3

12

28

Jack Vanlightly

@vanlightly

2 years

I'm a big fan of Apache Flink which is why I am ecstatic that Immerok are joining us. Really amazing news 😍😍😍

Confluent

@confluentinc

2 years

🎉 We’re excited to share our intent to acquire @Immerokcom ! Together, we’ll build a cloud-native service for @apacheflink that delivers the same simplicity, security, & scalability that you expect from Confluent for Kafka. Learn more →

2

69

200

0

1

29

Jack Vanlightly

@vanlightly

2 months

I'm putting the final touches on my deep dive into Apache Paimon blog post, and a formal specification written using Fizzbee. This one was a lot of hard work but also a lot of fun. Posting tomorrow hopefully 🤞

3

1

27

Jack Vanlightly

@vanlightly

2 years

Just got back from #tlaconf at #StrangeLoop where @lemmster and I had our "Obtaining statistical properties by simulating specs with TLC" talk. Slides are available but I'll probably get around to writing a blog series about the technique.

1

3

27

Jack Vanlightly

@vanlightly

4 months

I'm working on a Delta Lake TLA+ specification today. Mostly done. I suppose I should write a Delta Lake consistency blog post next.

0

27

Jack Vanlightly

@vanlightly

2 years

I'm working on a new replication protocol. If Flexible Paxos and Cheap Paxos had a kid, and also Raft and Apache BookKeeper had a kid and then those kids grew up and they had a baby, that baby would be my protocol.

2

3

26

Jack Vanlightly

@vanlightly

10 months

I'm working on a series that analyzes real-world serverless data system architectures, with a particular focus on how different systems do multi-tenancy with good tenant isolation.

3

2

27

Jack Vanlightly

@vanlightly

2 years

1/ One thing that neither the Raft thesis nor the paper discusses is reusing the same server identity, specifically in the context of reconfiguration.

3

5

27

Jack Vanlightly

@vanlightly

4 years

At Vueling Airlines I built an architecture mapping tool that maintained the whole graph of services, databases, queues, object storage, etc and their relationships in Neo4j, all built programmatically from config files and other sources. It was the only way to understand it all.

Corey Quinn

@QuinnyPig

4 years

When I go to a new cloud environment, people apologize because the architecture diagram is out of date. Spoiler: Everyone's architecture diagram is out of date. That is the nature of the universe. Smile, nod, and accept it. Have a listen:

3

10

32

3

7

25

Jack Vanlightly

@vanlightly

2 years

Low-code is a fancy way of saying high-yaml.

0

26

Jack Vanlightly

@vanlightly

4 months

I'm digging into Apache Paimon now. Unfortunately, there is no spec/protocol doc, so diving into the code. I'm still in the "groping in the dark" phase of code reading. Initial impressions are that if Apache Iceberg, Apache Hudi and Delta had a baby, it might be Paimon.

3

26