For those who missed Systems Distributed '24 and felt FOMO, here’s a new post where I’ve compiled my highlights along with links to the research papers, blogs, and books presented by the speakers.
Thank you
@TigerBeetleDB
for organizing this. I left with many lessons, new mental
What I Learned This Week...
- File systems lie! OS write buffers improve performance but risk data loss on crashes. Explicitly flushing write buffers ensures durability at a cost.
- Rust's compiler uses static analysis to ensure exclusive write access to each element in a data
What I Learned This Week...
Learned while reading:
- A strong mental model of ACID Transactions comes from grasping why we need them—race conditions and partial failures—NOT memorizing the ACID acronym.
- Rust's atomic operations, when used in pairs with Release and Acquire
Introducing
@resonatehqio
v0.5.0 with new features that allow you to both mitigate infrastructure failures and coordinate microservices in a dead simple way:
🔄 Retries: survive transient & intermittent failures with automatic retries of operations.
🔮 Recoverability: survive
Reading recommendation for this week - A blog detailing an architecture that uses Firecracker VMs for a new type of reconciliation loop. Unlike Kubernetes, this approach recovers not just the application, but also its state.
What I Learned This Week...
- Data partitioning enhances scalability by distributing large datasets across multiple disks and processors, but it's often combined with replication for fault tolerance.
- PostgreSQL's transaction snapshot isolation level provides a consistent
Beyond excited to share that next week I'll be joining
@AntithesisHQ
as Developer Relations Lead to help support the developer community in building highly reliable systems!
For those who have been following my journey, you know how strongly I believe in Deterministic Simulation
What I Learned This Week...
- Memory access patterns directly influence computation time by affecting caching behavior.
- Data models are a key factor in building software, as they govern the way data is structured, related, and accessed, ultimately impacting the overall
What I Learned This Week...
- Single-leader replication increases read throughput by allowing reads from any machine, while write throughput is limited to that of a single machine.
- While serverless architectures reduce operational complexity, they shift the burden of
found a race condition that didn't show up in simulated test runs of 1,000, 10,000, or even 100,000 concurrent requests, but only emerged after 1,000,000 evenly distributed requests across 10 clients.
the joys of concurrency... 🫠
🙌 Had a great chat with
@DominikTornow
today! We geeked out about systems, mental models, and more. A very thought-provoking discussion. Thanks and looking forward to the upcoming chapters of “Thinking in Distributed Systems” :)
#software
#thinking
seeing linearizability in action with
@anishathalye
's porcupine is a game-changer for verifying resonate's correctness. absolute 🔥 for our team!
🦔 Check out the linearizability checker for yourself: 👀
the resonate website just got a slick redesign - just in time for the new year!
check it out to learn what is distributed async await and how to build your next project with it 👇
updated Resonate's Dockerfile to use
@chainguard_dev
rather than Docker's official Go image.
Resonate (docker): 848MB
Resonate (chainguard): 35.3MB
A 96% image size reduction and 0 CVEs!? 👀
Working on a new blog about programming with assertions.
Thinking about software in terms of invariants has greatly improved the speed and quality at which I deliver software.
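To make "thinking in invariants" concrete, here is a minimal Go sketch. The `assert` helper and the `account` type are hypothetical examples, not taken from any particular codebase. The idea is to state what must always be true and crash loudly the moment it is not.

```go
package main

import "fmt"

// assert (hypothetical helper) panics when an invariant does not hold.
func assert(cond bool, format string, args ...any) {
	if !cond {
		panic("assertion failed: " + fmt.Sprintf(format, args...))
	}
}

type account struct{ balance int }

// withdraw states its preconditions and its postcondition as assertions.
func (a *account) withdraw(amount int) {
	assert(amount > 0, "amount must be positive, got %d", amount)
	assert(a.balance >= amount, "insufficient funds: balance %d, amount %d", a.balance, amount)
	a.balance -= amount
	// Invariant: a balance is never negative.
	assert(a.balance >= 0, "balance must never be negative, got %d", a.balance)
}

func main() {
	a := &account{balance: 100}
	a.withdraw(30)
	fmt.Println("balance:", a.balance)
}
```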
My first blog post for
@resonatehqio
just dropped!
In "P-lang for OSS Cloud Infrastructure", we'll take a closer look at the programming language P, discuss why formal modeling is beneficial for OSS, and share insights into how we leveraged these concepts in our work.
Link
Reading recommendation for this week - A paper which proposes a new paradigm for data management systems that is composed of the following logical components:
- Language frontend
- Intermediate representation
- Query optimizer
- Execution engine
- Execution runtime
My first week working at a testing startup and a single bug takes down an estimated 8.5 million Windows devices. I couldn't resist...
Here's my first blog post at Antithesis. Spoiler: Rust isn't enough.
Last Friday, an eerie silence descended across the globe. Millions of screens flickered, then faded to an ominous blue background. The dreaded Blue Screen of Death had returned, bringing the modern world to its knees.
The
@resonatehqio
docs revamp continues. I'd say ~85% complete for the upcoming release. 🥱
Feel free to check it out and give early feedback. Docs website in the comments below.
Last week, it was my turn to lead the discussion for
@eatonphil
's book club. We're reading 'Understanding Software Dynamics' and discussed waiting for disk and network.
Here are some of my notes and questions:
Thrilled to share that I'll be joining Resonate👇 as a founding team member to build durable async/await in public 🔥
As a developer, I know too well how difficult it is to write code that is fault tolerant AND (actually) dead simple 💀
Fortunately, we have a plan to fix this.
I've been playing around with
@tursodatabase
a bit more and noticed many users, including me, really wanted positional parameters supported in the LibSQL database driver for Go so I opened a PR for this feature. It was merged today, so try it out!
My second blog post for
@resonatehqio
just dropped!
In “Shipping Faster with Assertions“, we'll take a closer look at using assertions inside our production code and share insights into how we leverage them in our work.
Link provided in the comments below. 👇
End-to-End tests are often hard to write, debug, and maintain. That is why many of us avoid them at all costs and rely on unit and integration tests. But what if, instead of avoiding the problem, you fix it?
Systems Distributed boasts an unmatched lineup of speakers:
Alex Petrov, Gwen Shapira, James Cowling, Kyle Kingsbury, and many more
This will be an incredible event. Don’t miss out—get your tickets now! 🏴☠️
Reading recommendation for this week - An old paper that formally defines the safety and liveness properties of distributed systems, along with tests to determine property types.
for durable executions, it’s impossible to overlook the value of traces. 📊
the resulting DAGs not only capture the flow of an execution, but also record the progression of failures, retries and mitigations over time.
here is one from
@resonatehqio
's test run yesterday: 🌟
@ThePrimeagen
If that’s how you really feel, let’s bring
@AntithesisHQ
on the show soon and chat all things deterministic simulation testing, guided fuzzing, etc.
In Distributed Systems, timeouts are necessary to detect failures since components cannot reliably determine the state of remote components.
This is why all durable executions in
@resonatehqio
require a timeout.
After an unbelievable time
@diagridio
, I decided it’s time for a new chapter! 🚀 Grateful for the memories, lessons, and talented colleagues I’ve had the privilege to work with.
More announcements soon 👀
deterministic behavior is predictable behavior.
understanding which operations are not deterministic is vital for writing systems that are easy to reason about.
one of the most subtle sources of non-determinism in go is the select statement.
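a minimal sketch of why: when more than one case is ready, the go spec says select picks one via uniform pseudo-random selection. the function name `countSelections` is just for illustration.

```go
package main

import "fmt"

// countSelections runs select n times with BOTH channels ready each time and
// counts which case the runtime picks.
func countSelections(n int) (a, b int) {
	chA := make(chan struct{}, 1)
	chB := make(chan struct{}, 1)
	for i := 0; i < n; i++ {
		chA <- struct{}{}
		chB <- struct{}{}
		select {
		case <-chA:
			a++
			<-chB // drain the other channel so the next round starts clean
		case <-chB:
			b++
			<-chA
		}
	}
	return a, b
}

func main() {
	a, b := countSelections(10000)
	fmt.Printf("case A chosen %d times, case B chosen %d times\n", a, b)
}
```

the exact split changes from run to run, so code that relies on case order inside a select is relying on luck.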
Looking to start contributing to a couple OSS projects this summer. Ideally a code base in Go.
Any suggestions for active projects with welcoming communities?
Why are serverless applications notoriously hard to test?
In my experience, they reduce operational complexity, but shift the burden of managing the complexities of distributed systems to developers.
What looks like a simple function is actually a distributed service invoked
Serverless applications are notoriously hard to test.
If you’ve done the hard work of mocking your AWS environment with
@localstack
, you can now bring your setup to Antithesis and deterministically simulate a bunch of crazy usage scenarios.
Any bug you find, you can perfectly
Huge congratulations to the entire
@diagridio
team on the phenomenal product release! It was such a privilege to collaborate with this talented group and help build Catalyst. Make sure to get on the waitlist!❤️🔥
Introducing Diagrid Catalyst, a suite of unified developer APIs for messaging, data and workflow that make building incredible apps possible no matter where they run or what language you use - powered by
@daprdev
Read the blog & get early access for free
Spent this past weekend reviewing snippets from the upcoming book on Coding Interview Patterns by
@alexxubyte
.
If you’ve ever struggled with building intuition for coding interview material rather than simply memorizing solutions, this book seems very valuable.
💡If you're curious about the durable execution space or what the heck a workflow engine even is,
@a16z
did a great job summarizing the current landscape:
i dove into P (👋
@ankushpd
) this week and built a model of resonate's upcoming worker protocol.
P provided a great way for me to conceptualize the protocol and validate our software design decisions before implementing it.
Resonate Demo Day 🏴☠️
@vaibhaw_vipul
demoed Resonate Schedules: run Distributed Async Await functions on a schedule 🗓️
@gabe_guerra_
demoed a formal model of Resonate's task framework, fan out Distributed Async Await on 100s of nodes. Without missing a beat 💓
Guaranteed ⛓️
digging into data structures for timed events... seems like i've got 4 options:
- unordered lists ❌
- ordered lists - slow linear inserts 🙅♂️
- binary heaps - logarithmic inserts/deletes 💸
- hashed timing wheels - new to me but seem promising 🤔️
Late night. I decided yesterday to go down a fuzzing rabbit hole after stumbling upon this super high-quality resource by Google.
I'll unpack some of the interesting bits in the coming days.
Observations from my first week at Antithesis:
- Testing a Nintendo game is the best kind of onboarding.
- Whether it's a late-night brainstorm or a weekend epiphany, there's always someone ready to bounce ideas around.
- Taking initiative is valued over constantly seeking
dived into some sequence diagrams to map out resonate's next big subsystem... taking a first crack at modeling this in p-lang today. excited to see where this design process takes us 🔥
obvious what cloud native design pattern this would enable?
This weekend, I decided to write a bash script the hard way...with Nix.
Warning: This blog post is somewhere between a rant and a reflection.
Link in bio.
My favorite of our 8 security principles: no hidden state.
This principle is implemented through:
1. Cloud env managed with Infrastructure as Code (IaC) tools
2. NixOS running on all machines
Benefits:
- Single source of truth: Our entire cloud infrastructure is defined in
People trust us with their code, so our security needs to be hardcore. The thing is, we're more aware than most that software has bugs. Read this to learn how we try to build systems that are secure *even if they have bugs*:
This blog post is finally out! Worth a weekend read if you are interested in building an intuition for what makes a search strategy great at finding bugs.
A QA engineer walks into a bar. Orders 1 beer, 999999999 beers, an anteater in a beer. First real Customer walks in and asks for a light. Bar bursts into flames, killing everyone. What went wrong?
🤌
@resonatehqio
's typescript sdk is out and oss:
looking for 5-10 typescript devs to be early adopters of our new durable async await project!
i will personally onboard you and provide ongoing support as you build your first project with our software.
in resonate's next release, we’re shipping support for distributed locking 🏴☠️
this will allow distributed async/await apps to coordinate access to shared resources like pending promises more efficiently and correctly.
👇
super hyped to see community contributions like this detailed architectural documentation created by
@srinidhi94
!
it's awesome when people take the time to deeply understand a system and generously share their knowledge to benefit others.
We're finally launching multiverse debugging! With it, you can interactively rewind time to destructively analyze events leading up to your bugs. Check this blog post for details. Debug responsibly.
Ever hit a bug in production and wish you could rewind time to capture just a little more information? If you find that bug in Antithesis then we've got you covered:
In Distributed Systems, at-least-once message delivery combined with idempotent processing is the only way to achieve exactly-once processing semantics.
This is why we support Idempotent APIs
@resonatehqio
.
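A minimal Go sketch of the pattern, not Resonate's API. The `ledger` and `message` types are hypothetical. A message may be delivered more than once, but its ID makes the second delivery a no-op. In a real system the set of seen IDs and the state change would need to be persisted atomically; this sketch keeps both in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// message carries an ID that stays the same across redeliveries.
type message struct {
	id     string
	amount int
}

// ledger applies each message at most once by remembering processed IDs.
type ledger struct {
	mu      sync.Mutex
	seen    map[string]bool
	balance int
}

func newLedger() *ledger { return &ledger{seen: make(map[string]bool)} }

// apply is idempotent: redelivering a message with a known ID changes
// nothing. It reports whether the message was actually applied.
func (l *ledger) apply(m message) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.seen[m.id] {
		return false
	}
	l.seen[m.id] = true
	l.balance += m.amount
	return true
}

func main() {
	l := newLedger()
	// At-least-once delivery: "m1" arrives twice, for example after a lost ack.
	for _, m := range []message{{"m1", 50}, {"m1", 50}, {"m2", 25}} {
		fmt.Println(m.id, "applied:", l.apply(m))
	}
	fmt.Println("balance:", l.balance)
}
```

The duplicate `m1` is ignored, so the final balance is 75 rather than 125.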
Just posted a new blog that goes over my experience learning zig to make a chess engine.
I even got the engine compiled to wasm so you can play it in the post!
It's a bit more of a "vibe" post, as in not a tutorial like most of my posts.
LMK what you think
From Designing Data-Intensive Applications:
"Data models are perhaps the most important part of developing software, because they have such a profound effect: not only on how the software is written, but also on how we think about the problem that we are solving."
One of the most enjoyable technical blogs I've read in a while with excellent examples.
Clear. Simple. Concise.
The author takes you on a journey that starts with the early designs of python generators as a lazy producer of values, to its development into full-fledged
The paradox of the cloud ☁️
TL;DR: Startups incur a tax for the product velocity gained from cloud services, but this tax can become overly burdensome as they scale.
@qianl_cs
Hi
@Qian
, you're right that normally you wouldn't be able to, but that's why
@AntithesisHQ
exists!
Running your application plus Kafka plus Postgres plus Redis deterministically on Antithesis sounds crazy, but it's true.
Here's a cool case study you might find interesting:
In Distributed Systems, retries are the main form of mitigation when it comes to transient & intermittent failures.
This is why all functions executed with
@resonatehqio
are automatically retried. No boilerplate required.
Transactions are the fundamental unit of work in relational databases. They provide a very reliable abstraction because of their adherence to the ACID properties ⚡
These transaction properties map to concrete database features we interact with:
✅ A
➕transaction property:
At
@embano1
's recommendation, I dove into Amazon's PR/FAQ framework for proposing new products and services.
The key takeaway for me was crystal clear: Solve a real customer problem first, before considering the technology. Kubernetes, eBPF, Nix, LLMs, WASM, Blockchain - none of
If Property-Based Testing (PBT) is so great, why isn't it more mainstream?
When I see a valuable technology fail to take off beyond niche developer circles, I find that jargon is often one of the biggest culprits. The solution usually involves making the technology's value not
Want to learn about Deterministic Simulation Testing (DST)? Great. Well, level 1 requires you to first develop a strong intuition for Property-Based Testing (PBT):
I am in NYC for 2 weeks starting this Thursday. If you're around and want to catch up for lunch/coffee, DM me! ☕️
Always trying to connect with new folks.
What I Learned This Week...
- Tail latency in datacenters is affected by resource contention, with waiting being the fundamental bottleneck.
- Coroutines and threads provide concurrency differently, with coroutines offering simplicity and threads enabling parallelism.
- A
TL;DR.
P gives you an intuitive framework to think about distributed systems:
1. System interactions via state machines and events.
2. System behavior via safety and liveness properties.
Found 4 thought-provoking podcast episodes that offer valuable insights into building tech products customers love. Worth a listen for anyone passionate about building.
The Nature of Product
Product Strategy & Metrics
Product
Just had an insightful discussion with my friend
@JoshVanL
on building a robust security brain. The ultimate north star?
“What happens if Russia can read this?” 😅
I'm starting a Substack to track my weekly learnings and occasionally do some tech deep dives.
This week, I'm sharing my learnings from seeking to understand the value of ScyllaDB and FoundationDB, relearning C++, and more.
Blog link in the comments.