When I wrote Database Internals back in ~2018, my main goal was to make the field more approachable and less intimidating. Everyone should be welcome to enter, there's so much work to do here!
Over the last year, so many people have read it. I've seen at least 3 book reading
Finally, the entire contents of the Database Internals book are available on the O’Reilly Early Access program! Still raw and unedited; we are working hard on a final release. You can have a sneak peek here:
Every cent made from Database Internals @therealdatabass for now is going to be used to support Ukrainians! Ukraine has given me education and enabled me to write this book. I owe this to them.
We’re reunited and I’m extremely happy to have them with us. But this is not over: under the circumstances, we are privileged. We live in the EU, have picked them up, and they have food and shelter. But most people running - don’t. Please keep donating and volunteering.
They’ve crossed the Polish border! Meanwhile, a missile has exploded right next to the train station where they’ve started their journey yesterday evening.
In case you don’t know this yet, CMU database group YouTube channel is simply amazing. Watching it regularly. Very much thought provoking, constantly getting a stream of ideas when listening.
Database Internals @therealdatabass is now also available in audio format, from as well as Audible. Retweet to help spread the news and enter the raffle to get it for free (via redeemable code).
Database Internals is nearing 26K sold copies! Every month, I get a slip with this little creature on it and add up the tally. Oftentimes, I can't believe that so many people have it on their desks. Thank you, everyone, for spreading the word!
Looks like three chapters from Storage Engines part of @therealdatabass were released on O'Reilly EAP: In-Place Update Storage, Binary File Formats and Implementing B-Trees. We'll keep releasing chapters if you keep spreading news about the book!
If you haven’t yet read my article on LSM vs B-Tree storage, you have no more excuses. It’s now also publicly available on ACM. Algorithms powering modern storage systems:
If there were such a thing as Database Internals II, what would you like to see in it?
I’ll start: code. Def not production-grade, deliberately simplified, but technically equivalent to what you’d see in a real database, with clear cut abstractions even if they hurt perf.
Big thank you 🙏 to everyone who got a copy of Database Internals @therealdatabass. It's now ranked #5,534 out of over 8,000,000 books, and is named a best-seller in Management Information Systems on Amazon \o/
This year, I'll be posting a Database Papers Advent Calendar, a collection of curious papers from 2021 and 2022, to my Mastodon account, and replicating it to Twitter with some delay to maybe inspire folks to migrate over:
I still take this as a compliment! If someone with quite a bit of experience in databases has learned something new from it, I’m very happy. As regards shortcomings, if I knew what I know now back in 2019, I could’ve done (also, written and edited) better.
My review of Database Internals.
A solid guide to many challenging topics, from an interesting perspective.
Would recommend, and I look forward to a potential second edition!
Crazy idea: a “behind Database Internals” reading group. Like database internals reading group but instead we’re going to read the most notable references. Probably will take us about a year (reading all will def take us way over a year). Yay or nay?
It's never too late for a year summary post! New blog post about some of the papers I've read and enjoyed last year. Database Research in 2019: The Year in Review
I'm working on the new article series on database systems, this time concerning Distributed System concepts. First article discusses Links, Two Generals Problem and FLP Impossibility:
In the database book club, we have just finished the first chapter of Gray/Reuter’s Transaction Processing, and @MilanLoveless has started a repo with notes:
If you want to join, just jump on discord, we’re just starting:
Looks like my book on Database Internals is starting to look like a book! Two parts, 12 chapters, 93 headings, 160 pages, 46k words, 300k symbols and counting!
Finally, Database Internals book @therealdatabass is available in a print version on Amazon! It’s been going pretty quick for the last couple of weeks, but you can get one now:
I would like to try something new in 2024. If you have always wanted to read Transaction Processing by Gray/Reuter, but found the tome too thick to tackle without a good company, join us on the journey to read through it and learn new things together:
Second part of Ways to Agree, Path to Atomic Broadcast, is out! Featuring Shared Memory, Linearizability, Two/Three Phase Commit, Leader Election and Broadcast:
At the database internals book club, an overwhelming majority has voted for @martinkl's DDIA book! The first meetup has commenced today. Join in if you want to read a great book in good company!
With Gray’s transaction processing book done, folks in Database Internals discord are voting for the next iteration of the book club. Guess which one is in the lead? Also, join if you want to participate.
So we’re at 309 pages, folks! Content seems to be finished now, with addition of several awesome papers. Now to editing again, and releasing the early access version \o/
One of my favourite youtubers, @asianometry, has published a video on history of SQL and relational databases. Featuring a fish pun, alas no mention of data bass!
Today I've read a paper that I have (kind of) started my book research from back in 2018: Database Architecture by Hellerstein and Stonebraker. I'm looking at it in a completely different way, with a new understanding and insight. Really hope that reading it does the same to you.
Database Internals @therealdatabass is on sale on US Amazon for half the price! Check it out if you haven’t; holidays are the best time to learn about new stuff!
Fuzz testing is an absolute must for any database (or any sophisticated system/program whatsoever). Trying to compose the edge cases manually is tedious, hard to review, and is still likely to leave a lot uncovered. Randomised tests are easier to develop, maintain and run.
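To make the point above concrete, here is a minimal sketch of a randomised test, assuming a toy store: `SortedKV` and `fuzz` are hypothetical names made up for illustration, not code from any real database. Random operations are applied both to the store and to a plain dict acting as the model, and the two are compared after every step. The seed is part of every assertion message, so a failing run can be replayed.

```python
import random


class SortedKV:
    """Hypothetical toy store: keys kept in a sorted list, values alongside."""

    def __init__(self):
        self._keys = []
        self._vals = []

    def _find(self, key):
        # Binary search for the first position whose key is >= key.
        lo, hi = 0, len(self._keys)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def put(self, key, value):
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # overwrite existing key
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def get(self, key):
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def delete(self, key):
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            del self._vals[i]

    def scan(self):
        # Full contents as (key, value) pairs in key order.
        return list(zip(self._keys, self._vals))


def fuzz(seed, steps=1000, key_space=20):
    """Apply random operations to the store and to a dict model, comparing as we go."""
    rng = random.Random(seed)  # seeded, so a failing run can be replayed
    store, model = SortedKV(), {}
    for step in range(steps):
        op = rng.choice(["put", "get", "delete"])
        key = rng.randrange(key_space)  # small key space forces collisions
        if op == "put":
            value = rng.randrange(1_000_000)
            store.put(key, value)
            model[key] = value
        elif op == "delete":
            store.delete(key)
            model.pop(key, None)
        else:
            assert store.get(key) == model.get(key), (seed, step, key)
        # Invariant checked after every step: contents match the model, in key order.
        assert store.scan() == sorted(model.items()), (seed, step, op, key)
    return len(model)


if __name__ == "__main__":
    for seed in range(50):
        fuzz(seed)
    print("all seeds passed")
```

The key space is kept deliberately small so that overwrites and deletes of existing keys happen often; a real harness would also generate crashes, restarts and concurrent operations, which this sketch leaves out.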
Not only has Gray described simulations and chaos engineering, but also event sourcing, in a short off-the-cuff remark. Wondering how many valuable things from that book have gotten overlooked over the years.
If you think Andres Freund has saved you from the xz vulnerability, this is not the first time he has found tricky, subtle problems across software boundaries. But this one, you probably won’t read about in the NYTimes. Oh wait…
Dear @lufthansa. It might be surprising, but there are people whose name ends with “DR”. It doesn’t mean you should turn this into their title. Checked my boarding passes for last 10 years, and everywhere I am Dr Oleksan. I appreciate your recognition of my intelligence.
Friends around the world, thank you for the pings. No one is safe from this conflict and you’ve seen the threats. Ukraine remains an epicentre of this conflict. If you got a little spare money, please consider donating. Ukrainian friends have recommended
Systems Distributed was such a great conference. Had a chance to meet so many great folks doing all sorts of things in databases and distributed systems. Very inspiring!
In case you haven’t read the first version, the second version of “Is Parallel Programming Hard, And, If So, What Can You Do About It?” is out, and you'd better check it out!
If you like reading papers, join paper reading group in Databass Slack. Check out the details and the list of the papers we've already read:
The read for next two weeks is SSS: Scalable KV Store with External Consistent and Abort-free RO Transactions.
We seriously need to start the whole CAP conversation once again from scratch, with way more rigorous terminology. These sleights of hand that start with consistency "according to the desired service specification" and then raise it to "atomic consistency, because it is
Ukraine got it right: there's 1 confirmed case there, and all schools and universities are closed, mass gatherings and events are limited. Learn from other countries, don't wait for a number to grow; prevent instead of reacting.
The 62-page chapter on B-Trees is now edited, with 48 images made for it! In addition to 21 pages on storage taxonomy. “Only” 127 pages more to go. I need a break now 😴
One of the reasons I re-read distributed systems papers multiple times is mobbing: a single non-careful wording and you’re out. The crowd will rage and question your competence.
This is also the reason I always tried to avoid writing longer pieces on functional programming.
\o/ my proposal for Velocity in San Jose got accepted! Will start working on notes and release them same as last year with disk IO series. This year’s subject is Consensus Algorithms.
We already have 750+ folks in Database Internals Discord & Book Club. We're currently reading a Fault Tolerance chapter from Transaction Processing by Gray/Reuter, and the next meet-up is this Tuesday. Join up if you want to learn more about how databases work!
I’m a bit concerned that “hacking” design interviews is eventually going to lead to the same thing that happened with whiteboarding. Will we eventually have to recite Paxos, Raft, ZAB, and VR papers by heart to get a backend job?
Last read is done and final edits are submitted to my production editor. Waiting for their “go” and can’t wait to see the book in print. Secretly hoping there won’t be much more work. It’s been a hard 1.5 years, and I’m grateful to all the friends who helped me to work through it.
Some more great LSM papers:
* Monkey: Optimal Navigable Key-Value Store
* Jungle: Towards Dynamically Adjustable Key-Value Store by Combining LSM-Tree and Copy-On-Write B+-Tree
* LLAMA: A Cache/Storage Subsystem for Modern Hardware
Sharing some really good papers on the LSM-tree storage engine domain (some focused on compaction; others more abstract). I would be very glad to hear suggestions if you know any paper that's not listed here.
pls find them in this thread...
Database Internals reading group is now over 1K people! We got some awesome folks from all over the industry & academia. We're currently reading Transaction Monitors chapter from Gray/Reuter Transaction Processing book. If you're not in yet, join us:
If you got Database Internals and enjoyed it, please consider posting a rating on the website where you purchased it. I know it’s just a pointless number for many, but this really helps the book and the author. Who knows, there might be another one?
This year, I’ve found myself coming back to several books I read a while ago and re-reading them to get new insights. Back in the day I resisted re-reading as it felt redundant. Doesn’t feel this way anymore, as reading the same book with a new mind is like reading a new book.
@eatonphil
It has, at least on dotcom amzn! I know it’s often a lot to ask, but for many thousands of copies sold only a few rate it. If you enjoy the read, and got a couple of minutes, please hit those stars. This helps a lot and may even encourage the author to work on V2!
Two things you should never google to keep your faith in humanity:
* wide-column vs column-oriented databases
* serialisability vs sequential consistency
Would anyone be interested to read and discuss papers together? I have some in mind, and it seems like many folks do this because everyone spends more time at home. I can provide free excerpts from the book for relevant reference points, too!
I’m giving away a paper copy of Database Internals book! To enter the raffle, respond to this tweet with your favorite podcast episode related to distributed systems!
Check out the @therealdatabass episode of Software Engineering Radio for inspiration:
If you've ever struggled to understand how Linearizability relates to Sequential Consistency, or Serialisability, and how consistency models generally fit together, read this paper. A must-read for any distributed systems engineer:
Day 7: There are quite a few consistency models, each one having its own important implications. A great overview of consistency models by Paolo Viotti and Marko Vukolić: Consistency in Non-Transactional Distributed Storage Systems: