Mark Callaghan Profile
Mark Callaghan

@MarkCallaghanDB

Followers
7,139
Following
268
Media
38
Statuses
6,058

Databases, storage and math

Joined September 2019
@MarkCallaghanDB
Mark Callaghan
7 months
Impressive work to explain 1+ second write latencies with Kafka running on ext-4. But the best part is the solution --> use xfs. Back in the day that was also the solution for intermittent high fsync latency from the MySQL binlog with ext-2 or ext-3.
7
66
253
@MarkCallaghanDB
Mark Callaghan
30 days
Ordered "More Modern B-Tree Techniques" by Goetz Graefe, published in 2024.
5
28
231
@MarkCallaghanDB
Mark Callaghan
5 years
I am excited to start a new job next week doing performance at MongoDB with @DavidDaly44 and @h_ingo
24
20
167
@MarkCallaghanDB
Mark Callaghan
1 year
Like Google Domains, I was deprecated by Google in 2009. MySQL team, in Ads Eng, was ended because F1/Spanner was coming. AFAIK it took ~5 years to fully switch and there was a migration to MariaDB after I left. I am sure there are more stories but I was too busy at FB to learn.
1
11
156
@MarkCallaghanDB
Mark Callaghan
2 years
Coroutines and io_uring are used in RocksDB. I have some reading to do.
4
21
144
@MarkCallaghanDB
Mark Callaghan
6 months
@stephaniemlee UCSD takes 58% of a research grant as overhead. Is oversight to avoid problems like this something they should be expected to do?
6
4
141
@MarkCallaghanDB
Mark Callaghan
4 years
OLTP -> Real Time Analytics. I have a new job at Rockset.
11
11
136
@MarkCallaghanDB
Mark Callaghan
2 years
Thank you @percona and OSS database communities
14
6
132
@MarkCallaghanDB
Mark Callaghan
3 years
My focus on specific systems (MySQL, RocksDB) meant I neglected my general systems perf skills. Working on that now by reading "Understanding Software Dynamics" by Richard Sites and I highly recommend it.
4
10
125
@MarkCallaghanDB
Mark Callaghan
2 years
LeanStore is impressive. Hope it turns into a product. Regardless I appreciate how much effort has gone into it. Systems research takes a long time. Source is here
2
18
109
@MarkCallaghanDB
Mark Callaghan
10 months
Kyle does amazing work to make databases better. I am less of a fan of the drive-by snark from smart people who read or browse his work. Being smart doesn't replace sweat equity.
3
14
111
@MarkCallaghanDB
Mark Callaghan
2 years
Team Spanner: Spanner, @CockroachDB , @Yugabyte , @PingCAP Team Aurora: PG/MySQL Aurora, AlloyDB, @neondatabase Team Spanner is also DistSQL or NewSQL. What is a better name for Team Aurora? Neon is Postgres and OSS. When does Team Aurora get an OSS MySQL solution?
9
15
108
@MarkCallaghanDB
Mark Callaghan
8 months
MVCC GC problems ... Postgres has some * InnoDB and MyRocks have some, just elsewhere * I try to be fair when I document (or whine about) problems
1
21
100
@MarkCallaghanDB
Mark Callaghan
2 years
Someone is fixing MySQL replication at scale by replacing lossless semisync with Raft. I briefly worked on semisync and I am definitely not a dist sys expert.
2
16
93
@MarkCallaghanDB
Mark Callaghan
1 year
Still working on the names ... TradSQL - traditional (Oracle, MySQL, PG, etc) DistSQL - distributed SQL (Yugabyte, CockroachDB, TiDB) NewSQL - Aurora, AlloyDB, Neon ShardSQL - Vitess, CitusDB
10
15
92
@MarkCallaghanDB
Mark Callaghan
2 years
Folly comes to RocksDB - faster mutex, faster hash map, coroutines for async IO.
5
4
88
@MarkCallaghanDB
Mark Callaghan
1 year
Oracle has been a great owner of MySQL -- invested a lot, regular & stable releases, innovation continues: * parallel replication apply, query, index create * synchronous replication * InnoDB compression * scaling InnoDB on many-core * Heatwave ...
4
8
93
@MarkCallaghanDB
Mark Callaghan
3 years
The ribbon filter in RocksDB uses more CPU to save on memory vs a bloom filter.
2
24
89
@MarkCallaghanDB
Mark Callaghan
2 years
@brainiaq2000 Thanks for making Twitter great for people like me
3
2
88
@MarkCallaghanDB
Mark Callaghan
1 year
TreeLine - interesting paper, although I disagree with the claim that the primary reason for RocksDB (LSM) is write efficiency. The primary reason was space efficiency, while write efficiency was a secondary reason. #VLDB2023
2
10
84
@MarkCallaghanDB
Mark Callaghan
1 year
Yet another great LeanStore paper
1
16
83
@MarkCallaghanDB
Mark Callaghan
2 years
4 of the top 5 (Oracle, MySQL, MSFT, MongoDB) have peaked for 12+ months (no growth, or slight decline). Only Postgres continues to grow. No shame in not being able to grow forever, but fun to see Postgres continue to adapt, innovate and thrive.
@DBEngines
DB-Engines
2 years
0
9
26
2
17
80
@MarkCallaghanDB
Mark Callaghan
4 years
Interesting paper on databases & fast SSD from CIDR2020. I learned a few things and like how they presented results at a high level. Recipe for fast DBMS IO is: array of fast SSD, SW RAID, XFS, O_DIRECT, fdatasync and io_uring.
1
20
82
@MarkCallaghanDB
Mark Callaghan
1 month
Postgres is bad for business when your business is finding perf regressions. Postgres 17beta3 looks great on a small server * no regressions * one read-only test is ~2X faster * many write-heavy tests are ~5% to ~10% faster
2
18
80
@MarkCallaghanDB
Mark Callaghan
1 month
This paper is worth reading and I look forward to more research in this space. We need better b-trees to navigate more of the read, write & space-amp tradeoffs explained in the Rum Conjecture paper. Thank you Xiangpeng Hao and @badrishc
@badrishc
Badrish Chandramouli
1 month
At #VLDB2024 , check our Bf-Tree, our high-perf B-Tree design optimized for small key-values. It uses a mini-page abstraction to cache reads/writes and a variable-length buffer pool to maintain them. See and attend session C3 at 3:30pm today to learn more!
2
13
107
1
9
79
@MarkCallaghanDB
Mark Callaghan
4 years
There is a paper from the @RocksDB team in #fast21
2
16
74
@MarkCallaghanDB
Mark Callaghan
2 years
@cstross @elonmusk Excellent, now I must add seagull to my short list: * duck - calm above water, furiously creating drama below water * alligator -- big mouth to share ideas, short arms that can't reach keyboard to implement them
2
8
71
@MarkCallaghanDB
Mark Callaghan
4 years
I co-presented a tutorial at SIGMOD. My part was a description of MVCC GC using Postgres, InnoDB and RocksDB as examples. By chance there is a proper paper in SIGMOD on MVCC GC and it is worth reading. "Long-lived Transactions Made Less Harmful"
1
17
71
@MarkCallaghanDB
Mark Callaghan
1 year
@ovaistariq Let's talk about the real outrage. They don't explain how DynamoDB uses InnoDB. cc: @jim_dowling
3
7
72
@MarkCallaghanDB
Mark Callaghan
2 years
More advice for @elonmusk ... * disabling fsync will make databases run faster * get rid of backups, rarely needed, big waste of $$$
8
9
71
@MarkCallaghanDB
Mark Callaghan
4 years
Postgres is boring! No regressions from 11.10 to 13.1 for in-memory & low-concurrency sysbench on a small server
3
15
70
@MarkCallaghanDB
Mark Callaghan
11 months
Old Postgres and old MySQL had similar performance on sysbench. But modern Postgres is usually faster than modern MySQL because Postgres has avoided CPU perf regressions over time.
2
12
71
@MarkCallaghanDB
Mark Callaghan
5 months
Comparing MariaDB and MySQL with a CPU-bound Insert Benchmark on a new small server. The song remains the same ... MySQL has big regressions over time + MariaDB does not = Modern MariaDB is faster than modern MySQL
2
13
65
@MarkCallaghanDB
Mark Callaghan
3 years
@UMNComputerSci @gregkh I look forward to the post-mortem. Today a lot of time is being spent reviewing all of the previous commits from the UM research group.
1
1
62
@MarkCallaghanDB
Mark Callaghan
1 year
More Postgres tuning for the insert benchmark on a medium server with the database cached by Postgres. Reducing autovacuum scale factors to 0.05 helps a lot. Will now do IO-bound tests on this server.
0
12
65
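For context on the autovacuum post above: per the Postgres documentation, autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of tuples), with defaults of 50 and 0.2. The sketch below only works through that arithmetic. The function name and the 100M-row table size are assumptions for illustration, not values from the benchmark.

```python
def vacuum_trigger(reltuples: int, scale_factor: float, base_threshold: int = 50) -> int:
    """Dead-tuple count at which autovacuum would vacuum a table.

    Mirrors the documented formula:
    threshold = base_threshold + scale_factor * number_of_tuples
    """
    return round(base_threshold + scale_factor * reltuples)


ROWS = 100_000_000  # hypothetical table size, chosen only for illustration

default_trigger = vacuum_trigger(ROWS, 0.2)   # Postgres default scale factor
tuned_trigger = vacuum_trigger(ROWS, 0.05)    # scale factor mentioned in the post

print(f"default (0.2): vacuum after {default_trigger:,} dead tuples")
print(f"tuned (0.05):  vacuum after {tuned_trigger:,} dead tuples")
```

With the lower scale factor, vacuum starts after roughly a quarter as many dead tuples, so less garbage accumulates between runs on a large table.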
@MarkCallaghanDB
Mark Callaghan
1 year
Apparently @Yugabyte is telling the truth when they claim Postgres compatible. I was able to run the Postgres version of the Insert Benchmark without changes.
5
6
65
@MarkCallaghanDB
Mark Callaghan
2 years
The big win for FB from RocksDB & MyRocks was less space amp (used half the space vs compressed InnoDB). Better write efficiency was nice, but not the big deal. Many papers get this wrong. Citations:
3
8
61
@MarkCallaghanDB
Mark Callaghan
3 months
A simple test to understand the CPU overhead from cloud block storage.
2
7
64
@MarkCallaghanDB
Mark Callaghan
6 months
A great article, consider subscribing to @lwnnet Much useful info, including "changeset contributions by employer"
2
13
63
@MarkCallaghanDB
Mark Callaghan
6 months
My Twitter experience has been mixed lately but there are still a few bright spots: 1) engaging with the database community 2) following computer systems perf experts There is much I don't know so it is great to learn from others here.
1
2
63
@MarkCallaghanDB
Mark Callaghan
1 year
@isamlambert Maybe this is the circle of life that all big companies go through. One result is that talent leaves for startups.
3
0
62
@MarkCallaghanDB
Mark Callaghan
27 days
Trying out Hetzner: * 48 cores, 128G RAM, ~4T of storage * includes all the HW counters (PMC) for perf * a similar server from AWS and GCP costs ~10X more at list price or ~5X more from GCP if I commit to 3 years of usage.
3
6
62
@MarkCallaghanDB
Mark Callaghan
5 years
Met with @mipsytipsy . Happy to learn about growth of @honeycombio . Years ago I pitched her on benefit of staying at $bigTech. Clearly I know more about databases than business.
1
2
61
@MarkCallaghanDB
Mark Callaghan
2 years
@sriramk @elonmusk How much equity are these late-nighters getting? Because that is also part of the startup experience.
3
2
58
@MarkCallaghanDB
Mark Callaghan
1 year
After much testing I might agree with OtterTune -- PG implementation of MVCC is a big problem. I like PG and hope this gets fixed. Perhaps I am doing it wrong, but this isn't an issue for MyRocks or InnoDB. Search the post for "fairness"
2
10
59
@MarkCallaghanDB
Mark Callaghan
1 year
I learn about new features by working near clever kernel people. Normally I just pitch io_uring, but perhaps sched_ext is the new kernel thing for me to pitch. Which DBMS will use it first?
0
1
59
@MarkCallaghanDB
Mark Callaghan
5 years
Pebble, an LSM for @CockroachDB , is interesting. Compared to RocksDB: 1) commit pipeline is simpler 2) IO throttling for flush and compaction is different 3) writes are not stalled when flush/compaction gets behind
4
18
58
@MarkCallaghanDB
Mark Callaghan
3 years
I am working on RocksDB (part-time contract at Meta). My current focus is universal (tiered) compaction and searching for CPU regressions. I see up to 20% more CPU/query from 6.0 to 6.26 for simple workloads. Finding CPU regressions after the fact is time consuming. If only ...
2
2
57
@MarkCallaghanDB
Mark Callaghan
3 years
Compatible with MySQL or PostgreSQL is becoming a big deal. This is great for users but there will be confusion about the meaning of "compatible".
9
13
57
@MarkCallaghanDB
Mark Callaghan
2 years
New blog posts from the RocksDB team for async IO and crash recovery testing:
2
6
57
@MarkCallaghanDB
Mark Callaghan
6 months
Interesting paper from Nguyen and Leis on improving storage for LOBs. An LSM with key-value separation, like RocksDB's Integrated BlobDB, is likely to be the most performant solution today in a production-ready DBMS.
1
10
55
@MarkCallaghanDB
Mark Callaghan
2 years
An interesting post on the use of MySQL and MyRocks at Quora. The author, Vamsi Ponnekanti, also created the online schema change (OSC) tool while at FB and long ago we were classmates at UW-Madison.
0
6
56
@MarkCallaghanDB
Mark Callaghan
7 months
Long ago Mike wrote a great paper on things an OS does that makes it hard to write a DBMS. If DBOS succeeds then I hope for a paper that explains how DBMS features make it hard to write an OS. * *
0
9
55
@MarkCallaghanDB
Mark Callaghan
2 years
Realized last night that I need to learn more about b-epsilon trees. Today I learned someone from my technical community is publishing a book that includes a chapter on it. So I ordered a copy.
2
10
55
@MarkCallaghanDB
Mark Callaghan
1 year
I published a summary of the insert benchmark vs a big server for MyRocks, InnoDB and Postgres. Two highlights from the summary: * worst-case write and query response time is much better for MyRocks than for InnoDB or Postgres. This wasn't expected. /1
3
11
55
@MarkCallaghanDB
Mark Callaghan
1 year
Coursera is great. Hope this covers the most important syntax: create table (...) engine=rocksdb
2
11
52
@MarkCallaghanDB
Mark Callaghan
5 years
I am happy to see companies like @RocksetCloud leverage @RocksDB so they can focus on adding value higher in the stack. Just like FB was able to leverage LevelDB to start the RocksDB project -
2
12
52
@MarkCallaghanDB
Mark Callaghan
5 months
I struggle to find this paper once per decade. High Volume Trans. Proc. ... by Whitney, Shasha et al from HPTS 7 in 1997 Whitney went on to much success with kx and kdb. Shasha continued with a remarkable research career.
1
6
51
@MarkCallaghanDB
Mark Callaghan
2 years
@mituzas @micsolana I am having a hard time understanding people who have a hard time understanding the impact of a struggling company having a jerk as CEO who doesn't understand systems yet is happy to pontificate about them & fire employees who correct him. Not sure this saves his investment.
3
0
50
@MarkCallaghanDB
Mark Callaghan
2 years
Results from an in-memory sysbench benchmark to show the benefit of huge pages for Postgres and InnoDB. It helped Postgres a lot more (1.32X vs 1.06X). Perhaps I will explain why in future work.
2
10
49
@MarkCallaghanDB
Mark Callaghan
2 years
Tiered storage comes to RocksDB thanks to @cooldoger
0
11
47
@MarkCallaghanDB
Mark Callaghan
5 years
A review of FoundationDB Record Layer. Who wants to write the SQL layer?
3
12
49
@MarkCallaghanDB
Mark Callaghan
2 years
Trie, skiplist, ART? What is best for the memtable? This paper does 3 things to make C* faster: * reduces Java GC impact * makes keys byte comparable * uses a sharded trie (multi reader, single writer) ForestDB also used a trie, and was write-optimized but never reached GA.
@pvldb
PVLDB
2 years
Vol:15 No:12 → Trie memtables in Cassandra
1
4
28
1
10
49
@MarkCallaghanDB
Mark Callaghan
3 years
Low-concurrency insert benchmark: * Postgres is boring (no regressions) * MySQL has CPU regressions from 5.6 to 8.0 * MySQL 8.0.20 was an exciting release Results: * MySQL - * Postgres -
2
7
48
@MarkCallaghanDB
Mark Callaghan
2 years
Interesting papers from @BU_DiSClab for VLDB: * LSM Trees Under Memory Pressure * BoDS: Benchmark on Data Sortedness They also have a paper in progress on sortedness, OSM. Papers:
1
6
48
@MarkCallaghanDB
Mark Callaghan
2 years
@matthewokeefe1 @FranckPachot So many teams at Google wasted much time building workarounds on top of BigTable to compensate for the lack of ACID and support their user-facing workloads. Spanner made things much better for them. Too bad those stories aren't told in public.
4
4
48
@MarkCallaghanDB
Mark Callaghan
2 years
Read a great paper. Dremel: Adaptive Configuration Tuning of RocksDB KV Store Things I liked: * used some knowledge of LSM (cost models) * allowed for uncertainty to explore tuning search space * reduced search space via "fused features"
0
6
48
@MarkCallaghanDB
Mark Callaghan
6 months
Modern MariaDB is 13% to 22% faster than modern MySQL on cached & low-concurrency sysbench. CPU regressions matter.
0
8
45
@MarkCallaghanDB
Mark Callaghan
2 years
Fun to see new R&D on in-memory sort -- 2 for merge sort, 1 for quick sort: * * * I hope to revisit work I did on sort long ago, but the bar has been raised over the past 20 years.
1
4
47
@MarkCallaghanDB
Mark Callaghan
4 years
MySQL 8.0.20 looks interesting: * full support for hash join so that "... MySQL no longer use BNL as a join strategy." * more work on CATS locking for InnoDB * binlog compression * disable PK checks on replication apply
0
18
45
@MarkCallaghanDB
Mark Callaghan
4 months
Modern MariaDB is (almost always) 10% to 30% faster than modern MySQL using sysbench, a cached database and (new) small server because MySQL suffers from too many performance regressions over time.
1
9
46
@MarkCallaghanDB
Mark Callaghan
3 months
Not all SSDs can process TRIM as fast as you want so that deleting a large amount of data can stall read IO requests for many seconds. We need trimbench to document how devices behave during large deletes.
3
7
46
@MarkCallaghanDB
Mark Callaghan
2 years
Writes fast on primary needs replays fast on replica. Great progress in Postgres 15 on this although the post wasn’t clear on the implementation to get concurrent disk reads.
2
10
43
@MarkCallaghanDB
Mark Callaghan
2 years
Can someone save Twitter before the jerk ruins it? I use it to engage with systems and database communities and enjoy discussions with experts I would otherwise never encounter. No surprise, the site has been more error prone over the past week.
5
2
45
@MarkCallaghanDB
Mark Callaghan
3 months
On sysbench with a cached database MyRocks uses more CPU per operation than InnoDB, thus InnoDB gets more QPS. Conference papers should focus more on CPU read-amp with an LSM, as that is a bigger issue than IO read-amp.
2
13
44
@MarkCallaghanDB
Mark Callaghan
8 months
My summary of an interesting article. The problem - if you are paying by the IO, then doing a lot of IO via EBS is expensive The solution - figure out how to use local attached storage.
4
10
42
@MarkCallaghanDB
Mark Callaghan
4 years
Congrats to my @MongoDBEng peers for getting this published -- using TLA+ for model-based trace checking
@pvldb
PVLDB
4 years
Vol:13 No:9 → eXtreme Modelling in Practice
0
4
14
0
12
43
@MarkCallaghanDB
Mark Callaghan
1 year
Yet another great paper from the Leanstore people. Page writeback on fast storage isn't easy, especially for a DBMS designed when storage was slower "Write-Aware Timestamp Tracking: Effective and Efficient Page Replacement for Modern Hardware" #vldb2023
2
8
43
@MarkCallaghanDB
Mark Callaghan
3 years
I look forward to reading this but UDB (MySQL + RocksDB) is the data store and TAO is the (very clever) cache. "RAMP-TAO: Layering Atomic Transactions on Facebook's Online TAO Data Store"
2
5
43
@MarkCallaghanDB
Mark Callaghan
5 years
On tuning filesystem readahead for a DBMS
1
13
41
@MarkCallaghanDB
Mark Callaghan
3 years
A truthy summary of tiered compaction implementations
2
8
43
@MarkCallaghanDB
Mark Callaghan
5 months
A paper on MySQL + Raft
0
10
42
@MarkCallaghanDB
Mark Callaghan
1 year
When I read conference papers on LSM I often wish the paper didn't have an LSM overview. Reading the Tigger paper on using eBPF to build a DBMS proxy and the overview is excellent -- I needed that background info. #vldb2023
2
6
41
@MarkCallaghanDB
Mark Callaghan
3 years
I am sharing notes on RocksDB internals as I read the source code. This one is about code that determines whether write stalls or slowdowns are needed.
0
0
41
@MarkCallaghanDB
Mark Callaghan
2 years
Let me be pedantic: 1) Joins are expensive 2) A query that uses a non-covering secondary index does an index nested loops join 3) Let's ban such queries! FB implemented OSC (Online Schema Change for MySQL) to make a few critical, large, busy indexes covering for frequent queries.
3
4
42
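The covering vs non-covering distinction in the post above can be seen in a query plan. This is a minimal sketch using SQLite, chosen only because it ships with Python; the post is about MySQL, whose planner and EXPLAIN output differ. The table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
    [(i % 10, i, str(i)) for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


QUERY = "SELECT b FROM t WHERE a = 3"

# Index on (a) only: the index finds matching rows, but column b must be
# fetched from the base table for each match (the join-like lookup).
conn.execute("CREATE INDEX t_a ON t (a)")
non_covering_plan = plan(QUERY)

# Index on (a, b): every column the query needs is in the index,
# so the base table is never visited.
conn.execute("DROP INDEX t_a")
conn.execute("CREATE INDEX t_a_b ON t (a, b)")
covering_plan = plan(QUERY)

print(non_covering_plan)
print(covering_plan)
```

The second plan reports a covering index, which is the effect the post describes achieving for a few busy indexes via schema change.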
@MarkCallaghanDB
Mark Callaghan
2 years
Much detail, nothing but good news from Postgres: * CPU overhead doesn't change much from v11 to v15 * A few things are much faster in v15 (full table scan, update the same row) Context is: small server, low concurrency, in-memory
2
11
42
@MarkCallaghanDB
Mark Callaghan
2 years
Postgres 12, 13, 14 and 15 vs the Insert Benchmark - not many performance regressions, Postgres remains boring (for me).
7
5
41
@MarkCallaghanDB
Mark Callaghan
2 months
I am starting to document regressions and sources of CPU overhead in MySQL and InnoDB. First up, why does binlog_log_row use ~3X more CPU in 8.0 vs 5.6?
2
5
41
@MarkCallaghanDB
Mark Callaghan
4 years
Old me: be wary of perf results from non-experts New me: be wary of DBMS that requires too many experts
1
13
41
@MarkCallaghanDB
Mark Callaghan
4 years
I enjoyed reading "Optimizing Databases by Learning Hidden Parameters of Solid State Drives" and this blog post has a few comments and questions. I hope there is a sequel. @pateljm @uwKPark @bpkrothGeek
2
11
40
@MarkCallaghanDB
Mark Callaghan
2 years
How I do performance tests for RocksDB, part 1
1
9
40
@MarkCallaghanDB
Mark Callaghan
1 year
Nice paper from Leanstore on MVCC GC, because writers don't block readers, but writers can make readers slow down a lot via old versions. It wasn't clear to me how the paper deals with transactions that make some changes then rollback.
1
7
40
@MarkCallaghanDB
Mark Callaghan
4 years
For Postgres and MySQL with in-memory, low-concurrency sysbench on a small server: * Old MySQL (5.6) is faster than old Postgres (11.10) * New MySQL (8.0.21) is slower than new Postgres (13.1) * New CPU overhead is the problem.
0
11
40
@MarkCallaghanDB
Mark Callaghan
3 years
RocksDB internals: the write rate limiter
0
0
40