Mark E. Dawson, Jr.

@medawsonjr

Followers
1,942
Following
200
Media
22
Statuses
1,312

CEO of JabPerf Corp, Contributing Author to "Performance Analysis and Tuning on Modern CPUs" (available on Amazon), Blogger, and Former Amateur Boxer

Chicago, IL
Joined August 2015
@medawsonjr
Mark E. Dawson, Jr.
3 months
"C++ design patterns for low-latency applications including high-frequency trading"
5
77
461
@medawsonjr
Mark E. Dawson, Jr.
2 years
Why do people believe this is a slam dunk? Dynamic dispatch is 5x slower, has a 0.70 IPC and a 12.81% branch miss rate compared to the switch case w/an IPC close to the max on my i9-10940X and a 0.02% branch miss rate. I'm truly baffled.
@unclebobmartin
Uncle Bob Martin
2 years
Some folks thought I should add a second derivative and deploy them randomly. That actually made a difference. Now the dispatch is a bit more than 4ns longer.
7
2
35
7
12
154
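For readers who have not seen the two code shapes being compared, here is a minimal sketch. It is not the code either party measured: the shape types, field names, and function names are all hypothetical, and it makes no claim about timings on any particular CPU.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Variant A: dynamic dispatch. Each call goes through the vtable, so the
// call target depends on the runtime type of every element.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

struct Rect final : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

double total_area_virtual(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();  // indirect call per element
    return sum;
}

// Variant B: a flat struct plus a switch on a type tag. The data sits
// contiguously in the vector and there is no indirect call.
enum class Kind { Square, Rect };

struct FlatShape {
    Kind kind;
    double w;
    double h;  // unused for squares
};

double total_area_switch(const std::vector<FlatShape>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) {
        switch (s.kind) {
            case Kind::Square: sum += s.w * s.w; break;
            case Kind::Rect:   sum += s.w * s.h; break;
        }
    }
    return sum;
}
```

Both functions compute the same total. To compare them fairly, run each under a microbenchmark framework and look at IPC and branch-miss counters rather than wall-clock time alone.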
@medawsonjr
Mark E. Dawson, Jr.
11 months
Succinct, edifying article on the topic of Tagged Pointers and their various uses in the real world:
0
29
141
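As a rough illustration of the idea (not taken from the linked article), a tagged pointer can keep a small integer in the low bits that alignment guarantees are zero. `TaggedPtr` and `Node` below are hypothetical names, and the sketch assumes a mainstream platform where a pointer round-trips through `std::uintptr_t` unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs a small tag into the low bits of a pointer.
// Those bits are always zero when T is aligned to at least 2^TagBits bytes.
template <typename T, unsigned TagBits = 3>
class TaggedPtr {
    static_assert(alignof(T) >= (1u << TagBits),
                  "T's alignment must leave TagBits low bits free");
    static constexpr std::uintptr_t kTagMask =
        (std::uintptr_t{1} << TagBits) - 1;

    std::uintptr_t bits_ = 0;

public:
    TaggedPtr(T* p, unsigned tag)
        : bits_(reinterpret_cast<std::uintptr_t>(p) | tag) {
        // The pointer must not already use the tag bits, and the tag must fit.
        assert((reinterpret_cast<std::uintptr_t>(p) & kTagMask) == 0);
        assert(tag <= kTagMask);
    }

    // Mask the tag off before dereferencing.
    T* ptr() const { return reinterpret_cast<T*>(bits_ & ~kTagMask); }
    unsigned tag() const { return static_cast<unsigned>(bits_ & kTagMask); }
};

// Example payload type; alignas(8) frees the three low bits.
struct alignas(8) Node {
    int value;
};
```

With a `Node n{42}`, `TaggedPtr<Node> tp(&n, 5)` stores both the address and the tag in one machine word; `tp.ptr()` returns `&n` and `tp.tag()` returns 5.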
@medawsonjr
Mark E. Dawson, Jr.
10 months
Between clients & writing my sections for the upcoming 2nd edition of "Performance Analysis & Tuning on Modern CPUs", I've had little time to post new articles. Here's one to close out 2023:
1
23
139
@medawsonjr
Mark E. Dawson, Jr.
6 months
Matt Godbolt describes how hardware is deployed in the low latency HFT space:
2
18
115
@medawsonjr
Mark E. Dawson, Jr.
2 years
Fruitful (and civil) discussion between @cmuratori and @unclebobmartin (the author of the book "Clean Code") regarding the former's recent "Clean Code, Horrible Performance" video:
5
23
113
@medawsonjr
Mark E. Dawson, Jr.
2 years
If you're a Perf Engineer who works in the nano-to-microsecond timescale, then there's no getting around needing to know Assembly. Here's a two-part series on it from a Security guy (see a theme forming here?)
1
29
134
@medawsonjr
Mark E. Dawson, Jr.
5 months
Today, May 31st, marks 11 years since I began spookin' HFT firms away from using the first core of *any* CPU in a server, not just that of the first CPU (i.e., core 0), for latency-sensitive threads😊
5
11
102
@medawsonjr
Mark E. Dawson, Jr.
1 year
"Overeager techies wanna tinker with magic knobs & secret tunables hidden behind names with leading underscores. They want the *tricks* of the trade w/o first understanding the trade itself. But that’s not Performance Engineering. . ."
4
19
89
@medawsonjr
Mark E. Dawson, Jr.
10 months
Four tools made notable impact on my performance consulting business: FTrace, Coz, eBPF, and perf c2c. My crystal ball tells me that this new tool feature will provide the fifth!🔥
@mjpt777
Martin Thompson
10 months
This could be a game changer.
3
43
213
3
15
85
@medawsonjr
Mark E. Dawson, Jr.
1 year
Intel cuts no. of dies in half, crams in more cores, cranks up DDR5 & UPI bandwidth, and dramatically increases shared LLC size (2.84x) all while taking up less space in Emerald Rapids when compared to Sapphire Rapids🤔
7
8
82
@medawsonjr
Mark E. Dawson, Jr.
6 months
Colleagues think I'm an expert w/the perf tool. HA! I can't tell you how often I've stumbled upon new perf functionality that I had no idea existed (for YEARS in some cases). For example, check out this one:
5
17
92
@medawsonjr
Mark E. Dawson, Jr.
2 years
QUICK TIP: I've noticed that many of my colleagues reach for "strace" for tracking syscalls in code. Do yourself a favor & use "perf trace" instead. All the same bells & whistles but FAR less overhead. And now back to your regularly scheduled programming. . .
6
7
74
@medawsonjr
Mark E. Dawson, Jr.
2 years
PERFORMANCE TIP: If you publish microbenchmarks for public consumption, you'll do yourself & everyone else a big favor by submitting via established microbenchmark frameworks (e.g., Google Benchmark, JMH, etc.). These help avoid common pitfalls👍🏽
5
4
67
@medawsonjr
Mark E. Dawson, Jr.
2 years
@wil_da_beast630 It will be Roland Fryer once he emerges from the hatchet job perpetrated by Harvard et al. The documentary about the whole debacle (which Glenn Loury is involved with) may help expedite the process.
2
4
61
@medawsonjr
Mark E. Dawson, Jr.
6 months
Dmytro Vyazelenko from Aeron recently gave a presentation on designing for low latency. He made it a point to specify that "in preparation for this presentation, no LLM was used":
2
20
66
@medawsonjr
Mark E. Dawson, Jr.
1 year
As the greatest Performance Engineering Rap Group of all time once said, "Cache Rules Everything Around Me (C.R.E.A.M.)"
@johnnysswlab
johnnysswlab.com
1 year
Faster hash maps, binary trees etc. through data layout modification: We investigate how to make faster hash maps, trees, linked lists and vector of pointers by changing their data layout.
3
61
263
2
10
59
@medawsonjr
Mark E. Dawson, Jr.
2 years
@lemire Was this prophesied in the Book of Revelation?
10
1
49
@medawsonjr
Mark E. Dawson, Jr.
2 years
Cloudflare is back at it again! Yet another quality writeup on a real TCP latency issue, and the new kernel patch they've created to address it:
1
10
55
@medawsonjr
Mark E. Dawson, Jr.
6 months
Happy Performance Engineer's Day, all (yes, this is a thing)!🥳
2
10
54
@medawsonjr
Mark E. Dawson, Jr.
5 months
The C/C++ Hashmap Showdowns continue:
2
10
52
@medawsonjr
Mark E. Dawson, Jr.
1 year
On my JabPerf blog, I've written brief explanations about DRAM internals only as an intro into larger topics regarding crafting low latency software. But this Cloudflare article does a proper Deep Dive on DRAM organization:
1
8
52
@medawsonjr
Mark E. Dawson, Jr.
1 year
Opening Keynote from ICPE 2023 regarding the performance engineering work that goes into modern video games at Ubisoft:
1
15
51
@medawsonjr
Mark E. Dawson, Jr.
4 months
This article cogently supports my firm belief that mastery of any one tool does *not* an expert Perf Engineer make. Fantastic breakdown of pitfalls & rules-of-thumb for perf analysis👍🏽 Metadata: Always Measure One Level Deeper
1
12
46
@medawsonjr
Mark E. Dawson, Jr.
1 year
Can't recall the last time I debugged a Linux kernel networking issue since most of my IT life has revolved around kernel bypass stacks & libibverbs. Still, can't help sharing this article from the always stellar Cloudflare Blog:
3
10
40
@medawsonjr
Mark E. Dawson, Jr.
8 months
I don't typically post Job Openings here. But several of my PerfEng brethren have asked about breaking into the HFT industry, where pay is phenomenal & the security is much better than elsewhere in the IT industry. Check it out:
4
7
37
@medawsonjr
Mark E. Dawson, Jr.
2 years
We often discuss TLB Miss Penalty of big, randomly-accessed working sets yet seldom do I see mention of the increased Cache Pollution (L1d thru L3 on Intel/L2 thru L3 on AMD) via HW Page Walkers traversing the Page Table in such cases. Adds insult to injury.
2
5
36
@medawsonjr
Mark E. Dawson, Jr.
11 months
@PeterVeentjer AMD uProf measures memory bandwidth utilization for AMD CPUs, while Intel PCM, intel-cmt-cat, and pmu-tools all do the same for Intel CPUs (I use the latter to track per-socket memory bandwidth into Grafana via the intel_rdt Telegraf plugin).
2
6
33
@medawsonjr
Mark E. Dawson, Jr.
2 years
Good paper on dealing w/systems perf variability in benchmarking, why never to assume normally distributed results, how to test for normality, and how to use nonparametric tests (and why I'm not crazy for always rebooting btwn tests)😊
1
5
33
@medawsonjr
Mark E. Dawson, Jr.
1 year
Firstly, Perf Engineers who want to get a better grasp of statistical concepts in digestible posts should follow this guy. Secondly, I find that few ppl talk about #6 - i.e., the fact that not only small sample sizes can trip you up. Large ones can, too.
@selcukorkmaz
Selçuk Korkmaz
1 year
Common Statistics Mistakes in Published Papers - A Cautionary Tale 📊📝 1/ Publishing a paper is a monumental task, but it’s essential to get the statistics right. Let's dive into some frequent statistical missteps researchers make and how to avoid them! 2/ Misunderstanding
7
166
604
2
4
33
@medawsonjr
Mark E. Dawson, Jr.
11 months
One performance specialist's experience with software optimization, distilled down to his four (4) classes of tuning strategies:
0
11
33
@medawsonjr
Mark E. Dawson, Jr.
1 year
RCU and Hazard Pointers have been voted into C++26! Congratulations on all your efforts & those who worked diligently with you on this, @paulmckrcu !
2
5
31
@medawsonjr
Mark E. Dawson, Jr.
2 years
@unclebobmartin Another way of putting it is that, even in a toy microbenchmark, dynamic dispatch increases runtime by 5x when compared to switch.
2
1
27
@medawsonjr
Mark E. Dawson, Jr.
6 months
Case Study in how *not* to run a performance analysis:
3
3
27
@medawsonjr
Mark E. Dawson, Jr.
6 months
DB & IO EXPERTS: On Linux w/NVME storage, which multi-queue I/O Scheduler do you prefer for optimal performance? Consensus from a quick online search seems to lean toward "none" but I trust my carefully curated list of X Follows much more than Google😉
5
9
27
@medawsonjr
Mark E. Dawson, Jr.
8 months
Shout out to our very own @fleming_matt for founding his new company Nyrkiö for the purpose of bringing a supported Change Point Detection solution to the masses:
3
3
26
@medawsonjr
Mark E. Dawson, Jr.
2 years
@halvarflake Apple M1 has a microarchitectural component called "Data Memory-Dependent Prefetcher" that optimizes pointer following workloads.
4
0
26
@medawsonjr
Mark E. Dawson, Jr.
2 years
0
5
25
@medawsonjr
Mark E. Dawson, Jr.
4 months
JAVA PERF PEEPS: There's a new book out from Oracle Press entitled "JVM Performance Engineering" by Monica Beckwith. I'm interested in your thoughts on its contents. Feel free to comment here or via DM, wherever you're more comfortable.
2
5
23
@medawsonjr
Mark E. Dawson, Jr.
1 year
Email from P99Conf '23: "Since we expect a large spike of logins, be sure to be among the 1st who login to the keynote sessions. Be ready w/your wireshark, gdb, docker & bpftrace." I *told* you Wireshark's an important perf tool😊
0
7
24
@medawsonjr
Mark E. Dawson, Jr.
2 years
This comical refusal by Uncle Bob Martin to stop & reconsider whether he should reevaluate his hard dynamic dispatch stance illustrates a common pitfall of those hailed as experts - "Expert Ego" sets in. Here are tips to avoid it:
1
5
22
@medawsonjr
Mark E. Dawson, Jr.
9 months
Thorough article on the decision to enable frame pointers for Fedora, as well as a nice breakdown on the pros & cons of various stack unwinding methods, some of which are tried-and-true & others which are on the horizon:
1
9
22
@medawsonjr
Mark E. Dawson, Jr.
11 months
EVERY man over 30 should get a full battery of blood tests at each annual physical. Even if you're not playing a sport. Make sure it includes Lipid, Total/Free Testosterone, DHEA, ApoB, HbA1c, Fasted Insulin/Glucose, LH/FSH/TSH/T4/T3, CRP, etc. Trust me👍🏽
3
1
23
@medawsonjr
Mark E. Dawson, Jr.
6 months
@i_bogosavljevic No, you're not hallucinating. It's part of the Intel Resource Director family of features called "Cache Pseudo Locking". The Linux kernel supports it, as well. I've never used it personally because the HFT game is won or lost in the L1d anyway🤷🏾‍♂️
0
2
24
@medawsonjr
Mark E. Dawson, Jr.
1 year
One of the better, if not best, IT Performance Conferences based on how practical and immediately useful all the talks are. BONUS: You don't even have to book a Business Class seat to attend (it's virtual) 😉
1
2
22
@medawsonjr
Mark E. Dawson, Jr.
1 year
“An amateur can be satisfied with knowing a fact; a professional must know the reason why. An amateur practices until he can do a thing right, a professional until he can’t do it wrong.”
2
1
22
@medawsonjr
Mark E. Dawson, Jr.
3 months
I've noticed the fact that splitting work btwn threads not only has diminishing returns but also the potential for *regressive* behavior is tough for clients to wrap their heads around. This is why we explain USL in both @dendibakh 's perf book editions👍🏽
1
2
21
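The regressive behavior falls out of the Universal Scalability Law itself, C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α models contention and β models coherency cost. A minimal sketch follows; the function name is hypothetical and the coefficient values in the usage note are purely illustrative, not measurements from any client system.

```cpp
#include <cassert>

// Universal Scalability Law: relative capacity with n workers.
//   alpha: contention (serialized fraction of the work)
//   beta:  coherency (pairwise crosstalk between workers)
// With beta > 0 the curve peaks and then declines as n grows.
double usl_speedup(double n, double alpha, double beta) {
    assert(n >= 1.0);
    return n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0));
}
```

With the illustrative values α = 0.05 and β = 0.001, the predicted speedup rises from 8 to 32 threads but is lower at 64 threads than at 32, which is exactly the "more threads made it slower" result that surprises people.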
@medawsonjr
Mark E. Dawson, Jr.
2 years
In yet another example of the symbiotic relationship between Perf & Sec Engineering, here's a new paper reverse engineering Intel's L1/L2 TLB, providing operation details you won't find in the Intel Arch SW Dev Manuals:
0
8
22
@medawsonjr
Mark E. Dawson, Jr.
5 months
@lemire This mirrors Myths #3 & #5 from my article: Sampling profilers can mislead, and mastering any one tool (e.g., perf or VTune or uProf) won't magically confer perf analysis expertise. Somehow that ruffled feathers on X:
2
5
22
@medawsonjr
Mark E. Dawson, Jr.
5 months
PERF PEEPS: I value your expertise & camaraderie. I'd like it to last as long as possible. So plz do me a favor: Each weekday morning drop & do 30 pushups/30 squats. At some point each weekday do 30mins on the treadmill/bike at a moderate pace. That's all🙏
2
1
21
@medawsonjr
Mark E. Dawson, Jr.
6 months
@GergelyOrosz I won't lie - I used to *love* non-competes. In the HFT industry, you'd get paid between 75 - 100% of your salary for the entire duration, which typically lasts btwn 6 months to 1 year. I took some of my most memorable vacations during mine.
2
0
20
@medawsonjr
Mark E. Dawson, Jr.
1 year
Continuous Profilers (the 4th Pillar of Observability) will become even more important as general purpose CPUs gain more ground from special-purpose co-processors in the Cloud AI/ML space:
3
8
20
@medawsonjr
Mark E. Dawson, Jr.
9 months
@forked_franz You guys may also benefit from running roofline analysis on your code, a topic about which both @i_bogosavljevic and @dendibakh have blogged. Here's @i_bogosavljevic 's article on how to use it on your code:
0
10
20
@medawsonjr
Mark E. Dawson, Jr.
1 year
There's the geeky side of me that loves each new advancement in x86-64, ARM, RISC-V, and specialized co-processors. But then there's my *other* side that wishes it all stagnates a bit (~5yrs or so) to force devs to embrace Mechanical Sympathy🤷🏾‍♂️
2
2
19
@medawsonjr
Mark E. Dawson, Jr.
1 year
Oh, for the love of All That Is Holy & Highly Performant, would someone w/better Google-Fu than me plz tell me where I might find this tool? A system-wide causal profiler that employs virtual speedups against cores instead of threads🤯
4
6
20
@medawsonjr
Mark E. Dawson, Jr.
1 year
Everyone dances in the streets any time an x86 CPU gets a bigger LLC. I felt like I was dancing alone on an abandoned street w/o any music besides what played in my head when the first one w/48KB L1d arrived. The game is won or lost in the L1d for me🤷🏾‍♂️
1
1
19
@medawsonjr
Mark E. Dawson, Jr.
11 months
Very interesting performance investigation! I won't tell you much more so that I won't give it away. I'll just say that you shouldn't let the title make you jump to conclusions about possible culprits😉
1
7
19
@medawsonjr
Mark E. Dawson, Jr.
2 years
Has anyone else managed to put Kanye West, Taylor Swift, the MTV Video Music Awards, and Linux FTrace all in the same article? You know what? Never mind. I'm just gonna take the credit anyway:
0
8
17
@medawsonjr
Mark E. Dawson, Jr.
2 years
Interesting article compares AWS, GCP, and Azure on the likelihood of VM colocation and the resulting performance impact:
0
6
18
@medawsonjr
Mark E. Dawson, Jr.
1 year
I know I've endorsed P99 Conf several times before. But for any of you still on the fence, lemme tell ya that I just noticed @trav_downs on the agenda (not sure I've ever seen him present before). Register now:
2
2
18
@medawsonjr
Mark E. Dawson, Jr.
6 months
@axelgneiting HFT apps eschew runtime allocs not just cuz of alloc cost (or the more expensive dealloc cost) but the minor pg fault cost (100s of ns). Allocators that prealloc/pin memory upfront (and prevent TLB Shootdowns by not unmapping on dealloc) avoid this.
2
1
18
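A minimal, Linux-only sketch of that approach, assuming a simple bump allocator is acceptable: `PinnedArena` and its methods are hypothetical names, not any real allocator's API. `MAP_POPULATE` pre-faults the pages at startup, `mlock` asks the kernel to keep them resident, and `reset()` recycles the region without unmapping it.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical bump arena: every page is faulted in and pinned up front,
// so the hot path never takes a minor page fault and never unmaps memory.
class PinnedArena {
    unsigned char* base_ = nullptr;
    std::size_t size_ = 0;
    std::size_t used_ = 0;
    bool locked_ = false;

public:
    explicit PinnedArena(std::size_t bytes) {
        // MAP_POPULATE (Linux-specific) pre-faults the whole mapping now.
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
        if (p == MAP_FAILED) return;
        base_ = static_cast<unsigned char*>(p);
        size_ = bytes;
        // mlock can fail under a low RLIMIT_MEMLOCK; treated as non-fatal here.
        locked_ = (mlock(base_, size_) == 0);
    }

    ~PinnedArena() {
        if (base_ != nullptr) munmap(base_, size_);  // only at shutdown
    }

    PinnedArena(const PinnedArena&) = delete;
    PinnedArena& operator=(const PinnedArena&) = delete;

    // Bump allocation; align must be a power of two. Returns nullptr when full.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (base_ == nullptr || start > size_ || bytes > size_ - start) {
            return nullptr;
        }
        used_ = start + bytes;
        return base_ + start;
    }

    // "Deallocation" just rewinds the cursor; the mapping stays in place,
    // so no munmap occurs and no TLB shootdown is triggered.
    void reset() { used_ = 0; }

    bool valid() const { return base_ != nullptr; }
    bool locked() const { return locked_; }
};
```

A production allocator would add free lists, huge pages, and NUMA placement. The point here is only that all page-table work happens at construction time rather than on the latency-critical path.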
@medawsonjr
Mark E. Dawson, Jr.
6 months
Interesting proposal for improving the accuracy of performance profiles in the era of OoO, deeply pipelined CPUs
@bjorntopel
Björn Töpel
6 months
This seems to be a really, REALLY good CPU performance monitoring thesis:
2
17
54
1
6
18
@medawsonjr
Mark E. Dawson, Jr.
1 year
Since RAM latency's only growing w/DRAM chip size & each DDR release, I see CPU DCA support as a no-brainer. Yet AMD *still* doesn't offer it, while Intel has DDIO & ARM has DynamIQ Cache Stashing. Are there any perf writeups available for the latter?
3
3
18
@medawsonjr
Mark E. Dawson, Jr.
1 year
BOOM! Intel Linear Address Masking (LAM) support has finally been merged!💪🏾
@kayseesee
Kostya Serebryany
1 year
Intel LAM support is merged to Linux upstream! (LAM makes HWASAN possible, similar to Arm's top-byte-ignore)
2
30
106
1
1
17
@medawsonjr
Mark E. Dawson, Jr.
2 years
@majek04 Yep, this is a nice writeup on DRAM Refresh (I touch on it briefly at ). Interestingly, LPDDR uses per-bank refresh to allow concurrent R/W access during refresh. But that's typically only used in mobile platforms🤷🏾‍♂️
0
4
17
@medawsonjr
Mark E. Dawson, Jr.
1 year
@disruptnhandlr And therein lies the problem w/its applicability for HFT: code that we want optimized *are* the cold, rarely executed functions😪 As a result, we resort to clever tricks to keep that code warm artificially. But we're a corner case industry anyway🤷🏾‍♂️
3
1
15
@medawsonjr
Mark E. Dawson, Jr.
6 months
This is a cool distillation of important high-level performance concepts along with ample references to source material👍🏽
@sadisticsystems
Tyler Neely
4 years
Wrote this little theoretical performance guide. It contains a bunch of ideas that have been super helpful while building most of the systems I've worked on as an engineer. It's fairly language and system agnostic, focusing on a number of timeless ideas.
10
116
453
0
6
17
@medawsonjr
Mark E. Dawson, Jr.
3 years
Now *this* is a great example of proper benchmarking. Using runtime info alone requires relying too much on intuition, and performance on today's complex systems often defies our intuition.
@johnnysswlab
johnnysswlab.com
3 years
We try to answer why #quicksort is faster than #heapsort and then we dig deeper into these algorithms' #hardwareefficiency . The goal: making them faster.
1
6
28
0
5
16
@medawsonjr
Mark E. Dawson, Jr.
3 years
In the game of Low Latency (and, dare I say, in general computing), if your monitoring stack (e.g., TICK, ELK, etc.) tracks %CPU and %MEM but ignores IPC and Memory BW, you're hamstringing yourself.
2
1
16
@medawsonjr
Mark E. Dawson, Jr.
3 years
After several yrs working w/it now, I can confidently say that if you're not profiling your multithreaded code with @emeryberger 's Coz profiler then you're costing yourself extra work:
2
1
16