Marat Dukhan Profile
Marat Dukhan

@MaratDukhan

Followers: 1,496
Following: 252
Media: 17
Statuses: 336

Building AGI @OpenAI. Previously TLM for XNNPACK @GoogleAI, lead for QNNPACK @FacebookAI & author of NNPACK. Opinions are my own

SF bay area, CA
Joined October 2013
Pinned Tweet
@MaratDukhan
Marat Dukhan
3 years
Harder, Better, Faster, Stronger quantized inference available in TensorFlow Lite and XNNPACK right now!
@TensorFlow
TensorFlow
3 years
⚡ XNNPACK-Accelerated Quantized Inference is coming to TensorFlow Lite and brings capabilities for efficient cross-platform deployment. Learn more ↓
0
39
112
0
1
12
@MaratDukhan
Marat Dukhan
10 months
OpenAI is nothing without its people
11
25
457
@MaratDukhan
Marat Dukhan
5 years
TensorFlow.js on CPU now faster with an XNNPACK-powered #WebAssembly backend! Whopping 4-20x over previous TF.js CPU backend in pure JavaScript🚀, near-universal coverage, and Node.js compatibility - available right now in the Alpha release
@TensorFlow
TensorFlow
5 years
We’re excited to release the Alpha of our WebAssembly backend for TensorFlow.js! 🎉 WASM has wider device support and better numerical stability while getting competitive with WebGL for smaller models. Share your feedback here →
Tweet media one
7
193
505
0
12
86
@MaratDukhan
Marat Dukhan
10 months
How come I have 1K+ followers? I'm a simple AGI engineer, I don't have anything interesting to share!
25
0
79
@MaratDukhan
Marat Dukhan
10 months
♥️
@gdb
Greg Brockman
10 months
❤️
348
642
14K
1
3
61
@MaratDukhan
Marat Dukhan
10 months
- Have you decided where you want to work at? - I decided WHO I want to work with. Where is not important.
3
2
50
@MaratDukhan
Marat Dukhan
4 years
Today at @CVPR we are presenting the joint work () with colleagues from @DeepMind and @GoogleAI that proves sparsity in neural network weights practically useful for accelerating ConvNet inference on general-purpose processors.
Tweet media one
1
19
48
@MaratDukhan
Marat Dukhan
4 months
@ezyang It is not Python, it is floating-point:
>>> 9007199254740993. == 9007199254740992.
True
1
0
47
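For the curious: IEEE 754 doubles carry a 53-bit significand, so 2^53 + 1 is the first integer a double cannot represent; it rounds to 2^53. A quick Python check (mine, not from the tweet):

    >>> 2**53
    9007199254740992
    >>> 9007199254740993.0       # 2**53 + 1 rounds to 2**53 when parsed
    9007199254740992.0
    >>> 9007199254740993.0 == 9007199254740992.0
    True
    >>> 9007199254740993 == 9007199254740992   # Python ints are exact
    False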
@MaratDukhan
Marat Dukhan
10 months
The OpenAI board should resign IMO
2
0
35
@MaratDukhan
Marat Dukhan
8 years
With NNPACK it is viable to run ImageNet-scale neural networks inside a Web browser. (2/2)
Tweet media one
0
7
26
@MaratDukhan
Marat Dukhan
5 years
Third-generation NNPACK-family library is here! This time the focus is on accelerating FP32 models in NHWC layout, and it supports both mobile and Web platforms.
0
3
24
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward In hardware, it is cheaper to scale compute units than memory bandwidth, thus the more powerful an accelerator is, the more it is typically imbalanced towards compute performance vs memory performance.
1
3
22
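A rough illustration of that imbalance (the peak figures below are assumed, A100-class numbers, not from the thread): dividing peak compute by memory bandwidth gives the arithmetic intensity a kernel needs to stay compute-bound.

    # Assumed, A100-class peak figures; only their ratio matters.
    peak_flops = 312e12      # FLOP/s (FP16 tensor cores)
    mem_bandwidth = 2.0e12   # bytes/s (HBM2e)

    machine_balance = peak_flops / mem_bandwidth
    print(f"~{machine_balance:.0f} FLOPs per byte moved to stay compute-bound")
    # Kernels below this arithmetic intensity are memory-bound.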
@MaratDukhan
Marat Dukhan
9 months
The FP16 inference blog post I wrote while still at Google is finally up: Enjoy the 2X speedup for floating-point models across most ARM devices, including recent mobile phones, Apple Silicon Macs, ARM64 Windows laptops, and Raspberry Pi 5!
2
2
22
@MaratDukhan
Marat Dukhan
8 months
It is already 2024, and I just received the statue for the award I won in 2023 for the work I did in 2022. Big companies' gears take a long time to turn.
Tweet media one
2
0
19
@MaratDukhan
Marat Dukhan
5 years
My most awaited #TFDevSummit announcement today: XNNPACK is coming in TF Lite 2.3! But if you're adventurous, you can already try it today following instructions at
Tweet media one
0
9
20
@MaratDukhan
Marat Dukhan
4 years
Today is a historic day for Web computing: the #WebAssembly #SIMD proposal progressed to stage 4 of the standardization process, i.e. browsers can now enable it by default.
0
4
17
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward Grouped convolution reduces the number of parameters and FLOPs by the number of groups, but doesn't affect the number of input & output elements. Thus, overall, grouped convolution is more memory-intensive than normal convolution, especially when there are many groups (e.g. DW convolution).
2
3
16
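The arithmetic behind this, as a quick sketch (the layer shape is hypothetical, chosen only for illustration): grouping divides weights and MACs by the group count, while the input and output activations stay the same size.

    def conv_costs(h, w, cin, cout, k, groups):
        # Weights and MACs shrink with the group count; activations don't.
        params = k * k * (cin // groups) * cout
        macs = h * w * params
        activations = h * w * (cin + cout)   # input + output elements
        return params, macs, activations

    for groups in (1, 8, 64):   # groups == channels (64) is depthwise
        p, m, a = conv_costs(56, 56, 64, 64, 3, groups)
        print(f"groups={groups:2d}: params={p:6d} MACs={m:9d} activations={a}")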
@MaratDukhan
Marat Dukhan
5 years
New goodies from colleagues at @GoogleAI ! All #MediaPipe effects - edge detection, face detection, hair segmentation, and hand tracking - can now run inside a Web browser, powered by XNNPACK and #WebAssembly
@googledevs
Google for Developers
5 years
Thumbs up, MediaPipe! 👍👍👍 Check out the blog to see MediaPipe graphs running live in the web browser, enabled by WebAssembly and accelerated by XNNPack ML Inference Library. Read here →
7
125
457
0
3
14
@MaratDukhan
Marat Dukhan
4 years
With sparsity you get up to 3X faster inference and also smaller models (e.g. 16.5 MB -> 6 MB for our sparse MobileNet v1 model). An experimental implementation is available right now in @TensorFlow Lite through the XNNPACK backend:
1
2
13
@MaratDukhan
Marat Dukhan
4 years
Sparse inference for computer vision is how you do more (mind-blowing AI processing) with less (milliseconds, joules, and bytes), and it is not just production-ready, it is in production! Try background effects in Google Meet or MediaPipe Hand Tracking to see for yourself!
@GoogleAI
Google AI
4 years
Today we announce the release of new sparsity features in the #XNNPACK acceleration library that is powering #TensorFlowLite ! Sparse inference improves efficiency without degrading quality in applications like Google Meet's background effects.
8
127
590
0
2
12
@MaratDukhan
Marat Dukhan
4 months
We will have AGI before California high-speed rail
@CaHSRA
CA High-Speed Rail 🚄💨
4 months
The Fresno River Viaduct in Madera County is one of the first completed high-speed rail structures. At nearly 1,600 feet long, high-speed trains will travel over the riverbed and will run parallel with the BNSF Railroad. #BuildHSR
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2K
146
2K
1
0
13
@MaratDukhan
Marat Dukhan
4 months
I’m riding the high tide of Chelsea’s excitement
@csvoss
Chelsea Sierra Voss
4 months
I am excited about rapha’s excitement
0
0
66
0
0
11
@MaratDukhan
Marat Dukhan
4 years
It's official: TensorFlow Lite gets XNNPACK booster 🚀
@TensorFlow
TensorFlow
4 years
Accelerating TF Lite 🏁⚡️ See how integration of the XNNPACK library with TensorFlow Lite improves neural network inference performance by 2.3X on average. Learn more ↓
1
75
228
1
2
11
@MaratDukhan
Marat Dukhan
3 years
#WebAssembly SIMD is now enabled by default in both Chrome and Firefox. Time to start hacking on it!
@RReverser
🇺🇦 Ingvar Stepanyan
3 years
Very surprised that release notes don't mention WebAssembly SIMD. It's shipped by default on desktop in Firefox 89!
Tweet media one
2
0
26
0
1
10
@MaratDukhan
Marat Dukhan
3 years
@hardmaru I wonder how much better it could be if we removed all "2+2=5" examples from the training set.
0
0
10
@MaratDukhan
Marat Dukhan
4 months
@OfficialLoganK Expectation: "available to try today" Reality: "sign up for the waitlist"
2
0
10
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward On mobile, where DW convolutions are particularly popular, the situation is different: with XNNPACK, DW 3x3 convolutions deliver ~1/3 GFLOPS of a normal convolution, and the ~9X reduction in the total number of FLOPS in DW sep. conv. more than compensates for the efficiency loss.
1
1
10
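The ~9X figure checks out on paper (the shapes below are assumed for illustration): a depthwise-separable block swaps one KxK convolution for a KxK depthwise plus a 1x1 pointwise convolution.

    h, w, cin, cout, k = 56, 56, 64, 64, 3        # assumed layer shape

    normal = h * w * k * k * cin * cout           # full K x K convolution
    dw_sep = h * w * (k * k * cin + cin * cout)   # depthwise + 1x1 pointwise

    print(f"FLOP reduction: {normal / dw_sep:.1f}x")
    # ~7.9x here; the ratio approaches k*k = 9x as cout grows.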
@MaratDukhan
Marat Dukhan
6 months
And nuclear fusion and room-temperature superconductivity!
@OfficialLoganK
Logan Kilpatrick
6 months
In the next 10 years we are going to have: - super human AI - full self driving everywhere in the world - humans on mars - internet everywhere on earth - supersonic commercial jets - cures for major diseases Keep building, there’s still more to do 🚀
147
155
2K
1
0
9
@MaratDukhan
Marat Dukhan
9 months
@giffmana Publication quotas
1
1
8
@MaratDukhan
Marat Dukhan
5 years
Atwood's law is coming after ML. Expect any ML model that can run on mobile to eventually run on the Web.
@TensorFlow
TensorFlow
5 years
Real time face + hand tracking in the browser with MediaPipe + TensorFlow.js What will you create using these new models? #MadeWithTFJS Read more →
12
449
2K
0
2
9
@MaratDukhan
Marat Dukhan
2 months
@abacaj Amusing that many people believe competition will drive LLM cost to zero, some people believe it will drive GPU cost to zero, and no people believe competition will drive electricity cost to zero.
2
0
8
@MaratDukhan
Marat Dukhan
10 months
@paulg A passage from your old essay came a lot to my mind in the recent days: “You could parachute [Sam] into an island full of cannibals and come back in 5 years and he'd be the king.”
1
2
7
@MaratDukhan
Marat Dukhan
7 months
What the world would look like if we only had more GPUs!
@sama
Sam Altman
7 months
690
2K
21K
0
0
8
@MaratDukhan
Marat Dukhan
4 years
TIL #Firefox nightly supports enough of #WebAssembly SIMD to run XNNPACK!
0
0
8
@MaratDukhan
Marat Dukhan
8 years
Suddenly, without warning and without a declaration of war, NNPACK is rolling out Android and ARM NEON versions
0
0
7
@MaratDukhan
Marat Dukhan
10 months
@hardmaru @paulg was on point!
Tweet media one
0
0
6
@MaratDukhan
Marat Dukhan
9 months
Tweet media one
0
0
7
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward High compute performance is great for normal convolutions, particularly with many input/output channels, but barely helps the commonly used 3x3 depthwise convolutions, which are bandwidth-limited even on mobile CPUs.
1
2
7
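Concretely (same assumed shapes as the grouped-convolution sketch above, FP32): a 3x3 depthwise convolution performs only ~9 MACs per output element yet still touches every input and output activation, so its arithmetic intensity is single-digit FLOPs per byte.

    h, w, c, k = 56, 56, 64, 3   # assumed 3x3 depthwise layer, FP32

    flops = 2 * h * w * c * k * k                # multiply + add per tap
    traffic = 4 * (2 * h * w * c + k * k * c)    # bytes: in + out + weights

    print(f"arithmetic intensity: {flops / traffic:.1f} FLOP/byte")
    # ~2 FLOP/byte: bandwidth-bound on nearly any modern CPU or GPU.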
@MaratDukhan
Marat Dukhan
5 years
Sparsification, or pruning of weights, in convolutional neural networks has a long history as a compression technique, and good support in deep learning frameworks, e.g. Model Optimization Toolkit in TensorFlow. [1/4]
1
2
6
@MaratDukhan
Marat Dukhan
6 years
Today Facebook publicly released QNNPACK, an open-source library for low-precision neural network computations on mobile. Caffe2+QNNPACK = 2x speedup over TFLite + support for grouped conv (CondenseNet, ShuffleNet, ResNeXt).
0
0
6
@MaratDukhan
Marat Dukhan
2 years
Until recently, XNNPACK was not memory-efficient for throughput-optimized inference, e.g. server-side or burst-mode. The weights cache fixes that by enabling internally repacked weights to be shared across multiple threads.
@TensorFlow
TensorFlow
2 years
🎉For TFLite Users, XNNPack now includes a weights cache that is more memory-efficient for batch inference. Learn more here →
Tweet media one
1
25
59
1
0
6
@MaratDukhan
Marat Dukhan
8 years
#WebAssembly benchmarks on neural network inference via #NNPACK and #mInference
Tweet media one
0
0
5
@MaratDukhan
Marat Dukhan
3 years
@cHHillee We should be optimizing for minimal memory bandwidth. Unfortunately, it's very hard to do. First, utilized memory bandwidth depends on both the hardware and the software stack. Second, there aren't simple models for it; you need to understand the whole stack.
1
0
5
@MaratDukhan
Marat Dukhan
2 years
@sedielem BatchNorm is free at inference time (it fuses into the preceding convolution/matmul); LayerNorm/GroupNorm have a cost (and a big one on highly parallel hardware).
0
0
5
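Why BatchNorm is free: at inference its statistics are fixed, so it is just a per-channel affine transform that folds into the preceding convolution's weights and bias. A minimal NumPy sketch (the names are mine, not from any framework):

    import numpy as np

    def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
        # y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta
        # folds into a convolution with weights W' and bias b' below.
        s = gamma / np.sqrt(var + eps)          # per-output-channel scale
        W_folded = W * s[:, None, None, None]   # W: (cout, cin, kh, kw)
        b_folded = (b - mean) * s + beta
        return W_folded, b_folded

LayerNorm/GroupNorm, by contrast, normalize over statistics computed from the activations at run time, so there is nothing to fold.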
@MaratDukhan
Marat Dukhan
4 years
#PyTorch Mobile now integrates XNNPACK too! Congrats @AshkanAliabadi on the launch
@PyTorch
PyTorch
4 years
v1.6: native mixed-precision support from NVIDIA (~2x perf improvement), distributed perf improvements, new profiling tool for memory consumption, Microsoft commits to developing and maintaining Windows PyTorch. Release Notes: Blog:
5
238
789
0
1
5
@MaratDukhan
Marat Dukhan
4 years
One more happy XNNPACK + #WebAssembly user: "XNNPACK was great. After some adjustment to the build system we were able to build our code with XNNPACK for WASM. It took us overall 10 days. The result? Almost the same speed as the C++ version. "
0
1
4
@MaratDukhan
Marat Dukhan
8 years
Favorite plot from my Ph.D. proposal. That's why you should learn assembly!
Tweet media one
0
3
4
@MaratDukhan
Marat Dukhan
6 months
Deja vu
@Hooli_CEO
Gavin Belson
8 years
I don’t want to live in a world where someone else makes the world a better place better than we do #SiliconValley #hooli
0
15
59
0
0
4
@MaratDukhan
Marat Dukhan
6 months
The first #GTC where Jensen didn’t need to chant “The more GPUs you buy, the more money you save”
2
0
4
@MaratDukhan
Marat Dukhan
1 year
The gap between native and in-browser compute is shrinking. Relaxed SIMD augments WebAssembly SIMD with FMA, 8-bit Integer Dot Product, efficient minimum/maximum, efficient float-to-int conversions, efficient lane-wise selects, and more.
@intenttoship
Intent To Ship
2 years
Blink: Intent to Ship: WebAssembly Relaxed SIMD
0
1
9
0
0
4
@MaratDukhan
Marat Dukhan
10 months
@btaylor Welcome aboard!
0
0
0
@MaratDukhan
Marat Dukhan
10 months
@bchesky Thank you so much! I’ve heard you worked hard on this
0
0
1
@MaratDukhan
Marat Dukhan
10 months
❤️
@ikeadrift
benjamin
10 months
now everyone can experience what i, @joannejang , @giertler , @gopatrik , and many other folks worked so hard on. it's very cool. go check it out
49
27
504
0
0
4
@MaratDukhan
Marat Dukhan
3 years
@slava_pestov #define true is_computer_on()
0
0
4
@MaratDukhan
Marat Dukhan
8 years
0
0
2
@MaratDukhan
Marat Dukhan
7 months
@giffmana There’s no business case for running ZLUDA on AMD/Intel = AMD/Intel don’t want to lock themselves into forever being the second-best CUDA implementation. This has little to do with ZLUDA being a viable tech.
2
0
3
@MaratDukhan
Marat Dukhan
7 months
State of mind: someone who claims to be Will Smith, but could as well be a fake, posted a video that claims to be a fake, but could as well be real!
@WillSmith2real
Will Smith
7 months
This is getting out of hand! - Will Smith
1K
9K
82K
0
0
3
@MaratDukhan
Marat Dukhan
5 years
@rzidane360 E.g. see files which start with expminus (i.e. exp of negative number) here:
1
0
3
@MaratDukhan
Marat Dukhan
9 months
@soumithchintala @giffmana I heard this complaint from a then-FAIR-now-ex-FAIR researcher couple years ago
0
0
3
@MaratDukhan
Marat Dukhan
10 months
@eshear Thank you for your hard work on saving OpenAI! 🫡
0
0
3
@MaratDukhan
Marat Dukhan
5 years
Our recent work [] with colleagues from @DeepMind and @GoogleAI demonstrates that with the right layout and optimizations, sparse inference delivers practical, non-negligible speedups of 1.3X-2.4X on a range of MobileNet and EfficientNet models. [3/4]
1
0
3
@MaratDukhan
Marat Dukhan
4 years
Visit our virtual poster at @CVPR to learn the details:
0
1
3
@MaratDukhan
Marat Dukhan
8 months
@ashvardanian disagrees (although they don't have Sapphire Rapids, but Alder Lake P should be similar): 3 cycles without masking for any width, 5 cycles with
0
0
2
@MaratDukhan
Marat Dukhan
2 months
@abacaj Overall, technology improvements + Jevons paradox are driving it. OpenAI was lowering token pricing even before there was any competition.
0
0
3
@MaratDukhan
Marat Dukhan
8 years
Overview of neuromorphic computing architectures by Catherine Schuman of @ORNL #SC16
Tweet media one
0
2
3
@MaratDukhan
Marat Dukhan
3 years
@Nextremer_nb_o Operators supported by XNNPACK get offloaded to XNNPACK, others fall back to built-in TFLite kernels, which use Ruy for matrix multiplication.
0
0
3
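For context, a rough sketch of what this partitioning looks like from the TFLite Python API (the model path is a placeholder; recent TFLite builds apply the XNNPACK delegate by default):

    import numpy as np
    import tensorflow as tf

    # Recent TFLite builds enable the XNNPACK delegate by default; ops it
    # doesn't support fall back to the built-in (Ruy-backed) kernels.
    interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    out = interpreter.get_output_details()[0]
    print(interpreter.get_tensor(out["index"]).shape)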
@MaratDukhan
Marat Dukhan
8 months
@zeuxcg You need MI250 or MI300X. Consumer Radeon cards have very poor matmul performance.
0
0
3
@MaratDukhan
Marat Dukhan
3 years
@zeuxcg No big deal. x86 will become irrelevant in the consumer space in the next decade.
0
0
2
@MaratDukhan
Marat Dukhan
8 years
Moore's law is dead, and it's time to learn to live without it! #SC16
Tweet media one
1
4
2
@MaratDukhan
Marat Dukhan
2 months
@Dan_Jeffries1 @tsarnick Orcas are undertrained
0
0
2
@MaratDukhan
Marat Dukhan
8 months
@ashvardanian These days even microcontrollers have (limited) SIMD. RV64 is the only somewhat popular CPU architecture that doesn’t.
0
0
2
@MaratDukhan
Marat Dukhan
7 months
@gazorp5 @giffmana To have more control. HIP doesn't have to 100% follow CUDA semantics, but ZLUDA does.
1
0
2
@MaratDukhan
Marat Dukhan
6 months
1
0
2
@MaratDukhan
Marat Dukhan
8 years
@ajlavin Just shipped in NNPACK. The accuracy got about 10000x (yep, ten thousand times) better on ImageNet models!
2
0
2
@MaratDukhan
Marat Dukhan
7 years
@shiffman @p5xjs #NNPACK PNaCl/Asm.js/WebAssembly backend
Tweet media one
0
2
2
@MaratDukhan
Marat Dukhan
10 months
@nutsiepully Thanks Pulkit! It is better than it appears from the outside. We may join Microsoft, but either way we'd be fine.
1
0
2
@MaratDukhan
Marat Dukhan
8 months
@willdepue Warp reductions are even cooler 😎
1
0
2
@MaratDukhan
Marat Dukhan
5 years
A preview of in-browser machine-learning demos by our group in a great #ChromeDevSummit talk by @fractorious and @RReverser. Powered by XNNPACK, MediaPipe, and #WebAssembly (+SIMD)
Tweet media one
Tweet media two
1
1
2
@MaratDukhan
Marat Dukhan
8 years
Jaeha Kung of @gtcomputing on how to build really efficient CNN hardware
Tweet media one
0
1
2
@MaratDukhan
Marat Dukhan
2 years
@PINTO03091 @mattn_jp @KzhtTkhs We have some new (and experimental) optimization: Would you give it a try?
1
0
2
@MaratDukhan
Marat Dukhan
5 years
@rzidane360 Or see this file for AVX512F version:
0
0
2
@MaratDukhan
Marat Dukhan
10 months
0
0
1
@MaratDukhan
Marat Dukhan
3 years
@InstLatX64 I suspect it can be used to implement histogram calculation
1
0
1
@MaratDukhan
Marat Dukhan
3 years
@Nextremer_nb_o @PINTO03091 Upstream XNNPACK supports RISC-V, the version in TFLite 2.7 most likely doesn't
1
0
2
@MaratDukhan
Marat Dukhan
10 months
@matt_dz I’m working on getting Opcodes up-to-date
0
0
2
@MaratDukhan
Marat Dukhan
2 years
@nikitabier For 50 steps, under 1 second of compute time on A100, which amounts to about $0.0002.
0
0
2
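The arithmetic behind that figure (the hourly rate is my assumption; the tweet gives only the end result):

    a100_per_hour = 0.80   # assumed amortized $/hour for an A100
    seconds = 1.0          # "under 1 second" for 50 diffusion steps

    print(f"~${a100_per_hour / 3600 * seconds:.4f} per generation")  # ~$0.0002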
@MaratDukhan
Marat Dukhan
8 years
Intel documented the new AVX512 instructions for neural networks. See Chapter 6 of
0
2
2
@MaratDukhan
Marat Dukhan
4 years
@utkuevci @GoogleAI @ablavatski @erich_elsen @Tgale96 Right. Structured sparsity is more efficient on microkernel level, but unstructured works well too.
0
0
2
@MaratDukhan
Marat Dukhan
4 months
@LiamFedus GPT2 is cool again!
0
0
2
@MaratDukhan
Marat Dukhan
10 months
0
0
2
@MaratDukhan
Marat Dukhan
4 months
@OfficialLoganK Still waiting for Gemini Ultra access, released > 6 months ago
0
0
1
@MaratDukhan
Marat Dukhan
2 months
@GregoryDiamos @realSharonZhou Per GeekWire “Adept co-founder and CEO David Luan, the former vice president of engineering at OpenAI, will join Amazon. Adept co-founders Augustus Odena, Maxwell Nye, Erich Elsen, and Kelsey Szot will also move to Amazon, along with a few other employees.”
0
0
1
@MaratDukhan
Marat Dukhan
4 years
@zeuxcg I think this was on purpose. They don’t want to jeopardize sales of current-gen Intel Macs.
0
0
1
@MaratDukhan
Marat Dukhan
2 years
@hardmaru What does Dalle think of a bear market?
0
0
1
@MaratDukhan
Marat Dukhan
5 months
@aaron_defazio Does it work for transformers?
1
0
1