Marat Dukhan Profile
Marat Dukhan

@MaratDukhan

Followers: 1,496
Following: 252
Media: 17
Statuses: 336

Building AGI @OpenAI. Previously TLM for XNNPACK @GoogleAI, lead for QNNPACK @FacebookAI & author of NNPACK. Opinions are my own

SF bay area, CA
Joined October 2013
Pinned Tweet
@MaratDukhan
Marat Dukhan
3 years
Harder, Better, Faster, Stronger quantized inference available in TensorFlow Lite and XNNPACK right now!
@TensorFlow
TensorFlow
3 years
⚡ XNNPACK-Accelerated Quantized Inference is coming to TensorFlow Lite and brings capabilities for efficient cross-platform deployment. Learn more ↓
0
39
112
0
1
12
@MaratDukhan
Marat Dukhan
10 months
OpenAI is nothing without its people
11
25
457
@MaratDukhan
Marat Dukhan
5 years
TensorFlow.js on CPU now faster with an XNNPACK-powered #WebAssembly backend! Whopping 4-20x over previous TF.js CPU backend in pure JavaScript🚀, near-universal coverage, and Node.js compatibility - available right now in the Alpha release
@TensorFlow
TensorFlow
5 years
We’re excited to release the Alpha of our WebAssembly backend for TensorFlow.js! 🎉 WASM has wider device support and better numerical stability while getting competitive with WebGL for smaller models. Share your feedback here →
Tweet media one
7
193
505
0
12
86
@MaratDukhan
Marat Dukhan
10 months
How come I have 1K+ followers? I'm a simple AGI engineer, I don't have anything interesting to share!
25
0
79
@MaratDukhan
Marat Dukhan
10 months
♥️
@gdb
Greg Brockman
10 months
❤️
348
642
14K
1
3
61
@MaratDukhan
Marat Dukhan
10 months
- Have you decided where you want to work at? - I decided WHO I want to work with. Where is not important.
3
2
50
@MaratDukhan
Marat Dukhan
4 years
Today at @CVPR we are presenting the joint work () with colleagues from @DeepMind and @GoogleAI that proves sparsity in neural network weights practically useful for accelerating ConvNet inference on general-purpose processors.
Tweet media one
1
19
48
@MaratDukhan
Marat Dukhan
4 months
@ezyang It is not Python, it is floating-point:
>>> 9007199254740993. == 9007199254740992.
True
1
0
47
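For the curious: IEEE 754 doubles carry a 53-bit significand, so 2^53 + 1 is the first integer a double cannot represent; it rounds to 2^53. A quick Python check (mine, not from the tweet):

    >>> 2**53
    9007199254740992
    >>> 9007199254740993.0       # 2**53 + 1 rounds to 2**53 when parsed
    9007199254740992.0
    >>> 9007199254740993.0 == 9007199254740992.0
    True
    >>> 9007199254740993 == 9007199254740992   # Python ints are exact
    False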
@MaratDukhan
Marat Dukhan
10 months
The OpenAI board should resign IMO
2
0
35
@MaratDukhan
Marat Dukhan
8 years
With NNPACK it is viable to run ImageNet-scale neural networks inside a Web browser. (2/2)
Tweet media one
0
7
26
@MaratDukhan
Marat Dukhan
5 years
Third-generation NNPACK-family library is here! This time the focus is on accelerating FP32 models in NHWC layout, and it supports both mobile and Web platforms.
0
3
24
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward In hardware, it is cheaper to scale compute units than memory bandwidth, thus the more powerful an accelerator is, the more it is typically imbalanced towards compute performance vs memory performance.
1
3
22
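A rough illustration of that imbalance (the peak figures below are assumed, A100-class numbers, not from the thread): dividing peak compute by memory bandwidth gives the arithmetic intensity a kernel needs to stay compute-bound.

    # Assumed, A100-class peak figures; only their ratio matters.
    peak_flops = 312e12      # FLOP/s (FP16 tensor cores)
    mem_bandwidth = 2.0e12   # bytes/s (HBM2e)

    machine_balance = peak_flops / mem_bandwidth
    print(f"~{machine_balance:.0f} FLOPs per byte moved to stay compute-bound")
    # Kernels below this arithmetic intensity are memory-bound.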
@MaratDukhan
Marat Dukhan
9 months
The FP16 inference blog post I wrote while still at Google is finally up: Enjoy the 2X speedup for floating-point models across most ARM devices, including recent mobile phones, Apple Silicon Macs, ARM64 Windows laptops, and Raspberry Pi 5!
2
2
22
@MaratDukhan
Marat Dukhan
8 months
It is already 2024, and I just received the statue for the award I won in 2023 for the work I did in 2022. Big companies' gears take a long time to turn.
Tweet media one
2
0
19
@MaratDukhan
Marat Dukhan
5 years
My most awaited #TFDevSummit announcement today: XNNPACK is coming in TF Lite 2.3! But if you're adventurous, you can already try it today following instructions at
Tweet media one
0
9
20
@MaratDukhan
Marat Dukhan
4 years
Today is a historic day for Web computing: the #WebAssembly #SIMD proposal progressed to stage 4 of the standardization process, i.e. browsers can now enable it by default.
0
4
17
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward Grouped convolution reduces the number of parameters and FLOPs by the number of groups, but doesn't affect the number of input & output elements. Thus, overall, grouped convolution is more memory-intensive than normal convolution, especially when there are many groups (e.g. DW convolution).
2
3
16
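The arithmetic behind this, as a quick sketch (the layer shape is hypothetical, chosen only for illustration): grouping divides weights and MACs by the group count, while the input and output activations stay the same size.

    def conv_costs(h, w, cin, cout, k, groups):
        # Weights and MACs shrink with the group count; activations don't.
        params = k * k * (cin // groups) * cout
        macs = h * w * params
        activations = h * w * (cin + cout)   # input + output elements
        return params, macs, activations

    for groups in (1, 8, 64):   # groups == channels (64) is depthwise
        p, m, a = conv_costs(56, 56, 64, 64, 3, groups)
        print(f"groups={groups:2d}: params={p:6d} MACs={m:9d} activations={a}")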
@MaratDukhan
Marat Dukhan
5 years
New goodies from colleagues at @GoogleAI ! All #MediaPipe effects - edge detection, face detection, hair segmentation, and hand tracking - can now run inside a Web browser, powered by XNNPACK and #WebAssembly
@googledevs
Google for Developers
5 years
Thumbs up, MediaPipe! 👍👍👍 Check out the blog to see MediaPipe graphs running live in the web browser, enabled by WebAssembly and accelerated by XNNPack ML Inference Library. Read here →
7
125
457
0
3
14
@MaratDukhan
Marat Dukhan
4 years
With sparsity you get up to 3X faster inference and also smaller models (e.g. 16.5 MB -> 6 MB for our sparse MobileNet v1 model). An experimental implementation is available right now in @TensorFlow Lite through the XNNPACK backend:
1
2
13
@MaratDukhan
Marat Dukhan
4 years
Sparse inference for computer vision is how you do more (mind-blowing AI processing) with less (milliseconds, joules, and bytes), and it is not just production-ready, it is in production! Try background effects in Google Meet or MediaPipe Hand Tracking to see for yourself!
@GoogleAI
Google AI
4 years
Today we announce the release of new sparsity features in the #XNNPACK acceleration library that is powering #TensorFlowLite ! Sparse inference improves efficiency without degrading quality in applications like Google Meet's background effects.
8
127
590
0
2
12
@MaratDukhan
Marat Dukhan
4 months
We will have AGI before California high-speed rail
@CaHSRA
CA High-Speed Rail 🚄💨
4 months
The Fresno River Viaduct in Madera County is one of the first completed high-speed rail structures. At nearly 1,600 feet long, high-speed trains will travel over the riverbed and will run parallel with the BNSF Railroad. #BuildHSR
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2K
146
2K
1
0
13
@MaratDukhan
Marat Dukhan
4 months
I’m riding the high tide of Chelsea’s excitement
@csvoss
Chelsea Sierra Voss
4 months
I am excited about rapha’s excitement
0
0
66
0
0
11
@MaratDukhan
Marat Dukhan
4 years
It's official: TensorFlow Lite gets XNNPACK booster 🚀
@TensorFlow
TensorFlow
4 years
Accelerating TF Lite 🏁⚡️ See how integration of the XNNPACK library with TensorFlow Lite improves neural network inference performance by 2.3X on average. Learn more ↓
1
75
228
1
2
11
@MaratDukhan
Marat Dukhan
3 years
#WebAssembly SIMD is now enabled by default in both Chrome and Firefox. Time to start hacking on it!
@RReverser
🇺🇦 Ingvar Stepanyan
3 years
Very surprised that release notes don't mention WebAssembly SIMD. It's shipped by default on desktop in Firefox 89!
Tweet media one
2
0
26
0
1
10
@MaratDukhan
Marat Dukhan
3 years
@hardmaru I wonder how much better it could be if we removed all "2+2=5" examples from the training set.
0
0
10
@MaratDukhan
Marat Dukhan
4 months
@OfficialLoganK Expectation: "available to try today" Reality: "sign up for the waitlist"
2
0
10
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward On mobile, where DW convolutions are particularly popular, the situation is different: with XNNPACK, DW 3x3 convolutions deliver ~1/3 GFLOPS of a normal convolution, and the ~9X reduction in the total number of FLOPS in DW sep. conv. more than compensates for the efficiency loss.
1
1
10
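The ~9X figure checks out on paper (the shapes below are assumed for illustration): a depthwise-separable block swaps one KxK convolution for a KxK depthwise plus a 1x1 pointwise convolution.

    h, w, cin, cout, k = 56, 56, 64, 64, 3        # assumed layer shape

    normal = h * w * k * k * cin * cout           # full K x K convolution
    dw_sep = h * w * (k * k * cin + cin * cout)   # depthwise + 1x1 pointwise

    print(f"FLOP reduction: {normal / dw_sep:.1f}x")
    # ~7.9x here; the ratio approaches k*k = 9x as cout grows.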
@MaratDukhan
Marat Dukhan
6 months
And nuclear fusion and room-temperature superconductivity!
@OfficialLoganK
Logan Kilpatrick
6 months
In the next 10 years we are going to have: - super human AI - full self driving everywhere in the world - humans on mars - internet everywhere on earth - supersonic commercial jets - cures for major diseases Keep building, there’s still more to do 🚀
147
155
2K
1
0
9
@MaratDukhan
Marat Dukhan
9 months
@giffmana Publication quotas
1
1
8
@MaratDukhan
Marat Dukhan
5 years
Atwood's law is coming after ML. Expect any ML model that can run on mobile to eventually run on the Web.
@TensorFlow
TensorFlow
5 years
Real time face + hand tracking in the browser with MediaPipe + TensorFlow.js What will you create using these new models? #MadeWithTFJS Read more →
12
449
2K
0
2
9
@MaratDukhan
Marat Dukhan
2 months
@abacaj Amusing that many people believe competition will drive LLM cost to zero, some people believe it will drive GPU cost to zero, and no people believe competition will drive electricity cost to zero.
2
0
8
@MaratDukhan
Marat Dukhan
10 months
@paulg A passage from your old essay came a lot to my mind in the recent days: “You could parachute [Sam] into an island full of cannibals and come back in 5 years and he'd be the king.”
1
2
7
@MaratDukhan
Marat Dukhan
7 months
What the world would look like if we only had more GPUs!
@sama
Sam Altman
7 months
690
2K
21K
0
0
8
@MaratDukhan
Marat Dukhan
4 years
TIL #Firefox nightly supports enough of #WebAssembly SIMD to run XNNPACK!
0
0
8
@MaratDukhan
Marat Dukhan
8 years
Suddenly, without warning and without a declaration of war, NNPACK is rolling out Android and ARM NEON versions
0
0
7
@MaratDukhan
Marat Dukhan
10 months
@hardmaru @paulg was on point!
Tweet media one
0
0
6
@MaratDukhan
Marat Dukhan
9 months
Tweet media one
0
0
7
@MaratDukhan
Marat Dukhan
5 years
@soumithchintala @jeremyphoward High compute performance is great for normal convolutions, particularly with many input/output channels, but barely helps the commonly used 3x3 depthwise convolutions, which are bandwidth-limited even on mobile CPUs.
1
2
7
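Concretely (same assumed shapes as the grouped-convolution sketch above, FP32): a 3x3 depthwise convolution performs only ~9 MACs per output element yet still touches every input and output activation, so its arithmetic intensity is single-digit FLOPs per byte.

    h, w, c, k = 56, 56, 64, 3   # assumed 3x3 depthwise layer, FP32

    flops = 2 * h * w * c * k * k                # multiply + add per tap
    traffic = 4 * (2 * h * w * c + k * k * c)    # bytes: in + out + weights

    print(f"arithmetic intensity: {flops / traffic:.1f} FLOP/byte")
    # ~2 FLOP/byte: bandwidth-bound on nearly any modern CPU or GPU.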
@MaratDukhan
Marat Dukhan
5 years
Sparsification, or pruning of weights, in convolutional neural networks has a long history as a compression technique, and good support in deep learning frameworks, e.g. Model Optimization Toolkit in TensorFlow. [1/4]
1
2
6
@MaratDukhan
Marat Dukhan
6 years
Today Facebook publicly released QNNPACK, an open-source library for low-precision neural network computations on mobile. Caffe2+QNNPACK = 2x speedup over TFLite + support for grouped conv (CondenseNet, ShuffleNet, ResNeXt).
0
0
6
@MaratDukhan
Marat Dukhan
2 years
Until recently, XNNPACK was not memory-efficient for throughput-optimized inference, e.g. server-side or burst-mode. The weights cache fixes that by enabling internally repacked weights to be shared across multiple threads.
@TensorFlow
TensorFlow
2 years
🎉For TFLite Users, XNNPack now includes a weights cache that is more memory-efficient for batch inference. Learn more here →
Tweet media one
1
25
59
1
0
6
@MaratDukhan
Marat Dukhan
8 years
#WebAssembly benchmarks on neural network inference via #NNPACK and #mInference
Tweet media one
0
0
5
@MaratDukhan
Marat Dukhan
3 years
@cHHillee We should be optimizing for minimal memory bandwidth. Unfortunately, it's very hard to do. First, utilized memory bandwidth depends on both the hardware and the software stack. Second, there aren't simple models for it; you need to understand the whole stack.
1
0
5
@MaratDukhan
Marat Dukhan
2 years
@sedielem BatchNorm is free at inference time (it fuses into the preceding convolution/matmul); LayerNorm/GroupNorm have a cost (and a big one on highly parallel hardware).
0
0
5
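Why BatchNorm is free: at inference its statistics are fixed, so it is just a per-channel affine transform that folds into the preceding convolution's weights and bias. A minimal NumPy sketch (the names are mine, not from any framework):

    import numpy as np

    def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
        # y = gamma * (conv(x) + b - mean) / sqrt(var + eps) + beta
        # folds into a convolution with weights W' and bias b' below.
        s = gamma / np.sqrt(var + eps)          # per-output-channel scale
        W_folded = W * s[:, None, None, None]   # W: (cout, cin, kh, kw)
        b_folded = (b - mean) * s + beta
        return W_folded, b_folded

LayerNorm/GroupNorm, by contrast, normalize over statistics computed from the activations at run time, so there is nothing to fold.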
@MaratDukhan
Marat Dukhan
4 years
#PyTorch Mobile now integrates XNNPACK too! Congrats @AshkanAliabadi on the launch
@PyTorch
PyTorch
4 years
v1.6: native mixed-precision support from NVIDIA (~2x perf improvement), distributed perf improvements, new profiling tool for memory consumption, Microsoft commits to developing and maintaining Windows PyTorch. Release Notes: Blog:
5
238
789
0
1
5
@MaratDukhan
Marat Dukhan
4 years
One more happy XNNPACK + #WebAssembly user: "XNNPACK was great. After some adjustment to the build system we were able to build our code with XNNPACK for WASM. It took us overall 10 days. The result? Almost the same speed as the C++ version. "
0
1
4
@MaratDukhan
Marat Dukhan
8 years
Favorite plot from my Ph.D. proposal. That's why you should learn assembly!
Tweet media one
0
3
4
@MaratDukhan
Marat Dukhan
6 months
Deja vu
@Hooli_CEO
Gavin Belson
8 years
I don’t want to live in a world where someone else makes the world a better place better than we do #SiliconValley #hooli
0
15
59
0
0
4
@MaratDukhan
Marat Dukhan
6 months
The first #GTC where Jensen didn’t need to chant “The more GPUs you buy, the more money you save”
2
0
4
@MaratDukhan
Marat Dukhan
1 year
The gap between native and in-browser compute is shrinking. Relaxed SIMD augments WebAssembly SIMD with FMA, 8-bit Integer Dot Product, efficient minimum/maximum, efficient float-to-int conversions, efficient lane-wise selects, and more.
@intenttoship
Intent To Ship
2 years
Blink: Intent to Ship: WebAssembly Relaxed SIMD
0
1
9
0
0
4
@MaratDukhan
Marat Dukhan
10 months
@btaylor Welcome aboard!
0
0
0
@MaratDukhan
Marat Dukhan
10 months
@bchesky Thank you so much! I’ve heard you worked hard on this
0
0
1
@MaratDukhan
Marat Dukhan
10 months
❤️
@ikeadrift
benjamin
10 months
now everyone can experience what i, @joannejang , @giertler , @gopatrik , and many other folks worked so hard on. it's very cool. go check it out
49
27
504
0
0
4
@MaratDukhan
Marat Dukhan
3 years
@slava_pestov #define true is_computer_on()
0
0
4
@MaratDukhan
Marat Dukhan
8 years
0
0
2
@MaratDukhan
Marat Dukhan
7 months
@giffmana There’s no business case for running ZLUDA on AMD/Intel = AMD/Intel don’t want to lock themselves into forever being the second-best CUDA implementation. This has little to do with ZLUDA being a viable tech.
2
0
3
@MaratDukhan
Marat Dukhan
7 months
State of mind: someone who claims to be Will Smith, but could as well be a fake, posted a video that claims to be a fake, but could as well be real!
@WillSmith2real
Will Smith
7 months
This is getting out of hand! - Will Smith
1K
9K
82K
0
0
3
@MaratDukhan
Marat Dukhan
5 years
@rzidane360 E.g. see files which start with expminus (i.e. exp of negative number) here:
1
0
3
@MaratDukhan
Marat Dukhan
9 months
@soumithchintala @giffmana I heard this complaint from a then-FAIR-now-ex-FAIR researcher couple years ago
0
0
3
@MaratDukhan
Marat Dukhan
10 months
@eshear Thank you for your hard work on saving OpenAI! 🫡
0
0
3
@MaratDukhan
Marat Dukhan
5 years
Our recent work [] with colleagues from @DeepMind and @GoogleAI demonstrates that with the right layout and optimizations, sparse inference delivers practical, non-negligible speedups of 1.3X-2.4X on a range of MobileNet and EfficientNet models. [3/4]
1
0
3
@MaratDukhan
Marat Dukhan
4 years
Visit our virtual poster at @CVPR to learn the details:
0
1
3
@MaratDukhan
Marat Dukhan
8 months
@ashvardanian disagrees (although they don't have Sapphire Rapids, but Alder Lake P should be similar): 3 cycles without masking for any width, 5 cycles with
0
0
2
@MaratDukhan
Marat Dukhan
2 months
@abacaj Overall, technology improvements + Jevons paradox are driving it. OpenAI was lowering token pricing even before there was any competition.
0
0
3
@MaratDukhan
Marat Dukhan
8 years
Overview of neuromorphic computing architectures by Catherine Schuman of @ORNL #SC16
Tweet media one
0
2
3
@MaratDukhan
Marat Dukhan
3 years
@Nextremer_nb_o Operators supported by XNNPACK get offloaded to XNNPACK, others fall back to built-in TFLite kernels, which use Ruy for matrix multiplication.
0
0
3
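For context, a rough sketch of what this partitioning looks like from the TFLite Python API (the model path is a placeholder; recent TFLite builds apply the XNNPACK delegate by default):

    import numpy as np
    import tensorflow as tf

    # Recent TFLite builds enable the XNNPACK delegate by default; ops it
    # doesn't support fall back to the built-in (Ruy-backed) kernels.
    interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    out = interpreter.get_output_details()[0]
    print(interpreter.get_tensor(out["index"]).shape)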
@MaratDukhan
Marat Dukhan
8 months
@zeuxcg You need MI250 or MI300X. Consumer Radeon cards have very poor matmul performance.
0
0
3
@MaratDukhan
Marat Dukhan
3 years
@zeuxcg No big deal. x86 will become irrelevant in the consumer space in the next decade.
0
0
2
@MaratDukhan
Marat Dukhan
8 years
Moore's law is dead, and it's time to learn to live without it! #SC16
Tweet media one
1
4
2
@MaratDukhan
Marat Dukhan
2 months
@Dan_Jeffries1 @tsarnick Orcas are undertrained
0
0
2
@MaratDukhan
Marat Dukhan
8 months
@ashvardanian These days even microcontrollers have (limited) SIMD. RV64 is the only somewhat popular CPU architecture that doesn’t.
0
0
2
@MaratDukhan
Marat Dukhan
7 months
@gazorp5 @giffmana To have more control. HIP doesn't have to 100% follow CUDA semantics, but ZLUDA does.
1
0
2
@MaratDukhan
Marat Dukhan
6 months
1
0
2
@MaratDukhan
Marat Dukhan
8 years
@ajlavin Just shipped in NNPACK. The accuracy got about 10000x (yep, ten thousand times) better on ImageNet models!
2
0
2
@MaratDukhan
Marat Dukhan
7 years
@shiffman @p5xjs #NNPACK PNaCl/Asm.js/WebAssembly backend
Tweet media one
0
2
2
@MaratDukhan
Marat Dukhan
10 months
@nutsiepully Thanks Pulkit! It is better than it appears from the outside. We may join Microsoft, but either way we'd be fine.
1
0
2
@MaratDukhan
Marat Dukhan
8 months
@willdepue Warp reductions are even cooler 😎
1
0
2
@MaratDukhan
Marat Dukhan
5 years
A preview of in-browser machine-learning demos by our group in a great #ChromeDevSummit talk by @fractorious and @RReverser. Powered by XNNPACK, MediaPipe, and #WebAssembly (+SIMD)
Tweet media one
Tweet media two
1
1
2
@MaratDukhan
Marat Dukhan
8 years
Jaeha Kung of @gtcomputing on how to build really efficient CNN hardware
Tweet media one
0
1
2
@MaratDukhan
Marat Dukhan
2 years
@PINTO03091 @mattn_jp @KzhtTkhs We have some new (and experimental) optimization: Would you give it a try?
1
0
2
@MaratDukhan
Marat Dukhan
5 years
@rzidane360 Or see this file for AVX512F version:
0
0
2
@MaratDukhan
Marat Dukhan
10 months
0
0
1
@MaratDukhan
Marat Dukhan
3 years
@InstLatX64 I suspect it can be used to implement histogram calculation
1
0
1
@MaratDukhan
Marat Dukhan
3 years
@Nextremer_nb_o @PINTO03091 Upstream XNNPACK supports RISC-V, the version in TFLite 2.7 most likely doesn't
1
0
2
@MaratDukhan
Marat Dukhan
10 months
@matt_dz I’m working on getting Opcodes up-to-date
0
0
2
@MaratDukhan
Marat Dukhan
2 years
@nikitabier For 50 steps, under 1 second of compute time on A100, which amounts to about $0.0002.
0
0
2
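The arithmetic behind that figure (the hourly rate is my assumption; the tweet gives only the end result):

    a100_per_hour = 0.80   # assumed amortized $/hour for an A100
    seconds = 1.0          # "under 1 second" for 50 diffusion steps

    print(f"~${a100_per_hour / 3600 * seconds:.4f} per generation")  # ~$0.0002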
@MaratDukhan
Marat Dukhan
8 years
Intel documented the new AVX512 instructions for neural networks. See Chapter 6 of
0
2
2
@MaratDukhan
Marat Dukhan
4 years
@utkuevci @GoogleAI @ablavatski @erich_elsen @Tgale96 Right. Structured sparsity is more efficient on microkernel level, but unstructured works well too.
0
0
2
@MaratDukhan
Marat Dukhan
4 months
@LiamFedus GPT2 is cool again!
0
0
2
@MaratDukhan
Marat Dukhan
10 months
0
0
2
@MaratDukhan
Marat Dukhan
4 months
@OfficialLoganK Still waiting for Gemini Ultra access, released > 6 months ago
0
0
1
@MaratDukhan
Marat Dukhan
2 months
@GregoryDiamos @realSharonZhou Per GeekWire “Adept co-founder and CEO David Luan, the former vice president of engineering at OpenAI, will join Amazon. Adept co-founders Augustus Odena, Maxwell Nye, Erich Elsen, and Kelsey Szot will also move to Amazon, along with a few other employees.”
0
0
1
@MaratDukhan
Marat Dukhan
4 years
@zeuxcg I think this was on purpose. They don’t want to jeopardize sales of current-gen Intel Macs.
0
0
1
@MaratDukhan
Marat Dukhan
2 years
@hardmaru What does Dalle think of a bear market?
0
0
1
@MaratDukhan
Marat Dukhan
5 months
@aaron_defazio Does it work for transformers?
1
0
1