Tianqi Chen Profile
Tianqi Chen

@tqchenml

Followers: 15,933
Following: 992
Media: 49
Statuses: 1,210
Pinned Tweet
@tqchenml
Tianqi Chen
19 days
Excited to share what we have been working on over the past year: MLCEngine, a universal LLM deployment engine that brings the power of server optimizations and local deployment into a single framework. Check out the platform support 👇 and the blog post; more in a 🧵
Tweet media one
4
51
222
@tqchenml
Tianqi Chen
5 years
I will join @mldcmu and @CSDatCMU @SCSatCMU in Fall 2020 as an Assistant Professor. I am super grateful to my advisors, collaborators and @uwcse for a wonderful PhD journey. Looking forward to working with colleagues on more cross-stack research for future intelligent systems.
46
40
679
@tqchenml
Tianqi Chen
6 months
Chat with Mistral 7B Instruct v0.2 running locally on iPhone and iPad. Now available in @AppStore .
24
72
501
@tqchenml
Tianqi Chen
1 year
Running LLM-based chat locally on iPhone 📱 with GPU acceleration. Also brings universal deployment to NV/AMD/M1 GPUs. Love to see it enabling personal assistants for everyone. Try out the demos
@bohanhou1998
Bohan Hou
1 year
Can LLMs run natively on your iPhone📱? Our answer is yes, and we can do more! We are introducing MLC-LLM, an open framework that brings large language models (LLMs) directly onto a broad class of platforms (CUDA, Vulkan, Metal) with GPU acceleration! Demo:
Tweet media one
33
180
716
9
89
323
@tqchenml
Tianqi Chen
1 year
MLC Chat app is now on @AppStore . Check it out! Chat with open language models running on your iPad and iPhone: offline, local, no data collected
Tweet media one
14
54
271
@tqchenml
Tianqi Chen
7 years
Introducing the NNVM compiler, which brings MXNet, PyTorch, Caffe2, and CoreML to bare-metal hardware backends with the #TVM stack
@uwcse
Allen School
7 years
#UWAllen researchers team up with @awscloud to release new NNVM compiler for deep learning frameworks:
Tweet media one
0
18
37
2
138
268
@tqchenml
Tianqi Chen
2 years
A summer endeavor, developing MLC: the first open lecture series on ML compilation. Machine learning compilation is an emerging field for systematic optimization and deployment of AI workloads. Hope to share adventures and fun with the community 🚀
7
59
258
@tqchenml
Tianqi Chen
6 years
Compile your deep learning models directly into WebGL and deploy to browsers without having to write a single line of JavaScript!
5
104
250
@tqchenml
Tianqi Chen
1 year
Today's LLMs require extensive computation and memory, and usually run on servers. What will be the future of consumer devices in the era of AI? While it is hard to predict the future, let us talk about possible opportunities and how we can enable them
Tweet media one
1
63
223
@tqchenml
Tianqi Chen
2 years
Autodiff is the backbone of DL frameworks. In Lecture 4 of the open DLSys course, we derive autodiff from scratch and place it in the context of architecture considerations in frameworks such as @PyTorch and @TensorFlow . Check it out to learn more
2
47
221
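To make the lecture's core idea concrete, here is a minimal reverse-mode autodiff sketch in the spirit of what the course derives. All names here (Value, backward) are hypothetical illustrations, not the course's actual framework:

```python
# Toy reverse-mode autodiff: record a graph of Value nodes with local
# derivatives, then apply the chain rule in reverse topological order.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.parents = parents          # upstream nodes
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Reverse topological order guarantees each node has accumulated all
        # incoming gradients before it propagates to its parents.
        topo, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node.parents:
                    visit(p)
                topo.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(topo):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad

x, y = Value(2.0), Value(3.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```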
@tqchenml
Tianqi Chen
6 years
Highly recommend @marcotcr @sameer_ and @guestrin 's new paper, Anchors, for explaining ML models. An intriguing point I learned: the VQA model will tell you the answer is "dog" just because there is a "what" in the question!
2
84
216
@tqchenml
Tianqi Chen
6 years
We can now use AI to automatically optimize tensor operator kernels and compile AI workloads, enabling deployment to all hardware with state-of-the-art performance. This is an exciting test bed for new ideas in Bayesian optimization, RL, graph NNs, transfer learning, etc.
2
72
197
@tqchenml
Tianqi Chen
2 years
@zicokolter and I are releasing a free online deep learning systems course. We will build a minimal PyTorch-like framework from scratch, and build various end-to-end deep learning models using the framework. Check it out if you are interested in a “full stack” view of AI systems
@zicokolter
Zico Kolter
2 years
Announcement: This Fall @tqchenml and I are releasing a free online version of our Deep Learning Systems course! Short description is you build a deep learning framework from the ground up. Sign up at course website: Video teaser:
13
122
627
2
31
196
@tqchenml
Tianqi Chen
4 years
I spent Monday morning reflecting on the research process and came up with the following picture. While impact is what we go for, the most joyful and sometimes grinding moments come before that. Surfing and enjoying the period of uncertainty is one of the best parts.
Tweet media one
6
19
196
@tqchenml
Tianqi Chen
3 years
The future of machine learning system architecture should change from uni-directional pipelines to circles. Check out our latest blog post to learn about our lessons and vision for the ML software & hardware ecosystem, with @samps @roeschinc
Tweet media one
3
45
197
@tqchenml
Tianqi Chen
11 months
We are facing a hardware shortage for AI, and the key reason is software. Let us bring high-performance, universal deployment of open LLMs. 👉 Running #Llama2 on an AMD 7900 XTX GPU at 80% of the performance of an RTX 4090. Check out the Python package and get superb performance on both CUDA and ROCm.
@bohanhou1998
Bohan Hou
11 months
Making @AMD @amdradeon GPUs competitive for LLM inference! 130 tok/s for Llama 2 7B, 75 tok/s for 13B with ROCm 5.6 + 7900 XTX + 4-bit quantization: 80% of the performance of an Nvidia RTX 4090. See how we do this in detail and try out our Python packages here:
Tweet media one
9
40
186
3
37
197
@tqchenml
Tianqi Chen
13 days
Browsers have the potential to become the next main LLM OS to empower web agents. We are excited to announce the WebLLM engine, a fast, private (fully client-side computation) and convenient (zero environment setup) in-browser LLM inference engine to enable that. WebLLM offers
@charlie_ruan
Charlie Ruan
13 days
Excited to share WebLLM engine: a high-performance in-browser LLM inference engine! WebLLM offers local GPU acceleration via @WebGPU , fully OpenAI-compatible API, and built-in web workers support to separate backend executions. Check out the blog post:
10
94
399
4
31
193
@tqchenml
Tianqi Chen
11 months
Making LLMs more accessible on cheap devices. Running a 3B LLM (RedPajama-3B) at 5 tok/sec and #llama2 7B at 2.5 tok/sec on a $100 Orange Pi single-board computer accelerated by a Mali GPU 👉
@junrushao
Junru Shao
11 months
While LLMs are resource hungry and challenging to run at satisfactory speed on small devices, we show that ML compilation (MLC) techniques make it possible to actually generate tokens at 5 tok/sec on a $100 Orange Pi with a Mali GPU.
Tweet media one
11
48
234
2
23
183
@tqchenml
Tianqi Chen
6 years
I am always curious how to write mobile GPU code to make my phone run AI applications fast. SJTU undergrad Lianmin Zheng just wrote a blog about it
6
49
172
@tqchenml
Tianqi Chen
6 years
Designing a DL accelerator is about more than the hardware itself; it is about hardware, driver, compiler, and model working together. The new open source deep learning accelerator stack VTA provides a first open step in that direction
Tweet media one
3
65
159
@tqchenml
Tianqi Chen
6 years
The desire for "novelty" in writing papers often leads to overly complex methods, while a simple existing solution might work just as well and should have been reported
3
20
142
@tqchenml
Tianqi Chen
7 years
Thanks to Yuwei Hu, the NNVM compiler now deploys Keras models directly to hardware backends
2
50
136
@tqchenml
Tianqi Chen
3 years
Excited to share our upcoming #MLSys paper -- Cortex: A Compiler for Recursive Deep Learning Models. This is part of our recent CMU Catalyst group effort with @JiaZhihao @atalwalkar and other amazing folks
2
26
131
@tqchenml
Tianqi Chen
3 months
If you want to learn more about the latest advances in AI and systems, such as systems for diffusion models, multi-LoRA serving, MoE, efficient quantization, and more, check out this year’s #MLSys2024 program. The conference will happen May 13th
Tweet media one
1
26
130
@tqchenml
Tianqi Chen
3 years
Glad that XGBoost continues to help data scientists, thanks to @hcho3_ml and many other community developers
@XGBoostProject
XGBoost
3 years
@kaggle 's annual state of machine learning survey: XGBoost, as usual, is among the top framework choices. Glad that we can help make ML and data science better, together with a collection of other awesome frameworks
Tweet media one
4
54
283
3
16
125
@tqchenml
Tianqi Chen
1 year
#WebLLM just completed a major overhaul with a TypeScript rewrite and modularized packaging. The @javascript package is now available on @npmjs . It brings accelerated LLM chat to the browser via @WebGPU . Check out the examples and build your own private chatbot
Tweet media one
2
39
126
@tqchenml
Tianqi Chen
6 years
With the emergence of new hardware, what are the challenges for deep learning systems? We try to answer that in our TVM paper, providing a reusable optimization stack for DL systems on CPUs, GPUs, and accelerators. I will give an oral presentation about it at #SysML
1
48
125
@tqchenml
Tianqi Chen
6 years
The video for my SysML talk, TVM: End-to-End Compilation Stack for Deep Learning, is now online at The first SysML was really exciting and I am looking forward to next year's conference!
1
43
121
@tqchenml
Tianqi Chen
6 years
2018 will be the year we see more compiler and deep learning in all parts of the stack
3
24
118
@tqchenml
Tianqi Chen
2 years
Excited to share our #NeurIPS2022 paper, "Tensor Program Optimization with Probabilistic Programs". Talk to @junrushao and others at poster Hall J 702! This is one of the most fun improvements made to @ApacheTVM in the past two years, from @OctoML @mldcmu @SCSatCMU . A thread 🧵
Tweet media one
2
22
119
@tqchenml
Tianqi Chen
6 years
Setting out for #NeurIPS ; many UW SAMPL folks will be there. We will present "Learning to Optimize Tensor Programs" (Tue Dec 4th, Spotlight 3:45PM Room 220 CD and 5PM Room 210 & 230 AB, Poster 104). Love to chat about ML for systems and hardware-software full-stack ML systems
Tweet media one
0
23
100
@tqchenml
Tianqi Chen
4 years
Fun hack of the past few weeks: by compiling machine learning models to #WebAssembly and #WebGPU for the first time, we can run GPU-accelerated deep learning models in browsers and get close to native performance.
@ApacheTVM
Apache TVM
4 years
Compiling Machine Learning to WASM and WebGPU with Apache TVM. Preliminary experiments show that TVM’s WebGPU backend can get close to native GPU performance when deploying models to the web. 🚀🚀
Tweet media one
1
27
84
1
13
102
@tqchenml
Tianqi Chen
5 years
Moving to the new @uwcse AI lab. The space makes me feel that we can grow deeper models here :)
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
4
99
@tqchenml
Tianqi Chen
5 years
Random search can be very strong and is often SOTA, especially when we do not have strong domain knowledge
@hardmaru
hardmaru
5 years
Random Search and Reproducibility for Neural Architecture Search They show that random search of architectures is a strong baseline for architecture search. In fact, random search gets near state-of-the-art results on PTB (RNNs) and CIFAR-10 (ConvNets).
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
117
428
1
18
99
@tqchenml
Tianqi Chen
5 months
One key insight in FlashAttention and many related works is that attention computation can be made commutative and associative. I always wondered about the best way to write it down and teach it. Check out the "Recursive Attention" section 👉, which explains it elegantly
@ye_combinator
Zihao Ye
5 months
(1/3) Memory Bandwidth Efficient Shared Prefix Batch Decoding, brought to you by FlashInfer: blog: Try out our APIs:
Tweet media one
1
20
100
0
19
100
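The associative structure mentioned above can be shown in a few lines: partial attention over each chunk yields a (max, denominator, output) state, and any two states merge associatively. A NumPy sketch, assuming simplified single-query attention (real kernels such as FlashAttention/FlashInfer tile this on the GPU):

```python
import numpy as np

def partial_attention(q, K, V):
    """Attention over one chunk -> (running max, softmax denominator, output)."""
    scores = K @ q                        # (chunk_len,)
    m = scores.max()
    w = np.exp(scores - m)
    s = w.sum()
    o = (w[:, None] * V).sum(axis=0) / s  # chunk-local attention output
    return m, s, o

def merge(a, b):
    """Associatively merge two partial attention states into one."""
    (m1, s1, o1), (m2, s2, o2) = a, b
    m = max(m1, m2)
    s1, s2 = s1 * np.exp(m1 - m), s2 * np.exp(m2 - m)  # rescale denominators
    return m, s1 + s2, (s1 * o1 + s2 * o2) / (s1 + s2)

rng = np.random.default_rng(0)
d, n = 8, 32
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))

_, _, full = partial_attention(q, K, V)          # one-shot attention
chunked = merge(partial_attention(q, K[:16], V[:16]),
                partial_attention(q, K[16:], V[16:]))
assert np.allclose(full, chunked[2])             # same result, chunk by chunk
```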
@tqchenml
Tianqi Chen
6 years
Nice blog on using the TVM stack to speed up batch matmul in @TensorFlow by 13x, and end-to-end neural machine translation by 1.7x
0
35
97
@tqchenml
Tianqi Chen
8 years
We are open-sourcing NNVM as our step toward modularized and decentralized deep learning systems
4
55
97
@tqchenml
Tianqi Chen
1 year
RedPajama-3B is a pretty compact model (takes ~2GB) yet pretty amazing. Now running on M1, iPhone, and browsers. You can also bring your own model weights and chat with them on your laptop, browser, or phone, all local. Check out the demo!
@junrushao
Junru Shao
1 year
LLMs like RedPajama are embracing open, permissive licenses. What does it mean for us? Could they become our personal friends, and will they shape a unique market? Now RedPajama-3B can run locally on phones, browsers, and laptops with hardware acceleration!
3
18
75
1
16
91
@tqchenml
Tianqi Chen
6 years
Use transfer learning to automatically optimize deep learning kernel performance on Nvidia and AMD GPUs, mobile phones, and IoT devices. Super exciting to see that learning is already competitive with vendor-specific solutions such as TensorRT, TFLite, and ARM Compute Library.
@ApacheTVM
Apache TVM
6 years
Automatic Kernel Optimization for Deep Learning on All Hardware Platforms
Tweet media one
1
23
60
0
27
89
@tqchenml
Tianqi Chen
4 years
Binary neural networks have great potential for AIoT and #TinyML ; however, no existing paper actually implemented BNNs in a way that can measure end-to-end speedups on real hardware. Riptide is the first paper to make BNNs practical, using great ML and systems ( #MLSys ) insights
@OctoAICloud
OctoAI
4 years
Interested in trying out the first-ever end-to-end, optimized, open source binary neural network framework? Read all about the 12x end-to-end speedups available from Riptide in our latest blog entry by our very own Josh Fromm:
1
46
118
2
25
92
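Why binarization can yield such speedups: a dot product of {-1,+1} vectors collapses into XOR-plus-popcount bit operations. A toy sketch of that textbook XNOR-net trick (not Riptide's actual kernels):

```python
# Encode +1 as bit 1 and -1 as bit 0; then for n-element vectors,
#   dot(a, b) = n - 2 * popcount(a XOR b)
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    mismatches = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches

pack = lambda v: sum((x > 0) << i for i, x in enumerate(v))  # vector -> bitmask
a, b = [+1, -1, +1, +1], [-1, -1, +1, -1]
assert binary_dot(pack(a), pack(b), 4) == sum(x * y for x, y in zip(a, b))
```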
@tqchenml
Tianqi Chen
8 months
Bringing #Llama2 70B universally, for everyone, with multi-GPU support. 👉 Scale to bigger models for the GPU-poor without being constrained by the RAM limit. 🚀 Go GPU-poors. The 70B model can now reach 30 tok/s under a $2k budget with 2x AMD GPUs and 34 tok/s under $3k with 2x RTX 4090.
@junrushao
Junru Shao
8 months
(1/3) 🦙🌟 Looking to run Llama2-70B? With two NV/AMD GPUs or more? 💥🔥 Machine learning compilation (MLC) now supports multi-GPU. ⚡️💻 We achieve 34 tok/sec on 2 x RTX 4090, the fastest solution at $3.2k. 🌐💡Two AMD 7900XTX delivers 30 tok/sec at $2k.
Tweet media one
8
37
173
1
19
88
@tqchenml
Tianqi Chen
11 months
It is kind of 🤯 and fun to see it work: Llama2 70B running completely in the browser, without server support, on the latest Chrome Canary via @WebGPU . Get an Apple Silicon device with 64GB or more, open the webpage, and that's it.
@ruihanglai
Ruihang Lai
11 months
Even the 70B #Llama2 model is runnable through Web LLM accelerated by @WebGPU ! If you have an Apple Silicon Mac with 64GB memory or more, visit and try out the 70B model.
Tweet media one
5
21
86
2
23
86
@tqchenml
Tianqi Chen
4 years
FFI Navigator: IDEs support find-function-definition within the same language (e.g. Python or C++), but not across language boundaries via FFI calls. We hacked up a fun language server plugin that does exactly that, with support for @ApacheTVM @PyTorch @ApacheMXNet @GraphDeep
2
14
90
@tqchenml
Tianqi Chen
8 years
I needed to make an example tutorial for NNVM, so we built a TensorFlow from the ground up on top of @TorchML !
1
37
89
@tqchenml
Tianqi Chen
1 year
Check this out: Stable Diffusion completely in the browser with @WebGPU , with no server support (and about as fast as running on a native GPU). It is also amazing to see how OSS ecosystems come together to make it possible, with @huggingface diffusers, @PyTorch @ApacheTVM @WebGPU
@ruihanglai
Ruihang Lai
1 year
Stable diffusion models are fun and usually take a server to run. Today we are happy to share the Web Stable Diffusion project, which brings SD models from @huggingface and @pytorch to browser clients through @WebGPU support. Check out our demo!
Tweet media one
4
28
125
0
23
87
@tqchenml
Tianqi Chen
8 years
The lessons we learned when building the #XGBoost system; also see our paper, with @guestrin
1
55
89
@tqchenml
Tianqi Chen
1 month
Ten years ago, when the journey of XGBoost began.
@dhpmrou
David Rousseau
1 month
10 years ago #OTD , @bingxu_ and @tqchenml quietly announced the birth of @XGBoostProject on the @kaggle #higgsml competition forum. Since then, XGBoost has skyrocketed in popularity to become the top ML go-to tool! 1/4
Tweet media one
2
16
60
4
10
85
@tqchenml
Tianqi Chen
7 years
I am excited to teach CSE 599G1, Deep Learning Systems, next quarter at @uwcse .
2
8
84
@tqchenml
Tianqi Chen
8 months
👉 Mistral running fully locally in your browser; the benefit of sliding window attention means you can have a looong chat with the model.
@charlie_ruan
Charlie Ruan
8 months
Run @MistralAI 's 7B model in your browser with @WebGPU acceleration! Try it out at For native LLM deployment, sliding window attention is particularly helpful for enjoying longer context with a smaller memory requirement.
Tweet media one
1
10
62
1
17
84
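For context, sliding window attention bounds memory by letting token i attend only to the last W positions, so the KV cache stays O(W) instead of O(n). A toy mask sketch, a simplified illustration rather than Mistral's actual implementation:

```python
# Sliding-window attention mask: query position i may attend to key
# positions max(0, i-W+1) .. i.
import numpy as np

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    i = np.arange(n)[:, None]   # query position
    j = np.arange(n)[None, :]   # key position
    return (j <= i) & (j > i - window)

print(sliding_window_mask(5, 3).astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [0 1 1 1 0]
#  [0 0 1 1 1]]
```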
@tqchenml
Tianqi Chen
8 years
XGBoost now comes with dropout support (DART), by @marugari2
1
43
78
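DART is selected through XGBoost's booster parameter; booster="dart", rate_drop, and skip_drop are real XGBoost parameters, while the data and values below are purely illustrative:

```python
# Minimal DART usage sketch: train a boosted-tree model with tree dropout.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "dart",       # tree booster with dropout (DART)
    "rate_drop": 0.1,        # fraction of trees dropped each iteration
    "skip_drop": 0.5,        # probability of skipping dropout in an iteration
    "objective": "reg:squarederror",
}
model = xgb.train(params, dtrain, num_boost_round=20)
```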
@tqchenml
Tianqi Chen
6 years
Proud to be part of the new group at UW that brings people from systems, architecture, machine learning, and programming languages together to push research on future intelligent systems.
@luisceze
Luis Ceze
6 years
Say hi to SAML, our newly formed @uwcse research group, continuing to push cross-stack machine learning systems research like TVM and PHUB. @tqchenml @thierryduplat @guestrin @ztatlock
0
2
40
0
6
80
@tqchenml
Tianqi Chen
7 years
If you are at #GTC , go talk to Rory Mitchell at the deep learning & AI section tonight about the brand new GPU-accelerated #XGBoost
1
22
74
@tqchenml
Tianqi Chen
6 months
Mistral-7B-Instruct-v0.2 on iPhone with StreamingLLM support
@davidpissarra
David Pissarra
6 months
Run the Mistral-7B-Instruct-v0.2 model on iPhone! Now supports StreamingLLM for endless generation. Try the MLC Chat app via TestFlight. For native LLM deployment, attention sinks are particularly helpful for longer generation with a smaller memory requirement.
Tweet media one
Tweet media two
3
16
73
1
12
76
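On the attention sinks mentioned in the quoted tweet: StreamingLLM keeps a few initial "sink" tokens plus a rolling window of recent tokens in the KV cache. A toy sketch of that retention policy, an illustration rather than the real GPU-side implementation:

```python
# Toy KV-cache retention policy in the spirit of StreamingLLM: always keep a
# few initial "attention sink" tokens plus a rolling window of recent tokens.
def streaming_cache_positions(seq_len: int, num_sinks: int, window: int):
    """Token positions retained in the cache after seq_len generated tokens."""
    if seq_len <= num_sinks + window:
        return list(range(seq_len))
    sinks = list(range(num_sinks))                 # always-kept sink tokens
    recent = list(range(seq_len - window, seq_len))  # rolling recent window
    return sinks + recent

# After 1000 tokens with 4 sinks and a 6-token window (tiny numbers for demo):
print(streaming_cache_positions(1000, num_sinks=4, window=6))
# [0, 1, 2, 3, 994, 995, 996, 997, 998, 999]
```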
@tqchenml
Tianqi Chen
2 months
#Llama3 🦙🦙 running fully locally on iPad without an internet connection. Credits to @ruihanglai and the team
0
17
76
@tqchenml
Tianqi Chen
11 months
Amazing results from WizardLM. Great to see open LLMs continue to make strides; let us bring them everywhere, for everyone
@WizardLM_AI
WizardLM
11 months
🔥🔥🔥 Introducing the newest WizardMath models (70B/13B/7B)! WizardMath 70B achieves: 1. Surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Chinchilla on GSM8k with 81.6 Pass @1 2. Surpasses Text-davinci-002, GAL, PaLM, GPT-3 on MATH with 22.7 Pass @1 3. Surpasses all other
Tweet media one
Tweet media two
Tweet media three
38
119
598
3
9
72
@tqchenml
Tianqi Chen
3 months
Please spread the word: #MLSys2024 will feature a full-day, single-track young professionals symposium with invited talks, panels, round tables, and poster sessions. Submit your 1-page abstract by April 1st & present your work at our poster session.
2
19
71
@tqchenml
Tianqi Chen
7 years
Here are the slides for my talk at the ML Systems workshop :) #NIPS2017
3
19
72
@tqchenml
Tianqi Chen
2 months
Llama 3 running natively in browser accelerated by @WebGPU
@charlie_ruan
Charlie Ruan
2 months
Llama 3 from @AIatMeta is now up on WebLLM! Try it on with local inference accelerated by @WebGPU . Or start building your local agent with the web-llm package -- everything in-browser!
Tweet media one
Tweet media two
2
12
78
2
15
72
@tqchenml
Tianqi Chen
1 year
#MLSys23 will be in Miami from June 4th through June 8th! We have an exciting program about the latest advances in machine learning and systems. We are also happy to announce the MLSys23 student travel awards program. Please apply via
0
17
70
@tqchenml
Tianqi Chen
6 years
TVM v0.2 is here! A lot of exciting things happened in the last release cycle: NNVM compiler support for deep learning frameworks, AMD GPU backends, and good ARM GPU results. We will see more exciting things come to the stack in the next cycle.
1
25
71
@tqchenml
Tianqi Chen
4 months
Gemma 2B running on iPhone at 22+ tok/sec. 2B really hits a sweet spot for running a local model on a phone. Try it out via 👉
@ruihanglai
Ruihang Lai
4 months
Run the Gemma model locally on iPhone - we get a blazing-fast 20 tok/s for the 2B model. This shows amazing potential ahead for Gemma fine-tunes on phones, made possible by the new MLC SLM compilation flow by @junrushao from @OctoAICloud and many other contributors.
3
18
38
2
18
71
@tqchenml
Tianqi Chen
2 months
It is amazing how cheap we can go when it comes to running #Llama3 models from @AIatMeta : running on a $100 Orange Pi
@mengshyu
Mengshiun
2 months
Deploy #Llama3 on $100 Orange Pi with GPU acceleration through MLC LLM. Try it out on your Orange Pi 👉
Tweet media one
Tweet media two
1
12
58
1
13
70
@tqchenml
Tianqi Chen
1 year
#MLSys kicks off this year with a session on scaling AI systems, chaired by @JiaZhihao . Three great talks on scaling up transformers, mixture of experts, and related model variants
Tweet media one
3
6
69
@tqchenml
Tianqi Chen
9 years
A self-contained and comprehensive introduction to boosted trees http://t.co/k15UGC8iOd #xgboost
2
26
66
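The heart of that introduction is the second-order view of boosting, restated here from the standard XGBoost formulation: at round t the new tree f_t is fit against a Taylor expansion of the loss.

```latex
% Second-order objective for boosted trees at round t (standard XGBoost form).
% g_i, h_i are the first and second derivatives of the loss at the previous
% round's prediction; \Omega penalizes tree complexity.
\mathrm{obj}^{(t)}
  = \sum_{i=1}^{n} l\!\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)
  \approx \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \right] + \Omega(f_t),
\quad
g_i = \partial_{\hat{y}^{(t-1)}} l\!\left(y_i, \hat{y}^{(t-1)}\right),\;
h_i = \partial^2_{\hat{y}^{(t-1)}} l\!\left(y_i, \hat{y}^{(t-1)}\right),
\quad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2
```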
@tqchenml
Tianqi Chen
4 years
Remote presentation works quite smoothly at the #MLSys conference
Tweet media one
1
4
65
@tqchenml
Tianqi Chen
8 years
Great work from @marcotcr , @sameer_ on explaining any machine learning model (20 newsgroup, deep net). @guestrin
1
34
66
@tqchenml
Tianqi Chen
1 year
Excited to share our #ASPLOS23 paper, "TensorIR: An Abstraction for Automatic Tensorized Program Optimization". This is an exciting step to bring @ApacheTVM to the next level and enable more accessible ML compilation. It also powers fun stuff like Web Stable Diffusion. A thread 🧵
@bohanhou1998
Bohan Hou
1 year
Excited to share our #ASPLOS23 paper, “TensorIR: An Abstraction for Automatic Tensorized Program Optimization”. Come to our talk in Session 5C at Grand D! TensorIR lies at the core of the next generation of the TVM compiler @ApacheTVM , from @SCSatCMU @OctoML .
Tweet media one
0
10
47
2
13
66
@tqchenml
Tianqi Chen
10 months
👉 Try out CodeLlama 7B and 13B in Colab notebooks
@davidpissarra
David Pissarra
10 months
Code Llama is now on MLC LLM! 🦙💻 MLC LLM allows you to deploy your personal coding AI assistants locally. Try it out directly on Colab. This is an easy way to run @MetaAI 's latest code-specialized LLM. MLC LLM: Colab 13b models:
Tweet media one
Tweet media two
4
22
71
0
15
62
@tqchenml
Tianqi Chen
1 year
WebLLM runs LLMs entirely in the browser with @WebGPU , leading to a lot of fun opportunities to build AI assistants for everyone. Try out the demo! Runs on M1/M2 and other desktops with enough GPU. Decent at writing poems and not too good at drawing, though it did get the cat right sometimes.
Tweet media one
@HongyiJin258
Hongyi Jin
1 year
Introducing WebLLM, an open-source chatbot that brings large language models (LLMs) directly into web browsers. We can now run instruction fine-tuned LLaMA (Vicuna) models natively in your browser tab via @WebGPU with no server support. Check out our demo at .
Tweet media one
46
448
2K
0
15
62
@tqchenml
Tianqi Chen
5 years
A cool result from FB: better inference with a new conv op and code optimized by AutoTVM
@goodfellow_ian
Ian Goodfellow
5 years
OctConv is a simple replacement for the traditional convolution operation that gets better accuracy with fewer FLOPs
Tweet media one
16
571
2K
0
18
59
@tqchenml
Tianqi Chen
11 months
Try out @WizardLM_AI 's WizardMath directly in Colab 👉 Fun to explore the reasoning capabilities of WizardMath and see what a specialized open LLM can do
@charlie_ruan
Charlie Ruan
11 months
Specialized personal AI assistants--deployable everywhere! 🌎🤖 Check out MLC LLM’s support for recently released models from @WizardLM_AI : WizardLM, WizardMath, and WizardCoder! Try it here (runnable on Colab):
Tweet media one
Tweet media two
2
10
46
0
9
61
@tqchenml
Tianqi Chen
8 years
Here is the link to my talk at the ML Systems workshop at #NIPS2016 this year, about #MXNet and #XGBoost
2
23
59
@tqchenml
Tianqi Chen
7 years
AMD GPUs for the TVM stack and NNVM compiler! Enables deep learning frameworks -> AMD GCN assembly. By Aditya and Masa
1
28
60
@tqchenml
Tianqi Chen
1 year
#MLSys23 will be in Miami from June 4th through June 8th! We have an exciting program about the latest advances in machine learning and systems. If you want to learn about the latest technologies and advances in AI and machine learning systems, note that early registration ends May 7th!
1
14
58
@tqchenml
Tianqi Chen
1 year
#MLSys2023 will feature two keynote speakers: @matei_zaharia will talk about “Improving the Quality and Factuality of Large Language Model Applications”, and @srush_nlp will discuss “Do we need Attention?”. Join us and register at
Tweet media one
2
13
57
@tqchenml
Tianqi Chen
7 years
Assignment 2 of the DLSys course builds a complete GPU-based framework in 1k lines of Python
0
14
55
@tqchenml
Tianqi Chen
19 days
Qwen2 on iPhone is now available through TestFlight, powered by MLCEngine
@tqchenml
Tianqi Chen
19 days
Excited to share what we have been working on over the past year: MLCEngine, a universal LLM deployment engine that brings the power of server optimizations and local deployment into a single framework. Check out the platform support 👇 and the blog post; more in a 🧵
Tweet media one
4
51
222
0
14
56
@tqchenml
Tianqi Chen
5 years
@soumithchintala Regardless of technical directions, one thing that I learned from PyTorch is the importance of a good community :)
1
4
55
@tqchenml
Tianqi Chen
6 years
TVM v0.3 is here! TOPI vision support, numpy-style operator overloading APIs, new backends. We are also going to have a fun next release cycle with improvements brought by our awesome community contributors
0
20
54
@tqchenml
Tianqi Chen
2 years
We are back with Episode 6 of MLC (sponsored by @OctoML ). A lot of fun with TorchFX, bringing @PyTorch models onto a machine learning compilation flow. Looking forward to seeing more pythonic program transformations. May the modules evolve
1
11
53
@tqchenml
Tianqi Chen
11 months
Finally feeling it was worth the buck to get the 64GB M2 Max MacBook that runs Llama2 70B. Kind of an attractive deal for the capability it gets. The age of the PC began with a few big computers; I wonder what the age of personal AI will look like. At least RAM becomes as crucial
@junrushao
Junru Shao
11 months
(1/2) 🦙 Buckle up and get ready for a wild llama ride with 70B Llama-2 on a single MacBook 💻 🤯 Now 70B Llama-2 can run smoothly on a 64GB M2 Max with 4-bit quantization. 👉 Here is a step-by-step guide: 🚀 How about the performance? It's
19
86
504
3
9
53
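A back-of-envelope check on why 4-bit quantization is what makes this fit (my arithmetic, not from the thread):

```python
# Why 4-bit quantization lets Llama2-70B fit in 64GB of unified memory.
params = 70e9
bytes_fp16 = params * 2        # ~140 GB: far beyond 64 GB
bytes_4bit = params * 0.5      # ~35 GB: fits, leaving room for KV cache etc.
print(f"fp16: {bytes_fp16 / 1e9:.0f} GB, 4-bit: {bytes_4bit / 1e9:.0f} GB")
```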
@tqchenml
Tianqi Chen
5 years
Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition. A simpler and more efficient network design that improves over EfficientNet-B0. By Bing Xu, @ajtulloch and others from FB AI
2
9
52
@tqchenml
Tianqi Chen
1 month
#MLSys2024 keynote by @JeffDean : Advances in ML for systems and systems for ML
Tweet media one
5
7
52
@tqchenml
Tianqi Chen
5 years
The TVM community joins @TheASF . Fun journeys ahead; looking forward to building a better deep learning compiler stack together, for everyone
@ApacheTVM
Apache TVM
5 years
TVM Deep Learning Compiler Community Joins Apache Software Foundation @TheASF
Tweet media one
0
16
40
1
9
50
@tqchenml
Tianqi Chen
2 years
Excited to see folks in two weeks at MLSys22 in Santa Clara. @mcarbin and I are happy to announce the MLSys 2023 key dates: - Paper submission and co-author registration: October 28, 2022, 4pm ET - Author response: Jan 16 to Jan 20, 2023 - Author notification: Feb 17, 2023
1
11
51
@tqchenml
Tianqi Chen
7 years
An awesome tutorial about how to improve the depthwise GPU op in DL frameworks by 2x+ using #TVM @uwces @guestrin
1
14
51
@tqchenml
Tianqi Chen
5 years
One key lesson that I learned in open source is the Apache ( @TheASF ) way of development. By keeping all discussions and decisions in a public archive, we enable everyone's participation. While this approach brings a bit of overhead, it is the best way to grow a healthy community.
0
4
51
@tqchenml
Tianqi Chen
2 years
Great to attend MLSys this year in person. The CMU Catalyst group is also presenting several works at the conference: the CoRa tensor compiler, which generates optimized code for transformer workloads with ragged tensors, and DietCode, which enables automatic dynamic tensor program optimization.
1
2
51
@tqchenml
Tianqi Chen
1 month
#MLSys2024 day 2 keynote: AI robustness and security in the age of LLMs, by @zicokolter
Tweet media one
2
4
52
@tqchenml
Tianqi Chen
4 months
Please help spread the word 📢 If you are in the field of AI and interested in the latest innovations in machine learning and systems, you should check out #MLSys24 !
@chrismdesa
Christopher De Sa
4 months
We are excited to announce this year’s keynote speakers for #MLSys2024 : Jeff Dean @JeffDean , Zico Kolter @zicokolter , and Yejin Choi @YejinChoinka ! MLSys this year will be held in Santa Clara on May 13–16. More details at .
1
17
82
1
13
49
@tqchenml
Tianqi Chen
7 years
Doing flexible deep learning in an imperative #numpy way; shares a similar spirit with @TorchML
@slashML
/MachineLearning
7 years
MinPy 0.30 Release - Imperative NumPy style code with MXNet backend
0
9
18
0
19
50
@tqchenml
Tianqi Chen
1 year
The same solution also brings universal support to any GPU on Linux/Windows and Mac thanks to @VulkanAPI . Here is a screenshot of the CLI in action through Vulkan. Check out the page for more instructions
@tqchenml
Tianqi Chen
1 year
Running LLM-based chat locally on iPhone 📱 with GPU acceleration. Also brings universal deployment to NV/AMD/M1 GPUs. Love to see it enabling personal assistants for everyone. Try out the demos
9
89
323
0
11
49
@tqchenml
Tianqi Chen
5 years
UW SAMPL will present a tutorial about TVM on June 22nd at @ISCAConfOrg FCRC. Come and learn about full-stack ML systems, learning-based automation, and compiler support for new deep learning accelerators.
1
15
48
@tqchenml
Tianqi Chen
7 months
👀 A new deep learning framework for Apple silicon :)
@awnihannun
Awni Hannun
7 months
Just in time for the holidays, we are releasing some new software today from Apple machine learning research. MLX is an efficient machine learning framework specifically designed for Apple silicon (i.e. your laptop!) Code: Docs:
100
717
4K
0
3
49
@tqchenml
Tianqi Chen
11 months
I had an enjoyable chat with @FanaHOVA and @swyx about machine learning systems and bringing high-performance universal deployment to ML models. Check out the latest episode of Latent Space 👉
@FanaHOVA
Alessio Fanelli
11 months
H100s? 🤨 I'll just run LLaMA2 70B in the browser, thanks! @tqchenml came on @latentspacepod to talk about the work MLC is doing to enable everyone to run models natively on any hardware / software stack, including Chrome and iPhones (and AMD cards!) 🎙️
2
40
178
1
11
48