Mathematics for Machine Learning -- a 47-page introduction from UC Berkeley 🚀
• Linear Algebra
• Calculus and Optimization
• Probability
A 100% free resource!
Source:
Today is my first day at @NVIDIAAI! 🥳
-From learning to code at 29
-through learning ML with @fastdotai
-winning a @kaggle competition
-jobs at 🔥 startups
-moving continents thanks to AI
-to joining the illustrious Merlin team ❤️
I am beyond grateful 🙏
Will make this one count!
Favorite recent Jupyter Notebook discovery - the %debug magic:
1. Get an exception.
2. Insert a new cell, type %debug and run it.
An interactive debugger will open, taking you to where the exception occurred and letting you look around!
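Under the hood, %debug starts pdb in post-mortem mode on the most recent traceback; in a plain script, calling `pdb.post_mortem(exc.__traceback__)` inside an `except` block opens a similar interactive session. The sketch below does the "look around" step programmatically instead: it walks the traceback down to the frame that raised and returns that frame's local variables. `buggy` and `innermost_frame_locals` are hypothetical names made up for this illustration.

```python
def buggy(values):
    total = 0
    for i, v in enumerate(values):
        total += 10 / v  # raises ZeroDivisionError when v == 0
    return total


def innermost_frame_locals(exc):
    """Follow the traceback to the frame where `exc` was raised and return
    a snapshot of its local variables plus the failing line number."""
    tb = exc.__traceback__
    while tb.tb_next is not None:  # the last link is the raising frame
        tb = tb.tb_next
    return dict(tb.tb_frame.f_locals), tb.tb_lineno


try:
    buggy([5, 2, 0, 4])
except ZeroDivisionError as exc:
    local_vars, lineno = innermost_frame_locals(exc)
    print(lineno, local_vars)  # i == 2 and v == 0: it failed on the third element
```

Interactively you would get the same information by typing `p i, v` at the `ipdb>` prompt, with `l` / `ll` to list the surrounding source.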
To understand the foundations of NLP (pre-Transformers), where would you go?
This 48-page paper is the answer 🤩
✅ concise and clear explanations
✅ scikit-learn, spaCy, and Keras code snippets
✅ all the fundamentals of NLP in a single place
Merlin Dataloader is 119x faster than my own PyTorch Dataset + Dataloader combo!
This is revolutionary for tabular data 🥳
Let's take a closer look at what is going on.
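The tweet doesn't spell out how Merlin achieves this, so the sketch below only illustrates one well-known reason a per-item PyTorch-style `Dataset` is slow on tabular data: it pays Python overhead for every row (index, fetch, then collate), whereas slicing whole batches out of column arrays pays it once per batch. It assumes the data already sits in NumPy arrays; both function names are hypothetical and this is not Merlin's implementation.

```python
import numpy as np


def per_row_batches(features, targets, batch_size):
    """Mimic a map-style Dataset + default collation: fetch rows one at a
    time, then stack them into a batch (one Python-level step per row)."""
    for start in range(0, len(features), batch_size):
        stop = min(start + batch_size, len(features))
        rows = [(features[i], targets[i]) for i in range(start, stop)]
        yield np.stack([r[0] for r in rows]), np.array([r[1] for r in rows])


def sliced_batches(features, targets, batch_size, shuffle=False, seed=0):
    """Fetch each batch with a single slice per array."""
    if shuffle:
        # Shuffle once up front so every batch is still a contiguous slice.
        order = np.random.default_rng(seed).permutation(len(features))
        features, targets = features[order], targets[order]
    for start in range(0, len(features), batch_size):
        yield features[start:start + batch_size], targets[start:start + batch_size]


X = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.arange(10)
row_wise = list(per_row_batches(X, y, batch_size=4))
sliced = list(sliced_batches(X, y, batch_size=4))
```

Both produce identical batches; the difference only shows up in wall-clock time once you have millions of rows.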
How does LangChain actually work?
We see the wonderful things it can do, but what does it send to the model?
What does the model send back?
How does it all work?
I decided to investigate 🕵️♂️
Here is how LangChain allows LLMs to perform Google searches:
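The general mechanism behind LangChain's early ReAct-style agents is a plain text loop: the prompt asks the model to reply with `Thought:` / `Action:` / `Action Input:` lines, the framework parses them, calls the named tool (for example a search API), appends an `Observation:` with the result, and re-prompts until the model writes `Final Answer:`. Below is a toy sketch of that loop. The scripted "LLM", `fake_search`, `run_agent`, and the prompt wording are all hypothetical stand-ins, not LangChain's code or its actual prompt.

```python
import re

# Hypothetical scripted model replies, standing in for a real LLM call.
SCRIPTED_REPLIES = [
    "Thought: I need to look this up.\nAction: search\nAction Input: capital of France",
    "Thought: I now know the final answer.\nFinal Answer: Paris",
]


def make_scripted_llm(replies):
    it = iter(replies)
    return lambda prompt: next(it)  # ignores the prompt; returns the next reply


def fake_search(query):
    # Hypothetical search tool; a real agent would call a search API here.
    return {"capital of France": "Paris is the capital of France."}.get(query, "No results.")


def run_agent(question, llm, tools, max_steps=5):
    prompt = (
        "Answer the question. Available tools: " + ", ".join(tools) + "\n"
        "Use the format:\nThought: ...\nAction: <tool>\nAction Input: <input>\n"
        "Observation: <tool result>\n(repeat as needed)\nFinal Answer: <answer>\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        reply = llm(prompt)
        prompt += reply + "\n"
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip(), prompt
        action = re.search(r"Action:\s*(.*)\nAction Input:\s*(.*)", reply)
        if action is None:
            raise ValueError("Could not parse the model's reply")
        tool, tool_input = action.group(1).strip(), action.group(2).strip()
        prompt += f"Observation: {tools[tool](tool_input)}\n"  # feed result back
    raise RuntimeError("No final answer within max_steps")


answer, transcript = run_agent(
    "What is the capital of France?",
    make_scripted_llm(SCRIPTED_REPLIES),
    {"search": fake_search},
)
print(answer)
```

With a real model, frameworks typically also pass a stop sequence such as `"\nObservation:"` so the model cannot invent tool results on its own.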
Curious about recommender models?
Interested in endowing models from other domains with some of their superpowers?
Please join me on a whirlwind tour of 6 recsys architectures!
>> a thread 🧵 <<
Linear Algebra -- an introductory course on the mathematics at the heart of Machine Learning
✅ 37 bite-sized videos (< 10 minutes)
✅ stellar visualizations
✅ expertly delivered by lecturers from Imperial College London
Link:
How to speed up your tabular data processing by 1053x
A tutorial on how to vectorize a complex operation in pandas/cuDF using a boolean mask
Bonus at the end: how to seamlessly run on the GPU with arbitrarily large data
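The tutorial's actual operation isn't reproduced here, so this is a minimal pandas sketch of the general idea, using a made-up discount rule as the "complex operation": instead of calling a Python function once per row with `apply`, each branch of the rule becomes a boolean mask over whole columns. The column names and the rule are hypothetical. cuDF largely mirrors the pandas API, so mask-based code like this often ports to the GPU with few changes, but check the cuDF docs for each call you use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, 250.0, 40.0, 999.0, 5.0],
    "qty": [3, 1, 10, 2, 0],
    "is_member": [True, False, True, False, False],
})


def discount_rowwise(row):
    # Hypothetical business rule, evaluated one row at a time (slow).
    if row["qty"] == 0:
        return 0.0
    if row["is_member"] and row["price"] * row["qty"] > 100:
        return 0.1
    if row["price"] > 500:
        return 0.05
    return 0.0


def discount_vectorized(df):
    # The same rule as boolean masks over whole columns (fast).
    out = np.zeros(len(df))
    has_qty = df["qty"] != 0
    member_big = has_qty & df["is_member"] & (df["price"] * df["qty"] > 100)
    pricey = has_qty & ~member_big & (df["price"] > 500)  # earlier branch wins
    out[member_big.to_numpy()] = 0.1
    out[pricey.to_numpy()] = 0.05
    return pd.Series(out, index=df.index)


rowwise = df.apply(discount_rowwise, axis=1)
vectorized = discount_vectorized(df)
print(vectorized.tolist())  # [0.0, 0.0, 0.1, 0.05, 0.0]
```

The `~member_big` term is what preserves the if/elif ordering of the row-wise version; forgetting it is a common source of mismatches when vectorizing branching logic.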
Interested in working with sound in Python? 🎶🎙️🥁
now has full Colab support - you can run the notebooks at the click of a button 🙂
Also new: how to work with large, multi-gigabyte WAV files
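The repository's own approach isn't shown here. As one hedged illustration of the idea, the sketch below uses Python's standard-library `wave` module to stream an uncompressed PCM WAV file a chunk at a time, so the whole file never has to be in memory. `iter_wav_chunks` is a hypothetical helper name, and the tiny test file is generated on the fly.

```python
import os
import tempfile
import wave


def iter_wav_chunks(path, frames_per_chunk=1_000_000):
    """Yield raw PCM bytes from a WAV file, `frames_per_chunk` frames at a time."""
    with wave.open(path, "rb") as wf:
        while True:
            chunk = wf.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of file
                break
            yield chunk


# Build a tiny 16-bit mono file just to exercise the generator.
path = os.path.join(tempfile.mkdtemp(), "tiny.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 2 bytes per sample (16-bit)
    wf.setframerate(16_000)
    wf.writeframes(bytes(range(20)))  # 10 frames of 2 bytes each

chunks = list(iter_wav_chunks(path, frames_per_chunk=4))
print([len(c) for c in chunks])  # [8, 8, 4]
```

Each chunk can then be turned into samples with, for example, `numpy.frombuffer(chunk, dtype="<i2")` for 16-bit little-endian audio, processed, and discarded before the next one is read.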
Two techniques to speed up your Python code by 2-200x:
1. Use list comprehensions (they benefit from optimizations that for loops do not).
2. Use sets when checking for membership.
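A small sketch to check both tips on your own machine. The actual speedup depends on data size and Python version, so it prints timings rather than promising a number. The sizes and function names are arbitrary choices for the illustration.

```python
import timeit

n = 5_000
items_list = list(range(n))
items_set = set(items_list)
queries = list(range(0, 2 * n, 7))  # about half of these are present


def count_hits(container):
    # `in` on a list scans element by element (O(n) per lookup);
    # `in` on a set hashes the key (O(1) on average).
    return sum(q in container for q in queries)


def doubled_with_loop():
    out = []
    for x in items_list:
        out.append(x * 2)  # looks up and calls .append on every iteration
    return out


def doubled_with_comprehension():
    # Appends via a dedicated bytecode instead of a method call per item.
    return [x * 2 for x in items_list]


for label, fn in [
    ("list membership", lambda: count_hits(items_list)),
    ("set membership", lambda: count_hits(items_set)),
    ("for loop + append", doubled_with_loop),
    ("list comprehension", doubled_with_comprehension),
]:
    print(f"{label:>20}: {timeit.timeit(fn, number=3):.4f} s")
```

The set gives the dramatic wins (the gap grows with the size of the collection), while comprehensions give a steadier, smaller speedup.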
Programming has changed forever 😳
Today I used GPT-4 for the first time and I can't imagine coding without it ever again.
Here is the project I worked on:
I switched to VS Code for running Jupyter Notebooks so that I can use @github Copilot and wow 🥰
(that is coming from a vim/tmux/Jupyter Notebook die-hard fan)
Here is my experience:
1yr ago I gave up on ML. I didn't know what to learn or how.
After a 5-month break I decided to give ML one last try. If it didn't work out, I would have to let it go so as not to keep wasting my time; maybe I was unable to learn this.
I then signed up for the @fastdotai course.
I am launching a new blog -- TabularMusings 🥳
Here is the first blog post:
And here is the technology I am using and the reasons for starting the blog:
3 surprisingly effective techniques for training Computer Vision models that I used to win a @kaggle competition
Here is how you can apply them in your projects:
The most important skills for Machine Learning:
• Python
• Linux (env setup, ssh, moving files, editor)
• git
• pdb
• creating good train/val/test splits
• ability to scan papers for relevant information
• a learner mindset
• clear writing
Anything I missed?
Can you learn an embedding space for location?! Turns out the answer is... yes! 🙂
Two absolutely fascinating reads with a lot of good information on approaching DL projects in general.
The first one is a blog post by @sentiance.
How to win @kaggle:
✅ join a competition early
✅ read forums daily
✅ make small improvements every day
✅ find a validation split that tracks the leaderboard (LB)
✅ posts by top kagglers will take you 80% of the way
✅ papers, blog posts, creativity 👉remaining 20%
✅ ensemble results
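The simplest form of "ensemble results" is a (weighted) average of several models' predictions for the same rows. A minimal sketch, assuming each model outputs one probability per row in the same order; `blend` is a hypothetical helper, and in practice the weights would be tuned on your validation split rather than on the public leaderboard.

```python
def blend(predictions, weights=None):
    """Weighted average of several models' predictions for the same rows.

    predictions: list of per-model lists, all the same length.
    weights: optional per-model weights (normalized internally).
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models  # plain mean
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[row] for w, preds in zip(weights, predictions)) / total
        for row in range(n_rows)
    ]


model_a = [0.2, 0.8, 0.5]
model_b = [0.4, 0.6, 0.7]
print(blend([model_a, model_b]))
print(blend([model_a, model_b], weights=[3, 1]))
```

Averaging tends to help most when the models make different mistakes, which is why diverse architectures usually blend better than near-identical ones.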
12 months after starting the @fastdotai Deep Learning for Coders course, and a little less since joining my first competition, I am now a @kaggle Competitions Master.
✍️Going From Not Being Able To Code To Deep Learning Expert
✅ which notes to hit when learning to program
✅ how to practice Deep Learning effectively
A detailed guide based on my experience:
THREAD: The hardest things about learning deep learning online (based on personal experience):
✅ outmaneuvering thousands of engineers at Reddit, Netflix, Twitter, etc., who are fighting for your attention
✅ carrying on through long periods of time where you don't notice progress
What a cool project!
Plus the code is available and there is a step-by-step walkthrough in the Jupyter NB 🤩
Not sure how it is possible, but the blog post is even better 😁
Model training is done with @fastdotai!
What a treasure! Added to my reading list.
I used Deep Learning to create a Real-Time Sign Language Classifier that runs on the webcam. The whole project is built from scratch.
You can find all the code in my github repo:
Here is a blog post on the same topic: 😃😃
The last job I got using a CV was a developer role 6 years ago.
I then did the @fastdotai courses.
I followed the advice and started to blog.
Roles started to find me.
And they were outstanding.
What makes such a big difference here?
nvt op Tuesday! 🤖
Remember those @kaggle kernels where someone meticulously goes through the hundreds of columns in a DataFrame to reduce their size?
What if I told you there exists an automated way to do just that? 😎
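The tweet refers to an NVTabular op, which isn't reproduced here. For plain pandas, a rough equivalent is a loop that downcasts every numeric column with `pd.to_numeric(..., downcast=...)`, which picks the smallest dtype that can hold the values. `downcast_numeric` is a hypothetical helper; note that downcasting floats to float32 trades away precision, so check that your features tolerate it.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Return a copy of df with integer and float columns downcast
    to the smallest dtype that holds their values."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")  # float32 at most
    return out


df = pd.DataFrame({
    "user_id": np.array([1, 2, 300], dtype=np.int64),
    "score": np.array([0.5, 1.25, 3.0], dtype=np.float64),
})
small = downcast_numeric(df)
print(df.dtypes.to_dict(), "->", small.dtypes.to_dict())
print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

On a real dataset with hundreds of columns the same loop runs unchanged; string columns with few distinct values can additionally be converted with `.astype("category")`.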
🧵 How to become a Machine Learning Engineer without putting in the work?
This thread contains everything I know on the subject.
All the tips and tricks I learned over the last 8 years.
Here it is.
Meta Learning is out! 🚀🥳
Above all, thank you for the warmth and support that you have shown me here on Twitter. That means the world to me and is completely out of this world 🥰
If you would like to continue helping me, any feedback would be greatly appreciated 😊
TIL about log-mean-exp (LME) pooling!
Max pooling can produce very sparse gradients, making it hard for the network to learn. Average pooling is sometimes not applicable to the task.
LME pooling lets you find a spot somewhere between the two extremes.
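For reference, LME pooling with a sharpness parameter r > 0 is LME(x) = (1/r) · log((1/n) · Σᵢ exp(r · xᵢ)). Its gradient with respect to xᵢ is softmax(r · x)ᵢ, so every input receives some gradient, unlike max pooling's one-hot gradient. As r → 0 it tends to the mean, and as r → ∞ to the max. A minimal NumPy sketch; `lme_pool` is a hypothetical helper name, not a library function.

```python
import numpy as np


def lme_pool(x, r=1.0, axis=-1):
    """Log-mean-exp pooling: (1/r) * log(mean(exp(r * x))) along `axis`.

    Small r approaches average pooling, large r approaches max pooling.
    Assumes r > 0.
    """
    x = np.asarray(x, dtype=np.float64)
    z = r * x
    z_max = np.max(z, axis=axis, keepdims=True)
    # Subtract the max before exponentiating to avoid overflow (log-sum-exp trick).
    lme = z_max + np.log(np.mean(np.exp(z - z_max), axis=axis, keepdims=True))
    return np.squeeze(lme, axis=axis) / r


x = [1.0, 2.0, 3.0]
for r in (1e-6, 1.0, 100.0):
    print(r, lme_pool(x, r))  # moves from the mean (2.0) toward the max (3.0)
```

In a network, r can be a fixed hyperparameter or a learned scalar; the same formula applies per channel over the pooled positions.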
THREAD: Can you start learning cutting-edge deep learning without specialized hardware? 🤖
In this thread, we will train an advanced Computer Vision model on a challenging dataset. 🐕🐈 Training completes in 25 minutes on my 3-year-old Ryzen 5 CPU.
Let me show you how...
How to train your TensorFlow or PyTorch RecSys models 400x faster?
Use Merlin Dataloaders 😄
Two new examples just got merged! 🥳
✅ TensorFlow:
✅ PyTorch:
The law of working on machine learning projects:
✅ you are unable to tell if a problem can be solved until you build a baseline
✅ any time estimates you make before building a baseline are fortune-telling
This is my favorite tweet of all time 🙂 I got quite a few new followers, so I thought I'd share it with you.
One thing I would add: `l` and `ll` give you context around the line where the exception was raised.
Favorite recent Jupyter Notebook discovery - the %debug magic:
1. Get an exception.
2. Insert a new cell, type %debug and run it.
An interactive debugger will open, taking you to where the exception occurred and letting you look around!
What does Docker give you as a Data Scientist?
• reproducibility?
• ease of switching between envs?
• anything else?
I keep meeting people who were gaslit into using Docker where it makes no sense for their situation.
Thinking of writing about this.
My second week at @NVIDIA is over! The most amazing thing is that everyone I run into (HR, IT, teammates) is both
✅ great at what they do
✅ very friendly
I am starting to suspect this is by design 😄
And yeah, there are GPUs 😻
A junior data scientist begins working on a project by tweaking the architecture to improve results.
A senior data scientist starts by learning about the business problem that the model is attempting to solve.
I am in love with Optuna 😍
This is one of the least BS libraries I have ever come across.
"You want your parameters optimized? Good.
Because that is what we do"
A tutorial on candidate generation (retrieval) with Merlin Models coming soon 🙂
I finished watching the @fastdotai v4 part 1 lectures and it is the best @fastdotai course yet! 🤗
Highlights:
✅ amazing intro to the history of DL
✅ THE learning path -> create your own dataset, train a model, build an app, deploy
✅ unbelievably good ethics lecture by @math_rachel
I was never too excited about @spacy_io, but OMG was I wrong!
I'm only starting to use it, and it has the nicest, most human-friendly API I have ever come across!
Have you seen spacy.explain?! How cool is that!
And then there is the amazing course:
What is the best way to get started with @PyTorch?
This tutorial by @jeremyphoward!
✅ implement a fully-fledged NN from scratch
✅ learn foundational Deep Learning concepts along the way
✅ get exposed to best practices (set_trace, etc)
Link:
Object detection using @fastai with only 300 examples in the training set.
Total train time: 3 min
Please note the model definition - again, very few lines of code. I annotated the images myself (details in the NB). Predictions are shown in red.
The @kaggle OTTO RecSys competition is underway! 🥳🚀
I am sharing everything that I know.
So far:
• 13x 🥇 medals for posts
• 4x 🥇 medals for kernels
If you'd like to jump into RecSys and be guided by me, I've put together this post for you:
see you there! 🙌🙂
If you were looking for an introduction to modern ML techniques with practical examples, which book should you choose? 🤔
I had the pleasure of reviewing the new book by @rasbt and I believe it is just the answer 🙂
And a very good one at that!
Here is what I found
>> a 🧵 <<
A new @kaggle competition just launched! 🥳
New Playground Series, first in 2023!🚀
Nice, clean tabular dataset where feature engineering will most likely shine ❤️🔥
If you would like to jump right in, I created a starter notebook:
Happy Kaggling! 🙂
I love peering into the future and watching the GTC keynote is my favorite way to do so!
Here are my highlights from the keynote from a few hours ago.
Lots of amazing developments so buckle up and let's go for a ride! 🙂
Probably the best, most actionable advice for getting started on @kaggle I have ever come across.
From a 4x @kaggle Grandmaster and teacher extraordinaire of all things ML and beyond, Chris Deotte:
This guest lecture by @jeremyphoward was a great watch.
A very thorough overview of novel techniques essential to doing deep learning well.
Awesome refresher, plus I picked up a couple of new things that seem extremely useful!
Would you like to study @fastdotai v4 with me? I created this little companion app while going through the course; you can find it here:
Currently only lectures 1 & 2, but I will be releasing a new lecture every Monday. Fun fact: the lectures do not end at 7! 😄
Information on recommender systems is hard to come by.
But did you know that @eugeneyan has put together a list of
✅ 68 RecSys papers and articles
✅ 57 papers and articles on Search and Ranking
This is amazing 🥳🍾 thank you so very much for this!!!
What I thought becoming employable in deep learning would be:
✅learning a lot of math
What it ended up being:
✅learning how to talk about your work
✅being able to point to the things you've done
✅figuring out how to use social media and limiting the negative footprint it has
Let me share with you a full Docker setup for serving a @fastai model. The repository includes:
✅ Jupyter NB for training and saving the model
✅ Starlette endpoint performing inference
✅ Rails frontend
all in < 40 lines of code (excl. HTML + training)
I just gained a superpower 🥳
Thank you so much @fastdotai for execnb! 🙏🙏🙏
I can now:
• treat my notebooks as inputs
• modify code/set variables ➡️ run experiments
• create data flows as I please
OMG can't wait to see what I will build with this 😄
A dream come true 🥰🙏
What my colleagues have achieved is amazing 🙂
• 683x speedup vs CPU
• 43x speedup vs single A100-80GB GPU
Key: use the high memory bandwidth of the GPU for embedding lookups
Available with just three lines of Python code (TF2) 🙂
Read more here:
The world is conspiring to keep you debt-laden and working longer than you physically can.
In the process, you miss out on life.
Learn a high-value skill (machine learning, programming, design) to set yourself free.
This is neat 😊
✅Train a model
✅Realize you want some functionality the framework doesn't provide
✅Define said functionality in your notebook
✅🥳🎉💃
(this is using @fastdotai v2)
This talk between Jensen Huang and @ilyasut from two days ago is a masterclass on what makes ChatGPT tick!
And an awesome first-hand account of Deep Learning history.
@ilyasut shared 2 core ideas behind @OpenAI:
THREAD: What is SHAP?
SHAP = SHapley Additive exPlanations of model predictions on 𝗮 𝗽𝗮𝗿𝘁𝗶𝗰𝘂𝗹𝗮𝗿 𝗲𝘅𝗮𝗺𝗽𝗹𝗲
They answer the question:
How much did each feature contribute to the prediction? 💡
Here is how SHAP does its magic.
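SHAP builds on Shapley values from game theory. Feature i's attribution is its marginal contribution f(S ∪ {i}) − f(S), averaged over every subset S of the other features with weight |S|! (n − |S| − 1)! / n!, and the attributions add up to f(x) − f(baseline). The brute-force sketch below computes exactly that for a tiny model. Treating "absent" features as taking a fixed baseline value is an assumption made here for simplicity: the `shap` library's explainers use background data and much faster approximations, and `exact_shapley` is a hypothetical name, not part of its API.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values.

    Cost grows as 2**n_features, so this is only for tiny models.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear(z):
    return 2 * z[0] - 3 * z[1] + 0.5 * z[2]


x, base = [1.0, 2.0, 4.0], [0.0, 0.0, 0.0]
phi = exact_shapley(linear, x, base)
print(phi)  # for a linear model, phi_i = w_i * (x_i - baseline_i)
print(sum(phi), linear(x) - linear(base))  # contributions add up to the gap
```

For interacting features the credit is shared: with f(z) = z[0] * z[1], x = [1, 1] and a zero baseline, each feature gets 0.5.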
The @kaggle Amex competition just ended 🥳
With it comes this awesome write-up by @ChrisDeotte, who earned (another 😄) 🥇:
• only two but very different archs
• leak-free ensembling and feature selection
And here is the kicker...
Today I am opening my upcoming book for pre-orders! 🥳
👉👈
20 chapters and 15 000 words so far 🙂
Each chapter has received at least one rewrite. Still a long way to go 🙂
The price will increase from $12.50 to $25 upon release.
Can one deliver a RecSys masterclass with a focus on
• online vs batch predictions
• monitoring: data distribution shifts
• model deployment
in 25 minutes? 🤔
Apparently, @chipro can! 🙂 A must-watch.
Link to video:
If you are a data scientist, introduce yourself below 👇
This is an awesome community of people working with data 😊
Let's say hi and learn from each other!
Received my copy today 🥰
Not only the how, but more importantly the why that will take you from your first encounter with deep learning to state of the art across so many domains.
What an amazing book!!!
Came across this really well-written intro to HTML & CSS (it also covers more advanced concepts like flexbox and responsive design).
If you are starting to learn web dev or would like to brush up on some of the concepts, this seems like a great resource.
I just created my first ever Python pkg in 50 mins.
Pre-@fastdotai, I would read docs, not understand much, write 1 line of code, read more docs, run out of time, and get discouraged.
Now I skim docs, skim tutorial, copy license, copy and edit , hack readme,
A lot of very valuable thoughts packed into 21 minutes by @martinfowler
✅ what is 'agile' really?
✅ what is the role of a developer in a well-functioning org?
✅ what are the ethics of being a programmer?
✅ is the software industry a meritocracy?
The most important Machine Learning skill:
👉 How to create a good validation set. 👈
But most people make a couple of basic mistakes.
Read this, and I guarantee you won't be one of them:
>> a thread 🧵 <<
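The thread's specific list of mistakes isn't reproduced here. One very common one is splitting rows at random when several rows belong to the same entity (a customer, patient, or session), so the model is validated on entities it has already seen and the score looks better than it will in production; time-ordered problems have the analogous pitfall of validating on the past. A minimal sketch of a group-aware split, assuming you have one group id per row; `group_split` is a hypothetical helper, not a library function.

```python
import random


def group_split(group_ids, val_fraction=0.2, seed=0):
    """Split row indices so each group lands entirely in train or entirely
    in validation, preventing the same entity from leaking across sets."""
    groups = sorted(set(group_ids))
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_val = max(1, round(len(groups) * val_fraction))
    val_groups = set(groups[:n_val])
    train_idx = [i for i, g in enumerate(group_ids) if g not in val_groups]
    val_idx = [i for i, g in enumerate(group_ids) if g in val_groups]
    return train_idx, val_idx


customers = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
train_idx, val_idx = group_split(customers, val_fraction=0.2)
print(train_idx, val_idx)
```

The same idea extends to time: sort by timestamp and hold out the most recent period instead of a random sample of groups.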
This is so awesome I need to preserve it for posterity
(AKA myself a week from now trying to remember how to do this...) 😄
Excel-style conditional formatting using pandas 🤩
+ font-size control
From @fastdotai's fastbook:
Taking MOOCs changed my life.
• they reignited my passion for learning
• gave me the confidence I can learn complex things on my own
• taught me marketable skills
MOOCs mattered to me.
Be careful before dismissing their enormous potential.
How to build your personal website and serve it for free using GitHub Pages:
👉
👉
My new personal site:
👉
built in around 2 hours without writing a single line of CSS, using 👇
mvp.css is a nice easy way to quickly make a decent-looking web page, without even needing any class attributes. View the source of this page to see how clean the HTML is.
Extremely big heartfelt thank you to everyone who made this possible ❤️
I started taking @fastdotai courses thinking I could achieve anything I applied myself to on my own.
But that was a misconception.
You are only as strong as the people around you.
I ran into SLURM, but its docs are written for MLOps engineers with 20 years of experience.
So I wrote a blog post from the perspective of someone who does devops only when they need to.
SLURM is not scary at all! 😊
Maybe some of this can be of help to you!
🎥 Modern Artificial Intelligence 1980s - 2021 by @SchmidhuberAI!
This talk delivers! 🙂
✅ starts with the Big Bang (literally)
✅ history of everything explained in the first 10 minutes
✅ only accelerates from there 🙂
👉
Here is what I learned...
What is the one resource I would recommend to anyone getting into RecSys?
This lecture by @xamat.
• it covers several foundational methods
• more importantly, it will teach you how to think about RecSys problems
Here are a couple of highlights: