David Andrés 🤖📈🐍

@daansan_ml

Followers: 11,305
Following: 404
Media: 841
Statuses: 9,323

📈 I summarise Machine Learning, NLP and Time Series concepts in an easy and visual way • 💊Follow me in 👉 Inquiries in david@mlpills.dev

Spain
Joined May 2022
Pinned Tweet
@daansan_ml
David Andrés 🤖📈🐍
16 days
Logistic Regression clearly explained 👇
6
234
1K
@daansan_ml
David Andrés 🤖📈🐍
8 months
Exploratory Data Analysis (EDA) is a process used for investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods. Let's find out more 🧵👇
17
646
3K
@daansan_ml
David Andrés 🤖📈🐍
7 months
In Data Science you can find multiple data distributions... But where are they typically found? 🤔 This is part 1 - tomorrow I'll share the second one! Check it out 🧵👇
14
362
2K
@daansan_ml
David Andrés 🤖📈🐍
7 months
There are several types of data distributions you might encounter in a dataset. Here are some common ones 👇🧵
18
256
1K
@daansan_ml
David Andrés 🤖📈🐍
7 months
Is your data normal? 🤔 What I mean is: does your data follow a normal distribution? Discover this elegant distribution 🧵👇
13
265
1K
@daansan_ml
David Andrés 🤖📈🐍
15 days
Normal Distribution clearly explained 👇
4
241
1K
@daansan_ml
David Andrés 🤖📈🐍
8 months
ARIMA is one of the most popular traditional statistical methods used for time series forecasting. THREAD 🧵 👇
19
178
889
@daansan_ml
David Andrés 🤖📈🐍
6 months
ARIMA models have three parameters: 'p', 'q' and 'd'. They need to be optimized... but, before that, do you know how to interpret each of them? Learn what each of them means here 🧵 👇
13
198
811
@daansan_ml
David Andrés 🤖📈🐍
7 months
Where can you find the most common data distributions? (2nd part) Check this thread for real-world examples! 🧵 👇
7
156
764
@daansan_ml
David Andrés 🤖📈🐍
6 months
Time Series data with seasonality? Split it into its main 3 components! Check an example here (code at the end) 👨‍💻 🧵 👇
7
145
747
@daansan_ml
David Andrés 🤖📈🐍
6 months
Are you familiar with the most common Machine Learning algorithms? Today, I introduce 6 of the most commonly used ones! Check them out 🧵 👇
10
193
730
@daansan_ml
David Andrés 🤖📈🐍
7 months
ARIMA models are essential in Time Series forecasting. You can add multiple components to make them fit your particular data: go from a basic AR model to a complex SARIMAX model! 🧵 👇
17
163
715
@daansan_ml
David Andrés 🤖📈🐍
7 months
ARIMA is one of the most popular traditional statistical methods used for time series forecasting. THREAD 🧵 👇
10
145
694
@daansan_ml
David Andrés 🤖📈🐍
11 months
ARIMA is one of the most popular traditional statistical methods used for time series forecasting. THREAD 🧵 👇
21
145
680
@daansan_ml
David Andrés 🤖📈🐍
7 months
Volatility can be a big problem in Time Series forecasting! Be careful with it: ✅ Low volatility ❌ High volatility Learn how you can take it into account 🧵👇
14
135
678
@daansan_ml
David Andrés 🤖📈🐍
6 months
Do you want to forecast seasonal time series data? Remove the seasonality and add it back at the end! That's basically what the STL method does.
7
140
656
@daansan_ml
David Andrés 🤖📈🐍
7 months
ARIMA is really useful for time series forecasting; however, you can only forecast 1 variable at a time... VAR (Vector AutoRegression) solves this problem! Discover more 🧵 👇
9
153
658
@daansan_ml
David Andrés 🤖📈🐍
6 months
Do you have outliers in your data? What should you do with them? 🤔 Here's a guide on effectively managing them 🧵 👇
16
158
648
@daansan_ml
David Andrés 🤖📈🐍
1 year
How can you detect outliers? But first of all, what are outliers? 🤔 🧵 👇
20
145
602
@daansan_ml
David Andrés 🤖📈🐍
7 months
⭐ Time Series is an essential skill in Data Science. Don't know where to start? Here is a roadmap to help you start on the right foot! Have a look 👇 🧵
13
148
595
@daansan_ml
David Andrés 🤖📈🐍
8 months
After fitting a Time Series model such as ARIMA, you should always check the 𝗿𝗲𝘀𝗶𝗱𝘂𝗮𝗹 𝗱𝗶𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰𝘀 to assess how well your model captures all the patterns in the data. See how to do it 👇
9
130
586
@daansan_ml
David Andrés 🤖📈🐍
14 days
K-Nearest Neighbors clearly explained 👇
6
134
575
@daansan_ml
David Andrés 🤖📈🐍
1 month
In Data Science you can find multiple data distributions... But where are they typically found? 🤔 This is part 1 - tomorrow I'll share the second one! Check it out 🧵👇
4
112
571
@daansan_ml
David Andrés 🤖📈🐍
1 year
What is data normalization, and how can it be achieved? Let's find out more about this! 🧵 👇
11
114
553
@daansan_ml
David Andrés 🤖📈🐍
7 months
What is data smoothing? ...and why may you need it? 🤔 Read this thread to learn more about it! 🧵 👇
9
112
550
@daansan_ml
David Andrés 🤖📈🐍
7 months
Your data is possibly too noisy! You can try these 2️⃣ techniques to discover its trend, seasonality or even outliers! 🧵 👇
10
98
543
@daansan_ml
David Andrés 🤖📈🐍
6 months
Having an imbalanced dataset is a problem. 😟 Discover SMOTE, it can help you deal with this! 🧵 👇
32
87
537
@daansan_ml
David Andrés 🤖📈🐍
8 months
Data preprocessing is a crucial step in the machine learning pipeline, ensuring that the dataset is ready for training. One essential aspect of data preprocessing is ✨feature scaling✨, which involves adjusting the range and distribution of the data. 🧵 👇
7
106
533
@daansan_ml
David Andrés 🤖📈🐍
6 months
Do you want to identify outliers or find a global trend in your Time Series data? LOWESS may be what you are looking for! It means Locally Weighted Scatterplot Smoothing, and you can find out more about it here 🧵 👇
10
100
533
@daansan_ml
David Andrés 🤖📈🐍
7 months
5 great courses to learn Time Series Analysis and Forecasting in Python 🧵👇👇👇
11
121
527
@daansan_ml
David Andrés 🤖📈🐍
5 months
You can forecast Time Series data using a Machine Learning algorithm like XGBoost or Random Forest. However, you need to reframe your problem as a Supervised Learning one. Learn here how to do it 🧵 👇
9
109
524
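The reframing mentioned in the tweet above is often done with lagged values as features. Here is a minimal sketch of that idea, assuming a plain list as the series; the helper name `make_supervised` and the `n_lags` parameter are hypothetical and not taken from the thread. The resulting `X` and `y` could then be passed to a model such as XGBoost or Random Forest.

```python
def make_supervised(series, n_lags):
    """Turn a 1-D series into (X, y) pairs, using the previous n_lags values as features."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # lagged values, oldest first
        y.append(series[t])                   # the value to predict
    return X, y


X, y = make_supervised([10, 20, 30, 40, 50], n_lags=2)
print(X)  # [[10, 20], [20, 30], [30, 40]]
print(y)  # [30, 40, 50]
```

Each row of `X` only contains values that come before its target, so the time order is preserved.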
@daansan_ml
David Andrés 🤖📈🐍
7 months
🚨Your data may be hiding a trend, seasonality or even outliers! Let's learn 2️⃣ basic techniques to smooth your data and get rid of the noise 🧵 👇
14
100
522
@daansan_ml
David Andrés 🤖📈🐍
8 months
Linear Regression is a fundamental algorithm in supervised Machine Learning used for predictive modeling. Learn more about it here 🧵 👇
12
107
517
@daansan_ml
David Andrés 🤖📈🐍
8 months
Time Series Forecasting plays a crucial role in predicting future values based on historical patterns. However, most of the time, to achieve accurate and reliable results, one of the key prerequisites is working with stationary data. But, why is that? 🤔 🧵 👇
5
87
510
@daansan_ml
David Andrés 🤖📈🐍
4 months
In the ARIMA methodology, the AR part stands for Auto-Regressive model. An AR model suggests that the current value of a time series is a linear combination of its previous values and a random error term. Let's find out more about it! 👇 🧵
9
112
503
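The AR description in the tweet above (current value = linear combination of previous values + random error) can be written as a one-step calculation. This is a minimal sketch, not a fitted model: the function name `ar_one_step` and the coefficient values are hypothetical, chosen only for illustration.

```python
def ar_one_step(history, coeffs, const=0.0, error=0.0):
    """One AR(p) step: const + sum(coeffs[i] * value from i+1 steps back) + error."""
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[-p:][::-1] if p else []  # most recent value first
    return const + sum(c * v for c, v in zip(coeffs, recent)) + error


# AR(2) with made-up coefficients: 0.5 on the last value, 0.25 on the one before.
value = ar_one_step([1.0, 2.0, 3.0], coeffs=[0.5, 0.25])
print(value)  # 0.5 * 3.0 + 0.25 * 2.0 = 2.0
```

In practice the coefficients are estimated from data, and the error term is the part the model cannot explain.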
@daansan_ml
David Andrés 🤖📈🐍
8 months
You can forecast Time Series data using a Machine Learning algorithm like XGBoost or Random Forest. However, you need to reframe your problem as a Supervised Learning one. Learn here how to do it 🧵 👇
3
118
495
@daansan_ml
David Andrés 🤖📈🐍
7 months
Make sure your model is considering all your data features equally! Scaling can be your life saver! Learn how to do it when you have normally distributed features 🧵👇
13
105
479
@daansan_ml
David Andrés 🤖📈🐍
6 months
Discover how Kernel Smoothing can uncover hidden trends in your data! Do you know this Data Smoothing technique? Find out more here 🧵 👇
7
104
482
@daansan_ml
David Andrés 🤖📈🐍
6 months
Stationarity is a property of a Time Series where its statistical features such as mean and variance remain constant over time. It's crucial for Time Series analysis because many statistical models assume stationarity for reliable forecasts. Find out how to check it 🧵👇
12
119
464
@daansan_ml
David Andrés 🤖📈🐍
8 months
What is the difference between Classification and Regression in Machine Learning? 🤔 🧵 👇
13
114
442
@daansan_ml
David Andrés 🤖📈🐍
6 months
Too much noise on your time series data? Looking for hidden trends? You may want to consider data smoothing. Here's when to use it 🧵 👇
5
85
453
@daansan_ml
David Andrés 🤖📈🐍
7 months
In this week's 💊MLPills we talk about how to discover the Data Distribution of your dataset features. Join almost 5000 subscribers and don't miss any future issues... for free! (Check next tweet)
3
95
445
@daansan_ml
David Andrés 🤖📈🐍
5 months
What is the difference between Classification and Regression in Machine Learning? 🤔 🧵 👇
4
93
418
@daansan_ml
David Andrés 🤖📈🐍
5 months
ARIMA is one of the most popular traditional statistical methods used for time series forecasting. Let's understand its components 🧵 👇
6
97
418
@daansan_ml
David Andrés 🤖📈🐍
2 months
You've trained your ARIMA model, but is it a good model? Today you'll learn how to evaluate the performance of your model. Also when to use each metric 🧵👇
6
105
411
@daansan_ml
David Andrés 🤖📈🐍
9 days
Retrieval Augmented Generation (RAG) for LLM systems clearly explained 👇
6
104
491
@daansan_ml
David Andrés 🤖📈🐍
6 months
How can you estimate a suitable value for 'p' in your ARIMA model? Here you have the definitive guide! 🧵👇
8
83
398
@daansan_ml
David Andrés 🤖📈🐍
6 months
Have you chosen the best model? You may want to check AIC and BIC. Let's explore what they are and how they can help in finding the optimal ARIMA model 🧵👇
12
112
395
@daansan_ml
David Andrés 🤖📈🐍
7 months
XGBoost is powerful and very well-known. But it's not the absolute best for every single case... Find out how to choose between the best 3️⃣ algorithms for tabular data 🧵👇
12
82
396
@daansan_ml
David Andrés 🤖📈🐍
6 months
Creating the right features for Time Series data can make a significant impact on the performance of your model. Today I'll introduce 2 key ones, essential for capturing the sequential aspect of time series! 🧵👇
9
76
388
@daansan_ml
David Andrés 🤖📈🐍
5 months
Understanding feature importance in machine learning models is essential for interpreting their predictions. Today I'll share with you 2 methods to get it 🧵 👇
8
82
384
@daansan_ml
David Andrés 🤖📈🐍
7 months
🚨NEVER split your data randomly! At least when working with Time Series data... Learn here what the dangers of doing so are 🧵 👇
13
76
383
@daansan_ml
David Andrés 🤖📈🐍
7 months
Do you need to build an ARIMA model... but you don't want the hassle of selecting the parameters to find the optimal model? 😟 Say hello to autoArima! It simplifies the process of selecting the best ARIMA model. 👇 🧵
12
77
358
@daansan_ml
David Andrés 🤖📈🐍
7 months
What is the difference between seasonality and cyclicality in time series forecasting❓ Discover it below 👇 🧵
6
91
359
@daansan_ml
David Andrés 🤖📈🐍
11 months
You can forecast Time Series data using a Machine Learning algorithm like XGBoost or Random Forest. However, you need to reframe your problem as a Supervised Learning one. Learn here how to do it 🧵 👇
11
81
356
@daansan_ml
David Andrés 🤖📈🐍
6 months
ACF and PACF are two important concepts in time series analysis, especially if what you need is an ARIMA model! Let's understand what they are🧵 👇
9
88
350
@daansan_ml
David Andrés 🤖📈🐍
4 months
In time series analysis, the trend component is key. It indicates the directional movement of data over time. Let's learn more about the trend 👇🧵
7
87
351
@daansan_ml
David Andrés 🤖📈🐍
3 months
After fitting a Time Series model such as ARIMA, you should always check the 𝗿𝗲𝘀𝗶𝗱𝘂𝗮𝗹 𝗱𝗶𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰𝘀 to assess how well your model captures all the patterns in the data. See how to do it 👇 🧵
4
85
354
@daansan_ml
David Andrés 🤖📈🐍
7 months
Time Series analysis and forecasting is a really valuable skill to have in your Data Science toolkit. Here are 4️⃣ reasons WHY you should learn it... Do you agree? 🧵👇
11
77
346
@daansan_ml
David Andrés 🤖📈🐍
4 months
In time series analysis and forecasting, the Moving Average (MA) model plays a crucial role within the ARIMA framework. Let's delve into what it entails! 👇 🧵
5
91
341
@daansan_ml
David Andrés 🤖📈🐍
10 months
Data preprocessing is a crucial step in the machine learning pipeline, ensuring that the dataset is ready for training. One essential aspect of data preprocessing is ✨feature scaling✨, which involves adjusting the range and distribution of the data. 🧵 👇
7
76
332
@daansan_ml
David Andrés 🤖📈🐍
10 months
Discover one of the most used feature scaling techniques: ✨Min-Max Scaling✨ 🧵 👇
12
59
324
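For reference, Min-Max Scaling as named in the tweet above usually means rescaling values to the range [0, 1] with (x - min) / (max - min). A minimal sketch in plain Python follows; the function name `min_max_scale` is hypothetical, and raising an error for a constant column is a design choice made here, not something stated in the thread.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot scale a constant column (max equals min)")
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([2, 4, 6])
print(scaled)  # [0.0, 0.5, 1.0]
```

When scaling a train/test split, the minimum and maximum should come from the training data only.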
@daansan_ml
David Andrés 🤖📈🐍
6 months
Which value of "d" should you choose for your ARIMA model? Today I present an easy method to find it! 🧵 👇
6
78
322
@daansan_ml
David Andrés 🤖📈🐍
7 months
Generating or engineering features from Time Series data when using an ML approach involves extracting meaningful information that can be used by algorithms to understand patterns, make predictions, or identify trends. Here are some feature engineering techniques 🧵👇
15
90
325
@daansan_ml
David Andrés 🤖📈🐍
11 months
Prophet is an open-source library developed by Facebook for Time Series Forecasting and has many advantages. Find 6️⃣ of them below 🧵 👇
7
59
318
@daansan_ml
David Andrés 🤖📈🐍
11 months
What is missing data? Missing data refers to the absence of values in a dataset where they are expected. It can arise from various reasons, such as: ▶️Data Entry Errors: Human errors during data entry can lead to missing values. For instance, someone might forget to fill in a
10
87
318
@daansan_ml
David Andrés 🤖📈🐍
3 months
What is the difference between seasonality and cyclicality in time series forecasting❓ Discover it below 👇 🧵
3
73
323
@daansan_ml
David Andrés 🤖📈🐍
4 months
Permutation Importance and SHAP are two model-agnostic techniques employed in machine learning for estimating the importance of features within models. Let's compare these 2 techniques 🧵👇
5
86
325
@daansan_ml
David Andrés 🤖📈🐍
6 months
Does my data have a Unit Root? What is that, and why is it important in Time Series forecasting? 🧵👇
9
69
318
@daansan_ml
David Andrés 🤖📈🐍
5 months
Would you like to create and train a neural network using TensorFlow and Keras? You can find the main steps to achieve a simple version of this here 👇 1⃣ Begin by importing the necessary modules: - Sequential to define a linear stack of network layers - Dense for fully
4
75
320
@daansan_ml
David Andrés 🤖📈🐍
1 year
What is data smoothing? ...and why may you need it? 🤔 Read this thread to learn more about it! 🧵 👇
17
71
313
@daansan_ml
David Andrés 🤖📈🐍
1 month
There are several types of data distributions you might encounter in a dataset. Here are some common ones 👇🧵
3
73
321
@daansan_ml
David Andrés 🤖📈🐍
8 months
When evaluating the performance of Time Series forecasting models, several metrics can be used to assess their accuracy and predictive power. Here are 4️⃣ of the most used metrics for time series forecasting 🧵 👇
14
81
307
@daansan_ml
David Andrés 🤖📈🐍
4 months
How can you assess whether your ARIMA model is good or not? One way is checking the "summary" that the statsmodels library offers you 👇 🧵
6
72
313
@daansan_ml
David Andrés 🤖📈🐍
5 months
Doing feature engineering for your Time Series data? Here is an interesting technique: "Time Since an Event" 🧵 👇
5
62
307
@daansan_ml
David Andrés 🤖📈🐍
8 months
Cleaning your data before building your Time Series model is crucial. Learn how to do it, step by step 🧵👇
8
68
309
@daansan_ml
David Andrés 🤖📈🐍
11 months
Yesterday we released a new article: "How to forecast Time Series data using XGBoost?" 🤔 Discover it below 👇
16
64
304
@daansan_ml
David Andrés 🤖📈🐍
6 months
Are you familiar with the most common Machine Learning algorithms? Today, I will complete the Top 10 of the most commonly used ones! Check them out 🧵 👇
6
49
302
@daansan_ml
David Andrés 🤖📈🐍
5 months
How can you estimate the value of the MA term - q - in your ARIMA model? Here you have a step-by-step guide! 🧵👇
7
68
301
@daansan_ml
David Andrés 🤖📈🐍
7 months
Do you know that you can separate trend and seasonality in your time series data? Two popular decomposition methods are Seasonal Decompose and STL (Seasonal-Trend decomposition using LOESS). Let's find out more about them 🧵👇
9
54
296
@daansan_ml
David Andrés 🤖📈🐍
8 months
Last week I heard about the "Fuzzy Time Series"... I had never heard about that before, so I researched it. Here's what I found 🧵👇
5
60
287
@daansan_ml
David Andrés 🤖📈🐍
4 months
What is the seasonal component in time series analysis? Let's break it down! 👇🧵
2
74
285
@daansan_ml
David Andrés 🤖📈🐍
1 year
Cleaning your data before building your Time Series model is crucial. Learn how to do it, step by step 🧵👇
8
79
275
@daansan_ml
David Andrés 🤖📈🐍
9 months
In Time Series Analysis and Forecasting, a base model is often a simple model used as a benchmark to compare the performance of more complex models. Last time we talked about Simple Average... Let's now introduce the Moving Average (MA)! 🧵 👇
10
62
275
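As a baseline, the Moving Average from the tweet above is commonly taken to mean "forecast the next value as the mean of the last few observations". The sketch below assumes that reading; the function name `moving_average_forecast` and the `window` parameter are hypothetical. Note that this baseline is different from the MA term inside ARIMA, which models past forecast errors.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


forecast = moving_average_forecast([3, 5, 7, 9], window=2)
print(forecast)  # (7 + 9) / 2 = 8.0
```

Setting `window` to the full length of the series gives the Simple Average baseline instead.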
@daansan_ml
David Andrés 🤖📈🐍
7 months
What are the steps of any Data Science project? 1️⃣ Define the problem or question to be answered: Clearly articulate the problem you aim to solve or the question you want to address. 2️⃣ Gather and understand the data: Collect relevant data and gain a thorough understanding of
9
69
271
@daansan_ml
David Andrés 🤖📈🐍
5 months
Permutation importance is a model-agnostic technique used to assess the importance of features in a model. This method involves systematically shuffling each feature's values one at a time and measuring the resulting change in model performance.
12
55
265
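The shuffling procedure described in the tweet above can be sketched in plain Python. This is an illustrative version only, assuming mean squared error as the performance measure and a toy model that uses just the first feature; the names `permutation_importance_sketch` and `toy_predict` are hypothetical, and this is not the scikit-learn implementation.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = mse(y, [predict(row) for row in X_perm])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model: it only looks at feature 0 and ignores feature 1.
def toy_predict(row):
    return 2.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(8)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance_sketch(toy_predict, X, y)
print(importances)  # feature 0 gets a positive score, feature 1 gets 0.0
```

Because the toy model never reads feature 1, shuffling it cannot change the error, which is exactly what an importance of zero means here.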
@daansan_ml
David Andrés 🤖📈🐍
3 months
Today I'll introduce 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 🤖 A useful Machine Learning algorithm that Data Scientists frequently use for both classification and regression problems. Read more about it 🧵 👇
5
64
261
@daansan_ml
David Andrés 🤖📈🐍
8 months
Cosine similarity is a handy way to measure how similar two items are. It is widely used in NLP and in Recommendation Systems. Let's explain it by using a simple example of a content-based recommender system of books 🧵 👇
5
58
261
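Cosine similarity, as mentioned in the tweet above, is the dot product of two vectors divided by the product of their lengths. The sketch below uses made-up book vectors (hypothetical genre scores for fantasy, romance and science); they are assumptions for illustration and do not come from the thread.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical genre vectors: [fantasy, romance, science]
book_a = [1, 0, 1]
book_b = [1, 0, 0]
book_c = [0, 1, 0]

sim_ab = cosine_similarity(book_a, book_b)  # shares the fantasy genre
sim_ac = cosine_similarity(book_a, book_c)  # no genre in common
print(sim_ab, sim_ac)
```

A content-based recommender could rank candidate books by this score against a book the reader already liked.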
@daansan_ml
David Andrés 🤖📈🐍
7 months
Decision Trees are a key model in Machine Learning for both classification and regression. 🌳 They use a tree structure for decision-making processes (hence the name). Find out more about their components 🧵 👇
7
44
254
@daansan_ml
David Andrés 🤖📈🐍
6 months
Your models may be impacted by outliers! 🚨 From where may these outliers be coming? Let's find out the possible sources 🧵 👇
8
66
256
@daansan_ml
David Andrés 🤖📈🐍
1 year
What are the steps of any Data Science project? 1️⃣ Define the problem or question to be answered: Clearly articulate the problem you aim to solve or the question you want to address. 2️⃣ Gather and understand the data: Collect relevant data and gain a thorough understanding of
11
62
246
@daansan_ml
David Andrés 🤖📈🐍
7 months
Looking to predict one Time Series variable based on another? Will it be beneficial? ✅ Or not? ❌ You should first check Granger causality. Check this out👇🧵
8
52
242
@daansan_ml
David Andrés 🤖📈🐍
8 months
Would you like to create and train a neural network using TensorFlow and Keras? You can find the main steps to achieve a simple version of this here 👇 1⃣ Begin by importing the necessary modules: - Sequential to define a linear stack of network layers - Dense for fully
13
62
248
@daansan_ml
David Andrés 🤖📈🐍
1 year
Time to introduce the ✨𝗥𝗼𝗼𝘁 𝗠𝗲𝗮𝗻 𝗦𝗾𝘂𝗮𝗿𝗲𝗱 𝗘𝗿𝗿𝗼𝗿✨, another really useful error metric for Time Series and Machine Learning! Check this out if you are a Data Scientist! 🧑‍💻 🧵 👇
10
50
244
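The Root Mean Squared Error named in the tweet above is the square root of the average squared difference between actual and predicted values. A minimal sketch, with a hypothetical function name `rmse` and made-up numbers:

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


error = rmse([3.0, 5.0], [0.0, 1.0])  # errors of 3 and 4
print(error)  # sqrt((9 + 16) / 2)
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.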
@daansan_ml
David Andrés 🤖📈🐍
8 months
ARIMA models with more than 1 variable? I introduce you to the ARIMAX models! 🧵 THREAD🧵 👇
4
53
241
@daansan_ml
David Andrés 🤖📈🐍
6 months
Using an ML approach like an XGBoost model to forecast Time Series Data? Extract the maximum information from the date 👇 Read more in the post below!
5
66
244
@daansan_ml
David Andrés 🤖📈🐍
3 months
Have you ever wondered how 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (SVM) can handle non-linear data? The "𝗞𝗲𝗿𝗻𝗲𝗹 𝗧𝗿𝗶𝗰𝗸" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results! Let's learn more about it 🧵 👇
5
61
241
@daansan_ml
David Andrés 🤖📈🐍
1 month
Where can you find the most common data distributions? (2nd part) Check this thread for real-world examples! 🧵 👇
4
63
238
@daansan_ml
David Andrés 🤖📈🐍
11 months
There is a kind of Neural Network that can be very useful to forecast Time Series data. These are called Recurrent Neural Networks or RNN. This type of neural network is especially designed to process sequential data, where the order of the data points is crucial, like Time
8
71
235
@daansan_ml
David Andrés 🤖📈🐍
7 months
Build an optimal ARIMA model efficiently. That's what you can achieve with the Box-Jenkins method. From raw data to a production-ready model step-by-step 🧵👇
6
47
238