Exploratory Data Analysis (EDA) is the process of investigating your data to discover patterns, anomalies, relationships, and trends using statistical summaries and visual methods.
Let's find out more 🧵👇
In Data Science you can find multiple data distributions...
But where are they typically found? 🤔
This is part 1 - tomorrow I'll share the second one!
Check it out 🧵👇
ARIMA models have three parameters: 'p', 'd' and 'q'.
They need to be optimized... but, before that, do you know how to interpret each of them?
Learn what each of them means here 🧵 👇
ARIMA models are essential in Time Series forecasting.
You can add multiple components to make them fit your particular data:
go from a basic AR model to a complex SARIMAX model! 🧵 👇
Volatility can be a big problem in Time Series forecasting!
Be careful with it:
✅ Low volatility
❌ High volatility
Learn how you can take it into account 🧵👇
ARIMA is really useful for Time Series forecasting; however, it can only forecast one variable at a time...
VAR (Vector AutoRegression) solves this problem!
Discover more 🧵 👇
⭐ Time Series is an essential skill in Data Science.
Don't know where to start?
Here's a roadmap to get you started on the right foot!
Have a look 👇 🧵
After fitting a Time Series model such as ARIMA, you should always check the 𝗿𝗲𝘀𝗶𝗱𝘂𝗮𝗹 𝗱𝗶𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰𝘀 to assess how well your model captures all the patterns in the data.
See how to do it 👇
Data preprocessing is a crucial step in the machine learning pipeline, ensuring that the dataset is ready for training.
One essential aspect of data preprocessing is ✨feature scaling✨, which involves adjusting the range and distribution of the data.
🧵 👇
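As a tiny illustration of "adjusting the range and distribution", here's how two common sklearn scalers transform the same feature (the data is synthetic):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(12)
X = rng.normal(loc=100, scale=20, size=(50, 1))  # feature on an arbitrary scale

X_minmax = MinMaxScaler().fit_transform(X)      # rescales values into [0, 1]
X_standard = StandardScaler().fit_transform(X)  # mean 0, standard deviation 1
```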
Do you want to identify outliers or find a global trend in your Time Series data?
LOWESS may be what you are looking for!
It stands for LOcally WEighted Scatterplot Smoothing, and you can find out more about it here 🧵 👇
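A minimal sketch with statsmodels' `lowess` on noisy synthetic data, using the smoothed trend to flag outliers (the 2-sigma threshold is just one common choice):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)

# frac controls the smoothing window: larger = smoother, more "global" trend
smoothed = lowess(y, x, frac=0.2, return_sorted=False)

residuals = y - smoothed
outliers = np.abs(residuals) > 2 * residuals.std()  # points far from the local trend
```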
You can forecast Time Series data using a Machine Learning algorithm like XGBoost or Random Forest.
However, you need to reframe your problem as a Supervised Learning one.
Learn here how to do it 🧵 👇
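The core idea is to turn past values into feature columns. A minimal sketch with a Random Forest (synthetic data; the number of lags and the split are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
series = pd.Series(np.sin(np.arange(300) * 0.1) + rng.normal(scale=0.1, size=300))

# Reframe: previous n_lags values become features, the next value is the target
n_lags = 5
df = pd.DataFrame({f"lag_{i}": series.shift(i) for i in range(1, n_lags + 1)})
df["target"] = series
df = df.dropna()

X, y = df.drop(columns="target"), df["target"]
split = int(len(df) * 0.8)  # time-ordered split: never shuffle time series
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:split], y[:split])
preds = model.predict(X[split:])
```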
Time Series Forecasting plays a crucial role in predicting future values based on historical patterns.
However, most of the time, to achieve accurate and reliable results, one of the key prerequisites is working with stationary data.
But, why is that? 🤔
🧵 👇
In the ARIMA methodology, the AR part stands for Auto-Regressive model.
An AR model suggests that the current value of a time series is a linear combination of its previous values and a random error term.
Let's find out more about it! 👇 🧵
Make sure your model is considering all your data features equally!
Scaling can be your lifesaver!
Learn how to do it when you have normally distributed features 🧵👇
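For normally distributed features, standardization is the usual choice. A minimal sklearn sketch on synthetic features with very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Two normally distributed features on very different scales
X = np.column_stack([rng.normal(50, 10, 100), rng.normal(0.5, 0.1, 100)])

scaler = StandardScaler()           # standardization: (x - mean) / std
X_scaled = scaler.fit_transform(X)  # each column now has mean ~0 and std ~1
```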
Stationarity is a property of a Time Series where its statistical features such as mean and variance remain constant over time.
It's crucial for Time Series analysis because many statistical models assume stationarity for reliable forecasts.
Find out how to check it 🧵👇
In this week's 💊MLPills we talk about how to discover the Data Distribution of your dataset features.
Join almost 5000 subscribers and don't miss any future issues... for free!
(Check next tweet)
You've trained your ARIMA model, but is it a good model?
Today you'll learn how to evaluate the performance of your model.
You'll also learn when to use each metric 🧵👇
Have you chosen the best model?
You may want to check AIC and BIC.
Let's explore what they are and how they can help in finding the optimal ARIMA model 🧵👇
XGBoost is powerful and very well-known.
But it's not the absolute best for every single case...
Find out how to choose between the best 3️⃣ algorithms for tabular data 🧵👇
Creating the right features for Time Series data can make a significant impact on the performance of your model.
Today I'll introduce 2 key ones, essential for capturing the sequential aspect of time series! 🧵👇
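A minimal pandas sketch of the two workhorses, lag features and rolling-window features (the lag and window sizes are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
s = pd.Series(rng.normal(size=100).cumsum(), name="value")

features = pd.DataFrame({
    "lag_1": s.shift(1),    # value one step ago
    "lag_7": s.shift(7),    # value seven steps ago (e.g. weekly pattern)
    # Rolling mean of the PAST 7 values, shifted so it never sees the current value (no leakage)
    "rolling_mean_7": s.shift(1).rolling(7).mean(),
}).dropna()
```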
Understanding feature importance in machine learning models is essential for interpreting their predictions.
Today I'll share with you 2 methods to get it 🧵 👇
Do you need to build an ARIMA model... but don't want the hassle of selecting the parameters to find the optimal model? 😟
Say hello to autoArima!
It simplifies the process of selecting the best ARIMA model.
👇 🧵
Time Series analysis and forecasting is a really valuable skill to have in your Data Science toolkit.
Here are 4️⃣ reasons WHY you should learn it...
Do you agree? 🧵👇
In time series analysis and forecasting, the Moving Average (MA) model plays a crucial role within the ARIMA framework.
Let's delve into what it entails! 👇 🧵
Generating or engineering features from Time Series data when using an ML approach involves extracting meaningful information that can be used by algorithms to understand patterns, make predictions, or identify trends.
Here are some feature engineering techniques 🧵👇
What is missing data?
Missing data refers to the absence of values in a dataset where they are expected.
It can arise from various reasons, such as:
▶️ Data Entry Errors: Human errors during data entry can lead to missing values. For instance, someone might forget to fill in a field.
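A quick pandas sketch on toy data: detect the missing values, then fill them (median imputation is just one simple option among many):

```python
import numpy as np
import pandas as pd

# Toy dataset with missing values (np.nan)
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50_000, 62_000, np.nan, np.nan],
})

missing_per_column = df.isna().sum()  # how many values are missing in each column
df_filled = df.fillna(df.median(numeric_only=True))  # simple imputation with the column median
```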
Permutation Importance and SHAP are two model-agnostic techniques employed in machine learning for estimating the importance of features within models.
Let's compare these 2 techniques 🧵👇
Would you like to create and train a neural network using TensorFlow and Keras?
You can find the main steps to achieve a simple version of this here 👇
1⃣ Begin by importing the necessary modules:
- Sequential to define a linear stack of network layers
- Dense for fully connected layers
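Putting the steps together, a minimal sketch (the layer sizes, data, and training settings are illustrative, not a recommendation):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy regression data: 100 samples, 4 features
X = np.random.rand(100, 4)
y = np.random.rand(100)

model = keras.Sequential([
    keras.Input(shape=(4,)),                 # declare the input shape
    layers.Dense(8, activation="relu"),      # hidden fully connected layer
    layers.Dense(1),                         # output layer for regression
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)
```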
When evaluating the performance of Time Series forecasting models, several metrics can be used to assess their accuracy and predictive power.
Here are 4️⃣ of the most used metrics for time series forecasting
🧵 👇
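Four of the usual suspects, computed by hand with numpy so the formulas are explicit (the numbers are made up):

```python
import numpy as np

y_true = np.array([100.0, 110.0, 120.0, 130.0])
y_pred = np.array([102.0, 108.0, 123.0, 128.0])

mae  = np.mean(np.abs(y_true - y_pred))                   # Mean Absolute Error
mse  = np.mean((y_true - y_pred) ** 2)                    # Mean Squared Error
rmse = np.sqrt(mse)                                       # RMSE: same units as the data
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # Mean Absolute Percentage Error
```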
Are you familiar with the most common Machine Learning algorithms?
Today, I will complete the Top 10 of the most commonly used ones!
Check them out 🧵 👇
Do you know that you can separate trend and seasonality in your time series data?
Two popular decomposition methods are Seasonal Decompose and STL (Seasonal-Trend decomposition using LOESS).
Let's find out more about them 🧵👇
In Time Series Analysis and Forecasting, a baseline model is a simple model used as a benchmark to compare the performance of more complex models.
Last time we talked about Simple Average...
Let's introduce now Moving Average (MA)! 🧵 👇
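A minimal pandas sketch of the Moving Average baseline: predict each value as the mean of the last few observations (the window size is illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
y = pd.Series(rng.normal(size=100).cumsum())

window = 5
# Predict the next value as the mean of the last `window` observations;
# shift(1) ensures each prediction only uses past values
preds = y.rolling(window).mean().shift(1)
mae = (y - preds).abs().dropna().mean()  # score the baseline for later comparison
```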
What are the steps of any Data Science project?
1️⃣ Define the problem or question to be answered: Clearly articulate the problem you aim to solve or the question you want to address.
2️⃣ Gather and understand the data: Collect relevant data and gain a thorough understanding of it.
Permutation importance is a model-agnostic technique used to assess the importance of features in a model.
This method involves systematically shuffling each feature's values one at a time and measuring the resulting change in model performance.
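That shuffle-and-score procedure is built into sklearn. A minimal sketch on synthetic data, where only 2 of the 5 features actually matter:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=200, n_features=5, n_informative=2, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Shuffle each feature n_repeats times and measure the drop in score:
# a bigger drop means a more important feature
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
importances = result.importances_mean
```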
Today I'll introduce 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 🤖
A useful Machine Learning algorithm that Data Scientists frequently use for both classification and regression problems.
Read more about it 🧵 👇
Cosine similarity is a handy method to measure the similarity between two items.
Widely used in NLP and in Recommendation Systems.
Let's explain it by using a simple example of a content-based recommender system of books 🧵 👇
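A minimal sketch with hypothetical book vectors (the feature values are invented, e.g. genre scores for [fantasy, romance, sci-fi]):

```python
import numpy as np

# Hypothetical book feature vectors: [fantasy, romance, sci-fi]
book_a = np.array([0.9, 0.1, 0.7])
book_b = np.array([0.8, 0.2, 0.9])
book_c = np.array([0.1, 0.9, 0.0])

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (|u| * |v|): 1 = same direction, 0 = unrelated
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

sim_ab = cosine_similarity(book_a, book_b)  # similar books -> close to 1
sim_ac = cosine_similarity(book_a, book_c)  # dissimilar books -> much lower
```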
Decision Trees are a key model in Machine Learning for both classification and regression. 🌳
They use a tree structure for decision-making processes (hence the name).
Find out more about its components 🧵 👇
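To see those components (root, internal splits, leaves), a minimal sklearn sketch that prints a small tree trained on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth=2 keeps the tree small enough to read in full
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))  # text view of the root node, splits, and leaves
```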
Looking to predict one Time Series variable based on another?
Will it be beneficial? ✅ Or not? ❌
You should first check Granger causality.
Check this out👇🧵
Time to introduce the ✨𝗥𝗼𝗼𝘁 𝗠𝗲𝗮𝗻 𝗦𝗾𝘂𝗮𝗿𝗲𝗱 𝗘𝗿𝗿𝗼𝗿✨, another really useful error metric for Time Series and Machine Learning!
Check this out if you are a Data Scientist! 🧑💻
🧵 👇
Have you ever wondered how 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (SVM) can handle non-linear data?
The "𝗞𝗲𝗿𝗻𝗲𝗹 𝗧𝗿𝗶𝗰𝗸" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results!
Let's learn more about it 🧵 👇
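A minimal sklearn sketch showing the trick in action: on concentric circles, no linear boundary exists, but the RBF kernel separates the classes without ever computing the high-dimensional mapping explicitly:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Non-linearly separable data: two concentric circles
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # kernel trick: implicit high-dimensional mapping

acc_linear = linear.score(X, y)  # struggles: no straight line separates circles
acc_rbf = rbf.score(X, y)        # near-perfect separation
```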
There is a kind of Neural Network that can be very useful to forecast Time Series data: Recurrent Neural Networks, or RNNs.
This type of neural network is especially designed to process sequential data, where the order of the data points is crucial, like Time Series.
Build an optimal ARIMA model efficiently.
That's what you can achieve with the Box-Jenkins method.
From raw data to a production-ready model step-by-step 🧵👇