Latest preprint gathering some of our papers using affine splines to dive into AI models. The math is exact, intuitive, and actionable... allowing us to derive new methods that improve SOTAs. Dive in if you want to make AI less of a trial-and-error science!
Deep Neural Networks are powerful... but how do you provably enforce some constraints into them? With
@ylecun
we introduce POLICE, a simple method that does just that, provably, without sampling or changes to your loss/training (and it uses affine splines)!
Do you want a self-supervised-learning kernel (and embedding) of your data without training a deep network? Here it is: SimCLR and VICReg in the kernel regime (no training), to be used whenever training a deep network is not an option!
Keep training your Deep Network past the point of perfect training-set accuracy and its robustness will increase. Why? Because the spline partition keeps concentrating near the decision boundary ➡️ the DN is affine all around the training samples!
-A new paper explaining batch norm!
-Please... there are already >>1 papers showing it helps optimization,
@ylecun
even said it in 1998
-Wait! It does more and can be studied from a spline viewpoint, e.g. batch norm fits a random-weight DN to your data!
⬇️
If you train your AI system without labels, SSL is probably what you will end up using. But you might hit many walls along the way as SSL builds upon decades of research. To help, we compiled this guide:
whether you train/deploy/research, give it a read!
Everything you ever wanted to know about Self-Supervised Learning but were afraid to ask.
A giant cookbook of SSL recipes.
By a large crowd from Meta-FAIR with various academic collaborators led by
@randall_balestr
and Mark Ibrahim.
HUGE diff between Decision Trees (any variant) and Deep Networks explaining generalization/extrapolation: DTs only partition the space where there is training data, while DNs also partition the space where there is no training data by extrapolating the subdivision
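A toy 1-D sketch of that difference (hand-picked weights, purely illustrative): a decision stump extrapolates as a constant, while a tiny ReLU net's spline partition covers the whole line and extrapolates affinely.

```python
import numpy as np

def relu_net(x):
    # Tiny 1-hidden-layer ReLU net with fixed weights; knots at x=0 and x=1.
    return 2.0 * np.maximum(x - 0.0, 0) - 1.5 * np.maximum(x - 1.0, 0)

def stump(x):
    # Depth-1 "tree": constant on each side of a split learned inside the data.
    return np.where(x < 0.5, 0.0, 1.0)

# Far to the right of every knot/split (i.e. outside any training data):
xs = np.array([10.0, 11.0, 12.0])
print(np.diff(stump(xs)))     # zeros: flat extrapolation
print(np.diff(relu_net(xs)))  # constant nonzero steps: affine extrapolation
```

The stump's output never changes beyond its last split, whereas the ReLU net keeps a well-defined affine map on the unbounded outer regions of its partition.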
If you don't use self-supervised learning today, it can only be due to two reasons:
1. you did not hear of SSL yet
2. you don't have time/GPUs to use it
Now that we have resolved 1., check out our latest preprint where we resolve 2.!
Very happy to *finally* share our latest findings with
@ylecun
tying different SSL methods to known spectral embedding methods (in addition to providing as many insights/ideas as we could)...
⬇️ is a very brief summary of some key results :)
Supervised and self-supervised learning? Two separate methods for different cases... one might say! With
@CabannesVivien
@ylecun
and Leon Bottou, we show instead that both live on the same continuum... opening the door to novel principled learning strategies!
Happy that our Active Self-Supervised Learning got accepted at ICCV! We prove that DNNs learn optimal representations from positive data pairing alone. Since positive pairs are way cheaper to query than labels, we also study that new active learning strategy
Latest preprint with Léon Bottou and
@ylecun
on the impact of regularization/data-augmentation on per-class performances (for better or worse)! Using them improves average generalization but some classes will have worse performance than without them
🧵1/4
Batch-normalization (BN)--used in pretty much all non-transformer AI models--minimizes the total least-squares objective between the training points and the model's input-space partition! TLDR: total least-squares is all you need to dive into AI theory!
Every technical person knows about ordinary least-squares (OLS) but most don’t know *total* least-squares (TLS).
These measure fitting error differently: OLS minimizes the sum of squared vertical distances, whereas TLS minimizes the sum of squared orthogonal distances from the data to the fitted line
1/2
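A minimal sketch of both fits on synthetic 2-D data (toy data and names, not from any paper): OLS via ordinary least-squares, TLS via the smallest principal direction of the centered points.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)  # true line: y = 2x + 1

# OLS: minimize squared vertical residuals of y on [x, 1].
A = np.column_stack([x, np.ones_like(x)])
a_ols, b_ols = np.linalg.lstsq(A, y, rcond=None)[0]

# TLS: the smallest right-singular vector of the centered data is the
# normal to the line minimizing squared ORTHOGONAL distances.
P = np.column_stack([x - x.mean(), y - y.mean()])
_, _, Vt = np.linalg.svd(P, full_matrices=False)
nx, ny = Vt[-1]
a_tls = -nx / ny
b_tls = y.mean() - a_tls * x.mean()

print(round(a_ols, 2), round(a_tls, 2))  # both close to the true slope 2
```

With small isotropic noise the two slopes nearly coincide; they diverge when the noise in x grows, which is exactly the regime TLS is built for.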
Very happy to share our preprint, a joint-work with
@imisra_
and
@ylecun
, which is about data-augmentations (DAs), or rather, the expectation and variance of models' predictions and training losses under randomly augmented samples!
(1/5)
Very happy to share that our paper at the intersection of Information Theory/Self-Supervised Learning/Spline Theory got into
#NeurIPS
! We show how to (i) do information theory with deterministic networks and (ii) derive new SSL guarantees/methods from it!
Affine splines enable you to do deep learning theory without resorting to the linearized/kernel regime i.e. you study what practitioners actually deploy. But even more important, splines provide the coolest viz. of deep networks you could dream of!
List of useful spline papers⬇️
How are Deep Neural Networks black-boxes if you can visualize them in an 'exact' manner?
Our new
#CVPR23
paper presents a fast and scalable PyTorch toolbox to visualize the linear regions, aka partition+decision boundary, of any DNN (red🔻)!
🧵 1/N
Learning good representations using manifold learning? Spectral embedding? Energy based models? Self-supervised learning? All share one goal: learning non-collapsed representations with minimal variations. Join
@CabannesVivien
@albertobietti
for a journey:
100% true. That is why I strongly recommend anyone learning deep learning to also take a basic digital signal processing course. At least to get the basics of convolution (CNNs), aliasing (sub-sampling/pre-processing), FIR and IIR filters (RNNs), wavelet thresholding (AEs)
Here is a very good reason why the Nyquist–Shannon sampling theorem requires that your function is low-pass before you sub-sample to downscale. If you just sub-sample without smoothing, a bad guy can place another image exactly on the pixels you sub-sample. Adversarial aliasing.
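A toy numpy demo of that attack (hypothetical 64x64 "images", illustrative only): naive sub-sampling reads only the pixels on the sampling grid, so a hidden image planted there survives perfectly, while a box low-pass before sub-sampling destroys it.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.random((64, 64))    # the image you think you are downscaling
hidden = rng.random((16, 16))   # the image the attacker wants you to see

attacked = clean.copy()
attacked[::4, ::4] = hidden     # overwrite ONLY the pixels on the 4x grid

naive = attacked[::4, ::4]      # sub-sample without smoothing
print(np.array_equal(naive, hidden))   # True: the hidden image comes through

# Box low-pass (4x4 block average) before sub-sampling mixes in the
# off-grid pixels, breaking the attack:
blurred = attacked.reshape(16, 4, 16, 4).mean(axis=(1, 3))
print(np.allclose(blurred, hidden))    # False
```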
Wanna
- use Information Theory
- but with deterministic deep networks
- to study and improve self-supervised learning?
We do just that and explain how in our latest preprint with
@ziv_ravid
@ylecun
@timrudner
and Kenji!
Bonus: it uses affine splines ;)
Learning by reconstruction "easily" provides eye-candy samples... but the learned representation's ability to solve perception tasks is often a letdown. We pinpoint that misalignment, measure it, and show how some denoising tasks (masking) sometimes help
Very happy to introduce our preprint working out the geometry of LLMs... no approximation or simplification! Side effects: we extract informative features from LLMs that can solve various tasks such as toxic prompt detection and we bypass Llama2's RLHF!
Honored to join
@BrownCSDept
to keep pushing for theoretically grounded AI solutions! From self supervised learning (what else do you need?) to fairness, we have one motto: Prove Once Train Once
I want to thank everyone I have talked/pdb/trained/published with... you made me!
Please welcome
@randall_balestr
, joining
@BrownCSDept
as assistant professor! His research focuses on novel theoretical solutions to guide practitioners, to safeguard users, and to pave the way towards a truly autonomous AI solution. Learn more:
Decision trees do not combine input dims at each node but an oblique DT does
1. ODTs are not easily interpretable due to that fact
2. some deep networks can be turned into ODTs (very deep + lots of nodes)
this does not help much for DN interpretability (1+2)
It has never been simpler to prevent DNs from overfitting! Guillotine Regularization (accepted at TMLR) (i) adds a few layers on top of your favorite DN during training, (ii) removes them post-training, and (iii) trains a linear layer on top of the frozen DN!
How to assess SSL models’ downstream performance with no labels, no tuning/training, and in a matter of minutes?
With
@garridoq_
,
@laurentnajman
, and
@ylecun
, we answer this question by introducing RankMe, a simple metric based on the rank of embeddings!
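A hedged sketch of a RankMe-style score (the paper's exact constants and smoothing may differ): the "effective rank" of the embedding matrix, exp of the entropy of its normalized singular values.

```python
import numpy as np

def effective_rank(Z, eps=1e-7):
    # exp(entropy) of the normalized singular-value distribution of Z.
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / s.sum() + eps
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
spread = rng.normal(size=(1000, 64))                               # healthy embeddings
collapsed = np.outer(rng.normal(size=1000), rng.normal(size=64))   # rank-1 collapse

print(effective_rank(spread))     # close to the full dimension 64
print(effective_rank(collapsed))  # close to 1
```

The appeal is that this needs only the embeddings themselves: no labels, no probe training, just one SVD.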
Aaaand we are live from Vienna at poster 1002! Come by to discuss training dynamics, splines, and the two-stage learning that secretly occurs within your deep networks!
Very happy to share our preprint that explains why residual connections provably make the loss surface of deep networks everywhere less erratic and eccentric (better conditioned)... hence resnet/densenet are easier to optimize under SGD out-of-the-box.
1/2
Less is more, which is why we put unsupervised learning on a DIET! By predicting the datum index (as if it were its class), DIET learns SOTA representations without labels!
+ it works without projector/Siamese nets/... on ResNets/ViTs/ConvNeXts/...
WYSIWYG⬇️
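A minimal numpy sketch of the DIET recipe on toy data (linear "encoder" and sizes are illustrative, not the paper's setup): each datum's index is its label, and a plain N-way cross-entropy trains the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 32, 8, 4                       # dataset size, input dim, embedding dim
X = rng.normal(size=(N, D))
W_enc = rng.normal(size=(D, K)) * 0.1    # toy linear "encoder"
W_cls = rng.normal(size=(K, N)) * 0.1    # N-way head: one class per datum

def loss_and_grads(X, W_enc, W_cls):
    Z = X @ W_enc                        # embeddings
    logits = Z @ W_cls
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
    idx = np.arange(N)                   # the "labels" are the datum indices
    loss = -np.log(P[idx, idx]).mean()
    G = P.copy(); G[idx, idx] -= 1; G /= N      # softmax cross-entropy grad
    return loss, X.T @ (G @ W_cls.T), Z.T @ G

losses = []
for _ in range(200):                     # a few full-batch gradient steps
    loss, g_enc, g_cls = loss_and_grads(X, W_enc, W_cls)
    W_enc -= 0.5 * g_enc
    W_cls -= 0.5 * g_cls
    losses.append(loss)
print(losses[0], "->", losses[-1])       # the loss decreases
```

In practice the head is discarded after training and only the encoder's representation is kept.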
Happy to be at
#ICML2022
! And happy to chat/brainstorm about SSL/splines/data-augmentation/... at the
@MetaAI
booth (Tuesday/Wednesday, 8:30 am until early afternoon)... or DM me!
We had found that training with a projector (MLP layers on top of your DN) reduces the biases the DN learns, e.g. from poor data-augmentation. We now find that you can control this effect simply by changing the projector's input dimension!
Self-supervised learning involves many design choices (architecture, data-augmentation, ...) and cross-validation is not always an option. That is why, in our latest paper, we theoretically study the interplay between those choices and provide guidelines:
How to inject prior knowledge into Self Supervised Learning:
-loss
-architecture
-data augmentation
we add a fourth🕑dimension with Guided Positive Sampling:
-embedding space to query positive samples
removing the need to define strong DA + trains faster!
Even if the Fourier transform was not explicitly invoked, it has been present for decades as the preferred convolution algorithm for large image and/or filter sizes!
Here is yet another classic read from
@ylecun
on the subject
Self Supervised Learning learns informative and organized representations of unlabeled data... but involves many moving pieces...
Q:which are necessary and which are sugar coating?
A:
Bonus: removing the sugar coating makes SSL training stable and reliable
Happy that our work on understanding the interplay between architecture/data-augmentation on Self-Supervised Learning downstream perfs. has been accepted at
#ICML2023
! YES, you can successfully use SSL with "bad" DA as long as your DN architecture is right
We hope you have found all the answers you needed in our cookbook around SOTA representation learning with SSL! But wait, we will be giving away even more tips and tricks at our
#ICML2023
tutorial!
Monday/1:30pm local/exhibit hall2
speakers include
@imisra_
@mcaron31
@endernewton
We previously showed () how many SSL methods could be unified using an inter-sample relationship graph (spectral embedding). From that, we now propose a new SSL method: 𝕏-CLR ()!
better loss=less spurious correlations being learned
Representation learning is often done by considering samples to be either identical (same class, positive pairs) or not–with no middle ground. We propose 𝕏-CLR to learn from soft inter-sample relationships, and get better accuracy & improved robustness.
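A hedged sketch of the 𝕏-CLR idea (function names and targets are illustrative, not the paper's exact loss): instead of a hard positive/negative split, match the softmax over embedding similarities to a soft target similarity graph.

```python
import numpy as np

def soft_contrastive_loss(Z, S, tau=0.1):
    """Z: (N, K) L2-normalized embeddings; S: (N, N) soft target graph with
    rows summing to 1. Cross-entropy between the target rows and the
    softmax over similarities, averaged over samples."""
    logits = (Z @ Z.T) / tau
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
    return float(-(S * np.log(P + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 16))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
S = rng.random((8, 8)); S /= S.sum(axis=1, keepdims=True)  # toy soft graph
print(soft_contrastive_loss(Z, S))
```

Setting S to a one-hot graph over positive pairs recovers the usual hard contrastive objective, which is the sense in which soft relationships generalize it.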
Aaaaand we are back on the ground at poster 602 to cover breaking news: learning a representation by reconstruction will not produce something useful for perception tasks! They don't have the same taste in features! Come by to learn why and to discuss alternative solutions!
Excited to share our
#NeurIPS2023
paper explaining part of the per-class accuracy degradation that data augmentation introduces: it creates asymmetric label-noise between coarse/fine classes of the same object e.g. car and wheel! We also find a remedy⬇️
Training dynamics of surrogate quantities, e.g. the loss, are well studied but do not provide many insights into the DN's geometry. Linear-region concentration does just that, and it still exhibits a double-descent dynamic that is controlled by regularization
POLICE code is now available:
Quick facts:
- POLICE only takes 5 lines of code
- code is jit/CPU/GPU friendly (PyTorch)
- it will only take a few minutes to generate all the figures
Eager to see the figures/papers/ideas you will create from it!
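A hedged sanity check you could run on a POLICE-constrained model (this is NOT the method itself, just a test of its guarantee): a function is affine on a convex region iff it commutes with convex combinations of the region's vertices.

```python
import numpy as np

def is_affine_on(f, vertices, n_probes=100, tol=1e-8, seed=0):
    # Probe random convex combinations of the vertices and compare
    # f(sum w_i v_i) against sum w_i f(v_i).
    rng = np.random.default_rng(seed)
    V = np.asarray(vertices, dtype=float)
    fV = np.stack([f(v) for v in V])
    for _ in range(n_probes):
        w = rng.random(len(V)); w /= w.sum()   # random convex weights
        if not np.allclose(f(w @ V), w @ fV, atol=tol):
            return False
    return True

square = [[0, 0], [0, 1], [1, 0], [1, 1]]
affine_f = lambda x: np.array([2 * x[0] - x[1] + 3])
kinked_f = lambda x: np.maximum(x - 0.5, 0)    # ReLU kinks inside the square
print(is_affine_on(affine_f, square))  # True
print(is_affine_on(kinked_f, square))  # False
```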
Interestingly, the relation between ReLU and Swish is well understood from a spline viewpoint, akin to the relation between k-NN and an isotropic GMM:
deterministic vs probabilistic region assignment!
The same goes for absolute value vs Mish, and many more!
More at
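A tiny numpy illustration of that deterministic-vs-probabilistic view: ReLU makes a hard region assignment x ↦ 1[x > 0]·x, Swish x·sigmoid(βx) makes a soft one, and as β grows the soft assignment sharpens until Swish recovers ReLU.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def swish(x, beta=1.0):
    # x * sigmoid(beta * x), written via tanh for numerical stability.
    return x * 0.5 * (1.0 + np.tanh(0.5 * beta * x))

x = np.linspace(-5, 5, 101)
for beta in (1, 10, 1000):
    gap = np.abs(swish(x, beta) - relu(x)).max()
    print(beta, gap)   # the gap shrinks as beta increases
```

This mirrors how a GMM's soft responsibilities sharpen into the hard assignments of k-NN/k-means as the temperature goes to zero.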
with all the sparsity-aware context based memory loading papers coming out, (PowerInfer getting 11x and Apple getting 25x speedup on GPU) ReLU's dead zone is turning out to be important
llama-class models (SwiGLU) might not have much longevity after all
once all the Metal work
Our
#CVPR2023
submission has been accepted! ()
We develop an exact and fast algorithm to compute a Deep Network's partition, characterizing its geometry and decision boundary, e.g. to rapidly sample from the latter for viz/active learning!
Code:
Not too surprising since e.g. batch-norm with random weights provably aligns the DN's partition to the data geometry:
just from its mini-batch statistics!
"The Expressive Power of Tuning Only the Norm Layers"
led by
@AngelikiGiannou
&
@shashank_r12
We show that large frozen networks maintain expressivity even if we only fine-tune the norm & bias layers.
Awesome website summarizing our latest TMLR paper demonstrating how deep network pruning can be easily explained/visualized and improved simply by formulating it in terms of the DN's spline partition!
Paper:
Code:
Vision Language Models have fueled recent AI breakthroughs... but the next generation will need to do more than just scale up dataset and model sizes! Dive into our latest preprint and benchmark library to understand why and to stress-test your ideas!
Happy to have four papers accepted to
#NeurIPS2022
! Shoutout to incredible co-authors/colleagues
@imisra_
@ylecun
@bobak_kiani
and Leon Bottou! I will tweet about each in the coming days... but⬇️
TLDR: Never stop improving papers from reviews/comments... perseverance is the key!
Happy to share our
#CVPR2022
paper w/
@imtiazprio
,
@rbaraniuk
providing a simple solution to provably sample from the (anti-)modes of pre-trained generative networks... also leading to new StyleGAN2/3/BigGAN FID SOTAs
🧵(1/4)
colab:
Delighted to share that our work with
@garridoq_
and
@ylecun
got an oral+poster at
#ICML2023
! We enable truly label-free hyper-parameter search for SSL (validated on SimCLR/VICReg/DINO/.. and many datasets) aiming for best linear perf. without fine-tuning!
A (Deep) Network has always been "any computational graph with forward and (optionally) backward data-flow" (see⬇️). This is a big class that includes kernels, trees, k-NN... just by a change of architecture. So when people say they are moving away from DNs, do they mean moving away from computers?
Happy to share our accepted
@TmlrSub
with
@eiclab
about deep network (DN) pruning from an affine spline perspective!
In short, pruning removes/projects the DN partition boundaries (nice avenues to theoretically understand/improve pruning)
Some insights ⬇️
Amazing work resulting from an amazing collaboration made possible by
@forai_ml
and
@sarahookr
!
TLDR: we still have a lot to learn around what brings stochasticity in deep network training (init/batching/DA) and by how much. This paper takes an important step in quantifying them!
Our newest Paper Profiles video goes behind the scenes of our recent community-driven research collaboration, "FAIR-Ensemble: When Fairness Naturally Emerges from Deep Ensembling." Thanks to
@weiyinko_ml
and
@mrdanieldsouza
for taking the time to chat!
Accepted as an oral
#CVPR2022
!🥳
Taking this opportunity to say that this comes as a result of years of work building a theoretical bridge between deep networks<->continuous piecewise affine operators. Theory is a guide that reduces the set of unknowns to be cross-validated 🧑🔧!
SSL and supervised learning unified under one loss (only the inter-sample similarity graph varies between them) at
#ICCV23
Friday/10:30/Nord/023
Hello to cheap expert-free active/supervised learning by asking if samples come from the same class, not asking for the class label!
New Constructive Approximation paper: Deep Networks with (leaky-)ReLU and least-squares loss have a continuous piecewise-quadratic per-layer loss landscape. From that, we precisely study how the DN architecture impacts the loss landscape and SGD convergence!
Two ICLR23:
- spotlight: new fine-grained labels for ImageNet + insights into failure modes of models/DA/losses
- poster: theory unraveling the failures that emerge when deploying self-supervised learning on uncurated data
One goal: understanding when/why deep learning can fail
⬇️
Delighted to share our latest preprint with Bobak Kiani,
@ylecun
, and Seth Lloyd where we propose an **efficient and scalable** gradient based training of orthogonal/unitary matrices (e.g. used in each layer of a recurrent network/convolutional network).
+1! Academic programs should favor candidates who did not get that chance, to mentor them and bring them to the top for post-PhD adventures. Taking PhD candidates to bloat your group's publication count in year one goes against the academic spirit... (teaching statement anyone?)
If you have multiple papers before you even began a PhD, it likely means you had access that others didn't.
I wish more PhD programs would take a step back and stop this absurd practice of favoring multiple papers before someone even begins a training program.
RankMe: cheap/fast label-free hparam selection for DNNs will be at ICML
- oral: ballroom B 4pm (local) Wed.
- poster⬇️: exhibit hall 1
#609
1:30pm Thur.
Also includes insights around representations' ranks, their surprising consistency across datasets, ...
Delighted to share our
@TmlrOrg
paper with F. Bordes and P. Vincent! We use the latest diffusion model to interpret/visualize the features of black-box models (DNNs, ...) by conditioning the generation with the model's features.
We obtain many insights⬇️⬇️
"We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be.
(1/2)
Do you remember POLICE (fast PrOvable LInear Constraints Enforcement for deep networks)? An application to adversarial robustness is now available
POLICE can be used as a one-shot robustifier, or during training/fine-tuning!⬇️
Delighted to share that--by dint of all my coauthors--I will be at
#ICML2024
to present our findings! From LLM geometry to adversarial grokking, without forgetting the provable benefits of moving away from reconstruction for representation learning!
link:
Self-Supervised Learning methods make strong a priori assumptions about the type of data distribution you train on. With
@mido_assran
et al. we highlight what those assumptions are and how to tune them to our advantage, e.g. to improve SSL on uncurated and/or imbalanced data
Very happy to share that we will be presenting that work at
#ICML2024
! Moving away from reconstruction is key to learn better semantic abstractions... but we can only do so by first understanding why learning by reconstruction falls short!
With very large models and/or slow training frameworks (GPT-3, self-supervised learning, ...) I believe that theoretically-backed methods will regain ground... brute-force cross-validation of everything is no longer an option! MuTransfer by
@TheGregYang
embodies that perfectly
Another benefit of batch norm lies in the randomness of the mini-batch statistics (from one batch to the next), which induces a jittering effect in the partition and increases the decision boundary's margin to training samples! The batch size sets the jittering strength and thus acts as a control knob.
We will be presenting this work at
#ICML2024
diving deeper into LLMs' geometry and how that can help in understanding their current limitations. For example, increasing a prompt's intrinsic dimension bypasses RLHF!
Congrats
@Rom_Cosentino
@shekkizh
🔔LLM update!
- The few hundred features we extract from Mistral/Llama2-7B to characterize your prompt (e.g. for domain separation or toxicity detection) also work on Llama2-70B
- We validate them on the official Jigsaw Kaggle challenge and reach SOTA
In our latest preprint we show that Deep Ensembles have fairness benefits even when each model uses the same training set/architecture/optimizer. We also characterize by how much random init./data-augmentation/data-ordering impact the learned model between training episodes :)
Our new preprint is out! In FAIR-Ensemble, we explore per-group performance after averaging the predictions of Deep Networks (same architecture, hyper-parameters), and fairness naturally emerges!
Paper:
Code:
1/8
Thanks to an amazing team (
@byoubii
@D_Bouchacourt
@marksibrahim
et al.) we are releasing fine-grained distribution shift annotations for each Imagenet eval image and many train ones along with controlled robustness analysis of many SOTA models e.g. looking at the impact of DA!
Even when looking at high-dimensional spaces and architectures with thousands of units per layer and multiple layers, training the constrained model is only about 4-5x slower than the unconstrained one, which is a small cost to pay for a provable constraint enforcement method!
In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort."
Sir Ronald Fisher, (last page)
Very happy to speak at SSL4EO! The core part of my talk will follow our latest papers to (i) provide principled insights into SSL, and (ii) give guidelines to design your own pipeline:
1/4: why do we need to move away from learning by reconstruction ()
⬇️
Aaaand we are now covering that surprising event happening on poster 705 with guest
@shekkizh
: diving into LLMs geometry and using those insights to derive features from pretrained models or to bypass RLHF through natural prompt manipulations!
POLICE can also be applied to classification/SSL tasks: you do not need to change your loss, optimizer, or architecture, and its cost is just that of a single forward pass through your model times the number of vertices defining the constrained region
By upgrading the FFCV library, we enable super fast/single GPU training of SSL. We also explore how to cross-validate SSL methods, and show that known failure cases were just the result of poor hyper-parameters!
Huge effort by Florian Bordes (MVP!), Pascal Vincent and myself :)
Our geometric characterization of LLMs ( at
#ICML2024
) tied the prompts' intrinsic dimensions to their ability to make an LLM's generation toxic.
@Tenyx_AI
researchers extended our results for reasoning!
Q: Can reasoning and safe generation coexist with LLMs?
Unlocking better reasoning in LLMs can go beyond just longer context & bigger models!
Our recent research () offers a geometric view of the expressive power and reasoning capabilities of LLMs. Stay tuned for more insights!
@Rom_Cosentino
#LLM
#Reasoning
Impressive program at the upcoming World AI Cannes Festival in France
@WAICANNES
! The AI Society / AI Today & Tomorrow track alone has an impressive list of speakers including
@ylecun
, anyone can attend (for free) with the Discovery Pass !
As a bonus, our findings also explain why a long training time is required to "finally" capture the features useful for perception tasks as part of the representation. Our findings open new avenues to speed up training through new denoising tasks!
New preprint + AI4Science
#ICML2024
workshop: ScaLES!
ScaLES provides a differentiable confidence score for samples generated from pretrained models. Applied to Latent Space Optimization, ScaLES improves the solutions to black-box optimization problems!
You have to train an ensemble of Deep Networks with the same training set and architecture.
Q: How to maximize the ensemble fairness?
1. vary the weight initialization
2. vary the data sampling
3. vary the data-augmentation seed
4. all the above
Answer at the AFT workshop on Friday!
@weiyinko_ml
@mrdanieldsouza
Find out more about what model design choices can mitigate unfair outcomes by reading "FAIR-Ensemble: When Fairness Naturally Emerges from Deep Ensembling."
📜
The DN partition boundary (random weights) shows a higher concentration of regions around the data distribution (from using batch norm alone!). This fitting is proved analytically: batch norm statistics shift/bend the partition boundaries toward the data, and depth is crucial!⬇️
We always hear about applied deep learning results relying on many tricks, but never about theoretical results around deep learning that often rely on even more assumptions! Both are highly specialized, need fine-tuning to work on new cases and both struggle to impress each other
Provable control of the quality and diversity of sampling for pre-trained deep generative networks ... without any additional learning! Check us out at
#CVPR2022
tomorrow at 8:30 am in Hall B1, Oral Session 3.1.1, and Poster Session 3.1
Real-world deep learning interview problems and solutions:
Interesting resource for everyone! I especially enjoy the thorough references that have been put throughout the set of problems!
@ISusmelj
can't anyone add a feature so that if the inferred vehicle type changes chaotically for 5 sec, it's just rendered as an "unidentifiable blob" until it stabilizes? At least it would not look as buggy on the frontend...
Furthermore, Batch-Normalization, which is known to force the DN partition to concentrate near the training samples (), prevents robustness from emerging! In fact, Grokking is all about understanding the DN partition's migration dynamics...
We indeed need a more closed-loop SSL e.g. where data-augmentations of positive pairs are guided by the deep network's guess (action) of what new views (sensory inputs) would provide it with new information leading to a sharper understanding of the presented image/scene!
#NeuroAI
: Could principles of embodied sensorimotor neuroscience unify and improve the various Self-Supervised Learning (SSL) methods? How could the brain self-supervise itself?
We are happy to share our
#NeurIPS2022
paper with
@franz_scherr
and Q. Guo🧵:
@WriteArthur
Well, yes there are the natural statistics of the training data that will influence the generation, but the type of DA/regularization that was used during training also plays a huge role. See for example our latest preprint on that point
One can only imagine the vast amount of knowledge that was distilled into this, which is why this end result would never have been possible without all the incredible co-authors who decided to collaborate for one purpose: sharing their knowledge and past experiences!