A mathematician dabbling in the world of data science. Researcher at the Tutte Institute for Mathematics and Computing. UMAP, HDBSCAN, PyNNDescent. He / Him.
Our paper on UMAP, a faster alternative to t-SNE, is now up on arXiv! The paper provides a more detailed account of the theoretical underpinnings of the algorithm, as well as performance benchmarks.
The first release candidate for UMAP 0.4 is out, providing lots of new features, including performance improvements, embedding to different manifolds, inverse transforms, and plotting tools.
The latest version of umap-learn is now out. Version 0.5 includes some major new features, including ParametricUMAP, DensMAP, AlignedUMAP, model composition, and model updating. Thank you to everyone who contributed! 1/14
Understanding UMAP - an interactive introduction to the algorithm and how to use (and mis-use) it from
@_coenen
and
@adamrpearce
. A must read for anyone interested in dimension reduction.
UMAP 0.4 is now out! It includes a host of new features, including plotting support, better sparse data support, inverse transforms, and embedding to non-Euclidean manifolds.
pip install umap-learn
See this thread for some of the new features:
An updated and significantly expanded version of our UMAP paper is now on arXiv:
More explanation, more detailed algorithm descriptions, and new experiments looking at stability and at working directly with high dimensional data -- as high as 1.8 million dimensions!
UMAP version 0.3 is now available. You can now add new data to an existing embedding, embed using labelled data, or use both features for metric learning. Documentation is on readthedocs: .
Ever needed a few more colours than the standard colour cycle for your plot? Ever wanted a categorical colour palette based around your own custom colours? With glasbey you can create and extend custom categorical colour palettes with ease.🧵
The new numba based version of UMAP is out. Now faster than ever, it takes only 2.5 minutes to embed the full 70000 points of the 784-dimensional "Fashion MNIST" dataset.
Pynndescent, an approximate nearest neighbor search library, got a major update recently. Index construction is now multicore by default. Querying is now much faster -- competitive with some of the fastest ANN libraries around.
(1/4)
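The core NN-descent idea behind PyNNDescent, refining a random k-NN graph by repeatedly checking reverse neighbours and neighbours-of-neighbours, can be sketched in plain Python. This is a toy illustration only; the actual library adds random projection tree initialization, candidate sampling, and numba compilation:

```python
import math
import random

def nn_descent(points, k, iters=15, seed=0):
    """Toy NN-descent: start from a random k-NN graph, then repeatedly
    add each point's reverse neighbours and neighbours-of-neighbours as
    candidates, keeping only the k closest."""
    rng = random.Random(seed)
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # random initial neighbour lists, sorted by distance
    graph = {
        i: sorted(rng.sample([j for j in range(n) if j != i], k),
                  key=lambda j: dist(i, j))
        for i in range(n)
    }
    for _ in range(iters):
        updated = False
        for i in range(n):
            candidates = set(graph[i])
            for j in range(n):            # reverse neighbours (toy O(n^2) scan)
                if i in graph[j]:
                    candidates.add(j)
            for nb in list(candidates):   # neighbours of neighbours
                candidates.update(graph[nb])
            candidates.discard(i)
            best = sorted(candidates, key=lambda j: dist(i, j))[:k]
            if best != graph[i]:
                graph[i] = best
                updated = True
        if not updated:                   # converged: no list changed
            break
    return graph
```

Since the current neighbours are always among the candidates, each point's neighbour list only ever improves, which is why the refinement converges quickly in practice.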
A new round of Approximate Nearest Neighbour search benchmarking by is out, including lots of new libraries and algorithms.
It is good to see PyNNDescent still performing very well.
My talk at PyData NYC on dimension reduction is now available. Hopefully it provides a useful basic taxonomy to help people navigate the vast zoo of dimension reduction techniques.
A new release of DataMapPlot adds the ability to place labels on top of the map for a word-cloud style look. As usual, there remain lots of options to fine-tune and customize to your needs.
This is some amazing work from
@tim_sainburg
. Some major takeaways:
- lightning fast transform/inverse_transform operations (comparable to PCA if you have a GPU);
- semi-supervised classification: 97.8% accuracy on MNIST with only 4 labelled items per class!
New paper "Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning" with
@leland_mcinnes
and
@TqGentner
! 1/
Have you been frustrated that HDBSCAN doesn't use all your cores, or is too slow? Fast-hdbscan is a numba based version of HDBSCAN that can use all your cores and significantly outperform the hdbscan python package for low-d Euclidean data.
If you have GPU resources handy the new HDBSCAN implementation in
@RAPIDSai
cuML is amazingly fast. You can get to millions of points clustered in only a few minutes!
If you could get a clustering algorithm and library specifically designed for fast clustering of embedding vectors (CLIP, sentence-transformers, Cohere-embed, etc.), what features would you most want it to have?
Playing with some nlp related tools I've been working on, I ended up with some nice visualizations. This is Top2Vec style topic words on a UMAP layout of 20-newsgroups document vectors using masked word-clouds for each newsgroup.
In collaboration with Google, we're releasing Activation Atlases: a new technique for visualizing what interactions between neurons can represent.
💻Blog:
📝Paper:
🔤Code:
🗺️Demo:
A great example of what UMAP is for: look at your data and realise it wasn't what you thought -- and then use it to ask better questions about your data before proceeding with fancier ML tools.
It was only when we visualized the UMAP that we got suspicious: the representations of all IDRs split into two big blobs. That's when we decided to interpret the features, and then we realized: half the features had a big "M" capturing the start methionine.
My talk on topological data analysis at ML Prague is already online! It provides a brief whirlwind tour of why topological methods matter for unsupervised learning problems.
#mlprague
2D UMAP of a 3D woolly mammoth, to build intuitions about how features are preserved in dimensionality reduction. Wonderful 3D scan from the people at
@3D_Digi_Si
.
Hypergraphs and simplicial complexes are going to become ever more prevalent. Here's a great article on some of the reasons why they are so interesting.
I'm considering dropping python 2.7 support for hdbscan and umap-learn. Let me know if this would be extremely painful for you. Also let me know if this would make you happy.
I really want to emphasize how amazing
@numba_jit
is. Pynndescent is pure python code relying on numba for acceleration. It is performance competitive with *highly optimized* C++ code. I still can't actually believe how incredibly well numba works!
Suppose UMAP could represent data not as 2d points, but as 2d gaussians with a full covariance matrix. Would that be useful? What would be the best way to represent that visually?
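One natural visual answer is to draw each point's n-sigma covariance ellipse. A sketch of the standard 2x2 eigendecomposition math for that (a hypothetical helper for illustration, not part of UMAP):

```python
import math

def covariance_ellipse(sxx, sxy, syy, n_std=2.0):
    """Semi-axis lengths and rotation angle (radians) of the n-sigma
    ellipse for the covariance matrix [[sxx, sxy], [sxy, syy]]."""
    tr = sxx + syy
    det = sxx * syy - sxy ** 2
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    lam1, lam2 = tr / 2.0 + disc, tr / 2.0 - disc   # eigenvalues, lam1 >= lam2
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # major-axis orientation
    return n_std * math.sqrt(lam1), n_std * math.sqrt(max(lam2, 0.0)), angle
```

The returned axes and angle can be fed straight into any plotting library's ellipse primitive.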
I have been revisiting pynndescent recently, and with help from the
@numba_jit
team I managed to get some significant performance gains. Preliminary tests on
@fulhack
's ann-benchmarks is looking very promising. Hopefully I'll have a new 0.5 release with these changes out soon.
🚀 Cohere Embed V3 - int8 & binary Support 🚀
I'm excited to launch our native support for int8 & binary embeddings for Cohere Embed V3.
They slash your vector DB cost 4x - 32x while keeping 95% - 100% of the search quality.
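The arithmetic behind the savings: int8 stores one signed byte per dimension instead of a 4-byte float, hence 4x. A minimal scalar quantizer as a sketch of the idea (illustrative only, not Cohere's actual quantization scheme):

```python
def quantize_int8(vec):
    """Scalar-quantize a float vector to values in [-127, 127] with a
    per-vector scale factor."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0  # guard the zero vector
    return [round(x / scale) for x in vec], scale

def dot_int8(qa, sa, qb, sb):
    """Approximate the original float dot product from quantized vectors."""
    return sa * sb * sum(a * b for a, b in zip(qa, qb))
```

Binary quantization pushes the same idea further, keeping only one sign bit per dimension for a 32x reduction.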
Plots not meta enough? Here is a nice UMAP plot of different plots.
From "Viral Visualizations: How Coronavirus Skeptics Use Orthodox Data Practices to Promote Unorthodox Science Online"
Support for an "inverse transform" has been added to UMAP 0.4, providing the ability to generate a high dimensional representation of a point in the embedding space.
AlignedUMAP allows sequences of different UMAP embeddings to be aligned with each other according to relations among the datasets. This can be particularly useful for situations such as time-evolving data. 7/14
An upcoming feature currently in the 0.5dev branch of UMAP will make this much easier to do. e.g.
mapper1 = umap.UMAP(metric="euclidean").fit(continuous_data)
mapper2 = umap.UMAP(metric="dice").fit(discrete_data)
consensus_mapper = mapper1 * mapper2
A paper in
@JOSS_TheOJ
for the UMAP software implementation is now published: .
Thanks to the editors (
@arokem
) and reviewers (
@TerryTangYuan
) for providing such a smooth process for publication.
@DrPattiJones
PCA provides a global linear projection onto the hyperplane defined by the directions of global maximal variance in your data. UMAP attempts to stitch together many local views of the data, each accounting for local variance, into an intermediate structure, and then represent that structure in low dimensions.
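To make the PCA half of the contrast concrete, here is the closed-form first principal component for 2-D data, i.e. the direction of global maximal variance (a small illustrative helper, not library code):

```python
import math

def principal_direction(points):
    """First principal component of 2-D data: the unit eigenvector for the
    largest eigenvalue of the 2x2 covariance matrix, in closed form."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    lam = 0.5 * (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2))
    if sxy:
        v = (lam - syy, sxy)              # eigenvector for eigenvalue lam
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)
```

Projecting onto this single global direction is exactly what UMAP does *not* do: it builds many such local views and glues them together instead.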
Inspired by the t-SNE animation from
@ChaseClarkatUIC
I decided to try something similar for UMAP. Here is an animation for varying values of the n_neighbors parameter. Increasing values give more weight to global structure over local structure.
UMAP now has 1,000 github stars! Thanks to all the users and contributors! There are more features coming in version 0.3 soon, and some exciting ones in very early development.
@ch402
@SuhnyllaKler
@AnthropicAI
An example of current work: is linear optimal transport applied to word vectors a decent sentence/document embedding model? It turns out yes, yes it is.
There's still a long way to go to scale and benchmark on larger datasets, but it's promising.
A new minor release of umap-learn adds some very useful features:
- Updating ParametricUMAP to Keras 3 (kindly contributed by
@fchollet
);
- Initial support for binary embedding vectors with metric="bit_hamming" and metric="bit_jaccard".
Out now, RAPIDS release 21.06! New
#cuML
and
#cuGraph
algorithms, new list functionality, a whole new way to measure
@RAPIDSai
progress with the change to CalVer, and much more!
@F_Vaggi
@leland_mcinnes
FIt-SNE uses an O(N) interpolation scheme to accelerate the computation of the gradient at each step. More details are available in the preprint () or some notes I wrote ()
I belatedly got to experimenting with FIt-SNE from
@GCLinderman
. It's very impressive and very fast -- definitely the implementation you should be using if you want to use t-SNE for visualization.
The ambient coordinates of your data (coming from features) need not be related to the intrinsic notion of distance internal to the data itself. An idea worth wrapping your head around.
Checkout Etienne Becht's bioRxiv preprint that compares UMAP with t-SNE for visualizing CyTOF and scRNAseq data. Many advantages of UMAP over t-SNE for high dimensional single-cell data!
@leland_mcinnes
Documentation for UMAP 0.4 now includes examples of UMAP usage for visualization, exploratory analysis, and scientific publications. If you have a compelling use case, we would love to include it as well.
This was a fantastic series of posts! If you want a well-written intro to some of the ideas in topological data analysis, this is a great place to start.
@asemic_horizon
@scikit_tda
@leland_mcinnes
I wrote a series of posts leading up to some TDA (see "Topology" section here: ) And then a few posts in the TDA family before I lost steam (see Computational Topology section of )
It is a huge testament to the power of
@numba_jit
that a pure python library like PyNNDescent can be performance competitive with C++ libraries from Google (ScaNN), Microsoft (DiskANN), and Facebook (FAISS) among others.
Many, many thanks to the whole
@numba_jit
team!
@EmilyTWinn13
@SC_Griffith
After the flood Noah is checking up on the animals. They're all breeding well, except for a pair of snakes. Noah gets a little worried and follows them. Eventually they find a fallen tree, and suddenly ... lots of baby snakes. It turns out that adders need logs to multiply.
@rctatman
Here's a plan we use: Take the term-frequency matrix, remove the "expected" frequency (by subtracting, or using the column marginal as a noise model), UMAP with hellinger distance, and HDBSCAN for clustering. Still fine tuning the process, but has been very powerful so far.
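For reference, the Hellinger distance used in that pipeline, applied to two normalized frequency rows, is just this (a minimal stdlib version):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, e.g. two
    normalized rows of a term-frequency matrix. Ranges from 0 to 1."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))
```

It behaves like a Euclidean distance on square-rooted frequencies, which is why it works well as a UMAP metric for count data.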
An amazing introduction to UMAP and its parameters. This is for UMAP what the Distill article was for t-SNE. Great work from
@_coenen
and
@adamrpearce
as always!
@michaelhoffman
Many of the t-SNE (and UMAP) plots I see suffer from potential over-plotting issues. This is particularly dangerous if you are trying to eyeball cluster purity. Using such plots as a starting point for further analysis rather than an endpoint is critical.
This is a fascinating paper -- using a contrastive approach on augmentations of images to learn a low dimensional representation they generate truly impressive results for image datasets!
Ever wondered what image datasets look like if they could be visualized? We have developed a new algorithm for visualization based on contrastive learning. Joint work with
@hippopedoid
and
@CellTypist
. The full details are available as a preprint 🧵/16
I've started telling people "Look at your data, because whatever you think you know about the data is almost certainly wrong". I'm not sure it works any better, but at least I warned them...
“Have you tried looking at the data?” is my most common question when talking to folks who are inexperienced with data. Over the last two years, about 90% of the time, the answer has been, “Why?” or “What good would that do?” 🙄
I'll be speaking at the Fields Institute today on using UMAP theory for general unsupervised learning. I'll be happy to chat more about these ideas afterwards as well.
I will be co-chairing the machine learning track at SciPy this year. Submissions are open, so if you have a machine learning project in python consider submitting. This is a great opportunity to share your work with a wide audience.
@SciPyConf
The core neighbor search in UMAP has been expanded upon in a separate library, PyNNDescent, which provides significantly improved performance. Combined with PyNNDescent, UMAP 0.4 now supports multi-core computation end-to-end (MNIST in ~45s on a laptop).