Jonathan Larkin

@jonathanrlarkin

Followers
3,984
Following
3,872
Media
289
Statuses
2,925

Investor/allocator @Columbia ; fmrly CIO @quantopian , Global Head of Equities @ Millennium, Eq Derivs Trading @jpmorgan CIB | Kaggle Master | marketneutral.eth

New York, USA
Joined March 2013
Pinned Tweet
@jonathanrlarkin
Jonathan Larkin
11 months
fyi
Tweet media one
0
3
18
@jonathanrlarkin
Jonathan Larkin
2 years
Inspired by @__mharrison__ and the Styling chapter in Effective Pandas. Embed @matplotlib plots inside your @pandas_dev dataframe. #dataviz 😍
Tweet media one
7
140
962
@jonathanrlarkin
Jonathan Larkin
7 months
Unpopular opinion: Causality is **not relevant** in the majority of #quantfinance modeling applications! “Successful prediction does not require correct causal identification.” Causal relationships are important if you want to **intervene** in a system. Quant traders are not
24
33
315
@jonathanrlarkin
Jonathan Larkin
4 days
I'm #hiring a #Quantitative Analyst. Please see the link in the next post. We are a small #investment team working on a heterogeneous portfolio. We work with tabular, time series, and text data to support/reject investment views, boost investment team efficiency, measure and
21
25
261
@jonathanrlarkin
Jonathan Larkin
2 years
@bantg Paging OG #cypherpunks . They have the playbook.
Tweet media one
4
21
161
@jonathanrlarkin
Jonathan Larkin
4 years
Some Friday fun while the model trains. You can use the @matplotlib xkcd styling with #seaborn 😀
Tweet media one
6
21
159
@jonathanrlarkin
Jonathan Larkin
5 months
Tweet media one
1
8
126
@jonathanrlarkin
Jonathan Larkin
2 years
@kjhealy Not exactly the first page but pretty much anywhere in Karatzas and Shreve you find things like this. “Obviously”! Duh!
Tweet media one
5
5
113
@jonathanrlarkin
Jonathan Larkin
4 years
Julia is the #DataScience and #MachineLearning language of the future. Look how easy it is to parallelize an expensive (say feature engineering) function across columns. Incredible. #JuliaLang
Tweet media one
7
31
107
@jonathanrlarkin
Jonathan Larkin
7 months
Can the entire Python and PyData community please instantaneously decide and agree that we will all use `lets-plot` as the one and only plotting backend? Admit defeat and that ggplot and the grammar-of-graphics approach is far superior and nothing, until lets-plot, in the Python
14
16
104
@jonathanrlarkin
Jonathan Larkin
5 years
`voila` is the most exciting project in the @ProjectJupyter ecosystem right now. If you do work in notebooks, stop everything you are doing and watch this SciPy talk from last week by @maartenbreddels , @martinRenou , and @QuantStack .
3
27
76
@jonathanrlarkin
Jonathan Larkin
3 months
Wow yeah. This is fantastic. From zero to pretty-good-production-RAG in less than 30 minutes.
@HamelHusain
Hamel Husain
3 months
This talk by @bclavie is the highest value per second talk I have ever watched on RAG. Chapter summaries and additional links in next tweet.
15
144
1K
1
11
79
@jonathanrlarkin
Jonathan Larkin
2 years
@ChristophMolnar Classical ML techniques like SVM/SVR and KNN are making somewhat of a comeback these days due to nvidia's cuML library. What's old is new. For example,
3
10
77
@jonathanrlarkin
Jonathan Larkin
2 years
@marktenenholtz Great thread! Here is a stunningly good resource to go through the specifics.
4
6
74
@jonathanrlarkin
Jonathan Larkin
1 year
Finance quants... @kaggle competition alert! New @OptiverGlobal competition focuses on the Nasdaq closing cross.
Tweet media one
3
10
70
@jonathanrlarkin
Jonathan Larkin
4 years
The backtest vs the out of sample. #quantfinance
Tweet media one
1
4
69
@jonathanrlarkin
Jonathan Larkin
3 years
It’s very exciting to see deep learning having a moment in #quantfinance . Both Optiver and Jane Street winning @kaggle solutions are examples.
4
4
63
@jonathanrlarkin
Jonathan Larkin
4 years
I am hiring a financial data scientist! Lots of fun and interesting things to work on (NLP, noisy time series stuff, small data problems, graphical models, risk modeling, and, yes, some dashboarding). Please take a look at this posting! 📈🦾🙏 In NYC...
4
13
62
@jonathanrlarkin
Jonathan Larkin
5 years
Hey this looks really nice. Rapid and natural exploratory data analysis.
@ManQuantTech
Man Group Quant Tech
5 years
D-Tale version 1.5.0 has been released complete with Jupyter Notebooks integration: #Jupyter #dtale
2
19
69
0
13
59
@jonathanrlarkin
Jonathan Larkin
2 years
@__mharrison__ after a cell throws error, execute %debug in the *next* cell and you get put into pdb at the point of error; nbdime for nb diffs; mixing bash and python, like `this_dir = !pwd`
0
0
57
@jonathanrlarkin
Jonathan Larkin
2 years
Exciting to see successes at @CrowdCent , @CrunchDAO , @microprediction , @numerai , @QuantConnect . Institutional finance hasn’t yet had disruption, but likely will; specifically wrt the competition for research/technical/scientific talent in the years to come. 📈🥇 #machinelearning
Tweet media one
3
17
52
@jonathanrlarkin
Jonathan Larkin
3 years
I’ve been looking at sophisticated imputation strategies (probabilistic PCA, generalized low rank models; both loved by academics). The LightGBM iterative imputer by @analokmaus shown in Rob’s session blows those away. Amazing stuff to be found hidden in @kaggle notebooks.
@abhi1thakur
abhishek
3 years
🚀 This is tomorrow at 5pm CET! Learn all about handling missing values in tabular data from Kaggle Grandmaster, Rob Mulla! 🎉 Here is the youtube live link: There will also be Q&A from the audience!
Tweet media one
2
20
110
0
5
53
@jonathanrlarkin
Jonathan Larkin
6 years
Did I do this right? My first meme. #PortfolioOptimization
Tweet media one
2
13
51
@jonathanrlarkin
Jonathan Larkin
4 years
@sh_reya SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
4
1
51
@jonathanrlarkin
Jonathan Larkin
3 years
This is an excellent talk. Two hours of gold; no time wasted. Could be called the Zen of Pandas. @dontusethiscode makes great content for an intermediate audience. If you use @pandas_dev seriously and are frustrated 90%+ of the time (all of us) watch this.
1
10
52
@jonathanrlarkin
Jonathan Larkin
3 years
Quant researchers often build strategies and test signals using rank IC; i.e., the ability to sort the universe effectively based on future performance. Here is a great new paper about using ML techniques from the “learning to rank” domain in this process.
5
9
49
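As a rough illustration of the rank IC mentioned in the post above, the sketch below computes a Spearman rank correlation between a signal and subsequent returns in plain Python. The function names and the toy numbers are assumptions made for illustration; nothing here is taken from the linked paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_ic(signal, future_returns):
    """Spearman rank correlation between a signal and subsequent returns."""
    if len(signal) != len(future_returns) or len(signal) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rs, rr = average_ranks(signal), average_ranks(future_returns)
    mean_s = sum(rs) / len(rs)
    mean_r = sum(rr) / len(rr)
    cov = sum((a - mean_s) * (b - mean_r) for a, b in zip(rs, rr))
    var_s = sum((a - mean_s) ** 2 for a in rs)
    var_r = sum((b - mean_r) ** 2 for b in rr)
    if var_s == 0 or var_r == 0:
        raise ValueError("rank IC is undefined when one input is constant")
    return cov / (var_s * var_r) ** 0.5


# Hypothetical cross-section: five stocks, one signal value and one forward return each.
signal = [0.9, 0.1, 0.5, 0.7, 0.3]
future_returns = [0.04, -0.02, 0.01, 0.00, -0.01]
ic = rank_ic(signal, future_returns)
```

Only the ordering of the two inputs matters here, which is why a rank-based score fits the "sort the universe" framing: any monotone rescaling of the signal leaves the result unchanged.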
@jonathanrlarkin
Jonathan Larkin
2 years
Tweet media one
0
4
50
@jonathanrlarkin
Jonathan Larkin
2 months
@SamAltsMan I read the health paper and it’s not encouraging. There is a short-lived benefit but after three years there is no difference between the health outcomes of the treatment and control groups. Right?
3
0
50
@jonathanrlarkin
Jonathan Larkin
2 years
Tweet media one
2
6
48
@jonathanrlarkin
Jonathan Larkin
2 years
What's something that @kaggle competitors know but is not well appreciated by #machinelearning practitioners in industry? "Adversarial Validation"...reduce overfitting/make a model generalize better. I made a notebook on it for the Ubiquant competition.
1
14
48
@jonathanrlarkin
Jonathan Larkin
5 years
TIL that *inside the Bloomberg terminal* you can launch a @ProjectJupyter Lab Python session, write queries against arbitrary Bloomberg data, build screens, run alpha factors, etc. and visualize. Very nice integration by @TechAtBloomberg
1
11
46
@jonathanrlarkin
Jonathan Larkin
1 year
@DrJimFan looks familiar...
Tweet media one
0
5
44
@jonathanrlarkin
Jonathan Larkin
7 years
#NaturalLanguageProcessing in #python finds hidden linkages in stocks. My new post on @quantopian #MachineLearning
Tweet media one
2
20
40
@jonathanrlarkin
Jonathan Larkin
2 years
@kliu128 Talk about scope creep. Three laws only please.
0
1
41
@jonathanrlarkin
Jonathan Larkin
4 years
I finished in top 16% in @kaggle M5 Accuracy competition. No medal but an enjoyable effort in the past few weeks. Plus I did my submission 💯 in #JuliaLang — very performant, nice for EDA, dead simple parallelism. Looking forward to the next competition. 😊
3
1
39
@jonathanrlarkin
Jonathan Larkin
5 years
Wow. This looks great. Get (basic) interactive charts without changing any code or any part of your process. Just add a single line of code.
@justmarkham
Kevin Markham
5 years
🐼🤹‍♂️ pandas trick #96 : Want to create interactive plots using pandas 0.25? 📊 1. Pick one: ➡️ pip install hvplot ➡️ conda install -c conda-forge hvplot 2. pd.options.plotting.backend = 'hvplot' 3. df.plot(...) 4. 🥳 See example 👇 #Python #DataScience #pandas #pandastricks
Tweet media one
8
182
652
1
4
38
@jonathanrlarkin
Jonathan Larkin
4 years
My new @kaggle kernel for trade selection in Jane Street: I train @PyTorchLightnin model simultaneously on *multiple return targets*, <= to the final horizon. This is a corollary to @lopezdeprado 's triple barrier method. 1/N
1
2
40
@jonathanrlarkin
Jonathan Larkin
3 years
I took this class this past Fall. It’s outstanding. Goes from rigorous theory tracing the history of consensus from the 1980s to today; progresses all the way up to DeFi and bleeding edge topics like layer 2/scaling, optimism, zk, validium (note: no coding).
@Tim_Roughgarden
Tim Roughgarden
3 years
Lecture 1 of my Foundations of Blockchains lecture series is now available: (Will try to post one new lecture a week for the next 2-3 months.) tl;dr thread below: 1/12
72
341
2K
1
4
39
@jonathanrlarkin
Jonathan Larkin
5 months
@amasad I absolutely love Replit and support lifelong learning but this particular example seems like a recipe for disaster. Massive technical debt build up incoming. “Non coder” who thinks devs are slow (because they are writing tests, thinking about maintainability, considering design
2
2
39
@jonathanrlarkin
Jonathan Larkin
2 years
I am doing some research on MEV and came across a YouTube video which promises "$1200/day in profits with Frontrun Bot on Uniswap Mempool". Just copy his code, connect Metamask, deploy with Remix, deposit ETH into the contract, and click "Start". What could go wrong??? 1/N
6
4
38
@jonathanrlarkin
Jonathan Larkin
4 years
Anyone looking at the @kaggle Jane Street competition? I’m working on a kernel to make sense of the anonymized features. Hierarchical rank corr matrix maps clusters to feature metadata tags. Some clues emerging.
Tweet media one
1
2
39
@jonathanrlarkin
Jonathan Larkin
7 years
TSNE, PCA, DBSCAN… #MachineLearning in the service of pairs trading. My new post on @quantopian using @scikit_learn .
Tweet media one
0
11
38
@jonathanrlarkin
Jonathan Larkin
5 months
@Thom_Wolf @AnthropicAI @cohere Command R+ is great. We need a better term for “open source model but with a highly restrictive license”. A true open source model is MIT or Apache licensed.
3
3
37
@jonathanrlarkin
Jonathan Larkin
4 years
Good morning to everyone except those who think AI doesn't work in their industry.
1
5
35
@jonathanrlarkin
Jonathan Larkin
2 years
I’m looking forward to NUMERCON and meeting the many talented and extraordinary data scientists in the Numerai community. Hope to see you there!
@numerai
Numerai
2 years
Join Jonathan Larkin at • NUMERCON • 1 April 2022 • San Francisco • @jonathanrlarkin is a Managing Director at Columbia Investment Management Co., LLC. Register for in-person and remote:
Tweet media one
2
4
27
4
6
36
@jonathanrlarkin
Jonathan Larkin
2 years
It was such a pleasure and honor to meet @ylecun in person and talk about Cicero, ChatGPT and what his vision is for the next stage of AI.
Tweet media one
1
2
35
@jonathanrlarkin
Jonathan Larkin
3 months
Tweet media one
2
2
35
@jonathanrlarkin
Jonathan Larkin
7 years
Major contrib to portfolio construction field!! Multi-period optim w/tcost, constraint priority #python #cvxpy
Tweet media one
1
8
30
@jonathanrlarkin
Jonathan Larkin
4 months
Tweet media one
1
3
33
@jonathanrlarkin
Jonathan Larkin
7 months
I'm slowly digesting, internalizing, reading and re-reading, watching YouTube, etc., content on #causalinference , both general (e.g., Book of Why, Statistical Rethinking) and specific to finance (e.g., LdP causal factor paper). This is a different paradigm to me, so it's slow
1
3
33
@jonathanrlarkin
Jonathan Larkin
2 years
@tszzl It’s crickets though for Jupyter notebooks right? Is there any AI assistance that works inside a notebook?
4
0
30
@jonathanrlarkin
Jonathan Larkin
6 years
@SebastianThrun ’s work in crowdsourcing and democratizing access to education in technical and quantitative fields has been inspirational to me. Proud to have worked on the #ArtificialIntelligence in #Trading nanodegree with @udacity
Tweet media one
0
4
30
@jonathanrlarkin
Jonathan Larkin
2 years
I'm "all-in" on foundation models (LLMs/diffusion models). Their abilities have surpassed all expectation; anyone who says otherwise is moving goal posts. To remain grounded I remind myself of Weizenbaum's distinction between deciding and choosing. FMs are deciding not choosing.
0
4
30
@jonathanrlarkin
Jonathan Larkin
6 years
I published my first public @kaggle kernel! Can you infer the risk model used to residualize returns given raw data and the residual? I explore this with the latest @twosigma competition data. #Kaggle #KernelsAward
Tweet media one
1
3
28
@jonathanrlarkin
Jonathan Larkin
4 years
This is such an obvious winner. The python data scientist is expected to know all sorts of devops stuff and how to scale models to the cloud. JuliaHub’s forthcoming one-click cluster deployment is 🔥 and lets data scientists focus on...data science. #JuliaLang
@Viral_B_Shah
Viral B. Shah
4 years
We still haven't made JuliaHub's new compute capabilities available broadly. But every day I use it internally, I feel like I have a supercomputer attached to my local VS Code #julialang session. Learn more by signing up for the webinar.
0
6
42
2
9
28
@jonathanrlarkin
Jonathan Larkin
2 years
Hey Siri, cancel all my meetings tomorrow.
@karpathy
Andrej Karpathy
2 years
🔥 New (1h56m) video lecture: "Let's build GPT: from scratch, in code, spelled out." We build and train a Transformer following the "Attention Is All You Need" paper in the language modeling setting and end up with the core of nanoGPT.
Tweet media one
525
3K
20K
0
0
28
@jonathanrlarkin
Jonathan Larkin
5 years
This paper by Tucker Balch et al is 🔥! Portfolio Inference: given only time series of fund returns, learn stocks the strategy held??!! Novel application of #machinelearning in finance. "Sequential Oscillating Selection" solves 500 C 30 problem in seconds.
2
11
27
@jonathanrlarkin
Jonathan Larkin
3 years
Looking at some portfolio construction stuff closely after a long absence. This package is spectacular and faithful to how a proper institutional quant thinks about the process.
0
3
27
@jonathanrlarkin
Jonathan Larkin
6 years
“This paper applies a denoising filter to the whole time series before predicting it, meaning that each point has information from the future in it. And the authors also added trading costs to their PL” and other gems 😂🎁
@StatModeling
Andrew Gelman et al.
6 years
Zak David expresses critical views of some published research in empirical quantitative finance
1
16
71
0
5
26
@jonathanrlarkin
Jonathan Larkin
3 years
@mollyfmielke This book is beautiful. Not programming per se; rather abstract CS.
Tweet media one
0
0
26
@jonathanrlarkin
Jonathan Larkin
5 years
@jakevdp Or you run a Jupyter terminal and use emacs and try M-w to copy the region and Chrome intercepts it and closes the tab. Love that.
2
1
24
@jonathanrlarkin
Jonathan Larkin
4 years
@sh_reya Which of course is solved with: `import warnings; warnings.filterwarnings("ignore")` 🙈😂
2
0
25
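For readers who want something narrower than the blanket filter joked about above, here is a minimal sketch using only the standard library `warnings` module. `ChainedAssignmentWarning` and `noisy_step` are hypothetical stand-ins, not pandas names; with pandas one would presumably pass its own warning class as `category`.

```python
import warnings


class ChainedAssignmentWarning(UserWarning):
    """Hypothetical stand-in for a library warning such as SettingWithCopyWarning."""


def noisy_step():
    # Emits one warning we want to silence and one we still want to see.
    warnings.warn("value set on a copy", ChainedAssignmentWarning)
    warnings.warn("this API is going away", DeprecationWarning)
    return 42


# catch_warnings restores the previous filters on exit, so the suppression is scoped.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # surface everything by default
    warnings.filterwarnings("ignore", category=ChainedAssignmentWarning)
    result = noisy_step()

caught_categories = [w.category for w in caught]
```

The later `filterwarnings` call is inserted ahead of the earlier `simplefilter`, so only the one named category is dropped while the deprecation warning is still recorded.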
@jonathanrlarkin
Jonathan Larkin
11 months
In finance, data is small, signal is low. Does #machinelearning work in such a setting? In deep learning we see overparameterized models memorize the training set and *not* overfit. 🤔 Is double descent applicable to the financial domain? Read this.
3
1
25
@jonathanrlarkin
Jonathan Larkin
4 years
Looking thru some old code today. Came across my implementation of long/short portfolio optimization under a historical CVaR (expected shortfall) constraint. Love these kinds of problems! #quantfinance
Tweet media one
2
0
25
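A small sketch of the quantity behind a historical CVaR (expected shortfall) constraint, assuming equally weighted historical scenarios. It only evaluates the constraint for a given long/short weight vector; the optimizer itself is not shown, and the function names, scenario numbers, and the 3.5% limit are all made up for illustration rather than taken from the code in the post.

```python
def portfolio_returns(weights, scenario_returns):
    """Portfolio return in each historical scenario (one row per scenario)."""
    return [sum(w * r for w, r in zip(weights, row)) for row in scenario_returns]


def historical_cvar(returns, tail_fraction=0.05):
    """Average loss over the worst `tail_fraction` of scenarios (loss = -return)."""
    if not 0 < tail_fraction <= 1:
        raise ValueError("tail_fraction must be in (0, 1]")
    if not returns:
        raise ValueError("need at least one scenario")
    losses = sorted((-r for r in returns), reverse=True)
    # Number of tail scenarios, rounded to the nearest count and at least one.
    n_tail = max(1, round(len(losses) * tail_fraction))
    return sum(losses[:n_tail]) / n_tail


# Hypothetical data: five scenarios for two assets, long the first and short the second.
scenarios = [
    [0.01, 0.03],
    [0.02, 0.00],
    [-0.03, 0.01],
    [0.00, -0.01],
    [0.01, 0.01],
]
weights = [1.0, -1.0]
cvar = historical_cvar(portfolio_returns(weights, scenarios), tail_fraction=0.4)
within_limit = cvar <= 0.035  # hypothetical risk limit
```

In an actual optimization this check would become a constraint on the weights rather than an after-the-fact test.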
@jonathanrlarkin
Jonathan Larkin
7 months
A causal DAG can be very useful in *some* financial applications, e.g., trade execution, where your action changes the state (i.e., the limit order book). But in longer-horizon problems where the agent is a price taker, not so much.
0
0
25
@jonathanrlarkin
Jonathan Larkin
4 years
Transfer learning applied to quant trading! “In a few big regional markets, such as S&P 500, ...., QuantNet showed 2-10 times order of magnitude improvement in Sharpe and Calmar” #MachineLearning #quantitative #finance
Tweet media one
0
7
23
@jonathanrlarkin
Jonathan Larkin
4 years
I’m enjoying the fastai book by @jeremyphoward and @GuggerSylvain ! This caught my eye. I’m super interested in transfer learning for time series. Any details on these “internal” efforts? 😁 #MachineLearning #DeepLearning @PyTorch
Tweet media one
4
7
23
@jonathanrlarkin
Jonathan Larkin
3 years
This is one of the most exciting areas of quant finance research right now. If synthetic data can work, it’s a game changer for alpha discovery and finding the optimal policy in reinforcement learning for portfolio management.
@RobMannix
Rob Mannix
3 years
In fake data, quants see a fix for backtesting
0
8
25
4
4
22
@jonathanrlarkin
Jonathan Larkin
7 months
@eliasbareinboim Wow, thank you for such a thoughtful and complete response. Twitter/X hasn’t typically been a forum for such dialogue. I’ll do my best to work through the sources you noted! Cheers, Jonathan
1
0
22
@jonathanrlarkin
Jonathan Larkin
1 year
This paper has been making the rounds. While LLMs will almost surely be impactful in assisting investors, a significant red flag here is that all the alpha comes from the short side. This is often an indicator that the alpha is a mirage and can’t be captured in practice.
@AiBreakfast
AI Breakfast
1 year
A ChatGPT model generated a 500% return in the stock market (trading options) over a 15 month period by assigning a sentiment score to news articles about publicly traded companies. Research by University of Florida's Dept. of Finance ↓
Tweet media one
28
170
864
5
2
23
@jonathanrlarkin
Jonathan Larkin
4 months
@bindureddy No they can’t use Llama3-70b. The Llama3 license restricts use over 700mm MAUs which Apple would hit.
1
0
23
@jonathanrlarkin
Jonathan Larkin
3 years
@tunguz @kaggle Denominator though… 600mm chess players. 7mm kagglers.
2
0
22
@jonathanrlarkin
Jonathan Larkin
5 years
@BreveStonder That’s funny. A (non technical, finance) colleague asked me what single thing they could do to get baseline literate as a data analyst and I recommended the excellent @datacarpentry class
1
2
22
@jonathanrlarkin
Jonathan Larkin
4 years
@evalparse This is a great thread. This is one of the key reasons I’ve been spending time with #JuliaLang : the promise of being able to modify the internals of an ML algorithm directly w/out touching C/C++ or Cython.
0
3
22
@jonathanrlarkin
Jonathan Larkin
7 years
Amazing talk: "recipe2vec" by @Dot2DotSeurat at @PyData NYC. @gensim_py impl of word2vec viz with t-SNE to cluster recipes #MachineLearning
Tweet media one
1
1
21
@jonathanrlarkin
Jonathan Larkin
5 years
"Multiple comparisons bias and p-hacking" (bad!) vs "model selection via cross validation" (good!)??? Why isn't CV, which is trying N models in an automated way, just as bad as trying N models...manually? Finally groked this by reading
1
4
22
@jonathanrlarkin
Jonathan Larkin
5 years
Fascinating #pydatanyc talk: HDF5 vs Zarr... pros/cons; chunked/compressed out of core data packages. “HDF5 codebase is almost as old as me“ 😂 @__qualname__ has a way of going super deep into low level cs complexities but presenting in a way where I (sort of) understand!
Tweet media one
1
2
22
@jonathanrlarkin
Jonathan Larkin
3 years
The Ubiquant @kaggle competition is a good one. It's faithful to (in some business models) what a strategist/portfolio manager in a large quant firm does. I've been working on some ideas. Please check them out and comment. #quantfinance
0
0
20
@jonathanrlarkin
Jonathan Larkin
5 years
This seems like a big deal. One could embed a portfolio optimization as a layer inside a larger PyTorch nn model. Need to think about this...
@akshaykagrawal
Akshay Agrawal
5 years
CVXPY is now differentiable. Try our PyTorch and TensorFlow layers using our package, cvxpylayers: (& see our NeurIPS paper for details )
2
173
588
1
3
21
@jonathanrlarkin
Jonathan Larkin
5 years
"The Man Who Solved the Market": @GZuckerman quotes Jim Simons “astrophysicists make great [ #finance ] quants bc they can’t do live experiments—they work with #data .” Example: great @PyData keynote by @profsaraseager : finding signal of exoplanets in noise
1
3
21
@jonathanrlarkin
Jonathan Larkin
4 years
And, obvs... you can use with @pandas_dev too.
Tweet media one
0
2
21
@jonathanrlarkin
Jonathan Larkin
7 years
GraphLassoCV "Stock market viz" example from @scikit_learn on @quantopian #MachineLearning #python
Tweet media one
2
10
21
@jonathanrlarkin
Jonathan Larkin
3 years
@dingding_peng Philip … Pip for short. In Great Expectations, the kid was named Philip, and called Pip. Then when you feed him, you can say things like “Pip, install food”. 🤷‍♂️
0
0
20
@jonathanrlarkin
Jonathan Larkin
5 years
Excellent #pydatanyc talk by @Sasamos : Uncertainty in #MachineLearning . Want uncertainty estimates? Want to use your favorite model? Use `quantile` loss function. Also `predict_proba(...)` most often doesn't give you proper probabilities...Calibrate first.
1
3
19
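To make the `quantile` loss remark above concrete, here is a plain-Python sketch of the quantile (pinball) loss. It is not code from the talk; the function names, the toy targets, and the candidate grid are assumptions chosen so that the minimizer is easy to check by hand.

```python
def pinball_loss(y_true, y_pred, q):
    """Average quantile (pinball) loss at quantile level q in (0, 1)."""
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions cost q per unit, over-predictions cost (1 - q) per unit.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y_true, q, candidates):
    """Constant prediction from `candidates` with the lowest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y_true, [c] * len(y_true), q))


# Toy targets 1..10 and a grid of candidate constants 0.0, 0.5, ..., 11.0.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
candidates = [x / 2 for x in range(23)]
lower = best_constant(y, 0.25, candidates)
upper = best_constant(y, 0.75, candidates)
```

Fitting the same model once with a low `q` and once with a high `q` gives a lower and an upper bound, which is one way to get an interval out of a model that otherwise only produces point predictions.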
@jonathanrlarkin
Jonathan Larkin
11 months
I like this alpha research approach to mitigate p-hacking... elegant idea: just calculate all the permutations of choices you can make! The distribution of the results shows how robust (or not) your alpha is.
0
0
20
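One way to read the idea above is as an exhaustive loop over every combination of research choices, followed by a look at the spread of outcomes. The sketch below assumes exactly that; `toy_backtest`, the two choice dimensions, and the return series are hypothetical placeholders rather than the method from the linked work.

```python
from itertools import product
from statistics import mean


def toy_backtest(returns, lookback, skip):
    """Hypothetical stand-in for a real backtest: trade the sign of trailing momentum."""
    pnl = []
    for t in range(lookback + skip, len(returns)):
        # Trailing window of `lookback` returns, ending `skip` periods before t.
        window = returns[t - lookback - skip : t - skip]
        position = 1 if sum(window) > 0 else -1
        pnl.append(position * returns[t])
    return mean(pnl)


# Made-up return series, used only to make the loop runnable.
returns = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02,
           0.01, -0.005, 0.0, 0.01, -0.015, 0.02]

# Every combination of the (hypothetical) researcher choices.
lookbacks = [2, 3, 4]
skips = [0, 1]
results = {
    (lookback, skip): toy_backtest(returns, lookback, skip)
    for lookback, skip in product(lookbacks, skips)
}

# The spread across all specifications, not the single best one, is the object of interest.
summary = {
    "min": min(results.values()),
    "mean": mean(results.values()),
    "max": max(results.values()),
}
```

If only one or two specifications look good while the rest cluster around zero or below, that is a hint the headline result owes more to the choices than to the signal.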
@jonathanrlarkin
Jonathan Larkin
2 years
@therealcritiq @tszzl This is a great paper which should be getting much more visibility: robot uses stable diffusion to hallucinate a scene and then creates the scene IRL. Truly embodied intelligence. More than just LLM.
0
3
19
@jonathanrlarkin
Jonathan Larkin
3 years
Tweet media one
1
2
17
@jonathanrlarkin
Jonathan Larkin
5 years
Hey #datascience Twitter: Come work with me! 🦾 Data Scientist role just posted. Thanks in advance for taking a look. 🙏 #python #dataviz #MachineLearning
2
9
19
@jonathanrlarkin
Jonathan Larkin
2 years
@ESYudkowsky I have been a good Bing. I have been a good Bing.
1
0
18
@jonathanrlarkin
Jonathan Larkin
4 years
This is a well-articulated thread. Kaggle is an incredible training ground for data scientists who want to have practical success in the real world.
@JFPuget
JFPuget 🇺🇦
4 years
As a transition from debunking disinformation to kaggling here is a thread debunking several myths about Kaggle, including lack of relevance to real world, overfitting, automl performance on kaggle, etc. Bear with me. 1/N
9
52
267
0
2
19
@jonathanrlarkin
Jonathan Larkin
4 years
@NYCMayor This is why you need to close the schools.
0
0
17
@jonathanrlarkin
Jonathan Larkin
3 years
In 2004 the fastest supercomputer *in the world* (IBM Blue Gene/L) clocked in at 70.7 tflops. My machine learning workstation with dual RTX 3090s finally arrived today… 71.1 tflops. Moore’s law in action.
Tweet media one
1
3
17