Nathan Cloos Profile
Nathan Cloos

@nacloos

Followers: 381
Following: 165
Statuses: 49

PhD student at MIT BCS

Joined April 2014
@nacloos
Nathan Cloos
7 months
Can LLMs play the game Baba Is You?🧩 In our new @icmlconf workshop paper, we show GPT-4o and Gemini-1.5-Pro fail dramatically in environments where both objects and rules must be manipulated! Here is an example of correct gameplay: (1/n)
22
83
451
@nacloos
Nathan Cloos
15 days
RT @PaglieriDavide: DeepSeek performed well where short term reasoning and planning are key. 🧩CoT traces showed strong intuitive reasonin…
0
1
0
@nacloos
Nathan Cloos
2 months
Our package aims to be exhaustive. If your implementation is missing, check out our GitHub to add your similarity measures! Paper: GitHub: Work with @GuangyuRobert and Chris Cueva. (6/6)
0
1
5
@nacloos
Nathan Cloos
4 months
Comparing similarity scores across studies is hard. To make it easier, we are developing a Python package that benchmarks and standardizes existing similarity measures. Package: Paper: (11/10)
0
0
12
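For context on what such standardization might look like, here is a hypothetical sketch; the registry, the register/score helpers, and the two example measures are illustrative assumptions, not the package's actual API.

```python
# Hypothetical sketch, not the package's actual API: a small registry that
# exposes different similarity measures behind one standardized call.
import numpy as np

MEASURES = {}

def register(name):
    """Decorator that records a measure under a common name."""
    def decorator(fn):
        MEASURES[name] = fn
        return fn
    return decorator

def _center(Z):
    return Z - Z.mean(axis=0)

@register("cka")
def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    X, Y = _center(X), _center(Y)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

@register("procrustes")
def procrustes_similarity(X, Y):
    """Cosine similarity after optimal orthogonal alignment of X to Y.
    Normalization conventions vary across papers; this is one common choice."""
    X, Y = _center(X), _center(Y)
    return np.linalg.norm(X.T @ Y, ord="nuc") / (
        np.linalg.norm(X, "fro") * np.linalg.norm(Y, "fro"))

def score(measure, X, Y):
    """Single entry point: identical inputs and conventions for every measure."""
    return MEASURES[measure](X, Y)
```

With an interface like this, swapping score("cka", X_model, X_neural) for score("procrustes", X_model, X_neural) changes only the measure name, which is the kind of uniformity that makes cross-study comparison easier.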
@nacloos
Nathan Cloos
4 months
For linear regression, the results depend heavily on the choice of hyperparameters. Even when cross-validated and regularized, high linear regression scores may not guarantee that models encode task-relevant information! (7/10)
[image]
1
2
6
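For concreteness, here is a minimal sketch of the kind of cross-validated, regularized regression score the tweet describes; the ridge penalty, 5-fold split, and per-unit R² averaging are assumed choices, not the paper's exact protocol, and they are precisely the hyperparameters the results depend on.

```python
# Assumed protocol, not the paper's exact one: ridge regression from model
# features to neural responses, scored by cross-validated R^2 per unit.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def regression_score(X_model, Y_neural, alpha=1.0, cv=5):
    """Mean cross-validated R^2 over neural units; the value moves with alpha and cv."""
    reg = Ridge(alpha=alpha)
    unit_scores = [
        cross_val_score(reg, X_model, Y_neural[:, j], cv=cv).mean()
        for j in range(Y_neural.shape[1])
    ]
    return float(np.mean(unit_scores))
```

Sweeping alpha over a few orders of magnitude is an easy way to see the hyperparameter dependence the tweet points out.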
@nacloos
Nathan Cloos
4 months
What is a good value for a similarity score? There is no absolute answer! An angular Procrustes score above 0.5 may be a good score for the Mante 2013 dataset, but a score above 0.8 is required for the Siegel 2015 dataset. This also depends on the similarity measure. (6/10)
1
1
5
@nacloos
Nathan Cloos
4 months
Next, we apply our optimization method to five neural datasets and discover that a high similarity score does not guarantee that models encode task-relevant information in a manner consistent with neural data! (5/10)
[image]
1
3
10
@nacloos
Nathan Cloos
4 months
We mathematically derive the sensitivity of CKA, angular Procrustes, and Normalized Bures Similarity to the variance of principal component dimensions, and explain CKA's dependence on high-variance components. (4/10)
[image]
1
1
7
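The derivation itself is not reproduced in the thread, but the standard principal-component form of linear CKA (Kornblith et al., 2019) is shown below for context; it is not this paper's specific result.

```latex
% Standard principal-component form of linear CKA (Kornblith et al., 2019),
% included only as context; not this paper's derivation.
% \lambda_X^i, \lambda_Y^j : variances of the i-th / j-th principal components
%                            of the centered matrices X and Y
% u_X^i, u_Y^j             : the corresponding unit-norm component directions
\[
  \mathrm{CKA}(X, Y)
  = \frac{\sum_{i}\sum_{j} \lambda_X^{i}\,\lambda_Y^{j}\,
          \langle u_X^{i},\, u_Y^{j} \rangle^{2}}
         {\sqrt{\sum_{i} \bigl(\lambda_X^{i}\bigr)^{2}}\,
          \sqrt{\sum_{j} \bigl(\lambda_Y^{j}\bigr)^{2}}}
\]
```

Component variances enter as products in the numerator and squared in the normalization, so high-variance components dominate the score.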
@nacloos
Nathan Cloos
4 months
In this animation, random noise datasets are optimized to maximize different similarity measures (starting from an initial value near 0 and rising to a maximum near 1). What do we learn? CKA can be near its maximum value even when only the first principal component is captured! (3/10)
1
2
15
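A small synthetic illustration of that last claim, assuming linear CKA and toy Gaussian data rather than the datasets used in the animation:

```python
# Synthetic check (illustrative only): linear CKA stays high when only the
# single highest-variance dimension is shared between two datasets.
import numpy as np

def linear_cka(X, Y):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
n, d = 2000, 20
variances = np.array([10.0] + [0.1] * (d - 1))         # one dominant component
X = rng.standard_normal((n, d)) * np.sqrt(variances)   # reference dataset
Y = np.column_stack([
    X[:, 0],                                           # share only the dominant dimension
    rng.standard_normal((n, d - 1)) * np.sqrt(0.1),    # the rest is fresh noise
])
print(linear_cka(X, X))  # exactly 1 by construction
print(linear_cka(X, Y))  # still close to 1 despite 19 unmatched dimensions
```

The unmatched dimensions carry little variance, so CKA barely penalizes them.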
@nacloos
Nathan Cloos
4 months
We identify what drives high similarity scores by differentiating through similarity measures to directly maximize the score (see animation above). (2/10)
1
0
3
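A minimal sketch of that general idea, not the paper's implementation: one dataset is treated as free parameters and updated by gradient ascent through a differentiable similarity measure (linear CKA here); the sizes, learning rate, and step count are arbitrary choices.

```python
# Illustrative sketch, not the paper's implementation: gradient ascent on
# linear CKA with respect to one of the two datasets.
import jax
import jax.numpy as jnp

def linear_cka(X, Y):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = jnp.linalg.norm(Y.T @ X) ** 2                  # Frobenius norm by default
    den = jnp.linalg.norm(X.T @ X) * jnp.linalg.norm(Y.T @ Y)
    return num / den

Y_ref = jax.random.normal(jax.random.PRNGKey(0), (500, 30))        # fixed reference dataset
X_opt = 0.1 * jax.random.normal(jax.random.PRNGKey(1), (500, 30))  # random "noise" dataset to optimize

cka_grad = jax.jit(jax.grad(linear_cka))   # gradient with respect to the first argument
for _ in range(2000):
    X_opt = X_opt + 0.1 * cka_grad(X_opt, Y_ref)         # plain gradient ascent step

print(linear_cka(X_opt, Y_ref))            # the score rises from near 0 toward its maximum of 1
```

Inspecting which structure of the reference data X_opt ends up capturing is what reveals what drives a high score for a given measure.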
@nacloos
Nathan Cloos
6 months
RT @Ansh_soni1234: To ask how similar the brain is to a neural network we need a similarity metric. In a new paper I asked how much the met…
0
55
0
@nacloos
Nathan Cloos
7 months
@generatorman_ai @icmlconf Sure, I agree the real game can get really hard, but the environments we selected for our benchmark are pretty accessible.
0
0
2