@WisconsinCS. Applied Science @Amazon, ML Intern @truera_ai.
Blogger, reader, and ponderer about nothing! Trying to reach the global min but stuck at a local min!
1/n
📢 Paper Alert!📢
- “How do you know if a compressed model is reliable enough?”
- “Is it better to compress feedforward networks or attention modules?”
- “Do different compression methods have the same effect on different families of language models?”
12/n
Cheers to my fellow coauthors and my guide
@fredsala
Code:
Paper:
LinkedIn:
P.S.: I will be attending EMNLP 2023 (virtually) and NeurIPS 2023 (in-person). Feel free to ping me to connect.
@emnlpmeeting
Hi,
Can you provide more information regarding the Findings presentation? More concretely:
1. Instructions/deadline for the virtual poster/video?
2. Can you provide any physical space for a poster if I am interested in presenting the work in Singapore?
Thanks
Now that
#NeurIPS2023
is officially over, I thought I'd pen down my learnings (I promise it's a ~4-min read)
Link:
experienced folks - Feedback please
newbies (like me) - Let me know if you're able to relate to it!
RT appreciated, this thread has a nano TL;DR!
2/n
Our paper shedding light on these questions, “The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models”, was accepted to EMNLP 2023 Findings! 🎉
Cool your palms for an efficient workout 🏋️♂️🚴♀️
Trying to increase the throughput of your exercise? Whether it's endurance or resistance training, a simple trick like cooling your palms between sets can improve your workout by up to 2x!!
More details in the thread
11/n
Key Takeaways (contd):
- Which layers to compress for optimal performance varies widely, depending on the model, dataset, and compression method.
- Final dense layers might encode a lot of knowledge-relevant information.
4/n
Our setup: We conducted extensive experiments on multiple families of language models (encoder-only, encoder-decoder, and decoder-only) spanning different sizes (from ~12M to ~3B parameters).
9/n
Finding-3: In light of recent observations () that the final dense layers may encode a lot of knowledge-relevant information, we observed that unstructured pruning performs better than structured pruning.
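For intuition, here's a minimal sketch contrasting the two pruning styles using PyTorch's built-in utilities (not our paper's code; the layer width is illustrative):

import torch
import torch.nn.utils.prune as prune

# Two stand-ins for a final dense layer; the 768 width is illustrative.
unstructured_layer = torch.nn.Linear(768, 768)
structured_layer = torch.nn.Linear(768, 768)

# Unstructured: zero out 20% of individual weights by L1 magnitude.
prune.l1_unstructured(unstructured_layer, name="weight", amount=0.2)

# Structured: remove 20% of entire output rows (neurons) by L2 norm.
prune.ln_structured(structured_layer, name="weight", amount=0.2, n=2, dim=0)

# Unstructured pruning spreads zeros across the matrix, so it can spare the
# few weights storing a fact; structured pruning deletes whole neurons,
# which is harsher on knowledge-heavy dense layers.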
5/n
Our setup (contd): For each model, we compress modularly (feedforward networks, attention modules, final dense layer) using various compression techniques (pruning, quantization, and a combination of both) and observe performance across a diverse range of datasets!
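As a rough illustration of the modular part (a sketch, not our actual pipeline; the module names follow BERT's layout and are assumptions), compressing only the feedforward sublayers could look like:

import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Prune only feedforward sublayers; "intermediate"/"output.dense" follow
# BERT's naming, and other model families name their modules differently.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and "attention" not in name:
        if "intermediate" in name or "output.dense" in name:
            prune.l1_unstructured(module, name="weight", amount=0.2)
            prune.remove(module, "weight")  # bake the mask into the weights

# "Combination of both": follow pruning with dynamic int8 quantization.
model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)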
8/n
Finding-2: If we choose to prune and then quantize (following ), our results suggest that we can go up to a 20% sparsity level; at that level, compressing attention modules is better.
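A toy sketch of that prune-then-quantize ordering, with a sanity check of the resulting sparsity (the layer is a stand-in, not one of our models):

import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(1024, 1024)  # toy stand-in for a compressible module

# Step 1: prune to 20% unstructured sparsity, then make it permanent.
prune.l1_unstructured(layer, name="weight", amount=0.2)
prune.remove(layer, "weight")
print(f"sparsity: {(layer.weight == 0).float().mean().item():.1%}")  # ~20.0%

# Step 2: quantize the already-sparse weights to int8.
quantized = torch.ao.quantization.quantize_dynamic(
    torch.nn.Sequential(layer), {torch.nn.Linear}, dtype=torch.qint8
)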
@jb_cordonnier
@mayfer
Hi
@jb_cordonnier
,
I have recently been looking at your code repository for "Collaborate instead of Concatenate" and have a few questions, as the repo is quite old and I am unable to execute it :( . Would really appreciate it if you could open your DMs.
Thanks a lot!
In case you like this, here are a few more relevant ones from
#NeurIPS2023
Interaction with
@JeffDean
(~3min) -
Interaction with
@ylecun
and
@JayAlammar
(~2min) -
Do share your thoughts, RT and/or reach out to me :)
3/n
Hypothesis: Looking at general metrics to understand the performance of compressed models might not be enough.
Instead, we focus on "parametric knowledge" as a way to understand the holistic nature of a compressed model.
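For a concrete sense of what probing parametric knowledge can look like, here's a minimal LAMA-style cloze sketch (the model and prompt are illustrative, not our exact benchmark):

from transformers import pipeline

# Cloze-style probe: if a compressed model still ranks the correct fact
# highly, its parametric knowledge survived compression.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))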
I am curating the information from
@hubermanlab
podcasts; it can be found here:
It serves as additional notes while listening to a podcast.
Suggestions, corrections, and constructive criticism are most welcome :)
Science-backed logic:
Exercising → Core temperature rises → Cool palms → Easy way to contract blood vessels → Reduces the core temperature thus helping enzymes work for better muscle contraction → Efficiency increases.
More details at this blog:
Knowledge Credits:
@hubermanlab
, episode 19. Highly recommend watching it.
Takeaway: Next time you exercise, wash your face and palms with cold (but not super cold) water for improvements in your workout sessions.
@ecekamar
@chinganc_rl
Hi Ece Kamar and Ching,
Srinath here. I am currently looking for full-time ML roles and am interested in applying to the AI Frontiers lab. I've previously interned at
@amazon
and
@truera_ai
and have presented at EMNLP, ECCV, and ICML (in review). Please open your DMs so I can share my resume :)
One-line summary: Conferences like
#NeurIPS2023
have a lot to offer; it just depends on how much you can take in!
I personally got an opportunity to self-reflect on where I am in terms of research and got a broad view of different fields.
@BahareFatemi
I'm also heading to NeurIPS and would like to connect and get some advice. Unable to DM you, though. Here's my email just in case.
Email: sgnamburi@wisc.edu
@SebastienBubeck
Hi, are you hiring for full-time positions?
I am a Master's student in CS at UW-Madison and am open to full-time roles and/or a spring co-op; please let me know so I can apply at the earliest.
Thanks
Srinath