muhtasham @Muhtasham9 profile

muhtasham

@Muhtasham9

Followers

1,359

Following

849

Media

232

Statuses

1,628

In my pre-training years

https://t.co/R9QYS6Iwzm

Latent Space

Joined March 2020

Pinned Tweet

muhtasham

@Muhtasham9

1 year

w boss

3

1

68

muhtasham

@Muhtasham9

6 months

A short thread about changes in the transformer architecture since 2017. Reading articles about LLMs, you can see phrases like “we use a standard transformer architecture.” But what does "standard" mean, and have there been changes since the original article? (1/6)

muhtasham

@Muhtasham9

2 years

Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding up to the Lindy effect: the idea that the older something is, the longer it is likely to be around in the future.

0

2

13

7

138

887

muhtasham

@Muhtasham9

9 months

Evaluating abstractive summarization remains an open area for further improvement. If you ever dealt with large-scale summarisation evaluation you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this

9

33

261

muhtasham

@Muhtasham9

2 years

Excited to announce the most up-to-date and CPU-friendly BERT, trained on the most recent snapshot of the internet. Took a day and 8x A100s to train. 🤗 The model is open-source and I hope the community can benefit from it. It was created…

lnkd.in

1

41

238

muhtasham

@Muhtasham9

2 years

Meta: Multi-tasking while reading about Multi-task NLP models

3

10

130

muhtasham

@Muhtasham9

7 months

StarCoder2 running on M2 8GB

1

7

91

muhtasham

@Muhtasham9

7 months

DeepMind folks can now steal weights behind APIs “We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix.” who wants to do same for gpt4?

Stealing Part of a Production Language Model

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our...

arxiv.org

7

5

79

muhtasham

@Muhtasham9

2 years

@_jasonwei @arankomatsuzaki Might contain a lot of subtle issues, see clever Hans effect, which is always hard to debug. The law of leaky abstractions in action as my supervisor says

NLP's Clever Hans Moment has Arrived

A review of Timothy Niven and Hung-Yu Kao, 2019: Probing Neural Network Comprehension of Natural Language Arguments

thegradient.pub

2

5

71

muhtasham

@Muhtasham9

1 year

@Mascobot

muhtasham

@Muhtasham9

1 year

🇺🇸US: Innovate then try to regulate 🇪🇺EU: Regulate then try to innovate

5

17

60

1

2

65

muhtasham

@Muhtasham9

1 year

🇺🇸US: Innovate then try to regulate 🇪🇺EU: Regulate then try to innovate

5

17

60

muhtasham

@Muhtasham9

7 months

The 🤗 MLX community is amazing
Quantized StarCoder2 model variants available here:
Small guide on running and training StarCoder2 locally
pip install -U mlx-lm
To run inference on quantized model
python -m mlx_lm.generate --model

mlx-community (MLX Community)

huggingface.co

BigCode

@BigCodeProject

7 months

Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open!

13

192

675

2

13

56

muhtasham

@Muhtasham9

8 months

Happy to show Pod-Helper: ⚡️ Lightning-speed transcription with Whisper 🔧 Built-in audio repair with good old Roberta 🧊 Checks your content's vibe effortlessly See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev

2

4

35

muhtasham

@Muhtasham9

2 years

@tszzl Here is PDF by @amasad

1

34

muhtasham

@Muhtasham9

1 year

If you missed out on the @full_stack_dl LLM bootcamp, don't worry! I've written a blog post about it. I hope you find my post informative and enjoyable to read, just as I enjoyed attending the bootcamp.

Machine Learners Guide to Real World - 🌉 A Deep Dive into the LLM Bootcamp Experience: Revolutio...

muhtasham.github.io

0

10

33

muhtasham

@Muhtasham9

8 months

🚀Now supports real-time streaming

muhtasham

@Muhtasham9

8 months

Happy to show Pod-Helper: ⚡️ Lightning-speed transcription with Whisper 🔧 Built-in audio repair with good old Roberta 🧊 Checks your content's vibe effortlessly See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev

2

4

35

2

7

31

muhtasham

@Muhtasham9

2 years

Let's see how different LM's multiply matrices / think 💭 using this Space GPT-J-6B i see what you did there👀 Built using amazing @Gradio Blocks 🧱 APIs, also you can use new @huggingface 🤗 Community Tab to make suggestions and collaborate

Aran Komatsuzaki

@arankomatsuzaki

2 years

Large Language Models are Zero-Shot Reasoners Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3.

59

566

2K

2

11

28

muhtasham

@Muhtasham9

2 years

Ultimate comeback

0

4

31

muhtasham

@Muhtasham9

6 months

Taking the decoder-only language model LLaMA-2 as an example, let's look at the major architectural improvements in LLMs: - Post-LayerNorm → Pre-LayerNorm. This makes convergence more stable. Now the process goes in such a way that the

On Layer Normalization in the Transformer Architecture

The Transformer is widely used in natural language processing tasks. To train a Transformer however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be...

arxiv.org

1

0

27
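The Post-LN to Pre-LN change described in the tweet above can be sketched in a few lines. This is a minimal illustration using plain Python lists, not LLaMA-2's actual code; `toy_sublayer` is a hypothetical stand-in for an attention or feed-forward sublayer, and the learned gain and bias of LayerNorm are omitted for brevity.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no learned gain/bias here).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def post_ln_block(x, sublayer):
    # Original 2017 ordering: add the residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_ln_block(x, sublayer):
    # Pre-LN ordering: normalize only the sublayer input; the residual path is left untouched.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def toy_sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward layer.
    return [0.5 * v for v in x]

x = [1.0, 2.0, 3.0]
post = post_ln_block(x, toy_sublayer)
pre = pre_ln_block(x, toy_sublayer)
```

In the Pre-LN variant the input `x` reaches the output through an unnormalized identity path, which is the property usually credited for the more stable convergence.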

muhtasham

@Muhtasham9

1 year

📢 Just published: How traditional OS concepts like Branch Prediction & Virtual Memory Paging shape today's Large Language Models ( #LLMs ). LLMs = CPUs of early computing? Feedback welcome! 🔗

Machine Learners Guide to Real World - 2️⃣ Concepts from Operating Systems That Found Their Way in...

muhtasham.github.io

0

3

28

muhtasham

@Muhtasham9

6 months

- Absolute position embedding → RoPE. The idea is to rotate the token representations (in practice the query and key vectors) by an angle that depends on the position, and it works well. In addition, the method opened up a number of modifications to expand the context to very large

RoFormer: Enhanced Transformer with Rotary Position Embedding

Position encoding recently has shown effective in the transformer architecture. It enables valuable supervision for dependency modeling between elements at different positions of the sequence. In...

arxiv.org

1

0

23
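A minimal sketch of the rotation idea behind RoPE, assuming the pairing of adjacent dimensions and the base of 10000 from the RoFormer paper. The function names are illustrative and this is not a specific library's API. The useful property is that the dot product of two rotated vectors depends only on the distance between their positions.

```python
import math

def rope(vec, position, base=10000.0):
    # Rotate each consecutive pair (vec[i], vec[i+1]) by an angle proportional to the position.
    # Pair starting at element index i uses frequency base ** (-i / d).
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.1]
# Same relative offset (2) at different absolute positions gives the same attention score.
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 12), rope(k, 10))
```

Because only the relative offset matters, the attention scores carry relative position information without a separate position embedding table.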

muhtasham

@Muhtasham9

2 years

Your car gathers a shocking amount of data about you, which you don’t get to see, and the manufacturer sells that to third parties, who use it in ways that are counter to your interests.

Your car knows too much about you. That could be a privacy nightmare.

Modern cars collect a lot of data on their drivers.

mashable.com

0

19

28

muhtasham

@Muhtasham9

7 months

"Flops are cheap, bandwidth is adding more pins, and latency is physics. Deal with it. "

1

6

23

muhtasham

@Muhtasham9

11 months

@vboykis He deployed on Friday

0

1

26

muhtasham

@Muhtasham9

1 year

@alex_valaitis @MosaicML Was going to skip this, but that's not correct! @MosaicML is not an open-source LLM startup, it's a platform. And don't sleep on them: they just released this today, 2x the context length of LLaMA!

Announcing MPT-7B-8K: 8K Context Length for Document Understanding | Databricks Blog

Today, we are releasing MPT-7B-8K, a 7B parameter open-source LLM with 8k context length trained with the MosaicML platform. MPT-7B-8K was pretrained starting from the MPT-7B checkpoint in 3 days on...

www.databricks.com

1

2

24

muhtasham

@Muhtasham9

6 months

- ReLU activation → SwiGLU. Gated Linear Units (the family of methods to which SwiGLU belongs) add an element-wise multiplication of two matrices, one of which has passed through a sigmoid and thus controls the intensity of the signal

GLU Variants Improve Transformer

Gated Linear Units (arXiv:1612.08083) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function. Variations on GLU are possible,...

arxiv.org

1

0

21
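A small sketch of the gated feed-forward idea, assuming the formulation from "GLU Variants Improve Transformer": the original GLU gates with a sigmoid, while SwiGLU swaps in the Swish (SiLU) function. The weight names and the list-based `matvec` helper are illustrative, not taken from any model's code.

```python
import math

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def swish(z):
    # Swish / SiLU: the input scaled by its own sigmoid.
    return z * sigmoid(z)

def gated_ffn(x, W_gate, W_up, W_down, act):
    # hidden = act(W_gate x) * (W_up x), element-wise; output = W_down hidden
    gate = [act(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

x = [1.0, 0.0]
# GLU: sigmoid gate. The gate pre-activation is 0, so the gate is exactly 0.5.
glu_out = gated_ffn(x, [[0.0, 0.0]], [[2.0, 0.0]], [[1.0]], sigmoid)
# SwiGLU: Swish gate. The gate pre-activation is 1, so the gate is 1 * sigmoid(1).
swiglu_out = gated_ffn(x, [[1.0, 0.0]], [[2.0, 0.0]], [[1.0]], swish)
```

The gate branch decides how much of the "up" branch passes through each hidden unit, which is the signal-intensity control the tweet refers to.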

muhtasham

@Muhtasham9

2 years

When your model is training and you see live footage of forward and back prop via @weights_biases

0

4

21

muhtasham

@Muhtasham9

1 year

@CisLmu researcher distilling latest paper about instruction tuning

1

4

20

muhtasham

@Muhtasham9

6 months

Attention modifications, for example using one K-V pair of matrices per group of Q matrices. This improvement mainly affects inference optimization. But there are also a huge number of methods aimed at reducing the quadratic

GQA: Training Generalized Multi-Query Transformer Models from...

Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to...

arxiv.org

2

19
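A toy sketch of grouped-query attention, assuming a single query vector per head and plain Python lists. All names here are illustrative. Each group of query heads shares one key/value head, so with a single KV head this reduces to multi-query attention, and with as many KV heads as query heads it is ordinary multi-head attention.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scaled dot-product attention for one query vector over a list of keys/values.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def grouped_query_attention(queries, kv_heads):
    # queries: one query vector per query head.
    # kv_heads: one (keys, values) pair per KV head.
    # Query head h reads from KV head h // group_size.
    n_q, n_kv = len(queries), len(kv_heads)
    assert n_q % n_kv == 0, "query heads must split evenly into groups"
    group_size = n_q // n_kv
    return [attend(q, *kv_heads[h // group_size]) for h, q in enumerate(queries)]

# 4 query heads sharing 2 KV heads (one cached token each).
kv_heads = [
    ([[1.0, 0.0]], [[1.0, 2.0]]),
    ([[0.0, 1.0]], [[3.0, 4.0]]),
]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
outputs = grouped_query_attention(queries, kv_heads)
```

The saving at inference time comes from the KV cache: only `len(kv_heads)` key/value sets are stored per token instead of one per query head.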

muhtasham

@Muhtasham9

6 months

Except it’s called AI engineering now Come to @aiDotEngineer conf to learn more

vicki

@vboykis

6 months

2013 — 2023: you were hired to do machine learning but do data engineering 2023 — : you were hired to do machine learning but do web dev

20

35

762

3

2

21

muhtasham

@Muhtasham9

9 months

Eugene's blog:

Evaluation & Hallucination Detection for Abstractive Summaries

Reference, context, and preference-based metrics, self-consistency, and catching hallucinations.

eugeneyan.com

0

20

muhtasham

@Muhtasham9

1 year

Burning some gpus after first @LangChainAI meetup in Munich

1

3

18

muhtasham

@Muhtasham9

4 years

New SOTA on BCI SSVEP spellers. Our new DNN achieves impressive information transfer rates (ITR) with only 0.4 seconds of stimulation: 265.23 bits/min on the benchmark and 196.59 bits/min on BETA dataset. Paper: Code: #bci #ssvep

3

1

14

muhtasham

@Muhtasham9

2 years

the amount of details one can get from @weights_biases is absolutely electric 💥

0

2

16

muhtasham

@Muhtasham9

1 year

All started with GPT2 moment, but only last week trained internal model and it did good, but fine-tuning made 50% better. @amasad

1

3

17

muhtasham

@Muhtasham9

1 year

Thanks for putting this together @nathanbenaich and @NotionHQ

1

3

17

muhtasham

@Muhtasham9

1 year

Full house 🦜 @full_stack_dl

0

1

16

muhtasham

@Muhtasham9

7 months

MLX weights below

mlx-community/zephyr-7b-gemma-v0.1-4bit · Hugging Face

huggingface.co

Lewis Tunstall

@_lewtun

7 months

Happy to share the latest Zephyr recipe based on @Google 's Gemma 7B 🔷🔶! Outperforms Gemma 7B Instruct on MT Bench & AGIEval, showing the potential of RLAIF to align this series of base models 💪 🧑‍🍳 I hope this recipe enables the community to create many more fine-tunes!

3

40

162

0

3

14

muhtasham

@Muhtasham9

6 months

“there's a graveyard of ideas around attention” @TrentonBricken

0

3

13

muhtasham

@Muhtasham9

6 months

LayerNorm → RMSNorm. RMSNorm is computationally simpler but achieves the same quality. (5/6)

Root Mean Square Layer Normalization

Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling...

arxiv.org

1

0

15
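To make the "computationally simpler" point concrete, here is a minimal side-by-side sketch, assuming vectors as plain Python lists. The `gain` and `bias` parameter names are illustrative. RMSNorm skips the mean subtraction and the bias term and only rescales by the root mean square.

```python
import math

def layer_norm(x, gain, bias, eps=1e-6):
    # LayerNorm: subtract the mean, divide by the standard deviation, then scale and shift.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for g, v, b in zip(gain, x, bias)]

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: divide by the root mean square only; no mean subtraction, no bias.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

ones, zeros = [1.0, 1.0], [0.0, 0.0]
out = rms_norm([3.0, 4.0], ones)
# For an input that already has zero mean, the two normalizations coincide.
same_ln = layer_norm([-1.0, 1.0], ones, zeros)
same_rms = rms_norm([-1.0, 1.0], ones)
```

The output of `rms_norm` has a mean square of about 1, which is the only statistic it controls.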

muhtasham

@Muhtasham9

10 months

@lvwerra Yay congrats also got recently promoted to Sr Random Seed Engineer

1

0

15

muhtasham

@Muhtasham9

2 years

“The thing that determines whether you’re the product isn’t whether you’re paying for the product: it’s whether market power and regulatory forbearance allow the company to get away with selling you.” — @doctorow

1

9

14

muhtasham

@Muhtasham9

1 year

@saahil addressing industry's challenges in scaling MLOps in multimodal settings

0

3

12

muhtasham

@Muhtasham9

7 months

Spotted GPT-5 in the wild

0

1

14

muhtasham

@Muhtasham9

8 months

@swyx Shameless plug but this would make it easier to compare

muhtasham

@Muhtasham9

9 months

Evaluating abstractive summarization remains an open area for further improvement. If you ever dealt with large-scale summarisation evaluation you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this

9

33

261

1

0

12

muhtasham

@Muhtasham9

6 months

Machine learning is low-precision linear algebra. While developing the TPU, Google cut the mantissa down from 23 bits to 7 bits and invented bf16. Fast forward, and now we have 1.58-bit LLMs.

Simon Willison

@simonw

6 months

Huh, I missed this earlier this month: Microsoft Research used a similar trick for their "1.58-bit" LLM BitNet

4

2

40

0

11
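The bf16 format can be illustrated with the standard library alone. bf16 keeps float32's sign bit and 8 exponent bits and stores 7 mantissa bits, so a float32 can be converted by keeping its top 16 bits. The sketch below uses simple truncation as an assumption for brevity; real hardware typically rounds to nearest. The function names are illustrative.

```python
import math
import struct

def float32_bits(x):
    # Reinterpret a Python float as its 32-bit IEEE 754 single-precision pattern.
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bf16_truncate(x):
    # Keep sign (1 bit) + exponent (8 bits) + top 7 mantissa bits; zero the low 16 bits.
    bits = float32_bits(x) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pi_bf16 = bf16_truncate(math.pi)   # only about 2 to 3 decimal digits survive
big_bf16 = bf16_truncate(1e38)     # still finite: the exponent range matches float32
```

Because the exponent width is unchanged, values that would overflow fp16 (whose largest finite value is 65504) stay representable, at the cost of precision.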

muhtasham

@Muhtasham9

2 years

Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding up to the Lindy effect: the idea that the older something is, the longer it is likely to be around in the future.

0

2

13

muhtasham

@Muhtasham9

9 months

Supporting local compute pfp by @evanjconrad

3

0

13

muhtasham

@Muhtasham9

5 months

#iclr folks come by we have pizza

1

13

muhtasham

@Muhtasham9

11 months

Top recommendation: a beautifully written, in-depth explanation of these concepts, which I failed to do in my initial blog. High-quality tokens: future LLMs can boost their reasoning and get a sense of humor from @charles_irl if this blog ends up in their dataset

Charles 🎉 Frye

@charles_irl

11 months

PagedAttention, Virtual Context, Speculative Decoding, Register Tokens: the last year has seen many ideas from systems programming applied to LLMs. Not many folks live in that intersection, so I wrote an explainer post to make them a bit more accessible!

18

286

1K

1

3

10

muhtasham

@Muhtasham9

10 months

What's the bottleneck of your GPU-floor? @anyscalecompute meetup

0

1

12

muhtasham

@Muhtasham9

1 year

@nathanbenaich @huggingface 🤗

1

0

10

muhtasham

@Muhtasham9

7 months

Uncle jokes followed by biggest GPU heck yeah #NVIDIA #GTC24

1

2

11

muhtasham

@Muhtasham9

5 months

PSA if you need GPUs for your research: hit these companies up, they have compute grants @PrimeIntellect @dstackai @fal, @fal especially if you work on diffusion models

Lucas Beyer (bl16)

@giffmana

5 months

Does your *university* nlp/vision/ml lab have more or less than 64 A100 and 100+ other GPUs?

22

3

32

0

2

11

muhtasham

@Muhtasham9

7 months

It's here

NVIDIA AI Developer

@NVIDIAAIDev

7 months

Accelerate your coding tasks, from code completion to code summarization with StarCoder2, the latest state-of-the-art, open code #LLM built by @HuggingFace , @ServiceNow , and NVIDIA. Learn more 👉

1

36

126

1

0

10

muhtasham

@Muhtasham9

2 years

@rasbt @3scorciav

2

1

11

muhtasham

@Muhtasham9

2 years

Reminder: Join amazing Transformers lecture by @giffmana tomorrow

Munich🥨NLP

@MunichNlp

2 years

🥨NEW EVENT🥨 Transformers in all glory details: @GoogleAI Brain Team Scientist Lucas Beyer @giffmana will explain the currently most dominant deep learning architecture for natural language processing in an exclusive event with @MunichNlp . Details below👇

1

3

11

0

4

9

muhtasham

@Muhtasham9

1 year

@NaderLikeLadder @alecqfong

0

2

9

muhtasham

@Muhtasham9

8 months

Will try to feed 10M tokens over weekend

1

0

8

muhtasham

@Muhtasham9

11 months

Sharing @huggingface collection of old models from RoBERTa all the way to GPT2 pre-trained and finetuned on Tajik language, stay tuned for more to come, mistral-7b, llama2-7b, and others on the way

Tajik Language Models - a muhtasham Collection

huggingface.co

1

0

9

muhtasham

@Muhtasham9

1 year

iCoffe Pro Max

2

1

10

muhtasham

@Muhtasham9

7 months

GitHub - xai-org/grok-1: Grok open release

Grok open release. Contribute to xai-org/grok-1 development by creating an account on GitHub.

github.com

0

1

9

muhtasham

@Muhtasham9

5 months

"Flops are cheap, bandwidth is adding more pins, and latency is physics. Deal with it."

Andrej Karpathy

@karpathy

5 months

@vrushankdes Great read! My experience is that you’re fighting physics but also the nvidia compiler and the stack overall, and even after pulling *a lot* of tricks we still can’t achieve more than ~80-90% mem bw on many kernels that you’d naively think should be ~100. And the rabbit hole

2

1

39

0

1

10

muhtasham

@Muhtasham9

1 year

Transformers everywhere…

0

8

muhtasham

@Muhtasham9

7 months

Great tune! Smooth run on m2 8gb
python -m mlx_lm.generate --model mlx-community/OpenCodeInterpreter-SC2-3B-4bit --prompt "Write a quick sort in C++" --temp 0.0 --colorize

Xiang Yue

@xiangyue96

7 months

🌟 Big thanks for making StarCoder 2 open-source! 🚀 We've swiftly finetuned it on our Code-Feedback instruction dataset, the dataset behind OpenCodeInterpreter. 📈 HumanEval Scores are boosted ~30%. 3B Model: from 31.7 to 67.1! 7B Model: from 35.4 to 75.6! 🛠️ CodeFeedback has

42

64

264

0

3

9

muhtasham

@Muhtasham9

6 months

is this the company motto? smh @EMostaque stay strong king

Amjad Masad

@amasad

6 months

Corporate AI drama is accelerating faster than AI itself.

39

86

1K

0

8

muhtasham

@Muhtasham9

7 months

#SD3

0

7

muhtasham

@Muhtasham9

9 months

Patterns from CIDR database conference:
Stanford - turns out databases are actually LLMs and every problem is an ML problem.
Berkeley - let me solve some NP hardish algorithmic problem using LP and other techniques that might find application 50 years later.
CMU - let me

0

2

8

muhtasham

@Muhtasham9

1 year

💫StarCoder, which was released today by @BigCodeProject, is a prime example of Open Source outcompeting. Big shout-out to @lvwerra @harmdevries77 @Thom_Wolf @huggingface @ServiceNowRSRCH

Dylan Patel

@dylan522p

1 year

Google "We Have No Moat, And Neither Does OpenAI" Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI This is the opinion of one Googler, we do not agree, simply sharing. $GOOGL $MSFT $META $AI $NVDA $AMZN $AAPL

31

122

685

0

8

muhtasham

@Muhtasham9

7 months

🟩

muhtasham

@Muhtasham9

1 year

w boss

3

1

68

0

8

muhtasham

@Muhtasham9

8 months

Repo:

GitHub - Muhtasham/pod-helper: 🎧 Pod-Helper: Real-time audio transcription and repair on consumer...

🎧 Pod-Helper: Real-time audio transcription and repair on consumer hardware - Muhtasham/pod-helper

github.com

1

0

8

muhtasham

@Muhtasham9

1 year

@vboykis Also rich

0

muhtasham

@Muhtasham9

2 years

𝙏𝙝𝙧𝙚𝙚 𝙩𝙝𝙞𝙣𝙜𝙨 𝙚𝙫𝙚𝙧𝙮𝙤𝙣𝙚 𝙨𝙝𝙤𝙪𝙡𝙙 𝙠𝙣𝙤𝙬 𝙖𝙗𝙤𝙪𝙩 𝙑𝙞𝙨𝙞𝙤𝙣 𝙏𝙧𝙖𝙣𝙨𝙛𝙤𝙧𝙢𝙚𝙧𝙨 by @MetaAI Summary thread 🧵

1

7

muhtasham

@Muhtasham9

7 months

Image and prompt by yours truly @marksaroufim teaching style is like a casual conversation with a senior engineer on your team

Andreas Köpf

@neurosp1ke

7 months

CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC

1

20

105

1

7

muhtasham

@Muhtasham9

1 year

@MattNiessner @synthesiaIO Forget AutoGPT, AutoProf is the real deal

0

8

muhtasham

@Muhtasham9

4 months

@isidentical prolly you have seen this but cool data processing pipeline

Processing 2 Billion Images for Stable Diffusion Model Training - Definitive Guides with Ray Series

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.

www.anyscale.com

1

0

8

muhtasham

@Muhtasham9

7 months

Super model MLX weights below

mlx-community/phi-2-super-4bit · Hugging Face

huggingface.co

anton

@abacaj

7 months

Release phi-2-super. Fine tuned over phi-2 and aligned with cDPO. MT-bench of 7.1875, surpassing many larger models. Humaneval score 60.98%, Humaneval-Plus 54.88%

45

60

554

0

2

7

muhtasham

@Muhtasham9

2 years

Nice here. But have you ever been to @TU_Muenchen ?

1

0

7

muhtasham

@Muhtasham9

6 months

@ClementDelangue Yeah your runway should be enough to do this

1

0

7

muhtasham

@Muhtasham9

1 year

“LLMs are not databases and they are not up to date; think of them as a reasoning engine, and some sort of retriever will solve the issue of up-to-date knowledge” @sama

0

2

muhtasham

@Muhtasham9

1 year

Based @ykilcher at @tum .ai summit

2

0

7

muhtasham

@Muhtasham9

1 year

Kinda like this emoji 🌉 but with crescent 🌙

0

6

muhtasham

@Muhtasham9

5 months

I want to be in sf so badly this summer

4

0

8

muhtasham

@Muhtasham9

3 years

looking forward to this talk @PyConDE #PyConDE #PyDataBerlin

Financial Portfolio Management with Deep Reinforcement Learning #PyConDE #PyDataBerlin #PyData

intelligent_portfolio_optimization_with_deep_reinforcement_learning

2022.pycon.de

0

4

8

muhtasham

@Muhtasham9

1 year

Lot of wisdom from @kagglingdieter

2

0

7

muhtasham

@Muhtasham9

6 months

@jtvhk bruhh they should just outsource to @sfcompute

0

8

muhtasham

@Muhtasham9

1 year

How to get rich from LLMs 🤑 This made my day @full_stack_dl

0

8

muhtasham

@Muhtasham9

2 years

Beating OpenAI large v2 with Fine-tuned *medium* model from 85.8 WER down to 23.1 WER special thanks to @LambdaAPI and @huggingface team especially @sanchitgandhi99 and @reach_vb

muhtasham/whisper-medium-tg_tj · Hugging Face

huggingface.co

0

8

muhtasham

@Muhtasham9

7 months

Took some time off web-sockets

1

0

8

muhtasham

@Muhtasham9

1 year

TIL: @lexfridman hails from Buston, Tajikistan 🇹🇯 When our paths cross, I'll be ready with a friendly, "What's up, homie?"

1

6

muhtasham

@Muhtasham9

5 months

sneaking into libraries w @oliverpfaffel

0

1

7

muhtasham

@Muhtasham9

1 year

bf16 >> fp16 more numerically stable in practice

0

6

muhtasham

@Muhtasham9

11 months

@amasad @perplexity_ai @googlecloud Damn time to switch all dev to iPad with Replit Core

0

2

muhtasham

@Muhtasham9

2 years

@bradneuberg @tszzl @amasad Should be from the Facebook IPO, so around 2012

1

0

7

muhtasham

@Muhtasham9

1 year

Thanks @dk21 and @jefrankle for this amazing session, can’t wait for upcoming sessions

Weights & Biases

@weights_biases

1 year

We are LIVE🎉 Tune in for Lesson 3 of the Training & Fine-Tuning LLMs Course with @MosaicML 📚 You will learn data scaling laws to construct custom datasets, & dive deep into data curation, ethics, storage, & streaming best practices. Stream now🔗

0

2

6

0

1

7

muhtasham

@Muhtasham9

4 months

Roasting coffee beans and GPUs

0

7

muhtasham

@Muhtasham9

8 months

Germany is probably the only country you get invited to dinner by VC and the day after get asked to paypal the amount, or probably recession hitting hard on everyone

1

0

6

muhtasham

@Muhtasham9

2 years

Benedikt sharing the learnings from 5 data science competitions for recommender systems he did over the last 3 years.

0

6

muhtasham

@Muhtasham9

2 years

With the swarm of users experimenting @bing Chat aka Sydney. I feel similar vibes like that of “OMG LaMDA is sentient guy”. Again many things can be said but before folks start posting terminator images let me leave this here …

1

7

muhtasham

@Muhtasham9

1 year

@Francis_YAO_

Efficiently Scaling Transformer Inference

We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths....

arxiv.org

0

1

7

muhtasham

@Muhtasham9

2 years

Found the famous books cover page star while hiking today @aureliengeron

0

7