/ @gazorp5 profile | Muskviewer

/

@gazorp5

Followers

555

Following

4K

Media

274

Statuses

3K

pro dog walker, ex-face:b00c, googie

Joined January 2013


/

@gazorp5

1 year

time to panic

13

38

446

/

@gazorp5

11 months

People give Zuck and LeCun credit for Meta's open sourcing of AI models, but most don't know that @schrep (ex-CTO) has been the most important executive sponsor for AI and open source since at least 2015. I doubt Llama, PyTorch, fairseq, etc would've been public without him.

3

120

/

@gazorp5

1 year

sponsor lucidrains otherwise he has to get a real job :(

1

17

118

/

@gazorp5

1 year

@Suhail It's definitely a sign of groupthink.

3

0

98

/

@gazorp5

2 years

@mayfer If you think about stereolithography as a TV projector, ASML provides the bulb. Everything else is TSMC.

3

1

68

/

@gazorp5

1 year

@RJdoesVR i can't tell if this is a parody.

1

0

64

/

@gazorp5

1 year

@jimkxa Swamps are the best moats. Only experts will wade in, everyone else is deterred by the alligators.

1

54

/

@gazorp5

2 years

@micsolana Having a written debate where both sides can do research and cite facts would be better than a verbal debate.

3

1

48

/

@gazorp5

11 months

@Teknium1 no way musk would've allowed open sourcing anything that openai would've done under tesla. look at tesla's github repo, nothing useful there.

3

0

47

/

@gazorp5

1 year

Anyone want to match me in sponsoring lucidrains?

3

4

47

/

@gazorp5

10 months

@jeremyphoward . @schrep is awesome.

/

@gazorp5

11 months

People give Zuck and LeCun credit for Meta's open sourcing of AI models, but most don't know that @schrep (ex-CTO) has been the most important executive sponsor for AI and open source since at least 2015. I doubt Llama, PyTorch, fairseq, etc would've been public without him.

0

42

/

@gazorp5

2 years

Japan demonstrating they're 20 years ahead. They've already got superconducting pliers while everyone is working with flakes.

固体量子(研究室公認VTuber)

@QM_phys_kyoto

2 years

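I tried reproducing room-temperature superconductivity! Not really 🤣. Doesn't the behavior in the diamagnetism replication experiments look like this video? These are pliers magnetized by a magnetic field. The field strength is about 20 mT at the surface.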

1

20

37

/

@gazorp5

1 year

@realGeorgeHotz Sponsoring lucidrains on Github.

0

2

37

/

@gazorp5

1 year

@Teknium1 Bing Chat is still touchy about the subject.

4

0

30

/

@gazorp5

1 year

@pandas_dev ? Realistically you wouldn't accept a PR that rewrote the API. Are you suggesting a fork?

0

29

/

@gazorp5

1 year

@kane > aims to build solid evidence base for policies and standards on traditional medicine practices and products, helping countries integrate it into their health systems & regulate its quality. if they're doing science, it seems ok to me?

4

0

23

/

@gazorp5

1 year

@Suhail If their mind was easily changed because of the last week of Gemini, instead of realizing the power of the Lindy effect, don't you think that's a little odd? If my assumptions are wrong I apologize!

1

0

27

/

@gazorp5

2 years

@jeremyphoward Why do you think it would randomly load some user's chat instead of hallucinating?

1

0

25

/

@gazorp5

1 year

@ralphbrooks YouTube has billions of hours of diverse videos. What is the purpose of ue5?

5

0

27

/

@gazorp5

1 year

@drexalt 100% agreed, what do you think @soumithchintala? 😀

1

23

/

@gazorp5

2 years

@3blue1brown Tiktok is china's revenge for the opium wars? Didn't expect such a spicy take from @3blue1brown.

0

20

/

@gazorp5

11 months

@aaronlucas21 Lego robotics, FRC, robocup, tiny mouse, BattleBots are all robotic competitions for varying ages in the US, from elementary school to college to professional. What's different about yours?

2

0

20

/

@gazorp5

2 years

@daniel_eth present day GPT-4 or GPT-4 trained on data from 2002?

1

0

20

/

@gazorp5

1 year

@xsphi I got more confused about escape velocity after reading your tweet. The answer is that escape velocity only applies to non-propelled objects (like a rock) escaping gravity; it doesn't apply to rockets.

3

0

20
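
The distinction in the reply above can be made concrete with the standard formula for escape velocity, v = sqrt(2GM/r), which is the launch speed an unpowered object needs in order to coast away without further thrust. The sketch below plugs in rounded textbook values for Earth (the constants are assumptions of this example, not figures from the thread) and gives roughly 11.2 km/s. A rocket under continuous thrust is not bound by this number, because it keeps adding energy after launch.

```python
import math

# Rounded textbook constants (assumed for illustration).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of Earth, kg
R_EARTH = 6.371e6    # mean radius of Earth, m


def escape_velocity(mass_kg: float, radius_m: float) -> float:
    """Speed (m/s) an unpowered object needs at radius_m to coast to infinity."""
    return math.sqrt(2 * G * mass_kg / radius_m)


v = escape_velocity(M_EARTH, R_EARTH)
print(f"Escape velocity at Earth's surface: {v / 1000:.1f} km/s")
```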

/

@gazorp5

1 year

@airkatakana Advanced seems to answer it okay.

0

19

/

@gazorp5

1 year

@alterwyx being that guy, but doubt this would pass a blind taste test unless the food was very acidic.

1

0

19

/

@gazorp5

11 months

@vikhyatk Main purpose is to get more investor money. It's brilliant.

2

0

17

/

@gazorp5

1 year

@minimaxir It definitely looks AI on first glance, but the details are too coherent. People seem to think over saturated, soft overhead light photos are AI now.

1

0

18

/

@gazorp5

11 months

@alexkoch_ai this is teleoperated?

1

0

16

/

@gazorp5

2 years

@roydanroy Yes nuns have to move now, they're very sad about it.

1

0

17

/

@gazorp5

2 years

@blamelessjay it's pure self-deception. if he took an undergraduate test in any of those areas, he would fail. this is why you don't drop out of high school.

2

0

14

/

@gazorp5

11 months

@jeffclune Moore's law is fine, even without redefining it to mean something completely different.

2

1

15

/

@gazorp5

1 year

"Starting today, Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding and more."

3

2

14

/

@gazorp5

11 months

@Carnage4Life I know this guy. He spread fake rumors about where the missing Malaysian plane was in college. Unsurprising that he would do this.

1

0

13

/

@gazorp5

2 years

@typedfemale 🤔

1

0

14

/

@gazorp5

1 year

even before the schism, i found Ilya's thinking to be more mystical than scientific, as evidenced by interviews. scott aaronson's blog for example:

2

0

14

/

@gazorp5

10 months

@eshear @BamaBonds Not linear algebra, you need non linearities to approximate arbitrary functions. /pedant.

2

0

13
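
The pedantic point above can be checked in a few lines: stacking linear layers without an activation collapses into a single matrix, so depth alone adds no expressive power. The sketch below uses hand-picked toy weights (W1, W2 and the input x are assumptions of this example) to show that two linear layers equal one merged matrix, while inserting a ReLU between them computes something a single matrix cannot, here the absolute difference |x1 - x2|.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relu(v):
    """Elementwise max(0, x)."""
    return [max(0.0, x) for x in v]


# Toy weights chosen for illustration only.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 5.0]

two_linear_layers = matvec(W2, matvec(W1, x))   # no activation in between
collapsed = matvec(matmul(W2, W1), x)           # one merged matrix, same result
with_relu = matvec(W2, relu(matvec(W1, x)))     # ReLU makes this |x1 - x2|

print(two_linear_layers, collapsed, with_relu)
```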

/

@gazorp5

1 year

@HamelHusain @Tim_Dettmers Have you tried this? Without NVLink, FSDP is 4-5x slower. This only works if you're doing DDP.

1

12

/

@gazorp5

1 year

AMD scoop: their next generation data center GPUs will have block floating point support. Supposedly the range of fp32/bf16 but in 9 bits, will increase performance substantially without relying on fp8 conversions (cough h100). Should work for inference and training.

2

13

/

@gazorp5

1 year

@jimkxa The people living at CUDA Castle are doing very well for themselves 😃 Moats are often something highly unpleasant or costly, whether it's wading through legal regulation or a crappy codebase. x86 had a monopoly for 20+ years; it was working for them.

3

1

12

/

@gazorp5

10 months

@paularambles the images of women have a bit of bias

3

0

73

/

@gazorp5

2 years

@tszzl He admitted that pytorch was superior to tensorflow a couple weeks ago.

0

13

/

@gazorp5

1 year

@andrewmccalip unconditioned generation != training data.

0

10

/

@gazorp5

1 year

@SchmidhuberAI @ylecun The world would be in a better place if you spent the time you use picking petty fights doing research instead.

2

0

13

/

@gazorp5

2 years

@alexkaplan0 @condensed_the Who to believe, a university materials research group or a guy who makes frozen coffee?

0

10

/

@gazorp5

1 year

@finbarrtimbers Quantization for training or inference? For inference, it's been around for >5 years. Any NNs that run on mobile (like image filters) use quantization. Example from 2018:

0

12
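
For readers unfamiliar with inference-time quantization as mentioned above, here is a minimal sketch of symmetric per-tensor int8 quantization. It is not the 2018 example the tweet refers to; the function names and the sample weights are hypothetical, and real mobile runtimes add details such as zero points and per-channel scales.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the int8 codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]       # hypothetical weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Each restored value is within half a quantization step of the original.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, worst_error <= scale / 2 + 1e-12)
```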

/

@gazorp5

1 year

@soumithchintala metamate hallucinates like crazy, it's more like a court jester than a useful chatbot assistant.

1

0

11

/

@gazorp5

1 year

@francoisfleuret @PyTorch That post is 2 years old. You can wrap your module to use cuda graph automatically with torch.compile with reduce-overhead mode.

1

0

11

/

@gazorp5

1 year

@minimaxir transformers probably wouldn't exist without word2vec tbh. skipgram is basically early BERT in its objective.

3

0

9

/

@gazorp5

1 year

@ML_PhDer MIT has not been relevant in ML for at least 10 years.

1

0

9

/

@gazorp5

10 months

@proales @netcapgirl google was getting a ton of data from email - that's why amazon and other companies don't reveal what products you've bought in the order receipt anymore.

0

10

/

@gazorp5

1 year

@typedfemale where can i get this unabridged version.

0

11

/

@gazorp5

1 year

@jkronand This is hilarious! Cargo-culting PyTorch's nn.Module's interface, without realizing the forward method implies a corresponding automatic backward method.

2

0

9

/

@gazorp5

2 years

@alyssamvance > while most skeptics have stuck to unpersuasive name-calling, arguments from psychoanalysis, and Twitter dunks. Please tell me you can see the irony in this statement.

1

0

9

/

@gazorp5

11 months

@ericjang11 I wish GPUs depreciated that quickly. If 75% YoY was true, then the A100 80GB should cost $300 (released in 2021, assuming $20k initial price).

0

10
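
The arithmetic behind the $300 figure above is compound depreciation: losing 75% of value each year leaves 25% of the previous year's price. The sketch below reproduces it using the tweet's own $20k starting price; the three-year span (2021 to 2024) is an assumption of this example.

```python
def depreciated_price(initial_price: float, yearly_drop: float, years: int) -> float:
    """Price after `years` of losing the fraction `yearly_drop` of value each year."""
    return initial_price * (1.0 - yearly_drop) ** years


# $20k in 2021, 75% year-over-year depreciation, three years later (assumed span).
a100_price = depreciated_price(20_000, 0.75, 3)
print(a100_price)  # 312.5, i.e. roughly the $300 quoted above
```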

/

@gazorp5

1 year

@lawhsw Dunking on Marcus is stealing candy from a baby. It's too easy to be fun.

2

0

9

/

@gazorp5

10 months

@francoisfleuret if you want to send a python object (with its methods intact), isn't that by definition arbitrary code injection? otherwise you could json encode the __dict__.

1

0

9
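
The alternative suggested above, JSON-encoding the `__dict__`, can be sketched as follows. Only the instance's data crosses the wire, and the receiver rebuilds the object with a class definition it already trusts, so no code is shipped. The `Config` class and the helper names are hypothetical, and the approach assumes every attribute is JSON-serializable.

```python
import json


class Config:
    """Hypothetical example class; not from the original thread."""

    def __init__(self, name, lr):
        self.name = name
        self.lr = lr

    def describe(self):
        return f"{self.name} (lr={self.lr})"


def to_json(obj) -> str:
    # Serialize only the instance attributes; methods are not included.
    return json.dumps(vars(obj))


def from_json(cls, payload: str):
    # The receiver supplies the class itself; the payload carries data only.
    obj = cls.__new__(cls)
    obj.__dict__.update(json.loads(payload))
    return obj


payload = to_json(Config("adam", 0.001))
restored = from_json(Config, payload)
print(restored.describe())
```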

/

@gazorp5

1 year

@jon_victor_ Can you confirm it was Ilya that made the advance, and not someone reporting to him?

2

0

9

/

@gazorp5

2 years

@daniel_271828 tbf there's a lot of things barley can do that AI can't do.

3

0

9

/

@gazorp5

1 year

@marksaroufim @tarantulae Ideally the PyTorch docs would be a source of truth that can be relied on instead of a patchwork of blogs, forum posts, and pull requests that need to be combined together to figure out how something works.

0

8

/

@gazorp5

1 year

@tsarnick AWS off by an order of magnitude.

2

0

8

/

@gazorp5

1 year

+19 people sponsored lucidrains on github in one day, awesome!

1

0

11

/

@gazorp5

11 months

benchmark hacking is so 2018, it doesn't really matter what your MMLU or humaneval score is in 2024. blind A/B tests with subject matter experts is the only thing that matters aka vibes. in that sense, we're probably close to reaching the limits of lmsys chatbot arena.

2

0

9

/

@gazorp5

1 year

@hardmaru it's been a bad year for coups attempted by russians.

0

1

8

/

@gazorp5

2 years

@dylan522p @Yampeleg That's not how copyright works. News websites publish and rewrite each other's content all the time. Threatening legal action for something that you yourself have done isn't cool.

0

1

9

/

@gazorp5

1 year

@_akhaliq the title makes the paper sound cooler than it actually is.

0

9

/

@gazorp5

2 years

@ericjang11 @Tesla_Optimus @1x__tech are you contractually obligated to post videos at 1x speed? 🤔 jokes aside, this is quite impressive, is it a deep rl model?

1

0

9

/

@gazorp5

1 year

@marksaroufim @tarantulae The problem isn't the quantity of documentation, but quality. Many of the docs you've listed are incorrect in some way because PyTorch has changed significantly after it was written, and the docs have not been updated to reflect the existing design, making it confusing for newbs.

1

0

9

/

@gazorp5

2 years

Guy whose entire shtick is leaking internal memos from companies gets mad and threatens legal action against someone who does it to him.

Dylan Patel

@dylan522p

2 years

@Yampeleg Ya that's not cool. You didn't pay either because I see you already did a chargeback. And no you have no right to publish this, violating copyright. I will be launching legal action in Israel.

0

2

8

/

@gazorp5

1 year

@pcastr because our environment was designed for humans.

1

0

8

/

@gazorp5

1 year

@tsarnick

2

0

8

/

@gazorp5

1 year

@zacharylipton Most sota mobile vision models are found via NAS iirc e.g. EfficientNet/MobileNet/FBNet.

0

5

/

@gazorp5

1 year

World models are about learning cause and effect, not a literal map of the world. See @hardmaru or @ylecun's papers. Maybe take whatever the coauthor (MIT professor) has to say with a grain of salt, if he could get something so basic wrong.

Wes Gurnee

@wesg52

1 year

Do language models have an internal world model? A sense of time? At multiple spatiotemporal scales? In a new paper with @tegmark we provide evidence that they do by finding a literal map of the world inside the activations of Llama-2!

1

0

7

/

@gazorp5

2 years

@francoisfleuret Technically you didn't specify that the train wasn't infinitely long.

1

0

7

/

@gazorp5

2 years

@CixLiv Heat dissipation module for $70? Is it made from diamond?

2

0

7

/

@gazorp5

1 year

@soumithchintala @drexalt good to hear things have changed for the better! back in the day, internal impact was #1, and open source was tertiary. even in FAIR OSS was not considered important work. (as people on xformers could tell you 👀).

0

8

/

@gazorp5

2 years

@_akhaliq @Gradio This is the exact same approach as MiniGPT-4, just with CLIP instead of BLIP and LLaMA instead of Vicuna. Lot of parallel work these days.

1

8

/

@gazorp5

1 year

@IanCutress AMD is already usable for finetuning and inference of LLMs. There's no secret sauce.

0

7

/

@gazorp5

1 year

@gautamcgoel Tell your friend that gay marriage is legal in 2023.

0

6

/

@gazorp5

2 years

@deliprao in this context, it sounds more like "pfizer will give the FDA early access to do drug testing before release" rather than free handouts?

0

7

/

@gazorp5

1 year

@ednewtonrex Does this apply to text as well, since all language models to date have been trained on copyrighted text?

2

0

7

/

@gazorp5

11 months

@francoisfleuret if you want to see autonomous clothing folding.

1

0

7

/

@gazorp5

2 years

@zswitten Gatekeeping *words* is hilarious. Like someone doesn't have the right to write if they use a tool to help them.

2

0

7

/

@gazorp5

1 year

@_akhaliq > For each task, we employ GPT-3.5-turbo to generate instruction data for fine-tuning. Microsoft doesn't have enough money to afford gpt4 API calls?

2

0

7

/

@gazorp5

2 years

@typedfemale This better not be who I think it is.

1

0

6

/

@gazorp5

1 year

@Thom_Wolf The data has nothing to do with textbooks, it's just an instruction dataset generated using GPT-4.

0

4

/

@gazorp5

2 years

@iquilezles @TimSweeneyEpic @elonmusk Why do you dislike the Internet Archive? If ShaderToys ever goes down permanently, people can still access the site.

0

6

/

@gazorp5

2 years

@goodside Interesting. Something similar happens on the raw GPT4 model, if it doesn't receive the assistant format string. That model isn't accessible to the general public though. Possible that OpenAI is using classifier free guidance?

1

0

6

/

@gazorp5

10 months

@jxmnop whisper is an encoder/decoder model that's used a lot.

1

0

6

/

@gazorp5

2 years

@YiTayML If T5 is so good why isn't it being used in Palm 😬. From my own experience, T5 overfit to benchmarks, but doesn't work that well for free-form human responses. Would like to see it in the Chatbot Arena.

0

5

/

@gazorp5

11 months

@davisblalock MosaicML libraries also fall into the bucket of "PyTorch library is its own unique, broken, unstable snowflake". I've tried out both the trainer and the dataloader. Better off writing it yourself, that way it's easier to debug :)

2

0

6

/

@gazorp5

1 year

oh, that's what TPOT means.

0

5

/

@gazorp5

10 months

@LanternBioworks Dentists hate this one trick!

0

5

/

@gazorp5

2 years

@browserdotsys modern image pipelines incorporate black-frame subtraction, which removes/mitigates fixed pattern noise.

1

0

5
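
Black-frame (dark-frame) subtraction, as mentioned above, can be sketched in a few lines: capture a frame with no light reaching the sensor, then subtract it pixel by pixel from the real exposure so that offsets fixed to specific pixels cancel out. The tiny 2x3 frames and the function name below are hypothetical, and real pipelines work on raw sensor data and typically average several dark frames.

```python
def subtract_dark_frame(image, dark_frame):
    """Subtract a dark (black) frame pixel by pixel, clamping at zero."""
    return [
        [max(0, pixel - dark) for pixel, dark in zip(img_row, dark_row)]
        for img_row, dark_row in zip(image, dark_frame)
    ]


# Hypothetical data: a flat scene of value 100 plus a fixed per-pixel offset.
dark_frame = [[2, 0, 5], [1, 0, 0]]            # captured with no light
image = [[102, 100, 105], [101, 100, 100]]     # scene + the same fixed pattern

corrected = subtract_dark_frame(image, dark_frame)
print(corrected)
```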

/

@gazorp5

10 months

@amir The companies listed when they're raising money.

0

5

/

@gazorp5

1 year

@Teknium1 @ivanfioravanti phi-2 uses gpt-3.5, not 4. the openai overlords wouldn't allow them to use gpt4 data.

3

0

5

/

@gazorp5

1 year

@typedfemale Increasing latency? Or am I misunderstanding.

1

0

6

/

@gazorp5

2 years

@nearcyan Crawling is going to be a lot more expensive. Buying or creating thousands of user accounts, accessing websites individually, already happening because of cloudflare.

0

5

/

@gazorp5

2 years

@WenhuChen The essence of pure vs applied {math, cs}. Engineers don't need to know linear algebra to fine tune a model and deploy it to production.

0

6

/

@gazorp5

11 months

@peterjliu Hasn't been updated in 2 years, I'm skeptical that GPT-4 was trained with FiM.

2

0

5