![Carter Profile](https://pbs.twimg.com/profile_images/1483924488244838405/Y6gDulyM_x96.jpg)
Carter
@carterwsmith
Followers
59
Following
3K
Statuses
133
@T3Bracketology from a UCI watcher for over 10 years, do you really think they are a bubble in? maybe if they beat Duquesne and lost in the final to UCSD, but no other way
1
0
0
@BrigitMurtaugh The inline product is useful, but every time I try to use chat (which is very hard to access or add context to using keyboard shortcuts), 50% of the time it tries to use some random function call that truncates the output and results in a meaningless change
1
0
0
@PeterLakeSounds @Citrini7 It's not about the difference of the outputs, it's about what you think intelligence is
0
0
0
RT @alec_lewis: Sam Darnold has 13 games this season with a passer rating above 100. That's the second most in NFL history. Like, ever.…
0
98
0
@coldhealing Few experience the pleasures of the big city and central illinois within 24 hours
1
0
26
A week before Thanksgiving, Cerebras announced the lowest time-to-first-token latency running Llama 3.1 405B on their chips. Here's what they intentionally DIDN'T say:

At first glance, 128K context length and prices of $6/million input tokens and $12/million output tokens sound pretty convincing. But what's not reported is *their* dollar cost per token, which renders their inference essentially impractical for any real use case. Here's the math breakdown:

With 44GB of SRAM per chip, running a 405B model at FP16 (2 bytes/parameter) needs ~1TB of memory. This includes all the weights (810GB) plus additional memory for activations, working memory for computations, etc.

The KV cache for 128K tokens requires 2 (FP16) * 2 (K and V) * 16384 (405B hidden size) * 128,000 / 1e9 = ~8GB of memory per user.

That means to run single-user decoding on a 405B, you need ceil((1000 + 8) / 44) = 23 racks. 23 racks * $2.5M/rack = $57 million upfront cost to support 1 user. Each additional user adds ~$450k. Compared to inference, training is even worse since you need to store the gradients on chip as well.

All of the above doesn't even factor in power usage. Each chip/system has a 750W TDP; assuming continuous operation (24/7) and an average US electricity cost of ~$0.12 per kWh, 750W × 23 racks = 17.25 kW, which translates to roughly $1,490/month. That's insignificant compared to the hardware investment.

Now, the context length limit also depends on the pipeline depth, but for Llama-70B running with 4 racks (the bare minimum to run on WSE) it's ~8K tokens due to the memory architecture. While you can increase context length by adding more servers to your pipeline, 8xH100s cost ~$240K to buy and can run Llama 70B no problem, vs. a Cerebras setup for the same model that costs $10M+ to buy and comes with the context length limitation.

This is all to say benchmarks aren't everything when choosing how to deploy to prod. They're often an idealized version that doesn't translate once real-world constraints are factored in.
0
0
0
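A rough Python sketch of the arithmetic in the post above. All constants (44GB of SRAM per rack, $2.5M/rack, 750W TDP per system, $0.12/kWh, and a KV cache sized from the full hidden dimension rather than per-KV-head layout) are taken from the post itself and are assumptions, not verified Cerebras or Llama specs.

```python
import math

# Assumed figures, copied from the post above (not vendor-confirmed):
PARAMS_B = 405              # Llama 3.1 405B
BYTES_PER_PARAM = 2         # FP16
HIDDEN_SIZE = 16384         # hidden size used in the post's KV-cache estimate
CONTEXT_TOKENS = 128_000
SRAM_PER_RACK_GB = 44       # on-wafer SRAM per system/rack
COST_PER_RACK = 2.5e6       # $2.5M per rack
TDP_PER_RACK_W = 750        # per-system TDP assumed in the post
ELECTRICITY_PER_KWH = 0.12  # average US electricity price

# Weights plus headroom for activations/working memory (~1TB total in the post)
weights_gb = PARAMS_B * BYTES_PER_PARAM   # 810 GB of raw weights
total_model_gb = 1000                     # rounded up, as in the post

# KV cache per user: 2 bytes (FP16) * 2 (K and V) * hidden size * tokens
kv_cache_gb = 2 * 2 * HIDDEN_SIZE * CONTEXT_TOKENS / 1e9   # ~8.4 GB

# Racks needed for a single user, upfront cost, and marginal cost per extra user
racks = math.ceil((total_model_gb + kv_cache_gb) / SRAM_PER_RACK_GB)  # 23
upfront_cost = racks * COST_PER_RACK                                  # ~$57.5M
cost_per_extra_user = kv_cache_gb / SRAM_PER_RACK_GB * COST_PER_RACK  # ~$450-480k

# Continuous-operation power cost (24/7, ~30-day month)
power_kw = TDP_PER_RACK_W * racks / 1000                        # 17.25 kW
monthly_power_cost = power_kw * 24 * 30 * ELECTRICITY_PER_KWH   # ~$1,490/month

print(f"weights: {weights_gb} GB, KV cache per user: {kv_cache_gb:.1f} GB")
print(f"racks: {racks}, upfront: ${upfront_cost / 1e6:.1f}M, "
      f"per extra user: ~${cost_per_extra_user / 1e3:.0f}k")
print(f"power draw: {power_kw:.2f} kW, ~${monthly_power_cost:,.0f}/month")
```

Running it reproduces the post's numbers to within rounding: 23 racks, roughly $57M upfront, and a power bill around $1,490/month that is indeed negligible next to the hardware.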