Carter (@carterwsmith)
Boston · Joined February 2018
Followers: 59 · Following: 3K · Statuses: 133
Carter (@carterwsmith) · 19 hours
@T3Bracketology From a UCI watcher of over 10 years: do you really think they're a bubble in? Maybe if they beat Duquesne and lost in the final to UCSD, but no other way.
1 reply · 0 retweets · 0 likes
Carter (@carterwsmith) · 5 days
zynvestors rejoice
[image]
0 replies · 0 retweets · 0 likes
Carter (@carterwsmith) · 10 days
@_mattwelter reel farm's auto-generated captions are absolutely perfect, great job.
1 reply · 0 retweets · 1 like
Carter (@carterwsmith) · 19 days
Monorepo so big it has VSCode feeling like a Google Doc 😭
0 replies · 0 retweets · 0 likes
Carter (@carterwsmith) · 19 days
@n0w00j Least skilled UIUC CS+X grad
1 reply · 0 retweets · 1 like
Carter (@carterwsmith) · 28 days
@BrigitMurtaugh The inline product is useful, but whenever I try to use chat (which is very hard to access or add context to via keyboard shortcuts), about 50% of the time it tries to use some random function call that truncates the output and results in a meaningless change.
1 reply · 0 retweets · 0 likes
Carter (@carterwsmith) · 1 month
@PeterLakeSounds @Citrini7 It's not about the difference in the outputs, it's about what you think intelligence is.
0 replies · 0 retweets · 0 likes
Carter (@carterwsmith) · 1 month
RT @alec_lewis: Sam Darnold has 13 games this season with a passer rating above 100. That's the second most in NFL history. Like, ever. …
0 replies · 98 retweets · 0 likes
Carter (@carterwsmith) · 2 months
Very convinced the Trump admin won't take action to expand H-1B. Doing so would mark an almost complete reversal of previous policy (see the "Hire American" exec. order, 2017), despite a MORE powerful "pro-American worker" coalition now (see any recent JD Vance tweet). Short at 64% chance.
0 replies · 0 retweets · 0 likes
Carter (@carterwsmith) · 2 months
- $300k remote computer job
- $1,500 mortgage payment in Danville, IL
This is all a man needs
2 replies · 0 retweets · 7 likes
Carter (@carterwsmith) · 2 months
I also strongly believe that with current methods we are approaching the local max of video gen: once we get individual shots to appear like a movie, what else can transformers do? Some style diffusion? Ok dude
0 replies · 0 retweets · 0 likes
Carter (@carterwsmith) · 2 months
@coldhealing Few experience the pleasures of the big city and central Illinois within 24 hours
1 reply · 0 retweets · 26 likes
Carter (@carterwsmith) · 2 months
RT @JDVance: Who was driving the car?
0 replies · 28K retweets · 0 likes
Carter (@carterwsmith) · 2 months
Everyone mentions Discord, etc., but there's no product that has gotten consistently worse to use since 2014 than Photoshop
0 replies · 0 retweets · 0 likes
Carter (@carterwsmith) · 2 months
Ivory Tang (@ivory_tang) · 2 months:

A week before Thanksgiving, Cerebras announced the lowest time-to-first-token latency running Llama 3.1 405B on their chips. Here's what they intentionally DIDN'T say:

At first glance, 128K context length and prices of $6/million input tokens and $12/million output tokens sound pretty convincing. But what's not reported is *their* dollar cost per token, which renders their inference impractical for essentially any real use case. Here's the math breakdown:

With 44GB of SRAM per chip, running a 405B model at FP16 (2 bytes/parameter) needs ~1TB of memory. This includes all the weights (810GB) plus some additional memory for activations, working memory for computations, etc.

The KV cache for 128K tokens requires 2 (FP16) * 2 (K and V) * 16384 (405B hidden size) * 128,000 / 1e9 = 8GB of memory per user.

That means to run single-user decoding on a 405B, you need ceil((1000 + 8) / 44) = 23 racks. 23 racks * $2.5M/rack = $57 million upfront cost to support 1 user. Each additional user adds ~$450K. Compared to inference, training is even worse, since you need to store the gradients on chip as well.

All of the above doesn't even factor in power usage. Each chip/system has a 750W TDP; assuming continuous operation (24/7) and an average US electricity cost of ~$0.12 per kWh, that leaves 750W × 23 racks = 17.25 kW, which translates to roughly $1,490/month, insignificant compared to the hardware investment.

Now, the context length limit also depends on the pipeline depth, but for Llama 70B running with 4 racks (the bare minimum to run on WSE) it's ~8K tokens due to the memory architecture. While you can increase context length by adding more servers to your pipeline, 8x H100s cost ~$240K to buy and can run Llama 70B no problem, vs. a Cerebras setup that costs $10M+ to buy to run the same model and comes with the context length limitation.

This is all to say benchmarks aren't everything when choosing how to deploy to prod. They're often an idealized version that doesn't translate when real-world constraints are factored in.

[image]
0 replies · 0 retweets · 0 likes
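As a sanity check on the quoted math, here is a minimal Python sketch that reproduces the arithmetic using only the figures stated in the tweet (44GB of SRAM per chip treated as one $2.5M "rack", a KV-cache formula with no per-layer term, 750W TDP per system); none of these numbers are verified against Cerebras spec sheets.

```python
import math

# All figures below come from the quoted tweet, not from Cerebras documentation.
BYTES_PER_PARAM = 2                  # FP16
WEIGHTS_GB = 405 * BYTES_PER_PARAM   # 810 GB of weights for the 405B model
MODEL_FOOTPRINT_GB = 1000            # ~1 TB incl. activations and working memory
SRAM_PER_CHIP_GB = 44
COST_PER_RACK = 2.5e6                # $2.5M per rack
HIDDEN_SIZE = 16384                  # stated 405B hidden size
CONTEXT_TOKENS = 128_000
TDP_WATTS = 750                      # per chip/system
USD_PER_KWH = 0.12                   # average US electricity price

# KV cache per user, per the tweet's simplified formula:
# 2 bytes (FP16) * 2 (K and V) * hidden size * context length.
kv_gb = 2 * 2 * HIDDEN_SIZE * CONTEXT_TOKENS / 1e9
print(f"KV cache per user: {kv_gb:.1f} GB")                    # ~8.4 GB

# Racks needed for single-user decoding, and the upfront cost.
racks = math.ceil((MODEL_FOOTPRINT_GB + kv_gb) / SRAM_PER_CHIP_GB)
print(f"Racks for one user: {racks}")                          # 23
print(f"Upfront cost: ${racks * COST_PER_RACK / 1e6:.1f}M")    # $57.5M

# Marginal cost of each additional user: one more KV cache's worth of SRAM.
per_user = kv_gb / SRAM_PER_CHIP_GB * COST_PER_RACK
print(f"Each additional user: ~${per_user / 1e3:.0f}K")        # ~$477K

# Power: TDP * racks, running 24/7 for a 30-day month.
kw = TDP_WATTS * racks / 1000
monthly_cost = kw * 24 * 30 * USD_PER_KWH
print(f"Power draw: {kw:.2f} kW, ~${monthly_cost:,.0f}/month") # 17.25 kW, ~$1,490
```

Every output matches the quote except the marginal per-user cost, where the tweet's ~$450K follows from rounding the ~8.4GB KV cache down to 8GB (8 / 44 × $2.5M ≈ $455K).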
Carter (@carterwsmith) · 2 months
@Raviga_Capital Just experienced it walking outside this morning. Few better feelings
0 replies · 0 retweets · 2 likes
Carter (@carterwsmith) · 2 months
@JesseQ__ @WillRagatz This is so wrong
0 replies · 0 retweets · 0 likes