kessler Profile
kessler

@k_ssl_r

Followers
187
Following
171
Statuses
71

part-time fortune teller, est 2008

usa
Joined August 2023
@k_ssl_r
kessler
1 day
@vipulved @togethercompute twice that of the official api, insane
Tweet media one
1
0
6
@k_ssl_r
kessler
2 months
@7oponaut @doomslide yes; i think o3 low reasoning is ~$20 and o3 high reasoning is ~$3500
Tweet media one
1
0
2
@k_ssl_r
kessler
2 months
@doomslide @7oponaut near certainly arc; scaffolded arc with prog synth was 52%? don’t think anything scaffolded or not got over like 2% on frontier math pre o3
1
0
2
@k_ssl_r
kessler
2 months
0
0
2
@k_ssl_r
kessler
3 months
@aidan_mclau @teortaxesTex feeling a mistral-esque structure where at some point like Deepseek-R2-236B-reasoner-gpt-o1-whatever-the-hell-2504 goes open weight noncom like mistral large
0
0
1
@k_ssl_r
kessler
3 months
@YouJiacheng Appears to be a “trillion parameter MoE” per website
1
0
5
@k_ssl_r
kessler
3 months
@Grad62304977 @aidan_mclau I like 30b, Cohere Command R / Qwen2.5 32B can do a lot imo. <10b feels too small and >=70b feels too big. goldilocks optimal. idk re scaling down though
0
0
3
@k_ssl_r
kessler
3 months
@iamwaynechi @YouJiacheng @teortaxesTex is it possible for you to share numbers for mean? had the same experience w/ 100+ conc calls but never actually ran the nums and just a bit curious
1
0
2
@k_ssl_r
kessler
3 months
@YouJiacheng @teortaxesTex with high concurrency the deepseek api would literally have 1tps return rate for me. diff from ft latency ofc but still seems like it'd get in the way for cc
0
0
5
@k_ssl_r
kessler
3 months
@tszzl visions of inverse cramer
0
0
2
@k_ssl_r
kessler
3 months
@xprunie @khushkhushkhush @aaruHQ @seekingtau @virtualned our election model wasn’t up a few months ago, not true lol
0
0
1
@k_ssl_r
kessler
3 months
Pixtral uses a CogVLM2-style vision encoder and, similarly to CogVLM2, is trained on fewer images with certain architectural changes that make it less effective for vision tasks -- Qwen2VL 7B is a better option for vision and is ~45% smaller
1
0
1
@k_ssl_r
kessler
3 months
@_xjdr agree - should be accessible but not vis by default imo. wish o1 cot summaries were more granular as the samples from the website and leaked reasoning traces that it occasionally spits out are very interesting
1
0
5
@k_ssl_r
kessler
3 months
@PalmerLuckey 56 hours
0
0
0
@k_ssl_r
kessler
3 months
@_xjdr 😿
Tweet media one
1
0
16
@k_ssl_r
kessler
4 months
@teortaxesTex Mean Response Length / Arena Hard Score
```
Llama 3.1 70B          -> 31.0
Llama 3.1 Nemotron 70B -> 25.8
Llama 3.1 405B         -> 24.0
GPT-4o                 -> 22.0
Claude 3.5 Sonnet      -> 20.4
```
0
0
0
@k_ssl_r
kessler
4 months
RT @teortaxesTex: > llama and Gpt4o have identical mean response length > MRL of llama-nemotron increases by 27% > LLM as a judge evals Th…
0
1
0
@k_ssl_r
kessler
4 months
skeptical of the benchmarks being length biased. check the mean response length. obviously not a perfect heuristic, but if you divide the Mean Response Length by the Arena Hard Score (to balance for length bias), you get the following (where a lower score is better):
```
Llama 3.1 70B          -> 31.0
Llama 3.1 Nemotron 70B -> 25.8
Llama 3.1 405B         -> 24.0
GPT-4o                 -> 22.0
Claude 3.5 Sonnet      -> 20.4
```
which makes more sense to me
0
0
0
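The length-normalization heuristic in the tweet above is a one-line computation; a minimal Python sketch follows. The model names and the mean-response-length / Arena Hard inputs here are illustrative placeholders (the tweet only reports the resulting ratios, not the raw values), so treat the numbers as assumptions chosen to show how the heuristic separates a verbose model from a concise one at equal score.

```python
def length_adjusted_score(mean_response_length: float, arena_hard_score: float) -> float:
    """Divide mean response length by Arena Hard score; lower is better.

    The idea: if a judge-based benchmark is biased toward longer answers,
    dividing by response length penalizes models whose score may be inflated
    by verbosity rather than quality.
    """
    return mean_response_length / arena_hard_score

# Hypothetical inputs for illustration only -- not actual leaderboard values.
# Two models with the same Arena Hard score but different verbosity:
models = {
    "verbose_model": (1550.0, 50.0),  # (mean response length, Arena Hard score)
    "concise_model": (1020.0, 50.0),
}

ratios = {name: length_adjusted_score(mrl, score) for name, (mrl, score) in models.items()}
# Lower ratio is better, so the concise model wins under this heuristic.
best = min(ratios, key=ratios.get)
```

With these placeholder inputs the verbose model gets a ratio of 31.0 and the concise one 20.4, so the heuristic prefers the concise model despite identical raw scores.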