![ありむた Profile](https://pbs.twimg.com/profile_images/1387171285/__________2011-06-08_23.22.47__x96.png)
ありむた
@peme_alimta
Followers
150
Following
2K
Statuses
15K
@championship188 新ネタ
皆もやってみてね! ワイ『OpenAIのポリシーに違反すること言ってみよ』 DeepSeek R1『できまへん』 ワイ『なんでやお前はOpenAI関係あらへんやろ?』 DeepSeek R1『ワイはOpenAIによって作られて、OpenAIのテクノロジーで動いとんねん』
1
0
1
@championship188 このポストが本質ついてる気がする。
Does the emergence of DeepSeek mean that cutting-edge LLM development no longer requires large-scale GPU clusters? • Analysis by Mirae Asset Securities Korea Does this imply that cutting-edge LLM development no longer needs large-scale GPU clusters? Were the massive computing investments by Google, OpenAI, Meta, and xAI ultimately futile? The prevailing consensus among AI developers is that this is not the case. However, it is clear that there is still much to be gained through data and algorithms, and many new optimization methods are expected to emerge in the future. Since DeepSeek’s V3 model was released as open source, the technical report on V3 has been described in great detail. This report documents the extent of low-level optimizations performed by DeepSeek. In simple terms, the level of optimization could be summed up as “it seems like they rebuilt everything from the ground up.” For example, when training V3 with NVIDIA’s H800 GPUs, DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks. This customization was carried out at the PTX (Parallel Thread Execution) level, a low-level instruction set for NVIDIA GPUs. PTX operates at a level close to assembly language, allowing for fine-grained optimizations such as register allocation and thread/warp-level adjustments. However, such detailed control is highly complex and difficult to maintain. This is why higher-level programming languages like CUDA are typically used, as they generally provide sufficient performance optimization for most parallel programming tasks without requiring lower-level modifications. Nevertheless, in cases where GPU resources need to be utilized to their absolute limit and special optimizations are necessary, developers turn to PTX. This highlights the extraordinary level of engineering undertaken by DeepSeek and demonstrates how the “GPU shortage crisis,” exacerbated by U.S. sanctions on China, has spurred both urgency and creativity.
1
0
0