![RayLuan_NewOaks-e/acc Profile](https://pbs.twimg.com/profile_images/1707997765278826496/89ICeTYr_x96.jpg)
RayLuan_NewOaks-e/acc
@Rockwood_XRay
Followers
1K
Following
2K
Statuses
3K
CEO@NewOaks AI; Former Tiktok PM; NewOaksAI: Human-Like AI Phone Agents That Convert in 60s
https://www.newoaks.ai
Joined April 2023
Censorship is not an excuse for OpenAI and other LLM providers' prices being higher than DeepSeek's. All we want is a good product at a low price.
A bunch of well meaning folks told me it's hypocritical to go on an anti-China and anti-CCP narrative while benefitting from the DeepSeek models. I agree and take the feedback. DeepSeek is awesome and the researchers deserve a ton of respect. America should stop resorting to virtue signaling and censorship ideas and just aim for meritocratic supremacy. We will make the Perplexity product better than the DeepSeek app and that should be the major reason a user should prefer to use the Perplexity app over DeepSeek. Auxiliary reasons like censorship and data security are not what we should focus marketing on. We will work on this (finetuning and more improvements) and will have more to share soon.
0
0
1
The conclusion of this Google paper is very clear: "SFT is responsible for memorization, RL is responsible for generalization."

Simple analogy: supervised fine-tuning (SFT) is like showing students a large number of examples and answers; students learn by imitating the examples. Reinforcement learning (RL) is like letting students solve problems by themselves, giving rewards for correct answers and penalties for incorrect ones; students learn through trial and error and by summarizing patterns.

The researchers designed two tasks to test the model:

General Points card game: a card arithmetic game that requires using four cards to make the target number 24. The researchers tested:
- Rule changes: change the numerical values of J, Q, and K to see whether the model has learned the arithmetic rules or just memorized the solutions under specific rules.
- Visual changes: change the color of the cards to see whether the model can still recognize the cards despite the visual change.

Virtual navigation (V-IRL): navigating a virtual city using instructions and street-view images. The researchers tested:
- Rule changes: using different ways of giving directions (e.g., absolute direction "north" vs. relative direction "turn left").
- Visual changes: testing in cities the model was not trained on, to see whether the model can recognize landmarks and navigate in new visual environments.

Key findings:

1. Reinforcement learning (RL) is the generalization champion. Across all tasks, models trained with RL excel at adapting to new rules and visual environments. They learn the underlying principles of arithmetic and navigation and can handle situations they have never seen before.

2. Supervised fine-tuning (SFT) tends to memorize. Models trained with SFT perform well on tasks similar to the training data, but performance drops dramatically when the rules or visual context change. They are essentially recalling patterns in the training data rather than truly understanding the task.

3. RL improves the model's visual recognition. Interestingly, RL training even improves the model's ability to recognize objects in images, which helps in the virtual navigation task. This suggests that RL can improve the model's basic visual understanding.

4. SFT is still a good helper for RL. Although RL generalizes better, SFT is still useful: it helps the model initially understand the instructions and respond in the right format, which makes it easier for RL to fine-tune the model further and reach better performance.

5. "Thinking time" is crucial for RL. Giving the model more "thinking time" (adding verification reasoning steps during RL training) further improves its generalization ability.

Application suggestions: if you want an AI model that can truly understand and adapt to new situations (generalize), RL is the better training method; it teaches the model how to learn and how to solve problems flexibly. If you just need a model to perform well on tasks very similar to its training data, SFT may be sufficient, but it is likely to struggle when circumstances change even slightly.
Think of reinforcement learning (RL) as giving your AI a “brain” that can think for itself, while supervised fine-tuning (SFT) is like giving it a “cheat sheet.” In the long run, the “brain” is obviously more powerful.
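To make the contrast concrete, here is a minimal toy sketch (my own illustration under assumed details, not code from the Google paper): SFT nudges a policy toward fixed demonstration answers with a cross-entropy-style update, while RL samples its own answers and reinforces only the ones that earn reward. The three-action task, the demonstration list, and the reward rule are all invented for illustration.

```python
# Toy contrast between SFT-style imitation and RL-style trial and error.
# Everything here (task, demos, reward) is made up for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
logits = np.zeros(n_actions)            # toy "policy" parameters

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# --- SFT: imitate demonstrated answers (cross-entropy on fixed labels) ---
demos = [0, 0, 1, 0, 2]                  # hypothetical demonstration actions
for a in demos:
    p = softmax(logits)
    grad = -p
    grad[a] += 1.0                       # gradient of log p[a] w.r.t. logits
    logits += 0.5 * grad                 # step toward reproducing the demo

# --- RL: sample an action, observe a reward, reinforce what worked ---
def reward(a):
    return 1.0 if a == 2 else 0.0        # a rule the demonstrations never taught

for _ in range(200):
    p = softmax(logits)
    a = rng.choice(n_actions, p=p)       # trial ...
    r = reward(a)                        # ... and error
    grad = -p
    grad[a] += 1.0
    logits += 0.1 * r * grad             # REINFORCE-style update, scaled by reward

print(softmax(logits))                   # probability mass shifts to the rewarded action
```

Run as written, the SFT loop leaves the policy favoring the most-demonstrated answer, while the RL loop discovers and reinforces the action that actually earns reward; that is the memorization-versus-generalization distinction described above, in miniature.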
0
0
1
Dario deeply fears DeepSeek. That is all I can see. Control=Fear
The word "control" appears 24 times in this essay – all 24 referring to export controls Zero mentions of the challenges of controlling powerful AIs, and the words "safe", "safety", and "alignment" don't appear at all Strange for the CEO of "an AI safety and research company"🤔
0
0
1
Agreed
"probably because it's just harvesting data". It's funny how people like @bindureddy think that because it's Chinese company there must be something sinister going on where the CPC is gathering data on everyone around the world. The truth is that over the last few days Deekseek has got a massive amount of Publicity, and its obviously down to the fact that their servers just cannot handle all of the increased traffic, so they have decided to prioritise their app service.. It's really simple.
0
0
0
RT @BarrettYouTube: @bindureddy "probably because it's just harvesting data". It's funny how people like @bindureddy think that because it'…
0
1
0