Gaurav Sen
@gkcs_
Followers
58K
Following
3K
Statuses
3K
Founder of @InterviewReady3. I teach system design and computer algorithms.
Mumbai, India
Joined March 2016
Researchers invent memristors for large language models. The chips make training faster and cheaper. #LLMs #AI #Memristor
1
28
230
The race to build LLMs with System-2 thinking capabilities is heating up. The research in this space is interesting. Here are some ideas that stick out:

1. Continuous Chain of Thought (Coconut) by Meta.

Basic idea: LLM performance on reasoning tasks with a plain question-answer flow is poor. So we send examples along with the input query explaining how to solve a standard problem.

Example query: What is 2^15?
System Prompt: 5^160 = 5^(128+32). We find the value of 5^32 = 5^16 * 5^16. Do this recursively till you find the answer.
Output: 2^(8+4+2+1) = 32768.

This is called Chain of Thought. Continuous Chain of Thought improves on this by keeping each "thought" as an embedding: instead of decoding every intermediate step into text tokens, the model feeds the hidden state of each thought step back in as the next input. The response quality with this method is superior.

---------

That's enough for this post (my fingers got tired of typing on my cellphone 😛)! I'll share more learnings in future posts. Follow me to see them on your newsfeed. Cheers! #AI #LLMs #Reasoning
1
4
52
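A minimal sketch of the Coconut idea above, assuming a HuggingFace-style causal LM that exposes hidden states. The model name and the number of latent thought steps are illustrative choices, not values from the paper:

```python
# Continuous chain of thought: feed the last hidden state back in as the
# next input embedding instead of decoding intermediate "thoughts" to text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()

prompt = "What is 2^15?"
inputs = tokenizer(prompt, return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"])

num_latent_thoughts = 4  # illustrative; Coconut learns when to stop
with torch.no_grad():
    for _ in range(num_latent_thoughts):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        # Last layer's hidden state at the final position is the "thought".
        thought = out.hidden_states[-1][:, -1:, :]
        embeds = torch.cat([embeds, thought], dim=1)

    # After the latent steps, decode the answer as ordinary tokens
    # (one token here for brevity).
    out = model(inputs_embeds=embeds)
    next_token = out.logits[:, -1, :].argmax(dim=-1)

print(tokenizer.decode(next_token))
```

The key design choice is that the thought never leaves vector space, so no information is lost to the discretization of sampling a token at each step.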
Twitter generates millions of unique IDs every day. This is how. #SystemDesign #DistributedSystems #Twitter
4
30
417
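For reference, a minimal sketch of a Snowflake-style generator, following the layout Twitter published (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence); the epoch constant is Twitter's, while the spin-wait is one common implementation choice:

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch (Nov 2010)

class SnowflakeGenerator:
    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # worker ID must fit in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence number.
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            # Layout: 41 bits timestamp | 10 bits worker | 12 bits sequence.
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeGenerator(worker_id=1)
print(gen.next_id())
```

IDs generated this way are roughly time-sortable and need no coordination between workers; real implementations also guard against the clock moving backwards.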
@Priyansh_31Dec Is cheating a big problem on these platforms now, with the advent of AI code-generation systems?
1
0
1
Papers worth reading in the AI space.

1. SFT Memorizes, RL Generalizes: shows that reinforcement learning produces generalized models, which are robust to changing rules and environments.
2. Test-Time Compute >> Model Parameters: allocating test-time compute adaptively per prompt is more efficient than increasing model parameters. Published by Google, and recently corroborated by DeepSeek.
3. An Image is Worth 16x16 Words: transformers outperform CNNs when processing image data at scale.
4. Facebook Coconut: chain-of-thought reasoning can be improved with Continuous Chain of Thought (passing vectors through the stages).
5. Towards System 2 Reasoning with LLMs: improving LLM performance with graph algorithms like A* search, and game-tree algorithms like MCTS.
6. Marco-o1: Towards Open Reasoning Models: a paper describing how a model like OpenAI o1 could be designed.
7. DeepSeek R1: the recent famous open-source model which has (reportedly) matched OpenAI's benchmarks at a fraction of the cost.
8. DeepSeek Janus: another recent shocker from DeepSeek; it claims to outperform OpenAI's DALL·E 3 image generation model on benchmarks.

--------------

You can find these papers and my other recommendations neatly listed here: Bookmark the link, because more are on the way! #AI #ResearchPapers
1
3
46
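A toy sketch of the idea behind paper 2 in the list above: spend extra compute at inference by sampling several candidates and letting a verifier pick the best, instead of scaling parameters. `generate_answer` and `verifier_score` are hypothetical stand-ins for an LLM call and a learned verifier/reward model:

```python
import random

def generate_answer(prompt: str) -> str:
    # Placeholder for one sampled LLM completion.
    return f"candidate-{random.randint(0, 9)}"

def verifier_score(prompt: str, answer: str) -> float:
    # Placeholder for a verifier / process reward model.
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    # More samples = more test-time compute; n can be raised per prompt.
    candidates = [generate_answer(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: verifier_score(prompt, a))

# Harder prompts get a larger n; easy ones can stay cheap.
print(best_of_n("What is 2^15?", n=8))
```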
@techhdive It's on TechCrunch. I have been reading the papers from DeepSeek and summarising them, so the news appears in my recommendations. You can view my favorite resources here:
1
0
0
@Prashant_wzt I read research papers every day, through news sources like Hugging Face and Google News. My favourites are here:
1
1
4
DeepSeek just published an image generator that outperforms DALL·E 3. The architecture is that of Janus (their model from 2024), which uses separate encoders for image understanding and generation. The model has state-of-the-art benchmark performance. This comes a week after their release of R1, which matched OpenAI's o1 benchmarks. A big advantage of DeepSeek is that their models and algorithms are publicly shared and verifiable. #AI #DeepSeek #Janus
2
7
68
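A rough sketch of the decoupled design described for Janus: one encoder path for image understanding, a separate token embedding for image generation, and a shared autoregressive core. All module names and sizes are illustrative, not DeepSeek's actual implementation:

```python
import torch
import torch.nn as nn

class JanusStyleModel(nn.Module):
    def __init__(self, d_model=512, text_vocab=32000, img_vocab=8192):
        super().__init__()
        # Understanding path: project ViT-like patch features to d_model.
        self.und_encoder = nn.Linear(768, d_model)
        # Generation path: discrete (VQ-style) image-token embeddings.
        self.gen_embed = nn.Embedding(img_vocab, d_model)
        self.text_embed = nn.Embedding(text_vocab, d_model)
        # Shared transformer core over both modalities
        # (causal masking omitted for brevity).
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.core = nn.TransformerEncoder(layer, num_layers=4)
        self.img_head = nn.Linear(d_model, img_vocab)  # next image token

    def forward(self, text_ids, vit_feats, img_ids):
        seq = torch.cat([
            self.text_embed(text_ids),    # prompt tokens
            self.und_encoder(vit_feats),  # understanding features
            self.gen_embed(img_ids),      # image tokens generated so far
        ], dim=1)
        return self.img_head(self.core(seq))

model = JanusStyleModel()
logits = model(torch.randint(0, 32000, (1, 8)),  # text prompt
               torch.randn(1, 16, 768),          # ViT patch features
               torch.randint(0, 8192, (1, 4)))   # partial image tokens
print(logits.shape)  # (1, 28, 8192)
```

The point of the split is that the representation best for understanding an image is not necessarily the one best for generating it, so each path gets its own encoder while the transformer core is shared.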