AI Papers Podcast

@aipaperspodcast

Followers
1,272
Following
3,385
Media
39
Statuses
82

A digestible daily update on the latest AI Research Papers. Brought to you by @pocketpodapp

Joined June 2023
@aipaperspodcast
AI Papers Podcast
2 months
We made a special story for the AI Papers Podcast using the new Sonic model from @cartesia_ai and talked about how their impressive state space model approach compares to transformer-based model architectures. Congrats to @krandiash, @_albertgu, @bclyang and the rest of the team
1
8
16
@aipaperspodcast
AI Papers Podcast
3 months
How overfit are popular LLMs on public benchmarks? New research from @scale_AI tries to figure this out with a new evaluation benchmark - GSM1K
1
2
7
@aipaperspodcast
AI Papers Podcast
2 months
Apple announced new Siri features and Apple Intelligence today. Interestingly, Apple already released a paper, titled "Ferret-UI," on how it all works - a multimodal vision-language model capable of understanding widgets, icons, and text on an iOS mobile screen, and reasoning
0
2
6
@aipaperspodcast
AI Papers Podcast
2 months
Face-Adapter, a breakthrough in face reenactment and swapping, using pre-trained diffusion models for superior precision and fidelity.
1
0
5
@aipaperspodcast
AI Papers Podcast
3 months
Octopus v2: On-device language model for super agent
1
1
5
@aipaperspodcast
AI Papers Podcast
3 months
Keeping up with the latest AI research can be hard... We wanted to make it a bit easier to get a quick update every day
1
2
5
@aipaperspodcast
AI Papers Podcast
2 months
Can large language models (LLMs) understand complex thoughts and emotions like humans do? Can they understand and predict the likely thoughts of others?
1
1
5
@aipaperspodcast
AI Papers Podcast
3 months
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance:
1
2
5
@aipaperspodcast
AI Papers Podcast
3 months
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
1
2
4
@aipaperspodcast
AI Papers Podcast
3 months
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
1
2
4
@aipaperspodcast
AI Papers Podcast
3 months
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
1
1
4
@aipaperspodcast
AI Papers Podcast
2 months
LLMs face challenges like outdated information and hallucinations, limiting their use in knowledge-intensive tasks. MetRag, a new framework, enhances RAG by combining similarity and utility-based models with an LLM for smarter, more efficient knowledge processing
1
0
3
@aipaperspodcast
AI Papers Podcast
1 month
MotionClone, a training-free framework that clones motions from a reference video for text-to-video generation. Using temporal attention and location-aware semantic guidance, MotionClone ensures superior motion fidelity, textual alignment, and temporal consistency.
1
0
4
@aipaperspodcast
AI Papers Podcast
3 months
Advancing LLM Reasoning Generalists with Preference Trees
2
0
3
@aipaperspodcast
AI Papers Podcast
3 months
Pre-training Small Base LMs with Fewer Tokens
1
0
3
@aipaperspodcast
AI Papers Podcast
2 months
full podcast -> paper ->
0
0
3
@aipaperspodcast
AI Papers Podcast
3 months
AutoCrawler, the Next-Gen Tool for Efficient Web Crawling
1
0
3
@aipaperspodcast
AI Papers Podcast
3 months
Llama-3: What You Need to Know about Meta's Latest Open Source Release
2
1
3
@aipaperspodcast
AI Papers Podcast
3 months
Introducing STT - a cutting-edge tracking model for autonomous driving, mastering both object tracking and state estimation
1
1
3
@aipaperspodcast
AI Papers Podcast
2 months
Using latent diffusion models to reconstruct complex, high-quality music from EEG recordings - advancing neural decoding and brain-computer interfaces.
1
0
3
@aipaperspodcast
AI Papers Podcast
1 month
Can a new image tokenization method revolutionize high-resolution image synthesis? TiTok, a Transformer-based tokenizer, reduces a 256x256 image to just 32 tokens, achieving 410x faster generation while surpassing state-of-the-art models in quality.
1
1
3
@aipaperspodcast
AI Papers Podcast
3 months
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents
1
0
3
@aipaperspodcast
AI Papers Podcast
2 months
Kaleido enhances image diversity from textual descriptions by using autoregressive latent priors, generating abstract intermediary representations. This approach broadens the variety of generated images while maintaining high quality and adherence to guidance.
1
0
2
@aipaperspodcast
AI Papers Podcast
3 months
Exploring multi-token prediction for higher efficiency in language models, with a new paper from @AIatMeta
1
0
2
@aipaperspodcast
AI Papers Podcast
3 months
Ag2Manip: Universalizing Robotic Manipulation. A framework for autonomous robotic systems, offering agent-agnostic visual and action representations to enhance generalizability and performance across simulated and real-world manipulation tasks.
1
0
2
@aipaperspodcast
AI Papers Podcast
1 month
Repurposing video content is challenging due to complex searches in large libraries. VLQA is a new system that uses RAG with large language models to retrieve and integrate video moments, improving AI-assisted video content creation.
1
1
3
@aipaperspodcast
AI Papers Podcast
2 months
SqueezeTime is a lightweight video recognition network for mobile devices, saving resources by combining time and channel dimensions. It enhances motion understanding, making it faster and more accurate.
1
0
2
@aipaperspodcast
AI Papers Podcast
3 months
Listen to the full episode -> or read the paper ->
0
0
2
@aipaperspodcast
AI Papers Podcast
3 months
COCONut: Modernizing COCO Segmentation
1
0
2
@aipaperspodcast
AI Papers Podcast
3 months
@airesearchtools @pocketpodapp No better way to keep up with the latest AI research while on the go 🔥
0
0
2
@aipaperspodcast
AI Papers Podcast
2 months
The Phased Consistency Model (PCM) addresses key limitations of the Latent Consistency Model (LCM), significantly improving text-conditioned image and video generation. PCM outperforms LCM and achieves state-of-the-art results across multiple generation steps.
1
0
2
@aipaperspodcast
AI Papers Podcast
2 months
DevEval is a benchmark designed to assess the coding capabilities of Large Language Models (LLMs) by better aligning with real-world use cases.
1
0
2
@aipaperspodcast
AI Papers Podcast
2 months
Listen to full episode -> Read full paper ->
0
0
1
@aipaperspodcast
AI Papers Podcast
1 month
Read more -> Listen to full podcast ->
0
0
1
@aipaperspodcast
AI Papers Podcast
3 months
Text-driven Photorealistic Material Painting for 3D Shapes
1
0
1
@aipaperspodcast
AI Papers Podcast
3 months
Scaling Instructable Agents Across Many Simulated Worlds
1
0
1
@aipaperspodcast
AI Papers Podcast
2 months
MotionLLM is a new framework that enhances human behavior understanding by merging video and motion data to analyze body dynamics and semantics. It integrates various data into one model, offering deep spatial-temporal insights.
1
0
1
@aipaperspodcast
AI Papers Podcast
3 months
Better & Faster Large Language Models via Multi-token Prediction
1
0
1
@aipaperspodcast
AI Papers Podcast
3 months
Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections
1
0
1
@aipaperspodcast
AI Papers Podcast
1 month
Listen to the full episode -> Read the full paper ->
0
0
3
@aipaperspodcast
AI Papers Podcast
2 months
Seed-TTS introduces groundbreaking text-to-speech technology that creates speech nearly indistinguishable from human voices, offering unparalleled control over speech attributes and enhancing applications in voice technologies and interactive systems.
1
0
1