![Yuki Mitsufuji Profile](https://pbs.twimg.com/profile_images/1501516354025029637/u_gRGqTH_x96.jpg)
Yuki Mitsufuji
@mittu1204
Followers: 4K · Following: 42K · Statuses: 3K
PhD, Distinguished Engineer @Sony, Lead Research Scientist/VP of AI Research @SonyAI_global, Head of Creative AI Lab, Associate Prof. @tokyotech_jp
Manhattan, NY
Joined December 2009
I'm very happy to see that MMAudio, which my talented colleagues (@mi141, A. Hayakawa, @yahshibu) and intern (@hkchengrex) at Sony AI have invested their time and effort into, is being tested by so many creative people on X. arXiv:
1 spotlight, 5 posters from us at #ICLR2025. "Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric," led by Toshimitsu Uesaka with strong support from Prof. Taiji Suzuki (@btreetaiji), was selected as a spotlight. Congrats🎊
Fast sampling methods for discrete diffusion from our lab 🏎️ Our sampling schedule optimization method (Jump Your Steps) was accepted at #ICLR2025. Another (Di4C) is about distillation for discrete diffusion!
📢Discrete diffusion models are trending! Check out the latest work from our group (@mittu1204) in this exciting field: 1️⃣ Di4C: Fast sampling through distillation 📄 2️⃣ Jump Your Steps: Optimizing sampling schedules (ICLR'25) 📄
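To give a rough, generic picture of where a "sampling schedule" enters a discrete diffusion sampler, here is a minimal toy sketch. It is not the algorithm from Di4C or Jump Your Steps: the denoiser, the absorbing-state unmasking rule, and both timestep grids below are invented placeholders.

```python
import numpy as np

# Toy sketch only: a generic ancestral sampler for a discrete (absorbing-state)
# diffusion model over a token sequence. Everything here is a placeholder; it
# only illustrates where a sampling schedule (the list of timesteps) enters.

VOCAB = 16            # toy vocabulary size
MASK = VOCAB          # extra "masked"/absorbing token id
SEQ_LEN = 8

def toy_denoiser(x_t, t, rng):
    """Placeholder for a learned network: per-position probabilities over the
    clean vocabulary. A real model would actually condition on x_t and t."""
    return rng.dirichlet(np.ones(VOCAB), size=len(x_t))

def sample(schedule, seed=0):
    """Ancestral sampling along an arbitrary decreasing schedule of times in [0, 1].

    Fewer steps means faster sampling; *where* the steps are placed is exactly
    what a sampling-schedule optimization method would tune."""
    rng = np.random.default_rng(seed)
    x = np.full(SEQ_LEN, MASK)                        # start fully masked
    for t_now, t_next in zip(schedule[:-1], schedule[1:]):
        probs = toy_denoiser(x, t_now, rng)           # model's guess of clean tokens
        # Unmask each still-masked position with probability proportional to
        # the fraction of remaining "time" this step covers (toy rule).
        unmask = (x == MASK) & (rng.random(SEQ_LEN) < (t_now - t_next) / t_now)
        x[unmask] = [rng.choice(VOCAB, p=p) for p in probs[unmask]]
    still = x == MASK                                 # fill anything left masked
    x[still] = np.argmax(toy_denoiser(x, schedule[-1], rng), axis=1)[still]
    return x

uniform = np.linspace(1.0, 0.0, 11)                   # plain 10-step grid
warped = 1.0 - (1.0 - uniform) ** 2                   # same budget, steps bunched near t = 0
print(sample(uniform), sample(warped))
```

Both calls use the same step budget; only the placement of the steps differs, which is the knob a schedule-optimization method adjusts.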
@eleonoragrassuc @iclr_conf @GiordanoCicchet @luigi_sigillo @dhan90001 @IspammL Congrats! Great work!
When you register for #ICASSP2025, don't forget to select our tutorial on diffusion models for audio: 🎶Transforming Chaos into Harmony: Diffusion Models in Audio Signal Processing🎶 See you in Hyderabad, India!🇮🇳
"これまでにevalaが発表してきた36の立体音響作品のサウンド・データを学習したサウンドエフェクト生成AIを用いて,空間的作品をつくる試みです.サウンド(evalaがシグネチャーとして作品の始まりに用いている汽笛の音)とテキスト(学習に用いられた作品のうち8作品のタイトルをチャンネルごとに選択)の二つをリファレンス(プロンプト)として,リアルタイムかつマルチチャンネルで「evalaのような音」が生成され続けます.このプロジェクトは,作家不在でも立体音響作品を永続的に継承・制作しうる新しいアーカイヴのかたちを探求する実験であり,本作品はその最初のスケッチとなります."
《Studies for》 is an attempt to create spatial works using a sound-effect generation AI trained on the sound data of the 36 spatial audio works evala @evalaport has released to date. An experiment exploring a new form of archive in which spatial audio works can be perpetually inherited and produced even in the artist's absence. #DOMMUNE
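As a minimal sketch of the control flow described above (a generator conditioned on one audio reference and per-channel text prompts, producing multichannel audio chunk by chunk), the snippet below uses only invented names: SoundGenerator, run_installation, and write_chunk are hypothetical and not the actual installation code.

```python
import numpy as np

# Hypothetical sketch: condition once on two references (an audio signature
# plus per-channel text prompts), then keep emitting multichannel audio chunks.
# All names are invented for illustration.

SAMPLE_RATE = 48_000
CHUNK = SAMPLE_RATE // 4       # generate 250 ms of audio at a time
CHANNELS = 8                   # one text prompt (work title) per output channel

class SoundGenerator:
    """Stand-in for a trained sound-effect generation model."""
    def __init__(self, audio_ref, text_refs):
        self.audio_ref = audio_ref      # e.g. the signature steam-whistle sound
        self.text_refs = text_refs      # e.g. eight work titles, one per channel

    def next_chunk(self, channel):
        # A real model would return audio conditioned on both references;
        # here we just emit quiet noise so the loop is runnable.
        return (np.random.randn(CHUNK) * 0.01).astype(np.float32)

def run_installation(generator, write_chunk, num_chunks=4):
    """Produce one chunk per channel and hand the frame to the audio output."""
    for _ in range(num_chunks):    # the installation itself would loop indefinitely
        frame = np.stack([generator.next_chunk(ch) for ch in range(CHANNELS)], axis=1)
        write_chunk(frame)          # e.g. push to a multichannel audio device

if __name__ == "__main__":
    gen = SoundGenerator(audio_ref="whistle.wav",
                         text_refs=[f"work title {i}" for i in range(CHANNELS)])
    run_installation(gen, write_chunk=lambda frame: print("frame", frame.shape))
```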
RT @SonyAI_global: 🚀 PaGoDA by Sony AI: High-res image generation without retraining! Fast, efficient, and quality-focused. #NeurIPS2024 h…
RT @SonyAI_global: #GenWarp by Sony AI creates realistic perspectives from a single image! See how it works 🧵 #NeurIPS2024
🎶Large music models from our team🎶: 1. SoniDo🎼 for music mixing, demixing, transcription, etc. pdf: 2. OpenMU🧙♂️ for music captioning, reasoning, lyric understanding, etc. pdf: code: demo @ISMIRConf: #ISMIR2024
【🎸Music Foundation Model report by Sony AI】 Our team published a paper validating the effectiveness of a foundation model for music generation, showing that combining it with our music analysis techniques achieves higher performance. Paper:
A list of diffusion works & tutorial (at #ISMIR2024) from our lab! [ML] #NeurIPS24 (GenWarp: Novel View Synthesis) #NeurIPS24 (PaGoDA: Multi-Scale 1 Step Generator) #ICLR24 (CTM: Fast Image Gen.) #ICLR24 (MPGD: Guided Diffusion) #ICML23 (FP-Diff: Consistency-type Model) #ICML23 (Blind Inverse) [Audio/NLP] #ACL24 (Knowledge Gen.) #IJCAI24 (Music Editing) #ICASSP24 (Declipping) #ICASSP24 (Speech Enh.) #ICASSP23 (Music Transcription) #ICASSP23 (Vocoder) #ICASSP23 (Dereverb) #INTERSPEECH23 (Speech Enh.)
Here's a sneak peek of a crowd-based competition on sounding video generation starting from Oct. 1st! #ECCV2024
If you are working on a family of inverse problems using diffusion models and are confused about their relationships, this survey paper will clear your head!
Why are there so many different methods for using diffusion models for inverse problems? 🤔 And how do these methods relate to each other? In this survey, we review more than 35 different methods and we attempt to unify them into common mathematical formulations.
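For readers wondering what a "common mathematical formulation" typically looks like in this area, the usual starting point (stated here from general knowledge, not quoted from the survey) is posterior sampling with a diffusion prior:

```latex
% A noisy inverse problem with a diffusion-model prior over clean signals,
%     y = A(x) + n,    n ~ N(0, sigma^2 I),
% is solved by sampling the posterior p(x | y), whose score decomposes as
\[
  \nabla_{x_t} \log p_t(x_t \mid y)
  \;=\;
  \underbrace{\nabla_{x_t} \log p_t(x_t)}_{\text{prior score (trained diffusion model)}}
  \;+\;
  \underbrace{\nabla_{x_t} \log p_t(y \mid x_t)}_{\text{measurement / likelihood term}}
\]
% Methods largely differ in how they approximate the intractable likelihood
% term, e.g. via E[x_0 | x_t], projections onto the measurements, or guidance weights.
```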
RT @SonyAI_global: Meet @mittu1204, Lead Research Scientist, overseeing music and sound research within our AI for Creators Flagship. Lear…
A list of diffusion works + tutorial from our lab!
[ML] #ACL2024 (Knowledge Generation), #ICLR2024 (Consistency Trajectory Model), #ICLR2024 (Manifold Preserving Guided Diffusion), #ICML2023 (Consistency-type Model), #ICML2023 (Blind Image Restoration)
[Audio] #IJCAI2024 (Music Editing), #ICASSP2024 (Declipping), #ICASSP2024 (Speech Enhancement), #ICASSP2023 (Music Transcription), #ICASSP2023 (Vocoder), #ICASSP2023 (Dereverb), #INTERSPEECH2023 (Speech Enhancement)
📢For researchers in the audio-visual field, the call for papers for our #ECCV2024 workshop (AVGenL) is out. Don't miss the deadline: July 15⏳
Initial CfP advertisement for the ECCV 2024 workshop "AVGenL: Audio-Visual Generation and Learning". It will cover a wide range of topics in audio-visual generation and learning. The paper submission deadline is 15 Jul 2024. #ECCV2024 #ECCV @eccvconf