mi141 @mi141 profile

mi141

@mi141

Followers

4K

Following

506

Statuses

5K

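I do research on machine learning, image processing, and the like at a research lab somewhere. I completed my PhD as a working professional (March 2021). I'm interested in machine learning in general, but lately my work is mostly deep learning. I changed jobs, so I no longer show up at a certain IP in Nihonbashi.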

Joined June 2008


mi141

@mi141

4 years

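I've started posting a video series called "Machine Learning Techniques for Making Efficient Use of Limited Data and Labels." It covers a wide range of techniques (data augmentation, regularization, transfer learning, domain adaptation, meta-learning, semi-/weakly supervised learning), so check it out if you want a quick overview of the field.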

1

111

540

mi141

@mi141

7 days

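This is really true. Even if you try to explain to non-experts how amazing recent video generation is, I think the reaction will be "but haven't movies been full of CG for quite a while now?"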

あるふ

@alfredplpl

7 days

By that logic, you could ask how prompt-driven image generation differs from CG at all. Back in the day, rendering from text was what people called CG.

0

1

7

mi141

@mi141

7 days

By the way, the punchline of this story is: "So then what exactly is the PR in CVPR...?"

0

2

mi141

@mi141

12 days

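Partly thanks to a colleague from my previous job (we joined the same year) having moved to that company, I was invited to a certain "ーリング飯" dinner. I got to talk about research, the industry, and all sorts of other things, and it was a lot of fun! You often see invitation-only dinners at international conferences, but a small, casual gathering like this is a wonderful idea too...!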

0

12

mi141

@mi141

16 days

For anyone more interested in JanusFlow, I've previously introduced Transfusion (and Show-o), which share a similar concept, so that might serve as a useful reference.

0

7

mi141

@mi141

22 days

It looks like six papers from my department (and thereabouts) at my company have been accepted to ICLR 2025. Congrats!

Yuki Mitsufuji

@mittu1204

22 days

6 papers accepted at #ICLR2025 from our lab 🎉 1. SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation 2. Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation 3. Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models 4. Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models 5. Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning 6. Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

0

3

25

mi141

@mi141

22 days

First paper: Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation. Second paper: Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models.

0

mi141

@mi141

24 days

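SANA (image generation) drew attention for using a VAE that compresses the spatial dimensions to 1/32. I wonder whether approaches that compress the temporal dimension just as aggressively will show up too.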

0

12

mi141

@mi141

27 days

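I search for mentions of MMAudio every now and then, and I've started to see it used not only in short videos of around 10 seconds but, increasingly, in film-like works several minutes long. Of course, this all rests on video generation models being strong, but what an amazing era we've entered...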

Henry Daubrez 🌸💀

@henrydaubrez

28 days

KITSUNE 🦊 💫 When I embarked on this project a month ago, I didn’t expect it to consume holidays, evenings, and far too many nights, but here we are. From the first scenes, I knew I had something special, and I don’t want audiences to watch an “AI film”; I just want them to watch a film, and hopefully, a good one at that. ( Sound On 🔈) 👇 KITSUNE is a tale of love between two souls separated by everything except their shared feelings of loneliness. I grew up in front of beautiful cartoons, from timeless treasures like those of @DonBluth, which I watched again and again to the point of damaging my VHS tapes, to early 90s anime, and later, of course, plenty of Studio Ghibli. And yes, before you ask: I know Hayao Miyazaki would disapprove of this film 100%, but then again… I’m not (only?) seeking approval. I’ve had goosebumps many times while reviewing the evolving states of this film, and I hope at least some of you will feel the same. Another famous director (@RealGDT , I see you) recently said AI could create “semi-compelling screensavers,” and I see this as a step toward proving him wrong. Because you’ll ask: under the hood, there’s been tons of writing, re-writing, and switching directions mid-way. All shots were generated with Google’s text-to-video #VEO2. I faced countless challenges and hoops to bring my vision to life, finding ways to prompt and structure within the limitations of text-to-video despite VEO’s excellent prompt adherence. So, is VEO magic? No, not really, and the 1,700+ curated sequences on my hard drive (out of an estimated 5,000–7,000 total generations) are proof of that. What impressed me most was the global consistency, adherence, and how I could achieve tweaks by simply adjusting a few words. But what mattered most to me was creating something warm, nostalgic, and full of heart, avoiding the cold, clinical feel of so many films leveraging AI.
Also, I’m a 40-year-old kid who grew up in front of the TV, has been creative his entire life, and has been designing professionally for nearly two decades. The more time passes, the more I know I can relate to what Nick Rubin said in that now-famous interview, where he mentions having no technical knowledge but trusting and building his own taste. If you like this film, this isn't just "Oh, AI is magic." You need to steer the damn ship. Then there’s MMAudio for sound effects, regular good old stock sound libraries, music on Udio for this version (yes, there’s a second version—more on that later), and tons and tons (and tons!) of editing, sound design, and small post-processing touches. Is this exposing risks for animators? Perhaps. Or it could also be their greatest companion, because once again, this is the worst it will ever be, yada yada yada.... No, it isn’t perfect, and if you look close enough, you’ll find defects and variations, but this is a film I’m proud of, not just an AI one... Enjoy. Wanna see a clean uncompressed version?

0

5

mi141

@mi141

28 days

[Please share] The call for MIRU2025 sponsors is now open!

mi141

@mi141

2 months

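We've published the MIRU2025 sponsorship guidelines! (Applications are scheduled to open on 1/15.) MIRU is the largest conference in Japan on computer vision and pattern recognition. I think it's ideal for showcasing technical strength and raising your profile in this field, so please consider it! #MIRU2025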

0

3

5

mi141

@mi141

1 month

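Come to think of it, as of this month my title has quietly gotten longer: I'm now a "Senior" Research Scientist. It's not a promotion; it's the result of a mysterious initiative to "align our titles with those commonly used outside the company." In other words, I've been "Senior" (self-proclaimed young researcher) for a long time already!! (W-what did you say?!)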

1

0

40

mi141

@mi141

1 month

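I've been off since the 24th this year, but I actually came down with COVID on the very first day, and it looks like I'll be ringing in the new year feeling sick the whole time... 🥺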

0

4

mi141

@mi141

2 months

This is an event organized by a colleague. If you're interested in so-called music AI, please come! (The venue is our company's office...!)

YUKARA16.04

@yukara13

2 months

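🎶Tokyo Music AI Gathering🎶 On 2/5 next year, we're holding a Meetup event in [Tokyo] for people involved in [music AI] to connect! Anyone can take part, not just engineers, so please join us if you're interested. Language: proceedings will mainly be in English. Details/registration: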

0

1

mi141

@mi141

2 months

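Actually, we didn't use particularly high-quality data for training this time, so performance can probably be pushed further with better-curated training data. The approach can also be applied to speech and music, so if you work in a related field, please give it a try! By the way, I cover the latest research trends in audio-visual generation here: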

0

2

20

mi141

@mi141

2 months

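Point (1) is very simple. The model architecture is an extension of the MM-DiT used in Stable Diffusion 3, and we simply also train it on text-audio pairs (feeding an empty token in place of the video). High-quality video-audio data is surprisingly hard to collect, so this helps quite a bit.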

1

14
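
To make point (1) a little more concrete, here is a minimal sketch of the data-mixing idea only: samples that lack video get a placeholder "empty" video input so they can share batches with video-audio samples. This is an illustration under assumptions, not the actual training code. The names `Sample`, `EMPTY_VIDEO_TOKEN`, `fill_missing_video`, and `make_joint_batch` are hypothetical, and in a real model the empty token would presumably be an embedding rather than a string.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical placeholder standing in for the "empty token" fed as video
# input when a sample has no video. A real model would likely use an
# embedding vector here; a string keeps the sketch dependency-free.
EMPTY_VIDEO_TOKEN = "<empty_video>"


@dataclass
class Sample:
    audio: str                   # path or ID of the target audio
    text: str                    # text condition (caption)
    video: Optional[str] = None  # None for text-audio-only data


def fill_missing_video(sample: Sample) -> Sample:
    """Return a copy whose video is the empty token if none was given."""
    if sample.video is None:
        return Sample(audio=sample.audio, text=sample.text,
                      video=EMPTY_VIDEO_TOKEN)
    return sample


def make_joint_batch(video_audio: Sequence[Sample],
                     text_audio: Sequence[Sample],
                     batch_size: int,
                     rng: random.Random) -> List[Sample]:
    """Draw one batch from the union of both datasets.

    Uniform sampling over the pooled data is an assumption made for
    simplicity; the mixing ratio actually used is not stated in the post.
    """
    pool = list(video_audio) + list(text_audio)
    picked = rng.sample(pool, batch_size)
    return [fill_missing_video(s) for s in picked]


if __name__ == "__main__":
    rng = random.Random(0)
    va = [Sample(audio="a0.wav", text="dog barking", video="v0.mp4")]
    ta = [Sample(audio="a1.wav", text="rain"),
          Sample(audio="a2.wav", text="piano")]
    for s in make_joint_batch(va, ta, batch_size=3, rng=rng):
        print(s)
```

Because every sample in the batch ends up with some video input, the same model can consume both kinds of data without a separate code path, which is the appeal of the trick when video-audio pairs are scarce.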