mi141
@mi141
Followers: 4K · Following: 506 · Statuses: 5K
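I do research on machine learning and image processing at a research lab somewhere. I completed my PhD as a working professional (March 2021). I'm interested in machine learning in general, but these days my work is mainly deep learning. I've changed jobs, so you won't see me around a certain IP in Nihonbashi anymore.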
Joined June 2008
It looks like six papers from my department at our company (and groups nearby) were accepted to ICLR2025. Congratulations!
6 papers accepted at #ICLR2025 from our lab 🎉
1. SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
2. Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
3. Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
4. Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
5. Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
6. Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
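I search for MMAudio online every now and then, and I've started to see it used not just for ~10-second short videos but increasingly for film-like works several minutes long. Of course, this presupposes that the video generation models are strong, but what an amazing era we've reached...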
KITSUNE 🦊 💫

When I embarked on this project a month ago, I didn’t expect it to consume holidays, evenings, and far too many nights, but here we are. From the first scenes, I knew I had something special, and I don’t want audiences to watch an “AI film”: I just want them to watch a film, and hopefully, a good one at that.

(Sound On 🔈) 👇

KITSUNE is a tale of love between two souls separated by everything except their shared feelings of loneliness.

I grew up in front of beautiful cartoons, from timeless treasures like those of @DonBluth, which I watched again and again to the point of damaging my VHS tapes, to early 90s anime, and later, of course, plenty of Studio Ghibli. And yes, before you ask: I know Hayao Miyazaki would disapprove of this film 100%, but then again… I’m not (only?) seeking approval. I’ve had goosebumps many times while reviewing the evolving states of this film, and I hope at least some of you will feel the same.

Another famous director (@RealGDT, I see you) recently said AI could create “semi-compelling screensavers,” and I see this as a step toward proving him wrong.

Because you’ll ask: under the hood, there’s been tons of writing, re-writing, and switching directions mid-way. All shots were generated with Google’s text-to-video #VEO2. I faced countless challenges and hoops to bring my vision to life, finding ways to prompt and structure within the limitations of text-to-video despite VEO’s excellent prompt adherence.

So, is VEO magic? No, not really, and the 1,700+ curated sequences on my hard drive (out of an estimated 5,000–7,000 total generations) are proof of that. What impressed me most was the global consistency, adherence, and how I could achieve tweaks by simply adjusting a few words. But what mattered most to me was creating something warm, nostalgic, and full of heart, avoiding the cold, clinical feel of so many films leveraging AI.
Also, I’m a 40-year-old kid who grew up in front of the TV, has been creative his entire life, and has been designing professionally for nearly two decades. The more time passes, the more I know I can relate to what Rick Rubin said in that now-famous interview, where he mentions having no technical knowledge but trusting and building his own taste. If you like this film, this isn't just "Oh, AI is magic." You need to steer the damn ship.

Then there’s MMAudio for sound effects, regular good old stock sound libraries, music on Udio for this version (yes, there’s a second version; more on that later), and tons and tons (and tons!) of editing, sound design, and small post-processing touches.

Is this exposing risks for animators? Perhaps. Or it could also be their greatest companion, because once again, this is the worst it will ever be, yada yada yada...

No, it isn’t perfect, and if you look closely enough, you’ll find defects and variations, but this is a film I’m proud of, not just an AI one... Enjoy.

Wanna see a clean uncompressed version?