Personal role update: From January 2023 I'll start leading the entire Google Brain team in Japan, including the research team previously led by David Ha (@hardmaru). I'm looking forward to working with the team members & learning new topics such as RL, ALIFE, & AI creativity!
I've been promoted to Principal Research Scientist at Google Brain (soon Google DeepMind). I'm grateful to my family, colleagues, collaborators & those who have supported me throughout my career. I'm excited to continue working w/ all of you to make a difference in the world.
Google Brain Tokyo members had a chance to meet and have lunch with Prof. Hinton today. He also gave a tech talk about his latest work on capsule networks in the Tokyo office this afternoon. Thanks a lot to @hardmaru for organizing these events!
Yet another neural vocoder from my teammates in Google Brain is out! The new model, "WaveGrad", is not autoregressive, flow-based, or GAN-based; it is based on score matching / diffusion probabilistic models. Please check it out!!
"WaveGrad: Estimating Gradients for Waveform Generation" (arXiv:2009.00713). Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan.
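To make the "score matching / diffusion" idea concrete, here is a toy sketch of one diffusion-style training step for a waveform model. This is not WaveGrad's actual code: the continuous noise-level sampling is simplified, and `predict_noise` is a hypothetical stand-in for the real neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_training_step(waveform, predict_noise):
    """One denoising-diffusion training step, sketched: corrupt the
    waveform with Gaussian noise at a random level, then measure how
    well the model recovers the injected noise."""
    # Sample a continuous noise level in (0, 1), loosely following
    # WaveGrad's idea of conditioning on the noise level itself.
    sqrt_alpha_bar = rng.uniform(1e-4, 1.0 - 1e-4)
    noise = rng.standard_normal(waveform.shape)
    noisy = sqrt_alpha_bar * waveform + np.sqrt(1.0 - sqrt_alpha_bar**2) * noise
    predicted = predict_noise(noisy, sqrt_alpha_bar)
    # MSE between true and predicted noise; in a real system the
    # gradient of this loss trains the network.
    return float(np.mean((predicted - noise) ** 2))

# Toy "model" that always predicts zeros, so the loss is roughly
# the mean squared magnitude of the injected unit-variance noise.
waveform = rng.standard_normal(16000)  # 1 second of 16 kHz audio
loss = diffusion_training_step(waveform, lambda y, level: np.zeros_like(y))
print(loss)
```

A trained network would drive this loss toward zero by learning to predict the injected noise given the noisy waveform and the noise level.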
"Google is holding Google Developers ML Summit, an event dedicated to machine learning, on July 11. Jeff Dean and other members of the TensorFlow team will come to Japan to give sessions on the machine learning tools Google provides to developers, including TensorFlow, Cloud ML, and ML Kit."
I've been elected as a Fellow of the International Speech Communication Association (#ISCA).
Huge thanks to my colleagues & collaborators who have taught me so much, & to #Google, which has given me opportunities to work on real-world problems.
I was advised to make this tweet more informative :-)
My team (Google Brain Applied Research in Tokyo) is hiring a Research Software Engineer (Machine Learning) who has professional experience and/or publications in speech processing, NLP, or computer vision.
I will move to Google Brain this July. By the end of this year, I will return to Japan and be one of the founding members of the new Google Brain Tokyo team;
I'm looking forward to working with the Brain team members and people in the Tokyo office!
We are looking for colleagues to work on AI research in the Tokyo office! Happy to see our #GoogleAI efforts expanding w/ Google Brain now having a research presence in Tokyo. We’re hiring machine learning researchers there; if you’re interested in helping advance AI, apply here —>
Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano
Gemini Ultra’s performance exceeds current state-of-the-art results on
Today I presented this paper (Best Student Paper at Interspeech2019) at Google Brain Tokyo's paper reading group. It was fun :-)
Adversarially Trained End-to-end Korean Singing Voice Synthesis System
Paper:
Slide & Demo:
Our team at @GoogleDeepMind Japan is hiring a Research Engineer to work on Neural Speech Understanding and Speech Generative Modeling in Tokyo!
If you are interested and have experience in these topics, please consider applying via the link below:
Google Cloud for Researchers
"Submit a proposal to receive up to $5,000 in free Google Cloud credits for academic research. Use Google's high performance computing capabilities. ..."
Google Research Japan is hiring a Research Scientist in AI for Social Good.
This is a great opportunity for researchers in Japan who are working on AI & Healthcare.
New paper from our team:
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, RJ Skerry-Ryan, Yonghui Wu
"Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling"
Arxiv:
Samples:
New paper from our collaborators at Google Research.
"We present Translatotron 3, a novel unsupervised speech-to-speech translation architecture. In Translatotron 3, we show that it is possible to learn a speech-to-speech translation task from monolingual data alone."
Introducing Translatotron 3, an unsupervised approach to speech-to-speech translation that can learn from monolingual data, mitigating the challenges of requiring parallel speech data & opening the door to translation of the non-textual speech attributes →
Neural vocoder from Xiaomi. The representation is extracted by a neural encoder, rather than a knowledge-based fixed representation such as a mel-spectrogram or WORLD vocoder parameters. Conceptually similar to VQ-VAE.
"RawNet: Fast End-to-End Neural Vocoder"
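The VQ-VAE analogy above can be illustrated with the core quantization step: each continuous latent vector is snapped to its nearest codebook entry, yielding a learned discrete representation instead of a hand-designed one. A minimal sketch (the `vector_quantize` helper and codebook shapes are illustrative, not RawNet's actual architecture):

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Map each continuous latent vector to its nearest codebook entry
    (the VQ-VAE-style bottleneck; sketch only)."""
    # Squared Euclidean distance between every latent and every code.
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = d.argmin(axis=1)           # discrete code ids
    return codebook[indices], indices    # quantized latents + their ids

rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))   # 8 codes of dimension 4
latents = codebook[[2, 5]] + 0.01        # latents lying near codes 2 and 5
quantized, ids = vector_quantize(latents, codebook)
print(ids.tolist())
```

In a full model the encoder, decoder, and codebook are trained jointly; here the point is only that the "representation" is a set of learned codes rather than fixed acoustic features.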
Google DeepMind's music generation model Lyria and Music AI tools under the partnership w/ YouTube.
I'm so excited to see this announcement, and looking forward to seeing how it will help creators!!
(reposting as there was a typo)
"25 Years of Evolution in Speech and Language Processing"
"In this article, we summarize the evolution of speech and language processing (SLP) in the past 25 years. We first provide a snapshot of popular research topics and the associated state of ..."
Today (25th July, 2021) is my 10th #Googleversary.
I am fortunate that I could work with so many talented people at @Google. A huge thank you to my friends and colleagues who have taught me so much.
New paper from our team:
Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu
"PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS"
Arxiv:
Samples:
After 7 years in the Speech team @ Google, it’s time for me to take on a new adventure; I'll leave the team at the end of this month. I feel grateful for having had the opportunity to work as a part of the team. I learned a lot. I also feel proud of the team's incredible achievements.
Direct speech-to-speech translation with a sequence-to-sequence model. Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, and Yonghui Wu
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
"We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages and outputs web navigation actions, such as click and type. WebGUM is trained by jointly ..."
A paper from my team:
Jonathan Shen, Ye Jia, Mike Chrzanowski, Yu Zhang, Isaac Elias, Heiga Zen, Yonghui Wu
Non-Attentive Tacotron: Robust & Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
paper:
audio:
Translatotron 2: Robust direct speech-to-speech translation
pdf:
samples:
outperforms Translatotron by a large margin in terms of translation quality and predicted speech naturalness
Translatotron is our experimental model for direct end-to-end speech-to-speech translation, which demonstrates the potential for improved translation efficiency, fewer errors, and better handling of proper nouns. Learn all about it below!
SpecGrad is yet another denoising diffusion probabilistic model (DDPM)-based neural vocoder incorporating more ideas from signal processing to achieve better performance.
"Even if we do not have a vaccine, our plan is that we will be able to deliver the Games": @Tokyo2020 spokesman Masa Takaya tells #TheNine's @BBCchrismclaug that organisers are planning for the postponed Olympic Games to go ahead this summer with spectators present.
We are hiring for a wide variety of research roles within @GoogleAI.
See our web site, or if you're at @NeurIPSConf, stop by the Google booth!
(Someone wondered if we weren't hiring because I hadn't tweeted about this, so trying to fix this impression.)
It has been 8 months since I signed the offer letter. Today is my first day as a Senior Research Scientist on the Google Brain team based in Tokyo.
I am so grateful for all the opportunities and support I have received. I will do my best.
I look forward to working with you all.
I like the Hanawa Hokiichi Google Doodle.
WaveGrad 2 -- Iterative Refinement for Text-to-Speech Synthesis
"WaveGrad 2 is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence. "
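The "iterative refinement" in the title refers to reverse-diffusion sampling: start from pure Gaussian noise and repeatedly denoise it using the learned gradient estimate. A schematic sketch of such a sampling loop, assuming a generic DDPM update (the real WaveGrad 2 network also conditions on the phoneme sequence, which the placeholder `predict_noise` here ignores):

```python
import numpy as np

rng = np.random.default_rng(0)

def iterative_refinement(predict_noise, length, betas):
    """Reverse-diffusion sampling loop, sketched: begin with Gaussian
    noise and refine it step by step toward a waveform."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    y = rng.standard_normal(length)          # start from pure noise
    for t in reversed(range(len(betas))):
        eps = predict_noise(y, t)            # network's noise estimate
        # Standard DDPM posterior-mean update.
        y = (y - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                            # add noise except on the last step
            y += np.sqrt(betas[t]) * rng.standard_normal(length)
    return y

betas = np.linspace(1e-4, 0.05, 6)  # a tiny 6-step schedule for illustration
audio = iterative_refinement(lambda y, t: np.zeros_like(y), 24000, betas)
print(audio.shape)
```

With a trained noise-prediction network and a longer schedule, the same loop turns random noise into speech; the number of refinement steps trades off quality against synthesis speed.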