The Future of Image Models Is Multimodal
Podcast: AI + a16z
Published On: Fri Jun 07 2024
Description: In this episode, Ideogram CEO Mohammad Norouzi joins a16z General Partner Jennifer Li, as well as Derrick Harris, to share his story of growing up in Iran, helping build influential text-to-image models at Google, and ultimately cofounding and running Ideogram. He also breaks down the differences between transformer models and diffusion models, as well as the transition from researcher to startup CEO.

Here's an excerpt where Mohammad discusses the reaction to the original transformer architecture paper, "Attention Is All You Need," within Google's AI team:

"I think [lead author Ashish Vaswani] knew right after the paper was submitted that this is a very important piece of the technology. And he was telling me in the hallway how it works and how much improvement it gives to translation. Translation was a testbed for the transformer paper at the time, and it helped in two ways. One is the speed of training and the other is the quality of translation.

"To be fair, I don't think anybody had a very crystal clear idea of how big this would become. And I guess the interesting thing is, now, it's the founding architecture for computer vision, too, not only for language. And then we also went far beyond language translation as a task, and we are talking about general-purpose assistants and the idea of building general-purpose intelligent machines. And it's really humbling to see how big of a role the transformer is playing in this."

Learn more:
Investing in Ideogram
Imagen
Denoising Diffusion Probabilistic Models

Follow everyone on X:
Mohammad Norouzi
Jennifer Li
Derrick Harris

Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.