Released 10/2024
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz, 2 Ch
Genre: eLearning | Language: English | Duration: 18h 16m | Size: 4.1 GB
From variational autoencoders to Stable Diffusion with PyTorch and Hugging Face.
Overview
Programming Generative AI is a hands-on tour of deep generative modeling, taking you from building simple feedforward neural networks in PyTorch all the way to working with large multimodal models capable of simultaneously understanding text and images. Along the way, you will learn how to train your own generative models from scratch to create an endless variety of images, generate text with large language models similar to the ones that power applications like ChatGPT, write your own text-to-image pipeline to understand how prompt-based generative models actually work, and personalize large pretrained models like Stable Diffusion to generate images of novel subjects in unique visual styles (among other things).
About the Instructor
Jonathan Dinu is currently a content creator and artist working with deep learning and generative AI. Previously, he was a PhD student at Carnegie Mellon University before dropping out to ultimately pursue the less academic and more creative side of machine learning. He has always loved creating educational content, going back to his days as a co-founder of Zipfian Academy, an immersive data science bootcamp, where he had the opportunity to run workshops at major conferences like O'Reilly Strata and PyData, create video courses, and teach in person.
Skill Level
Intermediate to advanced
Learn How To
Train a variational autoencoder with PyTorch to learn a compressed latent space of images
Generate and edit realistic human faces with unconditional diffusion models and SDEdit
Use large language models such as GPT-2 to generate text with Hugging Face Transformers
Perform text-based semantic image search using multimodal models such as CLIP
Program your own text-to-image pipeline to understand how prompt-based generative models such as Stable Diffusion actually work
Properly evaluate generative models, both qualitatively and quantitatively
Automatically caption images using pretrained foundation models
Generate images in a specific visual style by efficiently fine-tuning Stable Diffusion with LoRA
Create personalized AI avatars by teaching pretrained diffusion models new subjects and concepts with DreamBooth
Guide the structure and composition of generated images using depth- and edge-conditioned ControlNets
Perform near real-time inference with SDXL Turbo for frame-based video-to-video translation
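To give a flavor of one topic from the list above, text-based semantic image search, here is a minimal sketch of the ranking step. It is not taken from the course material. It assumes the text query and each image have already been mapped into a shared embedding space by a multimodal model such as CLIP; the three-dimensional vectors, the file names, and the helper names `cosine_similarity` and `rank_images` are all hypothetical stand-ins chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(text_embedding, vec))
        for name, vec in image_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings. A real model would produce vectors with
# hundreds of dimensions; these values are made up for illustration.
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a text query that lies closest to "cat.jpg".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_images(query_embedding, image_embeddings)
best_match = ranking[0][0]

for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice the embeddings would come from a pretrained model rather than being written by hand, but the ranking logic stays the same: embed the query once, score it against every image embedding, and sort.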