I first scraped 3838 images from Flickr using the keyword “coffee” and uploaded the dataset to Runway. I then trained my coffee model with StyleGAN2, starting from the Flickr Faces pre-trained model, for 3000 steps. After 5 hours, my model reached an FID of 92.01. I suspect the high value is due to the limited number of steps and the complexity of the dataset: some of the coffee images had people while others did not, and some showed brewed coffee while others showed coffee beans.
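For context, FID (Fréchet Inception Distance) compares the statistics of features extracted from real and generated images, and lower is better. Runway reported the score for me, so the snippet below is only a simplified sketch of the idea, not the code behind my number. It assumes each image has already been turned into a feature vector and, to stay dependency-free, treats the feature dimensions as independent (diagonal covariance). The real metric uses Inception-v3 features and full covariance matrices. The function name `frechet_distance_diag` and the toy feature lists are made up for illustration.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diag(real_feats, fake_feats):
    """Frechet distance between two Gaussians fitted to feature vectors.

    Assumption: diagonal covariances, so the trace term of the full FID
    formula reduces to a per-dimension (sqrt(var_r) - sqrt(var_f))^2.
    """
    dims = len(real_feats[0])
    total = 0.0
    for d in range(dims):
        # Collect the d-th feature across all images in each set.
        real_col = [f[d] for f in real_feats]
        fake_col = [f[d] for f in fake_feats]
        mu_r, mu_f = fmean(real_col), fmean(fake_col)
        var_r, var_f = pvariance(real_col), pvariance(fake_col)
        # Squared difference of means plus the diagonal covariance term.
        total += (mu_r - mu_f) ** 2
        total += (math.sqrt(var_r) - math.sqrt(var_f)) ** 2
    return total


# Toy 2-D "features" (hypothetical values, not from my dataset).
real = [[0.0, 1.0], [2.0, 3.0]]
fake = [[1.0, 1.0], [3.0, 3.0]]
print(frechet_distance_diag(real, real))  # identical sets give 0
print(frechet_distance_diag(real, fake))  # a shifted set gives a positive distance
```

The distance grows when the generated features drift away from the real ones in either their mean or their spread, which is why a mixed dataset like mine (people, cups, beans) is harder to match.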
Here are some of my favorite synthesized images: