Generates text that can optionally be guided by an image. Trained on the COCO dataset.