Generating an image with text and optimizing it with an image restoration model to match the input