This project focuses on generating images from text prompts in multiple languages using the Stable Diffusion model. The generated images are then evaluated for quality and relevance using the CLIP model, which measures the alignment between text and image. The aim is to assess the effectiveness of multilingual text-to-image generation and improve the coherence and realism of the produced visuals.
First, install the dependencies:

```shell
pip install -r requirements.txt
```
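The workflow described above — generate an image from a prompt with Stable Diffusion, then score text-image alignment with CLIP — can be sketched roughly as follows. This is a minimal illustration, assuming the Hugging Face `diffusers` and `transformers` libraries are among the dependencies; the checkpoint names (`runwayml/stable-diffusion-v1-5`, `openai/clip-vit-base-patch32`) and function names are illustrative assumptions, not necessarily what this project uses.

```python
"""Sketch: text-to-image generation with Stable Diffusion, scored with CLIP.

Model IDs and helper names are assumptions for illustration only.
Heavy imports are deferred into the functions so the libraries are
required only when the pipeline actually runs.
"""


def generate_image(prompt, model_id="runwayml/stable-diffusion-v1-5"):
    # Deferred imports: diffusers/torch are needed only at generation time.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)
    # The pipeline returns a list of PIL images; take the first one.
    return pipe(prompt).images[0]


def clip_score(prompt, image, model_id="openai/clip-vit-base-patch32"):
    # Deferred imports: transformers/torch are needed only at scoring time.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image is CLIP's scaled cosine similarity between
    # the image embedding and the text embedding: higher = better aligned.
    return outputs.logits_per_image.item()


if __name__ == "__main__":
    # Example prompt; multilingual prompts can be passed the same way.
    prompt = "a red bicycle leaning against a stone wall"
    image = generate_image(prompt)
    print(f"CLIP score: {clip_score(prompt, image):.2f}")
```

Note that downloading the checkpoints happens on first use, and generation is far faster on a GPU; for multilingual evaluation, the same `clip_score` call works with prompts in other languages, though CLIP's scores are generally most reliable for English text.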