fridahoeft

#11 Trying DALL-E to generate letters

In this experiment I tried out two versions of the famous AI tool DALL-E. Can this text-to-image tool generate appropriate letterforms?



ruDALL-E

In experiment #11, DALL-E, a model originally developed by OpenAI, is tested for generating letters. Low-threshold access to ruDALL-E is available via the website https://rudalle.ru/ or the Telegram bot. The ruDALL-E developers have since released several updates to their model; in this experiment I work with ruDALL-E Malevich (XL). The architecture is based on the Transformer and consists of an encoder and a decoder. According to the Russian developers, the training data comprised 250 million image-text pairs from OpenAI plus a further 30 million from CogView. None of the generation parameters can be adjusted, and the model generates images at 1024 × 1024 pixels.

Overall, the model is mostly unable to reproduce the properties requested in the prompt. The prompt "A black letter on white background" reliably generates matching images; for the other prompts, only a few images match. It is also noticeable that the generated images contain fragments of watermarks. This suggests that a large number of licensed stock images were used as training data. In addition, the training data did not focus on letters or text: the key domains were people, animals, celebrities, interiors, landmarks and landscapes, different types of technology, human activities, and emotions. This explains why the model generates letters so poorly.


minDALL-E

Brett Kuprel has made a "fast, minimal port of Boris Dayma's DALL-E Mini" available in a GitHub repository, which can be used via Google Colab. In this notebook, the user can adjust some parameters, such as the supercondition_factor. The higher this value, the more closely the generated image matches the prompt; however, a higher value also reduces the variety of the generated images. In this experiment, the value is set to 16. The images can be saved at a resolution of 768 × 768 pixels.
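Why does a higher supercondition_factor trade variety for prompt fidelity? A minimal sketch, assuming the "super conditioning" used in DALL-E Mini (a form of classifier-free guidance): the model predicts token scores (logits) once with the prompt and once with an empty prompt, and the two are blended as uncond + factor * (cond - uncond). The function names and the three toy logits below are hypothetical and only for illustration; they are not taken from the min-dalle code.

```python
import math


def supercondition(cond_logits, uncond_logits, factor):
    """Blend conditional and unconditional logits (assumed DALL-E Mini-style formula).

    factor = 1 returns the conditional logits unchanged; larger values
    push the distribution further toward tokens favored by the prompt.
    """
    return [u + factor * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats: lower means less variety when sampling
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for three candidate image tokens
cond = [2.0, 1.0, 0.5]    # scores given the prompt
uncond = [1.0, 1.0, 1.0]  # scores given an empty prompt

for factor in (1, 4, 16):
    probs = softmax(supercondition(cond, uncond, factor))
    print(factor, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

As the factor grows, almost all probability mass moves to the token the prompt favors, so samples agree more with the prompt but differ less from one another, which matches the trade-off described above.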

The generated images are mostly recognizable as letters or text, but often do not match the prompts. The only prompts that work reliably are "A letter A" and "The letter b on a white background". These produce nine capital A's or nine capital B's, respectively, on a white background (lowercase is not recognized). If the letters in the prompt are in quotation marks, the results are less accurate. The white background is reliably recognized.



Although the results of the two DALL-E versions show similarities, minDALL-E is better at representing the attributes of the requested letters.


