fridahoeft

#1 Generating individual letters with a diffusion model

In this post I explain my very first experiment on this blog. Using Google Colab, I trained a diffusion model on images of the letter A, and this is what came out:

samples at 40,000 training steps

Set-up

In 2021, Prafulla Dhariwal and Alex Nichol, both researchers at OpenAI, published the paper "Diffusion Models Beat GANs on Image Synthesis", which had a major impact on AI image generation tools. Their revised model architecture generates images from random noise and is supposed to offer some advantages over GANs, which were state of the art at the time. As the authors put it: "GANs are often difficult to train, collapsing without carefully selected hyperparameters and regularizers".
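To make "generating images from random noise" a little more concrete: a diffusion model is trained to undo a gradual noising process, so that at sampling time it can start from pure noise and work backwards. The sketch below only shows the noising direction. The linear noise schedule, its parameter values, and the names `make_alpha_bar` and `add_noise` are assumptions for illustration, taken from common defaults in the diffusion literature and not from the notebook I used.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to noising step t: mix the image with Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Placeholder "image" scaled to [-1, 1]: a bright block on a dark background.
x0 = np.full((256, 256), -1.0)
x0[64:192, 96:160] = 1.0

x_early = add_noise(x0, 10, alpha_bar, rng)   # still looks like the image
x_late = add_noise(x0, 999, alpha_bar, rng)   # almost pure noise

# Correlation with the original shows how much of the image survives.
corr_early = np.corrcoef(x0.ravel(), x_early.ravel())[0, 1]
corr_late = np.corrcoef(x0.ravel(), x_late.ravel())[0, 1]
print(f"early step: {corr_early:.3f}, late step: {corr_late:.3f}")
```

During training the model sees images at many different noise levels and learns to predict the noise that was added, which is what lets it later turn random noise into an image step by step.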

My access to a diffusion model was via a notebook by Alex Spirin in Google Colab. Google Colab allows Python code to be run in the browser on Google's hardware. Training a model is resource-intensive, so access to more powerful GPUs and TPUs was necessary. First, an arbitrary pretrained model (in my case, one trained on images of bedrooms) is loaded into the notebook and then fine-tuned with the user's own data. Before the experiments, I did not know what effect the pretrained model would have on my own output. This procedure saves time, although the training process still took days in each of my experiments.

Training takes place in so-called training steps and can, in theory, continue indefinitely. In these experiments, the state of the trained model was saved to my cloud storage every 5,000 training steps. I can access these saved states and sample individual images from them. The model can generate images with a resolution of up to 512x512 pixels. In the following experiments, I chose a resolution of 256x256 pixels for technical reasons.

I created the training data manually to ensure its quality. Special attention had to be paid to giving the A's a consistent size and the same baseline.
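I checked size and baseline by eye, but the same check could be automated. Here is a minimal sketch, assuming each training image is a grayscale array with a dark letter on a light background. The function names, the threshold, and the 2-pixel tolerance are hypothetical choices for illustration, and the dummy rectangles stand in for real letter images. Because an A has no descender, the bottom edge of its bounding box can serve as the baseline.

```python
import numpy as np

def glyph_bbox(image, threshold=128):
    """Return (top, bottom, left, right) of the dark pixels in a grayscale image."""
    ink = image < threshold
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        raise ValueError("image contains no dark pixels")
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

def check_consistency(images, tolerance=2):
    """Return indices of images whose baseline or letter height deviates
    from the median of the set by more than `tolerance` pixels."""
    boxes = np.array([glyph_bbox(img) for img in images])
    baselines = boxes[:, 1]                    # bottom row of each letter
    heights = boxes[:, 1] - boxes[:, 0] + 1    # letter height in pixels
    base_ref = np.median(baselines)
    height_ref = np.median(heights)
    return [
        i for i in range(len(images))
        if abs(baselines[i] - base_ref) > tolerance
        or abs(heights[i] - height_ref) > tolerance
    ]

def make_dummy(top, bottom):
    """Stand-in for a real training image: a black rectangle on white."""
    img = np.full((256, 256), 255, dtype=np.uint8)
    img[top:bottom + 1, 100:156] = 0
    return img

# The third dummy sits 15 pixels lower than the others.
images = [make_dummy(60, 200), make_dummy(61, 200), make_dummy(60, 215)]
outliers = check_consistency(images)
print(outliers)
```

With real data, the list of outliers would point to the images that need to be redrawn or shifted before training.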


samples at 40,000 training steps


