fridahoeft

How to train a diffusion model as a graphic designer with no coding skills

AI is coming into our lives, and it's not stopping at the creative world. In this post, I explain how I used Google Colab to train my own diffusion models to generate visual output. No prior knowledge of programming is required!



Basic functioning

Broadly speaking, there are two types of programs. Marcus du Sautoy, professor of mathematics at Oxford University, refers to them as top-down and bottom-up strategies. The top-down strategy is hard-coded, rule-based programming, in which programmers give the program explicit instructions from above. This allows such programs to interpret data (e.g. individual pixels in an image) but not to recognise any context in it. Du Sautoy refers to this as machine vision. These programs are not considered AI tools and are not discussed further here.


The bottom-up strategy is a further development, more commonly known as machine learning or statistical learning. These programs can independently draw conclusions from individual data points and identify statistical correlations. This works because the program itself keeps adjusting its parameters during the learning process as new data is added. One disadvantage of self-learning AI tools is that the solutions they find can be hard for a human observer to make sense of. Two common approaches to machine learning are supervised learning and unsupervised learning. In supervised learning, the program is trained with labelled data sets, which must be created in advance by humans. In unsupervised learning, the input to the program consists of unlabelled raw data. In both approaches, the program tries to identify relationships in the data. Although these AI tools differ and the architecture behind them varies from model to model, a basic scheme can be recognised: input > process > output.
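To make the idea of a program adjusting its own parameters a little more concrete, here is a minimal sketch of supervised learning in plain Python. It is a toy example and not how a diffusion model is trained: the labelled data, the single parameter `w` and the learning rate are all assumptions chosen for illustration.

```python
# Toy labelled data set: each input x comes with the "correct" answer y.
# The hidden rule is y = 3 * x, but the program is never told this.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]


def train(data, epochs=200, learning_rate=0.01):
    """Fit y = w * x by nudging w a little after every example."""
    w = 0.0  # the single parameter, starting from a blank guess
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y                # how wrong the current guess is
            w -= learning_rate * error * x   # adjust w to shrink the error
    return w


w = train(data)
print(round(w, 3))
```

The program only sees examples and corrects itself step by step; real models do the same thing with millions of parameters instead of one.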

The input usually takes the form of huge data sets. These are made available to the model as training data, from which the model learns. This learning is the process. After this phase, the trained model can generate an output. It is important to understand that the model does not copy, collage or manipulate the input, but can generate completely new output.

In my experiments, I mainly work with diffusion models. Here, a data set of images serves as the input. During the process, the model adds Gaussian noise to the images (forward process) and then learns to remove it again (reverse process). The trained model then uses this learned reverse diffusion process to generate new images: the output.
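The forward process described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the linear noise schedule and its values (`beta_start`, `beta_end`, 1,000 steps) are a common textbook choice, not necessarily what the notebook uses, and the four-number "image" is a stand-in for real pixel values. The reverse process needs a trained neural network and is left out.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative share of the original signal left after each step.

    Assumes a linear noise schedule; the values are illustrative.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Return a noisy copy of `pixels` as it would look at step t."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


alpha_bars = make_alpha_bars()
rng = random.Random(0)
image = [1.0, -1.0, 0.5, -0.5]  # tiny stand-in for pixels scaled to [-1, 1]

slightly_noisy = add_noise(image, 10, alpha_bars, rng)      # still recognisable
almost_pure_noise = add_noise(image, 999, alpha_bars, rng)  # image nearly gone
```

Early steps keep most of the image, while the last steps leave almost nothing but noise. During training, the model sees such noisy versions and learns to estimate what was added, which is what later lets it start from pure noise and work backwards.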


Actual training in Colab

My access to a diffusion model was via a notebook by Alex Spirin in Google Colab. Google Colab allows Python code to be run in the browser on Google's powerful hardware. For the resource-intensive process of training a model, access to more powerful GPUs and TPUs was necessary.

First, an arbitrary, already trained model (in my case, one trained on images of bedrooms) is loaded into the notebook and then fine-tuned with the user's own data. How the already trained model affects the resulting output is not known. This procedure saves time, although the training process still took days in each of my experiments.

Training takes place in so-called training steps and could, in theory, continue indefinitely. In these experiments, the state of the trained model is saved to my cloud storage every 5,000 training steps. I can access these saved states and sample individual images from them. The model makes it possible to generate images with a resolution of up to 512x512 pixels. In the following experiments, a resolution of 256x256 pixels was chosen for technical reasons.

The training data used in the experiments was created manually to ensure its quality. Special attention had to be paid to a consistent size of the A's and to a shared baseline. In this chapter, only individual samples selected by me are shown. This selection procedure will be discussed further in chapter 5. In experiments #1 to #5, I decided to generate the letter A (as a majuscule). The letter is approximately mirror-symmetrical about one axis, which should make it easier for the model to generate valid output. In the following posts, the individual experiments that I conducted with diffusion models are described and explained.
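For readers curious what "saving every 5,000 training steps" looks like in code, here is a minimal sketch. It is not taken from the notebook: `should_save`, `run_training` and the stand-in functions passed to it are hypothetical names, and only the interval of 5,000 steps comes from the experiments described above.

```python
CHECKPOINT_INTERVAL = 5_000  # interval used in the experiments in this post


def should_save(step, interval=CHECKPOINT_INTERVAL):
    """Return True when the model state should be saved after this step."""
    return step > 0 and step % interval == 0


def run_training(total_steps, train_step, save_checkpoint):
    """Run the training steps one by one and save at every interval."""
    for step in range(1, total_steps + 1):
        train_step(step)
        if should_save(step):
            save_checkpoint(step)


# Hypothetical stand-ins so the sketch runs without a real model:
# training does nothing, and "saving" just records the step number.
saved_steps = []
run_training(
    total_steps=12_000,
    train_step=lambda step: None,
    save_checkpoint=saved_steps.append,
)
print(saved_steps)  # prints [5000, 10000]
```

Each saved state is a snapshot of the model at that point in training, which is why images sampled from an early snapshot can look very different from those sampled later.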






