About the AI Technology

What technology do you use?

We use a text-to-image model called Stable Diffusion for the majority of operations.

Currently we are using Stable Diffusion version 2.1 with 512 x 512 native output.

  • Why 512 when there is a 768 version? We chose 512 to keep credits as cheap and accessible as possible.
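If you're curious what driving that model looks like in code, here's a minimal sketch using the open-source Hugging Face diffusers library and the publicly released 512 x 512 checkpoint (stabilityai/stable-diffusion-2-1-base). This is only an illustration of the same open model, not a peek at Artsmart's actual backend.

```python
# Illustrative only: generating one 512 x 512 image with the open
# Stable Diffusion 2.1 "base" checkpoint via the diffusers library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",  # the 512 x 512-native model
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # a GPU is assumed here

image = pipe(
    "a photo of a black dog",
    height=512,
    width=512,
    num_inference_steps=30,
).images[0]
image.save("black_dog.png")
```

(The 768 x 768 variant is published as stabilityai/stable-diffusion-2-1; swapping the model name and the height/width values is essentially the only change.)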

How does this technology work?

Artsmart’s text-to-image generation is built on diffusion neural networks.

A neural network uses interconnected processing units called "neurons" to analyze data and make decisions based on that data.

It’s called a “neural network” because it’s inspired by the way the human brain uses a network of neurons to think. It can be used for tasks such as image and speech recognition, language translation, and making predictions.

Neural networks are particularly good at recognizing patterns and making decisions based on those patterns.
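If you like seeing things in code, here's a toy Python sketch of a single artificial “neuron”: it just weighs its inputs, adds a bias, and passes the result through an activation function. The numbers are made up and this has nothing to do with our production code; it's only meant to show how simple each individual unit is.

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: weigh the inputs, add a bias,
    then squash the result with a ReLU activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return max(0.0, weighted_sum)  # ReLU: negative sums become 0

# Made-up inputs and weights, purely to show the mechanics.
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8, 0.1, 0.4])
print(neuron(inputs, weights, bias=0.2))
```

A full network is just many of these units wired together in layers, with the weights adjusted during training.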

So how do we get images from text?

To avoid your eyes glazing over, we’ll give a rough overview. This example is heavily borrowed from a Reddit post by PhyrexianSpaghetti.

We take a picture of a thing, let's say a dog, and we tell the neural network, "Hey computer, please gradually turn this picture into noise and memorize every step while doing it 504 times.”
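In code, that “gradually turn this picture into noise” step looks roughly like the toy sketch below. A small random array stands in for the dog photo, and the 504 steps and the mixing schedule are just the toy numbers from the example above, not what a real model uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the dog photo: a tiny 64 x 64 grayscale image in [0, 1].
image = rng.random((64, 64))

num_steps = 504   # the toy number from the example above
history = []      # "memorize every step"

for step in range(1, num_steps + 1):
    noise = rng.standard_normal(image.shape)
    mix = step / num_steps          # mix in a little more noise each step
    noised = (1 - mix) * image + mix * noise
    history.append(noised)

# After the last step the picture is essentially pure random noise.
```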


Now we take a picture of actual random noise and we tell the computer, "Hey computer, please play the 'dog to noise algorithm' but reversed.”


We get a picture of a dog back, but wow, it’s not the same dog!
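Under the hood, “play the algorithm reversed” means the trained network repeatedly guesses which part of the picture is noise and removes a little of it at each step. Here's a toy sketch of that loop; predict_noise is a made-up stand-in for the trained network, so this won't actually draw a dog.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(noisy_image, step):
    """Made-up stand-in for the trained network. In a real diffusion
    model, this is where the neural network guesses the noise."""
    return 0.1 * rng.standard_normal(noisy_image.shape)

num_steps = 504
image = rng.standard_normal((64, 64))   # start from pure random noise

# Play the "dog to noise" process in reverse: peel off a little of the
# predicted noise at every step until a picture emerges.
for step in reversed(range(num_steps)):
    image = image - predict_noise(image, step) / num_steps
```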


Now we teach the computer about the color black, and then we ask, “Hey computer, please play the ‘color black to noise algorithm’ but reversed, and the ‘dog to noise algorithm’ but reversed, at the same time.”

And just like that, we have a black dog!
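Combining concepts works the same way: at every denoising step the network makes a noise guess for each concept, and the guesses are blended before the noise is removed. The sketch below reuses the made-up predict_noise idea from before (here called predict_noise_for, also hypothetical), so again it shows the shape of the idea rather than working image generation.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise_for(concept, noisy_image, step):
    """Made-up stand-in for the trained network conditioned on a
    concept such as 'dog' or 'the color black'."""
    return 0.1 * rng.standard_normal(noisy_image.shape)

num_steps = 504
image = rng.standard_normal((64, 64))   # start from pure random noise

for step in reversed(range(num_steps)):
    dog_noise = predict_noise_for("dog", image, step)
    black_noise = predict_noise_for("the color black", image, step)
    # Play both reversed algorithms "at the same time" by blending
    # their noise guesses before removing them.
    combined = 0.5 * (dog_noise + black_noise)
    image = image - combined / num_steps
```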

Deep Dive?

If you want to dive a bit deeper, here’s a great video (starting at 5:50) about how this type of technology works.