Is AI Stealing My Art?

Started by parkdalegardener, January 14, 2023, 02:07:44 PM


parkdalegardener

In action the diffusion process looks like this. We start with random noise and denoise it one step, or iteration, at a time. I will start with "car" as a prompt. I give the AI a de-noising algorithm to work with, Euler Ancestral, and say go to work: refine the noise 20 times using Euler a. This would result in a picture of a car, of some type of quality, in some type of style, painted whatever.
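That loop can be sketched in a few lines. This is a toy, not the real sampler: `toy_denoiser` is an invented stand-in for the U-Net a real diffusion model uses, and the "image" is a single number. But the shape of the process is the same: denoise one step, re-inject a little fresh randomness (the "ancestral" part), repeat 20 times.

```python
import random

def toy_denoiser(x, target=1.0):
    # Stand-in for the real U-Net: estimates how far x is from the "car" target.
    return x - target

def euler_ancestral_sketch(steps=20, seed=0):
    """Toy ancestral sampler: start from pure noise, remove a fraction of
    the predicted noise each iteration, and add a small random kick."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                 # start from random noise
    for i in range(steps):
        noise_estimate = toy_denoiser(x)
        x = x - noise_estimate / (steps - i)  # one denoising step
        x += rng.gauss(0.0, 0.05)             # ancestral noise injection
    return x
```

With these made-up dynamics the sample converges near the "car" value of 1.0, just as the real sampler converges on an image that CLIP scores as matching the prompt.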
Illegitimus non carborundum
don't let the bastards grind you down

parkdalegardener

Let's put this together. With the current AI models you type a description of what you want to see in somewhat natural language: the prompt. The AI then uses CLIP to try to figure out just exactly what you want. The AI then de-noises the random noise picture it generates, on a pixel-by-pixel basis, as many times as you have told it to. The "magic" of the AI is that it tries to figure out how to resolve the noise into a coherent image, pixel by pixel, using CLIP to determine the outcome and make sense of the noise. The result changes each iteration depending upon the de-noising algorithm and how pleasant or unpleasant the current iteration is: CLIP's aesthetics score.

This is important. If we allowed this type of computer vision in our cars, we would be hitting Ladas because they are not considered as "good" or "pretty" a car as a Lincoln or a Porsche. A car learned by traditional computer vision is not subject to these kinds of interpretation. Diffusion models do not contain any representation of the training data in the form of a static image as we know it. They amalgamate all the information on what a car is: how pleasing a particular shape may be, a favourite colour or paint pattern, the number of windows. Whatever CLIP, and the humans scoring a training image, thought was important at the time of classification. That's what defines the outcome of a text-to-image request. That's what defines the image as it converges upon a solution to your request.

If a single pixel of every image that the LAION-5B base was trained upon, plus a single byte of ASCII text to describe it and a single byte to log its position in the database, were added all together, we would already be into the tens of gigabytes, many times the size of the model itself. The original model released to the public in August of this year (yep, that short a time ago) is about 4 GB. Yes sir. 4 GB. There is no way to store 5 billion images, either in whole or in part, in a 4-gigabyte file regardless of the compression method used. It's not about the images in the model, because there are no images in the models.
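A quick back-of-envelope check of those numbers (the per-image byte counts are the post's own assumptions: one RGB pixel, one caption byte, one index byte):

```python
# Even an absurdly minimal "one pixel per image" store dwarfs the checkpoint.
images = 5_850_000_000            # LAION-5B image/caption pairs
bytes_per_image = 3 + 1 + 1       # one RGB pixel + 1 caption byte + 1 index byte
naive_store = images * bytes_per_image   # bytes needed for this toy archive
model_size = 4 * 1024**3                 # a ~4 GB model checkpoint

print(naive_store / 1e9)     # ≈ 29.25 GB for one pixel per image
print(model_size / images)   # ≈ 0.73 bytes of model weight per training image
```

Even this minimal encoding needs several times the model's file size; the checkpoint averages well under one byte per training image, which is why no images can literally be stored inside it.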

Run your text-to-image program with no prompt. You will still get an image output. If you don't tell CLIP what you want, it just looks at the noise and desperately tries to figure out if there is a pattern, just like you do when you see random noise. It applies whatever de-noiser you chose for a single step (iteration) and tries again. It has no guidance other than the slightly resolved noise and an attempt by CLIP to resolve it into something "pretty" or "aesthetically pleasing." It will continue doing so till you say stop.
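As a cartoon of that unguided sampling: with no prompt there is nothing steering the result, yet repeated "denoising" still turns pure noise into some coherent output. The toy below uses simple neighbour-averaging in place of a learned de-noiser; everything about it is invented for illustration.

```python
import random

def unconditional_denoise(steps=200, size=32, seed=1):
    """With no prompt there is no text guidance: each pass just pulls
    neighbouring pixels toward each other, so pure noise still settles
    into *some* smooth pattern. The sampler always produces an image."""
    rng = random.Random(seed)
    row = [rng.gauss(0.0, 1.0) for _ in range(size)]  # 1-D "image" of noise
    for _ in range(steps):
        row = [(row[i - 1] + row[i] + row[(i + 1) % size]) / 3.0
               for i in range(size)]                   # local smoothing pass
    return row
```

After enough passes the jagged noise has resolved into a nearly flat pattern; a real model would instead resolve it toward whatever its training made it consider "pleasing."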

Your smartphone does this with the people-remover type feature on the camera. Some random person tries to "bunny ear" your selfie by throwing a peace sign behind you as they walk by. Your smartphone's AI de-noises the interloper right out of there. Crisis averted. The phone's AI never needed a photobomber-free image to remove the photobomber. It projects what it thinks is the correct background to replace the interloper. You decide how well it is doing.

parkdalegardener

That is where a lot of the misconception comes from. Diffusion models, the text-to-image generators, are trained differently. They start with an image of a car like CV models do, but they process it differently. The image is broken down into a pixel map and the training model is told that the image is a car. A noise pattern at the pixel level "diffuses" the image, like static on an old TV, if anyone here remembers that. The AI is told "that's a car," another round of noise is added to the image, the AI is told "that's a car" again, and another round of noise... Sooner or later all you have is random noise, but the AI is still learning "car" until you stop the whole process from looping forever.
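That training-side loop can be sketched as a toy. Here a short list of numbers stands in for the image, and the noise amount is a made-up constant; the point is the structure: at every noise level the pair shown to the model is still ("noisier image", "car").

```python
import random

def forward_diffusion(pixel_row, steps, noise_scale=0.3, seed=0):
    """Toy forward-diffusion schedule: keep adding noise to the same
    labelled image, recording the (image, label) pair at each step."""
    rng = random.Random(seed)
    schedule = []
    x = list(pixel_row)
    for _ in range(steps):
        x = [v + rng.gauss(0.0, noise_scale) for v in x]  # one round of noise
        schedule.append((x[:], "car"))  # still labelled "car" at this level
    return schedule
```

Run long enough, the final entries are indistinguishable from random noise, yet every one of them carried the label "car" during training.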

The images used for training have to be paired with descriptions of what is being trained on. Tags. We tag everything we post online, for the most part, and those are the training tags for the diffusion model. Say we post a pic on Farcebook of our new car, an ABC Electric Street Cruiser. If that particular picture was scraped for the diffusion model, it would also train on whatever other tags are on that image. If you never tagged the image as your new "car" then it may teach the diffusion model that your new car is actually some type of electric navy boat, with wheels, that travels in the city like a streetcar. AI is dumb.

This is where LAION comes in: the Large-scale Artificial Intelligence Open Network. They are the folks who scraped the net for image/tag pairs. In the case of the current LAION-5B dataset used to train Stable Diffusion: 5.85 billion. Yes, that's B as in billions of tagged image/text pairs.

The other part of text-to-image generation is the text part: CLIP, Contrastive Language-Image Pre-training. CLIP is simply a way to use more natural language to train the AI, and for people to use more conversational language to obtain a result from it. In addition to the tag, each training image also has an aesthetics score as rated by a person.
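A toy illustration of the shared embedding space CLIP learns. The vectors below are invented by hand; the real CLIP produces them with two trained neural encoders, one for text and one for images. But the scoring step works the same way: cosine similarity between a text embedding and an image embedding, with higher meaning "better match."

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made stand-ins for CLIP's learned embeddings (purely illustrative).
text_embedding = {"car": [0.9, 0.1, 0.0], "tree": [0.0, 0.2, 0.9]}
image_embedding = [0.8, 0.2, 0.1]  # pretend encoding of a car photo

scores = {word: cosine(image_embedding, emb)
          for word, emb in text_embedding.items()}
```

Because the pretend car photo points in nearly the same direction as the "car" text vector, it scores far higher for "car" than for "tree"; that score is what steers the de-noising toward your prompt.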

You can freely look into the training images and the associated metadata to see if you were part of the training, and request removal if you feel any such information was obtained incorrectly for inclusion in the model.

parkdalegardener

So, did AI steal your artwork? Yes, no, and maybe. Is AI stealing your artwork? No and maybe. It has come to my attention that some here are of the impression that AI "prompted art" simply regurgitates the images, or parts of them, that the model was trained upon. This is an incorrect assumption. Diffusion doesn't work that way.

Without going into a crash course on machine learning, I'm going to try to help you understand how this stuff works, and a bit of the history. It starts with autonomous driving and driving assists in your car. Computer vision is done by training an AI on a whole lot of images of the same thing. They are cropped and scaled to a particular size for training, usually square. You tell the training what they are with a text description: this is a car. Give the AI a crapload of random car images and that text description. You tell the AI you are training that you are 100% sure the images are indeed cars. You flip them around a few times, maybe mirror them, possibly even rescale them. No matter what, they are still images of cars suitable for training.

In addition to training the AI on car images only, a group of other, non-car images is added into the mix. The text descriptions say what they are, which is not cars. You tell the AI that these images in the mix are not cars; you are 100% sure of that. These are "control" images. You flip, rotate, scale or whatever these images the same way as the car images. They are still not cars. The trained AI is then able to recognize a car when it sees one (hopefully) and knows that it isn't a tree or possibly a person.
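The positive/control setup can be caricatured in a few lines. Everything here is invented for illustration: real computer-vision training fits a deep network, not a brightness cutoff. But it shows the three ingredients named above: positives you are 100% sure of, negatives you are 100% sure of, and label-preserving augmentation.

```python
def augment(img):
    # Flips and mirrors keep the label: a reversed car is still a car.
    return [img, list(reversed(img))]

def train_threshold(cars, not_cars):
    """Toy 'trainer': pretend cars are bright on average and non-cars dark,
    and learn the brightness cutoff separating the two certain classes."""
    car_means = [sum(v) / len(v) for c in cars for v in augment(c)]
    other_means = [sum(v) / len(v) for c in not_cars for v in augment(c)]
    return (min(car_means) + max(other_means)) / 2.0

def is_car(img, threshold):
    return sum(img) / len(img) > threshold

cars = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6]]       # "100% sure these are cars"
not_cars = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]   # "100% sure these are not"
cutoff = train_threshold(cars, not_cars)
```

The learned cutoff then classifies new "images" it never saw during training, which is the whole point: the model keeps a decision rule, not the training pictures themselves.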

Where does one get these training images? After all, you need a huge number of car images to train with, and a large number of non-car images as well. Google is your friend, way more than you might think. A single line of Python or a Linux one-liner will give you all the images matching your criteria that you can ask for, straight from the Google API. In some respect a training image is in the dataset, though it has been heavily manipulated. If your image is in such a training dataset, it could be recovered with a lot of work.