Chat Room

What's Up


 

Fafnir

2024 May 15 07:19:22
 :c-cat:
 

thelufias

2024 May 12 08:16:47
 :mom4: To All The Moms
 

vash99

2024 May 10 09:11:17
I figured I would try here first; who better to ask than fellow artists?
 

Radkres

2024 May 10 09:37:32
Have you tried Google? Search "hands on head photo" and see if that triggers your memory.  :peek:
 

vash99

2024 May 09 11:19:09
I'm trying to recreate a pose from the '80s. It's a simple two-quarters headshot of a woman looking at the camera, both arms bent in front of her, hands on her head. For the life of me I can't remember how to do the pose.
 

thelufias

2024 May 07 08:31:06
Gooooood Morning to everyone....:java: Ahhhhh
 

vash99

2024 May 06 10:50:12
a little
 

Radkres

2024 May 06 06:11:19
is it getting any better?  :coffeemaker:
 

vash99

2024 May 05 10:56:57
I tried during the infusion; it didn't help.
 

Radkres

2024 May 05 02:50:17
Have You Tried a Warm Compress  to see if that helps?
 

vash99

2024 May 05 01:28:09
No swelling; it just feels like my forearm is on fire.
 

thelufias

2024 May 04 09:23:33
It's SATURDAY MORNING...Cartoon time with Marvin the Martian
 

thelufias

2024 May 04 08:24:32
I used to use ice packs to lessen the pain and swelling. Worked well.
 

vash99

2024 May 03 11:29:10
Had chemo today. This time the IV went into my hand, so now my hand and arm hurt as a result of the chemo. I can't wait till this is done.
 

Fafnir

2024 May 03 06:33:28
 :c-cat:


Author Topic: Is AI Stealing My Art?  (Read 579 times)


Online parkdalegardener

  • Root Master
  • Working It
  • ********
  • Posts: 3087
Is AI Stealing My Art?
« on: January 14, 2023, 02:07:44 PM »
So, did AI steal your artwork? Yes, no, and maybe. Is AI stealing your artwork? No and maybe. It has come to my attention that some here are of the impression that AI "prompted art" is simply regurgitating the images, or parts of them, that it was trained upon. This is an incorrect assumption. Diffusion doesn't work that way.

Without going into a crash course on machine learning, I'm going to try and help you understand how this stuff works and a bit of the history. It starts with autonomous driving and driving assists in your car. Computer Vision is done by training an AI on a whole lot of images of the same thing. They are cropped and scaled into a particular size for training, usually square. You tell the training what they are with a text description: this is a car. Give the AI a crap load of random car images and that text description. You tell the AI you are training that you are 100% sure that the images are indeed cars. You flip them around a few times, maybe reverse them, possibly even rescale them. No matter what, they are still images of cars suitable for training.

The AI isn't trained on car images only; a group of other, non-car images is added into the mix. The text descriptions say what they are, which is not cars. You tell the AI that these images in the mix are not cars. You are 100% sure of that. These are "control" images. You flip, rotate, scale or whatever these images in the same way as the car images. They are still not cars. The trained AI is then able to recognize a car when it sees one (hopefully) and knows that it isn't a tree or possibly a person.
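To make that concrete, here is a minimal sketch of this kind of classifier training in PyTorch/torchvision. The folder layout, model choice and hyperparameters are my own placeholders; the point is only that every image carries a hard label (car / not car), and flipping, rotating or rescaling never changes that label.

```python
import torch
from torch import nn
from torchvision import datasets, transforms, models

# Augmentations: flip, rotate, rescale. The label stays 100% certain
# no matter how the picture is transformed.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),          # crop/scale to one square training size
    transforms.RandomHorizontalFlip(),      # mirrored cars are still cars
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# Hypothetical layout: data/train/car/*.jpg and data/train/not_car/*.jpg
# (the "not_car" folder holds the control images: trees, people, anything else).
train_set = datasets.ImageFolder("data/train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(num_classes=2)      # two outputs: car / not car
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:           # labels come straight from the folder names
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```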

Where does one get these training images? After all, you need a huge number of car images to train with, and a large number of non-car images as well. Google is your friend. Way more than you might think. A single line of Python or Linux will give you all the images matching your criteria that you can ask for, straight from the Google API. In some respects a training image is in the dataset, though it has been heavily manipulated. If your image is in such a training dataset, it could be recovered with a lot of work.
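As a rough illustration of how little code the collection step takes, here is a sketch that just downloads a list of image URLs (however you obtained them; the actual search-API query is not shown) and crops/scales each one square for training. The URLs and folder names are placeholders.

```python
import io
import requests
from PIL import Image

# Placeholder URLs; in practice these would come from an image-search API query.
urls = [
    "https://example.com/car1.jpg",
    "https://example.com/car2.jpg",
]

for i, url in enumerate(urls):
    img = Image.open(io.BytesIO(requests.get(url, timeout=10).content)).convert("RGB")
    side = min(img.size)                              # centre-crop to a square...
    left, top = (img.width - side) // 2, (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((512, 512))
    img.save(f"dataset/car_{i:05d}.jpg")              # ...and scale to one training size
```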
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #1 on: January 14, 2023, 02:08:55 PM »
That is where a lot of the misconception comes from. Diffusion models, the text-to-image generators, are trained differently. They start with an image of a car like CV models do, but they process it differently. The image is broken down into a pixel map and the training model is told that the image is a car. A noise pattern at the pixel level "diffuses" the image, like static on an old TV if anyone here remembers that. The AI is told "that's a car" and another round of noise is added to the image, and the AI is told "that is a car" and another round of noise... Sooner or later all you have is random noise, but the AI is still learning "car" till you stop the whole process from looping forever.
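A toy sketch of that "keep adding noise, keep repeating the caption" forward process. The constant noise amount per step is my own simplification; real schedules vary it, but the shape of the loop is the same.

```python
import torch

def forward_diffuse(image, steps=1000, beta=1e-2):
    """Toy forward-diffusion loop: keep mixing a little Gaussian noise into
    the picture until only static remains. `image` is a tensor in [-1, 1].
    At every step the pair (noisy image, "that's a car") is a training example."""
    x = image.clone()
    trajectory = []
    for t in range(steps):
        noise = torch.randn_like(x)
        x = (1 - beta) ** 0.5 * x + beta ** 0.5 * noise   # one small step toward pure noise
        trajectory.append(x)
        # The caption never changes: the model is still told "that's a car",
        # even once x is indistinguishable from old-TV static.
    return trajectory
```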

The images used to train have to be paired with descriptions of what they are being trained on. Tags. We tag everything we post online for the most part, and those are the training tags for the diffusion model. We post a pic on Farcebook of our new car. An ABC Electric Street Cruiser. If that particular picture was scraped for the diffusion model, it would also train on whatever other tags are on that image. If you never tagged the image as your new "car" then it may train the diffusion model that your new car is actually some type of electric navy boat, with wheels, that travels in the city like a streetcar. AI is dumb.

This is where LAION comes in. Large-scale Artificial Intelligence Open Network. They are the folks that scraped the net for image/tag pairs. In the case of the current LAION-5B dataset used to train Stable Diffusion, that's 5.85 billion of them. Yes; that's B as in billion tagged image/text pairs.

The other part of text-to-image generation is the text part: CLIP. Contrastive Language-Image Pre-training. CLIP is simply a way to use more natural language to train the AI, and for people to use more conversational language to obtain a result from it. In addition to the tag, each training image also has an aesthetics score as rated by a person.
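For the curious, here is a small sketch of what CLIP actually does, using the Hugging Face transformers wrappers (the model name, image file and captions are just example choices): it embeds an image and a few candidate captions into the same space and scores how well they match.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("my_new_car.jpg")                 # placeholder image
captions = ["a photo of a car", "a photo of a boat", "a streetcar"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)     # how well each caption fits the image
for caption, p in zip(captions, probs[0]):
    print(f"{caption}: {p:.2f}")
```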

You can freely look into the training images and the associated metadata to see if you were part of the training, and request removal if you feel any such information was obtained incorrectly for inclusion into the model.
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #2 on: January 14, 2023, 02:10:45 PM »
Let's put this together. With the current AI models you type a description of what you want to see in somewhat natural language. The prompt. The AI then uses CLIP to try and figure out just exactly what you want. The AI then de-noises the random noise picture it generates, on a pixel-by-pixel basis, as many times as you have told it to. The "magic" of the AI is that it tries to figure out how to resolve the noise into a coherent image, pixel by pixel, using CLIP to determine the outcome and try and make sense of the noise. The result keeps changing each iteration, depending upon the de-noising algorithm and how pleasant or unpleasant the current iteration is: CLIP's aesthetics score.
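In pseudocode the generation loop described above looks roughly like this. Every name here is illustrative, not any particular library's API; it is only meant to show the shape of the process.

```python
import torch

def generate(prompt, denoiser, clip_text_encoder, steps=20, seed=0):
    """Toy text-to-image loop: start from pure noise and repeatedly ask a
    denoiser to pull the prompt's meaning out of it. Illustrative only."""
    torch.manual_seed(seed)
    text_embedding = clip_text_encoder(prompt)     # CLIP turns the prompt into numbers
    x = torch.randn(1, 3, 512, 512)                # nothing but random static to start
    for t in reversed(range(steps)):
        # The model predicts which part of x is "just noise" given the prompt,
        # and a little of that noise is removed on each iteration.
        predicted_noise = denoiser(x, t, text_embedding)
        x = x - predicted_noise / steps
    return x                                        # hopefully now resembles the prompt
```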

This is important. If we allowed this type of computer vision in our cars we would be hitting Ladas because they are not considered as "good" or "pretty" a car as a Lincoln or a Porsche. A car learned by traditional computer vision is not subject to these types of interpretation. Diffusion models do not contain any representation of the training data in the form of a static image as we know it. They amalgamate all the information on what a car is, how pleasing a particular shape may be, a favourite colour or paint pattern, the number of windows. Whatever CLIP, and the humans scoring a training image, thought was important at the time of classification. That's what defines the outcome of a text-to-image request. That's what defines the image as it converges upon a solution to your request.

If a single pixel of every image that the LAION-5B base was trained upon, plus a single byte of ASCII text to describe it and a single byte to log its position in the database, were added all together, you would already be past the size of the entire released model; the source images themselves would run to petabytes. The original model released to the public this past August; yep, that short a time ago; is 4 gig. Yes sir. 4 gig. There is no way to store 5 billion images, either in whole or in part, in a 4 gigabyte file regardless of the compression method used. It's not about the images in the model, because there are no images in the models.
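The back-of-the-envelope arithmetic, for anyone who wants to check it (the average image size is an assumption):

```python
pairs = 5_850_000_000                      # LAION-5B image/text pairs
model_bytes = 4 * 1024**3                  # the released checkpoint: ~4 GiB

print(pairs / 1024**3)                     # ~5.4 GiB at just 1 byte per image:
                                           # already bigger than the whole model

avg_image_bytes = 500 * 1024               # assume ~500 KB per source image
print(pairs * avg_image_bytes / 1024**5)   # ~2.7 PiB for the images themselves
```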

Run your text-to-image program with no prompt at all. You will still get an image output. If you don't tell CLIP what you want, it just looks at the noise and desperately tries to figure out if there is a pattern, just like you do when you see random noise. It uses whatever de-noising you tried for a single step (iteration) and tries again. It has no guidance other than the slightly resolved image of noise and an attempt by CLIP to resolve it into something "pretty" or "aesthetically pleasing." It will continue doing so till you say stop.
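You can try this yourself with the open Stable Diffusion weights. A sketch using the diffusers library; the model ID and step count are just one choice among many.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# An empty prompt: CLIP gets no guidance at all, so the denoiser just
# "finds" whatever pattern it can in the random starting noise.
image = pipe(prompt="", num_inference_steps=20).images[0]
image.save("no_prompt.png")
```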

Your smartphone does this with the people-remover type feature on the camera. Some random person tries to "bunny ear" your selfie by throwing a peace sign behind you as they walk by. Your smartphone AI de-noises the interloper right out of there. Crisis averted. The phone AI never needed an image without the photobomber to remove the photobomber. It projects what it thinks is the correct background to replace the interloper. You decide how well it is doing.
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #3 on: January 14, 2023, 02:11:41 PM »
In action the diffusion process looks like this. We start with random noise and de-noise it one step, or iteration, at a time. I will start with "car" as a prompt. I give the AI a de-noising algorithm to work with, Euler Ancestral, and say go to work and refine the noise 20 times using Euler a. This results in a picture of a car, of some type of quality, in some type of style, painted whatever.
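Something like the following reproduces that experiment with the diffusers library. The model ID and seed are placeholders; the sampler swap to Euler Ancestral ("Euler a") and the 20 refinement steps are the point.

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
# Swap in the Euler Ancestral ("Euler a") sampler mentioned above.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

generator = torch.Generator("cuda").manual_seed(1234)   # fixed random seed
image = pipe("car", num_inference_steps=20, generator=generator).images[0]
image.save("car_20_steps.png")
```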
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #4 on: January 14, 2023, 02:12:47 PM »
CLIP will look at each iteration and try to figure out how to get a car out of the noise. With no other info than "car" it can't just scroll through a database and find a car to display. It needs to try and figure out just what you're wanting in an image of a car; and without you to guide it, CLIP starts making guesses about shape, style, colour and, in our case, whether we should be inside or outside of the auto. The (evilly long) photo strip is to show the evolution of a prompt into an image. As you can see, there are certainly no training images there. They are all really bad. This is done with the original 1.4 weights (model).
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #5 on: January 14, 2023, 02:13:44 PM »
Now if I change to the newest 2.1 model and run the exact same prompt, using all the same settings and initial random seed, I get a different strip of output images. The 2.1 model is trained on exactly the same dataset as the 1.4, 1.5, and 2.0 models. Even though they use exactly the same training data, they do not produce the same output. The difference is the level of the CLIP training. In an effort to respect intellectual property rights, celebrity likenesses, and living artists' names and styles, as well as reducing the NSFW aspects of training on an open internet, the CLIP has changed to try and respect these factors.
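A sketch of that comparison: same prompt, same sampler settings, same seed, two different checkpoints. The model IDs are examples; the only thing that changes between the two runs is the checkpoint.

```python
import torch
from diffusers import StableDiffusionPipeline

for model_id in ("CompVis/stable-diffusion-v1-4", "stabilityai/stable-diffusion-2-1"):
    pipe = StableDiffusionPipeline.from_pretrained(model_id).to("cuda")
    generator = torch.Generator("cuda").manual_seed(1234)      # identical seed for both
    image = pipe("car", num_inference_steps=20, generator=generator).images[0]
    image.save(model_id.split("/")[-1] + "_car.png")           # the two images will differ
```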
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #6 on: January 14, 2023, 02:15:13 PM »
Let's recap this wall of text.

Can I get a training image back as output to a prompt? No, in the case of a diffusion model. There is no training image to return.

Is my art/IP being ripped off? Possibly, but it can't be reproduced as originally scraped from the internet. If you are worried, there are tools to query the dataset and tell you if you are part of the training. If you are, and wish to be removed; it can be made to happen. If you are a living artist, you are most probably removed from the current CLIP. Especially if your work is considered NSFW. You get removed twice as often. If you are Disney, WETA, or anyone else with IP you want respected; you are gone or can be made to be gone if you have yet to be removed due to your classification.

Is your artistic style being ripped off and your name devalued or defamed? I'm not a lawyer and I'm not here to defend anyone. Get a solicitor and leave me out of it. I don't use other folks' names or styles. I have better ways to use the technology than trying to mimic someone else, as demonstrated in the other thread.

Can putting anti-AI tags or random bullshit as tags screw up the training of future models and defeat the random web crawlers scraping your images? Nope. Not really, and I'll explain later. I've been one-finger typing this for more than 7 hours now trying to keep it simple. I'm tired.

Finally, and this has yet to be fully answered here by me: are people or websites stealing my current work to train their AIs? A resounding YES. That is exactly what is currently happening, with open-source diffusion models going behind paywalls and becoming black boxes you cannot see inside of. More on this rant at another time.
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #7 on: January 15, 2023, 10:53:01 AM »
Well; it's another time and my finger has rested.

"But pdg," you say; "I see on Tweeter, Farcebook, Instagroom, and Tic Tacky where peoples styles are being ripped off by these so called "art" programs. My fantastic artwork might be next. I gotta watermark it right away so it can't be used for training 'cause the site I have my work on is scanning it for some model without my permission."

Yeah, let's look at this a bit. Remember how I said these diffusion models are trained? That if you don't stop the training, it goes on forever. A kind of feedback loop that just keeps feeding more and more noise into the random noise that, in turn, is paired with the CLIP. Now for the propeller-head terms. What we laymen call "models" are actually checkpoint saves made while the weights are calculated. Huh? Think of autosave while you are playing a game or working on a text document. The "weights" are the game you are playing: the diffusion process itself. Checkpoints are incremental saves every so many training steps. You define how many training steps between saves. These checkpoint saves are the "models" we use.
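The checkpoint/autosave idea in code, as a generic PyTorch training loop. All of the names here are illustrative; `compute_loss` stands in for whatever the diffusion objective actually is, because the point is only the periodic save.

```python
import torch

SAVE_EVERY = 1000   # training steps between "autosaves"

def train(model, optimizer, data_loader, compute_loss, total_steps=10_000):
    """Generic training loop; only the periodic checkpoint save matters here."""
    step = 0
    while step < total_steps:
        for images, captions in data_loader:
            loss = compute_loss(model, images, captions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step % SAVE_EVERY == 0:
                # This file is the "checkpoint": the model people download and use.
                torch.save(model.state_dict(), f"checkpoint_step_{step}.ckpt")
            if step >= total_steps:
                break
```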

Remember LAION? More than five billion image/tag pairs for training. How many images have you produced in your lifetime and posted on the internet? Enough to train a model on your work? Nope. No way. You haven't done it. Get over yourself, you are not that prolific and your artwork is not that well known. No one has spent the hundreds of thousands of computational hours of training time, on more GPU power than you and 10 of your friends can afford, to rip off your style so they can produce fake art exactly like yours, so as to lessen your brand and influence your ability to create or sell original product.
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #8 on: January 15, 2023, 10:53:58 AM »
However; there is some truth behind the panic. Due to the way the training never stops, you can teach a model (checkpoint) new terms it is not familiar with. A popular application for this is called Dreambooth. People can upload a few properly tagged images of themselves, and after a while the software spits out a new model you are included in, along with everyone else that may have been part of the training. The same thing can be done with "style." Upload a number of representative samples of one's artwork, with the relevant tags, and you can train the software to output its results in that style of work.
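Conceptually the fine-tuning step is just this: load an existing checkpoint, keep training on a handful of newly tagged images, save a new checkpoint. The sketch below is a simplified stand-in for what tools like Dreambooth automate; it reuses the generic `model` / `optimizer` / `compute_loss` names from the training-loop sketch in the previous reply, and the dataset and file names are placeholders.

```python
import torch

# Load the existing checkpoint (model) and keep training it on a handful
# of new image/tag pairs, e.g. ("photo of my face", a few dozen photos).
model.load_state_dict(torch.load("stable-diffusion-base.ckpt"))

for images, captions in small_personal_dataset:   # a few dozen images, not billions
    loss = compute_loss(model, images, captions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The result is a *new* checkpoint in which the recent additions carry
# the most weight, exactly like the newest save game.
torch.save(model.state_dict(), "my-face-finetuned.ckpt")
```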

However; and this is a big however; training on top of a checkpoint (model) makes this training more important than the previous training. Think game save again. The difference between the last save game and your new save game is everything you have done since the last save. This makes your new save game way more important than your last saved game. Some things may have been overwritten or removed by new additions to a map or inventory since the last save. Adding stuff into an existing checkpoint via Dreambooth or another method changes the checkpoint, and like a saved game, those changes are the most important thing in the file.

Remember that word "weights" that the propeller heads use? It starts to make a bit more sense now. Stuff added to the checkpoint (model) at a later time has more "weight" than stuff that has already been taught. These weights also affect the output by putting more emphasis on something learned later in training than something learned earlier in the process.

Which brings us to watermarks and anti-AI logos and the rest. Put whatever you want across your image. Should your artwork somehow become included in a training set, with or without your permission, marks across your image won't help you. Remember: denoising. If you want to keep your stuff safe from dataset inclusion, stop tagging it. Dataset pairs. No pair, no data for inclusion in the set. "So then I should just mislabel my fantastic art, and if it gets crawled it should be safe from the evil AI, and I can screw with its training at the same time." Nope. Not really. More than 5 billion dataset pairs. You can tag the picture of your new motorcycle as a motorboat or a hockey puck. It won't really matter. Your minimal inclusions into that big a dataset won't screw the results for forthcoming models.
Illegitimus non carborundum
don't let the bastards grind you down

Online parkdalegardener

Re: Is AI Stealing My Art?
« Reply #9 on: January 15, 2023, 10:54:42 AM »
"You're full of crap pdg. I've seen an artist's style get ripped off." Nope. You haven't. You have seen the weights influenced by a few dozen training images that in some cases may not even actually be work by a particular artist. The AI will continue to generate with the "style" of said artist; but in reality it is skewing every result every time. It just applies what it was taught in dataset pairs. What the trainer claims is the "style" of the person trained upon; it is actually what the trainer thinks is the style as denoted in the associated CLIP. The training leverages what already exists in the weights by shoehorning your data into an already existing weight. The "style" is actually modified by the training already in place before the addition. That is why when someone adds their face via Dreambooth almost every result returned upon your new checkpoint (model) looks like the training images. This is sometimes called "overfitting." Training so hard to get a face or style that the model cannot output anything not modified by the new training. Everyone output looks like the person you trained on. Everyone. You are not seeing stolen artwork or styles, you are seeing over fit models that can't do anything else as well as the original model. Some info had to be removed or modified to accept your new addition. Just like saving a game after you've played a bit. Your old save may be overwritten by the new one.
Illegitimus non carborundum
don't let the bastards grind you down