An “agnostic” view of AI art ethics and the future of art (Part 2)

Part 1 was a general outline of what AI Art is and of the controversies around it, as explained by other people. Today I investigate for myself what is really going on.

Time to try out this AI Art thing for myself

Why?

  • 1: Artists like myself are going to need to learn this stuff, or at least be familiar with it.
  • 2: I want to know what the fuss is all about.
  • 3: The online art generators are limited, allowing only so many “free” images before they start charging you.
  • 4: Popular apps like Lensa want you to upload pictures of yourself to make stylized avatars, but the terms of service indicate they can use your uploaded photos for further AI training.

The best way around this is to install a free copy of the open source Stable Diffusion on my own computer and run it locally, so I don’t have to worry about artificial limitations. Warning: You need a really good computer with lots of memory and a high end video card of some sort. I installed Stable Diffusion with the v1-5-pruned-emaonly.ckpt file as the “standard stable diffusion” database; it’s about 4GB. There are bigger ones at better resolution, but I’m here to investigate only, with no plans to do this commercially.

I also downloaded, for experimental purposes, the dreamlike-diffusion-1.0.ckpt file as an alternate to test with. It is only 2 GB in size. This proved very educational with regard to what a ckpt file is and what it is capable of. More on that later.

Using the standard 1.5 file, I typed in “A woman walking in a forest near a waterfall” (with a seed of 5) at 1280×720 resolution and got the above picture. It is a pleasing-looking picture for sure, but not really that close to what I typed. There are two women, not one; there are three waterfalls, not one; and it looks like it just cut and pasted together some random pictures that fit the description.

This is pretty typical of all the online AI tools as well. I tried them all and they all kind of do this: give you something related to what you typed in that looks lazily created.

At this point you start experimenting, trying more precise prompts and different language tricks. The “Img2Img” tool allows you to take the generated image above, zoom in on one of the women, and type a more precise prompt like “Sandra Bullock with black hair and big brown eyes walking in a forest away from a waterfall”.

Yes, you can generate pictures of celebrities, as long as there are a lot of pictures of them in the database.

Zoom in on the face and it generates a picture with even more face detail.

Ok, we are now in freaky and ethically questionable territory here.

Just a heads up: if you tried experimenting with the online generators and got freaky looking faces and eyes, that is easily fixable with the offline generators thanks to an open source program called GFPGAN, which post-processes the AI results to fix all the faces. It is so useful that its installation is part of the normal install. To enable it in AUTOMATIC1111 you just check the box “Restore faces”. Without it Sandra would have multiple deformed irises.

To create my own models I used Dreambooth, which allows you to add your own pictures to a ckpt file and set a unique name. So I added about 20 512x512px jpg renders of Ariane at different angles and expressions and called the model “dateariane” so that I don’t get a lot of Ariane rocket pictures when I use the name “Ariane”.

Yes, it is not a perfect match, but it looks pretty good. I actually did a bunch of different attempts to get this. That’s the thing with AI art: it will usually take you a dozen or so tries to get close to what you want. Hope you have a fast computer. Every change of the “seed” and push of the “Generate” button is like a dice roll, and most rolls turn out to be a complete mess of multiple faces and distorted bodies with extra limbs.
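To put rough numbers on the dice roll comparison, here is a small Python sketch. It assumes every push of “Generate” is an independent try with the same chance of a usable picture, which is a simplification added here for illustration. The 1-in-12 chance is only one reading of “a dozen or so tries”, not something that was measured, and the function names are made up for this example.

```python
def expected_tries(p):
    """Average number of tries until the first usable picture,
    assuming independent tries that each succeed with chance p."""
    return 1.0 / p


def chance_within(n, p):
    """Chance of getting at least one usable picture in n tries."""
    return 1.0 - (1.0 - p) ** n


# Assumption: "a dozen or so tries" read as a 1-in-12 chance per try.
p = 1 / 12

print(f"average tries needed: {expected_tries(p):.0f}")
print(f"chance of a keeper within 12 tries: {chance_within(12, p):.0%}")
print(f"chance of a keeper within 43 tries: {chance_within(43, p):.0%}")
```

Under those assumptions a dozen tries gives only about a 65% chance of a keeper, so a long unlucky streak is not surprising.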

Note: Always use a positive seed, and if you don’t like the result, change it. The seed number is saved in the output file name so you can redo a picture later. Seeds are used to create the random numbers that create the pictures, and there is no telling what will come from which seed, so it is just a weird game of random image roulette.
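The note above about seeds can be illustrated with a few lines of Python. This is only a sketch of the general idea: it uses Python’s standard random module as a stand-in for the generator’s own random numbers, and the file-name layout and function names are hypothetical, made up for this example.

```python
import random


def roll_noise(seed, count=4):
    """Return `count` pseudo-random numbers from a generator seeded with `seed`.
    Stand-in for the random noise an image generator starts from."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


def output_name(index, seed):
    """Build a file name that records the seed (hypothetical layout)."""
    return f"{index:05d}-{seed}.png"


def seed_from_name(name):
    """Recover the seed from a file name built by output_name()."""
    stem = name.rsplit(".", 1)[0]
    return int(stem.split("-")[1])


# Same seed, same numbers: this is why a picture can be redone later.
print(roll_noise(5) == roll_noise(5))
# A different seed gives unrelated numbers: the "roulette" part.
print(roll_noise(5) == roll_noise(6))
# The seed can be read back out of the saved file name.
print(seed_from_name(output_name(12, 5)))
```

The same seed only reproduces the same picture if the prompt, checkpoint, and other settings are unchanged as well.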

Adding “Sandra Bullock” to the end of the prompt mixed the two models together:

Yeah, I know right?

The Power of using different Checkpoints

Dreamlike Diffusion is a different checkpoint that is half the size of the “standard” database used above. As you can tell the Ariane and Rachel models have more exaggerated features: stronger nose, stronger jaws, stronger eyebrows. Despite looking different from their source photos, the faces of the two models look like the other pictures using the same named models. Then I had to play with the models a bit:

If your goal is to look like real people, the dreamlike checkpoint is not the way to go. Pictures have a very different “caricature” style under dreamlike, which helps to distinguish these from real photos, so maybe that is a good thing?

Except I wonder if in order to create these stylistic faces, they filled the checkpoint with artists that specialize in this style of art. “Dreamlike diffusion” is designed to imitate the “Midjourney” AI which is known for stylistic fantasy elements, which as pointed out in part 1 is filled with stolen art. Except for the above mentioned celebrity mixes, I try to avoid specific living artists and people. A lot of AI artists love adding names like “Greg Rutkowski” to their prompts to imitate his style, and Greg has come out and stated his displeasure with the situation.

“Myths” blown out of the water

I want to take this one step further. Here are two “Rachel” pictures using the two different checkpoints:

These women are supposed to be “Rachel” based on models I added to Stable Diffusion standard on the left (after 43 failed tries), and Dreamlike diffusion on the right. I showed you my “fix” above to make them prettier, but I am using clean prompts and identical seeds here to demonstrate an important point:

Not sure if you watched the video at the bottom of the part 1 post, but they were worried about certain “myths” regarding AI art which the above demonstration disproves.

First, let me explain what checkpoint files are: they are a collection of compressed images (usually jpg files) mixed with a searchable database of text descriptions and tags describing what is in each picture. The AI basically searches the database based on the text entered for a group of images that are close and mixes the images to generate the final image.

In creating the Stable Diffusion checkpoint, the stability.ai people pulled as many pictures as they could from all over the internet, using alt tags and captions to make the text descriptions. There are several obvious problems with this approach:

They “scraped” the images from all over the internet without asking for permission, under a non-profit organization that they own under the pretense of “research”, and then under a legal loophole they are using this “research” for commercial purposes without compensating the original artists. These scraped images contain numerous questionable images that violate people’s privacy.

Stability.ai’s answer to all this is to remove words like “greg rutkowski” and various NSFW words from the searchable database to make it harder to generate art using the artist’s name or NSFW prompts.

The much better answer is to remove all the copyrighted images used without permission, but the stability.ai people insist that would make their tools useless.

But as we just demonstrated by changing the checkpoint, that is not the case. In fact the smaller and more specialized dreamlike checkpoint contains a much better database of words to images, creating usable results much more often. I had to try several prompts and seeds to get a pleasing image on the left. I did nothing but change the checkpoint to generate the picture on the right on the first try.

Which leads to the other obvious problem with the stable diffusion checkpoint process: the reliance on alt tags and captions to identify images. Unless these were checked by human eyes, there are probably a lot of images in the database whose “descriptions” are part numbers, news sources, advertising, or random tags often used by web developers to hide images from Google and other search engines.

This is probably why most of my prompts using the standard checkpoint resulted in pure garbage. Yes, dreamlike checkpoint also produced a lot of garbage, but at a much lower frequency.

The need for “clean checkpoints”

Artists on both sides of the AI debate consider the scraping of internet images to create checkpoints to be unethical or at least less than ideal. It is the cheapest and most profitable method, and that is why they did it, but the unethical behavior is going to be the thing that keeps AI art from wide public acceptance.

This tech has only been publicly available a short while. There is still time to course correct.

It is already at the point where court briefs are being put together and legislation is being drafted to prevent stability.ai from doing what it is doing, and I believe these legal efforts are going to be successful because there is a demand for answers from several deep pocket and politically connected interests.

That means they will need to create checkpoints using exclusively public domain art and photos, and art used by permission. I believe Dall-E, operated by Open AI, is working on this, as Microsoft is a major contributor to Open AI and owns a huge catalog of images (anyone who has Bing Wallpaper set up on Windows will see some of them daily).

Yes, this will limit the variety of images that can be made using Stable Diffusion, but with better tagging and cataloguing of the smaller picture database its usability should actually improve. But that is not all…

The typical pattern of disruptive technology = New Markets

Let me state the obvious: AI Art is a “disruptive technology”, like factory automation, personal computers, the internet, file sharing, SQL databases, and most recently cell phones.

All of these disruptive technologies result in loss of some jobs, and economic losses for certain sectors.

I would caution artist advocacy groups against relying on legal battles over AI Art tools as a way to “save” the current art community. I think forcing the creation of clean checkpoints for commercial art creation is inevitable, but even after this is done, the disruptive technology is still going to have power over the future of art, and the damage to the current art field will continue, now with actual legal backing.

But there is a bright side to disruptive technologies: The creation of new jobs and opportunities. The rise of SQL databases saw the creation of “data entry” positions. The rise of cell phones saw the creation of app developers. There may be whole new industries I haven’t thought of that spring up to replace lost ones.

For example, if AI software developers are forced to limit the art available to these tools, to the detriment of their usefulness, then a new cottage industry of “checkpoint” add-ons (called “models” in the community) filled with original, well documented, legally obtained, commercially licensed source art by actual artists is inevitable. Want a specific artist’s look for your AI art? Maybe they’ll sell you a model add-on pack to add that functionality. Want to add celebrity models? Buy legitimate model packages from photographers that own the image rights.

That way artists get paid, permissions are cleared up, and whatever is next for AI art can proceed without the ugly legal and ethical ramifications.

This is a billion dollar idea stability.ai should be working on right now! Build a store front within the software itself, take a cut of the sales, broker artists on their site. This is the business model of Unreal, Daz3D, and Adobe, and it could work for AI art as well.

What then?

If AI can get over its ethical and legal hurdles, there is huge untapped potential.

Take my industry: visual novels. Imagine if I could write visual novels with AI instructions to generate images with blanks that could be filled in by the player. Want to date Ariane, where Ariane is played by Ana de Armas? Or your RL girlfriend? Or maybe you want to change Ariane’s gender and date a male Ariane? Give me a few years and I can make it happen.

Date Taylor Swift? Tech is not ready yet, but could happen soon.

To be clear: I’m not going to make games with celebrity likenesses, but I don’t have a problem making it easy to let players customize the game for personal use. I’m pro “modding” as long as people aren’t distributing or selling modded versions of my games as their own.

That’s just my field of expertise. Maybe the personalization of TV and movies is doable too? I know it is pretty much a given that use of an actor’s likeness for mass media is illegal and highly unethical, but what about for personal use?

This is just one of hundreds of questions we’ll be asking ourselves in the coming years as the technology only gets better and better.

Should artists fear the changes AI art will inevitably bring? I don’t know. I do know that the invention of photography threatened painters’ livelihoods in the late 19th century, but painting still exists. Painters were actually freed from having to do realistic paintings, which led to impressionism, cubism, surrealism, art nouveau, art deco, etc. Now that AI can imitate these styles too, will painters give up painting?

Definitely not! They will likely invent new styles that haven’t even been thought of before, do things AI can’t do very well like tell stories with their art, or make art that changes and evolves on its own.

All I know for sure is art is going to change. Hopefully for the better, but there is no guarantee. My advice is to envision what you want to do and embrace the technologies that will help get you there.

AI for me?

I was going to finish with my own personal insights and a specific statement of how I was going to be ethical with my use of AI Art, but I erased it. This is not to say I want to be unethical with AI art. The future is hard to predict, and I don’t want to tie myself to promises before we know what is going to happen.

I generated some ethically questionable images for this piece, but since the plan is for this post to be educational and non-commercial and not monetized in any way, I’m OK with it.

It is a continuation of my policy to only use Google Images in social media posts and memes, but stick to Wiki Commons and respect copyleft requests for attribution when using “free” art in my commercially sold visual novels.

I have come to realize that there are other aspects of AI art that warrant further discussion beyond the current ethical discussions getting all the attention. This was a two part discussion specifically about AI art’s impact on the art industry and its problematic use of copyrighted material in the creation of that art.

I am now satisfied that these issues will get settled soon thanks to demand. I have no idea how it will shake out, and it will probably be to nobody’s complete satisfaction, but settled they will be, and that is what I set out to look into.

But I quickly discovered a myriad of other philosophical issues worthy of discussion (sorry for the lack of specifics here):

  • “Aesthetically pleasing” art vs “artist’s true voice” art.
  • Issues regarding the commercialization and mainstreaming of style that AI can either fix or make worse.
  • The way 3D art rendering and AI art generation are polar opposite techniques with similar goals for results.
  • The act of prompt revision and AI generation feedback as a new form of art in itself, and how text AI like ChatGPT could undermine that before it even develops a culture.
  • The strange subcultures being created over AI Art, and whether they are positive or negative for society.

Yes, there is definitely more to talk about. So more essays are coming. I don’t care if these essays change things, I am writing to answer questions in my head before they overwhelm me, and I am releasing them to hopefully help others think more sanely about these issues, too.

5 comments

  • When all is said and done, this is just another tool in your paint box. It is your vision that gives it life. I prefer your work to anything AI comes up with (whether it looks like Jennifer Lawrence or whoever) I want to see your Ariane not the computer’s the same way I want to see Da Vinci, Jack Kirby, or Jim Borgman. Stay true to your muse and your results will be stellar.

    • Thanks, and I agree. The inability to accurately pose, control lighting, or render on model no matter how accurate my description means AI is useless to me at this time for game creation except to make temp storyboards or possibly one off backgrounds.

      The tech will improve and may be useful in the future, hopefully after they get their copyright issues sorted. In the meantime it is a pretty picture generator that is fun to play with, but that’s about all.

    • “But as we just demonstrated by changing the checkpoint, that is not the case. In fact the smaller and more specialized dreamlike checkpoint contains a much better database of words to images, creating usable results much more often.”

      Dreambooth models like dreamlike are not trained from scratch. They use semantic leveraged from the 2.3 billion images trained base model. You’re just finetuning it on a particular set of images.

      • Agreed. Dreamlike has the same potential ethical issues as stable diffusion generic.

        My point is that if the smaller and better organized dreamlike can make pictures better than generic sd1.5, then an ethically public domain + permission granted ckpt file could be created that wouldn’t have the legal questions. Dreamlike isn’t that.
