Copyright Law Can Save Creativity from AI… It Can Also Save AI

[Image: a sketch of a robot claiming, “All your art are belong to us.”]

I am no artist, but Sketches Pro does make sketching fun. 🤖

Let’s say you make your living creating works of art. Everyone knows you for your art style. Then someone else starts taking your paintings, making slight tweaks to them, and selling them as their own. They undercut you on price because it takes them five seconds to slop some tiny changes onto the painting and call it their own.

What if you’re an author who makes a living writing books? You, like every writer, have a unique voice. You draw from personal life experiences, some beautiful, some terrible, and use the experiences that are yours and yours alone to create exquisite art. Then someone starts using a program to copy your book, change a few words, and publish it as their own. They sell it for pennies. You lose your livelihood as someone else speaks with your voice.

Let’s say you’re an actress. You’re popular, you’ve won Best Actress, but now your studio doesn’t want to hire you because they can make anyone in a green suit look like you. A younger version of you with a bigger butt, even. Also, there’s about 10 hours of porn on the internet that looks just like you.

These aren’t “what if” scenarios. They’re already happening. We have to start applying the same ethics and copyright laws we have for everything else to the input of AI now; otherwise AI will ruin people’s lives and eventually get banned altogether. For AI to thrive, it needs human consent. Right now, most AI doesn’t have that at all. The way to start is with copyright protections.

Fancy IP Theft is Still Theft

A fancy copy machine is still a copy machine. If the same inputs always produce the same outputs, then it’s not even a derivative work; it’s basically a collage. A photocopy. In fact, some generated images have even carried over watermarks from the stock image sites they were taken from.

If you write a paper, even when it doesn’t involve direct quotes, you still cite your sources. And, sure, sometimes you make honest mistakes, but you can rectify them. AI doesn’t do this, at least not yet. Outputs aren’t tagged with the images and text the model used to generate them. Unlike with human creations, you can’t go back and attribute those sources later, either; the program obscures them. If you’re writing a paper and copy and paste something, like a watermark, you can get expelled from school. For a company, it’s a serious legal issue. Yet because the law hasn’t caught up to AI, and because our representatives don’t understand the technology, AI can mask what some may consider IP theft.

Copyright Law Helps Everyone Involved

Curation to the Rescue

If we can put copyright laws to work on AI, we’ll be able to force curation. AI researchers will have to carefully curate their datasets to ensure they aren’t using anything that would violate copyright. This is great for creators, artists, and anyone with their image or voice online, but it will do far more than protect intellectual property and ensure creators are paid for their work. Curation is where companies can bring in experts to weigh in on the content their AI is ingesting, specifically with the goal of reducing bias. Bias in AI exists because a model finds the bias in our data and then obscures the reasoning behind its outputs. AI has a dangerous habit of sanitizing bias.
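In concrete terms, curation can start with something as simple as refusing to train on anything whose rights you can’t verify. Here’s a minimal sketch of that idea; the record fields, the allowed-license list, and the curate() helper are all hypothetical stand-ins for whatever a real pipeline would use:

```python
# Hypothetical sketch: keep only training records whose license we can
# verify permits reuse. Field names and license strings are assumptions.
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    source_url: str
    license: str   # e.g. "CC-BY-4.0", "public-domain", "proprietary"
    creator: str

ALLOWED_LICENSES = {"CC-BY-4.0", "CC-BY-SA-4.0", "public-domain"}

def curate(records: list[Record]) -> list[Record]:
    """Drop anything we don't clearly have the right to train on."""
    return [r for r in records if r.license in ALLOWED_LICENSES]

raw = [
    Record("A short story...", "https://example.org/story", "CC-BY-4.0", "A. Author"),
    Record("A scraped novel...", "https://example.org/novel", "proprietary", "B. Writer"),
]

for r in curate(raw):
    # Keeping creator and URL with every record is what makes
    # attribution (and payment) possible downstream.
    print(f"including {r.source_url} ({r.license}) by {r.creator}")
```

The point isn’t the filter itself; it’s that keeping license and creator metadata next to every record is what makes attribution and payment possible later.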

Reducing bias isn’t just good for humanity, but for the application of AI as well. Amazon had to abandon the AI it used for screening job candidates after finding it was biased against women. The AI, trained on Amazon’s own data, learned that the company was less likely to hire women than men. The AI told on Amazon’s bad hiring practices. This wouldn’t have happened if Amazon had carefully curated the AI’s dataset to train it on an equal number of candidates from all backgrounds. Amazon had to stop using its AI because it codified and sanitized the company’s own bias, and that ended a program that could have been done properly.

In some cities around the world, people have demanded a ban on facial recognition AI. Why? Existing facial recognition AI has extreme biases that lead to falsely identifying Black people. When law enforcement uses this AI, they’re more likely to arrest the wrong suspect when the suspect is Black. When cars use this kind of AI, they can be more likely to hit women and people of color. This is because of the bias built into the training sets and data. Curation could not only protect these would-be victims but also allow AI to advance.

Documentation Explains and Resolves Issues

Documentation is an important step for complying with copyright rules. It means telling the end user whose work went into making the final product, and it can also help explain why an AI did something. The two go hand in hand, and copyright law could enforce both. Documentation is accountability. It’s telling someone they didn’t get the job because Amazon’s AI doesn’t like women, so the issue can be found more quickly. It’s recording which parts of an AI led to an output, from who was involved to the decisions that resulted from that usage. It can help root out bias in AI, allow fixes to problematic portions of large datasets on the fly, or simply tell someone why a decision was made so they can appeal it.
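To make that concrete, here’s a minimal, hypothetical sketch of what documented output could look like: every result carries a list of the sources that fed it, so a person affected by a decision has something to inspect and appeal. The class names and fields are illustrative assumptions, not any real system’s API:

```python
# Hypothetical sketch: generated output that carries its own provenance.
from dataclasses import dataclass, field

@dataclass
class SourceCitation:
    title: str
    creator: str
    license: str

@dataclass
class DocumentedOutput:
    content: str
    citations: list[SourceCitation] = field(default_factory=list)

    def attribution_report(self) -> str:
        """Human-readable list of whose work shaped this output."""
        lines = [f"- {c.title} by {c.creator} ({c.license})" for c in self.citations]
        return "Sources used:\n" + "\n".join(lines)

result = DocumentedOutput(
    content="A generated landscape painting",
    citations=[SourceCitation("Hillside at Dusk", "A. Painter", "CC-BY-4.0")],
)
print(result.attribution_report())
```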

I do want to call attention to the fact that AI has been used in bail and parole hearings, despite the racist bias it shows. If it could document and explain its rationale, you could find that bias more quickly and throw out the result. Right now, someone could be denied bail or even parole and have no idea why. The AI is sending people to jail, and no one can exactly explain why, because the process is a black box. AI companies have intentionally built them like this to avoid copyright and accountability, and to hide the often million-step process behind each decision. In Franz Kafka’s The Trial, a man is arrested, imprisoned, and eventually executed without ever knowing why. With AI that doesn’t document its process or credit its sources, it’s easy to see how that could become a reality.

And in the interest of credit, I got the idea for that comparison from a Philosophy Tube video (warning: some racy/nsfw outfits), presented by Abigail Thorn. See? Documentation isn’t that hard.

Smaller Datasets, Smaller Footprint

As illustrated in On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, large datasets are harmful in many ways. They can easily introduce bias, they can perpetuate bias, they can elevate some people over others, and they’re costly. Creating a model from a large dataset means you need more hardware. That’s bigger data centers, more servers, more electricity, more tin, copper, cobalt, and lithium. It’s bad for the environment and it’s bad for the laborers (often children) who dig those materials out of the ground.

However, a curated model can provide the same quality of output, often with less extreme bias than the population at large, while using less data, less energy, and fewer resources, at a lower cost. That makes it more accessible to all. Curation would also be a necessary step once copyright is a consideration. If you have to pay artists for what comes out of your AI, you’re going to be more careful about what you put into it. You’re going to limit its dataset to limit possible unforeseen expenditures. Copyright will lead to smaller datasets, which will lead to less bias, better auditing of decisions, and a smaller ecological footprint.

Then there’s the poor efficiency of the entire algorithm. We know that GPT-4 responds to disallowed requests over 20% of the time. That’s a greater than 1 in 5 failure rate. A round of Russian Roulette kills you about 1 time in 6, roughly 17%, so you have a better chance of surviving the revolver than GPT-4 has of refusing a disallowed request. That’s a shit algorithm, even if it does represent a vast improvement over gpt-3.5-turbo. The issue is the approach. Think of it like any other programming problem: what’s more efficient, going through millions of data entries or a tenth of that? Obviously the latter. So curate first, improving the efficiency of each subsequent iteration. This is basic stuff. Shrink the dataset, improve efficiency. Copyright and human rights curation is the best way to create fantastic AI that uses fewer resources.
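The arithmetic here is blunt. As a toy sketch (the numbers are illustrative, not benchmarks): if per-pass work scales with dataset size, curating the set down to a tenth cuts every later training pass by the same factor:

```python
# Toy arithmetic: per-pass training work scales with dataset size,
# so a curated set pays off again on every subsequent iteration.
FULL_SET = 1_000_000            # entries scraped indiscriminately
CURATED_SET = FULL_SET // 10    # entries left after curation
EPOCHS = 30                     # passes over the data

full_cost = FULL_SET * EPOCHS
curated_cost = CURATED_SET * EPOCHS

print(f"full:    {full_cost:,} entry-passes")
print(f"curated: {curated_cost:,} entry-passes")
print(f"saved:   {1 - curated_cost / full_cost:.0%}")  # -> 90%
```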

AI is Work

Every website you’ve used, every app on your phone, has thousands of hours of work behind it. People often don’t realize the sheer amount of work it takes to create software: the planning, designing, and organizing, and, of course, the hours upon hours spent learning, practicing, and coding, often for long days. I can’t tell you how many 12+ hour days I’ve pulled, and at many workplaces they were the norm. That’s not to mention all the work you have to do outside of work to remain an expert in your field.

But that’s just on the software side. It doesn’t include the thousands of hours, often under child- or slave-labor conditions, that go into creating the hardware the software runs on: the mining, assembling, and shipping of hardware to its destination. It doesn’t count the work that goes into keeping the power grid running to support that hardware.

When it comes to AI, there’s another layer. Much of the data that runs through these models still needs a human touch. Companies have to get people to identify and label images, and to describe the themes or context of paragraphs of text, memes, jokes, and everything else you can think of. These workers are often paid fractions of a penny for their work. Services like Amazon’s Mechanical Turk ask laborers to examine mountains of data, paid on a per-task basis, forcing them to work entire days at a computer just to come in below the poverty line.

Even if we exclude the IP theft, data isn’t free. The least we can do is ensure that AI provides jobs by paying those that prop it up. That includes those who work on it, those who support and label the data, and those who make the data it needs to function. We can reduce the amount of work required, reduce the negative impact of AI, by shrinking datasets.

Companies Probably Already Know This Is Coming, and It’ll Be Good for Everyone

Companies likely already know copyright law is on the horizon. Nvidia and Microsoft built their large language models using “174GB of text from the English Wikipedia, OpenWebText, RealNews and CC-Stories datasets” (On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?). These are open-source or Creative Commons sources made specifically to be used by others without explicit permission. However, they do require attribution, something only a documentation process would provide. OpenAI trained its AI on much larger datasets, some ten times the size, achieving similar results but pulling from sources that may be off-limits if copyright law is enforced for AI datasets.

The companies that are trying to create copyright-safe models aren’t quite measuring up yet, but they’re laying the groundwork for smaller, curated datasets that perform better than large language models trained on scraped data. These models could use far more content moderation, but they’re a tiny step in the right direction. Copyright law and better curation can improve AI and benefit everyone working on it.

Well… Almost Everyone

There is a class of company that may not benefit from copyright laws, at least not yet. The idea for some of these AI companies seems to be to outrun the law. Large companies don’t have to worry quite as much about lawsuits. Labor laws, copyright laws: they’re not made to actually go after the worst offenders. They’re not powerful enough to cause lasting or permanent damage to large corporations, though they could be devastating to startups. The goal is to become such a big name, and make so much profit, that by the time copyright law comes after them, they’ll be immune to its effects. Too big to fail. We need a copyright crackdown now; otherwise artists and creators will never get what they’re owed for the work that went into models corporations will continue to use forever.

How Do We Curate?

“Alright, already,” you say, “I get it, we need to curate AI datasets and apply copyright law! How?” And… well, that’s a wonderful question, because the answer isn’t straightforward, and it could help create new jobs. New jobs from AI: the promise that AI would make jobs rather than take them could come true!

There are already experts in AI ethics. Philosophy majors would love a new career path other than academia or pre-law. Researchers of online abuse, diversity experts, artists: all could get a new career path in shaping AI. They could work with AI experts and software engineers to find sources for data and to create data made explicitly for training AI: bedtime stories crafted to make AI better at bedtime stories, paintings made to teach AI how to paint the perfect landscape. People could be paid for creating datasets curated to avoid bias and discrimination. We could give AI a unique voice, all its own. It’s the perfect career path for creators and humanities experts, who could help remove biases not only from AI, but from humanity as a whole. We could train our AI to better train people not to hold on to old biases. There’s so much wonderful potential here.

What Can We Do?

The Legal Route

If you want the law changed or better enforced, talk to the people who write those laws. Call your representatives. Fax them. Tweet at them. Email them. I personally have gotten written letters back from my representatives. They do pay attention when you reach out to them directly. Make your voice louder than the companies looking to skirt copyright for cheap data. Push for harsh punishment for IP theft when used to train AI, as it extrapolates and hides that theft. Get the startups and this early form of AI in line now, before it’s too big to take down.

The Rebel Path

I won’t lie to you, this is my favorite option. If AI profiteers want to steal your art without your permission, lace it with poison. A few researchers have found ways to make your images poison models. Take Nightshade, for example. It adds just the right amount of noise, targeting specific phrases and keywords, so your photo of a dog still looks like a dog to a human but looks like a cat to the AI. Then, when the model goes to generate an image of a dog, the result will look like a monstrosity or, if the model has been poisoned enough, a cat.
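Nightshade’s actual technique is prompt-specific poisoning of text-to-image training data (see the Shan et al. paper in the sources), and I won’t pretend to reproduce it here. But as a loose sketch of the shared underlying idea, adversarial noise a human can’t see but a model acts on, here’s a single targeted FGSM step against an ordinary image classifier, assuming you have torch and torchvision installed:

```python
# Loose illustration only: one targeted FGSM step that nudges an image
# toward "cat" for the model while leaving it visually unchanged.
# This is NOT Nightshade's algorithm, just the same family of idea.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()

image = torch.rand(1, 3, 224, 224)  # stand-in for your dog artwork
image.requires_grad_(True)
target = torch.tensor([281])        # ImageNet class 281: tabby cat
epsilon = 8 / 255                   # small enough to be hard to see

# Gradient of the "how far from cat?" loss with respect to the pixels.
loss = F.cross_entropy(model(image), target)
loss.backward()

# Step *down* that gradient: the model drifts toward "cat" while a
# human still sees a dog. Real attacks iterate this many times.
poisoned = (image - epsilon * image.grad.sign()).clamp(0, 1).detach()

print(weights.meta["categories"][model(poisoned).argmax().item()])
```

One step with a tiny epsilon usually won’t flip the label on its own; the real tools iterate and optimize far more carefully. The point is that no pixel changes by more than 8/255 per channel, well below what your eye notices.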

Hey, they stole from you. They can deal with the consequences.

There are other tools, like Glaze, that simply make your art appear as noise to an AI model, but that’s not as much fun. Its creators are working to incorporate Nightshade as an option as well. The idea that your art can literally kill the kind of greedy project that wants to take your labor for free is delicious. This art kills machine learning. It’s so cool it’s made me want to put more art out into the world. If someone wants to violate your work, make sure you take them down with you.

Keyboard Warrior

Hey, it’s okay to just talk about this. Write about it, maybe on a blog, on a social network, or just over drinks with your friends. The AI apocalypse isn’t here, John Connor is alive and well (metaphorically speaking), and we’re nowhere near a dangerous artificial general intelligence (AGI) that can compete with a human. We have chatbots and clever copy/paste tools that hide their sources. That’s it. Don’t believe the AI doomerism hype; we’re a long way from AI being dangerous enough to take over the world. But we do have AI that steals people’s work and perpetuates bias. It’s our job to take a stand against that now, however we can.


Sources / Further Information:
  • Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜,” doi.org.
  • Jeffrey Dastin, Reuters
  • Emilia David, The Verge
  • Benj Edwards, Ars Technica
  • Jon Keegan, The Markup
  • Melissa Heikkilä, MIT Technology Review
  • Shawn Shan, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y. Zhao, “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models,” arxiv.org.
  • Abigail Thorn, Philosophy Tube
  • Chloe Veltman, NPR
  • James Vincent, The Verge