Landscape View of A.I.’s Legal Issues

From investors hitting buttons on Robinhood, to tech-hipsters babbling at Bushwick parties, to CEOs at earnings calls, Artificial Intelligence (A.I.) is at the forefront of the Zeitgeist and making it into any broad discussion about technology, software, or media. Currently, it is driving speculation for a broad range of possibilities, such as:

  • Birthing the Star Trek replicators to aid us in building a new Utopia where nobody will need a 9-to-5;

  • Proving my theory that Donald Trump is essentially an algorithmic blend of L. Ron Hubbard and P.T. Barnum;

  • Opening entire new avenues of understanding of human behavior via the processing of massive amounts of information;

  • Maybe getting another Heavy Metal movie.

However, AI has to clear some legal hurdles before it can be seriously appraised, and a broad range of people whose interests are threatened will look to defend them in court and in Congress. While generative AI raises somewhat different issues for images versus text, the two broadly overlap on the most basic issue, i.e. copyright.

Currently, while opinions will certainly differ, the most pressing issues from an intellectual property perspective revolve around web scraping (how much information can a program take in to analyze and train AI models?) and copyright infringement in generative AI (how close can the AI's product be to the copyrighted information taken without infringing on the rights of the writer, artist, or other creator, keeping in mind that everything generated comes from pre-existing, often-copyrighted material?). While there is law that addresses some circumstances somewhat analogous to web scraping and generative AI, many of the issues they present are entirely new, and old issues have taken on new dimensions or emphasis through new technologies.

Web Scraping and the Ninth Circuit

Generative AI has no original creative powers - everything is taken from pre-existing work, and the more leverage scrapers have to take, the more likely an infringing amount will be taken from any source.

What information is off limits to an automated web scraper, even at the request or demand of the owner of that information, is currently a hotly debated topic. For instance, artists typically need a public presence or portfolio to prove their abilities - would they be able to make their paintings, writings, or other copyrightable work unavailable for AI analysis via web scraper, or is it more like the rules for photographers, where essentially anything in public is fair game to take a picture of?

While ultimately Congress will need to grapple with this question, currently it’s just Wild West rules out there - there’s apparently little law or precedent restricting the gathering of public information by automated process, and I’d be willing to bet that the web scrapers that exist today have already gathered every bit of information publicly available (including about yours truly… in my defense, it was dark, I was drunk, and that panda was delicious).

Keep in mind, like most technology, this one is “values-neutral” - that is, the same technology powering new generative AI can also search the internet to help artists find instances of infringement to protect their own copyright or trademark.

So far, the most interesting case came out of the federal appeals court covering California, the Ninth Circuit Court of Appeals. In hiQ Labs v. LinkedIn Corp, this court ruled that automated web scraping of information available to the public on websites (i.e. user data from LinkedIn that could be seen by you or me) does not violate the Computer Fraud and Abuse Act (CFAA), even if the website owner objects. While this is not the most pressing issue surrounding scrapers, it represents the first step in the development of web scraping caselaw in the circuit responsible for the most copyright production in the nation, and it breaks in favor of the scrapers.

The CFAA is a cybersecurity law prohibiting unauthorized access to computers and systems - basically an anti-hacker law - and the court's ruling in essence holds that automated web scraping is not analogous to illegal hacking, rejecting the argument advanced by LinkedIn. What will be far more interesting are the arguments as to what legislation may create a right to protect information even when it is otherwise offered publicly, and how that may be challenged, but that is a debate for another day.

Should Congress create legislation allowing potential goldmines like Reddit with a hojillion interactions to analyze to monetize their databases by providing them more leverage to deny access to scrapers while still keeping information public, we could see an entirely new, valuable industry open up within information technology. Alternatively, we could see:

  • Some information restricted to avoid scraping;

  • New businesses in tech opening up to make certain websites scrape-proof or to create information that damages AIs, e.g. overloading them with nonsense or inaccuracies;

  • AIs being trained only on public domain information, causing them all to speak with an accent and terminology that is roughly 100 years behind our own, giving us all an omniscient consciousness that sounds like an extra in a movie about Chicago gangsters;

  • The web scraping we know today being at least partially banned, with data entry becoming somewhat more analog as a result.
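Some flavor of that first possibility already exists in practice: robots.txt is the longstanding, purely advisory convention by which a website asks crawlers to skip certain paths. It creates no legal obligation, which is part of why the current landscape feels like the Wild West. As a minimal sketch (the paths and bot name below are hypothetical), Python's standard library can check such a file the way a well-behaved scraper would:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: the advisory (not legally binding) convention
# a site uses to ask automated crawlers not to touch certain paths.
robots_txt = """\
User-agent: *
Disallow: /portfolio/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved scraper checks before fetching; nothing forces it to.
print(parser.can_fetch("ExampleBot", "https://example.com/portfolio/art.jpg"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/about"))              # True
```

The catch, of course, is that honoring the file is voluntary - exactly the kind of gap that legislation along the lines discussed above would have to close.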

The current state of the law will allow web scraping to confidently continue on publicly available websites within the 9th Circuit, with other circuits likely to follow the path it paves on "scraping" not being "hacking," unless and until another major web scraping case challenging the process makes its way up the court system. This does not, however, touch on the incredibly controversial topic of what can be produced with the information scraped - the vital issue of "fair use" in copyright that recently received attention by the Supremes in Andy Warhol Foundation for the Visual Arts v. Goldsmith.

Copyright Questions and Andy Warhol Synergy

Perhaps the thorniest question in generative AI, as with all material produced utilizing at least some part of another work, is at what point it becomes infringing. The question of copyright infringement has been left to a case-by-case analysis under the four factors of fair use, as much of it involves evaluating art, a thing that the court has come to terms with being poor at (e.g. drawing the line between "art" and "pornography").

In May 2023, the Andy Warhol Foundation for the Visual Arts (AWF) v. Goldsmith case was decided (see our breakdown of the issues here). This was arguably the most important copyright case since Campbell v. Acuff-Rose, a precedent the court and both certiorari briefs relied on heavily.

In framing the issue, the court stated:

In this Court, the sole question presented is whether the first fair use factor, “the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes,” §107(1), weighs in favor of AWF’s recent commercial licensing to Condé Nast. On that narrow issue, and limited to the challenged use, the Court agrees with the Second Circuit: The first factor favors Goldsmith, not AWF.

Perhaps the best distinction drawn in the case was that between parody and satire, two forms that the court treats categorically differently, which the court drew from Campbell:

Distinguishing between parody (which targets an author or work for humor or ridicule) and satire (which ridicules society but does not necessarily target an author or work), the Court further explained that “[p]arody needs to mimic an original to make its point, and so has some claim to use the creation of its victim’s (or collective victims’) imagination, whereas satire can stand on its own two feet and so requires justification for the very act of borrowing.”

Looking to the text and the Supreme Court’s interpretation of Campbell, unless a work of generative AI specifically makes a point about every copyrighted work it infringingly drew from, the parody defense is likely off the table.

In elaborating on their position, the opinion of the court held:

Campbell cannot be read to mean that §107(1) weighs in favor of any use that adds some new expression, meaning, or message. Otherwise, “transformative use” would swallow the copyright owner’s exclusive right to prepare derivative works. Many derivative works, including musical arrangements, film and stage adaptions, sequels, spinoffs, and others that “recast, transfor[m] or adap[t]” the original, §101, add new expression, meaning or message, or provide new information, new aesthetics, new insights and understandings. That is an intractable problem for AWF’s interpretation of transformative use.

While the AWF case does not address AI specifically, it will offer creators some protection, as changing the meaning or message alone will not be enough to shield generative art or text from infringement.

At its heart, the debate around the legality of generative AI is essentially one of the availability of public information to automated gathering, a brand new problem, and fair use, my favorite complex old problem that the courts have intentionally kept ambiguous. While the courts and agencies like the Copyright Office will do their best with the patchwork of law they can quilt together to address these situations, these are clearly social paradigm-shifting technologies that were not contemplated when current copyright and internet law were legislated, and they will require Congress to address them more directly.
