The Generative AI Battle Has a Fundamental Flaw

Blog
-July 25, 2023
- No Comments

Last week, the Authors Guild sent an open letter to the leaders of some of the world’s biggest generative AI companies. Signed by more than 9,000 writers, including prominent authors like George Saunders and Margaret Atwood, it asked the likes of Alphabet, OpenAI, Meta, and Microsoft “to obtain consent, credit, and fairly compensate writers for the use of copyrighted materials in training AI.” The plea is just the latest in a series of efforts by creatives to secure credit and compensation for the role they claim their work has played in training generative AI systems.

The training data used for large language models, or LLMs, and other generative AI systems has been kept clandestine. But the more these systems are used, the more writers and visual artists are noticing similarities between their work and these systems’ output. Many have called on generative AI companies to reveal their data sources, and—as with the Authors Guild—to compensate those whose works were used. Some of the pleas are open letters and social media posts, but an increasing number are lawsuits.

It’s here that copyright law plays a major role. Yet it is a tool that is ill equipped to tackle the full scope of artists’ anxieties, whether these be long-standing worries over employment and compensation in a world upended by the internet, or new concerns about privacy and personal—and uncopyrightable—characteristics. For many of these, copyright can offer only limited answers. “There are a lot of questions that AI creates for almost every aspect of society,” says Mike Masnick, editor of the technology blog Techdirt. “But this narrow focus on copyright as the tool to deal with it, I think, is really misplaced.”

The most high-profile of these recent lawsuits came earlier this month when comedian Sarah Silverman, alongside four other authors in two separate filings, sued OpenAI, claiming the company trained its wildly popular ChatGPT system on their works without permission. Both class-action lawsuits were filed by the Joseph Saveri Law Firm, which specializes in antitrust litigation. The firm is also representing the artists suing Stability AI, Midjourney, and DeviantArt for similar reasons. Last week, during a hearing in that case, US district court judge William Orrick indicated he might dismiss most of the suit, stating that, since these systems had been trained on “five billion compressed images,” the artists involved needed to “provide more facts” for their copyright infringement claims.

The Silverman case alleges, among other things, that OpenAI may have scraped the comedian’s memoir, Bedwetter, via “shadow libraries” that host troves of pirated ebooks and academic papers. If the court finds in favor of Silverman and her fellow plaintiffs, the ruling could set new precedent for how the law views the data sets used to train AI models, says Matthew Sag, a law professor at Emory University. Specifically, it could help determine whether companies can claim fair use when their models scrape copyrighted material. “I’m not going to call the outcome on this question,” Sag says of Silverman’s lawsuit. “But it seems to be the most compelling of all of the cases that have been filed.” OpenAI did not respond to requests for comment.

At the core of these cases, explains Sag, is the same general theory: that LLMs “copied” authors’ protected works. Yet, as Sag explained in testimony to a US Senate subcommittee hearing earlier this month, models like GPT-3.5 and GPT-4 do not “copy” work in the traditional sense. Digest would be a more appropriate verb—digesting training data to carry out their function: predicting the best next word in a sequence. “Rather than thinking of an LLM as copying the training data like a scribe in a monastery,” Sag said in his Senate testimony, “it makes more sense to think of it as learning from the training data like a student.”

thecrossroadtimes.com

Writer & Blogger

Considered an invitation do introduced sufficient understood instrument it. Of decisively friendship in as collecting at. No affixed be husband ye females brother garrets proceed. Least child who seven happy yet balls young. Discovery sweetness principle discourse shameless bed one excellent. Sentiments of surrounded friendship dispatched connection is he.

About Me

Kapil Kumar

Founder & Editor

As a passionate explorer of the intersection between technology, art, and the natural world, I’ve embarked on a journey to unravel the fascinating connections that weave our world together. In my digital haven, you’ll find a blend of insights into cutting-edge technology, the mesmerizing realms of artificial intelligence, the expressive beauty of art.

Instagram

Follow on Instagram

Edit Template

Subscribe Now

Subscribe Now

The Generative AI Battle Has a Fundamental Flaw

thecrossroadtimes.com

Writer & Blogger

Leave a Reply Cancel reply

About Me

Kapil Kumar

Founder & Editor

Popular Articles

5 Techniques for Efficient Long-Context RAG

Structured Outputs vs. Function Calling: Which Should Your Agent Use?

How to Implement Tool Calling with Gemma 4 and Python

Instagram

Quick Links

Home

Features

Terms & Conditions

Privacy Policy

Contact

Recent Posts

7 Essential Python Itertools for Feature Engineering

Top 5 Reranking Models to Improve RAG Results

Contact Us