
Praxis: The Online Publication of The McCarthy Institute

By Daniel Metz. Daniel is a current 1L at the Sandra Day O’Connor College of Law at ASU. He’s interested in the ways that law can be used to maximize the potential of emerging technologies while also protecting against their potential pitfalls. Prior to law school, Daniel worked as a data scientist in the advertising industry. He hopes to use his knowledge of the data economy to inform a future legal practice in technology law. In his free time he enjoys playing soccer and chess and dabbling in film.

Background

In late 2022, the world was taken by storm when OpenAI launched ChatGPT, a new generative artificial intelligence (GenAI) model.[1] The technology allows anyone to type a request into its interface and receive a comprehensive textual response. From computer code and philosophical musings to trivia questions and historical treatises, ChatGPT can provide a response that, had it come from a human, would demonstrate proficiency in the subject. It quickly became the fastest-growing consumer application in history, reaching an estimated 100 million users within two months of its launch.[2] Billions of dollars have since been invested in developing GenAI technology.[3] OpenAI itself was recently valued at $157 billion.[4] Other companies have rushed to launch their own GenAI models, and today we sit in the middle of a technological arms race as the tech giants jockey for position in this new industry.[5]

While there may be a legal scholar or two who worries that GenAI models will replace human legal talent,[6] this newfound ability of machines to instantaneously generate unique text has raised questions that should keep intellectual property lawyers busy for years to come.

Josh Harlan, a venture capitalist, best illustrates this issue by writing that “pending copyright infringement lawsuits could derail the [AI] industry’s economic potential.”[7] GenAI owes its success to using huge amounts of human-created content to “train” the computer model.[8] GenAI algorithms identify patterns that exist within billions of documents written by humans; it then mimics these patterns to create its own unique textual outputs in response to user queries.[9] Harlan anticipates what could be high profile legal fights between the media giants who own the human-created content, and the tech giants using that content to build GenAI.[10] This fight is not limited to Fortune 500 behemoths—individual writers like Sarah Silverman and John Grisham have filed suit against OpenAI for using their work in ChatGPT’s training datasets.[11] This essay argues that requiring GenAI developers to pay licensing fees for using copyrighted material is essential to both the interests of the public and the advancement of AI technology.

Discussion

The courts’ response to GenAI should apply a tech-normative approach.[12] This approach focuses on the original objectives of copyright law to find the appropriate method of regulating new technology, rather than focusing on the technology’s specific technical details.[13] This achieves a form of technological neutrality that can be applied to the evolving copyright landscape without the need for analogizing modern tech to very different, preexisting tech.

Other theories of technological neutrality, by contrast, apply the law as it currently is to any new innovation; for example, a copy made by hand would be treated as being equivalent to a copy stored in a computer’s Random Access Memory (RAM).[14] Obviously, a copy made by hand is used in entirely different ways than a copy stored in RAM. Yet the latter theory of tech-neutrality would have the law treat them both in the same way. The tech-normative approach instead allows lawmakers to focus on the fundamental purpose of copyright law rather than trying to graft an outdated framework on modern tech. Professor Carys J. Craig synthesized the benefits of this approach, writing that it “relieve[s] us of the task of examining the internal mechanics of technological processes, and it allows us to escape the intractable debates about which analogies most aptly apply to describe new technology enabled activities.”[15]

Accepting the tech-normative approach then raises the question: what are the fundamental policy objectives that led to the creation of copyright law? Quoting the Constitution’s Intellectual Property Clause, Justice O’Connor opined that “the primary objective of copyright is not to reward the labor of authors, but ‘[t]o promote the Progress of Science and useful Arts.’”[16] This makes clear that the law was written primarily to benefit the public at large; the author’s exclusive right to profit from their work is an ancillary effect of this larger goal. Authors have historically provided the public with news, entertainment, instruction manuals, and practically all other forms of information, and copyright law was created to incentivize authors to continue providing it. Copyright law applied to AI should similarly serve the public interest by providing the public with the maximum amount of high-quality written information.

This public interest approach acknowledges that GenAI’s capabilities are fundamentally different from those of the earlier technologies upon which the law is based. Copyright law was created largely in response to the printing press[17]—a technology which enabled the mass distribution of an author’s written work. GenAI, by contrast, can write the work itself. While many consider ChatGPT’s outputs to be a cheap imitation of human writing, its ability to generate unique text is undoubtedly a function that was previously considered the sole domain of the human brain. While the law has thoroughly tackled the problem of a machine replacing an author’s pen, it has yet to tackle the problem of a machine replacing the author’s mind. Most existing copyright law, therefore, was written with very different types of technology in mind, and trying to analogize them to GenAI is a fruitless exercise. Instead, a return to the fundamental purpose of copyright—providing the public with valuable written work—is the most appropriate lens through which to interpret how the law applies to GenAI.

At first glance, it may seem that allowing GenAI models unlimited use of copyrighted material best serves the public interest because it enables the technology to continue advancing at the fastest possible rate. This seems like a favorable outcome considering GenAI’s massive capacity to disseminate written information to the public. The Supreme Court addressed the public interest principle in Google v. Oracle, holding that “we must take into account the public benefits the copying will likely produce.”[18] In Oracle, the Court reasoned that copying parts of another company’s computer code fell within the fair use doctrine because enforcing the copyright would stifle the ability of other companies to create new software.[19] Similarly, the creative potential of GenAI software may be stifled if its developers are required to pay licensing fees to access written content for use in training datasets. The result of these fees would be increased costs passed on to consumers, and possibly a decreased quality of GenAI’s outputs due to less available training data. For example, a user who prompted ChatGPT to “write a summary of the New York Times’ coverage of the Kennedy assassination” would have trouble if OpenAI had not paid the required licensing fees to access the New York Times’ content. The argument follows that enforcing authors’ copyrights against the use of their works in AI training datasets will deprive the public of this promising technological breakthrough achieving its full potential.

Copyright owners may counter that Oracle dealt with the copying of computer code rather than the copying of a creative work. The copying in Oracle was of an application programming interface (API), a small piece of code used within a much larger product.[20] This larger product competed in a different market and was in no way a replacement for the original.[21] By contrast, GenAI’s outputs could certainly serve as direct competitors to the original. For example, a user of ChatGPT could prompt the model to “write a novel in the style of John Grisham.” The model would then draw on the patterns it learned from Grisham’s copyrighted work to fulfill the request. Such a user could create hundreds of Grisham-style novels in a single day, each serving as a direct marketplace competitor to Grisham himself. Grisham may eventually find himself out of a job if, instead of paying retail price for his latest novel, all of his readers freely use GenAI to write books in his style for them. Therefore, the type of public interest that the Court sought to protect in Oracle does not apply to copyrighted works in an AI training dataset.

Furthermore, copyright owners may argue that giving developers unfettered access to copyrighted works will hamstring GenAI technology in the long term. GenAI models are soon set to exhaust the supply of available human-created writing on which to train.[22] Some estimates predict that they will have ingested every available text into their training data within the next four years.[23] Once that happens, GenAI models will have no more raw material in which to identify the patterns of human writing that improve their textual outputs, and the technology may stagnate. Many speculate that, cut off from a new supply of data, GenAI will have difficulty improving or innovating past its current capabilities.[24] Developers will rely on new human-created writing to continue supplying them with fresh data, but human writers may be disincentivized from producing it because GenAI has harmed their ability to profit from it. The result may be a dearth of new human-created writing which, in turn, chokes off the available data that GenAI can use. It follows that there would then be stagnation in both human- and computer-generated writing. This hardly serves the Framers’ goal to “promote the Progress of Science and useful Arts.”[25]

Some may argue that such copyright enforcement is unnecessary. After all, GenAI companies benefit from new human writing and will be financially incentivized to encourage more of it. On this view, GenAI developers will protect human writing through the private sector, and government enforcement of copyright will be unnecessary in the context of training datasets. What this argument fails to consider is that AI companies are competing in a cutthroat race with one another that creates a prisoner’s dilemma:[26] while it may be in all of their best interests to incentivize new human writing, any company that grabs as much free training data as possible will likely outcompete its rivals in the immediate term. These companies will, therefore, rely on copyright law to ensure that authors are protected enough to sustain a healthy stream of new human writing for their training datasets.

Conclusion

Requiring GenAI developers to pay licensing fees for the copyrighted material in their training datasets is a straightforward way to avoid the risk of stagnation in both AI and human writing. These licensing fees will ensure that humans continue to have a financial incentive to write. This will lead to more human-created content that benefits the public at large, and also to more training data that GenAI models can use to continue improving their algorithms.

This outweighs the downside of increased costs passed on to GenAI consumers, because such costs do not significantly infringe on the Framers’ stated goal to “promote the Progress of Science and useful Arts.”[27] While licensing fees may place artificial limits on the available training data, these can be overcome by savvy negotiating and increased capital. A widespread dearth of new human writing, however, represents an absolute limit on the training data that no amount of negotiating can remedy. These licensing fees will enable the public to reap maximum rewards from GenAI’s explosive creative output.


[1] Krystal Hu, ChatGPT Sets Record for Fastest Growing User Base – Analyst Note, Reuters (Feb. 2, 2023, 8:33 AM), https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.

[2] Id.

[3] Cade Metz & Erin Griffith, AI Start-Up Anthropic Is in Talks That Could Value It at $60 Billion, The New York Times (Jan. 7, 2025), https://www.nytimes.com/2025/01/07/technology/anthropic-ai-funding.html.

[4] Id.

[5] Id.

[6] Hessie Jones, Risk or Revolution: Will AI Replace Lawyers, Forbes (Mar. 20, 2025), https://www.forbes.com/sites/hessiejones/2025/03/20/risk-or-revolution-will-ai-replace-lawyers/.

[7] Josh Harlan, The AI Boom May Be Too Good to Be True, The Wall Street Journal (Dec. 26, 2024, 4:58 PM), https://www.wsj.com/opinion/the-ai-boom-may-be-too-good-to-be-true-copyright-ip-lawsuits-could-derail-econ-potential-fd514ea3.

[8] Sara Brown, Machine Learning, Explained, MIT Sloan (Apr. 21, 2021), https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained.

[9] Id.

[10] Harlan, supra note 7.

[11] Todd Spangler, George R.R. Martin Among 17 Top Authors Suing OpenAI, Alleging ChatGPT Steals Their Works: ‘We Are Here to Fight’, Variety (Sept. 21, 2023, 7:16 AM), https://variety.com/2023/digital/news/openai-chatgpt-lawsuit-george-rr-martin-john-grisham-1235730939/.

[12] See Carys J. Craig, The AI-Copyright Challenge: Tech-Neutrality, Authorship, and the Public Interest, in Research Handbook on Intellectual Property and Artificial Intelligence 134, 137–41 (Ryan Abbott ed., 2022).

[13] Id.

[14] Id.

[15] Id.

[16] Feist Publications v. Rural Telephone Service Company, 499 U.S. 340, 349 (1991) (quoting U.S. Const. art. I, § 8, cl. 8).

[17] See Craig, supra note 12, at 135.

[18] Google v. Oracle, 593 U.S. 1, 35 (2021).

[19] Id.

[20] Id. at 34.

[21] Id. at 37.

[22] Nicola Jones, The AI revolution is running out of data. What can researchers do?, Nature (Dec. 11, 2024), https://www.nature.com/articles/d41586-024-03990-2.

[23] Id.

[24] See Data Scarcity: When Will AI Hit a Wall?, Pieces (June 17, 2024), https://pieces.app/blog/data-scarcity-when-will-ai-hit-a-wall.

[25] U.S. Const. art. I, § 8, cl. 8.

[26] The Investopedia Team, What Is the Prisoner’s Dilemma and How Does It Work?, Investopedia (June 16, 2024) (a prisoner’s dilemma is a situation in which two decisionmakers are each incentivized to make choices that are harmful to the group as a whole).

[27] Google v. Oracle, 593 U.S. 1, 35 (2021).