The Collision of AI and Copyrights
As artificial intelligence (AI) is used in a growing range of business and consumer applications, it raises a host of legal questions. Many of these questions concern how liability should be allocated when an AI’s “judgment” is relied upon to perform a task that traditionally involved only human judgment. Such liability considerations are important to define and understand, particularly in high-risk implementations such as healthcare applications and the operation of large equipment (e.g., autonomous vehicles).
While the effects of replacing human judgment with AI judgment are important to consider, at least some AI implementations raise other, more fundamental legal issues. Many AI solutions are built through a training or learning process that produces a model, which serves as the core of the solution. Solutions that use such a model to produce new content are referred to as generative AI solutions.
Generative AI solutions often draw on information from the internet and other available data sources as training data for the model. In some instances, the training process may run continuously and largely unsupervised. Ultimately, the generative AI solution may use the model to generate new content, such as images, video, text, articles, poems, stories, compositions, sound recordings, and computer code.
Because generative AI can create new content quickly and with minimal human contribution or talent, it has been rapidly and widely adopted. A user simply enters a natural-language request, and the generative AI solution quickly generates content that traditionally required substantial human talent and time to create.
Many casual users may not appreciate that the AI output is still based on the data sources used for training, which are unlikely to be owned by either the AI solution provider or the end user. In other words, the AI solution generates new content from a model built on training data that is likely owned by third parties and may not have been licensed. Since copyrights often protect at least some of this data, a question arises as to whether such use of the data by the AI solution constitutes copyright infringement. The AI output is not developed in a vacuum; it is an algorithmic output that is likely based on copyrighted data owned by, for example, third-party artists and authors.
The legal framework for copyrights protects the original works of authors and artists by offering a cause of action for copyright infringement. Infringement generally involves the unauthorized copying of a substantial portion of a copyrighted work. The copying need not result in an identical copy, but in most instances, a substantial portion of the copyrighted work must appear in the infringing work.
With respect to generative AI, the output may not include a substantial portion of any one copyrighted work, and for that reason, claims of copyright infringement due to the use of copyrighted works as training data may be difficult to establish. In other words, the AI output may not include enough of a copyrighted work to give rise to an infringement based solely on the AI output.
However, some have argued that training the AI model necessarily involves some copying of the copyrighted training data, and that those instances of copying could therefore constitute copyright infringement. Unfortunately for copyright owners, without access to, for example, the data sets and the computer code that were used, it is often difficult to determine whether the training data includes a particular individual’s copyrighted work and how the training process incorporates such works.
Adding to the complexity, a common defense to copyright infringement is fair use. The legal doctrine of fair use permits the unlicensed use of copyrighted works in certain circumstances, provided the alleged infringer can establish that the use qualifies. Factors considered in the fair use analysis include the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion copied, and the effect of the use on the market for the copyrighted work.
In general, the infringement analysis for the generative AI output differs from the infringement analysis for the model training process. With respect to the output, infringement is more readily established if the output is quite similar to a copyrighted work. However, the AI output often lacks a substantial portion of any one copyrighted work, which makes the case for infringement based on the output more difficult.
With respect to the copying that may occur during the AI model training process, copying of an entire copyrighted work may be more readily established if details of the training process are known. Even if such copying during training can be established, arguments can still be made that it is protected under fair use. For example, generative AI models themselves have a very different purpose and character from the underlying copyrighted works, which could be sufficient to establish a fair use defense for generative AI.
The uncertainty at the intersection of generative AI and copyrights has been, and continues to be, an unresolved issue. Parties are currently in the midst of litigation that may yield a framework for addressing it. Original content creators, including artists and authors, believe they are receiving no compensation for their works, while those same works are being leveraged and monetized for commercial gain by AI solution providers.
While there are active efforts to develop law that addresses many of the legal issues raised by AI, it remains to be seen whether courts and lawmakers will find ways to address these issues under new and existing legal frameworks. It is, however, clear that AI solutions are beneficial for handling a vast number of tasks and problems.
Accordingly, courts and lawmakers will need to find ways, in the very near future, to embrace and promote this technology while also benefiting the contributors who intentionally or unintentionally provide content for model training.