AI in Software Agreements (Part 1)

wardclassen
5 days ago
6 min read

Artificial intelligence has been defined as: a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations or decisions influencing real or virtual environments. Artificial intelligence systems use machine and human-based inputs to-

(A) perceive real and virtual environments;

(B) abstract such perceptions into models through analysis in an automated manner; and

(C) use model inference to formulate options for information or action.

15 U.S.C. § 9401(3).

In short, artificial intelligence (“AI”) is computer software programmed to execute algorithms to make informed decisions, reach conclusions, predict future behavior and automate repetitive functions. Artificial intelligence encompasses many different technologies with different functions and applications including machine learning, generative AI and logical AI. Commercial businesses using AI in their internal operations, products and services can develop AI technology or license the underlying software. AI models may be (i) created internally by an entity using its own data, (ii) developed using open-source software, or (iii) licensed from a third party. One differentiating feature of AI products is their ability to source the results, i.e. the application’s ability to identify where the output originated.

Well-known AI applications include:

Company	Product
Anthropic	Claude
Google	Gemini
LexisNexis	AI Assistant
Microsoft	Microsoft 365 Copilot
OpenAI	ChatGPT; Bing Chat incorporated into Edge
Synthesis AI	Synthesis
Thomson Reuters	CoCounsel

A. Important Terminology

Knowing the terminology of artificial intelligence is important for using and understanding the operation of the technology. Set forth below are several of the principal terms involved in the use and licensing of artificial intelligence.

1. Chatbot

A computer program that responds like an intelligent entity when engaged through text or voice and that understands one or more human languages through natural language processing. Chatbots are designed to simulate human conversation via text or speech. Common examples of chatbots include ChatGPT and Microsoft 365 Copilot.

2. Deep Learning

A method in AI that teaches computers to process data in a way that mimics the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. Deep learning is powered by many layers of ‘neural networks’ which are algorithms trained to simulate the human brain and make predictions. An example of a deep learning system is a system that can identify an image and create a coherent caption with proper sentence structure for that image.

3. Generative AI (GenAI)

A type of artificial intelligence system that can create original content when requested by a user. Generative AI models learn the patterns and structure of their input training data and then create new output with similar characteristics. Generative AI relies on deep learning models. Generative AI has many applications in the practice of law, including contract drafting, intellectual property management, drafting and analyzing briefs, and e-discovery. ChatGPT is an example of Generative AI. A model contractual definition follows: "Generative AI Technologies" means artificial intelligence technology [developed by the Vendor] that can generate high-quality text, images, and other content based on the data on which it was trained. In the context of Vendor's use of the technology, this specifically refers to large language AI models (for example, GPT) used to comprehend and produce text-based content.

4. Input

A question, query, prompt, request or other information, content or material submitted to the AI application to generate an Output.

5. Large Language Models (“LLMs”)

A type of generative AI that performs text generation and other natural language processing tasks. LLMs learn statistical relationships from text documents during their “training” by repeatedly taking an input text and predicting the word that follows.
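The idea of predicting the following word can be illustrated with a deliberately tiny sketch. It is not how a commercial LLM is built; it only counts which word most often follows another in a handful of sentences. The training sentences and the function names `train_bigram_model` and `predict_next_word` are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training sentences, invented for illustration only.
corpus = [
    "the vendor shall indemnify the customer",
    "the vendor shall maintain the software",
    "the customer shall pay the vendor",
]
model = train_bigram_model(corpus)

print(predict_next_word(model, "the"))      # most common word after "the"
print(predict_next_word(model, "license"))  # a word never seen in training
```

A real LLM replaces the simple counts with a neural network trained on vastly more text, but the sketch shows why the content of the training documents shapes what the model produces.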

6. Machine Learning

A subset of AI whereby a machine learns from experience, “improving” each time it completes a task such as analyzing data. Machine learning predicts outcomes based on data drawn from many different sources. It gives computers the ability to learn without being explicitly programmed. Machine learning systems are based on algorithms that learn and have the capacity to make decisions by finding patterns in complex data. Large language models are an example of a machine learning model.

By learning from data, machine learning enables computers to improve through experience, so that predictions and decisions become more accurate over time. For example, a model can use aggregated historical maintenance data to predict when a particular aircraft engine will need to be removed from service for maintenance.
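The aircraft engine example can be sketched, under heavy simplification, as fitting a straight line to past maintenance records and using it to estimate when a limit will be reached. The data points, the `WEAR_LIMIT` threshold and the function name `fit_line` are all hypothetical; a production system would use far richer data and models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical maintenance history: flight hours vs. measured wear (percent).
flight_hours = [1000, 2000, 3000, 4000]
wear_percent = [10.0, 20.0, 30.0, 40.0]

slope, intercept = fit_line(flight_hours, wear_percent)

WEAR_LIMIT = 80.0  # hypothetical wear level at which the engine is serviced
hours_at_limit = (WEAR_LIMIT - intercept) / slope
print(f"Estimated flight hours at service limit: {hours_at_limit:.0f}")
```

The point for contracting purposes is that the prediction is only as good as the historical data supplied, which is why the source and ownership of that data matter.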

7. Natural Language Processing (“NLP”)

A type of machine learning in which the application has the ability to understand written and spoken words. NLP applications such as ChatGPT can answer questions posed by a user. NLP gives computers the ability to interpret, manipulate, and comprehend human language, permitting computers and smart devices to recognize and analyze language in text or spoken form. NLP is commonly used in chatbots, translation apps, and voice-operated apps such as digital assistants.

8. Output

The data, text, content, sound, video, software code, images, material, information, communications, and other outcomes, actions or results generated from use of an artificial intelligence application in response to an input.

B. Use of AI Applications

The use of AI applications involves three distinct data sets/intellectual property considerations:

1. Training Data Set and Business Rules

Training data is used to teach AI models/machine learning algorithms to reach correct outcomes. Models are considered to be organic and designed to learn and improve over time. All use cases are different, requiring the application to be “trained” for the specific use case such as the aerospace industry. For example, if a user is trying to build a model for a self-driving car, the training data should include images and videos labeled to identify cars as opposed to street signs and people.

The training data may come from the customer’s own data or from data created by the vendor. If the customer’s data is being used to train the model, the customer must give the vendor access to its data to allow the vendor to train the models. While the parties’ agreement will likely provide that the customer’s data will remain confidential, the question remains as to what extent the vendor can use the customer’s data to create better models and improve the vendor’s products, both the product used by the customer and other unrelated products.

If a customer does not want to use a pre-trained model (i.e., the “P” in ChatGPT), the vendor will likely require access to the customer’s data, which may include personal data. If an enterprise customer wants to adapt or fine-tune a vendor foundation model to fit its use case, then the customer can decide what training data can be used to train an adaptive layer that sits on top of, and separately from, the foundation model, so that the full application meets its particular needs.

The customer should carefully consider what and how much training data it provides to the vendor for training purposes, as the training data may contain Personally Identifiable Information (“PII”) depending on the use case. To that end, prudent customers should clearly understand how their personal data will be used and whether the vendor uses personal data as part of its training data. If so:

• Does the vendor de-identify or anonymize personal data before using it as training data?

• What steps does the vendor take to prevent re-identification of personal data used as input/training datasets?

• What consents does the vendor obtain from data subjects to allow the processing of personal data in the training data used by the AI model?
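To make the first two questions more concrete, the following is a minimal sketch of what de-identifying a record before training might look like. It is an assumption-laden illustration, not a description of any vendor's process: the record layout, the patterns, the salt handling and the function names `pseudonymize` and `deidentify_record` are all hypothetical. Replacing a name with a salted hash is pseudonymization rather than true anonymization, so re-identification risk may remain.

```python
import hashlib
import re

# Simple illustrative patterns; real PII detection needs far broader coverage.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def pseudonymize(value, salt):
    """Replace a direct identifier with a token derived from a salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "ID_" + digest[:12]


def deidentify_record(record, salt):
    """Return a copy of the record with the name tokenized and contacts masked."""
    cleaned = dict(record)
    cleaned["name"] = pseudonymize(record["name"], salt)
    notes = EMAIL_PATTERN.sub("[EMAIL]", record["notes"])
    cleaned["notes"] = PHONE_PATTERN.sub("[PHONE]", notes)
    return cleaned


# Hypothetical customer record, invented for illustration only.
record = {
    "name": "Jane Doe",
    "notes": "Call 555-010-0199 or email jane@example.com about renewal.",
}
print(deidentify_record(record, salt="example-salt"))
```

A customer reviewing a vendor's answers might ask which of these steps, if any, the vendor actually performs and who controls the salt or key that would allow the tokens to be linked back to individuals.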

2. Prompts and Input Data

Prompts are the queries or questions posed to the AI application to obtain a desired output, such as “Create a recipe for blueberry pancakes.” Input data is the data a user enters into the application for the application to draw upon in creating the output. Entering confidential data into the application may inadvertently allow such data to become public if the vendor uses the data to train or improve its model, creating the risk that the data will be used to improve unrelated third-party products. Conversely, some vendors may restrict the types of input prompts and the use of output, especially when the application is used in regulated or dangerous industries such as nuclear energy and medicine. Ownership of the prompts should be addressed by the parties, as the vendor may want the right to use the prompts for testing and maintenance purposes.

3. Output

Output is the result generated by the AI application in response to the user’s prompts. In creating the output, the application will use the training data set and the business rules to reach its result. While, in most cases, the output remains the property of the customer, some vendors seek ownership of and use rights to the output, arguing that the output is a derivative work of the vendor’s AI application. Thus, the underlying agreement between the parties should specifically address the vendor’s rights to use the output.

Outputs are sensitive to how prompts are phrased. Small variations in the wording of a prompt may result in a different output. Further, submitting the same prompt twice may also result in different outputs.
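One common reason the same prompt can yield different outputs is that many generative models choose each next word by weighted random sampling rather than always taking the single most likely word. The sketch below illustrates that idea only; the candidate words, their scores and the function name `sample_next_word` are hypothetical, and actual products may behave differently.

```python
import math
import random


def sample_next_word(word_scores, temperature, rng):
    """Pick one word at random, weighted by a softmax over the scores."""
    words = list(word_scores)
    scaled = [word_scores[w] / temperature for w in words]
    highest = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - highest) for s in scaled]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical scores a model might assign to candidate next words.
scores = {"pancakes": 2.0, "muffins": 1.5, "waffles": 1.0}

rng = random.Random()  # unseeded, so repeated runs can differ
outputs = [sample_next_word(scores, temperature=1.0, rng=rng) for _ in range(5)]
print(outputs)
```

Lowering the `temperature` value makes the highest-scoring word dominate, which is why some applications expose a similar setting to make output more repeatable.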

The parties should recognize that the output may include open-source software, subjecting users to the terms of the underlying open-source license including those terms requiring the output to be made publicly available.
