Generative AI Development Disclosure

I. Our Generative AI Services

This Generative AI Development Disclosure describes the data Amazon.com and its affiliates (collectively, “Amazon” or “we”) use to develop or deploy our generative AI models and services (“generative AI services” or “services”). We develop AI to make customers' lives easier and more productive. Our generative AI services range from voice assistants and shopping and content recommendations to enterprise solutions and developer tools. Our generative AI services may be powered by foundation models, which include models built by Amazon as well as by third parties. We may use multiple models, and we may choose among them to optimize performance, match the model to the relevant task, and incorporate the latest capabilities.

II. Responsible Training Data Practices

For the generative AI services we make available to our customers, we train and test on a range of data intended to enhance our services’ capabilities. We may train and test on licensed and proprietary datasets, synthetic datasets, open-source datasets, and publicly available content (including web-crawled data). These datasets may include text, images, audio, video, code, and other types of data relevant to the service’s purpose. These datasets may contain public domain content, rights-protected material, and in some cases, personal information or aggregate consumer information. We train and select the models that power our generative AI services to help deliver more accurate, helpful, and relevant responses, and to help support the features and functionality of the service, such as by responding to natural language queries, recognizing visual content, generating relevant recommendations, or creating useful content.

We use various techniques to curate training data, which may include human and automated annotation, automated quality indicators, preference ranking, and other methods. We also implement multiple safeguards throughout our training data practices, including techniques to help limit the impact of any processing of personal information in connection with training generative AI services. For example, we may use processes like training data deduplication to remove repetitive data that could cause models to overweight certain patterns or reproduce specific content.
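
As an illustration only, and not a description of any actual Amazon pipeline, the general idea of training data deduplication can be sketched in a few lines of Python. The function names (normalize, deduplicate) and the normalization rules (lowercasing, collapsing whitespace) are assumptions chosen for this example. The sketch handles only exact duplicates after normalization; production systems often also use near-duplicate detection, which is not shown here.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal.

    These normalization rules are an assumption made for this sketch.
    """
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(records):
    """Return records with exact duplicates (after normalization) removed.

    The first occurrence of each record is kept and input order is preserved.
    """
    seen = set()
    unique = []
    for record in records:
        # Hash the normalized text so the set stores fixed-size digests
        # rather than full documents.
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


if __name__ == "__main__":
    sample = ["Hello  world", "hello world", "A different record"]
    print(deduplicate(sample))
```

In this sketch, the second record is dropped because it matches the first once case and spacing are normalized, which is the kind of repetition that could otherwise cause a model to overweight a pattern.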

The size of our training and testing data varies by model or service, and could range from thousands to trillions of data points. We have been collecting data since before 2022, with different models beginning development at different times. Data collection, training, and testing are ongoing processes as we continuously improve our services and incorporate new capabilities.

III. Evaluation for Quality

We test and evaluate our generative AI services to assess whether they meet our quality standards and perform as intended. We assess performance, accuracy, and reliability for our services’ intended uses. Our methodologies may include automated and human evaluation, benchmarking against established industry standards, simulating real-world usage patterns and edge cases, and evaluating outputs across various conditions and contexts. We test using data and modalities relevant to the goals and functionality of the generative AI service. We evaluate generative AI service performance through various methods, such as monitoring metrics, incorporating user feedback, and conducting periodic assessments as appropriate for the service.

For example, for Amazon Nova Premier, we conducted comprehensive safety evaluations, including expert red teaming across critical risk domains such as chemical, biological, radiological, and nuclear (CBRN) capabilities, offensive cyber operations, and automated AI research and development. We engaged both internal experts and independent third-party evaluators to identify potential risks before deployment.

As part of our quality evaluation process, we also implement appropriate safeguards for our generative AI services. These safeguards may include output filtering or safety controls designed to enable our generative AI services to provide trustworthy responses. Our approaches are tailored to each service’s purpose and capabilities.

IV. Learn More

We are committed to building AI responsibly, with appropriate safeguards for safety, accuracy, privacy, and security. For more information about our approach to responsible AI, see our Responsible AI at Amazon page. For more information about specific generative AI services and features, please see the applicable help pages. For more information about how Amazon collects and uses personal information, please see the Amazon Privacy Notice. We are also working to advance responsible AI and foster innovation that balances progress with responsibility. This includes ongoing investment in AI safety research, participation in industry standards development, and collaboration with industry partners, governments, academic institutions, and safety organizations to advance the field of responsible AI.