Training Data

Quick Answer

Training data is the dataset used to teach a machine learning model the patterns it needs to perform its task. Its quality, quantity, diversity and recency strongly influence how accurate and fair the resulting model will be.

In Depth

What Training Data really means

Training data usually consists of input examples and, for supervised learning, the correct output labels. Preparing training data typically involves collection, cleaning, de-duplication, labelling and splitting into training, validation and test sets.
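The cleaning, de-duplication and splitting steps can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the function name prepare_splits, the (text, label) record format, the case-insensitive duplicate rule and the 10% validation and test fractions are all assumptions chosen for the example.

```python
import random


def prepare_splits(records, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Clean and de-duplicate (text, label) records, then split them.

    Hypothetical helper for illustration; returns (train, validation, test).
    """
    # Cleaning: trim whitespace, drop empty texts and unlabelled records.
    cleaned = [
        (text.strip(), label)
        for text, label in records
        if text and text.strip() and label is not None
    ]

    # De-duplication: keep the first occurrence of each text,
    # comparing case-insensitively (an assumption for this sketch).
    seen = set()
    unique = []
    for text, label in cleaned:
        key = text.casefold()
        if key not in seen:
            seen.add(key)
            unique.append((text, label))

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Splitting: carve off test and validation sets, train on the rest.
    n_test = int(len(unique) * test_fraction)
    n_val = int(len(unique) * val_fraction)
    test = unique[:n_test]
    validation = unique[n_test:n_test + n_val]
    train = unique[n_test + n_val:]
    return train, validation, test
```

De-duplicating before splitting matters: if the same example lands in both the training and test sets, test accuracy will look better than it really is.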

Poor-quality data is one of the most common causes of disappointing AI projects. Biased data produces biased models; stale data produces models that fail in production because the patterns they learned no longer match current conditions. Investing in data preparation often pays off more than chasing exotic algorithms.

Why It Matters

Business relevance for UK organisations

UK businesses must comply with UK GDPR whenever training data includes personal data. Lawful basis, purpose limitation, data minimisation and the right to erasure all shape what data can be used and how.
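As a rough sketch of how two of these principles might surface in a data pipeline, the Python below keeps only an approved list of fields (data minimisation) and drops records belonging to people who have asked for erasure. The function name, the subject_id field and the dictionary record format are assumptions made for illustration. It covers the dataset only and says nothing about models already trained on the removed records.

```python
def minimise_and_filter(records, allowed_fields, erased_ids, id_field="subject_id"):
    """Return a reduced copy of the records suitable for training.

    Hypothetical helper: `records` is a list of dicts, `allowed_fields`
    the fields the stated purpose actually needs, and `erased_ids` the
    identifiers of people whose data must no longer be used.
    """
    allowed = set(allowed_fields)
    erased = set(erased_ids)
    kept = []
    for record in records:
        # Right to erasure: skip anyone on the erasure list.
        if record.get(id_field) in erased:
            continue
        # Data minimisation: copy across only the approved fields.
        kept.append({k: v for k, v in record.items() if k in allowed})
    return kept
```

Leaving the identifier out of allowed_fields means it is used for filtering but never reaches the training set.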

Real-world example

How this shows up in practice

A Bristol HR-tech vendor discovered its CV-screening model underperformed for candidates from non-Russell Group universities because its training data over-represented a narrow set of employers.
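A problem like this is usually found by breaking results down by group rather than looking at a single overall score. The Python sketch below is a hypothetical illustration, not the vendor's method: the function name group_report and the (group, true label, predicted label) row format are assumptions. For each group it reports the share of the data it makes up and the model's accuracy on it.

```python
from collections import defaultdict


def group_report(rows):
    """Summarise representation and accuracy per group.

    `rows` is a sequence of (group, true_label, predicted_label) tuples.
    Returns {group: {"share": ..., "accuracy": ...}}.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, truth, predicted in rows:
        totals[group] += 1
        if truth == predicted:
            correct[group] += 1

    n = sum(totals.values())
    return {
        group: {
            "share": totals[group] / n,                  # fraction of all rows
            "accuracy": correct[group] / totals[group],  # fraction predicted correctly
        }
        for group in totals
    }
```

A group with a small share and noticeably lower accuracy than the others is a prompt to look at how that group is represented in the training data.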

Related Terms

Continue exploring

Basics

Supervised Learning

Supervised learning is a machine learning approach in which the model is trained on a dataset containing inputs paired with their correct outputs (labels). The model learns to map inputs to outputs, enabling it to predict labels for new, unseen examples.

Basics

Unsupervised Learning

Unsupervised learning is a machine learning approach where the model learns patterns and structure from unlabelled data. Rather than predicting a known target, it uncovers groupings, anomalies or compressed representations hidden in the data.

Business

AI Ethics

AI ethics is the discipline of ensuring AI systems are designed, deployed and monitored in ways that respect fairness, transparency, privacy, human autonomy and wider societal impact. It goes beyond legal minimums to reflect organisational values.

Business

AI Governance

AI governance is the set of policies, roles, controls and oversight mechanisms that ensure AI is used responsibly, safely and in line with law and organisational values. Effective governance is proportionate — tight where risk is high, light where risk is low.