Training Data
Quick Answer
Training data is the dataset used to teach a machine learning model the patterns it needs to perform its task. The quality, quantity, diversity and recency of training data directly determine how accurate and fair the resulting model will be.
In Depth
What Training Data really means
Training data usually consists of input examples and, for supervised learning, the correct output labels. Preparing training data typically involves collection, cleaning, de-duplication, labelling and splitting into training, validation and test sets.
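The preparation steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, assuming the data is a list of (text, label) pairs; the function name `prepare_splits`, the 80/10/10 split and the case-insensitive duplicate rule are assumptions made for the example, not a prescribed method.

```python
import random


def prepare_splits(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Clean, de-duplicate and split a list of (text, label) pairs.

    Hypothetical helper for illustration only.
    """
    seen = set()
    cleaned = []
    for text, label in examples:
        # Cleaning: collapse runs of whitespace and trim the ends.
        text = " ".join(text.split())
        # Drop rows with no text or no label.
        if not text or label is None:
            continue
        # De-duplication: treat texts that differ only by case as the same.
        key = text.lower()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((text, label))

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(cleaned)

    # Splitting: carve off test and validation sets, keep the rest for training.
    n = len(cleaned)
    n_test = int(n * test_fraction)
    n_val = int(n * val_fraction)
    test = cleaned[:n_test]
    val = cleaned[n_test:n_test + n_val]
    train = cleaned[n_test + n_val:]
    return train, val, test
```

De-duplicating before splitting matters: if the same example lands in both the training and test sets, test accuracy will look better than it really is.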
Poor-quality data is one of the most common causes of disappointing AI projects. Biased data produces biased models; stale data produces models that fail in production because the patterns they learned no longer match current conditions. Investing in data preparation often pays off more than chasing exotic algorithms.
Why It Matters
Business relevance for UK organisations
UK businesses must consider UK GDPR when using personal data for training. Lawful basis, purpose limitation, data minimisation and the right to erasure all shape what data can be used and how.
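Two of these principles, data minimisation and the right to erasure, have a direct mechanical counterpart when assembling a training set. The sketch below is illustrative only: the field names, the `subject_id` key and the function `minimise_and_erase` are hypothetical, and which fields are actually necessary or which requests must be honoured is a legal question the code does not answer.

```python
# Hypothetical allow-list: only the fields judged necessary for the task.
ALLOWED_FIELDS = {"subject_id", "job_title", "years_experience", "outcome"}


def minimise_and_erase(records, erased_subject_ids, allowed_fields=ALLOWED_FIELDS):
    """Return a copy of records suitable for building a training set.

    - Erasure: skip every record whose subject has asked to be erased.
    - Minimisation: keep only the allow-listed fields of the rest.
    """
    kept = []
    for record in records:
        if record.get("subject_id") in erased_subject_ids:
            continue
        kept.append({k: v for k, v in record.items() if k in allowed_fields})
    return kept
```

Filtering the dataset in this way affects only future training runs; a model already trained on the removed records is unchanged by it.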
Real-world example
How this shows up in practice
A Bristol HR-tech vendor discovered its CV-screening model underperformed for candidates from non-Russell Group universities because its training data over-represented a narrow set of employers.
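A problem like this usually surfaces only when results are broken down by group rather than reported as a single overall figure. The following sketch shows one simple way to do that, assuming each evaluation row is a (group, true label, predicted label) tuple; the function name `group_report` and the data layout are assumptions for illustration, not the vendor's actual method.

```python
from collections import Counter, defaultdict


def group_report(rows):
    """Summarise representation and accuracy per group.

    rows: iterable of (group, y_true, y_pred) tuples.
    Returns {group: {"share": fraction of rows, "accuracy": fraction correct}}.
    """
    counts = Counter()
    correct = defaultdict(int)
    for group, y_true, y_pred in rows:
        counts[group] += 1
        correct[group] += int(y_true == y_pred)

    total = sum(counts.values())
    return {
        group: {
            "share": counts[group] / total,
            "accuracy": correct[group] / counts[group],
        }
        for group in counts
    }
```

A group with a small share of the data and noticeably lower accuracy than the others is a prompt to look at how the training data was collected.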
Related Terms
Continue exploring
Supervised Learning
Supervised learning is a machine learning approach in which the model is trained on a dataset containing inputs paired with their correct outputs (labels). The model learns to map inputs to outputs, enabling it to predict labels for new, unseen examples.
Unsupervised Learning
Unsupervised learning is a machine learning approach where the model learns patterns and structure from unlabelled data. Rather than predicting a known target, it uncovers groupings, anomalies or compressed representations hidden in the data.
AI Ethics
AI ethics is the discipline of ensuring AI systems are designed, deployed and monitored in ways that respect fairness, transparency, privacy, human autonomy and wider societal impact. It goes beyond legal minimums to reflect organisational values.
AI Governance
AI governance is the set of policies, roles, controls and oversight mechanisms that ensure AI is used responsibly, safely and in line with law and organisational values. Effective governance is proportionate — tight where risk is high, light where risk is low.