Token
Quick Answer
A token is the basic unit of text that a language model processes. Tokens are usually subword chunks — roughly four characters or three-quarters of a word in English — and both the size of the model's context window and its pricing are typically measured in tokens.
In Depth
What a token really means
Tokenisation splits input text into a sequence of tokens using a fixed vocabulary that is learned from text data, typically before the model itself is trained. Because each model family uses its own vocabulary, the same sentence may produce different numbers of tokens in different models, which matters for cost and latency.
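To make the idea concrete, here is a minimal sketch of subword splitting. It is a toy greedy longest-match splitter, not the algorithm of any particular model, and both vocabularies (VOCAB_A and VOCAB_B) are invented for illustration. It shows only that the same sentence can yield different token counts under different vocabularies.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split of text into pieces found in vocab.

    A character not covered by the vocabulary becomes its own token,
    a simplified stand-in for the fallback that real tokenisers provide.
    """
    longest = max((len(piece) for piece in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Two hypothetical vocabularies, standing in for two different models.
VOCAB_A = {"token", "isation", " splits", " text"}
VOCAB_B = {"tok", "en", "is", "ation", " sp", "lits", " te", "xt"}

sentence = "tokenisation splits text"
tokens_a = tokenize(sentence, VOCAB_A)
tokens_b = tokenize(sentence, VOCAB_B)

print(len(tokens_a), tokens_a)  # 4 tokens with the coarser vocabulary
print(len(tokens_b), tokens_b)  # 8 tokens with the finer vocabulary
```

For real token counts, use the tokeniser published for the specific model rather than a sketch like this one.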
Understanding tokens is essential for building efficient prompts, estimating API bills, and staying within context-window limits. Long prompts, verbose system messages and uncompressed retrieved context all drive up token usage.
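A rough budget check can be sketched with the four-characters-per-token rule of thumb mentioned above. The function names, the context-window size and the output reserve below are assumptions chosen for illustration; the estimate is only an approximation for English text and is no substitute for a model's own tokeniser.

```python
import math

CHARS_PER_TOKEN = 4  # rough English average; real counts vary by tokeniser


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(parts: list[str], context_window: int, reserved_for_output: int) -> bool:
    """Check whether all prompt parts plus an output reserve fit the window."""
    used = sum(estimate_tokens(part) for part in parts)
    return used + reserved_for_output <= context_window


# Hypothetical prompt parts: system message, retrieved context, user question.
system_message = "You are a concise assistant for a UK accounting product."
retrieved_context = "Invoice policy text. " * 200
user_question = "When is VAT due on this invoice?"

parts = [system_message, retrieved_context, user_question]
total = sum(estimate_tokens(part) for part in parts)
print(f"Estimated prompt tokens: {total}")
print("Fits:", fits_context(parts, context_window=8000, reserved_for_output=1000))
```

Reserving room for the output matters because the model's reply counts against the same context window as the prompt.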
Why It Matters
Business relevance for UK organisations
UK finance teams should track token consumption as a first-class cost metric for any LLM-based product. Small prompt optimisations — shorter system messages, compact formats, caching — can deliver large savings at scale.
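The effect of a shorter system message on a monthly bill can be estimated with simple arithmetic. The sketch below uses made-up prices and traffic figures, not any provider's real tariff, and it ignores caching discounts, which vary by provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in GBP per one million tokens."""
    input_per_million: float
    output_per_million: float


def monthly_cost(requests: int, input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimated monthly spend in GBP for a fixed per-request token profile."""
    total_input = requests * input_tokens
    total_output = requests * output_tokens
    return (
        total_input * pricing.input_per_million
        + total_output * pricing.output_per_million
    ) / 1_000_000


# Assumed figures for illustration only.
pricing = Pricing(input_per_million=2.0, output_per_million=8.0)
requests_per_month = 1_000_000

before = monthly_cost(requests_per_month, input_tokens=1200, output_tokens=300, pricing=pricing)
after = monthly_cost(requests_per_month, input_tokens=700, output_tokens=300, pricing=pricing)

print(f"Before: £{before:,.2f}")
print(f"After:  £{after:,.2f}")
print(f"Saving: £{before - after:,.2f} per month")
```

Under these assumptions, trimming 500 input tokens from every request saves £1,000 a month. The same calculation with real prices and measured token counts gives a finance team a usable forecast.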
Real-world example
How this shows up in practice
A Glasgow SaaS startup cut its monthly LLM bill from £9,400 to £3,200, a reduction of about 66%, by compressing system prompts, caching common contexts and moving simple routing to a cheaper model.
Related Terms
Continue exploring
Large Language Model (LLM)
A Large Language Model (LLM) is a type of neural network trained on vast quantities of text to understand and generate human language. LLMs power chatbots, copilots, content generators and many modern AI features across consumer and business software.
Basics
Inference
Inference is the phase in which a trained model is used to produce predictions or outputs on new data. While training happens once (or periodically), inference happens every time a user interacts with an AI system, making it the dominant cost in most production deployments.
Technical
Prompt Engineering
Prompt engineering is the practice of designing the text instructions given to a language model to produce reliable, accurate and appropriate outputs. Good prompts unlock significantly better performance without any change to the underlying model.
Basics
Model
A model is the trained output of a machine learning process — a collection of learned parameters that, combined with an algorithm, can turn new inputs into predictions or generated content without being explicitly programmed for each case.