Transformer
Quick Answer
The transformer is a neural network architecture introduced in 2017 that uses a mechanism called self-attention to process sequences in parallel. It is the foundational architecture behind nearly all modern large language models and many leading vision and audio models.
In Depth
What "transformer" really means
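Self-attention allows a transformer to weigh the relevance of every token in the input when producing each token of the output. Because any two positions in the sequence are connected directly, rather than through a long chain of sequential steps, it captures long-range dependencies more effectively than older recurrent architectures.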
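To make the idea of "weighing every token" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification for illustration, not how any production model is implemented: the function names (`softmax`, `self_attention`) and the toy two-dimensional token vectors are assumptions, and the token vectors are used directly as queries, keys and values, whereas a real transformer first passes them through learned projection matrices and uses several attention heads.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for brevity: each token vector serves as its own query,
    key and value (no learned projections, single head).
    """
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Score this token against every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # The output is a weighted mix of all token vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence, and each row sums to 1. Every row is computed independently of the others, which is why the whole sequence can be processed in parallel.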
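Transformers also scale remarkably well: bigger models trained on more data with more compute have, so far, continued to produce better results, and because self-attention processes all tokens in parallel, that training makes efficient use of modern hardware. This scaling property is the engine behind the generative AI boom.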
Why It Matters
Business relevance for UK organisations
For most UK businesses, the practical implication of transformers is simply that capable language and vision models are now available as commodity services. The underlying architecture rarely needs to be understood in detail — but knowing what it enables shapes better procurement and strategy decisions.
Real-world example
How this shows up in practice
A Newcastle-based publisher uses a transformer-based model to summarise long-form articles into social-media snippets, cutting editorial time by 60%.
Related Terms
Continue exploring
Large Language Model (LLM)
A Large Language Model (LLM) is a type of neural network trained on vast quantities of text to understand and generate human language. LLMs power chatbots, copilots, content generators and many modern AI features across consumer and business software.
Basics: Deep Learning
Deep Learning is a branch of machine learning that uses multi-layered neural networks to learn highly complex patterns directly from raw data such as images, audio and text, without the need for hand-crafted feature engineering.
Basics: Neural Network
A neural network is a computational model loosely inspired by the human brain, consisting of interconnected layers of nodes (neurons) that transform inputs into outputs through weighted mathematical operations learned during training.
Technical: Embedding
An embedding is a numerical vector representation of text, images or other data that captures semantic meaning. Items with similar meaning produce similar vectors, which makes embeddings the backbone of semantic search, recommendations and RAG systems.