AI Glossary

Technical

Transformer

Quick Answer

The transformer is a neural network architecture, introduced in 2017 in the paper "Attention Is All You Need", that uses a mechanism called self-attention to process all positions of a sequence in parallel rather than one step at a time. It is the foundational architecture behind nearly all modern large language models and many leading vision and audio models.

In Depth

What Transformer really means

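Self-attention allows a transformer to weigh the relevance of every token in the input when building the representation of each token. Because any two tokens are compared directly, however far apart they sit, the model captures long-range dependencies more effectively than older recurrent architectures, which pass information along one step at a time.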
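To make the weighting idea concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not a production implementation: a real transformer applies learned query, key and value projections and runs many attention heads in parallel, whereas this sketch uses the token vectors directly for all three roles. The function names and the toy numbers are made up for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention, without learned projections."""
    scale = math.sqrt(len(keys[0]))  # scale scores by sqrt of the key dimension
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token in the sequence.
        scores = [dot(q, k) / scale for k in keys]
        weights = softmax(scores)
        # The output is a weighted mix of all value vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy 3-token sequence with 2-dimensional vectors (illustrative numbers only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each row of `weights` shows how much one token draws on every other token, and each row sums to 1. Every row is computed independently of the others, which is why the whole sequence can be processed in parallel.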

Transformers scale remarkably well: bigger models trained on more data with more compute have, so far, continued to produce better results. This scaling property is the engine behind the generative AI boom.

Why It Matters

Business relevance for UK organisations

For most UK businesses, the practical implication of transformers is simply that capable language and vision models are now available as commodity services. The underlying architecture rarely needs to be understood in detail, but knowing what it enables leads to better procurement and strategy decisions.

Real-world example

How this shows up in practice

A Newcastle-based publisher uses a transformer-based model to summarise long-form articles into social-media snippets, cutting editorial time by 60%.

Related Terms

Continue exploring

Technical

Large Language Model (LLM)

A Large Language Model (LLM) is a type of neural network trained on vast quantities of text to understand and generate human language. LLMs power chatbots, copilots, content generators and many modern AI features across consumer and business software.

Basics

Deep Learning

Deep Learning is a branch of machine learning that uses multi-layered neural networks to learn highly complex patterns directly from raw data such as images, audio and text, without the need for hand-crafted feature engineering.

Basics

Neural Network

A neural network is a computational model loosely inspired by the human brain, consisting of interconnected layers of nodes (neurons) that transform inputs into outputs through weighted mathematical operations learned during training.

Technical

Embedding

An embedding is a numerical vector representation of text, images or other data that captures semantic meaning. Items with similar meaning produce similar vectors, which makes embeddings the backbone of semantic search, recommendations and retrieval-augmented generation (RAG) systems.
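As a rough illustration of what "similar vectors" means, the sketch below compares toy embeddings using cosine similarity, a common way to measure how closely two vectors point in the same direction. The three vectors are hand-written for the example and do not come from a real embedding model, which would typically produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, invented for illustration.
embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "holiday": [0.0, 0.2, 0.9],
}

sim_close = cosine_similarity(embeddings["invoice"], embeddings["receipt"])
sim_far = cosine_similarity(embeddings["invoice"], embeddings["holiday"])

print(f"invoice vs receipt: {sim_close:.3f}")
print(f"invoice vs holiday: {sim_far:.3f}")
```

With these toy numbers, "invoice" scores far higher against "receipt" than against "holiday". A semantic search system relies on the same comparison, ranking stored items by their similarity to the embedding of the query.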