What Are Embeddings — and Why Does the AI Care?
I’d been nodding at this word for months. Then I actually sat down and figured it out.
There’s a specific kind of confusion where you read a word, feel like you understand it, and then realise you couldn’t explain it to anyone. “Embeddings” was that word for me. It shows up everywhere in AI writing, and every time I’d move past it like I got it. I didn’t.
I was going through a phase of actually trying to understand how LLMs work — not use them, but understand what’s happening underneath. So I sat with this one. Wrote it out. Drew the diagram below. And I think I’ve got it now. Here’s what I found.
First: what’s a token?
You need tokens first, because embeddings come after them. A token is just a piece of text the model reads as one unit — usually a word, sometimes half a word. When you type “I love pizza,” the model doesn’t receive a sentence. It receives three separate pieces: [I], [love], [pizza].
That’s all tokens are — the input pieces. At this stage, the model has no idea what any of them mean. It just knows what arrived.
Token: the basic unit of text the model reads. Think of it as the raw input, what the model receives before it starts making sense of anything.
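To make that concrete, here's a rough sketch of the splitting step. It's a toy, not how any real model does it: real tokenizers use learned subword rules (that's where the "half a word" pieces come from), while this one just splits on words and punctuation. The names toy_tokenize and toy_vocab are made up for illustration, and so are the ID numbers.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation pieces (a stand-in for a real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)

# A tiny made-up vocabulary: each known token gets an integer ID.
# Real vocabularies hold tens of thousands of entries.
toy_vocab = {"I": 0, "love": 1, "pizza": 2}

tokens = toy_tokenize("I love pizza")
token_ids = [toy_vocab[t] for t in tokens]

print(tokens)     # ['I', 'love', 'pizza']
print(token_ids)  # [0, 1, 2]
```

Notice there's still no meaning anywhere in this. The IDs are just labels for "which piece arrived."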
Now: what’s an embedding?
An embedding is what happens next. The model takes each token and converts it into a list of numbers — called a vector — that represents its meaning. Not its spelling. Its meaning, in relation to every other word the model has ever been trained on.
So “pizza” doesn’t stay as the word “pizza.” It becomes something like [0.71, −0.22, 0.45, …] — hundreds of numbers that together encode where “pizza” sits in the whole landscape of language the model knows.
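At its simplest, you can picture this step as a lookup table: token in, list of numbers out. Here's a minimal sketch under that assumption. The numbers are invented (only the first three for "pizza" echo the example above), the vectors are cut down to four dimensions so they fit on a line, and toy_embeddings and embed are hypothetical names. In a real model the numbers are learned during training, not typed in by hand.

```python
# Hypothetical 4-dimensional embedding table, one row per token.
# All values are invented for illustration.
toy_embeddings = {
    "I":     [0.10, 0.30, -0.20, 0.05],
    "love":  [0.62, -0.10, 0.33, 0.48],
    "pizza": [0.71, -0.22, 0.45, 0.09],
}

def embed(tokens):
    """Look up the vector for each token, in order."""
    return [toy_embeddings[t] for t in tokens]

vectors = embed(["I", "love", "pizza"])

for token, vector in zip(["I", "love", "pizza"], vectors):
    print(token, "->", vector)
```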
Why this matters more than it sounds
Here’s the thing about working purely with text: the word “cold” in “it’s cold outside” and the word “cold” in “she gave him the cold shoulder” are spelled identically. Going by spelling alone, the model would have to treat them as the same thing. With embeddings, both start from the same vector, but as the model’s layers mix in the surrounding words, the context shifts the numbers, and the model actually holds the difference.
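Here's one way to picture context shifting the numbers. Big caveat: real models do this with attention, which is a learned, weighted mix of the surrounding tokens. This sketch swaps that out for plain averaging, just to show the idea that the same starting vector can end up in two different places. The 2-dimensional vectors and the contextualize function are made up.

```python
# Made-up 2-dimensional starting vectors. "cold" has exactly one entry,
# so both sentences begin with the same numbers for it.
toy_static = {
    "cold":     [0.0, 1.0],
    "outside":  [1.0, 0.0],
    "shoulder": [-1.0, 0.0],
}

def contextualize(word, sentence):
    """Blend a word's starting vector with the mean vector of its sentence.

    Plain averaging is a stand-in for attention, not the real mechanism.
    """
    dims = len(toy_static[word])
    mean = [sum(toy_static[w][d] for w in sentence) / len(sentence)
            for d in range(dims)]
    return [(toy_static[word][d] + mean[d]) / 2 for d in range(dims)]

weather = contextualize("cold", ["cold", "outside"])
idiom = contextualize("cold", ["cold", "shoulder"])

print(weather)  # [0.25, 0.75]
print(idiom)    # [-0.25, 0.75]
```

Same word, same starting point, two different vectors once the neighbours are mixed in.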
It’s also why you can do things like: King − Man + Woman ≈ Queen. That maths only works if meaning has a spatial structure — if related words are literally closer to each other in number-space. Embeddings are that structure. They’re the layer where language goes from being a string of characters to carrying actual meaning the model can reason about.
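You can try the King/Queen arithmetic on a toy scale. In this sketch I've invented 2-dimensional vectors where the first number loosely means "royalty" and the second loosely means "gender". Real embeddings don't come with labelled dimensions like that, so treat it as a cartoon. The toy_vectors table and the choice of straight-line distance for "closest" are both my assumptions.

```python
import math

# Invented vectors: [royalty-ish, gender-ish]. Purely illustrative.
toy_vectors = {
    "king":  [0.9, 0.8],
    "man":   [0.1, 0.7],
    "woman": [0.1, -0.7],
    "queen": [0.9, -0.8],
    "pizza": [-0.8, 0.0],
}

# King - Man + Woman, computed one dimension at a time.
target = [k - m + w for k, m, w in zip(toy_vectors["king"],
                                       toy_vectors["man"],
                                       toy_vectors["woman"])]

# Find the closest remaining word, skipping the three words we started from.
candidates = [w for w in toy_vectors if w not in ("king", "man", "woman")]
nearest = min(candidates, key=lambda w: math.dist(toy_vectors[w], target))

print(nearest)  # queen
```

The result doesn't land exactly on "queen", it lands near it. That's the "≈" in the formula.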
Token
- A piece of text
- What the model receives as input
- No meaning attached yet
- Like a word written on a piece of paper
Embedding
- A list of numbers (a vector)
- Encodes the token’s meaning
- Similar words have similar vectors
- Like the same word placed in a room full of related ones
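"Similar words have similar vectors" is something you can measure. A common way is cosine similarity, which compares the direction two vectors point in: close to 1 means very alike, around 0 means unrelated, negative means pointing opposite ways. A small sketch, with invented 3-dimensional vectors for "pizza", "pasta" and "car":

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented vectors for illustration only.
pizza = [0.71, -0.22, 0.45]
pasta = [0.65, -0.30, 0.50]
car = [-0.40, 0.80, 0.10]

food_score = cosine_similarity(pizza, pasta)
mixed_score = cosine_similarity(pizza, car)

# With these made-up numbers, the two foods score far higher than pizza vs car.
print(round(food_score, 2), round(mixed_score, 2))
```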
The one line that made it stick
I wrote this in my notes and I keep coming back to it: “Without embeddings, the AI just sees letters. With embeddings, it knows how words relate.”
Everything a language model does — finishing your sentence, summarising an article, translating, answering questions — happens on top of this. Embeddings are not a feature. They’re the foundation. Which is probably why the word kept coming up everywhere and I could never quite shake the feeling I was missing something.
I was.
