Chunking: the day I was wrong about GEO-friendly content

By Léa Pétralie, Head of Black Pepper, CyberCité’s Content Marketing hub.

They say a brain is either “literary” or “mathematical”. But to be very good at SEO, you have to sit right between the two, a perfect hybrid. The trouble is, I have a literary mind. 8×7 still traumatizes me. So I practiced Content Marketing with one conviction: to perform on the web, semantics comes first. In semantics, it is not the number of occurrences that counts but what lies behind them: the meaning. The meaning of words, signs, language. Meaning in the service of intent. And that speaks to me.

When GEO (Generative Engine Optimization) crept into the conversation, then imposed itself on our professions, we were told about chunking, embedding vectors and overlaps. Suddenly, agencies were charging their clients twice as much for “GEO-friendly” content. I rejected the abusive sales practices, but also and above all a concept I did not believe in.

But I was wrong.

For your brand to appear in conversational search engines, you now have to think in terms of “chunking strategy”. A writer explains why.

Chunking, the art of breaking up content.

Ah, Marketing and its anglicisms! It is true that, on paper, “chunk” sounds much better than “piece”. More fun, more expert, more mysterious. Yet chunking, in Marketing, simply means dividing content into autonomous blocks. “They are about to invent the paragraph.” That is what I told myself for a while, eternal skeptic that I am.

But at CyberCité, you can’t stay wrong for long. That is the advantage of working every day with 130 Search experts and creative prompt engineers. I observed, discussed, tested, listened to our conferences (which you should do too) and the unthinkable happened: I started to change my mind.

What if there was meaning behind content chunking?

Chunking therefore consists of breaking up content into fragments, the famous “chunks”: self-sufficient segments that can be perfectly understood even when isolated from the rest of the text. And this is where it becomes really interesting for the performance of your content.

Making it easier for LLMs to retrieve your content.

In reality, nobody waited for generative AI to talk about “chunking”. The term was already used in cognitive psychology to describe how the human brain copes with information overload. This mental simplification strategy consists of grouping several pieces of information into a single, mentally manageable unit (encoding), in order to make memorization easier.

If you are told “1939-1945”, your brain packs the entire history of the Second World War behind these two dates. This mental shortcut is a chunk.

LLM-based assistants such as ChatGPT, Gemini & co. work in a similar way when they pull information from the web. When responding to a query, most of them do not retrieve an entire page but the segments that are semantically closest to the user’s prompt.

LLMs vectorize the text (their own form of encoding): they transform words into series of numbers (“embeddings”) in order to store units of information more easily and to calculate the semantic and contextual proximity between them.
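To make the retrieval step concrete, here is a minimal sketch of "find the segment closest to the prompt". It assumes a toy embedding (a simple word-count vector) in place of a real embedding model, and the names `embed`, `cosine` and `top_chunks` are hypothetical helpers invented for this illustration, not the API of any assistant.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if one is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "Chunking divides content into autonomous blocks that stand on their own.",
    "A topic cluster links a hub page to several detailed pages.",
    "An embedding turns words into a series of numbers.",
]
print(top_chunks("what is an embedding", chunks))
```

Only the third segment shares vocabulary with the question, so it is the one returned; the rest of the "page" is never retrieved. A real system would compare meaning rather than shared words, but the ranking logic is the same.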

I told you: to excel at SEO, you have to be as good at maths as at literature. But we’ll come back to that later.

What is the ideal size of a good chunk?

“Don’t talk to me about word counts.” I have always refused to assign a set size to content; its added value is not measured by the number of characters. Content should contain the number of words that its subject deserves.

But with chunking, size matters.

To avoid information overload and limit the number of context analyses, LLMs favor short, structured segments. Some of them use sliding window attention (SWA): they analyze the context of one group of words only before moving on to the next, in order to save time and gain efficiency. So it is not the length of your content that should matter to you today, but the length of the chunks that make it up.
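The sliding-window idea can be pictured as a mask that says which words each word is allowed to "look at". The sketch below is a simplified illustration, assuming a causal window in which a token sees itself and the few tokens just before it; `sliding_window_mask` is a hypothetical helper, not a function from any model's code.

```python
def sliding_window_mask(n_tokens: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Causal variant: each token sees itself and the (window - 1) tokens before it.
    """
    return [
        [i - window < j <= i for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


mask = sliding_window_mask(n_tokens=6, window=3)
for row in mask:
    # "x" marks a token that is inside the window, "." one that is ignored
    print("".join("x" if allowed else "." for allowed in row))
```

Each row is one token. With a window of 3, the fourth token no longer sees the first one: the analysis stays local, which is why a self-sufficient segment is easier to process than an idea spread over a whole page.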

The recommended size for an effective chunk is between 150 and 300 words (or between 200 and 400 tokens, for the AI natives).
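As a rough illustration of that size rule, the sketch below groups consecutive paragraphs into blocks of at most 300 words. The function name `pack_paragraphs` and the greedy grouping are assumptions made for this example; it is one simple way to check a draft against the range, not a description of how any LLM cuts a page.

```python
def pack_paragraphs(paragraphs: list[str], max_words: int = 300) -> list[str]:
    """Greedily group consecutive paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole rather than cut mid-sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        n = len(para.split())
        # Close the current chunk if adding this paragraph would exceed the limit
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Four dummy paragraphs of 120, 100, 90 and 200 words
paragraphs = [" ".join(["word"] * n) for n in (120, 100, 90, 200)]
for chunk in pack_paragraphs(paragraphs):
    print(len(chunk.split()), "words")
```

Here the four paragraphs end up in two blocks of 220 and 290 words, both inside the recommended range. Cutting only at paragraph boundaries keeps each block readable on its own.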

How do you adapt your Content Marketing strategy to chunking?

There are several ways to run your chunking strategy, and they are not mutually exclusive. Here are two that are simple to implement today to maximize the citability potential of your content, a potential that CyberCité helps you monitor.

  • Size-based chunking: this strategy consists of dividing the content into several units of controlled size within the same page. FAQ, bulleted list, “Key takeaways” paragraph or TL;DR (Too long; didn’t read), short definition: the template of your page is GEO-compliant, and the content is structured to highlight the different segments so that LLMs can retrieve them easily.
  • Page-level chunking: this time the strategy operates at the level of your website. A page can be considered as a single chunk. A Glossary hub page can, for example, link to one page per definition. This technique is an extension of the topic cluster, the semantic cocoon that works so well today in SEO.

Semantic optimization: does cutting mean impoverishing?

The annoying question has to be asked. If you break up existing content to make it GEO-ready, don’t you lose the semantic relevance needed to rank in traditional search engines?

The answer is no, when the job is done well.

What must be understood is that Transformer-based LLMs (the famous T in GPT) do have a semantic understanding of the text. By transforming words into numerical vectors, they are able to analyze the relationships between them, and therefore the meaning of a sentence and of a paragraph.

Consider the following vectorization example:

  • “cat”: (0.2, -0.5, 0.8, 0.1, …)
  • “dog”: (0.3, -0.4, 0.7, 0.2, …)
  • The semantic distance between these two words is small: they belong to the same universe.
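The "small distance" between the two words can be checked with a few lines of arithmetic. This sketch uses cosine similarity, a common way to compare embeddings, and assumes only the four components shown above; the components hidden behind the "…" are ignored, so the figure is illustrative only.

```python
import math

# Only the four components shown in the example; the elided ones ("…") are ignored.
cat = (0.2, -0.5, 0.8, 0.1)
dog = (0.3, -0.4, 0.7, 0.2)


def cosine_similarity(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine of the angle between two vectors: 1.0 means 'same direction'."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


similarity = cosine_similarity(cat, dog)
print(round(similarity, 2))
```

The result is about 0.98, very close to the maximum of 1: on these four components, “cat” and “dog” point in almost the same direction.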

Optimizing your content with chunking in mind does not mean deleting; it means reshaping. The way you divide your paragraphs is at the heart of the matter: a line break is no longer driven only by your common sense or by your writer’s literary sensitivity, but by algorithmic logic. Semantic chunking relies on thematic breaks: a change of idea is detected through the sliding window or the semantic overlap of tokens, as perceived by LLMs.

You do not impoverish your content; you enrich its semantic hierarchy.
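A thematic break can be approximated in a few lines. The sketch below is a simplified illustration under stated assumptions: sentences are compared through shared words rather than real embeddings, and the threshold of 0.1, the small stopword list and the names `vectorize` and `semantic_chunks` are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Small, arbitrary stopword list used only for this illustration
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "are", "in", "it", "into", "on"}


def vectorize(sentence: str) -> Counter:
    """Toy sentence vector: counts of the words that are not stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if one is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.1) -> list[str]:
    """Start a new chunk whenever two neighboring sentences are too dissimilar."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(vectorize(prev), vectorize(cur)) < threshold:
            groups.append([])  # similarity dropped: treat it as a change of idea
        groups[-1].append(cur)
    return [" ".join(group) for group in groups]


sentences = [
    "Chunking splits content into blocks.",
    "Each block of content must stand alone.",
    "Embeddings turn words into numbers.",
    "Those numbers place similar words close together.",
]
for chunk in semantic_chunks(sentences):
    print("-", chunk)
```

The four sentences come out as two chunks: the first two talk about content, the last two about words and numbers. The break falls where the vocabulary changes, which is the kind of signal a line break should now follow.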

From the semantic field to the blocks of truth.

I was wrong. Chunking is not just another buzzword in Digital Marketing jargon. It even marks a change in the way we have treated content until now. Where attention used to focus on how the semantic field was spread across a page, we are returning to the essential: the arrangement of information in autonomous, reliable, verifiable and explicit fragments. Blocks of truth: the truth of your expertise, your market and your brand. Didn’t Google warn us? Good content is above all useful content, which answers the user’s real concerns clearly and precisely.