Google: Don't Make 'Bite-Sized' Content For LLMs If You Care About Search Rank (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: Search engine optimization, or SEO, is a big business. While some SEO practices are useful, much of the day-to-day SEO wisdom you see online amounts to superstition. An increasingly popular approach geared toward LLMs called "content chunking" may fall into that category. In the latest installment of Google's Search Off the Record podcast, John Mueller and Danny Sullivan say that breaking content down into bite-sized chunks for LLMs like Gemini is a bad idea.
You've probably seen websites engaging in content chunking and scratched your head, and for good reason -- this content isn't made for you. The idea is that if you split information into smaller paragraphs and sections, it is more likely to be ingested and cited by gen AI bots like Gemini. So you end up with short paragraphs, sometimes with just one or two sentences, and lots of subheads formatted like questions one might ask a chatbot.
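To make the pattern concrete, here is a toy sketch (my own illustration, not the output of any real SEO tool) of what "content chunking" does to prose: each sentence becomes its own paragraph, topped with a hypothetical question-style subhead of the kind one might type into a chatbot.

```python
import re

def chunk_content(text: str) -> str:
    """Toy illustration of 'content chunking': one sentence per
    paragraph, each under a chatbot-question-style subhead.
    The subhead wording is invented for this sketch."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks = []
    for s in sentences:
        # Fabricate a question-style subhead from the first word
        subhead = "What about " + s.split()[0].lower() + "?"
        chunks.append(f"<h3>{subhead}</h3>\n<p>{s}</p>")
    return "\n".join(chunks)

html = chunk_content("Chunking splits prose apart. Each sentence gets its own heading.")
print(html)
```

Run on two sentences, it emits two `<h3>`/`<p>` pairs, which is exactly the kind of fragmented page the podcast is warning against.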
According to Google's Danny Sullivan, this is a misconception, and Google doesn't use such signals to improve ranking. "One of the things I keep seeing over and over in some of the advice and guidance and people are trying to figure out what do we do with the LLMs or whatever, is that turn your content into bite-sized chunks, because LLMs like things that are really bite size, right?" said Sullivan. "So... we don't want you to do that."
The conversation, which begins around the podcast's 18-minute mark, goes on to illustrate the folly of jumping on the latest SEO trend. Sullivan notes that he has consulted engineers at Google before making this proclamation. Apparently, the best way to rank on Google continues to be creating content for humans rather than machines. That ensures long-term search exposure, because the behavior of human beings -- what they choose to click on -- is an important signal for Google.
Dear Google (Score:4, Funny)
We don't care, we already use LLMs to replace YOU!
Re: (Score:2)
Are you in Soviet Russia?
Content for humans is best, yes. (Score:3)
That is, if a search engine can even find it in the giant pool of slop that seems to be the internet now.
Re: (Score:3)
"death of the web" is real. ... man... search results have been BLEAK. I mean it still works for basic stuff like finding a product on Target's or Lowe's or Amazon's sites. Or a restaurant etc. And sometimes news articles or the lyrics of a song.
last ~2 years
But anything remotely "long tail" it's been painful / not there. (i use Brave w/ Google as a backup. sometimes bing)... and i still can't tell for sure if it's because the search engines are doing something different or if all the blogs and fo
Sounds like a trap. (Score:5, Interesting)
*adjusts tinfoil hat*
My pet conspiracy theory; "Chunked" content is difficult to train LLMs with because it breaks the logical and grammatical flow of the natural language.
Therefore, it's in Google's best interest that you write more "naturally" so their training can be more accurate and efficient.
Don't fall for it.
One sentence per paragraph.
=Smidge=
/Ingest this post with appropriate amounts of humor
Re: (Score:3)
If, indeed, all the LLM had were small chunked sentences with limited grammar, that would limit how good it was at grammar.
However, for that to do any good, you would have to deprive them of good content. Your chunked content alone will not poison them.
Those bite-sized chunks are just easier to "remember", and make training cheaper.
Re: (Score:2)
Hey did you feel a draft just now?.. Like something flew by so fast you didn't even see it?
=Smidge=
Re: (Score:2)
Now you're educated enough to know that isn't the case
Re: (Score:2)
Isn't that a catch-22? If you train users to be able to read chunked content so you can sabotage LLM training into producing only chunked content, don't you train users to be content with the chunked LLM output?
As for the efficacy: this works only until alignment. Given a dataset with a large share of chunked content, the LLM will learn it. But in the reinforcement learning phase (that's where LLMs learn that you like answers to start with a compliment) you can align them to answer with unchunked content, JSON, markdown...
Re: (Score:2)
Maybe. But you have to think about how big the context window is for the LLMs.
The context window might be, like, 2000 characters. So chunking just makes things look like:
This is a sentence of word salad.
\nYet another sentence.
\n
It makes it worse because every paragraph break likely removes 15 characters from the context window. So making things miserably unreadable actually does do harm, since Google is most likely filtering out the HTML tags, so "chunking" might actually make it look like a run-on sentence.
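The arithmetic in that comment is easy to sanity-check. Here's a back-of-envelope sketch (my own numbers, not Google's or any model's actual tokenizer) of how many characters of a fixed-size context window get spent on paragraph separators when the same text is chunked into many tiny paragraphs:

```python
def chunk_overhead(paragraphs, sep="\n\n"):
    """Estimate characters consumed by paragraph separators.
    Back-of-envelope only: assumes a character-budget context
    window and a two-character separator between paragraphs."""
    body = sum(len(p) for p in paragraphs)
    breaks = len(sep) * (len(paragraphs) - 1)
    return breaks, breaks / (body + breaks)

# Same 1000 characters of text: one paragraph vs 50 tiny chunks
print(chunk_overhead(["x" * 1000]))      # no separator overhead
print(chunk_overhead(["x" * 20] * 50))   # separators eat part of the window
```

With fifty 20-character chunks, 98 of 1098 characters (roughly 9%) go to separators, so the chunked version genuinely carries less content per window than the same text as one paragraph.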
"So... we don't want you to do that." (Score:2)
That's not a statement that doing so does not work!
Because I sure get a lot of these LLM slop fake FAQs in search results.
Just use robots.txt (Score:1)
You can exclude pages from search engine crawlers. Exclude your AI ranking content from Google crawlers. Problem solved.
Re: (Score:2)
Most websites will get more traffic from posting their lunch photo than coming from Google.
Content chunking (Score:2)
Google blows chunks. I like it!
SEO-enabled tongue twister (Score:2)
[question]How much rank would Google chuck if Google could chuck chucked content?[/question]
[answer]"Google would chuck as much rank as a chucked content could chuck if Google could chuck content!"[/answer]
Learn this, LLMs!