AI On Innovation [Part 2]: More Insights From +546,000 AI Overviews via @sejournal, @Kevin_Indig
Gain insights into the relationship between Common Crawl data and AI Overviews. Understand how user intent influences AI Overviews and the distribution of top-ranking domains.
Following up on my first analysis of +546,000 AI Overviews, I dug deeper into three questions:
How are Common Crawl data and AI Overviews related?
How does user intent change AI Overviews?
How do the top 20 positions break down for domains that rank in organic search and get cited in AIOs?
How Are Common Crawl Data And AI Overviews Related?
Common Crawl inclusion doesn’t affect AIO visibility as much as sheer organic traffic.
Common Crawl, a non-profit that crawls the web and provides the data for free, is the largest data source for generative AI training.
Some sites, like Blogspot, contribute a lot more pages than others, raising the question of whether that gives them an edge in LLM answers.
Result: I wondered whether sites that provide more pages than others would also see more visibility in AI Overviews. That turned out not to be true.
I compared the top 500 domains by page contribution in Common Crawl to the top 30,000 domains in my dataset and found a weak correlation of 0.179.
The likely reason is that Google relies on its own index, rather than on Common Crawl, to train and inform AI Overviews.
Image Credit: Kevin Indig
I then analyzed the relationship between the top 3,000 domains by organic traffic from Semrush and the top 30,000 domains in my dataset and found a strong correlation of 0.714.
In other words, domains that get a lot of organic traffic have a high likelihood of being very visible in AI Overviews.
AIO seems to increasingly reward what works in organic search, but some criteria are still very separate.
It’s important to call out that a few sites distort the relationship.
When filtering out Wikipedia and YouTube, the relationship goes down to a correlation of 0.485 – still strong but lower than with the two behemoths.
Removing additional large sites doesn’t change the correlation any further, solidifying the point that doing things that work in organic search has a big impact on AI Overviews.
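The comparison above can be sketched in a few lines. This is a minimal illustration, assuming a Pearson correlation (the analysis does not say which coefficient was used) and two hypothetical dictionaries that map domains to organic traffic and to AIO citation counts. All names and numbers below are made up for the example and are not data from the analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def traffic_vs_aio_correlation(traffic, aio_citations, exclude=()):
    """Correlate both metrics over the domains present in both datasets."""
    shared = sorted((set(traffic) & set(aio_citations)) - set(exclude))
    return pearson([traffic[d] for d in shared],
                   [aio_citations[d] for d in shared])


# Hypothetical sample values, for illustration only.
traffic = {"wikipedia.org": 5000, "youtube.com": 4000, "a.example": 300,
           "b.example": 200, "c.example": 100, "d.example": 50}
aio_citations = {"wikipedia.org": 900, "youtube.com": 700, "a.example": 20,
                 "b.example": 40, "c.example": 10, "d.example": 30}

with_all = traffic_vs_aio_correlation(traffic, aio_citations)
without_big = traffic_vs_aio_correlation(
    traffic, aio_citations, exclude=("wikipedia.org", "youtube.com"))
print(f"all domains: {with_all:.3f}, without the two largest: {without_big:.3f}")
```

Running the correlation twice, once with and once without the largest domains, shows how much a handful of very large sites can pull the coefficient up.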
As I wrote in my previous post:
Ranking higher in the search results certainly increases the chances of being visible in AIOs, but it’s by far not the only factor.
As a result, companies can exclude Common Crawl’s bot in robots.txt if they don’t want to appear in public datasets (and in gen AI tools like ChatGPT) and still be very visible in Google’s AI Overviews.
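As a rough sketch of what such an exclusion could look like, the snippet below embeds a sample robots.txt that blocks Common Crawl's crawler (user-agent token CCBot) while leaving other crawlers unrestricted, then checks the rules with Python's standard-library parser. The example.com URL is a placeholder, and the rule set is an assumption about what a site might choose, not a recommendation from the analysis.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block Common Crawl's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL, used only to evaluate the rules above.
url = "https://www.example.com/some-page"
ccbot_allowed = parser.can_fetch("CCBot", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print("CCBot allowed:", ccbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Because the block targets only the CCBot group, Googlebot falls through to the wildcard group and stays allowed, which matches the idea that Google's own crawling is unaffected by the exclusion.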
How Does User Intent Change AI Overviews?
User intent shapes the form and content of AIOs.
In my previous analysis, I came to the conclusion that the exact query match barely matters:
The data shows that only 6% of AIOs contain the search query.
That number is slightly higher in SGE, at 7%, and lower in live AIOs, at 5.1%. As a result, meeting user intent in the content is much more important than we might have assumed. This should not come as a surprise since user intent has been a key ranking requirement in SEO for many years, but seeing the data is shocking.
Calculating the exact (dominant) user intent for all 546,000 queries would be extremely compute-intensive, so I looked at the common abstractions: informational, local, and transactional.
Abstractions are less helpful when optimizing content, but they’re fine when looking at aggregate data.
I clustered:
Informational queries around question words like “what,” “why,” “when,” etc.
Transactional queries around terms like “buy,” “download,” “order,” etc.
Local queries around “nearby,” “close,” or “near me.”
Image Credit: Kevin Indig
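The clustering above could be approximated with simple keyword rules. This is a minimal sketch, not the method used in the analysis: the term lists contain only the examples named in the text, and the precedence (local, then transactional, then informational) and the "other" fallback are assumptions made for the illustration.

```python
import re

# Only the example terms from the text; real lists would be longer.
# Dict order doubles as precedence, which is an assumption of this sketch.
INTENT_TERMS = {
    "local": ("near me", "nearby", "close"),
    "transactional": ("buy", "download", "order"),
    "informational": ("what", "why", "when"),
}


def classify_intent(query):
    """Return the first intent whose terms appear as whole words in the query."""
    text = query.lower()
    for intent, terms in INTENT_TERMS.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                return intent
    return "other"


for q in ("coffee shops near me", "buy running shoes", "why is the sky blue"):
    print(q, "->", classify_intent(q))
```

Keyword rules like these are coarse. A query such as "how to close a bank account" would be labeled local because of the word "close", which is one reason such abstractions suit aggregate analysis better than content optimization.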
Result: Differences in user intent are reflected in form and function. The average length (word count) is almost equal across all intents except for local, which makes sense because users want a list of locations instead of text.
Similarly, shopping AIOs are often lists of products with a bit of context unless they’re shopping-related questions.
Local queries have the highest amount of exact match overlap between query and answer; informational queries have the lowest.
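The two metrics compared here, average word count and exact-match overlap per intent, can be computed with a short aggregation. The sketch below assumes that "exact match" means the query appears as a case-insensitive substring of the AIO text, and it takes hypothetical (intent, query, aio_text) rows. Both the row format and the sample rows are illustrative, not taken from the dataset.

```python
from collections import defaultdict


def summarize_by_intent(rows):
    """rows: iterable of (intent, query, aio_text) tuples."""
    totals = defaultdict(lambda: {"count": 0, "words": 0, "exact": 0})
    for intent, query, aio_text in rows:
        bucket = totals[intent]
        bucket["count"] += 1
        bucket["words"] += len(aio_text.split())
        # Assumption: exact match = query is a substring of the answer.
        if query.lower() in aio_text.lower():
            bucket["exact"] += 1
    return {
        intent: {
            "avg_words": b["words"] / b["count"],
            "exact_match_share": b["exact"] / b["count"],
        }
        for intent, b in totals.items()
    }


# Hypothetical rows, for illustration only.
rows = [
    ("local", "pizza near me", "Pizza near me: Luigi's, Mario's"),
    ("informational", "why is the sky blue",
     "Sunlight scatters in the atmosphere and short wavelengths scatter most"),
]
summary = summarize_by_intent(rows)
print(summary)
```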
Understanding and satisfying user intent for questions is harder, but it is also more important for visibility in AIOs than it is for, say, Featured Snippets.
How Do The Top 20 Organic Positions Break Down?
In my last analysis, I found that almost 60% of URLs that appear in AIOs and organic search results rank outside the top 20 positions.
For this Memo, I broke the top 20 down further to understand whether AIOs are more likely to cite URLs in higher positions.
Image Credit: Kevin Indig
Result: It turns out 40% of URLs in AIOs rank in positions 11-20, and only about half as many (21.9%) rank in the top 3.
The majority, 60% of URLs cited in AIOs, still rank on the first page of organic results, reinforcing the point that a higher organic rank tends to lead to a higher chance of being cited in AIOs.
However, the data also shows that it’s very much possible to be present in AIOs with a lower organic rank.
Where the top 20 domains that are visible in AIOs and search results rank (Image Credit: Kevin Indig)
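A breakdown like this can be reproduced by bucketing the organic positions of cited URLs. The sketch below assumes the shares are computed only over URLs that rank in the top 20, and it uses three buckets (1-3, 4-10, 11-20). The middle bucket and the sample positions are assumptions for illustration.

```python
from collections import Counter

# (label, lowest position, highest position); the 4-10 bucket is an assumption.
BUCKETS = (("1-3", 1, 3), ("4-10", 4, 10), ("11-20", 11, 20))


def bucket_shares(positions):
    """Share of top-20 positions per bucket; positions beyond 20 are ignored."""
    in_top_20 = [p for p in positions if 1 <= p <= 20]
    if not in_top_20:
        return {}
    counts = Counter()
    for p in in_top_20:
        for label, low, high in BUCKETS:
            if low <= p <= high:
                counts[label] += 1
                break
    return {label: counts[label] / len(in_top_20) for label, _, _ in BUCKETS}


# Hypothetical organic positions of URLs cited in AIOs.
shares = bucket_shares([1, 2, 5, 12, 15, 30])
print(shares)
```

Adding the first two buckets gives the first-page share, which is the figure to compare against the share for positions 11-20.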
Scenarios
I will work with my clients to match the AIO’s user intent, provide unique insights, and tailor the format. I see two options for how AI Overviews could develop, which I will track and validate with data over the coming months and years.
Option 1: AIOs rely more on top-ranking organic results and satisfy more informational intent before users need to click through to websites. The majority of clicks landing on sites would be from users considering or intending to buy.
Option 2: AIOs continue to provide answers from diversified results and leave a small chance that users still click through to top-ranking results, albeit in much smaller amounts.
Which scenario are you betting on?
Featured Image: Paulo Bobita/Search Engine Journal