Deep dive: how Google ranks a new page

For an SEO, it can be useful to understand how Google deals with ranking at a forensic level, if for no other reason than to explain to clients when they can expect their new page to rank.

Let’s look at the new page ranking journey in Google, using a large authority site as an example.

Google Search Console to the rescue!

Google Search Console has a massive volume of data which we, as SEOs, have access to. Of particular importance to this test is being able to filter the view to specific queries and to the specific page we’re using as an example.

This means all GSC is showing are the rankings, clicks and impressions for this single page, and we’re able to look at each query for that page individually.

The process

The seed query

It’s useful to first find the “seed query” – the first query that Google ever showed this page for. To do this, take the date on which Google first showed this page. In this case, it was 27th April 2019.

What was the query on that date?

Now, limit the date range in Google Search Console to 27th April 2019 and click to see what the query was. There will probably be a single query that shows up on that date.

It will likely be the closest query (the query with the closest content vector) that Google can find which relates to your content and particularly the page title.

You can now clear the date filter. We’re finished with it.
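If you would rather work from an export than click through the interface, the same step can be sketched in a few lines of Python. This is only an illustration: the rows and their (date, query, impressions) layout are invented, and `find_seed_queries` is a hypothetical helper, not anything Search Console provides.

```python
from datetime import date

# Hypothetical rows from a Search Console export filtered to one page.
# Layout assumed here: (date, query, impressions). The values are invented.
rows = [
    (date(2019, 4, 27), "dog walking", 3),
    (date(2019, 4, 29), "walking the dog", 2),
    (date(2019, 4, 29), "dog walking", 5),
    (date(2019, 5, 6), "walking the poodle", 1),
]


def find_seed_queries(rows):
    """Return the earliest impression date and the queries seen on that date."""
    shown = [row for row in rows if row[2] > 0]  # ignore rows with no impressions
    first_day = min(day for day, _, _ in shown)
    seeds = sorted({query for day, query, _ in shown if day == first_day})
    return first_day, seeds


first_day, seeds = find_seed_queries(rows)
print(first_day, seeds)
```

The function returns a list rather than a single query because more than one query could, in principle, appear on the first day.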

Find the first impression date for all other queries

After finding the seed query, one by one, filter for each of the other queries that Google is ranking this page for. You’re now only seeing a single query at a time.

Note the first date that Google generated an impression for each query.
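Doing this by hand gets tedious on a page with many queries, so here is a rough Python sketch of the same bookkeeping. The rows are invented, the helper name `first_impression_offsets` is hypothetical, and the launch date is assumed to equal the date of the first impression, which may not be true for your page.

```python
from datetime import date

# Invented example rows: (date, query, impressions) for a single page.
rows = [
    (date(2019, 4, 27), "dog walking", 3),
    (date(2019, 4, 29), "walking the dog", 2),
    (date(2019, 4, 29), "dog walking", 5),
    (date(2019, 5, 6), "walking the poodle", 1),
]

# Assumption: the page launched on the day of its first impression.
launch_day = date(2019, 4, 27)


def first_impression_offsets(rows, launch_day):
    """Map each query to the number of days between launch and its first impression."""
    first_seen = {}
    for day, query, impressions in rows:
        if impressions <= 0:
            continue  # a row without impressions does not count as "shown"
        if query not in first_seen or day < first_seen[query]:
            first_seen[query] = day
    offsets = [(query, (day - launch_day).days) for query, day in first_seen.items()]
    # Earliest queries first, which is the order a timeline chart would use.
    return sorted(offsets, key=lambda item: item[1])


offsets = first_impression_offsets(rows, launch_day)
for query, days in offsets:
    print(f"{days:>3} days  {query}")
```

The day counts are the numbers you would plot as bar lengths in a timeline of query visibility.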

For Google to work, it must have an algorithm that is computationally inexpensive at run-time.

By evaluating and storing queries that are associated with the page, it can quickly look them up when a user searches.
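To make the idea concrete, here is a toy Python sketch of a precomputed lookup. It is not a description of how Google actually stores anything; the page paths, the queries and the `build_query_index` helper are all made up for illustration.

```python
from collections import defaultdict


def build_query_index(page_queries):
    """Invert a page -> queries mapping into a query -> pages lookup table."""
    index = defaultdict(list)
    for page, queries in page_queries.items():
        for query in queries:
            index[query].append(page)
    return dict(index)


# Hypothetical pages and the queries already associated with them.
page_queries = {
    "/dog-walking-guide": ["dog walking", "walking the dog"],
    "/dog-grooming-tips": ["dog grooming"],
}

index = build_query_index(page_queries)

# At search time the expensive work is already done: this is one dictionary lookup.
print(index.get("walking the dog", []))
```

The point of the sketch is the split between slow work done ahead of time (building the index) and cheap work done when a user searches (a single lookup).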

Query visibility over time

In the chart below, the length of the bar shows how long after page launch each query was first shown to a user. Put another way, the longer the bar, the more days after page launch that query was first shown to some real Google users.

The real queries aren’t included. Instead I’ve categorised them into:

  • Seed query; the first query which generated an impression.
  • Equivalent query; a direct equivalent of the seed query, but written in a different way.
  • Close matching query; a very closely related query to the seed query.
  • Alternative field query; a query in a related but not identical field.
  • Distant query; a query which isn’t particularly related to the seed query and will probably not be a useful result.

In addition, I added a number in brackets which is my estimate of how close the content vector for the new query would be to that of the seed query.

It might be clearer with some examples:

  • Seed query (0) could be “dog walking”.
  • Equivalent query (1) could be “walking the dog” because it’s just another way of saying dog walking.
  • Close matching query (2) could be “walking the poodle” because it’s still walking the dog, but a bit more specialised.
  • Close matching query (3) could be “exercising the dog” because it’s not quite walking the dog any more, so a little further away from the seed query.
  • Alternative field query (4) could be “walking the cat” because it’s an entirely different animal.
  • Distant query (5) could be “dog grooming” because, while dog owners probably do groom their pets, it’s not likely to be covered by the content we’ve written about dog walking.

Data from Google Search Console: first date each query was shown to users

Remember: the longer the bar, the further away from page launch that query was first shown to a user.

Seed query (0)
Equivalent query (1)
Equivalent query (1)
Equivalent query (1)
Close matching query (2)
Equivalent query (1)
Equivalent query (1)
Equivalent query (1)
Equivalent query (1)
Distant query (4)
Distant query (4)
Alternative field query (3)
Alternative field query (4)
Close matching query (2)
Alternative field query (3)
Distant query (6)

Data analysis

You can see some obvious patterns in the data. 

Initially, Google shows the seed query in green, and then direct equivalent queries in blue, but with the occasional close match in orange. 

After some time, Google starts to try more and more close matches in orange, and more distant queries in red.

But why?

Fanning out from the original content vector

If you’ve not come across content vectors in SEO, they are an internal representation of content and queries in Google. They help to determine the relationship between queries. 

For ease, you can think of them as angles if you like, where closely matching content vectors will have similar angles. Content vectors that don’t match very well will have a larger angle between them.
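If the angle idea feels abstract, this small Python sketch shows how an angle between two vectors can be calculated. The two-dimensional vectors are toy values chosen for illustration; real content vectors, whatever form they take inside Google, would have far more dimensions, and `angle_degrees` is just a helper written for this example.

```python
import math


def angle_degrees(a, b):
    """Angle between two vectors, in degrees, from their cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding errors never push acos out of range.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


# Toy vectors (assumed values, purely illustrative).
seed = (1.0, 0.0)      # e.g. "dog walking"
close = (0.9, 0.3)     # e.g. "walking the poodle"
distant = (0.0, 1.0)   # e.g. a query the page doesn't really cover

print(round(angle_degrees(seed, close), 1))
print(round(angle_degrees(seed, distant), 1))
```

A small angle means the two vectors point in nearly the same direction, which is the sense in which two queries are "close".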

So what Google is doing is building relevancy and trust information using real users, by trying the content out on them to see if they’re willing to click it. Once it has some kind of positive reaction – a click – Google will then try the page on a query which has a content vector close to the last content vector.

Building trust using users

This allows Google to build trust that the content really is about the topic it believes you’ve written about, and that real users agree. It also gains a view of how far beyond the original content it should go. Is the page about one very specific subject, or is it broader?

 

An example of the iterative content vector process

Again, an example might be useful. Let’s use dogs again.

“Dog walking” to..

Let’s say the initial seed query is “dog walking“. Google would try this on the users and hopefully receive a click. Great! 

.. “walking the dog” to ..

Then it would try some direct equivalents, like “walking the dog” or “walking dog“. Presuming it received some clicks on these two as well, it would have some trust in the initial seed query and those equivalents and would be able to move on.

.. “walking the poodle” to..

So then Google might try your page on people who search for “walking the poodle“, a closely related query, but not a direct equivalent. 

.. “walking the doberman” to ..

If Google received a positive reaction (a click) to “walking the poodle”, it would then try some other close queries like “walking the doberman”. If it didn’t receive a positive reaction, it likely wouldn’t go further down the breed of dog line.

.. “running with your dog” and so on ..

It might also try queries like “running with your dog”, for which our content almost certainly won’t receive any clicks.

Building SEO confidence

This iterative process appears to allow Google to build a set of queries related to your page and to have some level of confidence that it is showing the right content to users.

Google’s algorithm for building a set of relevant queries for a page

Google tries users with the seed query.

Google tries users with close equivalents.

Google tries users with more distant close matches.

Google keeps showing the page for queries with more distant content vectors until the users reject the page, at which point it stops going down that content path.
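The steps above can be sketched as a small simulation in Python. To be clear, this is my guess at the shape of the process based on the Search Console data, not Google’s actual algorithm. The `neighbours` map, the `clicked` set, the `fan_out` function and the “marathon training” query are all hypothetical.

```python
from collections import deque


def fan_out(seed, neighbours, gets_click):
    """Try queries outward from the seed; stop exploring a path once users reject it."""
    tried, accepted = [], []
    queue = deque([seed])
    seen = {seed}
    while queue:
        query = queue.popleft()
        tried.append(query)
        if not gets_click(query):
            continue  # rejected: never try the queries beyond this one
        accepted.append(query)
        for nxt in neighbours.get(query, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return tried, accepted


# Hypothetical "next closest" queries for each query (assumed, for illustration).
neighbours = {
    "dog walking": ["walking the dog", "walking dog"],
    "walking the dog": ["walking the poodle", "running with your dog"],
    "walking the poodle": ["walking the doberman"],
    "running with your dog": ["marathon training"],
}

# Pretend users click the page for these queries and ignore it for the rest.
clicked = {"dog walking", "walking the dog", "walking dog",
           "walking the poodle", "walking the doberman"}

tried, accepted = fan_out("dog walking", neighbours, lambda q: q in clicked)
print("tried:   ", tried)
print("accepted:", accepted)
```

In this run “running with your dog” gets tried but rejected, so the query beyond it is never shown at all, while the breed of dog line keeps going because each step earned a click.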

Content relevancy & trust

If you ask SEOs, the number one ranking factor often comes out as relevancy. How does Google determine how relevant your content is?

There’s no doubt that many on-page and off-page signals count. If you get some fantastic links from relevant sources into a page, it will make a difference. If you optimise the user journey all the way through a user’s Google journey and onto your page, creating a positive match on search intent, it can make a positive difference.

User data

But what about users? Don’t they matter?

If you look at what SEOs think, usage data is only halfway up the list of what’s important to ranking, and I would agree with that.

It is there as some kind of ranking factor though, and it’s clear to me from looking at a number of these pages that if your content doesn’t capture clicks from a set of users, Google is less likely to show the content to them again.

Content relevancy is an incredibly strong signal for Google and positively improves rankings more than any other, according to a poll of 1,500 SEOs.

So, produce content your users want to click!

What can we take away from this then?

Search intent is incredibly important! You want to be the person that everyone wants to click! If you aren’t, your content will rise more slowly through the rankings and will stop at a lower rank than you would like.

Keep your page titles, URLs and meta descriptions interesting and relevant.
