Unless you’ve been living under an SEO rock, you’ll have realised that SEOs are finding it more and more difficult to tell what is going on with core updates.
Why is that?
One word: Machine Learning. OK that was two.
Find out why, potentially, things are getting a bit hairy out there.
What is machine learning?
Unless you’ve hidden away for the past 10 years, you will have used some machine learning. It’s everywhere. It predicts which songs you might like on Spotify, which movies you might like on Netflix, and even keeps your money safe by predicting which transactions are most likely to be fraudulent.
Machine learning isn’t one system. It consists of many different algorithms.
A neural network is one of them. It is trained on a massive amount of example data, and aims to learn relationships which allow it to predict what result should be generated for new data it’s never seen before.
In the case of Google, the result we’re interested in is predicted quality and relevancy data for a site or page.
You can learn more in this YouTube video about neural networks, if you’re interested.
Google May 2020 core update
If you hang around with SEOs, you’ll find they have a lot of opinions. [unpopular opinion] My honest view? They’re often not correct.
Programming to the rescue!?
I was a programmer for 20 years before moving into online marketing and SEO for the last 20 years. I eventually worked in 3D games, where we would keep a very careful eye on how fast code was running.
Every day was dot products of vectors, looking at computational efficiency and aiming to recreate something approaching reality. I’ve been amazed at how, 20 years later, I see so many parallels with what Google is trying to achieve.
What are Google trying to achieve then?
They’re trying to work out how to approximate the human brain in order to determine the quality of a site at scale.
20 years ago, they looked at keyword density and links. Now they’re using machine learning and neural networks to predict how humans would rate quality.
Happy neurons at Google, all beavering away, looking to praise quality and sniff out dodgy tactics
A bit of history: Relationships between phrases
We’ve known for quite some years that Google looks at relationships between words. They don’t use word2vec exactly, but it’s a system which can help explain a bit more about what they’re doing.
You can play with the Google natural language tool.
Word2vec and natural language processing
Google uses an equivalent of word2vec to build huge multi-dimensional arrays which help them determine which words should co-exist.
Given a large corpus of information – like Wikipedia – it can work out what is natural. Moreover, it’s possible to work out relationships between words and concepts.
Content vectors for words and phrases
So, each piece of content – a few words for example – gains a large vector in multi-dimensional space.
A common example of using this system is how using the content vectors for KING, MAN and WOMAN and some vector maths can lead you to a vector very similar to QUEEN. How cool!
You can see then how relationships between various data can be calculated and evaluated with incredible accuracy, and most importantly, how it can be boiled down to some maths.
(KING – MAN) + WOMAN = QUEEN
Using content vectors for KING, MAN and WOMAN, you can do a little vector arithmetic and arrive at a content vector quite close to QUEEN! It’s clear, then, that relationships between words, concepts and ideas are something that Google understands, to some degree.
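To make the vector maths a little more concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented purely for illustration (real word2vec-style vectors have hundreds of dimensions learned from text), and the names `VECTORS`, `cosine` and `analogy` are my own, not anything Google has published.

```python
import math

# Hypothetical toy vectors, invented for this example.
# The three dimensions loosely stand for (royalty, maleness, femaleness).
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to (a - b) + c, ignoring the three input words."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

print(analogy("king", "man", "woman"))
```

With these made-up numbers, (KING – MAN) + WOMAN lands much closer to the QUEEN vector than to APPLE, which is the whole trick: the relationship is just arithmetic.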
Machine learning in EAT and core updates
What about the core updates and EAT that Google uses today in their algorithm? Do they use machine learning?
Every SEO knows that Google uses signals and tries to determine how to most accurately reproduce what users want in their query.
15 years ago – when I worked on a search engine for jobs – we knew that exact phrase matching was better than individual keywords. That was a simple signal. We changed the strength of these signals relative to each other to produce the “right” results.
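For anyone who hasn’t built this kind of thing, a hand-tuned signal system can be sketched in a few lines of Python. The signal names, weights and pages below are hypothetical, made up for this example; they are not the ones we used on the jobs search engine, and not Google’s.

```python
# Hypothetical signals and hand-picked weights, for illustration only.
WEIGHTS = {"exact_phrase_match": 3.0, "keyword_match": 1.0, "title_match": 2.0}

def score(signals, weights=WEIGHTS):
    """Weighted sum of signal values (each value is between 0 and 1)."""
    return sum(weights[name] * value for name, value in signals.items())

pages = {
    "page_a": {"exact_phrase_match": 1.0, "keyword_match": 0.5, "title_match": 0.0},
    "page_b": {"exact_phrase_match": 0.0, "keyword_match": 1.0, "title_match": 1.0},
}

def rank(weights=WEIGHTS):
    """Order pages from highest to lowest score under the given weights."""
    return sorted(pages, key=lambda p: score(pages[p], weights), reverse=True)

print(rank())  # exact phrase weighted heavily
print(rank({"exact_phrase_match": 1.0, "keyword_match": 1.0, "title_match": 2.0}))
```

Turning one weight down is enough to flip the order of the two pages. That is the manual tuning described above, and it is exactly the part that gets painful once you have hundreds of signals.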
But how do you produce an algorithm which is orders of magnitude more effective at producing high quality rankings, works on any kind of information, and which is hard for SEOs to fake?
You use neural networks!
Neural networks need training data
Like any child learning for the first time, neural networks need a huge amount of data to learn from. Fortunately, that’s something Google has in abundance.
For example, they have access to:
- The quality raters’ manual ratings for sites.
- The data from Google Analytics on bounce data, the number of pages on site, and even the amount of revenue that sites produce (if you use goals).
- Whether a user bounced back to Google to try another site, signaling they didn’t find what they were looking for on that first site. They could even know the average number of sites someone would tend to visit for that niche, so they can compare you to an average.
- … and so on.
Relevancy .. and levels of trust
First, let’s start with the goal. The desire is for Google to know, with a high level of accuracy, which pages are most relevant to deliver to their audience.
As a programmer, how might I approach that using neural nets?
What are the two key parts to this system?
1. Relevancy requires natural language understanding
2. Relevancy can be enhanced with a trust metric
SEO signals and machine learning in 2020
Google BERT and natural language processing
Google BERT improved point 1 – natural language processing and understanding – considerably in 2019.
Google now have fantastic systems for determining the nuances of language and assigning them to content vectors (if that’s how they do indeed rank queries and keywords).
Improving quality and trust rating
But what about point 2? How can we improve the algorithm and know whether a page on a site is relevant, and have a high level of trust in that opinion?
As a programmer, I know that I need to produce a system which works quickly enough, but accurately too. I have a huge amount of training data for neural networks. What could Google do in this situation?
Google can write new, better signals
Google could create a whole bunch of signals that it can calculate for either an individual page or an entire site, and see whether they can train a neural network to let them know which signals are most relevant in determining the quality and relevancy of a site.
.. and disregard irrelevant SEO signals
What I’m looking for, then, is a type of neural network which can take all of the signals the coders and data analysts have produced and work out which ones are most relevant, based on learning data.
It turns out there is a neural network for that! It’s called a deep autoencoder.
You might start with 200 signals, and it might find the 20 which really move the needle in terms of matching the manual quality ratings.
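Here is a rough sketch of the “keep only the signals that matter” idea. To be clear about what I’m swapping in: this is not an autoencoder. An autoencoder strictly learns a compressed mix of its inputs rather than picking a subset, and it doesn’t fit in a short example. Instead, this ranks each signal by how strongly it correlates with the manual ratings, which is a much cruder way of getting at the same goal. The signal names and numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def top_signals(signal_table, ratings, keep):
    """Return the `keep` signal names most strongly correlated with the ratings."""
    strength = {name: abs(pearson(values, ratings))
                for name, values in signal_table.items()}
    return sorted(strength, key=strength.get, reverse=True)[:keep]

# Hypothetical data: five sites, three made-up signals, rater scores from 1 to 5.
ratings = [1, 2, 3, 4, 5]
signal_table = {
    "author_bio_present": [0, 0, 1, 1, 1],
    "ads_per_page":       [9, 7, 6, 3, 1],
    "bw_image_ratio":     [0.4, 0.1, 0.5, 0.1, 0.4],
}

print(top_signals(signal_table, ratings, keep=2))
```

In this toy data the ad count and the author bio track the ratings closely, while the black & white image ratio is noise and gets dropped. A real system would also need to catch signals that only matter in combination, which simple one-at-a-time correlation cannot do.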
Specific signals for niches
But are all of the signals the same for each niche / industry?
Probably not. For example, if you look at the signals which are most relevant to predicting quality for a medical research site, compared with an affiliate site, they will be totally different.
And this is where a Google patent written about by Bill Slawski comes in. Google can likely tell the “vector” for your site.
E-A-T signals (plural!)
Google has said there’s no single signal for EAT (Expertise, Authoritativeness & Trustworthiness). They’ve said it’s a set of signals, so there is some precedent for believing that some of what’s being discussed here is close to what they are using.
So, could there be an EAT set of signals for each industry / niche? Possibly there is, and maybe they’re in the process of creating relevant new EAT signals, starting with niches which seem to have more of a “too much SEO” problem, in their eyes.
Medic EAT .. was it the first?
If Google can classify a site niche – as we think happened in the Medic update from a few years ago – they may have been using signals that are most relevant to that individual niche.
Google may even decide to create some signals which target specific niches and which it never uses for other niches.
For example, for Medic, it may be that a signal they created was the number of black & white images on a site, because Google noticed that high quality Medic sites have lots of images of papers. I’m not saying this IS a signal Google produced by the way … just that it could be, if they wanted to write it.
It’s very likely, though, that clarity of authorship, the author’s resume, and Google knowing a person has authority in that area are signals.
The autoencoder neural network for Medic might take both and find that B&W images is an irrelevant signal in nearly all cases, and just toss it in the bin. They might find that the authorship is relevant in 50% of cases, and therefore leave it on the table.
Some affiliate sites got hit hard in the May 2020 Google Core update. For affiliate sites, these could be some signals that Google might use to try to determine quality:
- The number of off-site links per post.
- The amount of AdSense (or other platform) advertising.
- The amount of money-term pages vs. helpful pages.
- How niched the site is.
- The amount of authority sites linking in.
- The use of a top image.
- Use of exact match phrases in page titles.
… and so on.
The autoencoder could work all of this out for the affiliate EAT neural net, and maybe it would find 25 signals that really separate the great sites from the dross. The 25 that are left would be computationally efficient enough for them to process over billions of pages.
Words and phrases per niche: content vectors
It’s likely that the EAT signals would look at the word usage too. The algorithm may find that certain words, or types of words, are more likely to be found in certain niches.
For example, very technical medical names would likely be on medical paper sites, but much more friendly words may be on a site written by someone who lacks medical training, showing a lack of understanding and therefore a poor EAT score for the technical medical niche.
It may score well in the “home medic” niche though!
Reading age would be another potential signal to differentiate the two similar niches.
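Reading age is one of the few signals here that anyone can compute for themselves. As a minimal sketch, this uses the published Flesch-Kincaid grade level formula with a very rough vowel-group syllable counter. Whether Google uses this formula (or reading age at all) is purely my assumption, and the two sample sentences are invented.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level (approximate US school grade)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

plain = "Your heart pumps blood. It beats all day."
technical = ("Myocardial infarction necessitates immediate "
             "percutaneous coronary intervention.")

print(round(flesch_kincaid_grade(plain), 1))
print(round(flesch_kincaid_grade(technical), 1))
```

The technical sentence scores far higher than the plain one, so even this crude measure would separate a medical paper site from a “home medic” site quite easily.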
Maybe Google are working towards an EAT algorithm per niche? Maybe they’re working through the niches producing signals they think could be relevant for each?
Predicting quality raters’ ratings using a neural network
So, let’s say Google used an autoencoder to work out the 25 signals which most closely predict the quality of sites in a niche, out of the 2,000 signals Google have available.
They can then train a new neural network – not an autoencoder, but a type which aims to predict values given inputs – using the same data as the autoencoder.
Google can then give the signal values for a different un-rated site to the neural network and it magically predicts how the quality raters would rate the site, and possibly even the likely bounce rate and success percentage of the site for users.
Since they don’t have to evaluate all 2,000 signals for each page, they can perform this activity either per search performed, or pre-process the data ahead of time, depending on the content vector possibly.
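The “train on rated sites, then predict for un-rated ones” step can be sketched like this. It is the simplest possible stand-in: a single linear neuron trained by gradient descent, not a deep network, and the two signals (`author_clarity`, `topic_coverage`) and the ratings are hypothetical values I’ve made up so the result is easy to check by hand.

```python
def train(rows, ratings, lr=0.1, epochs=2000):
    """Fit rating ~ bias + sum(weight * signal) by batch gradient descent."""
    n_signals = len(rows[0])
    weights, bias = [0.0] * n_signals, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_signals, 0.0
        for row, rating in zip(rows, ratings):
            # How far off is the current prediction for this rated site?
            error = bias + sum(w * x for w, x in zip(weights, row)) - rating
            grad_b += error
            for i, x in enumerate(row):
                grad_w[i] += error * x
        # Nudge the bias and every weight against the average error gradient.
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(model, row):
    """Predicted rating for a site that no human has rated."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, row))

# Hypothetical rated sites: (author_clarity, topic_coverage) -> rater score 1 to 5.
rated_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rated_scores = [1.0, 3.0, 3.0, 5.0]

model = train(rated_rows, rated_scores)
print(round(predict(model, (0.5, 0.5)), 2))
```

Nobody told the model that each signal is worth two rating points. It worked that out from four rated examples, and then rated a fifth site it had never seen.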
The full machine learning algorithm
That’s it! We’re finished then!
Once Google has determined the niche for the site, and therefore the EAT neural net to use, there are 4 key stages:
How Google’s core update process might work
The effectiveness of this system
I have to say, as a programmer I would be strongly proposing something like this, and information and patents kind of point to some elements of it. Google could write hundreds and hundreds of signals and just let the autoencoder neural network figure everything out per niche, if they wanted to.
Google don’t need to decide, objectively, that “this signal” is worth 3x as much as “that signal” .. they just decide on the rating guidelines, ask their raters to rate thousands of sites, provide a bunch of signals to the neural nets and let them figure it all out.
And the bad news for traditional SEO
And, as time goes on and Google find SEOs using new tricks, they could simply write new signals which may trap these and, again, let the autoencoder work out which signals really work today, and train the system to predict quality ratings for other sites.
How about a practical example?
SEOs will talk about the amount of exact matching anchor text your site will be able to get away with, before getting penalised. Well, today that figure might be 5%.
Let’s say Google invents 3 or 4 new signals which together predict quality better than exact anchor text percentage. The autoencoder may ditch the original anchor text percentage figure altogether! It might no longer care if you have more than 5% exact matching anchor text, because it has a more effective way to determine if your site is good.
Or, it may find that it’s still deeply relevant as a single signal on its own, or when combined with another signal, or multiple signals.
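For reference, the old-style single signal being discussed is trivial to calculate. This is a small sketch with a made-up backlink profile; the anchors, the target phrase and the function name `exact_match_share` are all hypothetical.

```python
def exact_match_share(anchors, target_phrase):
    """Percentage of anchor texts that exactly equal the target phrase (case-insensitive)."""
    if not anchors:
        return 0.0
    target = target_phrase.strip().lower()
    matches = sum(1 for anchor in anchors if anchor.strip().lower() == target)
    return 100.0 * matches / len(anchors)

# Hypothetical anchor texts pointing at one site.
anchors = [
    "best running shoes", "Acme Shoes", "click here", "this review",
    "acmeshoes.example", "Best Running Shoes", "their guide", "Acme",
    "homepage", "here",
]

share = exact_match_share(anchors, "best running shoes")
print(f"{share:.0f}% exact match anchors")
```

A number like this is easy to compute and easy to game, which is exactly why a learned system might stop leaning on it once better signals exist.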
Groups of signals
And this is where I go back to the start, where I think many SEOs are missing the point. Many are running around looking for this single signal and that single signal, to see what’s happened .. and they may find some of them which correlate.
However, while they’re likely to find some sites where their hypothesis is true, they’ll find others where it’s false … but how can that be?
Weird combinations of signals
Because, in my opinion, Google’s core updates will likely spit out weird combinations of signals, not single signals, as indicators of quality.
And some signals might be important in some cases, and not at all relevant in others.
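Here is a deliberately contrived sketch of how that can happen. In this hypothetical dataset, a site is high quality only when two made-up signals agree (both on, or both off). Looked at one signal at a time, neither tells you anything. Looked at together, they tell you everything.

```python
from collections import defaultdict

# Hypothetical sites: (signal_a, signal_b, is_high_quality), repeated to make 100 sites.
sites = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)] * 25

def quality_rate_by(key_fn):
    """Share of high quality sites within each group produced by key_fn."""
    groups = defaultdict(list)
    for a, b, quality in sites:
        groups[key_fn(a, b)].append(quality)
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}

by_a = quality_rate_by(lambda a, b: a)          # signal A on its own
by_b = quality_rate_by(lambda a, b: b)          # signal B on its own
by_pair = quality_rate_by(lambda a, b: (a, b))  # both signals together

print(by_a)
print(by_b)
print(by_pair)
```

An SEO testing signal A alone would see a 50/50 split and conclude it does nothing, and would say the same about signal B. Both conclusions would be wrong for this data, because the pattern only exists in the combination.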
Why is that?
The inside of a neural network
If you look inside a neural network, it makes absolutely no sense to a human.
If you watch this neural net explainer video (also linked near the start of the article), where the programmers taught a neural net to learn to read handwritten numbers (0 to 9), you’ll find there are more detailed videos in the same series which look into the network and the data contained within it.
How humans learn
The number 9 to humans is simple: it’s a loop with either a straight line or a curved line on the right, coming down from the loop. That’s how we recognise different versions of the number 9. It’s how we were taught as children to read and write.
How neural networks learn
A neural net doesn’t see that at all, though. It doesn’t have that knowledge. What it “sees” and uses to create internal patterns makes no sense to a human. You’d look at the intermediate pattern data and think “how the hell is it going to use that to work out a handwritten number?”.
But it does. And it’s very successful.
So, you can’t know what the core update is “thinking” about your site
So, trying to determine the exact signals that Google’s core updates will use to determine EAT for your site could be next to impossible. They simply will make no sense to a human.
It might be something as random as:
- Signal 663 (link velocity gradient > 0.5 for less than 3 months) is true
- Signal 33 (Images have exact match ALT text) is 72%
- Signal 7 (Reading age) is 10
… produces a quality rating of 3.
Another set of signals which brings out a different end quality rating may not use Signal 7 at all.
But whatever signals they use and whatever output they produce will successfully approximate to the same results that a human would produce.
I see Google as a topic confidence engine
I have a model in my mind about how Google works, as do all SEOs. My starting point is the phrase “topic confidence”. How can Google have a high level of confidence that a site and page cover a topic thoroughly? What signals can I send to Google to gain confidence, in this niche?
So is SEO dead … again? Really?!
People have been predicting for years that SEO is dead, and in my opinion, no. It’s still not dead. But it is finally evolving.
The latest core updates just mean that more and more dodgy tactics are going to be sniffed out by the neural networks, and Google will find signals – or combinations of signals – which will predict the lack of user-focus of over-SEOed sites.
The best any of us can do is understand that SEO really is effective marketing. Google generally loves that, and it’s unlikely to trigger the sets of signals which penalise sites.
This means .. it’s fine to use your keywords! It’s fine to have an exact match page title! It’s fine to gain natural links and to use marketing and PR to gain these, and it’s fine for some people to have chosen exact match anchors! It’s totally fine to deeply understand and match search intent!
And yes, SEOs who understand the signals which are likely being used can gain a competitive advantage.
However, if you choose black or grey hat techniques, you may get away with them for a time, but get ready for some sleepless nights when there’s a new core update coming, because this might be the one which sniffs out your techniques with signals which you weren’t even aware of.
“But Google can’t detect that…”
I hear SEOs say time and time again that “Google can’t detect that”, for any trick they’re using at the time. I’m sorry, my friends, but this is becoming more and more like wishful thinking.
When I go onto their sites, I can often tell they’ve not put enough effort into their sites to truly deserve the traffic they are receiving, and if I can sniff it out, so will Google and their neural nets. Maybe not today, but maybe next time.
Programmers solve problems
When I was a games programmer we spent our lives looking for ways to accomplish tasks with data which people would look at later and think … how the hell did you even DO that? I’ve seen Google achieve some amazing feats over the last 20 years I’ve been in SEO, and I’ve been somewhat humbled by what they’ve achieved.
For me, I don’t look at the tricks today and think “Google can’t detect that”. I just wait for the deluge of posts in the Facebook groups with graphs showing, sadly, people have lost nearly all of their traffic and therefore their livelihood. And they always come, for the worst of the tricks SEOs use.
Programmers look at use cases
Because if I was a Google programmer I’d find the sites which were ranking well, but which the quality raters thought were rubbish, and I’d start looking backwards at why they’re ranking well and what tricks they’re using.
And by fixing a “loophole” that was exploited, I’ll probably take out a hundred thousand other sites using the same tactics, because everyone shares everything these days.
Neural networks aren’t perfect though
Sadly, neural networks aren’t infallible. They get it wrong. As we found in the YouTube video above, when trying to read numbers from handwritten sources, they were not 100% effective.
They got it wrong a few percent of the time.
Unfairly punished sites
So, sometimes your site will be unfairly punished. It will seem to get caught in a “glitch in the system”. It could be a fantastic site which just happened to have a combination of weird signals which was similar to sites which weren’t high quality. And there’s nothing you can do to stop this.
The best you can do is know the techniques that SEOs are using today and stay away from those as well as you can. Then you can have a site and business which has been largely dominating the SERPs in a niche for 20 years, like me! We still have 3500 1st to 3rd place rankings in a highly competitive niche.
How often does Google’s core EAT algorithm wrongly predict the quality of a site? 0.1%? 1%? 2%?
Google re-training after a core update
There is some evidence, though, that Google releases some updates after a core update has gone out. Maybe between one and six months later, people who had their sites hit by a core update will find their traffic reduction is reversed.
However, if you get your traffic back, I would take it as something of a warning shot across the bow that, in some areas, you’re maybe not what Google likes.
But some remain in Google’s dog house forever and never get their traffic back without making very significant updates to their site, because Google just doesn’t believe their site truly deserves to be ranked well, but instead was “SEOed to rank well”.
What this all means for SEO
First .. the elephant in the room .. is all of this correct?
Well, it’s honestly impossible to know and impossible to prove. Google certainly won’t tell us. I would say there’s some evidence for some of it, and I’ve used my programming background to fill in the rest.
Whether it’s exactly correct or not is somewhat beside the point, though. I think the SEO conclusions are likely to hold because they emerge directly from the use of neural networks, almost regardless of the exact algorithm Google uses.
The biggest takeaway, then, is that SEOs need to stop thinking in terms of single signals and think instead about groups of signals, some of which might seem quite unrelated to the typical measures of site quality which they’re used to.
EAT is (probably) thousands of patterns of odd signal groupings!
In my opinion, SEOs should stop thinking in terms of single signals for quality, and think instead about groups of sometimes seemingly unrelated signals, specific to their particular niche, some of which might seem quite unusual at first.
The future of SEO
What will SEO look like in the future then?
More about quality content and intent
I think it will continue to be more and more content led. I think content writers who just write 1000 words on anything and everything will probably be punished, and those who truly care about their subject will be rewarded.
I think many of the SEO tricks will become harder and harder to pull off. Google seems to have taken a huge sword to the traffic of a lot of affiliate sites this time. The owners feel aggrieved, but Google clearly doesn’t think they deserved the traffic; in its view they pushed their sites forward using loopholes.
Links to reduce in importance
Most of all, I can see a slow reduction in the passing of link strength as Google finds equally effective ways to determine quality – but ones which can’t be gamed as easily as buying guest posts.
It may be that links are already in the core EAT system and the neural network will decide to downgrade their relevancy in some specific cases. I don’t know.
One SEO method won’t be “everything”
SEOs will sometimes believe that “only my method works”.
For example, they might be into links and think that’s the only way to improve rankings. But in the neural net world, Google could start to have multiple pathways to excellent rankings. That might be links down one branch, but it might also be an exceptionally well designed site which fully covers a topic and has excellent user metrics (the number of users truly satisfied by the page).
I also think that the SEO tools will get further and further away from the truth, and it will take creative humans to satisfy other creative humans … or very clever AIs of course!
Neural network training
Finally, what I do suspect is there’s likely to be a flurry of SEOs learning about neural networks over the next year or two!
Recovering from a Google core update
If your site has experienced a traffic reduction, you need to look at the signals (plural!) which could have contributed to the drop.