And now we know what to do!
The thing is, it is very difficult for crawlers to identify a website's subject and understand its content if it relies on a lot of scripts. This makes indexing more complex: your pages need to go through an extra rendering process, which can take some time.
There are usually only two steps:
- The search engine sends the new URLs to the crawl queue;
- The data goes on to the indexing phase.
But script-heavy pages go through an additional rendering step. So if you don’t see your site in the search results for a long time, the amount of code could be to blame.
Imagine marketplaces with tens of thousands of pages, each of which has to go through several steps before it can be indexed. That doesn’t sound like great site performance, does it?
So can such a situation be improved? Sure! How to do it? Read on and you will find out!
Dropping JavaScript from your site's design and functionality altogether is not the best way out. Instead, you can make each page more crawler-friendly using the following techniques.
All the other optimization steps, such as effective SEO link building and regular analysis, will go much more smoothly and show better results if the initial access to each page goes well.
Try to avoid unnecessary page rendering
To reduce render-blocking scripts:
- Minify your code: remove unnecessary comments and extra spaces from the source.
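The idea behind minification can be sketched in a few lines. The following is a deliberately naive regex-based illustration, not a real minifier (real tools handle edge cases such as `//` inside string literals, which this sketch would mangle):

```python
import re

def naive_minify(source: str) -> str:
    """Very rough sketch: strip /* */ comments, // comments, and
    collapse runs of whitespace. Real minifiers do far more, safely."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", "", source)               # line comments
    return re.sub(r"\s+", " ", source).strip()             # extra spaces

js = """
/* analytics bootstrap */
var n = 1;   // counter
var  m =  2;
"""
print(naive_minify(js))  # 'var n = 1; var m = 2;'
```

In practice, use an established minifier in your build pipeline rather than anything hand-rolled; the point is only that comments and whitespace cost bytes that the crawler has to download.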
All essential data should be in the first HTML response for crawling
The first response must include:
- The page title;
- Its metadata;
- Other data stored in the <head> section of the code.
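You can check this yourself with plain text parsing, which is all a crawler's first pass amounts to. This sketch (the page markup is hypothetical) extracts the title and meta tags from raw HTML without executing any JavaScript:

```python
from html.parser import HTMLParser

class HeadChecker(HTMLParser):
    """Collects the <title> and named <meta> tags found in raw HTML,
    without executing any JavaScript (like a crawler's first pass)."""
    def __init__(self):
        super().__init__()
        self.title = None
        self.metas = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs:
            self.metas[attrs["name"]] = attrs.get("content", "")

    def handle_data(self, data):
        if self._in_title:
            self.title = data.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

# Hypothetical first HTML response: title and metadata are already there.
raw_html = """<html><head>
<title>Blue Widgets | Example Shop</title>
<meta name="description" content="Hand-made blue widgets.">
</head><body></body></html>"""

checker = HeadChecker()
checker.feed(raw_html)
print(checker.title)                 # 'Blue Widgets | Example Shop'
print(checker.metas["description"])  # 'Hand-made blue widgets.'
```

If a check like this comes back empty on your own pages, the title and description are probably being injected by JavaScript, and crawlers will only see them after the extra rendering step.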
This way, you show the search engine what your page is about instantly, before the rendering process even starts. This matters when you can’t avoid such a long check before the page is indexed. Creating a great first impression is crucial, and you can do it by giving crawlers a kind of preview in the first response.
This approach will also help you after a website redesign. dune’s book highlights this topic very well, covering the difficult question of recovering rankings after a “redecoration”.
Don’t forget to adjust tabbed pages
Make sure the original HTML response contains all the tabbed content. On product pages, for example, it is usually hidden from the user until they open the tab, but the search engine should still see it to understand what kind of page it is looking at and why it deserves a high ranking.
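The markup below is a hypothetical product page illustrating the point: the tab content is hidden from users with the `hidden` attribute, yet it sits in the initial HTML response, so a crawler can read it without rendering anything:

```python
# Hypothetical product-page markup: the "Specifications" tab content is
# hidden from users until they click the tab, but it is present in the
# initial HTML response, so a crawler's first (text-only) pass sees it.
raw_html = """
<div role="tablist">
  <button aria-controls="specs">Specifications</button>
</div>
<div id="specs" hidden>
  Weight: 1.2 kg. Material: anodized aluminium.
</div>
"""

crawlable = "anodized aluminium" in raw_html  # visible to the crawler's first pass
print(crawlable)  # True
```

Contrast this with tabs whose content is fetched by JavaScript only when clicked: that content never appears in the first response at all.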
Remember to assign a separate URL to each page
If you want the search engine to index every page on your site, you should assign a unique URL to each of them. Otherwise, it will be even more difficult for crawlers to familiarize themselves with your page and content.
It is not recommended to use URL fragments (the part after “#”) to serve new pages of the site. It’s hard for a search engine to understand which keywords you’re trying to rank for if so much content sits behind a single link. In addition, the fragments themselves will be ignored, which means you’re missing out on opportunities to rank.
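You can see why fragments are a problem with Python's standard URL parser: everything after `#` is never sent to the server, so every fragment-based "page" resolves to the same resource (the example URLs are made up):

```python
from urllib.parse import urlparse

# A fragment-based "page": the part after '#' is dropped from the
# HTTP request, so the server (and crawler) fetches the same resource
# no matter which fragment "page" the user is on.
fragment_url = urlparse("https://example.com/shop#/blue-widgets")
print(fragment_url.path)      # '/shop'
print(fragment_url.fragment)  # '/blue-widgets'  (never reaches the server)

# A unique, crawlable URL per page:
real_url = urlparse("https://example.com/shop/blue-widgets")
print(real_url.path)          # '/shop/blue-widgets'
```

Giving each page a real path (or using the History API on the client side to produce real paths) means each one can be queued, crawled, and ranked independently.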
Remember to include navigation data in the first HTML response
Include all essential navigation data in the initial HTML response. This means not only the main navigation, but also the footer and the sidebar. These contain many additional links that will help the site be “understood” faster and more easily.
It is better to create more pages for proper navigation and better crawl access: it is easier for search engines to process several smaller pages than one large one. Long scrolling may be convenient for users, but if the search engine doesn’t show your site in its results, you won’t have any users to impress.
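Link discovery from the first response is, again, just text parsing. This sketch (the markup is hypothetical) collects every link a crawler can find in the raw HTML of the header, sidebar, and footer, with no rendering involved:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects every href a crawler can discover without rendering."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Hypothetical page: header, sidebar, and footer links are all present
# in the initial HTML response, so all of them feed the crawl queue.
raw_html = """
<nav><a href="/shop">Shop</a><a href="/blog">Blog</a></nav>
<aside><a href="/shop/sale">Sale</a></aside>
<footer><a href="/contact">Contact</a></footer>
"""
collector = LinkCollector()
collector.feed(raw_html)
print(collector.links)  # ['/shop', '/blog', '/shop/sale', '/contact']
```

If your navigation is injected by JavaScript instead, this list comes back empty on the first pass, and link discovery has to wait for the rendering step.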
The problems it presents
If your browser took a few seconds to fully render the web page, and the page source did not contain much content, how will search engines find out what the page is about?
They have to render the page, much as your browser just did, but without displaying it on a screen. For this, search engines use a so-called “headless” browser.
In July 2016, Google said it knew of more than 130 trillion documents, and it is safe to assume that this number has grown massively since then.
Google simply does not have the capacity to render all these pages. It doesn’t even have the capacity to crawl all of them, which is why every site has an allocated crawl budget.
Websites also have an allocated rendering budget. This lets Google prioritize its rendering efforts, spending more time rendering the pages that users are most likely to search for.
We will explain each step of the process:
1. Crawl queue: keeps track of each URL to be crawled and is constantly updated.
2. Crawler: when the crawler (“Googlebot”) receives URLs from the crawl queue, it requests their HTML.
3. Processing: the HTML is analyzed, and:
   a. Found URLs are passed to the crawl queue for crawling.
   b. The need for indexing is assessed. For example, if the HTML contains a noindex meta robots tag, the page will not be indexed (nor will it be rendered!). The HTML is also checked for new or changed content; if the content has not changed, the index is not updated.
   c. URLs are canonicalized (note that this goes beyond the canonical link element: other canonicalization signals are considered as well, such as XML sitemaps and internal links).
4. Render queue: keeps track of each URL to be rendered and, like the crawl queue, is continuously updated.
5. Renderer: when the renderer (Web Rendering Service, or “WRS” for short) receives URLs, it renders them and sends the rendered HTML back for processing. Steps 3a, 3b, and 3c are repeated, but now using the rendered HTML.
6. Index: analyzes content to determine relevance, structured data, and links, and (re)calculates PageRank and search appearance.
7. Ranking: the ranking algorithm pulls information from the index to give Google users the most relevant results.
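The steps above can be sketched as a toy two-queue simulation. All names and data structures here are ours, not Google's; the point is only to show why a script-heavy page takes an extra round trip before it lands in the index:

```python
from collections import deque

# Toy sketch of the crawl/render pipeline (our own simplification):
# a page whose content only appears after JavaScript runs must pass
# through the render queue before it can be indexed.
crawl_queue, render_queue, index = deque(["/home"]), deque(), {}

def process(url, page, rendered=False):
    """Steps 3a-3c: discover URLs, decide on indexing, index content."""
    for link in page.get("links", []):          # 3a: feed the crawl queue
        if link not in index:
            crawl_queue.append(link)
    if page.get("noindex"):
        return                                  # 3b: never indexed or rendered
    if page.get("needs_js") and not rendered:
        render_queue.append(url)                # content needs the renderer
    else:
        index[url] = page["content"]

# A tiny made-up site: /about only has real content after rendering.
SITE = {
    "/home":  {"links": ["/about"], "content": "welcome"},
    "/about": {"needs_js": True, "content": "rendered about page"},
}

while crawl_queue or render_queue:
    if crawl_queue:
        url = crawl_queue.popleft()             # 2: crawler fetches raw HTML
        process(url, SITE[url])
    else:
        url = render_queue.popleft()            # 5: renderer executes the JS
        process(url, SITE[url], rendered=True)

print(index)  # both pages indexed; /about needed the extra round trip
```

In this run, `/home` is indexed straight from its raw HTML, while `/about` waits in the render queue first, which is exactly the delay the article describes for script-heavy pages.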
That’s worth the work. So analyze your site’s performance right now, and hire specialists to fix any problems you find!