Crawl stats report






If Google doesn’t index your website, then you’re pretty much invisible. You won’t show up for any search queries, and you won’t get any organic traffic whatsoever. Zilch. Nada. Zero.

Given that you’re here, I’m guessing this isn’t news to you. So let’s get straight down to business.

This article teaches you how to fix any of these three problems:

- Your entire website isn’t indexed.
- Some of your pages are indexed, but others aren’t.
- Your newly published web pages aren’t getting indexed fast enough.

But first, let’s make sure we’re on the same page and fully understand this indexing malarkey.


Google discovers new web pages by crawling the web, and then they add those pages to their index. They do this using a web spider called Googlebot.

Confused? Let’s define a few key terms.

- Crawling: The process of following hyperlinks on the web to discover new content.
- Indexing: The process of storing every web page in a vast database.
- Web spider: A piece of software designed to carry out the crawling process at scale.
- Googlebot: Google’s web spider.

Here’s a video from Google that explains the process in more detail:


https://www.youtube.com/watch?v=BNHR6IQJGZs

When you Google something, you’re asking Google to return all relevant pages from their index. Because there are often millions of pages that fit the bill, Google’s ranking algorithm does its best to sort the pages so that you see the best and most relevant results first.

The critical point I’m making here is that indexing and ranking are two different things.

Indexing is showing up for the race; ranking is winning.

You can’t win without showing up for the race in the first place.


How to check if you’re indexed in Google


Go to Google, then search for site:yourwebsite.com

The number of results shown is roughly how many of your pages Google has indexed.

If you want to check the index status of a specific URL, use the same operator with the full path: site:yourwebsite.com/web-page-slug

No results will show up if the page isn’t indexed.
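If you check many URLs this way, the search link can be generated rather than typed. This is a minimal Python sketch, not an official Google API: the helper name site_query_url is made up for illustration, and it only builds the ordinary search URL for the site: operator.

```python
from urllib.parse import urlencode


def site_query_url(target: str) -> str:
    """Build a Google search URL that runs the site: operator on `target`.

    `target` is a bare domain or domain plus path, without the scheme,
    for example "yourwebsite.com/web-page-slug".
    """
    # urlencode percent-encodes the colon and slashes in the query value.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{target}"})


if __name__ == "__main__":
    print(site_query_url("yourwebsite.com"))
    print(site_query_url("yourwebsite.com/web-page-slug"))
```

Open the printed link in a browser and read the result count as described above.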

Now, it’s worth noting that if you’re a Google Search Console user, you can use the Coverage report to get a more accurate insight into the index status of your website. Just go to:

Google Search Console > Index > Coverage

Look at the number of valid pages (with and without warnings).

If these two numbers total anything but zero, then Google has at least some of the pages on your website indexed. If not, then you have a severe problem because none of your website pages are indexed.


Sidenote.
Not a Google Search Console user? Sign up. It’s free. Everyone who runs a website and cares about getting traffic from Google should use Google Search Console. It’s that important.

You can also use Search Console to check whether a specific page is indexed. To do that, paste the URL into the URL Inspection tool.

If that page is indexed, it’ll say “URL is on Google.”

If the page isn’t indexed, you’ll see the words “URL is not on Google.”


How to get indexed by Google


Found that your website or web page isn’t indexed in Google? Try this:

1. Go to Google Search Console.
2. Navigate to the URL inspection tool.
3. Paste the URL you’d like Google to index into the search bar.
4. Wait for Google to check the URL.
5. Click the “Request indexing” button.

This process is good practice when you publish a new post or page. You’re effectively telling Google that you’ve added something new to your site and that they should take a look at it.

However, requesting indexing is unlikely to solve underlying problems preventing Google from indexing old pages. If that’s the case, follow the checklist below to diagnose and fix the problem.


1) Remove crawl blocks in your robots.txt file

Is Google not indexing your entire website? It could be due to a crawl block in something called a robots.txt file.

To check for this issue, go to yourdomain.com/robots.txt.

Look for either of these two snippets of code:

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /

Both of these tell Googlebot that it isn’t allowed to crawl any pages on your site. To fix the issue, remove them. It’s that simple.
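The same check can be scripted. The sketch below uses Python’s standard urllib.robotparser to test whether a robots.txt file would block Googlebot from a URL. The function name and sample rules are illustrative assumptions, and Python’s parser may not match Googlebot’s handling of wildcards exactly, so treat the result as a first check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if these robots.txt rules let Googlebot crawl `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical rule sets for illustration.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

BLOCK_SECTION = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(googlebot_allowed(BLOCK_ALL, "https://yourwebsite.com/blog/"))         # False
    print(googlebot_allowed(BLOCK_SECTION, "https://yourwebsite.com/blog/"))     # True
    print(googlebot_allowed(BLOCK_SECTION, "https://yourwebsite.com/private/x"))  # False
```

To test a live site, paste the content of yourdomain.com/robots.txt into the first argument.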

A crawl block in robots.txt could also be the culprit if Google isn’t indexing a single web page. To check if this is the case, paste the URL into the URL inspection tool in Google Search Console. Click on the Coverage block to reveal more details, then look for the “Crawl allowed? No: blocked by robots.txt” error.

This indicates that the page is blocked in robots.txt.

If that’s the case, recheck your robots.txt file for any “disallow” rules relating to the page or related subsection.

Remove where necessary.

2) Remove rogue noindex tags

Google won’t index pages if you tell them not to. This is useful for keeping some web pages private. There are two ways to do it:

Method 1: meta tag

Pages with either of these meta tags in their <head> section won’t be indexed by Google:

<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">

This is a meta robots tag, and it tells search engines whether they can or can’t index the page.
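For a quick spot check of a single page, the tag can also be detected in code. This is a minimal sketch using Python’s standard html.parser. The class and function names are made up for illustration, and it only looks at robots and googlebot meta tags in HTML you supply yourself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # The content value is a comma-separated list, e.g. "noindex, follow".
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a noindex meta robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))  # True
```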


To find all pages with a noindex meta tag on your site, run a crawl with thangvi.com’s Site Audit. Go to the Indexability report. Look for “Noindex page” warnings.

Click through to see all affected pages. Remove the noindex meta tag from any pages where it doesn’t belong.

Method 2: X‑Robots-Tag

Crawlers also respect the X‑Robots-Tag HTTP response header. You can implement this using a server-side scripting language like PHP, in your .htaccess file, or by changing your server configuration.
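If you already have a page’s response headers to hand, the check itself is simple. The sketch below is an illustration under stated assumptions: the function name is made up, it treats both noindex and none as blocking, and it ignores values prefixed with a specific user agent such as "googlebot: noindex".

```python
from typing import Mapping


def header_blocks_indexing(headers: Mapping[str, str]) -> bool:
    """Return True if an X-Robots-Tag response header asks not to index the page."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        directives = [d.strip().lower() for d in value.split(",")]
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            return True
    return False


if __name__ == "__main__":
    print(header_blocks_indexing({"Content-Type": "text/html", "X-Robots-Tag": "noindex"}))  # True
    print(header_blocks_indexing({"Content-Type": "text/html"}))  # False
```

The headers of a live page could be passed in from the object returned by urllib.request.urlopen(url), whose headers attribute supports items().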

The URL inspection tool in Search Console tells you whether Google is blocked from indexing a page because of this header. Just enter your URL, then look for the “Indexing allowed? No: ‘noindex’ detected in ‘X‑Robots-Tag’ http header” message.

If you want to check for this issue across your site, run a crawl in thangvi.com’s Site Audit tool, then use the “Robots information in HTTP header” filter in the Page Explorer.

Tell your developer to exclude pages you want indexed from returning this header.

Recommended reading: Robots meta tag and X‑Robots-Tag HTTP header specifications

3) Include the page in your sitemap

A sitemap tells Google which pages on your site are important, and which aren’t. It may also give some guidance on how often they should be re-crawled.

Google should be able to find pages on your website regardless of whether they’re in your sitemap, but it’s still good practice to include them. After all, there’s no point making Google’s life difficult.

To check if a page is in your sitemap, use the URL inspection tool in Search Console. If you see the “URL is not on Google” error and “Sitemap: N/A,” then the page is neither in your sitemap nor indexed.

Not using Search Console? Head to your sitemap URL (usually yourdomain.com/sitemap.xml) and search for the page.
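Searching a large sitemap by eye gets tedious, so here is a minimal sketch of the same lookup in Python. It assumes a standard XML sitemap using the sitemaps.org namespace. The function name and sample data are made up for illustration, and sitemap index files that point to other sitemaps are not followed.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_in_sitemap(sitemap_xml: str) -> set:
    """Return the set of URLs listed in <loc> elements of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


# Hypothetical sitemap content for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourwebsite.com/</loc></url>
  <url><loc>https://yourwebsite.com/blog/</loc></url>
</urlset>"""

if __name__ == "__main__":
    listed = urls_in_sitemap(SAMPLE_SITEMAP)
    print("https://yourwebsite.com/blog/" in listed)      # True
    print("https://yourwebsite.com/new-post/" in listed)  # False
```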

Or, if you want to find all the crawlable and indexable pages that aren’t in your sitemap, run a crawl in thangvi.com’s Site Audit. Go to Page Explorer and filter for indexable pages that are not in a sitemap.

These pages should be in your sitemaps, so add them. Once done, let Google know that you’ve updated your sitemap by pinging this URL:

http://www.google.com/ping?sitemap=http://yourwebsite.com/sitemap_url.xml

Replace that last part with your sitemap URL. You should then see a confirmation message.

That should speed up Google’s indexing of the page.

4) Remove rogue canonical tags

A canonical tag tells Google which is the preferred version of a page. It looks something like this:

<link rel="canonical" href="https://yourwebsite.com/page/">

Most pages either have no canonical tag, or what’s called a self-referencing canonical tag. That tells Google the page itself is the preferred and probably the only version. In other words, you want this page to be indexed.

But if your page has a rogue canonical tag, then it could be telling Google about a preferred version of this page that doesn’t exist. In which case, your page won’t get indexed.

To check for a canonical, use Google’s URL inspection tool. You’ll see an “Alternate page with canonical tag” warning if the canonical points to another page.

If this shouldn’t be there, and you want to index the page, remove the canonical tag.
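The difference between a self-referencing canonical and one that points elsewhere can also be checked in code. This is a minimal sketch using Python’s standard library. The names are made up for illustration, and the comparison is deliberately simple: it resolves relative URLs and ignores a trailing slash, nothing more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def canonical_status(page_url: str, html: str) -> str:
    """Classify the page's canonical tag as none, self-referencing, or points elsewhere."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonical:
        return "none"
    # Resolve relative hrefs against the page URL before comparing.
    target = urljoin(page_url, parser.canonical)
    if target.rstrip("/") == page_url.rstrip("/"):
        return "self-referencing"
    return "points elsewhere"


if __name__ == "__main__":
    html = '<head><link rel="canonical" href="https://yourwebsite.com/other/"></head>'
    print(canonical_status("https://yourwebsite.com/post/", html))  # points elsewhere
```

A "points elsewhere" result is not automatically a problem. It only flags a page worth reviewing by hand.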


Canonical tags aren’t always bad. Most pages with these tags will have them for a reason. If you see that your page has a canonical set, then check the canonical page. If this is indeed the preferred version of the page, and there’s no need to index the page in question as well, then the canonical tag should stay.


If you want a quick way to find rogue canonical tags across your entire site, run a crawl in thangvi.com’s Site Audit tool. Go to the Page Explorer and filter for pages in your sitemaps with non-self-referencing canonical tags.

Because you almost certainly want to index the pages in your sitemap, you should investigate further if this filter returns any results.

It’s highly likely that these pages either have a rogue canonical or shouldn’t be in your sitemap in the first place.

5) Check that the page isn’t orphaned

Orphan pages are those without internal links pointing to them.

Because Google discovers new content by crawling the web, they’re unable to discover orphan pages through that process. Website visitors won’t be able to find them either.

To check for orphan pages, crawl your site with thangvi.com’s Site Audit. Next, check the Links report for “Orphan page (has no incoming internal links)” errors.

This shows all pages that are both indexable and present in your sitemaps, yet have no internal links pointing to them.


This process only works when two things are true:

- All the pages you want indexed are in your sitemaps.
- You checked the box to use the pages in your sitemaps as starting points for the crawl when setting up the project in thangvi.com’s Site Audit.

Not confident that all the pages you want to be indexed are in your sitemap? Try this:

1. Download a full list of pages on your site (via your CMS).
2. Crawl your website (using a tool like thangvi.com’s Site Audit).
3. Cross-reference the two lists of URLs.

Any URLs not found during the crawl are orphan pages.
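The cross-referencing step is a set difference. Here is a minimal sketch in Python, assuming you have exported both lists as plain URLs. The function name and sample URLs are made up for illustration, and the only normalization applied is trimming whitespace and a trailing slash.

```python
def find_orphans(cms_urls, crawled_urls):
    """Return CMS URLs that the crawl never reached, sorted alphabetically."""

    def normalize(url):
        # Treat "https://site.com/page" and "https://site.com/page/" as the same URL.
        return url.strip().rstrip("/")

    crawled = {normalize(u) for u in crawled_urls}
    return sorted(u.strip() for u in cms_urls if normalize(u) not in crawled)


if __name__ == "__main__":
    # Hypothetical exports for illustration.
    cms = [
        "https://yourwebsite.com/",
        "https://yourwebsite.com/blog/",
        "https://yourwebsite.com/old-landing-page/",
    ]
    crawl = [
        "https://yourwebsite.com",
        "https://yourwebsite.com/blog/",
    ]
    print(find_orphans(cms, crawl))  # ['https://yourwebsite.com/old-landing-page/']
```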

You can fix orphan pages in one of two ways:

- If the page is unimportant, delete it and remove it from your sitemaps.
- If the page is important, incorporate it into the internal link structure of your website.

6) Fix nofollow internal links

Nofollow links are links with a rel="nofollow" tag. They prevent the transfer of PageRank to the destination URL. Google also doesn’t crawl nofollow links.

Here’s what Google says about the matter:

Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap.

In short, you should make sure that all internal links to indexable pages are followed.
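To spot-check a single page for nofollowed internal links, a short script is enough. This is a minimal sketch using Python’s standard library. The class name and sample HTML are made up for illustration, and "internal" is assumed to mean the same host as the page being checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class NofollowInternalLinks(HTMLParser):
    """Collects internal links on a page that carry rel="nofollow"."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # Resolve relative links so the host comparison works.
        target = urljoin(self.page_url, href)
        rels = (attr.get("rel") or "").lower().split()
        if urlparse(target).netloc == self.host and "nofollow" in rels:
            self.links.append(target)


if __name__ == "__main__":
    html = (
        '<a href="/guide/" rel="nofollow">Guide</a>'
        '<a href="/about/">About</a>'
        '<a href="https://other.com/" rel="nofollow">External</a>'
    )
    finder = NofollowInternalLinks("https://yourwebsite.com/blog/")
    finder.feed(html)
    print(finder.links)  # ['https://yourwebsite.com/guide/']
```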

To do this, use thangvi.com’s Site Audit tool to crawl your site. Check the Links report for indexable pages with “Page has nofollow incoming internal link only” errors.

Remove the nofollow tag from these internal links, assuming that you want Google to index the page. If not, either delete the page or noindex it.

Recommended reading: What Is a Nofollow Link? Everything You Need to Know (No Jargon!)

7) Add “powerful” internal links

Google discovers new content by crawling your website. If you neglect to link internally to the page in question, then they may not be able to find it.

One easy solution to this problem is to add some internal links to the page. You can do that from any other web page that Google can crawl and index. However, if you want Google to index the page as fast as possible, it makes sense to do so from one of your more “powerful” pages.

Why? Because Google is likely to recrawl such pages faster than less important pages.

To do this, head over to thangvi.com’s Site Explorer, enter your domain, then visit the Best by links report.

This shows all the pages on your website sorted by URL Rating (UR). In other words, it shows the most authoritative pages first.

Skim this list and look for relevant pages from which to add internal links to the page in question.

For example, if we were looking to add an internal link to our guest posting guide, our link building guide would probably offer a relevant place from which to do so. And that page just so happens to be the 11th most authoritative page on our blog.

Google will then see and follow that link next time they recrawl the page.


Paste the page from which you added the internal link into Google’s URL inspection tool. Hit the “Request indexing” button to let Google know that something on the page has changed and that they should recrawl it as soon as possible. This may speed up the process of them discovering the internal link and, consequently, the page you want indexed.


8) Make sure the page is valuable and unique

Google is unlikely to index low-quality pages because they hold no value for its users. Here’s what Google’s John Mueller said about indexing in 2018:

We never index all known URLs, that’s pretty normal. I’d focus on making the site awesome and inspiring, then things usually work out better.