Noindex, Nofollow, and Canonical: Index Control From A to Z
A definitive guide for technical SEOs on using noindex, nofollow, and canonical tags. Stop sending mixed signals and take control of your site’s indexation.
In this article
- Index Control: Why You Should Care
- The ‘Noindex’ Directive: Your Do-Not-Enter Sign for Googlebot
- Taming Links with ‘Nofollow’: When Not to Pass the Juice
- The Canonical Tag: Your Solution for Duplicate Content
- The ‘Noindex Nofollow Canonical’ Conundrum: Combining Directives
- Auditing Your ‘Noindex Nofollow Canonical’ Strategy with ScreamingCAT
Index Control: Why You Should Care
In the grand, chaotic universe of the web, not every page you create deserves a spot in Google’s index. Some pages are utility, some are temporary, and some are just plain thin. Understanding the intricate dance of noindex, nofollow, and canonical tags is non-negotiable for anyone serious about technical SEO.
These directives are your primary tools for telling search engines how to crawl and index your content. Get it right, and you guide crawlers toward your most valuable pages, consolidate link equity, and present a clean, efficient site architecture. Get it wrong, and you risk de-indexing your homepage or wasting your crawl budget on URLs that offer zero value.
This guide isn’t for beginners. We assume you know what a crawler is and why indexing matters. We’re here to dissect the nuances, debunk the myths, and give you a precise framework for using these powerful tools. Let’s get technical.
The ‘Noindex’ Directive: Your Do-Not-Enter Sign for Googlebot
The `noindex` directive is the most straightforward of the bunch. It’s a clear, unambiguous command to search engines: ‘Do not include this page in your search results.’ It’s the bouncer at the door of the SERPs.
You can implement `noindex` in two ways: as a meta tag in the `<head>` of your HTML, or as an HTTP response header via `X-Robots-Tag`. The meta tag is more common for HTML pages, while `X-Robots-Tag` is essential for non-HTML files like PDFs or images.
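As a rough illustration of checking both delivery mechanisms, here is a minimal Python sketch using only the standard library. The names `RobotsMetaParser` and `is_noindexed` are hypothetical, the headers are assumed to arrive as a plain dict, and header values scoped to a specific user agent (for example `googlebot: noindex`) are deliberately not handled.

```python
from html.parser import HTMLParser


def split_directives(value):
    """Turn 'noindex, follow' into {'noindex', 'follow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.directives |= split_directives(attr.get("content") or "")


def is_noindexed(html, headers):
    """True if the meta tag or the X-Robots-Tag header asks for noindex.

    Assumption: `headers` is a simple dict of response headers.
    The value `none` is treated as shorthand for `noindex, nofollow`.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives |= split_directives(value)
    found = parser.directives | header_directives
    return "noindex" in found or "none" in found
```

A PDF has no `<head>`, so in this sketch it can only ever be reported as noindexed through the header path, which mirrors why `X-Robots-Tag` matters for non-HTML files.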
Crucially, a page must be crawlable for the `noindex` tag to be seen. If you block a page in your robots.txt file, Googlebot will never see the `noindex` directive. The result? The URL might still get indexed if it’s linked to from elsewhere, just without a title or snippet. Always allow crawling for pages you want to de-index.
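The crawlability requirement can be checked before you rely on a `noindex`. The sketch below uses Python's standard `urllib.robotparser`; the function name `noindex_is_visible` and the sample rules are made up for illustration, and the standard-library parser does not necessarily match every detail of how Googlebot interprets robots.txt (wildcards, for instance), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt, url, user_agent="Googlebot"):
    """True if robots.txt lets the crawler fetch the URL, so a noindex on it can be read."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the thank-you pages.
robots_txt = """\
User-agent: *
Disallow: /thank-you/
"""

# Blocked from crawling: a noindex on this page would never be seen.
print(noindex_is_visible(robots_txt, "https://www.example.com/thank-you/"))

# Crawlable: a noindex on this page can be seen and honored.
print(noindex_is_visible(robots_txt, "https://www.example.com/search?q=shoes"))
```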
The meta tag version is a single line:
<meta name="robots" content="noindex">
When should you use `noindex`? The applications are numerous, but they all boil down to keeping low-value or private pages out of the public index.
- Staging or Development Environments: An obvious but often-missed use case. Noindex your entire staging site to prevent it from being indexed.
- Thank You Pages: These pages typically have no search value and can skew analytics.
- Internal Search Results: Faceted navigation and internal site search can create a near-infinite number of low-quality, duplicate URLs. Noindex them.
- Admin and Login Pages: These are for users, not search engines.
- Author Archives on Single-Author Blogs: On a blog with only one author, the author page is a duplicate of the main blog page. Noindex it.
Taming Links with ‘Nofollow’: When Not to Pass the Juice
Ah, `nofollow`. Born from a need to combat comment spam, this attribute has had a fascinating evolution. It started life as a directive not to pass PageRank, but in 2019 Google announced that `nofollow`, along with its new siblings `rel="sponsored"` and `rel="ugc"`, would be treated as ‘hints’ rather than strict commands.
What does ‘hint’ mean? It’s a polite way of saying they’ll probably honor your request not to pass equity or use the link for discovery, but they reserve the right to do what they want. For most practical purposes, you can still treat `nofollow` as a way to sculpt link flow, but with an asterisk.
The `nofollow` attribute can be applied to individual links or to all links on a page via the robots meta tag. Using the page-level `nofollow` is a blunt instrument; use it with extreme caution.
The primary use case remains the same: marking links you don’t editorially endorse. This includes paid or sponsored links (though `rel="sponsored"` is now preferred), links in user-generated content (use `rel="ugc"`), and links to sites you simply don’t trust. It’s a signal of dissociation.
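To see which links on a page carry these hints, a small parser is enough. The following is a sketch, not a full crawler: `LinkRelAuditor` and `audit_links` are hypothetical names, and it only looks at `<a>` elements that have an `href`.

```python
from html.parser import HTMLParser


class LinkRelAuditor(HTMLParser):
    """Records each <a href> together with any nofollow/sponsored/ugc hints."""

    HINTS = {"nofollow", "sponsored", "ugc"}

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, sorted list of hints)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="ugc nofollow".
        rel_tokens = set((attr.get("rel") or "").lower().split())
        self.links.append((href, sorted(rel_tokens & self.HINTS)))


def audit_links(html):
    auditor = LinkRelAuditor()
    auditor.feed(html)
    return auditor.links


sample = (
    '<a href="https://partner.example/">Editorial link</a>'
    '<a rel="sponsored" href="https://ads.example/">Paid placement</a>'
    '<a rel="ugc nofollow" href="https://forum.example/post">Comment link</a>'
)
for href, hints in audit_links(sample):
    print(href, hints or "no hint (treated as an endorsed link)")
```

Links that come back with an empty hint list are the ones you are implicitly vouching for, which makes them the first place to look for paid or user-generated links that were never marked.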
Good to know
Remember, Google may still crawl a nofollowed link. `Nofollow` is not a mechanism for preventing crawling or indexing. If you need to stop a page from being indexed, use `noindex`.
The Canonical Tag: Your Solution for Duplicate Content
The `rel="canonical"` link element is arguably the most misunderstood of the three. It is not a directive like `noindex`. It’s a suggestion, albeit a strong one, about which version of a set of duplicate or near-identical pages is the master copy.
When search engines find multiple pages with the same content, they don’t know which one to rank. This splits link equity and can lead to the wrong URL appearing in search results. The canonical tag solves this by pointing all variations to a single, authoritative URL. All ranking signals, like links, are then consolidated to that canonical URL.
This is a critical tool for managing the rampant duplicate content issues that plague modern websites. Common use cases include e-commerce sites with product variations, URLs with tracking parameters, and content syndication.
A self-referencing canonical—where a page’s canonical tag points to its own URL—is a best practice. It’s a clear signal that this page is the intended version and protects it from having its signals diluted by unforeseen parameter-based duplicates.
<link rel="canonical" href="https://www.example.com/preferred-url/" />
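To check whether a given page's canonical is self-referencing or points elsewhere, something like the following Python sketch would do. `CanonicalParser` and `canonical_status` are hypothetical names, only the first canonical link element is considered, and the comparison is an exact string match after resolving relative URLs, so differences such as trailing slashes or letter case count as different URLs here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Captures the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def canonical_status(page_url, html):
    """Return 'missing', 'self-referencing', or 'canonicalized'."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return "missing"
    target = urljoin(page_url, parser.canonical)  # resolve relative hrefs
    return "self-referencing" if target == page_url else "canonicalized"


html = '<head><link rel="canonical" href="https://www.example.com/preferred-url/" /></head>'

# The clean URL declares itself as the master copy.
print(canonical_status("https://www.example.com/preferred-url/", html))

# A tracking-parameter variant of the same page points back to the clean URL.
print(canonical_status("https://www.example.com/preferred-url/?utm_source=news", html))
```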
The ‘Noindex Nofollow Canonical’ Conundrum: Combining Directives
This is where things get messy. SEOs love to combine things, but with `noindex`, `nofollow`, and `canonical`, you’re often sending conflicting, nonsensical signals to search engines. Let’s clear the air.
The most egregious error is using `noindex` and `rel="canonical"` on the same page. Think about what you’re saying: ‘Hey Google, don’t index this page. But by the way, it’s a copy of this other page, so please consolidate all its ranking signals over there.’ It’s a complete contradiction.
Google’s John Mueller has called this combination a conflicting signal, and you cannot count on how it will be resolved. If the `noindex` wins, which is the safe assumption given that it is a hard directive while `canonical` is only a hint, the page gets dropped from the index, and any link equity it might have had is lost, not passed.
What about `noindex, follow`? This tells search engines to de-index the current page but to crawl the links on it and pass equity. It’s useful for de-indexing old archive or tag pages while still allowing link equity to flow to the articles they link to. In contrast, `noindex, nofollow` is a black hole—nothing gets indexed, and no equity flows out. Use it for pages you truly want to erase from Google’s map.
The core principle of the noindex nofollow canonical relationship is simple: a page is either an indexable master copy (or a duplicate that points to one via a canonical), or it’s a page you want removed from the index. It cannot be both.
Warning
Never place a `rel="canonical"` tag on a page that also has a `noindex` directive. You are sending contradictory signals, and the link equity will be lost, not consolidated.
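The either-or principle in this section can be written down as a small classifier. This is a sketch under stated assumptions: `check_signals` is a hypothetical helper, the robots directives are passed in as the raw `content` string, and only a canonical that points to a different URL is flagged as a conflict. Tighten the check if you want to flag every canonical on a noindexed page.

```python
def check_signals(page_url, robots_directives, canonical_url):
    """Classify a page by its combination of robots directives and canonical target.

    robots_directives: raw content string such as "noindex, follow" ("" if absent).
    canonical_url: absolute canonical URL, or None if the page has no canonical.
    """
    directives = {d.strip().lower() for d in robots_directives.split(",") if d.strip()}
    noindex = "noindex" in directives or "none" in directives
    points_elsewhere = canonical_url is not None and canonical_url != page_url

    if noindex and points_elsewhere:
        return "conflict: noindex plus canonical to another URL"
    if noindex:
        return "excluded: page asks to be kept out of the index"
    if points_elsewhere:
        return "duplicate: signals consolidate to the canonical URL"
    return "indexable: master copy"


url = "https://www.example.com/shoes/?color=red"
master = "https://www.example.com/shoes/"

print(check_signals(url, "", master))                     # duplicate
print(check_signals(url, "noindex, follow", master))      # conflict
print(check_signals(master, "", master))                  # indexable
print(check_signals(url, "noindex, nofollow", None))      # excluded
```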
Auditing Your ‘Noindex Nofollow Canonical’ Strategy with ScreamingCAT
Theory is great, but execution is everything. You need to audit your site to find where these directives are implemented—or mis-implemented. This is where a robust crawler like ScreamingCAT becomes indispensable.
After running a crawl, ScreamingCAT makes it painfully easy to find these tags. Navigate to the ‘Directives’ tab. Here, you’ll find every URL with a `noindex`, `nofollow`, `noarchive`, or other directive, along with its source (meta tag or X-Robots-Tag).
To find canonicalization issues, head to the ‘Canonicals’ report. You can filter for pages with a canonical link element, see which ones are self-referencing, and, most importantly, identify pages where the canonical points to a different URL. Cross-reference this list with the ‘Noindex’ filter from the Directives tab. Any URL that appears in both lists is a high-priority problem to fix.
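If you export both lists, the cross-reference is a set intersection. The Python sketch below assumes CSV exports with a URL column named `Address`; that column name and the file layout are assumptions for illustration, not a documented ScreamingCAT format, so adjust `column` to whatever your export actually contains.

```python
import csv
import io


def urls_from_export(csv_text, column="Address"):
    """Read one exported report and return the set of URLs in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def conflicting_urls(noindex_csv, canonicalized_csv, column="Address"):
    """URLs that are both noindexed and canonicalized to a different URL."""
    noindexed = urls_from_export(noindex_csv, column)
    canonicalized = urls_from_export(canonicalized_csv, column)
    return sorted(noindexed & canonicalized)


# Hypothetical export contents, inlined here so the example is self-contained.
noindex_export = """Address,Directive
https://www.example.com/tag/sale/,noindex
https://www.example.com/search?q=shoes,noindex
"""
canonicalized_export = """Address,Canonical
https://www.example.com/tag/sale/,https://www.example.com/sale/
https://www.example.com/shoes/?color=red,https://www.example.com/shoes/
"""

for url in conflicting_urls(noindex_export, canonicalized_export):
    print("Fix first:", url)
```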
Using ScreamingCAT, you can quickly build a complete picture of your noindex nofollow canonical implementation. You can export these lists, identify the conflicting signals, and systematically clean up your site’s instructions to search engines. Stop guessing and start crawling.
Without a crawler, you’re flying blind. You can’t fix what you can’t find.
Every competent technical SEO
Key Takeaways
- Use `noindex` to keep low-value or private pages out of search results, but ensure the page is not blocked by robots.txt.
- Use `rel="canonical"` to consolidate link signals for duplicate or similar pages, pointing them to a single authoritative version.
- Never use `noindex` and `rel="canonical"` on the same page. It’s a contradictory signal that results in the page being de-indexed and its link equity being lost.
- The `nofollow` attribute is a hint to search engines not to pass equity or follow a link, primarily used for untrusted or paid links.
- Regularly audit your site’s directives with a crawler like ScreamingCAT to find and fix conflicting signals and ensure proper index control.
Ready to audit your site?
Download ScreamingCAT for free. No limits, no registration, no cloud dependency.