Welcome to a new series of posts designed to help content administrators and digital marketers do more with Kentico. These posts will be short, focused reads that explain how to accomplish common tasks in Kentico that many end users may not know.
This first post in the series covers two options for preventing a document or collection of documents from being indexed by search engines. Why would you want to prevent search engines from finding your pages? One reason might be your custom error pages: you wouldn't want someone to click on a search result only to land on an error page. Another case might be "Thank You" pages that are displayed once a user completes a form. A specific series of events should lead someone to those pages, such as completing the form, and you wouldn't want them to start at the end of that journey. These are just a couple of simple examples; other reasons include protected or sensitive content that the public should not be able to navigate to.
Option #1: Excluding Single Documents
The first option for excluding a single document from search engine indexing is the "Exclude from Search" setting. To access this setting, click on the Properties tab for the given document and then select the Navigation option.

This action brings you to a tab where you can manage different navigation-related settings for the selected document. Check the Exclude from Search box in the Search & SEO section and then click the Save button at the top.

That's it: search engines will no longer index this document.
What really happens under the hood is that Kentico adds a specific HTML tag to the markup that tells search engine spiders not to index the page and not to follow the links on it. The tag is placed in the <head> section of the markup and looks like this:
<meta name="robots" content="noindex,nofollow" />
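If you want to confirm the tag is actually being rendered, you can inspect the page's markup programmatically. Below is a minimal sketch using Python's standard-library HTML parser; the `RobotsMetaFinder` class and the sample HTML string are illustrative, not part of Kentico. In practice you would fetch the live page's markup (for example with `urllib.request`) and feed it to the parser the same way.

```python
# Minimal sketch: detect a <meta name="robots"> tag in a page's markup.
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content value of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

# Sample markup standing in for a page saved with "Exclude from Search".
html = ('<html><head>'
        '<meta name="robots" content="noindex,nofollow" />'
        '</head><body></body></html>')

finder = RobotsMetaFinder()
finder.feed(html)
print(finder.directives)  # ['noindex,nofollow']
```

If the list comes back empty, the setting did not make it into the rendered page.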
Option #2: Robots.txt
The second option is to use a robots.txt file to block specific paths of the site from the spiders. If your site does not have a robots.txt file managed through Kentico, you should create one or have your development partner do so. Search engines and other crawlers, such as Twitter's link crawler, look for this file at the root of your domain and read it for instructions on how they should index your content. If no such file exists (or an empty one does), all content that the spiders find will be indexed. To prevent pages or sections from being indexed, you must add specific directives to the file. Let's take a look at an example:
User-agent: *
Disallow: /private-documents
This text tells all user-agents (or spiders) not to crawl any documents living at or under the /private-documents path of the site, so those pages will not be indexed by any compliant search engine. If you want to block multiple pages or directories, you simply add additional "Disallow" lines like this:
User-agent: *
Disallow: /private-documents
Disallow: /members-only
Disallow: /error-pages/404.aspx
In that example, we are excluding all documents at and under the /private-documents and /members-only sections, as well as the specific 404 page at /error-pages/404.aspx.
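You can sanity-check rules like these before publishing them. The sketch below uses Python's standard `urllib.robotparser` to evaluate the example rules against a few paths; the rules are parsed from a local string here, whereas a real crawler would fetch them from your site's /robots.txt.

```python
# Sketch: check which paths the example robots.txt rules actually block.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private-documents
Disallow: /members-only
Disallow: /error-pages/404.aspx
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Disallow entries are prefix matches, so sub-paths are blocked too.
print(parser.can_fetch("*", "/private-documents/report.pdf"))  # False
print(parser.can_fetch("*", "/members-only"))                  # False
print(parser.can_fetch("*", "/about-us"))                      # True
```

Note that Disallow rules match by prefix: blocking /private-documents also blocks everything beneath it.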
Ok, so now you know how to tell the spiders what not to index, but where do you actually edit this in Kentico? First, you have to know where your robots.txt document is. It will usually be named Robots and will live either in the root of the document list or in a folder of ad-hoc pages. Ask your developers if you still have trouble locating the document.
With the document selected, click on the Design tab.

If you do not see a Design tab, you do not have developer access rights. That is usually due to the roles and responsibilities defined for your website; in this case, contact your developer to make the necessary update. If you do see the Design tab, click on it to load the template view. From there, click Configure on the Custom Response web part.

This action opens a new window with the web part's properties. Place your updates in the Content text box and then click the Save & Close button.

That's it; your robots.txt is now updated. Navigate to www.yourdomain.com/robots.txt and you should see your updates.
There you have it!
Two simple ways to prevent specific content from being indexed. I hope you found this information helpful. If you have any questions about this topic or these solutions, please leave them in the comments below and I will gladly assist. Stay tuned for more Kentico Tips coming soon!