
How to Prevent a Document from Being Indexed on Your Site

A Tip for Marketers Using Kentico
Welcome to a new series of posts designed to help content administrators and digital marketers do more with Kentico. These posts will be short, focused reads that explain how to accomplish common tasks in Kentico that many end users may not know.
 
This first post in the series covers two options for preventing a document or collection of documents from being indexed by search engines. Why would you want to prevent search engines from finding your pages, though? One reason might be your custom error pages. You wouldn't want someone to click on a search result only to land on your error page. Another case might be “Thank You” pages that are displayed once a user completes a form. There should be a specific path and series of events that lead someone to those pages, such as completing the form, and you wouldn't want them to start at the end of that journey. These are just a couple of simple examples. Other reasons can include protected or sensitive content that the public should not be able to find through search.
 

Option #1: Excluding Single Documents

The first option for excluding a single document from being indexed by search engines is the "Exclude from Search" setting. To access this setting, click on the Properties tab for a given document and then select the Navigation option.
This action will bring you to a tab where you can manage different navigation-related settings for the selected document. Check the box for Exclude from Search in the Search & SEO section and then click the Save button at the top.
That's it. Search engines will no longer index this document.
 
What really happens under the hood when you do this is that a specific HTML tag is added to the markup that tells the search engine spiders not to index this page and not to follow the links on it. The tag is placed in the <head> section of the markup and looks like this: <meta name="robots" content="noindex,nofollow" />
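If you or your developer want to confirm that the tag really made it into a page, a small script can look for it. The following is only a minimal sketch using Python's standard library; the function names robots_directives and is_excluded_from_search are my own for illustration and are not part of Kentico.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_excluded_from_search(html):
    """True when the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)
```

Pass it the HTML source of a page (for example, copied from your browser's View Source) and is_excluded_from_search reports whether a noindex directive is present.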
 

Option #2: Robots.txt

The second option is to use the robots.txt page to block specific paths of the site from the spiders. If your site does not have a robots.txt page that is managed through Kentico, you should create one or have your development partner do so. Search engines and other well-behaved crawlers, such as Twitter's, look for and read this document for instructions on which parts of your site they may crawl. If no such file exists (or an empty one does), all content that the spiders find can be indexed. To start preventing pages or sections from being indexed, you must add specific text to the document. Let's take a look at an example:

User-agent: *
Disallow: /private-documents


This text tells all user-agents (or spiders) to disallow any documents living at or under the /private-documents directory for the site. This means any URL on your site whose path begins with /private-documents will not be indexed by search engines that honor the file. If you want to add multiple pages or directories, you simply add additional "Disallow" lines like this:

User-agent: *
Disallow: /private-documents
Disallow: /members-only
Disallow: /error-pages/404.aspx


In that example, we are excluding all documents at and under the /private-documents and /members-only sections as well as the specific 404 page at /error-pages/404.aspx.
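To see how a crawler would read those rules, you can run them through Python's built-in urllib.robotparser module. This is a rough sketch, not a guarantee of how any particular search engine behaves. The helper names build_parser and is_blocked are mine, and the sample URLs on www.yourdomain.com (such as /about-us and report.pdf) are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same rules shown in the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-documents
Disallow: /members-only
Disallow: /error-pages/404.aspx
"""


def build_parser(robots_txt):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked(parser, url, user_agent="*"):
    """True when the rules tell this user agent not to fetch the URL."""
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in (
        "https://www.yourdomain.com/private-documents/report.pdf",
        "https://www.yourdomain.com/error-pages/404.aspx",
        "https://www.yourdomain.com/about-us",
    ):
        print(url, "blocked" if is_blocked(rules, url) else "allowed")
```

The rules match by path prefix, which is why everything under /private-documents is blocked while only the single 404 page under /error-pages is affected.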
 
OK, so now you know how to instruct the spiders on what not to index, but where do you actually edit this in Kentico? First, you have to know where your robots.txt document is. It will usually be named Robots and will live either in the root of the document list or in a folder of ad-hoc pages. Ask your developers if you still have trouble locating the document.
 
With the document selected, click on the Design tab.
If you do not see a Design tab, this is because you do not have developer access rights. That is usually due to the roles and responsibilities defined for your website and in this case, you should contact your developer to make the necessary update. If you do see the Design tab, click on it to load the template view. From there, click on Configure for the Custom Response web part.
This action opens a new window with some properties. Place your updates in the content text box and then click the Save and Close button.
That's it; your robots.txt is now updated. You can navigate to www.yourdomain.com/robots.txt, and you should see your updates. 
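If you would rather not eyeball the file every time, the check can be scripted. Here is a small sketch, again in Python. Both robots_url and missing_rules are hypothetical helpers of my own. Note that missing_rules simply looks for Disallow lines anywhere in the file and does not take the User-agent groups into account, so treat it as a quick sanity check only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site_url):
    """Build the robots.txt address for the site that serves site_url."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def missing_rules(robots_text, expected_paths):
    """Return the expected paths that have no matching Disallow line."""
    disallowed = set()
    for line in robots_text.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            disallowed.add(value.strip())
    return [path for path in expected_paths if path not in disallowed]
```

To run it against a live site, you could download the text with urllib.request.urlopen(robots_url("https://www.yourdomain.com/")) and pass the decoded response to missing_rules along with the paths you just added. An empty list back means every path you expected is there.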
 

There you have it!

Two simple ways to prevent specific content from being indexed. I hope you found this information helpful. If you have any questions about this topic or these solutions, please leave them in the comments below and I will gladly assist. Stay tuned for more Kentico Tips coming soon!
 

Author

Wiz E. Wig, Mascot & Director of Magic