Exclude specific Sitecore files from Google search

A customer recently asked us to exclude specific types of files stored in the Sitecore Media Library from being indexed by Google Search. We considered three approaches:

  • Creating a separate download page with a CAPTCHA that would prevent any non-human user from downloading, and hence indexing, the file;
  • Encrypting the files with a password that would be displayed next to the download link;
  • Adding a special “X-Robots-Tag” to the response headers that would instruct crawlers not to index the content.
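
To make the third option concrete, here is a minimal sketch of the header a crawler would receive. It is written in Python purely for illustration; the helper name `robots_header` is hypothetical and is not part of Sitecore or of our implementation.

```python
def robots_header(directives=("noindex",)):
    """Build the (name, value) pair for an X-Robots-Tag response header.

    Multiple directives are joined with commas, e.g. "noindex, nofollow".
    """
    return ("X-Robots-Tag", ", ".join(directives))


name, value = robots_header()
print(f"{name}: {value}")  # prints "X-Robots-Tag: noindex"
```

A crawler that honors this header can still download the file but should not add it to its index, so human visitors see no change in behavior.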

After reviewing the options with the customer, we ruled out the first two because they were kludgy and not user-friendly. We decided to implement the third option.

We began by adding a new checkbox field called “No Index” to the default File template, located at /sitecore/templates/System/Media/Unversioned/File. The checkbox lets the author designate which files should be excluded from external search engine indexes.


The next step is to implement an event handler that processes the “No Index” field. The handler adds the appropriate response header to prevent indexing. Here is our code example:
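
Independently of the Sitecore-specific code, the core decision the handler makes can be sketched in a few lines. The sketch below is in Python for illustration only and does not use the Sitecore API; the function name `apply_no_index` and the dictionary-based item and response are hypothetical stand-ins, and it assumes a ticked checkbox field has the raw value "1".

```python
NO_INDEX_FIELD = "No Index"  # name of the checkbox field added to the File template


def apply_no_index(item_fields, response_headers):
    """Add "X-Robots-Tag: noindex" when the media item's checkbox is ticked.

    item_fields      -- mapping of field name to raw value (assumption: "1" means checked)
    response_headers -- mutable mapping of headers for the outgoing response
    """
    if item_fields.get(NO_INDEX_FIELD) == "1":
        response_headers["X-Robots-Tag"] = "noindex"
    return response_headers


# A flagged file gets the header; an unflagged one is served unchanged.
flagged = apply_no_index({"No Index": "1"}, {"Content-Type": "application/pdf"})
unflagged = apply_no_index({"No Index": ""}, {"Content-Type": "application/pdf"})
```

Keeping the check this narrow means files without the flag are served exactly as before, so existing media links are unaffected.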

 

 

The final step is to register the event handler in Sitecore.  This is done by creating a new configuration file in the \Website\App_Config\Include folder with the following content:

 

 



About the author
Steven Pogrebivsky

Steve Pogrebivsky is an expert in information and content management systems with over 20 years of experience. Steve holds a BS in Computer/Electrical Engineering and an MBA in Information Systems from Drexel University.