Exclude specific Sitecore files from Google search
A customer recently asked us to exclude specific types of files stored in the Sitecore Media Library from being indexed by Google Search. We considered three different approaches:
- Creating a separate download page with a CAPTCHA that would prevent any non-human user from downloading, and hence indexing, the file;
- Encrypting the files with a password that would be displayed next to the download link;
- Adding an “X-Robots-Tag” response header that instructs crawlers not to index the content.
After reviewing the options with the customer, we ruled out the first two because they were kludgy and not user-friendly, and decided to implement the third.
We began by adding a new checkbox field called “No Index” to the default File template, located at /sitecore/templates/System/Media/Unversioned/File. The checkbox lets the author mark individual files for exclusion from external search engine indexes.
The next step is to implement an event handler that processes the “No Index” field. When the checkbox is set on the requested media item, the handler adds the appropriate response headers to prevent indexing and streams the file itself. Here is our implementation:
using System;
using System.Linq;
using System.Net;
using System.Web;
using Sitecore;
using Sitecore.Configuration;
using Sitecore.Data.Items;
using Sitecore.Events;
using Sitecore.Resources.Media;
using Sitecore.Web;

namespace MyProject.EventHandlers
{
    public class AddNoIndexHeaderEventHandler
    {
        private const string NoIndexFieldName = "No Index";

        public void OnMediaRequest(object sender, EventArgs args)
        {
            // Leave requests made from the Sitecore client (shell site) untouched.
            if (Context.Site.Name.Equals(Constants.ShellSiteName, StringComparison.InvariantCultureIgnoreCase))
                return;

            // "as" yields null for an unexpected type instead of throwing.
            var sitecoreEventArgs = args as SitecoreEventArgs;
            if (sitecoreEventArgs == null || !sitecoreEventArgs.Parameters.Any())
                return;

            var request = sitecoreEventArgs.Parameters[0] as MediaRequest;
            if (request == null)
                return;

            Media media = MediaManager.GetMedia(request.MediaUri);
            if (media == null || media.MediaData == null)
                return;

            // A checked Sitecore checkbox field stores the value "1".
            Item item = media.MediaData.MediaItem;
            if (item == null || item[NoIndexFieldName] != "1")
                return;

            HttpResponse response = HttpContext.Current.Response;
            response.AddHeader("Content-Disposition", string.Format("attachment;filename=\"{0}.{1}\"", item.Name, media.Extension));
            // Instruct crawlers not to index the file or follow links in it.
            response.AddHeader("X-Robots-Tag", "noindex, nofollow");
            response.StatusCode = (int)HttpStatusCode.OK;

            // Stream the file ourselves so the headers above are sent with it.
            using (MediaStream stream = media.GetStream(request.Options))
            {
                response.ContentType = stream.MimeType;
                var fileContent = stream.Stream;
                response.AddHeader("Content-Length", fileContent.Length.ToString());
                WebUtil.TransmitStream(fileContent, response, Settings.Media.StreamBufferSize);
            }

            // End the response so the default media handler does not run as well.
            response.Flush();
            response.End();
        }
    }
}
The final step is to register the event handler in Sitecore. This is done by creating a new configuration file in the \Website\App_Config\Include folder with the following content:
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <event name="media:request">
        <handler type="[Namespace].AddNoIndexHeaderEventHandler, [ASSEMBLY]" method="OnMediaRequest" />
      </event>
    </events>
  </sitecore>
</configuration>
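Once the handler is registered, it is worth confirming that a flagged file really comes back with the header. The following is a minimal sketch of such a check, not part of the original solution: the class name RobotsHeaderCheck, the HasNoIndex helper, and the sample URL are all hypothetical, and the helper only recognises plain directives such as "noindex, nofollow" (it does not handle a user-agent prefix like "googlebot: noindex").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for checking the X-Robots-Tag header on a media URL.
public static class RobotsHeaderCheck
{
    // Returns true when any header value contains the "noindex" directive.
    public static bool HasNoIndex(IEnumerable<string> headerValues)
    {
        return headerValues
            .SelectMany(value => value.Split(','))
            .Any(directive => directive.Trim().Equals("noindex", StringComparison.OrdinalIgnoreCase));
    }

    public static async Task Main(string[] args)
    {
        // Assumed URL: replace with a media item that has "No Index" checked.
        string url = args.Length > 0 ? args[0] : "https://example.com/-/media/files/sample.pdf";

        using (var client = new HttpClient())
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        // Read only the headers; the file body is not needed for this check.
        using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
        {
            IEnumerable<string> values;
            bool present = response.Headers.TryGetValues("X-Robots-Tag", out values);

            Console.WriteLine(present && HasNoIndex(values)
                ? "X-Robots-Tag with noindex is present"
                : "X-Robots-Tag with noindex is missing");
        }
    }
}
```

Run it against one file with the checkbox set and one without. Only the first should report the header, since the handler returns early for unflagged items and leaves them to the default media pipeline.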
