Does the thought of duplicate content make you gasp? For many marketers, it does. They believe that Google and Bing see duplicate content as a malicious attempt to boost rankings and ruthlessly punish all offenders. These offenders then lose traffic and leads in addition to the trust of the search engine giants. This nightmare has been referred to as the dreaded "duplicate content penalty."
There is one problem: it's not real. It is built from misperceptions, misunderstandings, and half-baked information. You deserve the truth. This article will help you understand how search engines really treat duplicate content so that you can accept and embrace it, not fear it.
What is Duplicate Content?
We should start by understanding what defines duplicate content. Google states that "duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar." SEO authority Moz puts it more simply: "duplicate content is content that appears on the Internet at more than one URL." That is clear, easy to understand, and most likely what you thought the definition was anyway.
However, what may come as more of a surprise are some of the ways you can end up with duplicate content. Sure, there is the obvious way: purposely copying content from one page to another, either within your site or across sites. But did you know that even a single page can be seen as a duplicate of itself? Let’s take a look at some of the ways you can unknowingly end up with “duplicate content.”
Confusion by Variations of the Same URL
One way that a single page can be seen as a duplicate of itself is when the page is accessible via different URL structures. For example, say we have a page at www.bitwizards.com/duplicate-content. You could probably also reach this page at www.bitwizards.com/duplicate-content/ (with a trailing slash), at www.bitwizards.com/Duplicate-Content (with different capitalization), or even without the www, as just bitwizards.com/duplicate-content. We know that these are all the same page, but the search engines will not without a little help. They see four unique URLs that just happen to return the exact same (duplicate) content. Uh oh, right? The good news is that there are steps you can take to help them understand that these are in fact all the same page. But more on that later.
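To make the idea concrete, here is a minimal Python sketch of collapsing the four variants above into one preferred address. The conventions it enforces (https, the www host, lowercase paths, no trailing slash) are assumptions chosen for this example, not rules every site must follow; lowercasing paths, for instance, is only safe if your server treats paths case-insensitively.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed conventions for this sketch: https, the "www" host,
# lowercase paths, and no trailing slash.
PREFERRED_HOST = "www.bitwizards.com"

def normalize(url: str) -> str:
    """Map a URL variant to the single preferred form described above."""
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "bitwizards.com":
        host = PREFERRED_HOST
    path = parts.path.lower().rstrip("/") or "/"
    return urlunsplit(("https", host, path, parts.query, ""))

variants = [
    "www.bitwizards.com/duplicate-content",
    "www.bitwizards.com/duplicate-content/",
    "www.bitwizards.com/Duplicate-Content",
    "bitwizards.com/duplicate-content",
]

# As raw strings, these are four distinct URLs...
print(len(set(variants)))                 # 4
# ...but after normalization they collapse to one address.
print({normalize(v) for v in variants})   # {'https://www.bitwizards.com/duplicate-content'}
```

This mirrors what you want the search engines to conclude; the redirects and canonical links discussed later are how you actually tell them.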
The Problem with URL Parameters
In addition to the different URL variations, a single page can also be identified as duplicate content when URL parameters are added. Parameters are used for tracking purposes or to pass values for programming reasons, and they look like this: www.bitwizards.com/duplicate-content?source=email. The parameters are the key-value pairs added after the "?" in the URL, such as “source=email” in this example. Because search engines treat each unique URL as a different page, in many cases they see each combination of parameters as a duplicate of the same parameter-less URL. Don't fret: some of the same solutions for the URL variations above work in these cases too. We will get to those shortly.
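A short Python sketch, using only the standard library's urllib.parse, shows where the "source=email" pair sits in the URL and what the parameter-less URL looks like once it is removed. Dropping every parameter, as this sketch does, assumes that none of them change what the page displays.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

def strip_query(url: str) -> str:
    """Return the URL without its query string (and fragment)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

tracked = "https://www.bitwizards.com/duplicate-content?source=email"

# The key-value pairs that follow the "?"
print(parse_qsl(urlsplit(tracked).query))  # [('source', 'email')]
# Same content, but a different URL string from the parameter-less version
print(tracked == strip_query(tracked))     # False
print(strip_query(tracked))                # https://www.bitwizards.com/duplicate-content
```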
What About Cross-domain Duplicates?
These URL cases can cause duplicate content issues within the same website, but duplicate content can also occur across sites. A common scenario is guest blogging, which is when an author writes a piece of content for a publication other than their own. Guest blogging in itself does not create duplicate content, but in many cases the guest blogger will also post the piece on their own domain, which does result in duplicate content. Syndicating your content to other sites presents the same problem.
Some sites will also scrape your content to add it to their own. This is not usually done with malicious intent, and to readers you are still clearly presented as the author. However, if you don't take care, the search engines may not be able to identify you as the original author, which does present a problem. There are solutions for these cases, but before we talk about the proper ways to address these issues, we first need to understand what happens if we don't address them at all.
What is the Penalty for Duplicate Content?
The prevailing thought is that Google and Bing will penalize you if they determine your content has been duplicated. The "penalty" has been described as anything from a minor rankings impact to a significant hit to your domain authority. In reality, a penalty applies only in the most extreme cases, where search engines can clearly determine that your intent is deception and manipulation. Making that determination requires more context than simply finding copies of the same content. For more detail, read Google's own guidance on duplicate content.
The real problem with duplicate content is not that the search engines will enforce a penalty. It is that they will consider only one version of it for ranking signals and for display in search results. They do this because they focus on providing a good user experience and do not want multiple results that lead users to the same content when one will suffice.
What this means for you is that the page you want search engines to use for ranking and display may not be the one they select. If they choose another version that lives on your own site, the impact may be small. If they choose a version on another domain, that site will receive all of the search traffic your content would otherwise have earned. The impact will vary with the content and your strategy for that piece, but it can be, and usually is, significant. The good news is that you have some control to help the search engines “get it right.”
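The selection problem can be pictured with a deliberately simplified Python model. It is not how Google or Bing actually work (real search engines detect near-duplicates and weigh many signals); it only illustrates the point above. Pages with identical text are grouped, a URL the owner has signalled as the original wins, and without any signal the model falls back to an arbitrary choice that can land on another domain. The guest.example.com URL is hypothetical.

```python
import hashlib
from collections import defaultdict

def pick_representatives(pages: dict, preferred: set) -> dict:
    """Toy model: keep one URL per group of identical page bodies.

    pages maps URL -> page text; preferred holds URLs the owner has
    signalled as the original. Returns content hash -> chosen URL.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)

    chosen = {}
    for digest, urls in groups.items():
        signalled = [u for u in urls if u in preferred]
        # With no signal, fall back to an arbitrary (here alphabetical)
        # choice, which is not necessarily the page you wanted.
        chosen[digest] = min(signalled or urls)
    return chosen

ORIGINAL = "https://www.bitwizards.com/duplicate-content"
GUEST_COPY = "https://guest.example.com/duplicate-content"  # hypothetical
pages = {ORIGINAL: "Same article text", GUEST_COPY: "Same article text"}

print(list(pick_representatives(pages, set()).values()))       # ['https://guest.example.com/duplicate-content']
print(list(pick_representatives(pages, {ORIGINAL}).values()))  # ['https://www.bitwizards.com/duplicate-content']
```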
301 Redirects – All Roads Lead Home
One way to ensure that search engines (and users) always reach a single URL for a piece of content is to use 301 redirects. A 301 redirect is a search-engine-friendly instruction that immediately sends traffic from one URL to another, identifying the final URL as the master or original. Let's go back to our earlier example of different variations of the same URL. You can tell the search engines that www.bitwizards.com/duplicate-content is the master or original by setting up 301 redirects that point each of the other variations to that URL. This ensures that all of the other URLs still work (which is good) but ultimately end at the master URL. All roads lead home.
How you implement these 301 redirects depends on how your website is set up. Some content management systems (CMS), such as Kentico, give you an easy way to create and manage these redirects without technical skills. If your CMS does not provide this, you can still configure the redirects, but your webmaster will need to do so on the server.
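In practice these rules usually live in your CMS or web server configuration. Purely to illustrate the logic, here is a sketch built on Python's built-in http.server: every variant from the earlier example receives a 301 pointing at www.bitwizards.com/duplicate-content. The host handling and URL conventions are assumptions for this example, and it does not handle http-to-https redirects; a real site would configure the equivalent rules in its actual server rather than run this handler.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import urlsplit

# Assumed preferred host; the server is assumed to answer for both
# bitwizards.com and www.bitwizards.com.
CANONICAL_HOST = "www.bitwizards.com"

def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the master URL for a variant request, or None if it is the master."""
    host = host.split(":")[0].lower()  # ignore any port in the Host header
    master_path = path.lower().rstrip("/") or "/"
    if host == CANONICAL_HOST and path == master_path:
        return None
    return f"https://{CANONICAL_HOST}{master_path}"

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = urlsplit(self.path)
        target = redirect_target(self.headers.get("Host", CANONICAL_HOST), parts.path)
        if target is not None:
            if parts.query:
                target += "?" + parts.query  # keep any parameters on the hop
            self.send_response(301)          # Moved Permanently
            self.send_header("Location", target)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"<p>The master version of this page.</p>")

# To try it locally (blocks until interrupted):
# HTTPServer(("", 8000), RedirectingHandler).serve_forever()
```

Keeping the decision in a small function separate from the handler makes the redirect rules easy to check without starting a server.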
Another way to define the master or original URL for a piece of content is with canonical links. A canonical link is a small piece of HTML code that tells the search engines the exact URL of the master version of the content. It looks like this: <link href="https://www.bitwizards.com/duplicate-content" rel="canonical" /> and must be placed within the page's <head> tag. This link does not produce a redirect like the 301s we just discussed, but it accomplishes the same goal of helping the search engines understand where the original version lives. While 301s are a great solution within the same domain or website, they are not an acceptable solution for content that has been guest blogged or syndicated to another site. That site is using your content to provide value to its users, and it cannot do that if it automatically sends the traffic to your site. Canonical links are perfect in this scenario because they allow the other site to display your content while clearly telling the search engines who the original author is and where the original content lives.
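As a rough illustration of what a search engine reads from that tag, the Python sketch below builds the canonical link for the master URL and then pulls it back out of a page's <head> using the standard library's html.parser. The syndicated page is hypothetical, and a real crawler handles many more cases (for example, a rel attribute that lists several values).

```python
from html import escape
from html.parser import HTMLParser
from typing import Optional

def canonical_tag(url: str) -> str:
    """Build the canonical link element that belongs in a page's <head>."""
    return f'<link href="{escape(url, quote=True)}" rel="canonical" />'

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> found inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <link ... /> also arrive here via the
        # default handle_startendtag.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            values = dict(attrs)
            if (values.get("rel") or "").lower() == "canonical":
                self.canonical = values.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

# A hypothetical partner page that republishes the article.
syndicated_page = f"""<html><head>
<title>Republished: duplicate content</title>
{canonical_tag("https://www.bitwizards.com/duplicate-content")}
</head><body><p>Article text...</p></body></html>"""

finder = CanonicalFinder()
finder.feed(syndicated_page)
print(finder.canonical)  # https://www.bitwizards.com/duplicate-content
```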
Google and Bing Webmaster Tools
Both major search engines provide tools for managing various search-related tasks. These tools are powerful and help with more than just duplicate content issues. Let's return to the earlier problem with URL parameters: adding "?source=email" to the URL caused the search engines to see two different URLs for the same content and identify it as duplicate content. With these webmaster tools, you can address this problem by specifying parameters you want the search engines to ignore, such as "source" from our example. Once you do, the search engines will treat URLs like www.bitwizards.com/duplicate-content and www.bitwizards.com/duplicate-content?source=email as the same page.
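The webmaster tools are configured through their own web interfaces, so there is no code to write there. The effect of the setting can still be sketched in Python: once "source" is on the ignore list, URLs that differ only in that parameter collapse into one group, while parameters that do change the content are kept. The ignore list, the "twitter" value, and the /search page are hypothetical examples.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters the search engine has been told to ignore (assumed list).
IGNORED_PARAMS = {"source"}

def dedup_key(url: str) -> str:
    """URL with ignored parameters removed; other parameters kept in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

crawled = [
    "https://www.bitwizards.com/duplicate-content",
    "https://www.bitwizards.com/duplicate-content?source=email",
    "https://www.bitwizards.com/duplicate-content?source=twitter",
    "https://www.bitwizards.com/search?q=seo&source=email",  # hypothetical page
]

groups = defaultdict(list)
for url in crawled:
    groups[dedup_key(url)].append(url)

for key, urls in groups.items():
    print(key, len(urls))
# https://www.bitwizards.com/duplicate-content 3
# https://www.bitwizards.com/search?q=seo 1
```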
As you can see, duplicate content is nothing to fear. The only "penalty" is reserved for those who make a conscious effort to abuse the system. That is not to say duplicate content comes without problems and concerns; it certainly has some. But more often than not, these are things you can control with the right care and techniques.
I hope this has eased your mind and given you a better understanding of the issues around duplicate content. Sharing content across pages and domains is a valuable tactic for any marketing strategy. If you embrace it and do it right, you will find that duplicate content, for all the fear it inspires, can bring more good than harm.