Stop worrying about duplicate content

By the way some people talk about duplicate content, you’d think it was on par with selling drugs to kids on street corners.

“Don’t allow for duplicate content or Google will penalize you.”

“If you have duplicate content out there, you’ll lose your SEO ranking.”

“If you have duplicate content, that freaky demon that crawls out of the TV in The Ring will come for you.”

And here’s the thing: I can’t blame these people. There is so much false or, at best, misleading information on the internet about this topic. (I know. Crazy, right? Misleading information on the internet?)

Duplicate vs copied

But here’s the truth. Duplicate content in and of itself is not “bad,” and Google does not summarily penalize you for it. In fact, there are plenty of examples of duplicate content you might not even know exist:

When you have both http: and https: versions of your site
Versions of your site with and without the www (e.g. www.yoursite.com and yoursite.com)
Multiple descriptions of an item for sale on different shopping sites
Printer versions of a web page
Podcast descriptions on the podcast’s main site and the web pages of the corresponding podcast distribution sites for the episodes.
Etc.
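
To make the first two items concrete, here is a minimal Python sketch showing how the http/https and www/non-www variants of an address all collapse to one page once you normalize them. The normalize_url helper and its choice of https plus the bare domain as the "preferred" form are my own assumptions for illustration, not a description of how Google does it.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse scheme and www variants of a URL into one preferred form.

    Hypothetical helper: the preferred form (https, no www) is an assumption.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Treat an empty path and "/" as the same page.
    path = parts.path or "/"
    # Query strings and fragments are dropped to keep the sketch simple.
    return urlunsplit(("https", host, path, "", ""))


variants = [
    "http://www.yoursite.com",
    "https://www.yoursite.com/",
    "http://yoursite.com/",
    "https://yoursite.com",
]

# Four addresses that look different but point at the same home page.
unique = {normalize_url(u) for u in variants}
print(unique)
```

All four addresses reduce to a single entry, which is the sense in which they are duplicates of each other rather than four separate pages.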

Google recognizes that there are naturally occurring or “innocent” versions of duplicate content where there is no nefarious motive. But don’t take my word for it. In a group Q&A session, Andrey Lipattsev, Google’s Search Quality Senior Strategist, makes the distinction between duplicate and copied content (it starts at 4:16):

Basically, “duplicate” content represents content which, through no deceptive means, is duplicated across your domain or multiple domains. Google’s help center defines it as:

…substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin (emphasis mine).

That last part is the key. The kind of content that people worry about when they say things like “You can’t have duplicate content” is what Lipattsev was describing in the video: copied content, meaning purposeful attempts by creators to trick the Google bots into ranking their site (e.g. spam or scraper sites).

Why and when duplicate content should concern you

Just because Google might not specifically penalize a site for having duplicate content doesn’t mean there isn’t a legitimate concern to acknowledge. Specifically, you want to make sure that it is your site that Google serves or ranks higher on the SERP (search engine results page).

Here is how it works. Google takes all the pages that would qualify to be served on a SERP. Pages that have duplicate content are grouped into a cluster, and it is from that cluster that Google determines which one to rank highest. And therein lies the rub. Your site might not necessarily be the one Google picks.
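
If it helps to picture the clustering step, here is a toy Python sketch of the idea: group pages whose text matches, then keep one page per group. Everything in it is an assumption for illustration. The fingerprint and pick_representatives functions, the example.com-style URLs, and especially the single "authority" number are hypothetical stand-ins. Google's real signals are far richer, and it also clusters content that is only "appreciably similar," which this exact-match version does not attempt.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_representatives(pages):
    """Cluster pages with identical fingerprints and keep one per cluster.

    "authority" is a made-up score standing in for whatever signals
    a search engine actually uses to choose within a cluster.
    """
    clusters = defaultdict(list)
    for page in pages:
        clusters[fingerprint(page["text"])].append(page)
    return [max(group, key=lambda p: p["authority"]) for group in clusters.values()]


pages = [
    {"url": "https://podcast-host.example/ep3", "text": "Episode 3:  show notes", "authority": 20},
    {"url": "https://big-directory.example/ep3", "text": "episode 3: show notes", "authority": 90},
    {"url": "https://other.example/about", "text": "About us", "authority": 50},
]

winners = pick_representatives(pages)
print([p["url"] for p in winners])
```

Notice that the lower-scoring copy of the episode page is not thrown away or punished. It simply loses out to the other page in its cluster.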

Let me demonstrate with one of my own old sites. A few years ago I ran a podcast called Radio Film School. When I Google “radio film school episode 3,” here are the top search results:

Radio Film School SERP shows sites with duplicate content.

The webpage for the episode is the fifth result on the SERP. The corresponding episode webpages for Apple Podcasts, iHeart Radio, and Radio Public all appear above it.

The Dare Dreamer FM webpage for Radio Film School — The podcast website version of the episode.

Apple podcast page for Radio Film School — The Apple podcast webpage of the episode.

The iHeart Radio version of Radio Film School — The iHeart Radio version of the episode.

First, recognize that here you have four examples of duplicate content. None of the pages are “penalized” per se. They are all indexed and appear in the top five search results. But the page I might otherwise want to be ranked first is not. Why?

The answer is easy. I have not updated this website in over three years (I no longer produce new episodes of the show or new blog posts on this blog). So when determining which domain to rank higher, Google is going to take into consideration which pages a person is more likely to want to reach. It’s no surprise then that a search for a podcast episode returns the #1 podcast distributor as the top organic result.

Help Google pick your site

So it turns out that the best way to help ensure that it’s your site that is ranked higher in a cluster of duplicate content is to practice all the “healthy” SEO habits we already know and love: create strong, authoritative content that has numerous backlinks.

In the Google Q&A video above, Google’s Webmaster Trends Analyst John Mueller mentions that one way to help ensure your site gets ranked higher in those situations is to have aspects of your site that set it apart and make it unique. Keep in mind that at the end of the day, all Google cares about is serving results that will facilitate the user finding the best option for their query.

If there are two sites selling the same piece of furniture, and each has the same block of text, Google might look for a geographic indicator to determine which site is ranked for that search (e.g. if one of the stores has a location physically closer to the person conducting the search).

Syndicating your content

A practice that frequently raises the concern of duplicate content is content syndication. This is when you take an article or blog post you wrote for your own site and have it repurposed and republished on other sites. This is typically done as a strategy to increase exposure and topic authority. (A good example of this is when I write a blog post here and then also have it republished on Medium.)

There are a number of ways you can help ensure that Google sees your article as the original source. The easiest and most common is to have the syndicating site mark your article as the canonical source (i.e. add a rel=canonical link on their copy that points to your original URL). Most blogging platforms have forms that make it easy to enter this information:

The “Advanced” section of the Yoast SEO plugin.

If you syndicate your content to Medium, once you import your article, you can set the original URL as the canonical one:

You can help ensure Google sees your post as the original a few other ways:

Have the syndicating site “noindex” their version of your post. This tells Google not to index their version of the article.
Have the syndicating site give you an attribution link.
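
If you want to check whether a syndicated copy actually carries these signals, a small script can do it. The sketch below uses Python’s built-in html.parser to look for a rel=canonical link and a robots “noindex” meta tag in a page’s HTML. The HeadSignals class, the check_syndicated_copy function, and the sample markup and URLs are all hypothetical names I made up for this example. The sample page includes both tags only so that both checks have something to find. In practice a syndicating site would normally use one or the other.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the canonical URL and noindex directive from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonical = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = (attr.get("content") or "").lower()
            if "noindex" in content.replace(",", " ").split():
                self.noindex = True


def check_syndicated_copy(html: str, original_url: str) -> dict:
    """Report whether a copy points back to the original and/or is noindexed."""
    parser = HeadSignals()
    parser.feed(html)
    return {
        "canonical_points_to_original": parser.canonical == original_url,
        "noindex": parser.noindex,
    }


# Made-up markup for a syndicated copy of a post.
syndicated_html = """
<html><head>
  <link rel="canonical" href="https://yoursite.com/original-post/">
  <meta name="robots" content="noindex, follow">
</head><body>Re-published article</body></html>
"""

result = check_syndicated_copy(syndicated_html, "https://yoursite.com/original-post/")
print(result)
```

The comparison here is an exact string match, so a trailing slash or an http/https mismatch between the two URLs would count as “not pointing to the original.” That strictness is a design choice for the sketch, not a rule from Google.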

Unfortunately, sometimes sites “syndicate” your content by “borrowing” large sections of a post you wrote, but doing so without your permission. When I was managing editor for the Frame.io Insider, this happened a few times to articles we published. Most recently, an article I wrote for Pro Video Coalition was scraped by three or four spammy sites. It’s the nature of the beast.

Your best bet in situations like this is to reach out to the site and ask them to link back to you and give you proper attribution (if they haven’t already done so). If you have reason to believe their version of the page is hurting your ranking, you could report them to Google under the Digital Millennium Copyright Act. More often than not, though, it’s better not to do anything. Google is pretty good at figuring out which sites are “spammy” copycats and penalizing them rather than you.

Just make great content

The moral of the story is to do all you can to ensure your site provides content that is informative, authoritative, mobile-friendly, and optimizes the visitors’ experience (e.g. use of rich media, fast loading times, etc.). Make great content and follow healthy SEO habits. And if you need help doing so, I think I know someone who can help. 😀

Header image by Jørgen Håland on Unsplash
