How to check for duplicate content

Having duplicate content on your site can affect your SEO efforts, so knowing how to check for it and what to do about it can help give your rankings a boost. Read more in our guide.

You might know that you should try to avoid duplicate content on your site, but checking for (and removing) it is easier said than done.

Duplicate and similar content is a natural part of the web and often isn’t a problem for search engines that can canonicalise URLs. However, widespread duplicate content across your site can be a real issue.

In this blog post, we’ll cover everything you need to know about duplicate content, including:

  • What duplicate content is
  • Why you might have duplicate content 
  • Why duplicate content is a problem 
  • How to check for duplicate content
  • How to remove duplicate content

What is duplicate content?

Google describes duplicate content as content “within or across domains that either completely match other content or are appreciably similar”. In other words, the matching content can sit on different pages of your own website or on different websites, and it includes content that has only been slightly rewritten or reworded.

So, if you publish multiple pages on your website with the same or very similar content, this is duplicate content. If you copy another site’s content (or they copy your content), this is also duplicate content.
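To make “appreciably similar” more concrete, here is a minimal Python sketch that scores how much two pieces of text overlap. It is only an illustration: the three-word “shingle” size and the Jaccard measure are our own assumptions, not how Google or any particular tool decides what counts as a duplicate, and the sample sentences are made up.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all shingles, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Made-up sample copy for illustration.
original = "Our handmade oak table seats six people comfortably."
reworded = "Our handmade oak table seats eight people comfortably."
unrelated = "Read our guide to choosing the right garden furniture."

print(jaccard_similarity(original, original))   # identical copy
print(jaccard_similarity(original, reworded))   # one word changed
print(jaccard_similarity(original, unrelated))  # different topic
```

A score of 1.0 means the two texts are identical and 0.0 means they share no three-word sequences. Any cut-off you choose in between for “too similar” is a judgement call.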

Why might you have duplicate content?

In most instances, duplicate content is not intentional or malicious. You might have duplicate content on your site without meaning to, which is why knowing how to look for it is essential. You might find you have duplicate copy across landing pages, blog posts, product descriptions, product pages or meta descriptions.

However, some content is deliberately copied and used across multiple domains to increase traffic or rankings. Users get frustrated when a search returns the same information in multiple places, which is why search engines try to discourage this practice.

You might notice that some of your original content has been stolen or scraped and is being used on another website. This is known as content scraping and is a highly unethical practice.

Why is duplicate content a problem?

Duplicate content can cause issues with search engines. Search engines might find it challenging to decide which version of content is the most relevant if the content is the same or highly similar. They might decide to exclude duplicate content from the results page.

While there is no ‘duplicate content penalty’ (a common SEO myth), duplicate content can still harm rankings. Similar content can dilute PageRank, link equity and page authority, because links and other signals are split between the versions instead of building up one page. Each time a user searches for a particular keyword, the search engine has to choose which version of the content to show, and it might not always pick the same version or the one you would prefer.

Having lots of duplicate content can also lead to crawling inefficiencies and a poor user experience. 

How to check for duplicate content

There are several ways to check for duplicate content. We’ll cover the main options below.   

Use Google 

One of the simplest ways to check if your site has any duplicate content is to use Google. Copy approximately ten words from the start of a sentence and paste them into Google, wrapped in quotation marks so that Google searches for the exact phrase. Ideally, only the webpage you copied the phrase from should appear. If other pages appear, check them to see if they contain duplicate content.

Google Search Console

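Google Search Console is a free way to check for duplicate content on your site. Under ‘Performance’, you can check the ‘Search results’ report to find URLs that might be causing duplicate content issues. The ‘Pages’ report under ‘Indexing’ also lists URLs that Google has flagged as duplicates, with statuses such as ‘Duplicate without user-selected canonical’.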

Here are some issues that might be identified:

  • www and non-www versions of the same URL
  • HTTP and HTTPS versions of the same URL
  • URLs with and without a trailing slash “/”
  • URLs with and without query parameters
  • URLs with and without capitalisation
  • Multiple pages ranking for long-tail keywords
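If you have a list of URLs from a crawl or an export, the URL variants in the list above can be grouped with a short script. The sketch below is a rough illustration rather than a finished tool: it assumes that on your site the protocol, the “www.” prefix, letter case, trailing slashes and query parameters never change what a page shows, which is not true of every site. The example.com URLs are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalise_url(url: str) -> str:
    """Collapse common URL variants into one comparison key.

    Assumption: scheme, "www.", letter case, trailing slashes and
    query strings never change the page content on this site.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Lowercase the path and drop trailing slashes; keep "/" for the homepage.
    path = parts.path.lower().rstrip("/") or "/"
    return host + path


def group_variants(urls: list[str]) -> dict[str, list[str]]:
    """Return only the keys that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise_url(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Placeholder URLs for illustration.
crawled = [
    "http://example.com/Blog/",
    "https://www.example.com/blog",
    "https://example.com/blog?utm_source=newsletter",
    "https://example.com/contact",
]

for key, variants in group_variants(crawled).items():
    print(key, "->", variants)
```

Each group the script prints is a set of addresses that probably serve the same page, so each one is a candidate for a canonical tag or a redirect.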

Use free tools

There are several free duplicate content checkers you can use to make sure your content is unique, both to readers and to search engines.

Copyscape: After you input a URL, this tool searches the web for similar or duplicate content. The comparison tool highlights the parts of the content that are duplicated and tells you the percentage of your content that matches. The free version gives you a limited number of searches. However, the paid version gives you unlimited searches, deep searches and monthly monitoring for duplicate content.

Siteliner: This tool allows you to check your entire site for duplicate pages. You can scan your entire site once a month for free, and it will also identify broken links and give you information on page load time. 

Duplichecker: Registered users can complete up to 50 daily searches using text, DOCX files or URLs. This makes it easy to check any content you plan to post for duplicate or similar content.

Plagspotter: This tool identifies possible instances of duplicate pieces of content across the web and is great for checking if your content has been stolen or scraped. The paid version includes plagiarism monitoring, so you can automatically monitor your URLs to find duplicate content.   

Use paid tools

There are also a couple of premium plagiarism checkers that you can use to ensure that your content is original and won’t be attributed to someone who didn’t write it. 

Some popular paid tools include:

  • Grammarly (a writing assistant that includes a plagiarism checker alongside checks for grammar, word choice, spelling and more).
  • Plagiarismcheck.org (this plagiarism detector can find exact matches and paraphrased content).

How to remove duplicate content on your site

After you’ve identified duplicate content on your site, the next step is to either remove or manage the duplicate content. Here are a couple of options.

Rel=“canonical” tag

This is a snippet of code, added to the HTML head of a page, that tells search engine crawlers the page is a duplicate of a specified URL, so that the search engine can credit all links and ranking signals to the specified URL rather than to the duplicate.

Using a rel=“canonical” tag is an excellent option if the duplicated content doesn’t need to be removed entirely.
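For reference, a canonical tag is a link element in the page head. The Python sketch below shows one inside a sample page and reads it back using the standard library’s HTML parser, which is one way to spot-check which URL a page declares as canonical. The page markup, the example.com address and the find_canonical helper are all made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html: str) -> list[str]:
    """Return all canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical duplicate page that points at the preferred URL.
page = """
<html>
  <head>
    <title>Oak dining table (print version)</title>
    <link rel="canonical" href="https://example.com/oak-dining-table">
  </head>
  <body>...</body>
</html>
"""

print(find_canonical(page))
```

An empty list means the page declares no canonical URL. More than one entry is worth investigating, because the page is then sending mixed signals about which version it prefers.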

301 Redirects

This tells the search engine crawler that all traffic and ranking power should be redirected from one page to another. You should redirect traffic to the version of the content that is performing best in SERPs. Redirecting traffic from multiple similar pages will result in one stronger and more relevant page that should perform better.
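In practice, 301 redirects are usually set up in your web server, hosting panel or CMS rather than written by hand. Purely to show what happens behind the scenes, here is a small Python sketch of a web application that answers retired URLs with a 301 status and a Location header. The paths in REDIRECTS and the simulate_request helper are invented for this example.

```python
# Hypothetical mapping of retired duplicate paths to the page being kept.
REDIRECTS = {
    "/oak-table-old": "/oak-dining-table",
    "/products/oak-table": "/oak-dining-table",
}


def app(environ, start_response):
    """WSGI app: permanently redirect retired paths, serve the rest."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Placeholder page content"]


def simulate_request(path: str):
    """Call the app directly and return (status, Location header or None)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"].get("Location")


print(simulate_request("/oak-table-old"))
print(simulate_request("/oak-dining-table"))
```

The first call reports the 301 status along with the address of the page being kept, while the second is served normally with no redirect.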

Robots Meta Noindex, Follow Tag

This tag is a snippet of code you can add to the HTML head of a page. It prevents that page from appearing on the results page but still allows search engines to crawl its links.

The noindex, follow tag works well for content that spans multiple pages and therefore multiple URLs. Adding the tag to every page after the first means only the first page shows in the search results.
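As a quick illustration, the tag is a meta element named “robots” whose content lists the directives. The Python sketch below includes one in a sample snippet and reads the directives back, which can help when you want to confirm which pages of a series carry it. The markup and the robots_directives helper are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the robots directives declared in the given HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical second page of a multi-page article.
page_two = '<head><meta name="robots" content="noindex, follow"></head>'

print(sorted(robots_directives(page_two)))
```

An empty set means the page has no robots meta tag, so nothing on the page itself is stopping it from being indexed.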

How to remove duplicate content from other sites

If your content has been stolen or scraped, it might appear on other sites without your knowledge, permission or attribution. If this happens, the first step should be to contact the site owner and ask them to remove your content.

If the owner doesn’t respond or refuses to remove the stolen content, the next step is to contact the website’s hosting service. You can find out who this is by going to ‘Who Is Hosting This’. The website hosting service should act quickly to remove the duplicate content or take down the entire site.

If another site is getting high rankings using your content, you might want to consider filing a Digital Millennium Copyright Act (DMCA) complaint. You can do this through Google Webmaster Tools (now called Google Search Console). However, the process can be complex, so you should only do this if the content scraping is a serious issue.

Need help with your SEO strategy?

If you’re struggling to get to grips with duplicate content or your wider SEO strategy, our team can help. 

Logica Digital has over 15 years of experience with all aspects of SEO. Our team will start by assessing your site’s current performance and rankings, looking at elements like site speed, indexing, duplicate content and mobile friendliness. We can ensure that duplicate content isn’t holding your site back. 

Why not take advantage of our free digital marketing audit? We’ll look at your entire digital marketing strategy (including SEO and content) and give you some suggestions about where you can focus your efforts to see the best results. 

Get in touch with a member of our team today to learn more about what we do and how we can help you get the results you want!

Blog written by

Kezia Humphries
Content Executive