How do you determine if Google sees your content as unique? A quick, easy test is to do a highly specific search. Copy and paste a sentence from your page into Google. If the result looks normal, Google does not detect any duplicate content issues. If, on the other hand, Google has found duplicate content issues, you may see only one page cited in the SERP (hopefully it’s yours) along with the following message:
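The quoted-phrase search described above can be scripted too. As a minimal sketch (the URL format is Google's standard search endpoint; the function name is our own), this builds an exact-match query by wrapping the sentence in double quotes:

```python
from urllib.parse import quote_plus

def exact_match_search_url(sentence: str) -> str:
    """Build a Google search URL that queries the sentence as an exact phrase."""
    # Wrapping the sentence in double quotes asks Google for an exact match,
    # which is what surfaces the duplicate-content omission notice.
    return "https://www.google.com/search?q=" + quote_plus(f'"{sentence}"')

url = exact_match_search_url("a sentence copied from your page")
print(url)
```

Opening the printed URL in a browser reproduces the manual copy-and-paste test.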
Google defines duplicate content in its guidelines as follows: “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”
According to Google’s Search Quality Senior Strategist, Andrey Lipattsev, Google does not have a duplicate content penalty. This statement was supported by Google's Webmaster Trends Analyst, John Mueller, during a regular Google Webmaster hangout session. He emphasized that there is “no duplicate content penalty” but “we do have some things around duplicate content … that are penalty worthy.” What does that mean exactly? Our interpretation is that you cannot expect to rank highly in Google with content that is duplicated from other, more trusted sites. You can check for duplication with free SEO tools.
Even with no duplicate content “penalty,” it is widely accepted that Google rewards quality, uniqueness, and the signals associated with adding value. Meanwhile, a critical component of cost-effective SEO is creating pages that are seen as unique but that also leverage existing content. This brings up a justified question: How much original information should appear on the page in order to be considered unique?
We tested this very question. Read on to find our results!
Hypothesis: At least 50% of a page needs to be unique
Our test was designed to find a “sweet spot” - the proportion of unique words that makes a page unique in the eyes of Google. To that end, we ran 5 tests, each with its own target keyword. The goal was to compare 2 pages (Page A and Page B), each with a different percentage of duplicate content, and determine the optimal percentage of unique text for various cases. Page A was indexed first; Page B was then indexed to test whether Google treated it as duplicate content.
Objective: To test whether a 50/50 ratio of duplicate to unique content is considered unique.
Objective: To test whether 2 blocks of duplicate content at a 50/50 ratio of duplicate to unique content is considered unique.
Objective: To test whether a 33/66 ratio of unique to duplicate content is considered unique.
Objective: To test whether an increased word count at a 33/66 ratio of unique to duplicate content is considered unique.
Objective: To test whether 2 blocks of duplicate content at a 40/60 ratio of unique to duplicate content is considered unique.
In order to test our hypothesis, we first had to determine when Google considers a page ‘unique’. Fortunately, testing has given us an insight into some of Google’s default tendencies. If the SERPs contain duplicate content, Google will omit the duplicate results and display the following:
This means that if the notice is not displayed in the SERPs for a target keyword, Google sees all the displayed results as unique. If the notice is displayed, we know that Google sees the omitted results as duplicates. The results of the five tests varied depending on the percentage of duplicate content.
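When checking many keywords, the presence or absence of that notice is the only signal you need to record. A minimal sketch (assuming you have already saved the visible SERP text to a string; the constant below is an excerpt of the notice, not an official API):

```python
# Excerpt of Google's omitted-results notice (assumed stable wording).
OMITTED_NOTICE = "we have omitted some entries very similar to"

def serp_flags_duplicates(serp_text: str) -> bool:
    """Return True if the SERP text contains the omitted-results notice,
    i.e. Google filtered out pages it judged to be duplicates."""
    return OMITTED_NOTICE in serp_text

print(serp_flags_duplicates("In order to show you the most relevant results, "
                            "we have omitted some entries very similar to the 1 already displayed."))
print(serp_flags_duplicates("10 results shown."))
```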
The objective for the first test was to determine whether a 50/50 ratio of unique to duplicate content is considered unique. In the image below, both pages were shown in SERPs, which leads to the conclusion that a 50/50 ratio of unique to duplicate content is seen as original by Google.
In the second test, we checked whether 2 blocks of duplicate content at a 50/50 ratio of unique to duplicate content would be considered unique. The test results were positive again.
The goal of test #3 was to determine whether a 33/66 ratio of unique to duplicate content is considered unique. Unlike the previous 2 tests, this time the SERPs displayed only one page along with the already familiar notice: “In order to show you the most relevant results, we have omitted some entries very similar to the 1 already displayed...”. This means that a 33/66 unique-to-duplicate ratio is considered duplicate content.
The setup for test #4 was very similar to test #3: the page again used a 33/66 ratio of unique to duplicate content. This time, however, we wanted to see whether the result would change if the unique word count was increased to 400. The content was still considered duplicate.
The goal of the final test was to determine whether a page with 60% duplicate content (300 duplicate words, split across 2 blocks) would be seen as unique. The results showed that this ratio of unique to duplicate content is still considered duplicate.
The bottom line: it seems a page needs at least a 50/50 ratio of unique to duplicate content to be judged unique. Regardless of how many words are used (100 unique words alongside 100 duplicate words are considered unique, whereas 400 unique words alongside 800 duplicate words are considered duplicate), the ratio appears to be the deciding factor.
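The rule of thumb above is simple enough to express as a one-line check. A sketch based on our finding (the 50% threshold is our empirical result, not a documented Google rule):

```python
def is_likely_unique(unique_words: int, duplicate_words: int) -> bool:
    """Classify a page using the test's finding: at least 50% of the
    body copy must be unique for Google to treat the page as unique."""
    total = unique_words + duplicate_words
    if total == 0:
        return False
    return unique_words / total >= 0.5

# Matches the examples from the tests above:
print(is_likely_unique(100, 100))  # True  - the 50/50 pages passed
print(is_likely_unique(400, 800))  # False - the 33/66 pages were filtered
```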
This is a pretty exciting result: not only does it significantly lessen the burden of copywriting, but it also helps answer a question that nearly every client will ask at some point.
Note: In this study we were using duplicate content in the body copy of text excluding page titles.
This experiment was run about a year ago. There has since been a more recent and exciting test on a very similar subject that we will publish very soon. So stay tuned!