Why does Google keep one thin page and drop another that’s nearly the same?

Google drops the weaker of two near-duplicate pages without penalizing it, because what looks like a punishment is really a choice. When it finds pages that say nearly the same thing, it keeps the version with the stronger signals and sets the rest aside, selecting which one represents the group rather than flagging any of them as a copy. Reaching for the duplicate-content penalty to explain the disappearance gets the mechanism wrong; this is a referee making a call, not a foul being flagged.

The selection runs on signals, which is what makes it predictable rather than mysterious. Google weighs which version has the stronger backing: more links pointing to it, a cleaner fit for the query, a more recent or more complete treatment, and the canonical and internal-link cues that tell it which page you prefer. When those cues are consistent and well supported, Google usually honors them. When they are not, it overrides your declared preference and keeps whichever page it reads as strongest. Either way, it is comparing the two and crowning one, not docking the loser.

This is why the page that gets kept is rarely random. It is the one the signals favor, and the page that drops is the one those same signals leave weaker. Understanding that turns a frustrating outcome into an actionable one, because the deciding factors are things you can influence rather than a black box.

So rather than fearing a penalty on the page that disappeared, strengthen the version you want kept: point your internal links and the canonical at it, and make it the more complete and better-supported page. The way to control which page survives is to make it the obvious representative.
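One way to check whether your canonical cues are consistent is to confirm that every page in a near-duplicate group declares the same preferred URL. The sketch below is a minimal illustration, not a model of how Google evaluates pages: it uses Python's standard `html.parser`, and the names `CanonicalFinder`, `declared_canonical`, and `canonicals_agree`, along with the `example.com` URLs, are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def declared_canonical(html):
    """Return the one canonical URL a page declares, or None if it is missing or conflicting."""
    finder = CanonicalFinder()
    finder.feed(html)
    unique = set(finder.canonicals)
    return unique.pop() if len(unique) == 1 else None


def canonicals_agree(pages, preferred_url):
    """True when every page in the group declares preferred_url as its canonical."""
    return all(declared_canonical(html) == preferred_url for html in pages.values())


# Hypothetical near-duplicate group: the older page still points at itself.
pages = {
    "https://example.com/guide": '<head><link rel="canonical" href="https://example.com/guide"></head>',
    "https://example.com/guide-old": '<head><link rel="canonical" href="https://example.com/guide-old"></head>',
}
print(canonicals_agree(pages, "https://example.com/guide"))  # False
```

A result of `False` here means the pages disagree about which version you prefer, which is the kind of inconsistent cue described above. The check covers only the canonical tag; internal links and page completeness would need their own review.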

