Google indexes one of two near-identical pages because it is selecting a representative, not punishing a duplicate. When it finds pages whose main content is the same or nearly the same, it picks one as canonical and indexes that version, while the other is treated as an alternate rather than penalized. There is no duplicate-content penalty being applied here; this is deduplication, Google choosing which single version to show so it is not serving the same thing twice.
Which twin survives depends on which page Google judges to be the most complete and useful representative, weighted by a handful of signals you can actually influence. Google describes choosing the version that is objectively the most complete and useful for search users, and the signals that steer that choice are known. Redirects are the strongest signal: if one URL redirects to the other, the destination wins. A rel=canonical tag is a strong signal but a hint, not a command, so Google can override it if other evidence disagrees. Sitemap inclusion is a weaker nudge. On top of those, Google leans on which version your internal links consistently point to, and it prefers HTTPS over HTTP when the pages are otherwise equal.
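To see what your own rel=canonical signal currently says, you can read it straight out of a page's HTML. The following is a minimal sketch using only Python's standard library; the class name `CanonicalFinder`, the helper `find_canonicals`, and the sample markup with its example.com URL are all hypothetical, chosen for illustration. It reports what the page declares, not what Google ultimately selects.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html):
    """Return all canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page markup, used only to exercise the parser.
SAMPLE_HTML = """
<html><head>
<title>Blue widget</title>
<link rel="canonical" href="https://example.com/widgets/blue">
</head><body>...</body></html>
"""

print(find_canonicals(SAMPLE_HTML))
```

If the function returns more than one URL, the page is declaring competing canonicals, which is itself a mixed signal worth fixing.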
This is why the fix is about sending consistent signals, not about hunting for a penalty. When Google keeps the page you did not want, it usually means your signals are split: the canonical tag says one thing while your internal links, or a stray redirect, or the sitemap, point somewhere else. Google resolves that conflict by its own read of which page is the better representative, and that may not match your intent.
When two near-identical pages compete, you can predict and steer the outcome. Decide which version you want indexed, then line the signals up behind it: point internal links at it, set the canonical to it, include it in the sitemap, and serve it over HTTPS. When the signals agree, Google usually keeps the page you chose. When they conflict, Google falls back on its own judgment of which page is the better representative, so consistency is the part you control.
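A simple audit can make "line the signals up" concrete. The sketch below assumes you have already gathered, by whatever means, the URL each signal currently points at; the function name `find_split_signals`, the signal labels, and the example.com URLs are hypothetical. It only flags disagreement with your preferred URL and does not model or predict Google's actual choice.

```python
def find_split_signals(preferred, signals):
    """Return the signals that point somewhere other than the preferred URL.

    `signals` maps a signal name to the URL that signal currently targets.
    An empty result means every listed signal agrees with `preferred`.
    """
    return {name: url for name, url in signals.items() if url != preferred}


# Hypothetical, hand-collected observations for one pair of near-identical pages.
preferred = "https://example.com/widgets/blue"
observed = {
    "redirect_target": "https://example.com/widgets/blue",
    "rel_canonical": "https://example.com/widgets/blue",
    "sitemap_entry": "https://example.com/widgets/blue",
    "internal_links": "https://example.com/widgets/blue?ref=nav",
}

conflicts = find_split_signals(preferred, observed)

# The preferred version should also be the HTTPS one.
if not preferred.startswith("https://"):
    print(f"Preferred URL is not served over HTTPS: {preferred}")

for name, url in conflicts.items():
    print(f"{name} points at {url}, not {preferred}")
```

Here the internal links carry a tracking parameter, so they vote for a different URL than the canonical tag and sitemap do. The comparison is deliberately an exact string match: a trailing slash or query string is enough to make two URLs distinct for this check.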