Your sitemap should include only the canonical, indexable URLs you actually want found, and exclude everything else. That means leaving out noindexed pages, URLs that redirect, non-canonical duplicates, and parameter or filter junk. A sitemap is not an inventory of every address your site can serve; it is a curated list of the pages you are asking search engines to index, and the cleaner that list, the clearer the request.

The reason for the rule is that a sitemap functions as a clarity signal. When every URL in it is a real, canonical, indexable destination, you are telling search engines plainly which pages matter and where to focus. Stuff it with redirects, duplicates, noindexed pages, and parameter variations, and you blur that signal: the file now mixes pages you want indexed with pages that cannot or should not be, and search engines have to separate the wheat from the chaff you handed them. A bloated sitemap undermines the very clarity it exists to provide, and it can surface confusing warnings in reporting tools such as Google Search Console when submitted URLs turn out to be noindexed or redirected.

So the inclusion rule is simple to apply: a URL belongs in the sitemap only if it returns a 200 status, is the canonical version, and is meant to be indexed. If a URL redirects, declares a canonical pointing elsewhere, carries a noindex directive, or is a throwaway parameter variant, it does not belong there. This is not about size for its own sake; a large site with many genuinely indexable pages should have a large sitemap. The point is to exclude the URLs that contradict the file’s purpose.
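The rule can be written as a single yes/no check per URL. The sketch below is a minimal illustration that assumes you already have crawl data for each URL: the status code fetched without following redirects, the `rel="canonical"` target, and whether a noindex directive is present. The `PageInfo` record and the `JUNK_PARAMS` / `JUNK_PREFIXES` lists are hypothetical choices for illustration, not part of any library. Canonical URLs are compared as exact strings, so a real pipeline would probably need to normalize scheme, host, and trailing slashes first; a missing canonical is treated as self-referencing rather than disqualifying.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit, parse_qsl

# Hypothetical "throwaway" query parameters; adjust to your own site's
# tracking, sorting, session, and filter parameters.
JUNK_PARAMS = {"sort", "filter", "sessionid", "ref"}
JUNK_PREFIXES = ("utm_",)


@dataclass
class PageInfo:
    """Crawl facts about one URL (an illustrative record, not a library type)."""
    url: str
    status: int               # HTTP status returned without following redirects
    canonical: Optional[str]  # href of rel="canonical", or None if absent
    noindex: bool             # True if a robots meta tag or X-Robots-Tag says noindex


def has_junk_params(url: str) -> bool:
    """True if any query parameter looks like tracking, sorting, or filter noise."""
    for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        k = key.lower()
        if k in JUNK_PARAMS or k.startswith(JUNK_PREFIXES):
            return True
    return False


def belongs_in_sitemap(page: PageInfo) -> bool:
    """Keep a URL only if it is a 200, self-canonical, indexable, and junk-free."""
    if page.status != 200:  # redirects (3xx) and errors (4xx/5xx) are out
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False        # canonical points elsewhere: a non-canonical duplicate
    if page.noindex:        # explicitly asked not to be indexed
        return False
    if has_junk_params(page.url):  # parameter or filter variant
        return False
    return True
```

For example, `belongs_in_sitemap(PageInfo("https://example.com/old", 301, None, False))` is `False` because the URL redirects, while a self-canonical 200 page with no noindex and no junk parameters passes.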

To put this to work, run your current sitemap against those four exclusions and prune anything that fails. Strip out the redirects, the non-canonical duplicates, the noindexed pages, and the parameter junk, so what remains is a tight list of canonical indexable URLs. The result is a sitemap that says exactly what you mean.
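The pruning pass might look something like the following sketch, which uses only Python's standard library to parse a sitemap in the sitemaps.org 0.9 namespace and drop every `<url>` whose `<loc>` fails a check. The `facts` table is made-up sample data standing in for real crawl results, and the `"?" not in url` test is a deliberately crude stand-in for a proper parameter filter; substitute whatever checks your own crawler reports.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # serialize without an "ns0:" prefix


def prune_sitemap(xml_text: str, keep: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Remove each <url> whose <loc> fails `keep`; return new XML and dropped URLs."""
    root = ET.fromstring(xml_text)
    dropped: List[str] = []
    # findall returns a list, so removing children while looping is safe.
    for url_el in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url_el.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if not keep(loc):
            root.remove(url_el)
            dropped.append(loc)
    return ET.tostring(root, encoding="unicode"), dropped


# Hypothetical crawl results: URL -> (status, canonical, noindex).
facts = {
    "https://example.com/": (200, "https://example.com/", False),
    "https://example.com/old": (301, None, False),
    "https://example.com/shoes?sort=price": (200, "https://example.com/shoes", False),
    "https://example.com/thanks": (200, None, True),
}


def keep(url: str) -> bool:
    """The four exclusions; URLs missing from `facts` are treated as failing."""
    status, canonical, noindex = facts.get(url, (0, None, False))
    return status == 200 and canonical in (None, url) and not noindex and "?" not in url


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old</loc></url>
  <url><loc>https://example.com/shoes?sort=price</loc></url>
  <url><loc>https://example.com/thanks</loc></url>
</urlset>"""

pruned_xml, dropped = prune_sitemap(sample, keep)
print("\n".join(dropped))
```

Returning the dropped URLs alongside the rewritten XML makes it easy to review what the pass removed before you publish the new file, which matters because a mistaken crawl fact would otherwise silently pull a real page out of the sitemap.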