I have numerous pages on our website that I don’t want to feature in our sitemap. Long story short, these are certificates that we give to visitors who have completed a basic training on our website as a part of a marketing initiative.
Each page is auto-generated when they complete their training. However, we don’t link to these pages anywhere else: they just exist. Now they are showing up in our sitemap and analytics as orphaned pages.
It’s crazy that there is a ‘hide from search engines’ checkbox, yet the page is still included in the sitemap. The same goes for a ‘thank you’ page for a form.
@greywood - EDITED - The option to “exclude this page from site search results” only stops the page from being indexed by Webflow’s internal search. It does not affect search engine bots, since it does not add a meta tag to the page.
You have to manually add one if you want to restrict bots from indexing a page. Then it does not matter if it is in the sitemap.
Hi Jeff, yes, it does matter for SEO if your sitemap contains a URL that you are tagging as “noindex” in the meta robots tag. That makes no sense, and Webflow should take care of this ASAP.
Since they provide custom code areas for pages and on each CMS template and custom sitemaps, you can do what is needed, just not with a switch. I doubt you will see any movement on this one but who knows. You can create an item in the wish list if one does not exist.
I ran into the exact same issue… I need the pages to be published so their data can be referenced in other pages, but I don’t want the pages themselves to be visible (so I hide them behind a password field). But those pages are still written to the sitemap, even with them hidden from the site search. I even added a robots.txt:
User-agent: *
Disallow: /companies/
That still did not help, Google is not happy.
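For anyone who wants to sanity-check what a rule like the one above actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` URLs and the `is_crawl_allowed` helper are made up for illustration. Keep in mind that a `Disallow` rule only asks bots not to crawl a path. It does not remove the URL from the sitemap, which may be part of why Google still complains.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /companies/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs, only for demonstration.
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/companies/acme"))
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/blog/post"))
```

This only tells you how a well-behaved crawler would read the file; it says nothing about what ends up in the generated sitemap.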
@greywood how did you manage generating your own sitemap?
Of course we don’t need an option that only stops indexing by the “internal search”; what is the use case for that? We expect a page marked “Exclude page from site” to be removed from the sitemap.
Hi,
I’m still confused after reading all the help files I can find on this. I also want to exclude some pages that Webflow added to the sitemap, so I tried turning off automatic generation of the sitemap. How do I upload my own version? (I tried just cutting and pasting the content into the field provided, but after publishing, the sitemap didn’t change, so I toggled back to automatic generation.)
Thanks for your help,
Libby
I would love to see more granular control over the auto-generated sitemap. It’s unusable for us currently and manually updating the sitemap is not sustainable as we’re constantly creating new pages and updating others.
Add it to the wishlist! I’m kidding of course. It’s been an item for five and a half years now and was actually initiated by someone who worked at Webflow for a while, if that gives you any indication of whether it will ever be addressed.
Excluding it from the sitemap wouldn’t actually accomplish anything. The sitemap is just a convenience that helps search engines find your pages faster. However if you have any links to your page, searchbots will find it anyway.
The way to tell Google not to index the page is to add a noindex META tag to your page HEAD:
<meta name="robots" content="noindex">
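If you want to confirm the tag really made it into the published HTML, a rough sketch like the one below can help. It uses only Python's standard `html.parser`; the `RobotsMetaParser` class and `has_noindex` function are names invented here, and the sample HTML strings are placeholders rather than real Webflow output.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Placeholder pages for demonstration.
print(has_noindex('<head><meta name="robots" content="noindex"></head>'))
print(has_noindex("<head><title>Certificate</title></head>"))
```

You would feed it the page source you get from your own site; fetching the page is left out on purpose to keep the sketch self-contained.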
Waldo put this feature on the Wishlist about 6 years ago, make sure to vote for it if you haven’t yet.
It shipped Nov 2023.
One issue with auto-generating a sitemap of all pages (without any control) and then setting select pages to noindex is that it can cause errors in Google Search Console. At times this will prevent the sitemap from being used at all, which defeats its purpose.
A second issue is that, while it’s nice to have the option to add our own sitemap and override the auto-generated one, not being able to update it via the API cripples any real use. It’s not tenable for someone to log into project settings on every new blog post (as just one example) to update the sitemap.
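For those maintaining a custom sitemap in the meantime, one possible workaround is to take a copy of the auto-generated XML and strip the unwanted URLs with a script before pasting it back in. Below is a minimal sketch using Python's standard library. The `filter_sitemap` function, the `/certificates/` prefix, and the `example.com` URLs are assumptions for illustration, and it assumes a plain UTF-8 `urlset` sitemap. It does not remove the manual paste step.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_sitemap(xml_text: str, excluded_prefixes: list[str]) -> str:
    """Return the sitemap XML without <url> entries under the given path prefixes."""
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text.encode("utf-8"))
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        path = urlparse(loc).path
        if any(path.startswith(prefix) for prefix in excluded_prefixes):
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


# Made-up sitemap for demonstration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post</loc></url>
  <url><loc>https://example.com/certificates/abc123</loc></url>
  <url><loc>https://example.com/companies/acme</loc></url>
</urlset>
"""

filtered = filter_sitemap(SAMPLE, ["/certificates/", "/companies/"])
print(filtered)
```

Filtering by path prefix is just one choice; you could equally drop every URL whose page carries a noindex tag.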