Remove pages from sitemap

Hey all,

I have numerous pages on our website that I don’t want to feature in our sitemap. Long story short, these are certificates that we give to visitors who have completed a basic training on our website as a part of a marketing initiative.

Each page is auto-generated when they complete their training. However, we don’t link to these pages anywhere else: they just exist. Now they are showing up in our sitemap and analytics as orphaned pages.

How can I remove these pages from our sitemap?

By using a custom sitemap instead of relying on auto-generation, where you have no control. It’s a bummer.

It’s crazy that there is a ‘hide from search engines’ checkbox, but the page is still included in the sitemap. Same with a ‘thank-you’ page for a form.

@greywood - EDITED - The option to “exclude this page from site search results” only stops the page from being indexed by Webflow’s internal site search. It does not affect bots, since it does not add a meta tag to the page.

You have to manually add one if you want to restrict bots from indexing a page. Then it does not matter if it is in the sitemap.

Hi Jeff, yes, it matters for SEO: listing a URL in your sitemap while tagging it “noindex” in the meta robots tag makes no sense, and Webflow should take care of this ASAP.

Since they provide custom code areas for pages and for each CMS template, plus custom sitemaps, you can do what is needed, just not with a switch. I doubt you will see any movement on this one, but who knows. You can create an item in the wishlist if one does not exist.

I’ve made my own sitemap and used meta tags to control indexing in this case. Not ideal but it works in getting technical items spot on at least.
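For anyone wondering how to build a sitemap yourself, here is a minimal Python sketch, assuming you can export a list of your page paths (for example from the CMS). The base URL and the /certificates/ prefix are hypothetical placeholders. It writes a standard sitemaps.org urlset and simply skips any path under an excluded prefix.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths, excluded_prefixes):
    """Return sitemap XML listing every path that is not under an excluded prefix."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        # Skip pages that should stay out of the sitemap (e.g. per-visitor certificates).
        if any(path.startswith(prefix) for prefix in excluded_prefixes):
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the declaration ourselves so it always states UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = ["/", "/about", "/certificates/jane-doe"]  # hypothetical page list
    print(build_sitemap("https://www.example.com", pages, ["/certificates/"]))
```

The output could then be pasted in place of the auto-generated sitemap; it has to be re-run whenever pages are added, which is exactly the maintenance burden others in this thread mention.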

I ran into the exact same issue… I need these pages published so other pages can reference their data, but I don’t want the pages themselves to be visible (so I hide them with a password field). Those pages are still printed to the sitemap, even though I excluded them from site search. I even added this to robots.txt:

User-agent: *
Disallow: /companies/

That still did not help; Google is not happy.
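A likely reason: robots.txt controls crawling, not indexing. This sketch uses Python's standard urllib.robotparser to show what the rules above actually do (example.com is a placeholder domain): a compliant crawler may not fetch anything under /companies/. Because Google then cannot fetch those pages, it also cannot see a noindex tag on them, and listing blocked URLs in the sitemap can show up as warnings in Search Console.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /companies/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(url, agent="Googlebot"):
    """True if the robots.txt rules let `agent` fetch `url` (crawling only, not indexing)."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.com/companies/acme", "https://www.example.com/about"):
        print(url, "crawlable:", crawl_allowed(url))
```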

@greywood how did you manage generating your own sitemap?

Same issue… another issue with Webflow…

Of course we don’t need an option that only stops indexing by “internal search”; what is the use case for that? We expect “Exclude page from site” to also remove the page from the sitemap.

+1 It would be great if we could get this added as a toggle on the page options.

Hi,
I’m still confused after reading all the help files I can find on this. I also want to exclude some pages that Webflow adds to the sitemap, so I tried to turn off automatic generation of the sitemap. How do I upload my version? (I tried just cutting and pasting the content into the field provided, but after publishing, the sitemap didn’t change, so I toggled back to automatic generation.)
Thanks for your help,
Libby

CryoLayer gives you more control over the sitemap, including excluding pages.

You can use a sitemap generator such as https://www.xml-sitemaps.com/ and then modify the output to your needs.
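If you start from a generated file (or a copy of the auto-generated sitemap), the trimming step can be scripted instead of done by hand. A minimal sketch using Python's xml.etree.ElementTree, assuming the input follows the sitemaps.org schema; the /certificates/ prefix is a hypothetical example.

```python
from urllib.parse import urlparse
from xml.etree import ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default (no "ns0:" prefixes).
# Note: this registration is global to the ElementTree module.
ET.register_namespace("", NS)


def filter_sitemap(xml_text, excluded_prefixes):
    """Remove <url> entries whose <loc> path starts with any excluded prefix."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy because children are removed while looping.
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = (url.findtext(f"{{{NS}}}loc") or "").strip()
        if any(urlparse(loc).path.startswith(p) for p in excluded_prefixes):
            root.remove(url)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")
```

Matching on the URL path rather than the full string keeps the rule independent of the domain, so the same prefix list works on staging and production sitemaps.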

I would love to see more granular control over the auto-generated sitemap. It’s unusable for us currently and manually updating the sitemap is not sustainable as we’re constantly creating new pages and updating others.

Add it to the wishlist! I’m kidding, of course. It’s been an item for five and a half years now and was initiated by someone who actually worked at Webflow for a while. That should give you some indication of whether it will ever be addressed.

https://wishlist.webflow.com/ideas/WEBFLOW-I-211

+1 It would be great if we could get this added as a toggle on the page options

+1 would be great if there is an option to simply exclude a page from the sitemap

+1 would be great if there is an option to simply exclude a page from the sitemap

Excluding it from the sitemap wouldn’t actually accomplish anything. The sitemap is just a convenience that helps search engines find your pages faster. However, if you have any links to your page, search bots will find it anyway.

The way to tell Google not to index the page is to add a noindex meta tag to your page HEAD:

<meta name="robots" content="noindex">

Waldo put this feature on the wishlist about 6 years ago; make sure to vote for it if you haven’t yet.
It shipped in November 2023.

https://wishlist.webflow.com/ideas/WEBFLOW-I-211

UPDATE: This feature has now shipped and can be found under page settings.

I’ve written up some notes here-

Chiming in as another vote for this feature.

The problems…

One issue with auto-generating all pages into the sitemap (without control) and then setting select pages to noindex is that it can cause errors in Google Search Console. At times this will prevent the sitemap from ever being used, which defeats its purpose.
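To spot this conflict before Search Console does, one option is to cross-check the sitemap against each page's meta robots tag. A minimal sketch using only Python's standard library; pages is a hypothetical dict mapping each URL to its already-fetched HTML, which keeps the example offline (in practice you would download each page). It only checks meta name="robots", not X-Robots-Tag response headers.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class RobotsMetaParser(HTMLParser):
    """Sets .noindex if the page has <meta name="robots"> whose content includes noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lowercased; values may be None
        name = (attrs.get("name") or "").lower()
        content = (attrs.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def noindexed_sitemap_urls(sitemap_xml, pages):
    """Return sitemap URLs whose HTML (looked up in `pages`) carries a noindex tag."""
    conflicts = []
    for loc in ET.fromstring(sitemap_xml).iterfind("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        parser = RobotsMetaParser()
        parser.feed(pages.get(url, ""))
        parser.close()
        if parser.noindex:
            conflicts.append(url)
    return conflicts
```

Any URL this returns is a candidate to drop from a custom sitemap, or to un-noindex if it was tagged by mistake.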

A second issue: while it’s nice that we have the option to add our own sitemap, overriding the auto-generated one, not being able to update it via the API cripples any real use. It’s not tenable for someone to log into project settings after every new blog post (to take just one example) to update the sitemap.
