How to adjust the canonical on paginated pages?

Is it possible to manually / automatically adjust the canonical on all paginated pages to be self-referencing?

In utilizing Webflow’s Global Canonical Tag URL feature, I’ve noticed that my paginated pages (for example on our news category page) now contain a canonical pointing to the first page. This signals to search engines that these pages contain duplicate content and should not be indexed.

From an SEO perspective this is not ideal, as these paginated pages still contain unique linked content, which further down the line might not get crawled (as frequently) or indexed.

Looking forward to your insights and answers.

Thanks in advance!


Here is my site Read-Only link: https://preview.webflow.com/preview/thesio-website?utm_medium=preview_link&utm_source=designer&utm_content=thesio-website&preview=e763623b19e8fd1ddcc6cea5e8ba6607&workflow=preview

Would love some input, even if it’s just to let me know it’s not possible to adjust the canonical on paginated pages.

Thanks and kind regards,

Nope since you only get one template with CMS collections.

Are there any updates here? As far as I can tell, this is a real problem. It seems to be presenting indexing issues for older posts on my blog.

@JBeemer were you ever able to find a workaround here?

This default behavior goes against the guidelines that Google has put out re: pagination with parameterized URLs from this page. It would be nice to get a little more of an explanation from Webflow rather than just a “nope.”

Maybe you are mistaking me for someone that works for Webflow. I don’t. Truth is they don’t support what you desire at this time. Would be nice if they did.

@Andrew_Bankson it’s not ideal but it’s what Webflow gives us.
You may be able to override it using JavaScript, since Google is good at processing JS as part of its page parsing process.

That said it should not affect your blog post indexing.
Use the automated sitemap and register it in GSC to make your blog post pages as accessible as possible.

I’ve not seen any problems yet, and I have several blogs > 1000 articles. They’re well SEO’d.

Here’s how to JavaScript it in… not sure if it’s effective yet. Taken from a Webflow wishlist page.

SEO friendly pagination (correct canonical URL)
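
The script from that page isn’t reproduced in this thread. As a rough sketch of the general idea only (not the wishlist script itself, and with no claim about its SEO effect), something like the following could be placed in a page’s custom code. It assumes Webflow’s pagination parameters end in `_page` (as in `?kl2j343_page=3`) and that a `<link rel="canonical">` tag already exists in the head; `selfCanonical` is a name made up for this example.

```javascript
// Hypothetical helper: build a self-referencing canonical for a paginated URL.
// Assumption: Webflow pagination params end in "_page" (e.g. ?kl2j343_page=3).
function selfCanonical(href) {
  const url = new URL(href);
  const kept = new URLSearchParams();
  for (const [key, value] of url.searchParams) {
    // Keep only pagination params; page 1 is treated as the unparameterized URL.
    if (key.endsWith("_page") && value !== "1") kept.append(key, value);
  }
  const query = kept.toString();
  return url.origin + url.pathname + (query ? "?" + query : "");
}

// In the browser, point the existing canonical tag at the computed URL.
if (typeof document !== "undefined") {
  const link = document.querySelector('link[rel="canonical"]');
  if (link) link.setAttribute("href", selfCanonical(window.location.href));
}
```

Tracking parameters such as `utm_source` are dropped on purpose, so only the pagination state ends up in the canonical.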

The confusing/curious thing is that ALL of my various collection item pages are in the sitemap.xml file. However, the pagination pages are not.

And at least in an ahrefs.com crawl, it sees these pages as orphans that aren’t linked to from anywhere, because the pagination pages refer to a different canonical URL and aren’t indexed. Google seems to be a little smarter about it.

But I’m having different issues on Google: it is refusing to index pages that it has already discovered or crawled. So I’m manually requesting indexing of at least the category pages, in the hope that this helps.

I’m wondering if each pagination page should also be added to the sitemaps.xml page. That would be a nice feature. Can’t see why it would hurt.

In Webflow, there isn’t such a thing as a paginated page. There are only paginated collection lists, and you can have up to 20 of those on a page. The URL querystring params you see are just a convenience for navigation; you can link directly to Page X, with list #3 on page 8 and list #2 on page 12 if you want to. They don’t really serve any SEO purpose.

The content we want Google to index is primarily on the collection item pages themselves, so Webflow highlights these in the sitemap.xml, along with all of the static pages, to make them extra-accessible to Googlebot.

Yes, this confuses Ahrefs. No, this is not a problem.

There’s no real benefit to adding the pagination URLs to the sitemap:

  • 90% of the content on that page is likely redundant.
  • The pages you really want Google to direct people to are the content pages, so why clutter the SERPs with less valuable pages?
  • The collection list ordering will change as you add new items, or as date filters apply, so what Google indexes often won’t relate to what people see when they click the link.

But most importantly, the math doesn’t work.

Let’s say you have 20 paginated collection lists on a page, each set to show 1 item per page. Each points to a collection containing only 100 items.

The number of pagination URL permutations you have is:
100^20 = 10^40 (a 1 followed by 40 zeros)

You really don’t want that many extra URLs in your sitemap.
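
To check that arithmetic, here is a small sketch of the scenario above (20 lists, 100 items each, 1 item per page). The function name is made up for illustration, and it assumes each list paginates independently, so the page counts multiply.

```javascript
// Each list has ceil(items / perPage) pages; independent lists multiply.
function paginationPermutations(lists, itemsPerList, itemsPerPage) {
  const pagesPerList = BigInt(Math.ceil(itemsPerList / itemsPerPage));
  return pagesPerList ** BigInt(lists); // BigInt avoids floating-point rounding
}

const total = paginationPermutations(20, 100, 1);
console.log(total.toString().length - 1); // 40, the number of zeros after the leading 1
```

Even a more modest setup, say 3 lists with 50 pages each, already gives 125,000 URL combinations for a single page.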

@memetican So by that, is it necessary to include the script from SEO friendly pagination (correct canonical URL) since paginated URLs in Webflow are primarily for navigation convenience rather than serving an SEO purpose? Considering the potentially large number of pagination URL permutations that could be generated for multiple paginated collection lists on a single page, it makes sense to avoid including all of those URLs in the sitemap. Including such a vast number of URLs could lead to bloating the sitemap and may not provide any significant SEO benefit.

From my ahrefs.com crawl log, I got over 20 URLs from paginated pages as canonical issues. Would it make sense to use the script provided on the paginated pages?

Bonzer has written an interesting script, and I understand his thinking…
But I wouldn’t use it.

Here’s the approach I use with Webflow;

  • Design your SEO value into the Collection Pages, not the Collection Lists. That means that any SEO-rich content, even things like CMS-stored customer testimonials, gets its own Collection Page.
  • Use Webflow’s canonical URL feature to direct all attention to the “page 1” of your collection list-containing pages, and ignore the rest.
  • Avoid trying to SEO paginated collection list pages, they are not where the value is, and you do not want noise in your SERPs. Direct all attention to the individual item pages where the content value is.
  • Keep generating content. Blog posts, news & press releases, customer testimonials, new products and variants…

This results in excellent SEO, and all-green in Google Search Console.

You don’t need to worry about Ahrefs; it’s designed to worry about everything, including things that you don’t need to worry about.

Look at it this way:

In most sites, I focus on content generation, which means that if Google indexes ?kl2j343_page=3 of some page, the content there will likely have changed by the time someone clicks that link. I’d much rather they see and click the Collection Page search result instead, which actually contains the content they are looking for.

That’s a far better customer experience and results in higher conversions.

Sticking with Webflow’s designed approach also mitigates any duplicate content penalty risk.

Let’s say you have a blog post page, but that page has a “more articles” section at the bottom, which is a paginated collection list. You set pagination to 10 items, but you have 500 blog articles on your site. That’s 50 paginated pages.

If you give those pages distinct canonicals, then from Google’s perspective, you are asking them to index 51 copies of the exact same blog post, with some very minor content variations at the bottom of the page.

Thank you @memetican for your detailed input here. I almost got it. But please allow me to ask a few questions to clarify some points. I was a bit confused here.

Regarding the use of Webflow’s canonical URL feature to direct attention to “page 1” of collection list-containing pages, are you referring to setting the “Global canonical tag URL” which is the same as the default domain? Or do you recommend writing a script for each CMS page to handle canonical URLs individually?

You mentioned avoiding SEO optimization for paginated collection list pages. Should I consider “excluding these pages from site search results”? For instance, I have a static “news” page that contains a paginated news collection list. What would be your suggestion in this scenario?

Thanks in advance.

Keep it simple.

  1. Set your global canonical to your default domain e.g. https://www.mysite.com, no ending slash.
  2. That’s it. Publish. You’re done.

I essentially always use Webflow’s built-in canonical URL feature; it works great and makes life far simpler.

No, it works best if you leave things as is. Just don’t try to force Google to index e.g. page=2, simply let Webflow’s canonical prioritize page 1 as it already does.

If your news page is at e.g. /news, and your global canonical setting is https://www.mysite.com, then Webflow auto-generates a canonical on that page of:

https://www.mysite.com/news, and that’s perfect.

It does the same for all of the paginated versions, pointing them back to;
https://www.mysite.com/news, and that’s also perfect.
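
To make that behaviour concrete, here is a small illustration of the mapping described above. It is only a model of the described result, not Webflow’s actual code; the function name and the `kl2j343_page` parameter are made up for the example.

```javascript
// Illustration of the behaviour described above, not Webflow's actual code.
function describedCanonical(globalCanonical, pageUrl) {
  const { pathname } = new URL(pageUrl);
  // Drop the query string so every paginated variant maps to the same URL.
  return globalCanonical.replace(/\/+$/, "") + pathname;
}

console.log(describedCanonical(
  "https://www.mysite.com",
  "https://www.mysite.com/news?kl2j343_page=2"
)); // https://www.mysite.com/news
```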

Best of all worlds.

99.9% of the time, messing with this will hurt you more than help you, but of course there are always edge case exceptions with weird site designs and complex SEO configs.

I’ve come across this thread as Screaming Frog is also reporting this “issue”. However, if you think about it logically, it’s not an issue at all, and how Webflow handles the non-indexing and canonicalisation is correct.

The ?page= parameter is canonicalised to the main CMS page, right. This MUST be better, as otherwise each paginated page will be duplicate content. And it’s used purely for a functional reason.

Technically, crawlers like Screaming Frog and Ahrefs will pick these up as unique pages because the URL is different, but we don’t want them indexed. The posts that the paginated URLs link to all work and aren’t orphaned. Also, the posts are the pages we want indexed (not the top level CMS page).

This is why tools like Ahrefs offer the ability to filter out pages so they are NOT considered in the app crawl. In Ahrefs crawl settings there is an option to add a regular expression and tell it what not to crawl. Try this regular expression in the “don’t crawl” setting:

.?page=.*
OR
\?.*page=.* (this is for the URLs with a random set of characters after the ? before the page= bit)

Obviously adjust the RegEx according to your needs…
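
Before pasting a pattern into a crawler, it can help to try it against a few sample URLs. The sketch below tests the second pattern in JavaScript; the URLs are invented examples, and whether Ahrefs matches a pattern anywhere in the URL (as `test` does here) or requires a full match is an assumption worth checking in its docs.

```javascript
// Second pattern from the post; how Ahrefs anchors the match is an assumption.
const dontCrawl = /\?.*page=.*/;

// Made-up sample URLs for illustration.
const urls = [
  "https://www.mysite.com/news",
  "https://www.mysite.com/news?page=2",
  "https://www.mysite.com/news?kl2j343_page=3",
];

// URLs the pattern would exclude from the crawl.
const excluded = urls.filter((u) => dontCrawl.test(u));
console.log(excluded);
```

Note that this pattern also matches any other query parameter whose name ends in `page`, so tighten it if your site uses such parameters.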

Then, when you crawl, these pages that Webflow has handled correctly won’t be flagged. However, Ahrefs might report post pages listed in the sitemap that have no incoming internal links (we’ve told it not to crawl the paginated pages that link to them, remember). Because you know this isn’t an error and you want the posts to be crawled and evaluated, you can turn this off in the project, see screengrab:

Steve

This is just not true. Google changes their algo all the time and seems to be giving more and more weight to the data they get from their crawl: not so much what you want them to do, but what they find. So the sitemap is just seen by them as a suggestion; they do not need to honor all the links in it.

What Andrew is saying is true, this is an issue. It’s kind of lame that Webflow isn’t at least looking through these forum posts for performance tweaks that are easy wins for ALL of their clients.

Hey Brent, read through Google’s guidance, the same link Andrew shared. It’s guidance for indexing the paginated pages themselves.

Indexing those pages makes sense in a situation where;

  • You have a paginated set of unique items, like a comment feed
  • That content doesn’t exist anywhere else except in the paginated list
  • The entire page is solely about that content, e.g. page 1 and page 2 have no duplicate content

That’s not the feature Webflow has built. Webflow has paginated collection lists rather than paginated pages. You cannot take a long-form article page, and tell Webflow to make it 3 pages. Instead you can have up to 20 collection lists on the page each of which can be filtered to show a different paging.

Yes, it’s referred to as “pagination”, but it’s designed as a filtered-view feature: same page, with one part changed. If you look at my article on the topic, you’ll see why the math of trying to index that filtered URL doesn’t work, because the URL volume effectively grows exponentially with the number of collection lists.

If you’re Webflow, how do you give designers pagination support at the collection list level, without nuking the system with a million useless pages and a 5GB sitemap?
You design it as a filter query.

Instead the SEO approach is;

  • Design collection-list-containing pages to index as one page, with one canonical
  • Avoid indexing variations of that page, which have lists filtered to page 2 (they get the same primary canonical, on purpose)
  • Build out your collection pages ( product pages, news articles, etc ) for any content you want indexed, because that’s a far more effective primary source - page titles, metas, etc. all match, and meet Google’s
  • Both the sitemap and the collection list point to those collection pages, you can see they get picked up in GSC immediately

I’ve had great SEO success with this approach for the past 6 years on WF, and even though I can easily change those canonicals and sitemap using our reverse proxy layer, there’s never been a reason to. It works great as-designed.

I find it interesting too, a lot of the work I do is specialized for dedicated SEO agencies, and none of them want to change how pagination works. Not even one. We focus instead on things like core web vitals and flexible semantic paths for collection hierarchies.

I think overall you’ll be happier if you understand that pagination is a UX convenience, not a content delivery mechanism. Use your collection pages for the content, you’ll get much better results.
