301 Redirect Issues with wildcards for migrated WordPress image page links

I’m running into a frustrating issue with redirects after migrating a site from WordPress to Webflow. The old WordPress site had thousands of articles and image attachment links, each with its own dedicated page, and I’ve tried to set up dynamic redirects to preserve SEO value.

All article links like this one work fine:
https://www.eta.co.uk/2016/09/16/crua-hybrid
redirects to
https://www.eta.co.uk/news/crua-hybrid

But image attachment links like
https://www.eta.co.uk/2016/09/16/crua-hybrid/crua-hybrid-tent
should also redirect to
https://www.eta.co.uk/news/crua-hybrid
and they only sometimes do.
In this case it goes to
https://www.eta.co.uk/news/crua-hybrid-tent
which returns a 404, because crua-hybrid-tent is the image (attachment) part of the slug, not the article.

Here is my setup and what I’ve found:

Example redirects added (old path → new path):
/2010/(.*)/(.*)/(.*)/(.*) → /news/%3
/20(.*)/(.*)/(.*)/(.*)/(.*) → /news/%4
These were added after the main article redirect, which seems to work fine:
/20(.*)/(.*)/(.*)/(.*) → /news/%4

This works as expected for some cases:

  • /2010/09/16/target/miss → /news/target
  • /2010/01/18/correct1/incorrect2 → /news/correct1

But it fails in other cases:

  • /2010/09/16/yes/no → /news/no
  • /2010/01/18/correct-1/incorrect-2 → /news/incorrect-2
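One way to read these results: if Webflow’s (.*) behaves like a greedy regular-expression group that can also match / (an assumption; Webflow’s matcher is proprietary), then the four-group article rule also matches five-segment attachment URLs. The Python sketch below emulates the rules with the standard re module. to_regex and apply_rule are hypothetical helpers, and treating %N as "insert captured group N" is likewise assumed.

```python
import re
from typing import Optional

def to_regex(old_path: str) -> "re.Pattern[str]":
    # Assumption: each (.*) is a greedy capture group; all other text is literal.
    return re.compile("(.*)".join(map(re.escape, old_path.split("(.*)"))))

def apply_rule(old_path: str, new_path: str, path: str) -> Optional[str]:
    """Return the redirect target if `path` fully matches `old_path`, else None."""
    m = to_regex(old_path).fullmatch(path)
    if m is None:
        return None
    # Substitute %1..%9 in the new path with the corresponding captured groups.
    return re.sub(r"%(\d)", lambda g: m.group(int(g.group(1))), new_path)

article_rule = ("/20(.*)/(.*)/(.*)/(.*)", "/news/%4")
attachment_rule = ("/20(.*)/(.*)/(.*)/(.*)/(.*)", "/news/%4")

path = "/2010/09/16/yes/no"
print(apply_rule(*attachment_rule, path))  # /news/yes
print(apply_rule(*article_rule, path))     # /news/no (group 1 greedily takes "10/09")
```

Under those assumptions both rules match the same URL and disagree, so the result depends on which rule is evaluated first. That would produce exactly the /news/no and /news/crua-hybrid-tent outcomes described here, though it would not by itself explain differences between browsers.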

Strangely, this works:

  • /2010/01/18/correct-link-1/incorrect-2 → /news/correct-link-1

But this, which is only slightly different does not:

  • /2010/01/18/correct-1/incorrect-link-2 → /news/incorrect-link-2

I also tried using broader patterns to catch all years like this:

  • /19(.*)/(.*)/(.*)/(.*)/(.*) → /news/%4
  • /20(.*)/(.*)/(.*)/(.*)/(.*) → /news/%4

However, some years work while others don’t. For example:

  • /2000/01/18/correct-1/incorrect-link-2 → /news/correct-1 (works)
  • /2001/01/18/correct-1/incorrect-link-2 → /news/correct-1 (works)
  • /2003/01/18/correct-1/incorrect-link-2 → /news/incorrect-link-2 (2003 doesn’t work)

Is there something about Webflow’s redirect system or regex handling that explains this behaviour? I suspect it has something to do with the hyphens (‘-’) in the links, but I really can’t explain why some links with hyphens, or from certain years, work while others don’t.

I’ve also found that, even in new incognito tabs in Chrome and Safari, the same link might be a hit in one and a miss in the other.

We can’t add individual redirects, as there are thousands of these links and we are already on 1,400 redirects, which is over the recommended 1,000. Because of this issue, the migration has really been a bit of a 404 nightmare.

Hey Nic,

I couldn’t quite wade through your details to identify exactly what you’re trying to do or exactly where you’re having problems, but these notes might help.

Hyphens need to be escaped; see Webflow’s 301 redirect docs. I thought that was only needed in old paths that contain the wildcard construction (.*), but after reading the docs and seeing your examples, it looks like it may well be required for all hyphens and the other special characters listed there.
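Escaping by hand across hundreds of rules is easy to get wrong, so a small helper can do it before import. This is a sketch under two assumptions: that escaping means a backslash before the character (standard regex syntax, but confirm it against Webflow’s docs), and that SPECIAL_CHARS is only a placeholder to be replaced with the exact list from those docs. escape_old_path is a hypothetical name; it leaves each (.*) wildcard intact.

```python
# Placeholder set (assumption): replace with the characters Webflow's redirect docs list.
SPECIAL_CHARS = set("-.?+*^$|[]{}")
WILDCARD = "(.*)"

def escape_old_path(old_path: str) -> str:
    """Backslash-escape special characters in the literal parts of an old path.

    Each (.*) wildcard is left untouched. Run this on unescaped paths only:
    existing backslashes are not recognised, so a second pass would add
    another backslash in front of each escaped character.
    """
    parts = old_path.split(WILDCARD)
    escaped = ["".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in part)
               for part in parts]
    return WILDCARD.join(escaped)

print(escape_old_path("/2016/09/16/crua-hybrid/(.*)"))  # /2016/09/16/crua\-hybrid/(.*)
```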

As for the same link hitting in one browser and missing in another: that seems unlikely, but not out of the question. Incognito might disable 301 redirect caching, and it’s possible that the browsers encode the URLs differently, which might affect how the redirects match.

An unescaped hyphen might match fine with some URL encodings and fail to match with others.

Normally, in a site migration, I prepare the full redirect map in Google Sheets. It’s much easier to make adjustments there and then bulk-import the CSV. The 1,000 figure is not a hard limit, and it sounds like your image URL constructions could be wildcarded as a second entry.
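The map itself can be generated rather than typed. The sketch below turns old dated permalinks (article or attachment) into explicit rows pointing at /news/&lt;article-slug&gt;. It assumes the /YYYY/MM/DD/article-slug[/attachment-slug] shape from the question and a two-column source,target layout; the header names are an assumption, so check what Webflow’s CSV import actually expects. build_redirect_rows and to_csv are hypothetical helpers.

```python
import csv
import io
import re
from typing import Iterable, List, Tuple

# Assumed WordPress permalink shape: /YYYY/MM/DD/article-slug, optionally
# followed by /attachment-slug for image attachment pages.
PERMALINK = re.compile(r"^/\d{4}/\d{2}/\d{2}/([^/]+)(?:/[^/]+)?/?$")

def build_redirect_rows(old_paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Map each dated article or attachment path to /news/<article-slug>."""
    rows: List[Tuple[str, str]] = []
    seen = set()
    for path in old_paths:
        m = PERMALINK.match(path)
        if m is None or path in seen:
            continue  # skip non-permalink paths and duplicates
        seen.add(path)
        rows.append((path, "/news/" + m.group(1)))
    return rows

def to_csv(rows: List[Tuple[str, str]]) -> str:
    """Serialize rows as CSV text; the header names are an assumption."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(build_redirect_rows([
    "/2016/09/16/crua-hybrid",
    "/2016/09/16/crua-hybrid/crua-hybrid-tent",
])))
```

This emits one row per URL; passing only the article paths and covering attachments with a single wildcard entry, as suggested here, would keep the row count much lower.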

In heavy-duty cases, I’d move the DNS to Cloudflare, configure it as a reverse proxy, and then do the redirects either in Cloudflare redirects or in a Worker. That gives you a lot more control.
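With a reverse proxy in front, the redirect logic is ordinary code, so each path segment can be matched explicitly instead of with a greedy wildcard. The function below sketches that logic in Python only to keep one language in this thread. It is not Cloudflare’s Workers API and would need porting to a Worker or to a redirect rule. edge_redirect is a hypothetical name, and the /YYYY/MM/DD/slug[/attachment] shape is assumed from the question.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Each [^/]+ matches exactly one path segment, so an attachment URL can't
# shift the article slug into a different capture group.
OLD_POST = re.compile(r"^/\d{4}/\d{2}/\d{2}/(?P<article>[^/]+)(?:/[^/]+)?/?$")

def edge_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for an old WordPress URL, or None to pass it through."""
    parts = urlsplit(url)
    m = OLD_POST.match(parts.path)
    if m is None:
        return None
    location = f"{parts.scheme}://{parts.netloc}/news/{m.group('article')}"
    if parts.query:
        location += "?" + parts.query  # keep any query string (e.g. tracking parameters)
    return 301, location

print(edge_redirect("https://www.eta.co.uk/2016/09/16/crua-hybrid/crua-hybrid-tent"))
```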

Thanks for your detailed reply @memetican
I think I’ll follow up on the Cloudflare route if I can’t get this working. I just think Webflow should really be able to handle something like this natively.

I’ve done some extensive testing in Google Sheets, using variations of this very useful script to check the redirects and status codes.
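For a comparable check outside Google Sheets, the Python sketch below requests a URL once without following redirects and reports the status code and Location header. Unlike a browser it keeps no redirect cache, which removes one variable from the cross-browser comparison (a CDN in front of the site may still cache). check_redirect is a hypothetical helper, not the script referred to here.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

def check_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request `url` once WITHOUT following redirects.

    Returns the HTTP status and the Location header (None if absent), so a
    301 to the wrong slug is visible instead of only the 404 it leads to.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        # Ask intermediate caches to revalidate; they are free to ignore this.
        conn.request("GET", target, headers={"Cache-Control": "no-cache"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

# Usage (makes live requests):
# for url in ["https://www.eta.co.uk/1998/09/09/this-is-a-longer-title/attachment-title-three",
#             "https://www.eta.co.uk/1998/12/12/this-is-a-longer-title/attachment-title-three"]:
#     print(url, check_redirect(url))
```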

I do have to say, though, that I think something is definitely not working as it’s supposed to.

Here is an example, where the goal is to get to the article title and not the attachment:

https://www.eta.co.uk/1998/09/09/this-is-a-longer-title/attachment-title-three
and
https://www.eta.co.uk/1998/12/12/this-is-a-longer-title/attachment-title-three

Literally just swapping 09 and 09 for 12 and 12 causes one to pass and the other to fail.

Strangely, the behaviour of those two is inverted in the Arc browser compared with Safari and Chrome.

Without the required escaping of special characters, it’s anyone’s guess how the matching algorithm would process the string against the browser’s requested URL.

You’ll really need to follow Webflow’s protocol in order for your redirect URLs to work. If you did that and could still prove they’re not working, then that’s a great time to message support.

As for swapping 09 for 12 changing the result: that doesn’t surprise me at all. Pattern matching generally isn’t character-by-character the way you’re looking at it, and the missing escape characters could cause it to use a different algorithm or regex altogether.

As for Arc behaving the opposite way to Safari and Chrome: that also doesn’t surprise me. Consider a request for:

https://www.mysite.com/blog/agent forty-two

Browsers can correctly present that request to the server in any number of valid encoding variations. Here are a few:

https://www.mysite.com/blog/agent%20forty-two
https://www.mysite.com/blog/agent+forty-two
https://www.mysite.com/blog/agent%20forty%2Dtwo
https://www.mysite.com/blog/agent+forty%2Dtwo

Same request, different strings. Now try pattern-matching them with no normalization step and an invalid pattern string.
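A normalization step decodes such variants to one canonical string before any matching happens. RFC 3986 treats a percent-encoded unreserved character such as %2D as equivalent to the character itself, so decoding it is safe. Treating + as a space is form-encoding (application/x-www-form-urlencoded) behaviour rather than path semantics, so unquote_plus is used here only to reproduce the four examples above; a careful matcher would decode only unreserved characters, since blanket decoding also turns %2F into a path-splitting /. normalize is a hypothetical helper.

```python
from urllib.parse import unquote_plus

# The four encodings of the same request from the example above.
variants = [
    "https://www.mysite.com/blog/agent%20forty-two",
    "https://www.mysite.com/blog/agent+forty-two",
    "https://www.mysite.com/blog/agent%20forty%2Dtwo",
    "https://www.mysite.com/blog/agent+forty%2Dtwo",
]

def normalize(url: str) -> str:
    """Decode %XX escapes and '+' (form-encoding style) so the variants compare equal.

    Caveat: this also decodes %2F to '/', which a production matcher should avoid.
    """
    return unquote_plus(url)

print({normalize(v) for v in variants})  # {'https://www.mysite.com/blog/agent forty-two'}
```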

I’d fix your patterns and see if you can reproduce any weirdness across browsers. If you can, definitely contact support on that.

The bigger problem here, I think, is that it’s difficult to know when you have an invalid pattern. The docs feel slightly vague in the wording of some important points, and there is no validation in the UX to tell you, e.g., that an old path of /some-thing is invalid when unescaped. That would be helpful.

Also, since the redirects use an unusual, proprietary approach to pattern matching, there should ideally be a tool for testing and evaluating the patterns client side, so that we can work out the correct pattern before pushing it to production.
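Until such a tool exists, a rough local check is possible. The sketch below assumes (.*) behaves like a greedy regex group, which Webflow’s real matcher may not do. It lists every rule that matches each sample path and flags paths matched by more than one rule, since those are the ones whose outcome depends on evaluation order. compile_old_path, matching_rules and report are hypothetical helpers.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (old path with (.*) wildcards, new path with %N)

def compile_old_path(old_path: str) -> "re.Pattern[str]":
    # Assumption: (.*) is a greedy capture group; everything else is literal.
    return re.compile("(.*)".join(map(re.escape, old_path.split("(.*)"))))

def matching_rules(rules: List[Rule], path: str) -> List[int]:
    """Indexes of every rule whose old path matches the whole of `path`."""
    return [i for i, (old, _new) in enumerate(rules)
            if compile_old_path(old).fullmatch(path)]

def report(rules: List[Rule], paths: List[str]) -> List[str]:
    """One line per path: OK (one rule), AMBIGUOUS (several) or NO MATCH."""
    lines = []
    for path in paths:
        hits = matching_rules(rules, path)
        status = "AMBIGUOUS" if len(hits) > 1 else "OK" if hits else "NO MATCH"
        lines.append(f"{status:<9} {path} -> rules {hits}")
    return lines

rules = [
    ("/20(.*)/(.*)/(.*)/(.*)", "/news/%4"),
    ("/20(.*)/(.*)/(.*)/(.*)/(.*)", "/news/%4"),
]
for line in report(rules, ["/2016/09/16/crua-hybrid",
                           "/2016/09/16/crua-hybrid/crua-hybrid-tent"]):
    print(line)
```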