Allow web crawling for compliance archiving

I need to allow a third-party company to crawl my site every day for legal compliance archiving.

The crawl from this US-based vendor appears to be blocked by some security mechanism in Webflow. We just migrated the site from WordPress to Webflow, and now the compliance crawls are blocked (they worked fine on WordPress).

How can I whitelist this vendor’s IP or user agent so it can crawl my site?

Interesting problem. A few thoughts:

  • Only Webflow support could actually adjust access to your site, if such a thing is even possible at the individual site level. Contact them to see what’s possible.
  • Failing that, I’d check the ToS to make certain your use is acceptable. Assuming this is a technical issue only, you could resolve it with a reverse proxy setup.
  • If permitted (again, check the ToS), the archiver should be able to pull the site through a VPN.
  • Failing that, it’s your site, and Webflow’s restrictions apply only to sites hosted on their own servers. You could export your site and self-host. Obviously not ideal if you use the CMS or have other hosting dependencies, but it’s a certain way to solve the compliance-archival issue.

Historically scraping has been a sensitive issue for Webflow, so they are quite protective of it.

Webflow doesn’t technically allow it. If you read through their terms they specifically mention it. In the past, they’ve kicked an app off the platform for doing just this.

A better solution is to provide them an API key to your CMS and allow them to query that.

If that doesn’t work, your next best bet is to design in Webflow, export, and host elsewhere.

Thx for the response.
Using a VPN is not an option. The vendor is a SaaS company and doesn’t support custom VPN solutions.
Webflow allows crawling by Google and others, so it seems odd that other legitimate crawlers can’t be whitelisted.
So it sounds like you are suggesting a call to Support?
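
One quick thing to rule out before contacting Support is whether the site's robots.txt treats the vendor's user agent differently from Googlebot. This is only a rough sketch in Python using the standard library. The sample rules and the "ArchiveBot" name are made up for illustration, and robots.txt is advisory, so it may well have nothing to do with the actual block.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, NOT taken from any real Webflow site.
SAMPLE_ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""
```

To check a real site, paste the contents of its /robots.txt in place of SAMPLE_ROBOTS and pass the vendor's actual user agent string.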

I had a similar requirement and overcame the issue by deploying a reverse proxy and instructing the scan company to hit the mirror. Webflow allows reverse proxies, as evidenced by the Enterprise team’s recommendation on how to cobble together multiple projects. If you need professional assistance, just reach out.
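
For readers wondering what the reverse proxy approach looks like, here is a minimal sketch in Python using only the standard library. It is not the setup described above, just an illustration under assumptions: the origin `https://example.webflow.io`, the `203.0.113.0/24` range, and the port are placeholders, and the IP allowlist is an added idea for limiting the mirror to the vendor. It handles GET only. A real deployment would more likely use nginx or a CDN worker with TLS in front, and the ToS should be checked first.

```python
import ipaddress
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholders, not real site or vendor details.
UPSTREAM_ORIGIN = "https://example.webflow.io"
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Headers a proxy should not copy from one side to the other.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authenticate",
              "proxy-authorization", "te", "trailers",
              "transfer-encoding", "upgrade"}
SKIP_RESPONSE = HOP_BY_HOP | {"content-length", "date", "server"}


def is_allowed(client_ip: str) -> bool:
    """True if the caller's IP falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def build_upstream_request(path: str, headers: dict) -> urllib.request.Request:
    """Build the request sent to the origin, dropping Host and hop-by-hop headers."""
    forwarded = {k: v for k, v in headers.items()
                 if k.lower() not in HOP_BY_HOP and k.lower() != "host"}
    return urllib.request.Request(UPSTREAM_ORIGIN + path,
                                  headers=forwarded, method="GET")


class MirrorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_allowed(self.client_address[0]):
            self.send_error(403, "Forbidden")
            return
        req = build_upstream_request(self.path, dict(self.headers.items()))
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                status, body = resp.status, resp.read()
                resp_headers = resp.getheaders()
        except urllib.error.HTTPError as err:
            # The origin answered with an error status; relay it as-is.
            status, body = err.code, err.read()
            resp_headers = list(err.headers.items())
        except urllib.error.URLError:
            self.send_error(502, "Bad Gateway")
            return
        self.send_response(status)
        for name, value in resp_headers:
            if name.lower() not in SKIP_RESPONSE:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the mirror until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), MirrorHandler).serve_forever()
```

Calling `serve()` starts the mirror. The archiving vendor would then be pointed at the mirror's hostname instead of the Webflow-hosted one.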

Have you asked them? That would be unusual for a compliance auditor, since they often use a VPN to access IP-restricted intranets and extranets.

Failing that, you can try to convince Webflow to whitelist them, or find out why they’ve been blocked, or just skip all that and use the reverse proxy approach.
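
To narrow down why the crawler is blocked, one option is to request the same page with different User-Agent strings and compare the status codes, once from an ordinary network and once from the vendor's network if they can run it. The following is a rough Python sketch under assumptions: the user agent names are invented, and treating 401, 403, and 429 as "blocked" is a guess about how the block shows up, not documented Webflow behavior.

```python
import urllib.error
import urllib.request

# Status codes treated as a sign of blocking (an assumption, see above).
BLOCK_CODES = (401, 403, 429)


def probe(url: str, user_agent: str, timeout: float = 15.0) -> int:
    """Fetch url with the given User-Agent and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent},
                                 method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error statuses arrive as exceptions; the code is what we want.
        return err.code


def blocked_agents(statuses: dict) -> list:
    """Given {user_agent: status_code}, list the agents that look blocked."""
    return sorted(ua for ua, code in statuses.items() if code in BLOCK_CODES)
```

For example, `blocked_agents({ua: probe(url, ua) for ua in agents})` gives the agents that were refused. If every agent succeeds from your own network but fails from the vendor's, the block is more likely IP-based than user-agent-based.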

You’ll likely achieve your goal faster with the VPN, but the configuration must be very specific. I’d highly recommend you work with Jeff on that.

Thx. Going to look into the reverse proxy option now.