Thanks Jeff, that’s very helpful.
I’m still trying to determine how the PDF was indexed so that I can avoid it happening again. If Webflow’s passwords block Googlebot, then that leaves two possibilities:
- Really bad timing. Perhaps I uploaded the PDF and put a link in my test page before I moved it into the password-protected folder. If Google just happened to traverse my sitemap in that brief gap, it would have picked up the https://assets-global.website-files.com URL and then held onto it, since that link is not secured at all by Webflow’s password feature. I didn’t realize assets were stored off-site and unprotected, so I wouldn’t have been especially cautious.
- Or, assets-global.website-files.com is being indexed separately from the website, and everything there is getting picked up.
I’ll run some tests just to reassure myself.
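For those tests, a rough sketch along these lines might do: request the asset URL with no cookies or login, the way a crawler would, and see whether the response looks indexable. The function names are my own, and the rule used here (a 200 status with no `noindex` or `none` in an `X-Robots-Tag` header) is a simplification of how Google decides, not a description of what Webflow's asset host actually returns.

```python
import re
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def looks_indexable(status: int, headers: dict) -> bool:
    """Return True if a crawler could plausibly index this response.

    Simplified rule (an assumption): the status must be 200 and the
    X-Robots-Tag header, if present, must not contain noindex or none.
    """
    if status != 200:
        return False
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    directives = re.split(r"[,:\s]+", lowered.get("x-robots-tag", ""))
    return "noindex" not in directives and "none" not in directives


def check_url(url: str) -> bool:
    """Send an anonymous HEAD request and classify the response."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=10) as response:
            return looks_indexable(response.status, dict(response.headers.items()))
    except HTTPError as error:
        # 401, 403, 404, 410 and similar arrive here as exceptions.
        return looks_indexable(error.code, dict(error.headers.items()))
```

Calling `check_url(...)` on the PDF's assets-global URL and on the password-protected page URL should show which of the two an anonymous visitor can actually reach. Some hosts reject HEAD requests, so a GET may be needed instead.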
It does sound very difficult to fix the SEO issue. Even script-based link replacement and hosting media on a CDN via a subdomain (e.g. http://photos.myportfoliosite.com) would lead to all kinds of issues in keeping the designer usable and in creating the right SEO links back to your main site. For non-secure content, I’ll look at the API and see if I can file-sync those images somewhere and do a URL fixup.
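The URL fixup part could be as small as the sketch below. It assumes the files have already been synced to the subdomain under the same paths they have on assets-global, which I haven't verified is practical. `rewrite_asset_urls` and the mirror host are hypothetical names for illustration only.

```python
import re

ASSET_HOST = "assets-global.website-files.com"
MIRROR_HOST = "photos.myportfoliosite.com"  # hypothetical subdomain from the example above

# Matches the scheme and host of any asset URL, leaving the path untouched.
_ASSET_URL = re.compile(r"https?://" + re.escape(ASSET_HOST) + r"/")


def rewrite_asset_urls(html: str, mirror_host: str = MIRROR_HOST) -> str:
    """Point every assets-global URL in the markup at the mirror host.

    Assumes the mirror serves each file at the same path as the original.
    """
    # A function replacement avoids any backslash handling in the host name.
    return _ASSET_URL.sub(lambda _match: f"https://{mirror_host}/", html)
```

Because the substitution is global, it also covers `srcset` entries and links to PDFs, not just `src` attributes. It does nothing about the designer itself, which would still upload to and reference the original host.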
I’ve always been puzzled why, after I migrate a client’s site to Webflow, I see about a 20% to 30% drop in organic traffic in Analytics. Same domain. Same content. Better organized. Suddenly, fewer visitors in the logs. I suspect it’s the images.
I’m currently dealing with an issue where, even after deleting the PDF and republishing my site, the file is still on assets-global, so Google refuses to honor my remove-from-index request.
For the sensitive PDFs, I think the best approach is to share them on a per-user basis from Google Drive. It creates some added work for my clients, and it’s not as sexy from a UI standpoint, but there, security wins.