No-index an image not possible?

Has anyone used noindex to hide an image from Google? I’ve used it in the past to hide pages but a client would like to specifically hide an image used on a page. Looking at Google’s help on the subject, I should use:

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

But I don’t think I can disallow any images using this method, because they are served from a different domain (https://assets-global.website-files.com) than the one my robots.txt sits on. A robots.txt only applies to the domain it is served from, and that image is served from somewhere else, right?

Does anyone know if this is possible please?

It is not possible. If it were, I or anyone could stop Google from indexing files on anyone else’s site.

The solution for this one file is to self-host it, and use a robots.txt file on that host to exclude it.
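If you do self-host the file, you can sanity-check your rules before relying on them. This is only a rough sketch using Python's standard `urllib.robotparser`; the host `media.example.com` and the helper name `is_blocked` are made up for illustration. Note that the parser only compares paths, so the rules mean something only for the host that actually serves this robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rules from the original question, as they would appear in the
# robots.txt of the host that serves the image.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # media.example.com is a placeholder for your own self-hosted domain.
    for path in ("/images/dogs.jpg", "/images/cats.jpg"):
        url = "https://media.example.com" + path
        print(path, "blocked:", is_blocked(ROBOTS_TXT, "Googlebot-Image", url))
```

Keep in mind that robots.txt only asks crawlers not to fetch the file. It is not access control.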

Whoa, I just came across this problem as well. A client messaged that their private pricing PDF is appearing in search results. None of our public Webflow pages link to it, only a password-protected page used by exclusive clients.

The private page does not appear in results, however that private PDF is appearing through the https://assets-global.website-files.com domain where we cannot robot-exclude it.

We can’t even ask Google to remove the search result through GSC, since it’s not on our domain.

We didn’t realize that Webflow-hosted assets (even private and unlinked ones) were being exposed to Google indexing directly through a different domain. I can’t imagine why, since it doesn’t even boost SEO for Webflow clients.

Has anyone found a way to prevent this, or to remove these files?
Webflow support request?

Webflow assets are only indexed if they are linked from a page (any page, not just ones you control) that is crawlable.

Since you can’t control access to files uploaded to assets, at this time it should not be used for sensitive / private file storage.

You can delete an asset, but you will need to contact support to have the file itself removed, as deleting the asset does not remove it automatically.

Thanks Jeff, we’re not understanding why these files were indexed, when they were only ever linked from a password-protected page.

Is content on Webflow’s password-protected pages indexable by Google?

But in addition to the security problem here, I’m quite shocked to discover the SEO gap. It could explain why my Webflow sites are getting lower traffic than I’m used to. I’ve always found images and PDFs to be an important part of the SEO and traffic to a site, up to 30%, and far more for portfolio & educational sites that contain a lot of image / PDF content.

I’m pretty shocked that Webflow is delivering those through alternative URLs, which means that we’re investing in content that is drawing people AWAY from our site, to website-files.com instead.

Is there a configuration setting or some way to fix this?

Googlebot (any bot, actually) won’t index a page that is password-protected, since there is no site-specific content. Look at the source of a password-protected page to see what bots see. But any page that is indexed (on your site or another) and links directly to the asset can cause the crawl. That you can’t stop, since the asset is public.
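To see which asset URLs a crawlable page exposes, you can scan its HTML source for links to the asset host. A minimal sketch with Python's standard `html.parser`; the sample markup, the `abc/pricing.pdf` path, and the names `AssetLinkFinder` and `find_asset_links` are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

ASSET_HOST = "assets-global.website-files.com"


class AssetLinkFinder(HTMLParser):
    """Collects href/src values that point at the Webflow asset host."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                if urlparse(value).hostname == ASSET_HOST:
                    self.found.append(value)


def find_asset_links(html: str) -> list[str]:
    """Return every asset-host URL referenced by href or src in html."""
    finder = AssetLinkFinder()
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    # Made-up page source; paste in the real source of a published page.
    sample = (
        '<a href="https://assets-global.website-files.com/abc/pricing.pdf">Pricing</a>'
        '<img src="/local.png">'
    )
    print(find_asset_links(sample))
```

Running this against the source of each published page (including ones you believe are protected) shows which asset URLs a bot could pick up from that page.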

As do I. I am a longtime photographer, and for my work I refuse to use Webflow as my host since I would inadvertently lose all control of my assets.

Nope. You can’t change the way Webflow uses assets. You can use third-party CDNs that allow for referrer restrictions, but since external images are only available via custom code, that is very difficult to do, and it renders editing them impossible for editors.

For my clients that have large amounts of images that need access control or SEO, I use alternate platforms / solutions.

Thanks Jeff, that’s very helpful.

I’m still trying to determine how the PDF was indexed so that I can avoid it happening again. If Webflow’s passwords block Googlebot, then that leaves two possibilities:

  1. Really bad timing. Perhaps I uploaded the PDF and put a link in my test page before I moved it into the password-protected folder. If Google just happened to traverse my sitemap in that brief gap, it would have picked up the (https://assets-global.website-files.com ) URL and then held onto it, since that link is not secured at all by Webflow’s password feature. I didn’t realize assets were stored off-site, and un-protected so I wouldn’t have been especially cautious.
  2. Or, assets-global.website-files.com is being indexed separately from the website, and everything there is getting picked up.

I’ll run some tests just to reassure myself.

It does sound very difficult to fix the SEO issue. Even doing script link-replacements and hosting media on a CDN via a subdomain (e.g. http://photos.myportfoliosite.com) would lead to all kinds of issues in keeping the Designer usable and in creating the right SEO links back to your main site. For non-secure content, I’ll look at the API and see if I can file-sync those images somewhere, and do a URL fixup.
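For what it’s worth, the URL fixup part could look roughly like this. It is only a sketch: it assumes the files have already been synced to the subdomain under the same paths, it uses the example subdomain from above, and `rewrite_asset_urls` is a made-up helper name. It does not cover the file sync or anything in the Webflow API.

```python
import re

# Matches the scheme and host of Webflow's asset URLs; the path is left alone.
ASSET_PREFIX = re.compile(r"https://assets-global\.website-files\.com/")


def rewrite_asset_urls(html: str,
                       new_base: str = "https://photos.myportfoliosite.com/") -> str:
    """Point asset URLs at a self-hosted base, keeping each file's path.

    Assumes every file was copied to new_base with an identical path.
    """
    return ASSET_PREFIX.sub(new_base, html)


if __name__ == "__main__":
    # Hypothetical markup; the abc/dogs.jpg path is for illustration only.
    before = '<img src="https://assets-global.website-files.com/abc/dogs.jpg">'
    print(rewrite_asset_urls(before))
```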

I was always puzzled why, after I migrate a client’s site to Webflow, I see about a 20% - 30% drop in traffic in the organic Analytics. Same domain. Same content. Better organized. Suddenly, fewer visitors in the logs. I suspect it’s the images.

I’m currently dealing with an issue where, even after deleting the PDF and republishing my site, the file is still on assets-global, so Google refuses to honor my remove-from-index request.

For the sensitive PDFs, I think the best approach is to share them on a per-user basis from Google Drive. It creates some added work for my clients, and it’s not as sexy from a UI standpoint, but there, security wins.