Google: Our Paywalled & Subscription Structured Data Method Is Not Leaky

Jan 22, 2024 - 7:51 am
Filed Under Google


Google's flexible sampling solution, which replaced first click free as the way to handle gated, subscription or paywalled content, launched in 2017. Since then, many publishers have used paywalled content structured data while showing Google the full content that sits behind the content gate. Some are calling this approach "leaky," and Google has responded that it is not.
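For anyone who has not seen this markup, here is a minimal sketch of what it roughly looks like, generated with Python. It assumes a NewsArticle page whose gated text sits inside an element with a hypothetical `paywall` class; the helper name `paywall_jsonld` is made up for illustration, and Google's paywalled content documentation remains the authority on the exact fields.

```python
import json


# Hypothetical helper: builds the JSON-LD block for a paywalled NewsArticle.
def paywall_jsonld(headline: str, gated_selector: str) -> str:
    """Return a <script> tag marking the element matched by gated_selector
    as the part of the article that is not free to access."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        # The article as a whole is not freely accessible...
        "isAccessibleForFree": False,
        # ...and hasPart points at the element that holds the gated text.
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            "cssSelector": gated_selector,
        },
    }
    # Escape "</" so a headline can never close the script tag early;
    # "<\/" is still valid JSON and decodes back to "</".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # ".paywall" is an assumed class name on the gated element.
    print(paywall_jsonld("Example headline", ".paywall"))
```

The markup only labels which part of the page is gated; it does not hide anything by itself. The gating still happens on the server, which is the point Danny makes below.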

Ryan Singel, a journalist covering tech business, tech policy, civil liberty and privacy issues, who has written for Wired and many other respected publications, posted a comment on this site calling this Google solution "leaky." He said:

Google Search and Google News are stuck in the past when it comes to these. Its crawler assumes that paywalled or reg walled content is still going to be in the HTML that Google crawler will see. In other words, it demands leaky bad tech from sites with paywalled or registration required content. It'd be great if it fixed that instead of sending Danny Sullivan out to lecture sites about their markup with directions that don't work for a smart, modern, non-leaky publishing system.

Danny Sullivan, Google's Search Liaison, then responded to that comment here, on X and on Mastodon, saying it is not leaky. Here is Danny's response from this blog:

Our system is looking to be shown the full content, if a publisher wants to do that. If they do, we understand more about it. If we understand more, then we might be able to show it for more queries where it's relevant. This doesn't involve using JS to somehow "hide" the content from people who aren't our crawler or anything like that.

Basically, you see our crawler, you show us the full content. And only us. And if you're worried that someone is pretending to be us, then you check our publicly shared IP addresses.

Next, you mark up the page so we know what's paywalled / gated content so that we -- and only we are seeing this full content -- also know you aren't trying to cloak us by targeting our crawler specifically. Since only we are seeing this, there's nothing "leaky" as you are suggesting. Here's the doc.

Where the "leaky" stuff tends to come in is someone might search with us, then click on the cached copy of a page to see the full thing we saw. And if that's a concern, our guidance is to block the cached copy -- covered in the docs.

I hope that helps explain this more. If I'm missing something, or you have other suggestions, honestly very happy to hear them. I found Outpost and emailed both the info and press addresses, so look for that, happy to continue the conversation.
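Danny's "check our publicly shared IP addresses" advice can be sketched in a few lines of Python. Google publishes Googlebot's IP ranges as a JSON file (googlebot.json) with a list of `ipv4Prefix` / `ipv6Prefix` entries. The two prefixes below are an illustrative excerpt in that shape, not the current list, and the function names are hypothetical; a real setup would fetch and refresh the published file.

```python
import ipaddress
import json

# Illustrative excerpt in the shape of Google's published googlebot.json.
# These two prefixes are examples only; fetch the current file in practice.
SAMPLE_RANGES_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "66.249.64.0/27"},
    {"ipv6Prefix": "2001:4860:4801:10::/64"}
  ]
}
"""


def load_networks(ranges_json: str) -> list:
    """Parse each prefix entry into an IPv4Network or IPv6Network."""
    data = json.loads(ranges_json)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_published_googlebot_ip(ip: str, networks: list) -> bool:
    """True if ip sits inside any published prefix. Testing an IPv4
    address against an IPv6 network simply yields False."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = load_networks(SAMPLE_RANGES_JSON)
    for candidate in ("66.249.64.5", "203.0.113.9"):
        print(candidate, is_published_googlebot_ip(candidate, nets))
```

The check keys on the connecting IP address, not the user agent string, which is why it holds up even when someone claims to be Googlebot.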

Sullivan also posted on X, saying:

I mentioned paywall and gated content in my tweet not as some type of lecture but guidance because it's something any publisher doing gated content might want to understand.

Gated content isn't something that our crawler can see, unless publishers let us in. If they do, we can better understand the full content they have. In turn, that might help us surface their content for relevant queries.

There's nothing "leaky" about this. That seems to be a suggestion that if someone lets us in, anyone can get in. That's not the case. We can be specifically allowed in. If someone is concerned that makes cached content available, they can also block us showing cached content.

This is all documented and hasn't changed for ages.

He seems to be involved in a company that provides registration systems, I think, to publications? Including the publication I was responding to? I'll reach out to his site to see if there are other suggestions on what we might do to help publishers with paywall / gated content issues. We're always open to that.
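On the cached copy point that both of Danny's replies raise: Google's robots directives include noarchive, which can be sent as an X-Robots-Tag HTTP header or as a robots meta tag. Here is a minimal sketch of a gated page response, assuming a hypothetical handler that has already decided whether the request comes from a verified crawler; `build_gated_response` and its parameters are made up for illustration.

```python
# Hypothetical handler helper: picks the body for a gated article and always
# asks search engines not to keep a cached copy of the page.
def build_gated_response(full_html: str, teaser_html: str,
                         verified_crawler: bool) -> tuple:
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        # noarchive asks Google not to show a cached copy of this page.
        "X-Robots-Tag": "noarchive",
    }
    # Only a request that passed crawler verification gets the full article;
    # everyone else, including spoofed user agents, gets the teaser.
    body = full_html if verified_crawler else teaser_html
    return headers, body


if __name__ == "__main__":
    headers, body = build_gated_response("<p>full</p>", "<p>teaser</p>", False)
    print(headers["X-Robots-Tag"], body)
```

Sending the header on every response, rather than only to the crawler, keeps the directive consistent no matter who fetches the page.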

Some replied that a user can simply change their browser's user agent to Googlebot's. But technically, if you use Google's Googlebot IP verification method, you can block those attempts.
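One of the verification methods Google documents is a reverse DNS lookup on the requesting IP, followed by a forward DNS lookup on the hostname it returns. Here is a minimal Python sketch of that check. It assumes IPv4 (socket.gethostbyname_ex only returns IPv4 addresses; IPv6 would need socket.getaddrinfo), assumes googlebot.com and google.com as the accepted domains, and `verify_googlebot` is a hypothetical name.

```python
import socket

# Domains assumed to identify Googlebot in reverse DNS, per Google's
# crawler verification docs (googlebot.com and google.com).
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def verify_googlebot(ip: str,
                     reverse_lookup=socket.gethostbyaddr,
                     forward_lookup=socket.gethostbyname_ex) -> bool:
    """Reverse-then-forward DNS check for an IPv4 address.

    1. Reverse lookup: the IP must resolve to a Google crawler hostname.
    2. Forward lookup: that hostname must resolve back to the same IP,
       so a faked PTR record on someone else's IP does not pass.
    """
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        _name, _aliases, forward_addrs = forward_lookup(hostname)
    except OSError:
        return False
    return ip in forward_addrs
```

With the default arguments, verify_googlebot(request_ip) performs live DNS lookups, so results are worth caching. The lookups are parameters only so they can be swapped for fakes in tests. A changed user agent string does not get past this, because the answer depends on who owns the IP address, not on what the request claims to be.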

And let's not forget that Google does not label content served through flexible sampling or that has a paywall requirement. I get complaints from my readers when I link to articles and do not mention there is a content gate on it. I mean, a label would be nice from Google, so at least you know before you click. But that is for a different story.

It used to be much easier to access gated content under the first click free program. It is much harder to do that now under flexible sampling. But technically, anything connected to the internet can, in some way, be accessed. Some things are just harder than others...

Forum discussion at X and Mastodon.
