Google News Can't Index Articles With Too Much HTML Formatting

Feb 23, 2011 - 8:59 am

I spotted an interesting thread at the Google News Help forum where a site owner complained that their articles weren't being included in Google News. Google replied that the reason was that some of the formatting tags weren't recognized.

What is interesting is that the specific tags called out by Google as the issue were standard paragraph break tags.

Harvey P. from Google said:

In reviewing your site, I found a couple of things that may be preventing our crawler from indexing your articles. In the HTML code of article pages, you use many formatting tags such as <p> and <br> that may cause problems for our crawler. Removing frequent use of these tags may help our system better identify and index your articles.

I looked at the site in question and picked a random article, and it didn't seem out of the ordinary. The code, including the <p> and <br> tags used throughout the body content, looked typical.


So I am not sure whether there was a specific article that had too much HTML formatting in it.

We do get errors on some of our own articles, specifically the daily recap posts. The error we get is "Article fragmented," which means:

The article body that we extracted from the HTML page appears to consist of isolated sentences not grouped together into paragraphs. We generated this error to avoid including what might be an incorrect piece of text.

Recommendations

* Try formatting your articles into text paragraphs of a few sentences each.
* Make sure your sentences are well punctuated.
* Make sure you don't use frequent <p> and <br> tags within your paragraphs, and try to avoid breaking up the article body in general.
* Consider removing some of the non-article text from the article page.
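To get a rough feel for whether a page might look "fragmented," you could count how many sentences sit between consecutive <p> and <br> tags. The sketch below is only an illustration: Google has not published how its crawler makes this decision, so the class name BreakCounter, the function sentences_per_block, and the sentence-splitting rule are all assumptions of mine, not Google's method.

```python
import re
from html.parser import HTMLParser


class BreakCounter(HTMLParser):
    """Hypothetical helper: splits article HTML into text blocks at <p> and <br>."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # text found between consecutive break tags
        self.break_tags = 0  # number of <p> and <br> start tags seen
        self._buffer = []    # text pieces of the block currently being read

    def _flush(self):
        # Close the current block, ignoring blocks that hold only whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "br"):
            self.break_tags += 1
            self._flush()

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def sentences_per_block(article_html):
    """Average number of sentences per block (assumed heuristic, not Google's)."""
    parser = BreakCounter()
    parser.feed(article_html)
    parser.close()
    if not parser.blocks:
        return 0.0
    # Assumption: a sentence ends with ., ! or ? followed by whitespace or end of text.
    total = sum(len(re.findall(r"[.!?]+(?:\s|$)", block)) for block in parser.blocks)
    return total / len(parser.blocks)


fragmented = "<p>One sentence.</p><p>Another one.</p><p>A third.</p>"
grouped = "<p>First sentence. Second sentence. Third sentence.</p>"
print(sentences_per_block(fragmented))  # 1.0
print(sentences_per_block(grouped))     # 3.0
```

An average close to one sentence per block would match the "isolated sentences" description in the error text, while a few sentences per block is closer to what the recommendations ask for. Where the real cutoff lies is unknown.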

So I suspect Harvey is responding to a specific type of article that is not properly structured.

Forum discussion at Google News Help.

 
