Google News Can't Index Articles With Too Much HTML Formatting

Feb 23, 2011 - 8:59 am

I spotted an interesting thread at the Google News Help forum in which a site owner complained that their articles weren't being included in Google News. Google replied that the reason was that some of the formatting tags weren't recognized.

What is interesting is that the specific tags called out by Google as the issue were standard paragraph break tags.

Harvey P. from Google said:

In reviewing your site, I found a couple of things that may be preventing our crawler from indexing your articles. In the HTML code of article pages, you use many formatting tags such as <p> and <br> that may cause problems for our crawler. Removing frequent use of these tags may help our system better identify and index your articles.

I looked at the site in question and picked a random article, and it didn't seem out of the ordinary. The code, including the <p> and <br> tags used throughout the body content, didn't seem atypical.


So I am not sure whether there was a specific article that had too much HTML formatting in it.

We do get errors on some of our articles, specifically the daily recap posts. The error we get is "Article fragmented," which means:

The article body that we extracted from the HTML page appears to consist of isolated sentences not grouped together into paragraphs. We generated this error to avoid including what might be an incorrect piece of text.


* Try formatting your articles into text paragraphs of a few sentences each.
* Make sure your sentences are well punctuated.
* Make sure you don't use frequent <p> and <br> tags within your paragraphs, and try to avoid breaking up the article body in general.
* Consider removing some of the non-article text from the article page.
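
To get a rough feel for whether a page might trip this kind of error, here is a minimal Python sketch that splits an article body at every <p> and <br> tag and counts the sentences in each resulting block. This is only an illustration of the advice above, not Google's actual extraction logic, which is not public. The names `ParagraphCollector`, `sentences_per_block`, and `looks_fragmented`, the naive sentence counter, and the `min_avg` threshold of 2.0 are all my own assumptions.

```python
from html.parser import HTMLParser
import re


class ParagraphCollector(HTMLParser):
    """Collects text blocks, treating <p> and <br> as block boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished text blocks
        self._current = []  # text pieces of the block being built

    def _flush(self):
        # Join the pieces, normalize whitespace, keep non-empty blocks only.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "br"):
            self._flush()

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        self._current.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any trailing text after the last tag


def count_sentences(text):
    # Naive assumption: a sentence ends with ., ! or ? followed by
    # whitespace or the end of the text.
    return len(re.findall(r"[.!?]+(?:\s|$)", text))


def sentences_per_block(html):
    """Return the sentence count of each <p>/<br>-delimited block."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return [count_sentences(block) for block in parser.blocks]


def looks_fragmented(html, min_avg=2.0):
    """Hypothetical heuristic: flag bodies averaging under min_avg sentences per block."""
    counts = sentences_per_block(html)
    if not counts:
        return False
    return sum(counts) / len(counts) < min_avg
```

For example, calling `looks_fragmented()` on a body where every sentence sits in its own <p> or is followed by a <br> returns `True`, while a body with a few sentences per paragraph returns `False`. Treat the result as a hint about which templates to look at, nothing more.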

So I suspect Harvey is responding to a specific type of article on the site that is not properly structured.

Forum discussion at Google News Help.

