Bing's MSNBot Crawling Fake File Names?

Dec 28, 2009 • 8:41 am | comments (6) | Filed Under Bing Search
 

A WebmasterWorld thread and an older Bing Forums thread have webmasters discussing Microsoft Bing's web crawler, MSNBot, requesting file names that do not exist on their sites.

This reminds me of the ongoing issue of Bing creating fake referrals in webmaster log files. This has been going on for years; Microsoft claims to have fixed it, but never really has.

In this specific case, it seems like Bing is making up file names to crawl on a specific site. Well, they are not creating files, just trying to fetch pages that do not exist, and never have existed, on that site. I am not sure if this is a Bing issue or a webmaster issue.

A longtime WebmasterWorld member explained the issue:

In what is apparently a rather old bad behavior, msnbot has a practice of regularly requesting totally manufactured URIs that appear to be designed to trigger 404 errors. Here are two sample log entries of the two styles of bogus URIs msnbot requests:

'65.55.207.126'¦Tue, 15 Dec 2009 20:39:49 -0500¦'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'¦'*/*'¦'/ADBF3C7AB534E8356F30D8AC05291640_00000.temp019f.html'¦''
'65.55.207.28'¦Wed, 16 Dec 2009 05:46:22 -0500¦'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'¦'*/*'¦'/000166709_00001.temp00be.html'¦''

The requests ALWAYS take on one of the formats above starting with either a 32byte GUID or a nine digit integer.

In the Bing thread, another person said:

For many many years, msnbot has been crawling my sites looking for files that have never existed... i'm trying to figure out why... the filenames have changed slightly in recent times but they have been similar in structure since the beginning... they are something like 000092601_00002.temp0001.htm... in other words, 9 numbers underscore 5 numbers dot temp 4 numbers dot htm... the search for these is all over my server's directory tree...

I'll emphasize once more that these files have never existed on my site and i have no clue how msnbot may have picked them up...
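For webmasters who want to see how often this happens in their own logs, the two URI styles quoted above can be matched with a regular expression. The following is only a rough sketch in Python: the pattern is inferred from the three sample file names in the quotes (32 hex characters or nine digits, an underscore, five digits, ".temp" plus four hex characters, then ".htm" or ".html"), and the function names and the ¦-delimited field order are assumptions based on the sample log entries, not anything documented by Microsoft.

```python
import re

# Pattern inferred from the sample requests quoted above (an assumption,
# not an official description of MSNBot behavior):
#   <32 hex chars OR 9 digits>_<5 digits>.temp<4 hex chars>.htm[l]
# An optional directory prefix is allowed, since one webmaster reports
# the requests showing up all over the directory tree.
BOGUS_PATH = re.compile(
    r"^/(?:[^?#]*/)?"
    r"(?:[0-9A-Fa-f]{32}|[0-9]{9})"
    r"_[0-9]{5}"
    r"\.temp[0-9A-Fa-f]{4}"
    r"\.html?$"
)


def is_bogus_msnbot_path(path: str) -> bool:
    """Return True if the path looks like one of the manufactured URIs."""
    return BOGUS_PATH.match(path) is not None


def bogus_requests(log_lines):
    """Yield (ip, path) for msnbot requests that match the bogus pattern.

    Assumes the custom log layout from the quoted samples: fields are
    separated by a broken bar and wrapped in single quotes, in the order
    IP, date, user agent, accept header, path, referrer.
    """
    for line in log_lines:
        fields = [f.strip().strip("'") for f in line.split("¦")]
        if len(fields) < 5:
            continue  # not in the assumed format; skip it
        ip, user_agent, path = fields[0], fields[2], fields[4]
        if "msnbot" in user_agent.lower() and is_bogus_msnbot_path(path):
            yield ip, path
```

Standard Apache or IIS logs use a different layout, so the field splitting in `bogus_requests` would need adjusting there; the path check in `is_bogus_msnbot_path` should carry over as long as the requests keep the same shape.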

Honestly, I feel bad that I am always beating up on Microsoft. I know they are new to the game, when you compare them to Google. But I have to report these issues.

Forum discussion at WebmasterWorld & Bing Forums.

 

Comments:

Benj Arriola

12/28/2009 02:44 pm

I have seen Google do this also. But only when verifying account ownership on Google Webmaster Tools and if your 404 page is broken, like the status is 200, GWT will call this out saying your site cannot be verified because you don't have properly working 404 pages.

David McDougal

12/28/2009 08:25 pm

I am seeing the same thing. I am having files that existed 3-4 years ago but have not been on my server for 2 plus years suddenly reappear in the index, and files with query values that have zero possibility of ever being on my website show in the index. The bad part, is that I cannot get the recent correct files to appear any longer. It is almost like Bing has hit some magical file number value for my site, and even though they are messing up their results with fake and deleted files I am paying the penalty.

kelvin newman

12/29/2009 08:50 am

Very strange behaviour, I've always checked to see if 404 are working when doing a check on a site for SEO, was doing that mainly to avoid duplication though. In this case it's not indexing those pages it's crawling is it?

Alistair Lattimore

12/29/2009 01:03 pm

For those that have the information, are they always requesting the same filenames over and over per site or are they changing? Do you think this might be a 'quality' check for a well configured server by deliberately trying to generate a 404 error?

Robert Damian Mauro

01/18/2011 10:33 am

Just a quick note on "new to the game" - if you do some digging, you will see that they've been at this for over 8 years (which coincidentally, is how long some of their bot's problems have existed).

Radhika V

05/15/2013 05:54 pm

For the past few months the same thing has been happening with my web site. Mostly 'msnbot', some random IPs, and visits from 'Google.com/search' are coming to these fake URLs on my site. These URLs are in no way related to my site. I actually posted a thread in the Google webmaster help forums, but nobody has answered...
