Google Spidering Encoded HTML in URLs?

Apr 24, 2008 • 2:13 pm | comments (4) | Filed Under Other Google Topics
 

Maybe so. Is it just a spidering glitch, weird links, or a sitemap error? I was searching Google this morning, running some tests on Google's new Whois feature. When I plugged in a domain, the first page of results included a weird URL with an encoded ampersand and other characters in front of the displayed URL for the website (aboutus.org). When I clicked the link, I got a 404 error.

Screenshot: google spidering encoded urls

Link to Result.

Thoughts? Comments?

 

Comments:

Brian Mark

04/24/2008 09:38 pm

Looks like someone fubar'd an href and didn't encode things properly. That's supposed to be a br tag (yeah, a line break), but it ended up being part of the URL.

John Honeck

04/24/2008 10:25 pm

Try cutting and pasting the URL into your browser: <br>www.aboutus.org/EasyJournal.com
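As a rough illustration of what the two comments above describe, here is a minimal Python sketch of how an HTML-escaped br tag stuck to the front of a link can show up as percent-encoded ampersands in a URL. The exact URL Google displayed is not reproduced in this post, so the `raw_href` value and the `clean_href` helper are assumptions made up for the example.

```python
import re
from html import unescape
from urllib.parse import quote, unquote

# Assumed malformed link: an HTML-escaped <br> tag glued to the front of the address.
raw_href = "&lt;br&gt;www.aboutus.org/EasyJournal.com"

# Decoding the HTML entities reveals the stray line-break tag.
decoded = unescape(raw_href)

# Percent-encoding the escaped text turns '&' into %26 and ';' into %3B,
# which is one way "encoded ampersands" could end up in front of a URL.
percent_encoded = quote(raw_href, safe="/")


def clean_href(href: str) -> str:
    """Hypothetical cleanup: undo both encodings, then drop leading tags."""
    text = unescape(unquote(href))
    return re.sub(r"^(?:\s*<[^>]*>)+\s*", "", text)


print(decoded)
print(percent_encoded)
print(clean_href(percent_encoded))
```

This only shows one plausible path from a botched href to a broken URL; it does not claim to be what actually happened on the linking page.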

Sam Daams

04/25/2008 07:41 am

Haha, nice find John :) This is certainly weird. Google sometimes seems to crawl an entire website through a URL that has a mistake in it. For example, a while ago someone linked to one of our sites with a space between http:// and www. In most browsers, clicking that link kicks up an error (some actually handle it!). But Google crawled that one link and then, based on it, crawled an insane number of pages, all with the space in the URL. Lots of these were then indexed, even though they were clearly dup content. Users clicking on those results nearly always got a browser error (except those using the browsers that managed to handle it). There's potential there for some serious damage to a competitor's site. Pump some links with spaces in the URL at a competitor's pages, then watch as Google drops the regular link for the one with the space, which users can't actually use... nice!
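For anyone auditing inbound links for the problem described in the comment above, a minimal Python sketch might look like the following. The example.com URLs and both helper functions are hypothetical, and stripping all whitespace is a deliberately naive way to map a broken variant back to the real page.

```python
import re


def has_whitespace(url: str) -> bool:
    """Return True if the URL contains any whitespace character."""
    return re.search(r"\s", url) is not None


def canonicalize(url: str) -> str:
    """Naive assumption: removing whitespace recovers the intended URL."""
    return re.sub(r"\s+", "", url)


# Hypothetical inbound links: one broken variant and the regular one.
links = [
    "http:// www.example.com/page",
    "http://www.example.com/page",
]

# Links that would fail in most browsers.
broken = [u for u in links if has_whitespace(u)]

# Both variants collapse to a single canonical URL.
unique = {canonicalize(u) for u in links}

print(broken)
print(unique)
```

Whether a search engine treats the two variants as the same page is up to the search engine; the sketch only shows how a site owner could spot the bad variants in a list of links.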

Monique

07/02/2008 11:34 pm

Hi Ben, good catch and great article! I suppose whoever is uploading those Excel sheets of destination URLs is half asleep??
