Bulk Submit 2.0

Dec 5, 2006 • 12:21 pm | Filed Under Search Engine Strategies 2006 Chicago
 

We seem to be starting a bit late because of some technical difficulties with Google's presentation. I offered my VGA to HDMI cable so Amanda's Mac could connect to the projector, and now they can begin. I seriously wonder where the world would be without me (just kidding).

Danny explains that bulk submit 1.0 was when you submitted URLs through the add URL forms. The engines then added site inclusion programs, but those also went away. Now we have 2.0, with Google Sitemaps and Yahoo Site Explorer. Danny adds that a standard sitemaps protocol for all the engines was announced at PubCon, so Brett would be happy.

Amanda Camp from Google is first up, with her Mac. She is a software engineer at Google and works on Sitemaps. She is only going to talk about Sitemaps, not the add URL form or other forms. Sitemaps is the current way to tell Google about your pages via submission. It helps them find new URLs faster and helps them be smarter about the way they crawl. Of course, what you give Google is just hints, and they won't rely on it 100%.

Sitemaps come in four different formats: (1) text file, (2) RSS/Atom feed, (3) Sitemap protocol and (4) OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). A simple HTML sitemap is not a Google Sitemap.

Sitemap rules: always submit the full URL and remove unnecessary parameters from the URLs. Sitemaps should be placed in the highest directory of the URLs you are submitting. You have to make sure the path is an exact match, i.e. http vs https, www vs non-www, and subdomains. You can name the sitemaps anything you want. URLs must use UTF-8 encoding. All URLs must be encoded for readability by the web server. A sitemap can hold a maximum of 50,000 URLs or 10MB, and an index file can list up to 1,000 sitemaps. Please use gzip to compress your sitemaps.

The text file format is one URL per line, and the URLs cannot contain embedded new lines. Each text file can contain a maximum of 50,000 URLs and should contain no information other than the list of URLs. For feeds, Google accepts RSS 2.0 and Atom 0.3. If your feed includes only recent URLs, Google can still use that info.

She then shows the XML format for an official sitemaps file: urlset, url, loc (path), lastmod (last time the page was changed, optional), changefreq (always, hourly, daily, weekly, monthly, yearly or never, also optional) and priority (between 0.0 and 1.0; this tells Google which of your pages are most important, relative to the rest of your site). There is an official Google Sitemaps Generator at google.com/webmasters/sitemaps/ etc. Once you make your sitemap, you add your site to Google Webmaster Tools. She shows screen shots of a pending verification status, which then changes to OK or a red error link. She then talks a bit about Sitemaps.org, which is supported by Google, Yahoo and Microsoft.
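For anyone who wants to roll their own file instead of using the official generator, the XML format Amanda showed could be produced roughly like this. This is only a sketch: the namespace is the one published on Sitemaps.org, while the function names, the example.com URLs and the sample values are made up for illustration and are not from her slides.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace published by Sitemaps.org for the shared protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit mentioned in the session


def chunked(entries, size=MAX_URLS):
    """Split a list into pieces that each fit in one sitemap file."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def build_sitemap(entries):
    """Return UTF-8 XML bytes for one sitemap file.

    Each entry is a dict with a required 'loc' (the full URL) and optional
    'lastmod', 'changefreq' and 'priority' keys.
    """
    entries = list(entries)
    if len(entries) > MAX_URLS:
        raise ValueError("too many URLs for one sitemap; split it first")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the URL for us.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical pages, for illustration only.
    pages = [
        {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 1.0},
        {"loc": "http://www.example.com/catalog?item=12&desc=boots",
         "lastmod": "2006-12-05", "priority": 0.5},
    ]
    xml_bytes = build_sitemap(pages)
    compressed = gzip.compress(xml_bytes)  # gzip, as Google asks
    print(xml_bytes.decode("utf-8"))
    print(len(compressed), "bytes after gzip")
```

The size limit (10MB) is not checked here; a real script would also want to verify the byte length of each file before compressing it.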

Amit Kumar from Yahoo is next up. He explains that they have Site Explorer, and that Yahoo is the only major engine offering reports that show inlinks. It also lets you submit sitemaps, which is especially useful for dynamic sites. It is important to authenticate your site, and you can use the YDN API to ping Yahoo. Check out the publisher network at publisher.yahoo.com; it has lots of tools you can use as publishers and webmasters. He shows the main Yahoo Site Explorer interface (siteexplorer.search.yahoo.com). You can then manage your feeds for those sites (lots of different formats are supported). He recommends using the feedback link, since they do their best to respond, but also try the forum in the tool. He shows the submit sitemaps page, then how you can authenticate your site in Site Explorer. He shows the inlink reports and the "explore your site" feature, all old stuff. You can download your data, but there are limits. He is missing a slide... Then he puts up a new slide on ysearchblog.com, where you can read some interesting things.

Eric Papczun from Performics is next up with a case study on Google Sitemaps. He explains that these tools are great and exciting for him, especially for large sites. When building a sitemap, you need to get a complete and accurate list of URLs. Then you convert that file to the XML protocol format. Pick your verification method, either a meta tag or a file you upload. Sitemaps usually get picked up in a couple of days, and the entire sitemap is crawled within 3 to 14 days. The average time is about one week. Smaller sites with low PageRank take longer, so refresh your content regularly and add external links. Make sure to have a native HTML sitemap for your end users. Focus the crawler on the right content by excluding redundant content, disembodied content (like Flash) and spammy stuff. Use the preferred domain tool to tell Google whether you want www or non-www to appear in search results. Include a separate sitemap for news and mobile content. After you submit a sitemap to Google, you will see the number of indexed pages either increase or decrease, depending on how many URLs lead to the same page. Both instances are successes. Google Sitemaps is just a tool; use it to help you accomplish your objectives. Use the priority XML tag to tell Google which of your pages are most important (home page, category pages). They use this tag to spotlight frequently updated pages and new pages. They found that Google is responsive to the crawl priority tags. He then shows some Google crawl errors, such as not found errors. Many of them are 404 errors, so this is like a "poor man's link checker." He then shows URLs restricted by robots.txt; review that list carefully. There is also an unreachable URL report. He then shows the Crawl Rate report, very interesting stuff he says. There is also an advanced image search option, which he explains relates to Google Image Labeler.
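Eric's points about getting a clean URL list, settling on www vs non-www, and using priority for the home page and category pages could be sketched like this. The preferred host, the depth-based priority values and all function names are my own assumptions for illustration; Eric did not show code or give specific numbers.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred domain; pick whichever form you told Google to use.
PREFERRED_HOST = "www.example.com"


def canonicalize(url, preferred_host=PREFERRED_HOST):
    """Map www and non-www variants of a URL onto the preferred host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if preferred_host.startswith("www."):
        bare = preferred_host[4:]
    else:
        bare = preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host
    path = parts.path or "/"
    # Drop any fragment; keep the query string as it was.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def priority_for(url):
    """Assumed scheme: home page 1.0, top-level (category) pages 0.8, rest 0.5."""
    segments = [seg for seg in urlsplit(url).path.split("/") if seg]
    if not segments:
        return 1.0
    if len(segments) == 1:
        return 0.8
    return 0.5


def prepare(urls):
    """Return unique (url, priority) pairs in first-seen order."""
    seen = set()
    result = []
    for raw in urls:
        url = canonicalize(raw.strip())
        if url not in seen:
            seen.add(url)
            result.append((url, priority_for(url)))
    return result


if __name__ == "__main__":
    crawl_list = [
        "http://example.com",
        "http://www.example.com/",
        "http://www.example.com/shoes",
        "http://WWW.example.com/shoes/red-boots",
    ]
    for url, priority in prepare(crawl_list):
        print(priority, url)
```

Collapsing duplicates this way is one reason the page count can go down after a submission: several URLs that led to the same page end up as a single entry.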

Todd Friesen from Range is now up. He explains there are two types of feeds. Back in the day, it was all about the bulk submit, submitting to Infoseek and AltaVista via dumptruck. Now, there are many reasons why our sites are hard to crawl. There is also paid inclusion, shopping, etc. He will talk about Yahoo Shopping, MSN Shopping and Google Base. Yahoo is probably the most active paid inclusion program Range uses. He explains that your natural URL will show in the listing, the feed URL may show, and you may also show up in the PPC area, so three URLs on one results page is possible. The data used to rank you and display in the SERPs comes from the feed and not from your page itself. A case study shows that feeds really work with Yahoo paid inclusion. He then moves over to comparison shopping engines. Normally this stuff is automated: just spit out the data from your database, add a few human touches, and it works out very well. MSN Shopping, he says, converts best; Yahoo is OK but not as great as MSN, and Google isn't that great. Highs and lows of Google Base: it is free and it converts, ranks by relevance over price, has limited user support and lots of competition, beta means no guarantees, and no third party tracking is allowed. MSN Shopping: reasonable CPC costs, best grouping algo, on average lower volume than Google Base, and lackluster on promo. Yahoo Shopping: the highest volume and the most expensive, high CPC, good conversions, customer reviews a bit outdated, and they send the most traffic. They had one client use Google Base; they built a sitemap on the current site, then built a new sitemap on a new site and 301ed the old URLs to the new ones. They submitted the old sitemap; the crawlers fetched the old URLs, followed the 301s to the new ones, and it worked well. They got an order within 24 hours from Google Base, daily traffic grew by 250 visitors, and daily sales from Google Base average about $1,000.
He then talks about SEO in a more theoretical and philosophical way - deep man, deep.

These posts may have spelling and grammar issues. These are session notes, written quickly and posted immediately after the session has been completed. Please excuse any grammar or spelling issues with session posts.
