Meet The News Search Engines

Dec 6, 2005 - 3:14 pm
Filed Under SES Chicago 2005

There are about 25 people in the room. Andrew Goodman jokes about how we have a small but hyper-interested group. I have attended this session before, and it’s been good, but I hope they present some interesting information this time around.

Nathan Stoll from Google is up first to talk about Google News. He is going to present some historical background on Google News and where it came from. He will then discuss how Google News functions: no trade secrets, unfortunately, but a nice overview of how it works.

His first screenshot is a demo of what Google News first looked like. Google wanted to provide a level of introspection when presenting news articles. The style then (2001) is similar to what it looks like today. He uses the example of Indian and Pakistani news and asks what perspective these articles were written from. What is the best article to present from these various sources from similar locations?

Google News hosts a conversation around stories in the news. All news publishers are invited to participate. They want to offer comprehensive coverage of online news. He says they want users to become more passionate about news through what they do. It can change the way people use news. They are scientists and don’t want to editorialize the news. The interface has gotten more complex over time, with the addition of RSS and Atom feeds, the ability to customize the interface, and so on. Google crawls thousands of news sites, and finding the best content is an ongoing challenge. They want to be able to accept the broadest range of news out there, and they are always looking for new news URLs. Once they grab the HTML page and extract the news article and images, while obeying the robots exclusion protocol, it goes on to be sorted. Some aspects specific to news articles can be problematic for Google. They want to extract articles accurately, but websites vary and so does their structure. One mistake news sites make is using the same URL over and over. The crawler may not recognize the article if it is at a URL that has been used before. He says to be sure to use a unique URL each time you publish a news article.
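To make the unique-URL advice concrete, here is a minimal Python sketch of a toy crawler that obeys the robots exclusion protocol and skips any URL it has already seen. It is only an illustration under stated assumptions, not how Google News actually works: the `ToyNewsCrawler` class, the `toy-news-bot` user agent, and the example.com URLs are all made up for this example.

```python
from urllib.robotparser import RobotFileParser


def make_robots(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from the file's text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class ToyNewsCrawler:
    """Hypothetical crawler: obeys robots.txt and never re-fetches a URL."""

    def __init__(self, robots: RobotFileParser, user_agent: str = "toy-news-bot"):
        self.robots = robots
        self.user_agent = user_agent  # assumed name, not a real crawler
        self.seen_urls = set()

    def should_fetch(self, url: str) -> bool:
        # Respect the publisher's robots exclusion rules first.
        if not self.robots.can_fetch(self.user_agent, url):
            return False
        # Assumption for this sketch: a URL seen before is treated as old news.
        if url in self.seen_urls:
            return False
        self.seen_urls.add(url)
        return True


robots = make_robots("User-agent: *\nDisallow: /private/\n")
crawler = ToyNewsCrawler(robots)

# A unique URL per article is picked up once.
print(crawler.should_fetch("http://example.com/news/2005/12/06/story-one"))
# A reused URL is skipped, even if the page now holds a new article.
print(crawler.should_fetch("http://example.com/news/2005/12/06/story-one"))
# A disallowed path is never fetched.
print(crawler.should_fetch("http://example.com/private/draft"))
```

In this toy model, a site that publishes every story at the same address would only ever get its first story noticed, which is the failure the speaker warns about.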

So how does Google group articles by story? Google looks at the words in the articles and groups them together (clustering). They build a cluster of articles that have a high degree of commonality. Some are put in different sections, but at the cluster level they are in the same place. The search engine classifies the articles into sections based on the clustering.
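The talk gave no details of the clustering itself, so the following is just a rough Python sketch of the general idea of grouping articles by shared words. It uses Jaccard similarity on word sets with a greedy pass, which is a stand-in technique chosen for illustration; the stopword list, the `0.3` threshold, and the sample headlines are all assumptions.

```python
import re

# Assumed, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "and", "between", "in", "of", "on", "the", "to"}


def tokens(text: str) -> set:
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Share of words two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_articles(headlines, threshold=0.3):
    """Greedily put each headline in the most similar cluster, or start a new one."""
    clusters = []  # each cluster: {"words": set of words, "articles": list of headlines}
    for headline in headlines:
        words = tokens(headline)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(words, cluster["words"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"words": set(words), "articles": [headline]})
        else:
            best["articles"].append(headline)
            best["words"] |= words
    return [c["articles"] for c in clusters]


headlines = [
    "India and Pakistan open peace talks",
    "Peace talks open between India and Pakistan",
    "Tech stocks rally on strong earnings",
    "Strong earnings lift tech stocks",
]
for group in cluster_articles(headlines):
    print(group)
```

A real system would work on full article text and far better features, but the shape is the same: articles with a high degree of word overlap end up in one story cluster.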

What is the most important news story of the day? How does Google decide? How do they look at a cluster to do that? A big event causes a lot of articles to be published at once. A smaller story has fewer sources. Within the group of stories, which article will be shown first? They use different signals to determine this, drawn from web search, clustering, and so on. The Google News homepage changes often. They rotate between a bunch of different viewpoints. Originality is important. Google wishes to include opposing viewpoints as much as they can.
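One way to picture the "big event, many articles" signal is to rank stories by how many distinct sources covered them. This Python sketch assumes the clustering step has already labeled each article with a story, and the story IDs and source names are invented; the speaker did not say this is the signal Google uses, only that bigger stories draw more coverage.

```python
from collections import defaultdict


def rank_stories(articles):
    """Rank story IDs by number of distinct sources, most covered first.

    `articles` is a list of (story_id, source) pairs; both are hypothetical
    labels assumed to come from an earlier clustering step.
    """
    sources_by_story = defaultdict(set)
    for story_id, source in articles:
        sources_by_story[story_id].add(source)
    # Sort by descending source count, then by story ID to keep ties stable.
    return sorted(sources_by_story, key=lambda s: (-len(sources_by_story[s]), s))


articles = [
    ("earthquake", "Daily Example"),
    ("earthquake", "Example Times"),
    ("earthquake", "Example Wire"),
    ("county-fair", "Daily Example"),
    ("earthquake", "Daily Example"),  # a second article from the same source
]
print(rank_stories(articles))
```

Counting distinct sources rather than raw articles keeps one prolific publisher from inflating a story on its own.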

Google News Search ranks on importance, recency, and relevance. One of the problems they run into is wrap-ups of articles, or several news articles on the same page. He says users really do understand brands in the news space. They understand brand, but also speed, reliability, and cleanness. Those things can also help with publishing news articles.
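How importance, recency, and relevance are combined was not disclosed, so here is only a hypothetical Python sketch of one common way to blend such signals: multiply relevance and importance, then decay the result as the article ages. The `news_score` function, the 12-hour half-life, and the sample numbers are all assumptions for illustration.

```python
def news_score(relevance, age_hours, importance=1.0, half_life_hours=12.0):
    """Combine three signals into one score (an assumed formula, not Google's).

    relevance: how well the article matches the query, 0.0 to 1.0.
    importance: weight for the story's size, e.g. from its cluster.
    age_hours: hours since publication; the score halves every half-life.
    """
    recency = 0.5 ** (age_hours / half_life_hours)
    return relevance * importance * recency


def rank_results(results):
    """Sort (title, relevance, age_hours) tuples by descending score."""
    return sorted(results, key=lambda r: news_score(r[1], r[2]), reverse=True)


results = [
    ("Perfect match, twelve hours old", 1.0, 12.0),
    ("Decent match, just published", 0.6, 0.0),
]
for title, relevance, age in rank_results(results):
    print(title, round(news_score(relevance, age), 2))
```

With these assumed numbers the fresh article outranks the older, better match, which is the kind of trade-off a news search has to make and a general web search mostly does not.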

Chris Tolles, VP of Marketing at Topix.net, is going to talk about the business of online news. Online news is driving major traffic. News is the second largest driver of traffic. 41% of Google users are looking for news. Online news is very sticky, with a lot of return users. 39% of users at online news sites visit two or more times per day. The key 18-24 year old demographic reads more news per week than any other. MySpace generates 10 billion page views a month. How do they do it? They let users create their own news.

The online transition is rough for print. Online growth vs. print revenues is an issue. Is online news growing fast enough to affect offline news? He says no. Natural monopolies that came from accidents of geography are breaking down due to online news. You are competing with sources from around the world instead of just a local area. The business of journalism is changing. Trust in mainstream sources is in flux. The audience wants to join the game. The internet you heard about in 1996 is here. The audience to support it is here. There is a cheap publishing system for everyone. Advertising networks turn traffic into revenue.

Search is kind of like the yellow pages. People come to find something, and it’s easy to monetize. For example, you are not going to Google before you go to lunch to find the lunch specials. It just doesn’t work yet. The monetization of news is different from the yellow pages. The user problem is that the proliferation of content sources makes it difficult for users to survey the entire pool. An example of this is Palo Alto, CA, where there are so many news sources. There is a tremendous amount of growth in news search. If there is a free way to publish anything, people will use it.

They are finding now that there are 8-16 million blogs that they can add as new sources. People are changing the way they view news. Why include them as news? They are different, yes, but the people who matter are often in conversation in the blogspace. Entertainment and business are better covered on blogs than in traditional news. If you tag the news, it happens to monetize a whole lot better.

There is a huge investment ongoing in the online news area. Yahoo and Google are both investing. People want to read more; they get to the end of the page and want more. There is opportunity there. This is no longer a niche, it’s a conversation. There is a need to create a system of participation. There is no way to tell if an emerging article will have influence on the news. The new generation expects a voice, and one of the keys to the internet is that it is interactive. People expect to participate and interact. A newsroom should also support itself. Weblogs point to new content models.

Very nice presentation, lots of good information in this one.

Q & A

Q: Have you had people refuse to publish? How do you monetize Google News? A: They don’t monetize Google News at this point. Maybe in the future, it seems. He said he can’t comment on things going on. Google will also not pay royalties. You either request to be added or not. By and large, the vast majority of people want to be included in Google News. They usually don’t see it as a competing source. People still use their CNN.com homepage. It’s not a competitive space in that regard. Topix says they have an opt-out option. In 2 years they have had only 4 people request not to be spidered. You should want the traffic.

Q: We work with organizational products. If we were to try to include links in the news source, can you do that? Or would it be better to approach your (Topix) sources? A: If you publish information that is useful to your users, you will most likely get found. You can do promotional editorials, or buy link-based advertising with them. It needs to be clear to get picked up correctly.

 
