Can A Search Engine Like Google.com Index My PDF Files?

Nov 10, 2006 • 6:51 am | comments (5) | Filed Under SEO - Search Engine Optimization
 

A Search Engine Watch Forums thread has a member who does not understand why Google has not indexed his PDF files. To make a long story short, Google did index his PDF files, but I thought it would make a nice quick post to explain which types of PDF documents search engines can and cannot index well.

As with any document, whether HTML, PDF, or a Word file, the search engines love text. Say you write a document that is 100% text in Word and then convert it to a PDF file. Some PDF converters will carry the text over as real text in the PDF document. Others will take an image screen capture of the Word file and embed that image in the PDF document.

Now the image may look fine, but just as you cannot copy and paste text out of an image into a text editor, you cannot copy text out of this kind of PDF document.

If a search engine cannot read the text because it is a graphic and not text, then it won't be able to fully index the words in the document.

I assume that eventually, if not already, search engines will use OCR technology to read those PDF files that appear to be text but are in reality graphics.

So how do you know if your PDFs are search engine friendly? Try to copy and paste the body text from the PDF into a text editor like Word or Notepad. If that works, then it is most likely that Google, Yahoo!, MSN (Live.com), and Ask.com will be able to index those PDFs.
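If you have a lot of PDFs to check, the copy and paste test can be roughly approximated in code. The Python sketch below is my own illustration, not something any search engine is known to do: it scans a PDF's content streams for the standard text-showing operators (Tj and TJ inside a BT text block) and reports whether any are present. The function name looks_text_based is made up for this example, and the check is only a heuristic. It assumes streams are either uncompressed or Flate-compressed, it does not understand object streams or encryption, and a hit does not guarantee the text will copy out cleanly.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" in the PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# A text block opener (BT) followed at some point by a text-showing operator.
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b", re.DOTALL)


def looks_text_based(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw real text.

    Assumption: streams are uncompressed or Flate (zlib) compressed.
    """
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data: treat the stream as uncompressed.
            data = raw
        if TEXT_OP_RE.search(data):
            return True
    return False
```

To try it on a file, something like looks_text_based(open("brochure.pdf", "rb").read()) would do, where brochure.pdf is a placeholder name. A False result suggests the pages are images only, which is the case where the copy and paste test fails too.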

Forum discussion at Search Engine Watch Forums.

 

Comments:

Duff Johnson

11/10/2006 04:52 pm

For more information on getting PDFs to index in Google, see this post: http://www.acrobatusers.com/blogs/duffjohnson/2006/06/29/making-your-pdfs-work-well-with-google/

Barry Schwartz

11/10/2006 05:15 pm

Excellent article Duff, thank you for posting it here.

BUGabundo

11/10/2006 07:09 pm

Google has already been using OCR for some time. When I had Google Desktop Search on my PC, I installed a plugin from Scansoft, and almost all my images and PDFs were correctly indexed.

Barry Schwartz

11/10/2006 07:19 pm

Well, yeah, they also do it in Google Book Search (http://books.google.com/).

webdco.com

10/05/2012 05:25 pm

Find useful info here: http://googleblog.blogspot.ca/2010/06/our-new-search-index-caffeine.html
