
Can A Search Engine Like Google.com Index My PDF Files?

A Search Engine Watch Forums thread has a member wondering why Google has not indexed his PDF files. To make a long story short, Google did index his PDF files, but I thought it would make a nice quick post to explain the types of PDF documents search engines can and cannot index well.

As with any document, whether HTML, PDF, a Word file, etc., the search engines love text. Say you write a document that is 100% text in Word and then convert it to a PDF file. Some PDF converters will carry the text over as real text in the PDF document. Other PDF converters will take an image screen capture of the Word file and use that within the PDF document.

Now images may look fine, but just as you cannot copy and paste the text out of an image into a text editor, the same goes for this kind of PDF document.

If a search engine cannot read the text because it is a graphic and not text, then it won't be able to fully index the words in the document.

I assume that eventually, if not already, search engines will use OCR technology to read those PDF files that appear to be text driven but in reality are graphics.

So how do you know if your PDFs are Search Engine Friendly? Try to copy and paste the body text from the PDF into a text editor like Word or Notepad. If that works, then most likely Google, Yahoo!, MSN (Live.com), and Ask.com will index those PDFs.
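If you have a lot of PDFs, you can approximate the copy and paste test with a small script. Here is a rough sketch in Python, using only the standard library. It looks inside each PDF stream, inflates it if it is Flate-compressed, and checks for the PDF text operators (`BT` followed by `Tj` or `TJ`). The function names `decode_stream` and `has_text_layer` are my own and purely illustrative. This is a heuristic, not a real PDF parser: it can miss text stored with other compression filters or inside object streams, and it says nothing about how any search engine actually reads the file.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# "BT" is the PDF operator that begins a text object.
BEGIN_TEXT_RE = re.compile(rb"\bBT\b")
# A string or array operand followed by a text-showing operator (Tj or TJ).
SHOW_TEXT_RE = re.compile(rb"[)>\]]\s*T[jJ]\b")


def decode_stream(raw: bytes) -> bytes:
    """Inflate a Flate-compressed stream; fall back to the raw bytes."""
    try:
        # decompressobj tolerates the end-of-line bytes left after the data.
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        # Not Flate data (uncompressed, or a filter this sketch ignores).
        return raw


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw real text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        content = decode_stream(match.group(1))
        if BEGIN_TEXT_RE.search(content) and SHOW_TEXT_RE.search(content):
            return True
    return False
```

To try it, read a file in binary mode and pass the bytes in, for example `has_text_layer(open("report.pdf", "rb").read())`, where `report.pdf` is a placeholder name. If it returns `False`, the PDF is probably made of page images, and the manual copy and paste test is still the best way to confirm.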

Forum discussion at Search Engine Watch Forums.




Posted by rustybrick in Search Engine Optimization on November 10, 2006 6:51 AM. Comments (4)

Comments

For more information on getting PDFs to index in Google, see this post:
http://www.acrobatusers.com/blogs/duffjohnson/2006/06/29/making-your-pdfs-work-well-with-google/

 

Excellent article Duff, thank you for posting it here.

 

Google has already been using OCR for some time.
When I had Google Desktop Search on my PC, I installed a plugin from Scansoft, and almost all my images and PDFs were correctly indexed.

 

Well, yea, they also do it in Google Book Search.

 
