Can A Search Engine Index My PDF Files?

Nov 10, 2006 • 6:51 am | Filed Under SEO - Search Engine Optimization

A Search Engine Watch Forums thread has a member who does not understand why Google has not indexed his PDF files. To make a long story short, Google did index his PDF files, but I thought it would make a nice quick post to explain which types of PDF documents search engines can and cannot index well.

As with any document, whether HTML, PDF, or a Word file, search engines love text. Say you write a document that is 100% text in Word and then convert it to a PDF file. Some PDF converters will carry the text over as real text in the PDF document. Others will take an image capture of the Word file and embed that image in the PDF document.

Now, the image may look fine, but just as you cannot copy and paste text out of an image into a text editor, you cannot copy text out of this kind of PDF document.

If a search engine cannot read the text because it is a graphic and not text, then it won't be able to fully index the words in the document.

I assume that eventually, if not already, search engines will use OCR technology to read PDF files that appear to be text but are in reality graphics.

So how do you know if your PDFs are search engine friendly? Try to copy and paste the body text from the PDF into a text editor like Word or Notepad. If that works, then it is most likely that Google, Yahoo!, MSN, and other search engines will index those PDFs.
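If you have many PDFs to check, the copy-and-paste test can be roughly approximated in code. The sketch below is only a heuristic, not how any search engine actually works: it assumes the PDF stores its page content in uncompressed or Flate-compressed streams, and it simply looks for the PDF text-showing operators `Tj` and `TJ`. The function name `has_text_operators` is made up for this illustration. A PDF that passes may still have text that extracts poorly, and a PDF that fails may use a stream filter this sketch does not handle.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# A string "(...)", hex string "<...>", or array "[...]" followed by a
# text-showing operator (Tj or TJ).
TEXT_SHOW_RE = re.compile(rb"[)\]>]\s*(?:Tj|TJ)\b")


def has_text_operators(pdf_bytes: bytes) -> bool:
    """Return True if any stream appears to draw real text (heuristic)."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Most content streams are Flate (zlib) compressed.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data: treat the stream as uncompressed.
            data = raw
        if TEXT_SHOW_RE.search(data):
            return True
    return False
```

To try it on a file, read the file in binary mode and pass the bytes in, for example `has_text_operators(open("my-file.pdf", "rb").read())`, where `my-file.pdf` is a placeholder name. A result of `False` suggests the PDF may be made of page images only, so the manual copy-and-paste test is worth doing on that file.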

Forum discussion at Search Engine Watch Forums.
