Does this PDF document contain structured text content?

It depends on the file.

A PDF file can be created either as a structured (tagged) PDF, which contains information on the page structure, or as an unstructured PDF, which contains no structural information and whose content can be stored in any order. This is decided when the PDF is created, and it is not possible to convert unstructured PDF files into structured PDF files.

You can determine whether a PDF file contains structured content by opening the file in Adobe Reader and viewing the Document Properties. Among the advanced fields is one named Tagged PDF. If its value is Yes, the file has structured content.
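
For a quick check without opening a viewer, the same information can be approximated in code. In the PDF specification, a tagged file declares itself through a /MarkInfo dictionary with /Marked true in the document catalog, together with a /StructTreeRoot entry. The sketch below is only a rough heuristic built on that fact: it searches the raw bytes of the file for those markers. The class and method names (TaggedPdfHeuristic, looksTagged) are made up for this illustration and are not part of any library. It will miss files whose catalog is stored in a compressed object stream or whose /MarkInfo is an indirect reference, so a negative result is not conclusive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any PDF library.
class TaggedPdfHeuristic {

    // Matches "/MarkInfo << ... /Marked true" when the dictionary is written
    // inline and uncompressed (an assumption that does not hold for every file).
    private static final Pattern MARKED_TRUE =
            Pattern.compile("/MarkInfo\\s*<<[^>]*/Marked\\s+true");

    // A tagged PDF also carries a structure tree root entry in its catalog.
    private static final Pattern STRUCT_TREE_ROOT =
            Pattern.compile("/StructTreeRoot\\b");

    // Returns true when both markers appear in the raw file bytes.
    static boolean looksTagged(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to one char, so binary data is not mangled.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return MARKED_TRUE.matcher(raw).find()
                && STRUCT_TREE_ROOT.matcher(raw).find();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java TaggedPdfHeuristic <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksTagged(data)
                ? "Tagging markers found: the file is probably tagged"
                : "No tagging markers found in the uncompressed data");
    }
}
```

Treat a positive result as a hint rather than proof: the markers only show that the file claims to be tagged, not that every page is fully or correctly tagged.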

There is an article on our blog with more information on how to find out if your PDF file contains structured content in Acrobat.

The PdfUtilities class also includes a method to test if a PDF file is fully tagged according to the PDF specification.