PDF files can contain metadata tags that preserve the structure of their textual content (this is an option when the PDF file is created). JPedal provides several methods to extract text content from a PDF file, and when these tags are present it can extract the structured text. If they are not present, the output file will contain a brief message explaining that no content was available.
Extract Structured Text from PDF with the Command-Line or another language
java -cp ./jars/jpedal.jar org/jpedal/examples/text/ExtractStructuredText
Extract Structured Text from PDF in Java
This example uses the JPedal ExtractStructuredText class. ExtractStructuredText outputs an XML file detailing the structured content that the PDF file contains.
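Because the result is written as XML, it can be read back with the XML parser that ships with the JDK. The following is a minimal sketch of that follow-up step only; it does not call JPedal. The class name `StructuredTextReader`, its methods, and the sample element names (`Document`, `H1`, `P`, `Span`) are assumptions made for illustration, and the real output may use a different layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads structured-text XML using only JDK classes.
public class StructuredTextReader {

    // Parse an XML string into a DOM tree.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Walk the tree depth-first and record "tagName: text" for every
    // element that directly contains non-blank text.
    public static List<String> collectText(Node node, List<String> out) {
        NodeList children = node.getChildNodes();

        // Gather only the text that belongs directly to this node.
        StringBuilder direct = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                direct.append(child.getNodeValue());
            }
        }
        String text = direct.toString().trim();
        if (node.getNodeType() == Node.ELEMENT_NODE && !text.isEmpty()) {
            out.add(node.getNodeName() + ": " + text);
        }

        // Recurse into child elements in document order.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectText(child, out);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input; real output may be structured differently.
        String sample = "<Document><H1>Title</H1><P>Hello <Span>world</Span></P></Document>";
        for (String line : collectText(parse(sample), new ArrayList<>())) {
            System.out.println(line);
        }
    }
}
```

To read a file written to disk instead of a string, `DocumentBuilder.parse(java.io.File)` can replace the byte-stream call in `parse`.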