JPedal provides several methods to extract textual content from a PDF file. A PDF file can contain an optional document outline object. This is a table of contents whose entries can include titles and links to pages, with control over the zoom level and the exact area to display. If an outline is present, this code extracts the document's outline data to an XML file. If there is no outline, no file is created.
Extract Outline from PDF with the Command-Line or another language
java -jar jpedal.jar --metadata "pdfFile.pdf" outline
This will output the outline data to the console as a JSON object string.
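To call the same command from another program, one option is to start it as a child process and capture what it prints. The following is a minimal sketch in Java rather than an official JPedal integration. The class name OutlineCommand and its methods are hypothetical, and the jar and PDF paths are placeholders. The only JPedal-specific part is the argument list, which is copied from the command above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: wraps the command-line call shown above.
final class OutlineCommand {

    // Builds the argument list: java -jar <jar> --metadata <pdf> outline
    static List<String> buildCommand(String jarPath, String pdfPath) {
        return List.of("java", "-jar", jarPath, "--metadata", pdfPath, "outline");
    }

    // Runs the command and returns everything it wrote to standard output.
    static String run(String jarPath, String pdfPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(jarPath, pdfPath))
                // Keep error messages out of the captured output.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths: adjust to where jpedal.jar and the PDF actually live.
        System.out.println(run("jpedal.jar", "pdfFile.pdf"));
    }
}
```

The returned string can then be passed to whichever JSON parser the calling application already uses.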
Extract Outline from PDF in Java
This example uses the JPedal ExtractOutline class. ExtractOutline outputs one XML file per PDF, containing details about each outline entry such as its title, page and initial zoom level.
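Once the XML file has been written, it can be inspected with the standard Java XML parser. The sketch below does not use any JPedal classes and does not assume a particular schema, since the element and attribute names of the output are not listed here. It simply walks the tree and prints each element with its attributes, indented by nesting depth. The class name OutlineXmlReader is hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists every element of an outline XML file.
final class OutlineXmlReader {

    // Returns one line per element: indentation, tag name, then attributes.
    static List<String> describe(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            List<String> lines = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read outline XML", e);
        }
    }

    private static void walk(Element element, int depth, List<String> lines) {
        // Sort attributes by name so the output order is stable.
        Map<String, String> sorted = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            sorted.put(attributes.item(i).getNodeName(), attributes.item(i).getNodeValue());
        }
        StringBuilder line = new StringBuilder("  ".repeat(depth)).append(element.getTagName());
        sorted.forEach((name, value) -> line.append(' ').append(name).append('=').append(value));
        lines.add(line.toString());

        // Recurse into child elements only, skipping text and comment nodes.
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                walk((Element) children.item(i), depth + 1, lines);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to an XML file produced by ExtractOutline.
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            describe(in).forEach(System.out::println);
        }
    }
}
```

Running this against a real output file is a quick way to find the actual element and attribute names before writing code that depends on them.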