Extract document outline from any PDF file
JPedal provides several methods to extract textual content from a PDF file. A PDF file can contain an optional document outline object. This is a table of contents whose entries can include titles and links to pages, with control over the zoom level and the exact area to display. If an outline is present, this code extracts the outline data to an XML file. If there is no outline, no file is created.
Extract Outline from PDF from Command Line or another language
java -jar jpedal.jar --metadata "inputFile.pdf" outline
This will output the outline data to the console as a JSON object string.
Example to access API methods
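ExtractOutline extract = new ExtractOutline("inputFile.pdf");
// Uncomment to supply a password for an encrypted file
//extract.setPassword("password");
if (extract.openPDFFile()) {
    // The outline is returned as an XML DOM Document
    Document pdfOutline = extract.getPDFTextOutline();
}
// Release the file once extraction is finished
extract.closePDFfile();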
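Once the outline has been returned as a Document, it can be traversed with the standard org.w3c.dom API. The sketch below is one possible way to list the entries with indentation that reflects nesting. It uses only JDK classes and a hand-built stand-in outline, so it runs without JPedal. The element name "title" and the attribute names "title" and "page" are assumptions made for illustration; check the XML that JPedal actually produces and adjust the names to match.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilderFactory;
import java.util.ArrayList;
import java.util.List;

class OutlineWalker {

    // Returns one indented line per element that carries a "title" attribute.
    // The attribute names "title" and "page" are hypothetical.
    public static List<String> listEntries(Document outline) {
        List<String> lines = new ArrayList<>();
        collect(outline.getDocumentElement(), 0, lines);
        return lines;
    }

    private static void collect(Node node, int depth, List<String> lines) {
        if (node == null) {
            return;
        }
        int childDepth = depth;
        if (node instanceof Element) {
            Element entry = (Element) node;
            if (entry.hasAttribute("title")) {
                lines.add("  ".repeat(depth) + entry.getAttribute("title")
                        + " (page " + entry.getAttribute("page") + ")");
                // Children of an outline entry are indented one level deeper
                childDepth = depth + 1;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), childDepth, lines);
        }
    }

    // Builds a small stand-in outline so the example runs without a PDF.
    public static Document sampleOutline() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        Element chapter = doc.createElement("title");
        chapter.setAttribute("title", "Chapter 1");
        chapter.setAttribute("page", "1");
        root.appendChild(chapter);

        Element section = doc.createElement("title");
        section.setAttribute("title", "Section 1.1");
        section.setAttribute("page", "2");
        chapter.appendChild(section);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        listEntries(sampleOutline()).forEach(System.out::println);
    }
}
```

In real use, the Document returned by getPDFTextOutline() would be passed to listEntries in place of the sample outline.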
Extract Outline from PDF in Java
ExtractOutline.writeAllOutlinesToDir("inputFileOrFolder", "outputFolder");
This example uses the JPedal ExtractOutline class. ExtractOutline writes one XML file per PDF, containing details of each outline entry such as its title, page and initial zoom level.
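After a batch run, the XML files in the output folder can be loaded back with the JDK's own parser. The following is a minimal sketch, assuming the output files carry a .xml extension and sit directly in the output folder; the naming of the files is not specified here, so the class name OutlineFolderReader and the glob pattern are illustrative only.

```java
import org.w3c.dom.Document;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class OutlineFolderReader {

    // Parses every *.xml file directly inside outputFolder.
    // Returns a map from file name to parsed DOM Document, sorted by name.
    public static Map<String, Document> readAll(Path outputFolder) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Map<String, Document> outlines = new TreeMap<>();
        try (DirectoryStream<Path> xmlFiles = Files.newDirectoryStream(outputFolder, "*.xml")) {
            for (Path file : xmlFiles) {
                outlines.put(file.getFileName().toString(), builder.parse(file.toFile()));
            }
        }
        return outlines;
    }

    public static void main(String[] args) throws Exception {
        // "outputFolder" mirrors the placeholder used in the call above
        Path folder = Path.of(args.length > 0 ? args[0] : "outputFolder");
        readAll(folder).forEach((name, doc) ->
                System.out.println(name + " -> root element <"
                        + doc.getDocumentElement().getNodeName() + ">"));
    }
}
```

Because no file is created for a PDF without an outline, the returned map may hold fewer entries than there were input PDFs.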