
Extract document outline from any PDF file

JPedal provides several methods to extract textual content from a PDF file. A PDF file can contain an optional document outline object. This is a table of contents whose entries can include titles and links to pages, with control over the zoom level and the exact area to display. If an outline is present, this code extracts the outline data to an XML file. If there is no outline, no file is created.

Extract Outline from PDF from Command Line or another language

java -jar jpedal.jar --metadata "inputFile.pdf" outline

This will output the outline data to the console as a JSON object string.

Example to access API methods

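ExtractOutline extract = new ExtractOutline("inputFile.pdf");
// Uncomment and supply the password if the PDF is password-protected
//extract.setPassword("password");
if (extract.openPDFFile()) {
    // Only read the outline once the file has opened successfully
    Document pdfOutline = extract.getPDFTextOutline();
}
// Close the file when finished
extract.closePDFfile();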
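
Once the outline has been retrieved, it can be inspected in memory. The following is a minimal sketch, assuming the Document returned by getPDFTextOutline() is a standard org.w3c.dom.Document. The class OutlineWalker, the helper sampleOutline() and the element and attribute names in the sample XML are hypothetical stand-ins so the example runs without JPedal; the real outline schema may differ. The walker itself makes no assumption about element names and simply prints every element with its attributes, indented by depth.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OutlineWalker {

    // Returns one line per element: indentation by depth, element name, then attributes.
    static String describe(Document doc) {
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        // Skip text, comment and other non-element nodes
        if (node == null || node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.append(' ').append(attr.getNodeName())
               .append("=\"").append(attr.getNodeValue()).append('"');
        }
        out.append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Hypothetical stand-in for the Document returned by getPDFTextOutline().
    // The element and attribute names here are invented for illustration only.
    static Document sampleOutline() {
        String xml = "<outline><entry title=\"Chapter 1\" page=\"1\">"
                   + "<entry title=\"Section 1.1\" page=\"2\"/></entry></outline>";
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describe(sampleOutline()));
    }
}
```

In real use, the Document obtained from the ExtractOutline instance would be passed to describe() in place of sampleOutline().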

Extract Outline from PDF in Java

ExtractOutline.writeAllOutlinesToDir("inputFileOrFolder", "outputFolder");

This example uses the JPedal ExtractOutline class. ExtractOutline writes one XML file per PDF, containing details of each outline entry such as its title, page and initial zoom level.
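
Because no file is created for a PDF without an outline, listing the XML files in the output folder is one way to see which inputs actually had one. The sketch below uses only the standard java.io.File API. It assumes the output files end in .xml and sit directly inside the output folder; the class name OutlineOutputs and the method xmlFileNames are hypothetical and not part of JPedal.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class OutlineOutputs {

    // Returns the sorted names of the .xml files directly inside outputFolder.
    // Returns an empty list if the folder does not exist or is not a directory.
    static List<String> xmlFileNames(File outputFolder) {
        List<String> names = new ArrayList<>();
        File[] files = outputFolder.listFiles(); // null when not a readable directory
        if (files == null) {
            return names;
        }
        for (File file : files) {
            String name = file.getName();
            if (file.isFile() && name.toLowerCase(Locale.ROOT).endsWith(".xml")) {
                names.add(name);
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Defaults to the folder name used in the example above
        File folder = new File(args.length > 0 ? args[0] : "outputFolder");
        for (String name : xmlFileNames(folder)) {
            System.out.println(name);
        }
    }
}
```

Any input PDF with no corresponding entry in the result had no outline to extract, assuming the extraction itself completed without errors.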


Why JPedal?

  • Actively developed commercial library with full support and no third-party dependencies.
  • Process PDF files up to 3x faster than alternative Java PDF libraries.
  • Simple licensing options and source code access for OEM users.
