
Extract document outline from any PDF file

JPedal provides several methods to extract textual content from a PDF file. A PDF file can contain an optional document outline object: a table of contents whose entries have titles and link to pages, with control over the zoom level and the exact area to display. If an outline is present, this code extracts it to an XML file. If there is no outline, no file is created.

Extract Outline from PDF from Command Line or another language

java -jar jpedal.jar --metadata "pdfFile.pdf" outline

This will output the outline data to the console as a JSON object string.
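To call the same command from another program, it can be launched as a child process and its console output captured. The following is a minimal sketch in Java, assuming `jpedal.jar` is in the working directory and that the JSON is written to standard output. The class `OutlineCli` and its methods `buildCommand` and `run` are hypothetical helpers for illustration, not part of JPedal.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class OutlineCli {

    // Builds the same argument list as the command line shown above.
    static List<String> buildCommand(String jarPath, String pdfPath) {
        return List.of("java", "-jar", jarPath, "--metadata", pdfPath, "outline");
    }

    // Runs the command and returns everything the process wrote to standard output.
    static String run(String jarPath, String pdfPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(jarPath, pdfPath))
                // Forward error output to this process's console so it cannot
                // mix with the captured text or block the child process.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Assumed file locations; adjust to your setup.
        System.out.println(run("jpedal.jar", "pdfFile.pdf"));
    }
}
```

Because `ProcessBuilder` passes each argument separately, the PDF path needs no extra quoting even if it contains spaces.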

Example to access API methods

ExtractOutline extract = new ExtractOutline("C:/pdfs/mypdf.pdf");
// extract.setPassword("password"); // uncomment for password-protected files
if (extract.openPDFFile()) {
    // Retrieve the outline as a Document
    Document pdfOutline = extract.getPDFTextOutline();
}

extract.closePDFfile();
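Once the outline has been retrieved, it may be useful to view it as text. The sketch below serializes a DOM document to an XML string using only the JDK. It assumes the `Document` returned by `getPDFTextOutline()` is an `org.w3c.dom.Document`; the class `OutlineToString` and method `toXml` are hypothetical names, and the `placeholder` element is a stand-in for a real outline.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class OutlineToString {

    // Serializes a DOM Document to an XML string without the XML declaration.
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document; in real use, pass the object returned by
        // getPDFTextOutline() (assumed to be an org.w3c.dom.Document).
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("placeholder"));
        System.out.println(toXml(doc));
    }
}
```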

Extract Outline from PDF in Java

ExtractOutline.writeAllOutlinesToDir("inputFileOrDirectory", "outputDir");

This example uses the JPedal ExtractOutline class. ExtractOutline writes one XML file per PDF, containing details of each outline entry such as its title, page, and initial zoom level.
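To inspect a generated file, the XML can be read back with the JDK's DOM parser. The exact element and attribute names of the output are not described here, so this sketch makes no assumption about them: it lists every element with its attributes, indented by nesting depth. The class `OutlineXmlDump` and method `describe` are hypothetical names, and the path of the XML file is supplied as a command-line argument.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class OutlineXmlDump {

    // Parses the XML and returns one line per element, indented by depth.
    static String describe(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    // Appends the element name and its attributes, then recurses into child elements.
    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // skip text, comments, and other non-element nodes
        }
        out.append("  ".repeat(depth)).append(node.getNodeName());
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.append(' ').append(attr.getNodeName()).append('=').append(attr.getNodeValue());
        }
        out.append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to an XML file written into the output directory
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(describe(in));
        }
    }
}
```

Running it against one output file shows the structure actually produced, which is a reasonable first step before writing code that depends on specific names.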

