
Extracting Metadata from a PDF on the Command Line

JPedal can extract metadata from a PDF as a JSON object for reuse elsewhere.

All or part of this data can be extracted from a file, in any order, using the following command.

java -jar jpedal.jar --metadata filename.pdf [metaDataType]...

The data that can be accessed in this way, and the metaDataType value for each, are as follows.

  • Document MetaData fields - fields
  • MetaData XML - xml
  • Page size data - pagesizes
  • Document Bookmarks/Outline - outline
  • Document font list - fonts
  • Document page count - pagecount

The valid values for metaDataType are any combination of the values listed above (the name after the dash in each item), separated by a space character. If you request the same type multiple times in a single command, it will only be output once.

If no metaDataType is given, the full list is output by default.
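
As an illustration of the syntax above, the following bash sketch wraps the command in a function that checks each requested type against the list before running JPedal. The function name jpedal_metadata and the VALID_TYPES variable are hypothetical and not part of JPedal; the sketch also assumes that jpedal.jar is in the current directory.

```shell
#!/usr/bin/env bash
# Valid metaDataType names, copied from the list above.
VALID_TYPES="fields xml pagesizes outline fonts pagecount"

# jpedal_metadata is a hypothetical wrapper, not part of JPedal.
# Usage: jpedal_metadata file.pdf [metaDataType]...
jpedal_metadata() {
  if [ "$#" -lt 1 ]; then
    echo "usage: jpedal_metadata file.pdf [metaDataType]..." >&2
    return 2
  fi
  local pdf="$1"
  shift
  local t
  for t in "$@"; do
    case " $VALID_TYPES " in
      *" $t "*) ;;  # known type, keep checking the rest
      *) echo "unknown metaDataType: $t" >&2; return 1 ;;
    esac
  done
  # Assumes jpedal.jar is in the current directory.
  java -jar jpedal.jar --metadata "$pdf" "$@"
}

# Example: request only the page count and the font list.
# jpedal_metadata filename.pdf pagecount fonts
```

Calling the function with no type after the file name passes only the file name through, so the full list is requested.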

Piping the data

If you wish to save this information for use elsewhere, you can redirect the output to a file in batch or bash scripts with the following.

java -jar jpedal.jar --metadata filename.pdf > output.txt
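
The same redirection can be applied to a whole folder of files. The bash sketch below writes the metadata of every PDF in a directory to a file of the same name with a .json extension. The function name extract_all, the JPEDAL_JAR variable and the choice of the .json extension are assumptions made for this example, not JPedal features.

```shell
#!/usr/bin/env bash
# Path to the JPedal jar; JPEDAL_JAR is a hypothetical variable that
# defaults to jpedal.jar in the current directory.
JPEDAL_JAR="${JPEDAL_JAR:-jpedal.jar}"

# extract_all is a hypothetical helper, not part of JPedal.
# Usage: extract_all /path/to/pdfs
extract_all() {
  local dir="$1"
  local pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue  # skip the loop body when no PDF matches
    # Redirect the output for report.pdf to report.json.
    java -jar "$JPEDAL_JAR" --metadata "$pdf" > "${pdf%.pdf}.json"
  done
}
```

Each output file is overwritten on every run, because > truncates an existing file before writing to it.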

This functionality uses the JPedal PDFUtilities class.