Extracting Metadata from a PDF on the Command Line

JPedal can extract metadata from a PDF as a JSON object for reuse elsewhere.

All of this data, or only part of it, can be extracted from a file in any order using the following command.

java -jar jpedal.jar --metadata inputFile.pdf [metaDataType]...

The data that can be accessed in this way, and the corresponding metaDataType values, are as follows.

  • Document MetaData fields - fields
  • MetaData XML - xml
  • Page size data - pagesizes
  • Document Bookmarks/Outline - outline
  • Document font list - fonts
  • Document page count - pagecount

The valid values for metaDataType are any combination of the type names listed above (the word after the dash on each line), separated by a space character. If you request the same type multiple times in a single command, it will only be output once.

If no metaDataType is given, the default is the full list, so every type is output.
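
As a rough illustration of the rules above, the following bash sketch wraps the command in a small function that rejects any type name not in the documented list before running JPedal. The function name run_metadata and the JAVA and JPEDAL_JAR environment overrides are assumptions made for this example and are not part of JPedal; only the command line itself comes from this page.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of JPedal): validates the requested
# metaDataType values, then runs the documented command.
# JAVA and JPEDAL_JAR are illustrative overrides; by default the function
# runs "java -jar jpedal.jar" from the current directory.
run_metadata() {
  if [ "$#" -lt 1 ]; then
    echo "usage: run_metadata inputFile.pdf [metaDataType]..." >&2
    return 2
  fi
  local pdf="$1" t
  shift
  for t in "$@"; do
    case "$t" in
      fields|xml|pagesizes|outline|fonts|pagecount) ;;   # documented values
      *) echo "unknown metaDataType: $t" >&2; return 1 ;;
    esac
  done
  # With no types left in "$@", JPedal falls back to the full list.
  "${JAVA:-java}" -jar "${JPEDAL_JAR:-jpedal.jar}" --metadata "$pdf" "$@"
}
```

Calling run_metadata inputFile.pdf fonts pagecount would then request only the font list and the page count, while run_metadata inputFile.pdf requests everything.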

Piping the data

If you wish to save this information for use elsewhere, you can redirect the output to a file in batch or bash scripts with the following.

java -jar jpedal.jar --metadata inputFile.pdf > outputFile.txt
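
The same redirection can be applied to many files at once. The bash sketch below loops over the PDFs in a directory and writes one output file per PDF. The function name extract_all, the JAVA and JPEDAL_JAR overrides, and the .json output extension are assumptions made for this example; it also assumes the command writes nothing but the JSON to standard output.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper (not part of JPedal): saves the metadata of
# every PDF in a directory next to the PDF itself.
# JAVA and JPEDAL_JAR are illustrative overrides; by default the function
# runs "java -jar jpedal.jar" from the current directory.
extract_all() {
  local dir="$1" pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue            # skip when no PDFs match
    # report.pdf -> report.json (extension chosen for this sketch)
    "${JAVA:-java}" -jar "${JPEDAL_JAR:-jpedal.jar}" --metadata "$pdf" \
      > "${pdf%.pdf}.json"
  done
}
```

For example, extract_all ./invoices would leave an invoices/name.json file beside each invoices/name.pdf.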

This functionality uses the JPedal PDFUtilities class.


