Extracting Metadata from a PDF on the Command Line
JPedal can extract metadata from a PDF and output it as a JSON object that you can reuse elsewhere.
You can extract all of this data, or only selected parts of it in any order, with the following command:
java -jar jpedal.jar --metadata inputFile.pdf [metaDataType]...
The data available this way, and the metaDataType value for each, are as follows:
- Document metadata fields - fields
- Metadata XML - xml
- Page size data - pagesizes
- Document Bookmarks/Outline - outline
- Document font list - fonts
- Document page count - pagecount
The valid values for metaDataType are any combination of the values listed above, separated by spaces. If you request the same type more than once in a single command, it is output only once.
If no metaDataType is given, all of the types listed above are output.
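As a sketch of these rules, the bash functions below check each requested type against the six values listed above before calling JPedal, and pass no types at all when none are requested so that the full list is output. The names `valid_metadata_types` and `print_metadata`, and the `JPEDAL_JAR` variable, are hypothetical helpers for illustration and are not part of JPedal; how JPedal itself reacts to an unknown type is not covered here.

```shell
#!/usr/bin/env bash

# Succeed only if every argument is one of the documented metaDataType values.
# With no arguments this also succeeds, because JPedal then outputs every type.
valid_metadata_types() {
  local t
  for t in "$@"; do
    case "$t" in
      fields|xml|pagesizes|outline|fonts|pagecount) ;;
      *) echo "unknown metadata type: $t" >&2; return 1 ;;
    esac
  done
}

# Print the requested metadata for one PDF to standard output.
# JPEDAL_JAR is a hypothetical variable naming the jar; it defaults to jpedal.jar.
print_metadata() {
  local pdf="$1"; shift
  valid_metadata_types "$@" || return 1
  java -jar "${JPEDAL_JAR:-jpedal.jar}" --metadata "$pdf" "$@"
}
```

For example, `print_metadata inputFile.pdf pagecount fonts` runs `java -jar jpedal.jar --metadata inputFile.pdf pagecount fonts`, while `print_metadata inputFile.pdf` requests the full list.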
Redirecting the output to a file
If you want to save this information for use elsewhere, you can redirect the output to a file from the command line or a batch or bash script with the following command:
java -jar jpedal.jar --metadata inputFile.pdf > outputFile.txt
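Building on this redirection, a bash script can save the metadata of many PDFs at once. The sketch below names each output file after its PDF (report.pdf becomes report.json, since the output is a JSON object), keeps going if one file fails, and deletes any partial output for that file. The function name `extract_all` and the `JPEDAL_JAR` variable are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash

# Write the metadata of each PDF given as an argument to a .json file beside it.
# JPEDAL_JAR is a hypothetical variable naming the jar; it defaults to jpedal.jar.
# Returns 1 if any PDF failed, 0 otherwise.
extract_all() {
  local pdf out status=0
  for pdf in "$@"; do
    out="${pdf%.pdf}.json"   # strip a lower-case .pdf suffix, then add .json
    if ! java -jar "${JPEDAL_JAR:-jpedal.jar}" --metadata "$pdf" > "$out"; then
      echo "failed: $pdf" >&2
      rm -f -- "$out"        # do not leave a partial or empty file behind
      status=1
    fi
  done
  return "$status"
}
```

For example, `extract_all *.pdf` processes every PDF in the current directory.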
This functionality uses the JPedal PDFUtilities class.