Link

Extracting PDF Metadata and Metrics in Java

The JPedal library can be used to extract metadata about the PDF file. There are several PdfUtilities class.

You can use PdfUtilities in your own applications with the following example code, just remove the lines you do not require.

final PdfUtilities utilities = new PdfUtilities("path/to/exampleFile.pdf");
utilities.setPassword("password"); //Only required is file requires password
try {
    if (utilities.openPDFFile()) {
        
        //Returns a String containing the PDF version for the document
        final String documentPDFVersion = utilities.getPDFVersion();

        //Returns true if files contains any embedded fonts
        final boolean hasEmbeddedFonts = utilities.hasEmbeddedFonts();

        //Returns a map where the key is the page number and the value is a String detailing fonts for that page
        final Map<Integer, String > documentFontData = utilities.getAllFontDataForDocument();

        //Returns a String containing all metadata fields for the document
        final String documentPropertiesAsXML = utilities.getDocumentPropertyFieldsInXML();

        //Returns a map where the key is the property name and the value is the properties value
        final Map<String, String > documentPropertiesAsMap = utilities.getDocumentPropertyStringValuesAsMap();

        //Returns a boolean to show true if the file confirms to all tagged PDF conventions. It may be possible to extract some tagged content even if false
        final boolean isFullyTagged = utilities.isMarkedContent();

        //Returns the permissions value for this PDF and shows the permissions as a string in the console
        final int permissions = utilities.getPdfFilePermissions();
        PdfUtilities.showPermissionsAsString(permissions);

        //Returns the total page count as an int
        final int totalPageCount = utilities.getPageCount();

        for (int i = 1; i != totalPageCount; i++) {
            //Get the page dimensions for the specified page in the given units and type
            final float[] pageDimensions = utilities.getPageDimensions(i, PdfUtilities.PageUnits.Pixels, PdfUtilities.PageSizeType.CropBox);

            //Returns the total number of PDF commands used to define the specified pages content
            final int commandCountForPage = utilities.getCommandCountForPageStream(i);

            //Returns the font data as a string for the specified page
            final String fontDataForPage = utilities.getFontDataForPage(i);

            //Returns the image data as a String for the specified page
            final String xImageDataForPage = utilities.getXImageDataForPage(i);

        }
    }
} catch(final PdfException e) {
    e.printStackTrace();
}
utilities.closePDFfile();

The PDFUtilities class can also be instanced using a second constructor that will accept a pdf as a byte array instead of a file name. This constructor can be used as follows.

byte[] pdfByteArray;

//Read PDF into pdfByteArray

final PdfUtilities utilities = new PdfUtilities(pdfByteArray);

Start Your Free Trial


Customer Downloads

Select Download