Extract Clipped Images from PDF
JPedal provides several methods for extracting clipped images from PDF files. A clipped image is an image as it appears on the page: a PDF can contain multiple images, each of which may be clipped, rotated, and scaled for display. The examples here return each image with those transformations applied. To get the raw images instead, use ExtractImages.
Extract Clipped Images from PDF with the Command-Line or another language
java -jar jpedal.jar --extractClippedImages inputFileOrFolder outputFolder outputFormat outputSize subFolder [outputSize subFolder]...
The final two arguments let you set the size at which the images are extracted, by giving either a height or a scale as the outputSize. If the outputSize is a number, each image is resized to that height, or kept at the size it appears on the page if that is smaller. A value of -1 outputs the image at the size it appears on the page. If the outputSize is an x followed by a number (for example x1), the image is scaled by that factor. The subFolder is the subdirectory into which images of the preceding outputSize are saved. By repeating the outputSize and subFolder arguments, you can create multiple sizes from a single command.
java -jar jpedal.jar --extractClippedImages inputFileOrFolder outputFolder outputFormat -1 raw 100 medium 50 thumbnail x2 large
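The outputSize rules described above can be sketched in a few lines of Java. This is only an illustration of the documented behaviour, not JPedal's own code: the class OutputSizeSketch and the method resolveHeight are hypothetical names, and the rounding is an assumption.

```java
public class OutputSizeSketch {

    // Hypothetical helper, not part of the JPedal API: returns the output
    // height for an image whose height on the page is heightOnPage.
    public static int resolveHeight(String outputSize, int heightOnPage) {
        if (outputSize.startsWith("x")) {
            // "x2" style value: scale the on-page size by the given factor
            float scale = Float.parseFloat(outputSize.substring(1));
            return Math.round(heightOnPage * scale);
        }
        float requested = Float.parseFloat(outputSize);
        if (requested == -1) {
            // -1 keeps the size the image has on the page
            return heightOnPage;
        }
        // A plain number is a target height, capped at the on-page size
        return Math.min(Math.round(requested), heightOnPage);
    }

    public static void main(String[] args) {
        int heightOnPage = 300;
        for (String size : new String[] {"-1", "100", "50", "x2", "400"}) {
            System.out.println(size + " -> " + resolveHeight(size, heightOnPage));
        }
    }
}
```

Under these assumptions, the sizes in the command above (-1, 100, 50 and x2) would give four different output heights for an image that is larger than 100 on the page, one per subfolder.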
Extract Clipped Images from PDF in Java
Static Convenience Method
ExtractClippedImages.writeAllClippedImagesToDir(
        "inputFileOrFolder",
        "outputFolder",
        "outputImageFormat",
        new String[] {"imageHeightAsFloat", "subFolderForHeight"});
API Access Methods
import java.awt.image.BufferedImage;

ExtractClippedImages extract = new ExtractClippedImages("inputFile.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    // Pages are numbered from 1, images on a page from 0
    for (int page = 1; page <= pageCount; page++) {
        int imagesOnPageCount = extract.getImageCount(page);
        for (int image = 0; image < imagesOnPageCount; image++) {
            // The image as it appears on the page (clipped, rotated, scaled)
            BufferedImage clippedImage = extract.getClippedImage(page, image);
        }
    }
}
extract.closePDFfile();
These examples use the JPedal ExtractClippedImages class. In the static convenience method, the final argument is a String array that specifies a series of output size and subdirectory pairs, in a similar manner to the outputSize and subFolder arguments used on the command line.
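As a minimal sketch of that array layout, the following builds the alternating size and subdirectory entries from an ordered map. The class SizePairs and the method toSizeArgs are hypothetical helpers, not part of JPedal, and the sizes and folder names are illustrative values borrowed from the command-line example. The JPedal call itself is left as a comment so the sketch runs without the library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SizePairs {

    // Hypothetical helper, not part of JPedal: flattens subFolder -> outputSize
    // entries into the alternating {outputSize, subFolder, ...} layout.
    public static String[] toSizeArgs(Map<String, String> sizeBySubFolder) {
        List<String> sizeArgs = new ArrayList<>();
        for (Map.Entry<String, String> entry : sizeBySubFolder.entrySet()) {
            sizeArgs.add(entry.getValue()); // output size, e.g. "100" or "-1"
            sizeArgs.add(entry.getKey());   // subfolder that receives that size
        }
        return sizeArgs.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so each size stays next to its folder
        Map<String, String> sizes = new LinkedHashMap<>();
        sizes.put("raw", "-1");
        sizes.put("medium", "100");
        sizes.put("thumbnail", "50");

        String[] sizeArgs = toSizeArgs(sizes);
        System.out.println(String.join(" ", sizeArgs));

        // With JPedal on the classpath, the array would be the final argument:
        // ExtractClippedImages.writeAllClippedImagesToDir(
        //         "inputFileOrFolder", "outputFolder", "outputImageFormat", sizeArgs);
    }
}
```

Only plain numeric sizes are used here. Whether the static method also accepts the x-prefixed scale values from the command line is not confirmed by the text above.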
ExtractClippedImages can output images in several different image formats including BMP, PNG, JPG and TIFF.