
Find Text in a PDF File

JPedal provides a simple class that searches for text within a PDF and returns the coordinates of each match.

You can run the search with a single static convenience method, or use the customizable approach for greater control over how each page is searched.

Convenience Static Method

String pdfFile = "inputFile.pdf";
String textToFind = "textToFind";

ArrayList<Float[]> resultsForPages = FindTextInRectangle.findTextOnAllPages(pdfFile, textToFind);

Customizable Method

FindTextInRectangle extract = new FindTextInRectangle("inputFile.pdf");
// Uncomment if the PDF is password protected
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    // Page numbers start at 1
    for (int page = 1; page <= pageCount; page++) {
        // Coordinates of the matches found on this page
        float[] coords = extract.findTextOnPage(page, "textToFind", SearchType.MUTLI_LINE_RESULTS);
    }
}
extract.closePDFfile();

Search Type

For complex searches, you can set a SearchType. Each value is a bit flag. Valid values are:

  • DEFAULT (0)
  • WHOLE_WORDS_ONLY (1)
  • CASE_SENSITIVE (2)
  • FIND_FIRST_OCCURANCE_ONLY (4)
  • MUTLI_LINE_RESULTS (8)
  • HIGHLIGHT_ALL_RESULTS (16)
  • USE_REGULAR_EXPRESSIONS (32)

These values can be combined by using the bitwise OR operator. For example,

int searchType = SearchType.WHOLE_WORDS_ONLY | SearchType.CASE_SENSITIVE;
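
The following is a minimal sketch of how such bit flags combine and how a single flag can be tested afterwards. It does not use JPedal: the class SearchFlagsSketch and its isSet helper are hypothetical, and the constants are stand-ins that simply copy the names and numeric values listed above.

```java
// Hypothetical stand-in for illustration only; this is not the JPedal SearchType class.
final class SearchFlagsSketch {
    // Values copied from the list above; each one occupies its own bit.
    static final int DEFAULT = 0;
    static final int WHOLE_WORDS_ONLY = 1;
    static final int CASE_SENSITIVE = 2;
    static final int USE_REGULAR_EXPRESSIONS = 32;

    // Returns true when the given flag bit is present in the combined value.
    static boolean isSet(int searchType, int flag) {
        return (searchType & flag) != 0;
    }

    public static void main(String[] args) {
        // Bitwise OR merges the two flags into a single int: 1 | 2 == 3.
        int searchType = WHOLE_WORDS_ONLY | CASE_SENSITIVE;

        System.out.println(searchType);                                 // prints 3
        System.out.println(isSet(searchType, CASE_SENSITIVE));          // prints true
        System.out.println(isSet(searchType, USE_REGULAR_EXPRESSIONS)); // prints false
    }
}
```

Because every flag is a distinct power of two, any combination produces a unique integer, and DEFAULT (0) means no flag is set.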

A note on co-ordinates

The examples use PDF co-ordinates, which start at the bottom left of the page and run up the page. This is the opposite of Java co-ordinates, which start at the top left and run down the page.
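
If you need to draw a result with Java 2D, the y axis has to be flipped. The sketch below shows one way to do that under several assumptions: the rectangle is supplied as two opposite corners (x1, y1) and (x2, y2) in PDF space, the page is unrotated, it has no crop box offset, and it is drawn at 1:1 scale. The class PdfToJavaCoords and its methods are hypothetical helpers and are not part of JPedal; check the layout of the array returned by your search before applying it.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the JPedal API.
final class PdfToJavaCoords {

    // Flips a PDF y value (origin bottom left) into Java space (origin top left).
    static float toJavaY(float pdfY, float pageHeight) {
        return pageHeight - pdfY;
    }

    // Converts two opposite PDF corners into a Java-style {x, y, width, height}
    // rectangle, where (x, y) is the top-left corner.
    static float[] toJavaRect(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        float topInPdf = Math.max(y1, y2); // highest point on the page in PDF space
        float width = Math.abs(x2 - x1);
        float height = Math.abs(y2 - y1);
        return new float[] {left, toJavaY(topInPdf, pageHeight), width, height};
    }

    public static void main(String[] args) {
        // Example values: a 792 point high page (US Letter) and an assumed match area.
        float[] rect = toJavaRect(72f, 700f, 200f, 688f, 792f);
        System.out.println(Arrays.toString(rect)); // prints [72.0, 92.0, 128.0, 12.0]
    }
}
```

Only the y value changes between the two systems; x values and sizes stay the same as long as the scale is 1:1.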

This example uses the JPedal FindTextInRectangle class.


Why JPedal?

  • Actively developed commercial library with full support and no third-party dependencies.
  • Process PDF files up to 3x faster than alternative Java PDF libraries.
  • Simple licensing options and source code access for OEM users.
