
Find Text in a PDF File

JPedal provides a simple class, FindTextInRectangle, for searching for text within a PDF file and returning the coordinates of each match.

You can run the search with a single static convenience method, or open the file yourself and search it page by page for greater control.

Convenience Static Method

import java.util.ArrayList;

// Search every page of the file in a single call
String pdfFile = "/path/to/file.pdf";
String textToFind = "textToFind";

ArrayList<Float[]> resultsForPages = FindTextInRectangle.findTextOnAllPages(pdfFile, textToFind);
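The layout of resultsForPages is not spelled out here. The sketch below assumes, without confirmation from the JPedal Javadoc, that the list holds one Float[] per page (index 0 for page 1) and that each array stores matches as consecutive groups of four values (x1, y1, x2, y2). Under that assumption, counting matches per page could look like this. ResultsPerPage and countMatches are hypothetical names, and stand-in data replaces a real search so the example runs without JPedal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper. ASSUMPTION: one Float[] per page (index 0 = page 1),
// each holding consecutive (x1, y1, x2, y2) groups, one group per match.
// Check the JPedal Javadoc before relying on this layout.
class ResultsPerPage {

    static final int VALUES_PER_MATCH = 4;

    // Returns the number of matches on each page, in page order.
    static int[] countMatches(List<Float[]> resultsForPages) {
        int[] counts = new int[resultsForPages.size()];
        for (int i = 0; i < resultsForPages.size(); i++) {
            Float[] pageResults = resultsForPages.get(i);
            // A null or empty entry is treated as "no matches on this page".
            counts[i] = (pageResults == null) ? 0 : pageResults.length / VALUES_PER_MATCH;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in data in place of a real findTextOnAllPages(...) call.
        List<Float[]> resultsForPages = new ArrayList<>();
        resultsForPages.add(new Float[] {72f, 700f, 140f, 712f});
        resultsForPages.add(new Float[0]);

        int[] counts = countMatches(resultsForPages);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("Page " + (i + 1) + ": " + counts[i] + " match(es)");
        }
    }
}
```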

Customizable Method

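FindTextInRectangle extract = new FindTextInRectangle("/pdfs/mypdf.pdf");
// Uncomment to open a password-protected file
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    // Page numbers start at 1
    for (int page = 1; page <= pageCount; page++) {
        // Coordinates of the matches on this page, in PDF coordinates
        float[] coords = extract.findTextOnPage(page, "textToFind", SearchType.MUTLI_LINE_RESULTS);
    }
}
// Release the file once searching is finished
extract.closePDFfile();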
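Inside the loop, coords is a flat array. A minimal sketch for turning it into one rectangle per match, assuming (not confirmed here) that the array holds consecutive x1, y1, x2, y2 groups and is null when nothing is found on the page. MatchRectangles, Match and toMatches are hypothetical names, not part of JPedal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper. ASSUMPTION: findTextOnPage returns a flat float[] of
// consecutive (x1, y1, x2, y2) groups, one per match, or null for no matches.
class MatchRectangles {

    // Simple immutable rectangle in PDF coordinates.
    static final class Match {
        final float x1, y1, x2, y2;

        Match(float x1, float y1, float x2, float y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }

        @Override
        public String toString() {
            return "(" + x1 + ", " + y1 + ") to (" + x2 + ", " + y2 + ")";
        }
    }

    static List<Match> toMatches(float[] coords) {
        List<Match> matches = new ArrayList<>();
        if (coords == null) {
            return matches;
        }
        // Trailing values that do not form a full group of four are ignored.
        for (int i = 0; i + 3 < coords.length; i += 4) {
            matches.add(new Match(coords[i], coords[i + 1], coords[i + 2], coords[i + 3]));
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in data in place of a real findTextOnPage(...) result.
        float[] coords = {72f, 700f, 140f, 712f, 72f, 650f, 130f, 662f};
        for (Match m : toMatches(coords)) {
            System.out.println(m);
        }
    }
}
```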

Search Type

For more complex searches, you can pass a SearchType value as the third argument to findTextOnPage. Valid values are:

  • DEFAULT (0)
  • WHOLE_WORDS_ONLY (1)
  • CASE_SENSITIVE (2)
  • FIND_FIRST_OCCURANCE_ONLY (4)
  • MUTLI_LINE_RESULTS (8)
  • HIGHLIGHT_ALL_RESULTS (16)
  • USE_REGULAR_EXPRESSIONS (32)

These values can be combined using the bitwise OR operator. For example:

int searchType = SearchType.WHOLE_WORDS_ONLY | SearchType.CASE_SENSITIVE;
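Because each value occupies its own bit, a combined searchType can be tested with bitwise AND. The sketch below declares local copies of the documented values (spelled exactly as listed) so that it compiles without JPedal; in real code you would use the SearchType constants directly. SearchFlags and isSet are hypothetical names.

```java
// Local copies of the documented SearchType values so this sketch runs
// without JPedal on the classpath. In real code, use SearchType.* directly.
class SearchFlags {
    static final int DEFAULT = 0;
    static final int WHOLE_WORDS_ONLY = 1;
    static final int CASE_SENSITIVE = 2;
    static final int FIND_FIRST_OCCURANCE_ONLY = 4;
    static final int MUTLI_LINE_RESULTS = 8;
    static final int HIGHLIGHT_ALL_RESULTS = 16;
    static final int USE_REGULAR_EXPRESSIONS = 32;

    // True if every bit of flag is set in searchType.
    // Note: DEFAULT is 0, so isSet(anything, DEFAULT) is always true.
    static boolean isSet(int searchType, int flag) {
        return (searchType & flag) == flag;
    }

    public static void main(String[] args) {
        int searchType = WHOLE_WORDS_ONLY | CASE_SENSITIVE; // 1 | 2
        System.out.println(searchType);                                // 3
        System.out.println(isSet(searchType, CASE_SENSITIVE));         // true
        System.out.println(isSet(searchType, USE_REGULAR_EXPRESSIONS)); // false
    }
}
```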

A Note on Coordinates

The examples return coordinates in the PDF coordinate system, where the origin is at the bottom left of the page and y increases up the page. This is the opposite of Java's coordinate system, where the origin is at the top left and y increases down the page.
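To draw a match on a Java2D component, flip the y axis using the page height: a PDF y value becomes pageHeight - y in Java coordinates. A minimal sketch, assuming an unrotated page whose visible area starts at (0, 0); rotated pages or offset crop boxes need extra handling. PdfToJavaCoords and toJava are hypothetical names, and the page height must come from your own code.

```java
import java.awt.geom.Rectangle2D;

// Sketch: converts a rectangle from PDF coordinates (origin bottom left,
// y up) to Java2D coordinates (origin top left, y down).
// ASSUMPTION: unrotated page, visible area starting at (0, 0).
class PdfToJavaCoords {

    static Rectangle2D.Float toJava(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        float right = Math.max(x1, x2);
        float bottom = Math.min(y1, y2); // lower edge in PDF space
        float top = Math.max(y1, y2);    // upper edge in PDF space
        // The PDF top edge becomes the Java y of the rectangle's origin.
        return new Rectangle2D.Float(left, pageHeight - top, right - left, top - bottom);
    }

    public static void main(String[] args) {
        // Hypothetical match on a US Letter page (792 points tall).
        Rectangle2D.Float r = toJava(72f, 700f, 140f, 712f, 792f);
        System.out.println(r.x + ", " + r.y + ", " + r.width + ", " + r.height); // 72.0, 80.0, 68.0, 12.0
    }
}
```

Normalising with Math.min and Math.max means the result is the same whichever corner order the search returns.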

These examples use the JPedal FindTextInRectangle class.
