Find Text in a PDF File

JPedal provides a simple class that searches for text within a PDF file and returns the co-ordinates of each match.

This search can be done using a simple convenience method or using a more in-depth approach providing greater control.

Convenience Static Method

String pdfFile = "/path/to/file.pdf";
String textToFind = "textToFind";

ArrayList<Float[]> resultsForPages = FindTextInRectangle.findTextOnAllPages(pdfFile, textToFind);

Customizable Method

FindTextInRectangle extract = new FindTextInRectangle("/pdfs/mypdf.pdf");
// Uncomment to supply a password for an encrypted file
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    // Pages are numbered from 1
    for (int page = 1; page <= pageCount; page++) {
        // coords holds the co-ordinates of the results found on this page
        float[] coords = extract.findTextOnPage(page, "textToFind", SearchType.MUTLI_LINE_RESULTS);
    }
}
extract.closePDFfile();

Search Type

For complex searches, you can set a SearchType. The valid values are:

  • public final static int DEFAULT = 0;
  • public final static int WHOLE_WORDS_ONLY = 1;
  • public final static int CASE_SENSITIVE = 2;
  • public final static int FIND_FIRST_OCCURANCE_ONLY = 4;
  • public final static int MUTLI_LINE_RESULTS = 8;
  • public final static int HIGHLIGHT_ALL_RESULTS = 16;
  • public final static int USE_REGULAR_EXPRESSIONS = 32;

These values can be combined by using the bitwise OR operator. For example,

int searchType = SearchType.WHOLE_WORDS_ONLY | SearchType.CASE_SENSITIVE;
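
Because each value is a distinct power of two, a combined search type can also be tested for an individual option with the bitwise AND operator. The following is a minimal sketch of that idea rather than JPedal code: the constants are local stand-ins that copy the values listed above (real code would use SearchType directly), and the class name SearchFlagsDemo and the helper hasFlag are hypothetical.

```java
class SearchFlagsDemo {
    // Local stand-ins copying the values listed above.
    // In real code, use the constants on JPedal's SearchType instead.
    static final int WHOLE_WORDS_ONLY = 1;
    static final int CASE_SENSITIVE = 2;
    static final int FIND_FIRST_OCCURANCE_ONLY = 4;

    // Hypothetical helper: true if the given option bit is set in searchType.
    static boolean hasFlag(int searchType, int flag) {
        return (searchType & flag) != 0;
    }

    public static void main(String[] args) {
        // Combine two options with bitwise OR: 1 | 2 == 3
        int searchType = WHOLE_WORDS_ONLY | CASE_SENSITIVE;

        System.out.println(searchType);                                     // 3
        System.out.println(hasFlag(searchType, CASE_SENSITIVE));            // true
        System.out.println(hasFlag(searchType, FIND_FIRST_OCCURANCE_ONLY)); // false
    }
}
```

Note that DEFAULT is 0, so it sets no bits and combining it with any other value leaves that value unchanged.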

A note on co-ordinates

The examples use PDF co-ordinates, which start at the bottom left of the page and run up the page. This is the opposite of Java, whose co-ordinates start at the top left and run down the page.
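
If you need to draw a result in a Java component, the y axis has to be flipped against the page height. The sketch below shows one way this might be done. It is not part of JPedal: the class PdfToJavaCoords and its methods are hypothetical, the rectangle is assumed to be given as two opposite corners in PDF space, and page rotation, scaling and any crop box offset are ignored.

```java
class PdfToJavaCoords {
    // Flip a single y value from bottom-left (PDF) to top-left (Java) space.
    static float flipY(float pdfY, float pageHeight) {
        return pageHeight - pdfY;
    }

    // Convert a rectangle given by two opposite corners in PDF space
    // into {x, y, width, height} measured from the top left of the page.
    // Assumes an unrotated page whose origin is at (0, 0).
    static float[] toTopLeftRect(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        // The higher PDF y value is the edge nearest the top of the page.
        float top = flipY(Math.max(y1, y2), pageHeight);
        float width = Math.abs(x2 - x1);
        float height = Math.abs(y2 - y1);
        return new float[] {left, top, width, height};
    }

    public static void main(String[] args) {
        // Illustrative values only: a page 792 units high.
        float[] rect = toTopLeftRect(72f, 700f, 200f, 712f, 792f);
        System.out.println(java.util.Arrays.toString(rect)); // [72.0, 80.0, 128.0, 12.0]
    }
}
```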

The search examples on this page use the JPedal FindTextInRectangle class.