Find Text in a PDF File
JPedal provides a simple class to search for text within a PDF and output the coordinates of the results found.
This search can be done using a simple convenience method or using a more in-depth approach providing greater control.
Convenience Static Method
String pdfFile = "/path/to/file.pdf";
String textToFind = "textToFind";
ArrayList<Float[]> resultsForPages = FindTextInRectangle.findTextOnAllPages(pdfFile, textToFind);
Customizable Method
FindTextInRectangle extract = new FindTextInRectangle("/pdfs/mypdf.pdf");
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    // pages are numbered from 1
    for (int page = 1; page <= pageCount; page++) {
        float[] coords = extract.findTextOnPage(page, "textToFind", SearchType.MUTLI_LINE_RESULTS);
    }
}
extract.closePDFfile();
Search Type
For complex searches, you can set a SearchType. Valid values are:
- public final static int DEFAULT = 0;
- public final static int WHOLE_WORDS_ONLY = 1;
- public final static int CASE_SENSITIVE = 2;
- public final static int FIND_FIRST_OCCURANCE_ONLY = 4;
- public final static int MUTLI_LINE_RESULTS = 8;
- public final static int HIGHLIGHT_ALL_RESULTS = 16;
- public final static int USE_REGULAR_EXPRESSIONS = 32;
These values can be combined by using the bitwise OR operator. For example,
int searchType = SearchType.WHOLE_WORDS_ONLY | SearchType.CASE_SENSITIVE;
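Because each value is a distinct power of two, a combined search type can also be tested for an individual flag with the bitwise AND operator. The following is a minimal sketch of that idea. The class SearchFlagsDemo and the helper hasFlag are hypothetical names used only for illustration, and the constants are local copies of the values listed above rather than the JPedal SearchType class itself.

```java
// Illustration only: local copies of the flag values listed above.
// In real code, use the constants from JPedal's SearchType class.
final class SearchFlagsDemo {
    static final int WHOLE_WORDS_ONLY = 1;
    static final int CASE_SENSITIVE = 2;
    static final int MUTLI_LINE_RESULTS = 8;

    // Hypothetical helper: true when every bit of 'flag' is set in 'searchType'.
    static boolean hasFlag(int searchType, int flag) {
        return (searchType & flag) == flag;
    }

    public static void main(String[] args) {
        // Combine two options with bitwise OR: 1 | 2 == 3
        int searchType = WHOLE_WORDS_ONLY | CASE_SENSITIVE;
        System.out.println(searchType);                              // 3
        System.out.println(hasFlag(searchType, CASE_SENSITIVE));     // true
        System.out.println(hasFlag(searchType, MUTLI_LINE_RESULTS)); // false
    }
}
```

Using OR rather than addition is the safer habit: adding the same flag twice would change the value, whereas OR-ing it twice leaves it unchanged.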
A note on co-ordinates
Examples use PDF co-ordinates, which start at the bottom left of the page and run up the page. This is the opposite of Java, where co-ordinates start at the top left and run down the page.
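If you need to draw or highlight a result in Java, the y value has to be flipped against the page height. The following is a rough sketch of that conversion, not part of the JPedal API. It assumes an unrotated page at 100% scale whose origin is at 0,0, that you already know the page height in PDF units, and that the rectangle is given as two corner points (x1, y1, x2, y2). That corner layout is an assumption made for this illustration, so check the JPedal documentation for the actual layout of the returned arrays. The class and method names are hypothetical.

```java
// Illustration only: converts PDF co-ordinates (origin bottom left, y runs up)
// to Java co-ordinates (origin top left, y runs down).
// Assumes an unrotated, unscaled page with its origin at 0,0.
final class PdfToJavaCoords {

    // Flip a single y value against the page height.
    static float toJavaY(float pdfY, float pageHeight) {
        return pageHeight - pdfY;
    }

    // Convert a rectangle given as two PDF corner points (assumed layout)
    // into {x, y, width, height} with y measured from the top of the page.
    static float[] toJavaRect(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        float top = Math.max(y1, y2);          // highest edge in PDF space
        float width = Math.abs(x2 - x1);
        float height = Math.abs(y2 - y1);
        return new float[] {left, toJavaY(top, pageHeight), width, height};
    }

    public static void main(String[] args) {
        float pageHeight = 842f;               // example value for an A4 page in points
        float[] r = toJavaRect(50f, 700f, 150f, 680f, pageHeight);
        System.out.println(r[0] + ", " + r[1] + ", " + r[2] + ", " + r[3]);
    }
}
```

The x values are unchanged because both systems measure x from the left edge. Only the vertical axis is inverted.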
This example uses the JPedal FindTextInRectangle class.