Find Text in a PDF File
JPedal provides a simple class for searching for text within a PDF file and returning the coordinates of the results found.
You can run the search with a single static convenience method, or with a more detailed approach that gives you greater control.
Convenience Static Method
String pdfFile = "/path/to/file.pdf";
String textToFind = "textToFind";
ArrayList<Float[]> resultsForPages = FindTextInRectangle.findTextOnAllPages(pdfFile, textToFind);
Customizable Method
FindTextInRectangle extract = new FindTextInRectangle("/pdfs/mypdf.pdf");
// Uncomment if the file is password protected
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    // Pages are numbered from 1
    for (int page = 1; page <= pageCount; page++) {
        // Coordinates of the results found on this page
        float[] coords = extract.findTextOnPage(page, "textToFind", SearchType.MUTLI_LINE_RESULTS);
    }
}
extract.closePDFfile();
Search Type
For complex searches, you can set a SearchType. Valid values are:
DEFAULT (0)
WHOLE_WORDS_ONLY (1)
CASE_SENSITIVE (2)
FIND_FIRST_OCCURANCE_ONLY (4)
MUTLI_LINE_RESULTS (8)
HIGHLIGHT_ALL_RESULTS (16)
USE_REGULAR_EXPRESSIONS (32)
These values can be combined by using the bitwise OR operator. For example,
int searchType = SearchType.WHOLE_WORDS_ONLY | SearchType.CASE_SENSITIVE;
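Each value is a distinct power of two, so a combined search type keeps every option as a separate bit. The sketch below illustrates this with a hypothetical stand-in class, SearchFlagsSketch, whose constants copy the names and numbers listed above. It is not the JPedal SearchType class, and the isSet helper is an assumption added purely for illustration.

```java
// Hypothetical stand-in that mirrors the values listed above.
// This is NOT the JPedal SearchType class.
final class SearchFlagsSketch {
    static final int DEFAULT = 0;
    static final int WHOLE_WORDS_ONLY = 1;
    static final int CASE_SENSITIVE = 2;
    static final int FIND_FIRST_OCCURANCE_ONLY = 4;
    static final int MUTLI_LINE_RESULTS = 8;
    static final int HIGHLIGHT_ALL_RESULTS = 16;
    static final int USE_REGULAR_EXPRESSIONS = 32;

    // Illustrative helper (an assumption, not part of JPedal):
    // true when every bit of 'flag' is set in 'searchType'.
    static boolean isSet(int searchType, int flag) {
        return (searchType & flag) == flag;
    }

    public static void main(String[] args) {
        // Bitwise OR sets both bits: 1 | 2 = 3
        int searchType = WHOLE_WORDS_ONLY | CASE_SENSITIVE;
        System.out.println(searchType);
        System.out.println(isSet(searchType, CASE_SENSITIVE));
        System.out.println(isSet(searchType, MUTLI_LINE_RESULTS));
    }
}
```

Because no two options share a bit, any combination can be passed as a single int, and each option can still be recovered from it with a bitwise AND.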
A note on co-ordinates
The examples use PDF co-ordinates, which start at the bottom left of the page and run up the page. This is the opposite of Java co-ordinates, which start at the top left and run down the page.
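If you need to draw a result in Java, the y values have to be flipped against the page height. The following is a minimal sketch of that conversion, not part of JPedal. The class PdfToJavaCoords, its method names, and the {x1, y1, x2, y2} corner layout are all assumptions for illustration, and it also assumes an unrotated, unscaled page whose origin is at (0, 0).

```java
// Hypothetical helper, not part of JPedal.
// Assumes an unrotated, unscaled page with its origin at (0, 0).
final class PdfToJavaCoords {

    // Flip a single y value: PDF measures up from the bottom,
    // Java measures down from the top.
    static float toJavaY(float pdfY, float pageHeight) {
        return pageHeight - pdfY;
    }

    // Convert two opposite PDF corners (an assumed layout) into a
    // Java-style rectangle {x, y, width, height} with a top-left origin.
    static float[] toJavaRect(float x1, float y1, float x2, float y2, float pageHeight) {
        float left = Math.min(x1, x2);
        float topInPdf = Math.max(y1, y2); // highest edge in PDF space
        float width = Math.abs(x2 - x1);
        float height = Math.abs(y2 - y1);
        return new float[] {left, toJavaY(topInPdf, pageHeight), width, height};
    }

    public static void main(String[] args) {
        // A 20-unit-tall box near the top of a 792-unit-tall page
        float[] r = toJavaRect(100f, 700f, 200f, 680f, 792f);
        System.out.println(r[0] + ", " + r[1] + ", " + r[2] + ", " + r[3]);
    }
}
```

The x values are unchanged because both systems measure x from the left edge. Only the vertical axis is inverted.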
This example uses the JPedal FindTextInRectangle class.