
How do I change the file encoding used for Search and Extraction?

Java can read and write text in many different character encodings. As a result, the text you see on the page may not match the text you get once it has been extracted. The most common cause is that the content is encoded as one character set in one place but decoded as a different one somewhere else.
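As an illustration of that mismatch, the following sketch uses only standard JDK charset classes (the class and method names are hypothetical). It encodes a word as UTF-8 and then decodes the same bytes as ISO-8859-1, which turns one accented character into two unrelated ones.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    /** Encodes text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1. */
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, written as a Unicode escape so this source file stays ASCII-only
        String original = "caf\u00E9";
        String garbled = misread(original);
        // e-acute is 2 bytes in UTF-8 (0xC3 0xA9); read as ISO-8859-1 they become A-tilde and the copyright sign
        System.out.println(garbled.length()); // 5
        System.out.println(garbled.equals("caf\u00C3\u00A9")); // true
    }
}
```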

A common symptom is that characters are not recognised and are returned as question marks (????).
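The question marks typically appear when text is written through an encoder that cannot represent some of its characters. The sketch below (hypothetical class name, standard JDK classes only) writes four Japanese characters through an OutputStreamWriter twice: once with US-ASCII, where each unmappable character is replaced by the charset's replacement byte '?', and once with UTF-8, where the text survives intact.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {
    /** Writes text through an OutputStreamWriter using the given charset and returns the raw bytes. */
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // OutputStreamWriter replaces characters the charset cannot represent with the
        // charset's replacement bytes ('?' for US-ASCII) rather than throwing an exception
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Four Japanese characters, written as Unicode escapes to keep this source file ASCII-only
        String text = "\u65E5\u672C\u8A9E\u6587";
        String viaAscii = new String(write(text, StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        String viaUtf8 = new String(write(text, StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(viaAscii);             // ????
        System.out.println(viaUtf8.equals(text)); // true
    }
}
```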

If you use the search or extraction features, we recommend setting the following VM argument:

-Dfile.encoding=UTF-8
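To confirm the argument has taken effect, a small check such as the one below can print the encoding the JVM is actually using. It is a sketch using standard JDK calls only; the class name EncodingCheck is hypothetical. The default charset is generally fixed when the JVM starts, which is why the value should be passed on the command line rather than set later with System.setProperty.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    /** Returns true if the JVM's default charset is UTF-8. */
    static boolean defaultIsUtf8() {
        return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The property as supplied with -Dfile.encoding (or the JVM's own default)
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        // The charset used by default-charset APIs such as FileWriter(String) and String.getBytes()
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        if (!defaultIsUtf8()) {
            System.out.println("Warning: default charset is not UTF-8; extracted text may be garbled.");
        }
    }
}
```

For example, run it as `java -Dfile.encoding=UTF-8 EncodingCheck`. When launching a JAR (your-app.jar here is a placeholder), the argument must come before `-jar`, as in `java -Dfile.encoding=UTF-8 -jar your-app.jar`; anything placed after the JAR name is passed to the application as a program argument instead.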

As of Java 18, this flag is no longer required because UTF-8 is the default encoding (JEP 400).
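Independently of the VM argument and the Java version, code that saves extracted text can also name the charset explicitly, so the result does not depend on the platform default. The sketch below uses standard java.nio.file methods available since Java 11; roundTrip is a hypothetical helper, and the sample string stands in for whatever text your extraction code returns. This is general Java practice, not part of JPedal's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitUtf8 {
    /** Writes text to a temporary file as UTF-8, reads it back as UTF-8, and deletes the file. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("extracted", ".txt");
            try {
                // Explicit charsets: the platform default encoding is never consulted
                Files.writeString(tmp, text, StandardCharsets.UTF_8);
                return Files.readString(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // e-acute plus three Japanese characters, as Unicode escapes to keep the source ASCII-only
        String text = "caf\u00E9 \u65E5\u672C\u8A9E";
        System.out.println(roundTrip(text).equals(text)); // true
    }
}
```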

A list of supported encodings can be found here.

