Validating PDFs
We are sometimes asked if it’s possible to validate PDF files, for example as a method to determine if there will be errors in the output.
Whilst it is possible to validate that a PDF file conforms to the PDF specification, for example using the Arlington PDF Model, such validation is primarily intended for software that produces PDF files rather than software that consumes them.
PDF is a complex format, and we have encountered a lot of “creative” interpretations of the PDF specification over the years. If it works in Adobe Reader then most people assume the PDF file is valid (even when it’s not). For software that consumes PDF files, it’s important to handle issues gracefully.
It is a very similar situation to HTML, where the majority of web pages would likely fail validation, yet web browsers still render them in an entirely acceptable manner. A large proportion of PDF files would also fail validation but are still rendered well.
BuildVu makes a best-effort attempt at converting PDF documents. If there is a critical error that prevents conversion from continuing, then BuildVu will throw a PdfException (if running from Java) and set a non-zero exit status (if running from the command line).
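For command-line use, the exit status described above can be checked from a wrapper process. The following is a minimal sketch using only the standard Java ProcessBuilder API. The class ConversionExitStatus and its methods are hypothetical names introduced for illustration and are not part of BuildVu, and the sketch assumes the usual convention that an exit status of zero means the conversion completed. The converter command itself is passed in as arguments, so no BuildVu flags are assumed here.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper (not part of BuildVu): runs a converter command
// and reports whether it finished with a zero exit status.
final class ConversionExitStatus {

    /** Runs the given command, waits for it to finish, and returns its exit status. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the converter's stdout/stderr to this process
                .start();
        return process.waitFor();
    }

    /** Assumption: zero means success; any non-zero value signals a critical error. */
    static boolean succeeded(int exitStatus) {
        return exitStatus == 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("Usage: java ConversionExitStatus <command> [args...]");
            System.exit(2);
        }
        int status = run(List.of(args));
        if (succeeded(status)) {
            System.out.println("Conversion finished");
        } else {
            System.err.println("Conversion failed with exit status " + status);
        }
        // Propagate the converter's exit status to the caller.
        System.exit(status);
    }
}
```

The wrapper would be invoked with the full conversion command as its arguments. A zero status only indicates that no critical error stopped the conversion. It does not indicate that the PDF file would pass validation.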
When we receive bug reports, we are usually able to add a workaround when the issue is caused by a PDF file that does not follow the PDF specification in the way we expect.