
Text Mode Options

One of the main configuration options for BuildVu is the Text Mode. This controls how the text, shapes, and images are output. There is a range of options to meet different requirements.

PDF files are rendered using a layering system, where content can be hidden behind other content in layers above. When converting to HTML, BuildVu simplifies this model to a background layer for presentation and a foreground layer for text selection. Some text modes handle this in different ways. See the descriptions below for the exact behaviour.

The text mode can be set using the VM argument -Dorg.jpedal.pdf2html.textMode=[MODE] or by calling the setTextMode() method in the relevant Options class.
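A -D VM argument sets a JVM system property, so the same property can, in principle, also be set from Java code before a conversion starts. The following is a minimal sketch of that idea, not BuildVu's own API: the class TextModeProperty and its apply method are hypothetical, and it is an assumption that BuildVu reads the property after it has been set this way.

```java
// Hypothetical helper for illustration; not part of BuildVu.
class TextModeProperty {
    // Property name taken from the VM argument -Dorg.jpedal.pdf2html.textMode=[MODE]
    static final String KEY = "org.jpedal.pdf2html.textMode";

    // Sets the system property and returns the equivalent VM argument,
    // which can be handy for logging or for launching a separate JVM.
    static String apply(String mode) {
        System.setProperty(KEY, mode);
        return "-D" + KEY + "=" + mode;
    }
}
```

Calling TextModeProperty.apply("image_realtext") leaves the property set for the rest of the JVM's lifetime, so it should run before any conversion is started.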

There are two main output formats (SVG or Image), each with three text behaviours, making a total of six modes to choose from.

  • svg_realtext (default)
  • svg_shapetext_selectable
  • svg_shapetext_nonselectable
  • image_realtext
  • image_shapetext_selectable
  • image_shapetext_nonselectable
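Each mode name combines a background format with a text behaviour. As a rough sketch of that structure, the six names could be modelled as follows. The enum TextModeSketch and its methods are hypothetical and exist only for illustration; they are not BuildVu's own types.

```java
import java.util.Locale;

// Hypothetical enum for illustration; not BuildVu's own API type.
enum TextModeSketch {
    SVG_REALTEXT,
    SVG_SHAPETEXT_SELECTABLE,
    SVG_SHAPETEXT_NONSELECTABLE,
    IMAGE_REALTEXT,
    IMAGE_SHAPETEXT_SELECTABLE,
    IMAGE_SHAPETEXT_NONSELECTABLE;

    // The lower-case value as written in the list above, e.g. "svg_realtext".
    String optionValue() {
        return name().toLowerCase(Locale.ROOT);
    }

    // True when the background layer is SVG rather than an image.
    boolean svgBackground() {
        return name().startsWith("SVG_");
    }

    // True when text is output as real text displayed using fonts.
    boolean realText() {
        return name().endsWith("_REALTEXT");
    }

    // True for every mode except the two nonselectable shapetext modes.
    boolean selectable() {
        return !name().endsWith("_NONSELECTABLE");
    }
}
```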

Image or SVG?

This controls the format of the background layer.

SVG Modes (svg_realtext, svg_shapetext_selectable, svg_shapetext_nonselectable)

  • Vector content remains vector content, meaning that pages can be zoomed without loss of quality.
  • Shapes are drawn as SVG shapes.
  • Images and shades are drawn as images within the SVG.
  • A single .svg file is used per page (with separate image files by default).
  • Higher quality but slower to render than image modes.

Image Modes (image_realtext, image_shapetext_selectable, image_shapetext_nonselectable)

  • All shapes, images, and shades are drawn onto an image layer.
  • Image modes have the smallest file size and are rendered fastest (generally speaking).
  • With just one image per page, the output is simple and concise.
  • Shapes will become pixelated when zooming in.

Real Text, Shape Text, or both?

This controls how text is rendered in the output.

RealText Modes (svg_realtext, image_realtext)

  • PDF text is converted to real, selectable text and displayed using fonts.
  • File sizes are the lowest.
  • Text is searchable (such as by web crawlers or in web browsers).
  • Text can be modified (such as by automatic translation tools).
  • Text spacing is averaged across each line (text kerning is not supported in HTML).
  • Text is brought to the front layer for selection purposes, which may display text that would otherwise be hidden by other content, and will prevent blend modes being applied to text.

If the most accurate display of text is required then a shapetext mode should be used.

ShapeText Selectable Modes (svg_shapetext_selectable, image_shapetext_selectable)

  • Text is drawn out as shapes within the SVG or Image background layer.
  • A layer of invisible, real text is drawn in the foreground layer to be used for search and selection.
  • Text can be searched and selected.
  • Perfect representation of the PDF, while retaining text selection.
  • There is a performance and file size penalty when using svg shapetext variants.
  • With image shapetext variants, text may become pixelated when zooming in. This can be alleviated by increasing the imageScale setting.

ShapeText Nonselectable Modes (svg_shapetext_nonselectable, image_shapetext_nonselectable)

  • Text is drawn out as shapes within the SVG or Image background layer.
  • Perfect text representation of the PDF (but no search/selection).
  • There is a performance and file size penalty when using svg shapetext variants.
  • With image shapetext variants, text may become pixelated when zooming in. This can be alleviated by increasing the imageScale setting.

Experimental Text Mode

We are currently evaluating an experimental text mode which writes text into the background SVG layer as real text. As SVG supports text kerning, this improves the display accuracy, and also prevents layering issues that can occur with realtext modes.

An additional layer of invisible HTML text is also added to the foreground layer for selection purposes.

This mode can be enabled with the command line option (or system property): -DexperimentalTextMode=true
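When the converter is launched as a separate JVM, both settings can be passed as -D arguments. The sketch below only assembles the argument strings named on this page. The class JvmArgsSketch is hypothetical, and how the experimental flag interacts with the chosen text mode is not stated here, so passing both together is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of BuildVu.
class JvmArgsSketch {
    // Builds the -D arguments for a text mode, optionally adding the
    // experimental text mode flag.
    static List<String> textModeArgs(String mode, boolean experimental) {
        List<String> args = new ArrayList<>();
        args.add("-Dorg.jpedal.pdf2html.textMode=" + mode);
        if (experimental) {
            args.add("-DexperimentalTextMode=true");
        }
        return args;
    }
}
```

The resulting list can be placed before the main class or -jar argument of a java command, for example when building a ProcessBuilder command line.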

Please let us know if you have any feedback regarding this mode.

The default setting (svg_realtext) offers a good balance between conversion accuracy, file size and rendering performance.

If your priority is display accuracy, we recommend the shapetext modes. However, this comes at the price of slower rendering (in SVG modes) and larger file sizes.

If you are looking for the highest performance, we recommend the realtext modes. Generally, browsers are also quicker to render image modes than SVG modes.
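These recommendations can be read as a small decision rule. The sketch below is one possible interpretation, not an official selection algorithm: the class TextModeChooser, the Priority enum, and both boolean parameters are hypothetical names introduced for illustration.

```java
// Hypothetical helper for illustration; not part of BuildVu.
class TextModeChooser {
    enum Priority { BALANCED, ACCURACY, PERFORMANCE }

    // needSelection: text must be searchable and selectable.
    // needSharpZoom: vector content must stay sharp when zoomed (SVG background).
    static String choose(Priority priority, boolean needSelection, boolean needSharpZoom) {
        String format = needSharpZoom ? "svg" : "image";
        switch (priority) {
            case ACCURACY:
                // Shapetext modes give the most accurate display of text.
                return format + (needSelection
                        ? "_shapetext_selectable"
                        : "_shapetext_nonselectable");
            case PERFORMANCE:
                // Realtext modes are recommended for performance and are
                // always selectable; image backgrounds generally render
                // faster in browsers than SVG.
                return format + "_realtext";
            default:
                // The default mode balances accuracy, file size, and speed.
                return "svg_realtext";
        }
    }
}
```

For example, choose(Priority.ACCURACY, true, true) yields svg_shapetext_selectable, which accepts the performance and file size penalty in exchange for accurate, zoomable, selectable text.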