How did div tags change in 2021.06 and why?

In the 2021.06 release of BuildVu-HTML, text elements switched from div tags to span tags. A parent container element (a div with the class text-container) and associated CSS were also added.

What changed?

Before:

<style class="shared-css" type="text/css" >
.t {
	transform-origin: bottom left;
	z-index: 2;
	position: absolute;
	white-space: pre;
	overflow: visible;
	line-height: 1.5;
}
</style>

<div id="t1_1" class="t s1_1">Some things never change</div>
<div id="t2_1" class="t s2_1">Never trust a dog to watch your food.</div>
<div id="t3_1" class="t s3_1"></div>
<div id="t4_1" class="t s4_1">Patrick, age 10</div>

After:

<style class="shared-css" type="text/css" >
.t {
	transform-origin: bottom left;
	z-index: 2;
	position: absolute;
	white-space: pre;
	overflow: visible;
	line-height: 1.5;
}
.text-container {
	white-space: pre;
}
@supports (-webkit-touch-callout: none) {
	.text-container {
		white-space: normal;
	}
}
</style>

<div class="text-container"><span id="t1_1" class="t s1_1">Some things never change </span>
<span id="t2_1" class="t s2_1">Never trust a dog to watch your food. </span>
<span id="t3_1" class="t s3_1"></span><span id="t4_1" class="t s4_1">Patrick, age 10 </span></div>

Why did it change?

This change provides a range of benefits, including a fix for a problem where new lines were erroneously inserted when copying and pasting text. The cause was that browsers automatically insert a newline character at the end of each div element when copying it, because div is a block-level element. Span is an inline element, so no newline is added after it.
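To illustrate the copy behaviour, here is a deliberately simplified Python model. It is an assumption for illustration, not how any browser actually builds its clipboard text: each div is treated as a block element whose copied text ends in a newline, while spans are inline and contribute only their text. The split of one visual line into the runs "Never trust a dog " and "to watch your food." is a hypothetical example, and copied_text is a hypothetical helper.

```python
# Simplified model of what a desktop browser puts on the clipboard.
# Assumption: block elements (div) contribute a trailing newline when copied;
# inline elements (span) contribute only their text. Real browsers apply
# many more rules than this.

BLOCK_TAGS = {"div"}


def copied_text(elements):
    """elements: list of (tag, text) pairs in document order."""
    out = []
    for tag, text in elements:
        out.append(text)
        if tag in BLOCK_TAGS:
            out.append("\n")  # block element: newline after its text
    return "".join(out)


# Hypothetical: one visual line emitted as two absolutely positioned runs.
runs = ["Never trust a dog ", "to watch your food."]

old_markup = [("div", r) for r in runs]   # pre-2021.06 style
new_markup = [("span", r) for r in runs]  # 2021.06 style

print(repr(copied_text(old_markup)))  # 'Never trust a dog \nto watch your food.\n'
print(repr(copied_text(new_markup)))  # 'Never trust a dog to watch your food.'
```

In the span-based model, a line break only appears where the markup explicitly contains one, rather than after every element.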

In addition, we have added handling to detect and insert line breaks between text elements correctly, and to add space characters at the end of text blocks where necessary, so that the text is extracted properly.
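The article does not describe how BuildVu decides where these line breaks and spaces go. As a rough sketch under an assumed heuristic (runs whose baselines differ by more than a small tolerance are treated as being on different lines), the Python below emits markup in the same shape as the After example: a literal newline between spans where a line ends, and a trailing space added to the last run of each line. The TextRun class, its baseline_y field, the tolerance parameter, build_text_container and the id t2_2 are all hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    # Hypothetical input: one positioned run of text taken from the PDF.
    id: str
    css_class: str
    text: str
    baseline_y: float  # vertical position of the run's baseline


def build_text_container(runs, tolerance=0.5):
    """Wrap runs in a text-container div, inserting line breaks and spaces.

    Assumed heuristic: a run ends a line when the next run's baseline differs
    by more than `tolerance`, or when it is the last run.
    """
    parts = []
    for i, run in enumerate(runs):
        nxt = runs[i + 1] if i + 1 < len(runs) else None
        ends_line = nxt is None or abs(nxt.baseline_y - run.baseline_y) > tolerance
        text = run.text
        # Add a space at the end of a line so that words on adjacent lines
        # do not run together when the text is copied or indexed.
        if ends_line and text and not text[-1].isspace():
            text += " "
        parts.append(f'<span id="{run.id}" class="{run.css_class}">{escape(text)}</span>')
        # Literal newline between lines; the container's white-space: pre
        # rule preserves it.
        if ends_line and nxt is not None:
            parts.append("\n")
    return '<div class="text-container">' + "".join(parts) + "</div>"


runs = [
    TextRun("t1_1", "t s1_1", "Some things never change", 700.0),
    TextRun("t2_1", "t s2_1", "Never trust a dog ", 680.0),
    TextRun("t2_2", "t s2_2", "to watch your food.", 680.0),
]
print(build_text_container(runs))
```

The two runs sharing a baseline are joined with no newline, while the first run gets a trailing space and is followed by a line break, matching the pattern of the After markup.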

This change also fixes issues in the search.json file, where space characters could be inserted unnecessarily, including in the middle of words.

Why the @supports CSS?

The text selection engine on iOS and iPadOS is somewhat sub-optimal and can struggle with absolutely positioned text. Text in a PDF is placed at absolute positions, so the converted HTML positions it the same way.

It struggles most with line breaks within absolutely positioned text.

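For this reason, we have added CSS that ignores these line breaks on iOS and iPadOS. Inside the @supports (-webkit-touch-callout: none) block, which in practice only browsers on iOS and iPadOS match, the text-container's white-space is switched from pre to normal, so the newline characters between spans are treated as ordinary collapsible whitespace instead of forced line breaks.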

Regardless of the line-break handling, the switch to span tags does not affect text extraction on iOS and iPadOS. This is because, unlike other browsers, Safari on iOS and iPadOS inserts a newline character at the end of a span element when copying it, if the span is absolutely positioned.
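The same kind of simplified clipboard model can show why the switch makes no difference on iOS and iPadOS. This is again an assumption, not a description of WebKit internals: the sketch treats absolutely positioned spans like divs when the hypothetical ios flag is set. Under that rule, both markups copy to the same text.

```python
# Simplified clipboard model covering the iOS/iPadOS difference.
# Assumption: desktop browsers add a newline only after block elements (div);
# Safari on iOS/iPadOS also adds one after absolutely positioned spans.


def copied_text(elements, ios=False):
    """elements: list of (tag, absolutely_positioned, text) tuples."""
    out = []
    for tag, absolute, text in elements:
        out.append(text)
        if tag == "div" or (ios and tag == "span" and absolute):
            out.append("\n")
    return "".join(out)


runs = ["Never trust a dog ", "to watch your food."]
old_markup = [("div", True, r) for r in runs]
new_markup = [("span", True, r) for r in runs]

# Desktop: the span markup no longer breaks the line.
print(copied_text(old_markup) == copied_text(new_markup))  # False
# iOS/iPadOS: both markups copy identically.
print(copied_text(old_markup, ios=True) == copied_text(new_markup, ios=True))  # True
```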

We are aware that the iOS behaviour is not ideal and will continue exploring ideas to improve this going forwards.

