Exciting new features in XSLT 3 for book publishers

Liam Quin has been working with digital typography and books since first encountering text formatting software in 1981. He worked at SoftQuad Inc. (an outgrowth of Coach House Press in the 1980s) and was later the head of XML work at the World Wide Web Consortium for many years. He is now a self-employed consultant. He's written here about XSLT 3 and its new, exciting features to give you a glimpse of what you can expect from his ebookcraft 2019 talk, XSLT 3 for EPUB and Print.

If you’ve been using XSLT 1 or 2 as part of your ebook production workflow, you’ve got some treats in store. If you haven’t, the changes in XSLT 3 will make you want to start.

XSLT, or Extensible Stylesheet Language Transformations, from the World Wide Web Consortium (W3C), is a language for telling a computer how to transform information from one format to another. More formally, for the technophiles, it’s a declarative (functional) tree transformation language with referential transparency and implicit dispatch, but for our purposes here it’s a language for processing XML, HTML, and text documents. In other words, XSLT brings text processing out of the computer science laboratory and lets ordinary people work magic.

If you’ve used the original version of XSLT from 1999, you may have found it a little verbose and tedious, and very limited in scope: it was primarily designed to take a single XML document and produce a single document as output. Extensions to create multiple output documents soon followed, and XSLT already had a document() function to read additional XML files, so by XSLT version 2 the thinking had evolved beyond one document in, one document out.

Technically, you could read XML files in both XSLT 1 and 2, and plain text files in XSLT 2, but without extensions you couldn’t easily process JSON or HTML5, and if a file was missing the transformation would bomb out, with no easy way to recover.

XSLT 3 takes this further. You can read from many different file formats, and there are even widely implemented extensions that let you read from and write to zip archives. XSLT 3 adds support for reading and writing JSON and for producing HTML5 output. So the ecosystem is richer.
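As a rough illustration of the JSON support, and of recovering from a missing file, here is a minimal sketch. The file name metadata.json and its title key are hypothetical, and xsl:try/xsl:catch is just one way to handle the failure. It also assumes a processor that supports the XPath 3.1 features json-doc() and the ? lookup operator, which are optional in XSLT 3.0.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical file name; a relative URI resolves against the stylesheet -->
  <xsl:param name="metadata-uri" as="xs:string" select="'metadata.json'"/>

  <xsl:template name="xsl:initial-template">
    <book>
      <xsl:try>
        <!-- json-doc() reads the file and parses it into an XPath map -->
        <xsl:variable name="meta" select="json-doc($metadata-uri)"/>
        <title><xsl:value-of select="$meta?title"/></title>
        <xsl:catch>
          <!-- Reached if the file is missing or is not valid JSON -->
          <title>Untitled</title>
        </xsl:catch>
      </xsl:try>
    </book>
  </xsl:template>
</xsl:stylesheet>
```

Instead of stopping the whole run, a missing metadata file here simply falls back to a placeholder title.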

For e-publishers, the ability of XSLT 3 engines to read from and write to zip archives means you can generate EPUB files directly, or even extract files from ebooks. You can also process binary files, so that it's possible to work out the size of a bitmap image in pixels, which is useful when embedding graphics into web pages or ebooks. And you can process text files a line at a time with fn:unparsed-text-lines().
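Line-at-a-time processing might look something like the sketch below. The input file keywords.txt and the output element names are made up for the example; unparsed-text-lines() itself is a standard function.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical input: a plain text file with one keyword per line -->
  <xsl:param name="list-uri" as="xs:string" select="'keywords.txt'"/>

  <xsl:template name="xsl:initial-template">
    <keywords>
      <!-- One string per line; the predicate skips blank lines -->
      <xsl:for-each select="unparsed-text-lines($list-uri)[normalize-space()]">
        <keyword><xsl:value-of select="normalize-space(.)"/></keyword>
      </xsl:for-each>
    </keywords>
  </xsl:template>
</xsl:stylesheet>
```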

Probably the single feature that’s the biggest game-changer for most people in publishing, the most fun, and that gives the largest reduction in costs, is the ability to call XSLT from within XSLT using the new fn:transform() function. This means you can easily build a collection of documents, such as making an EPUB 3 zip file, even if it involves running a separate transformation to create some or all of the components such as the spine or table of contents or index, without resorting to complex batch scripts or other programming languages. This reduces the number of programming or scripting languages you need in a project, reduces the number of components, controls the way the components interlock, and results in something easier to understand and maintain by the same person who works with the underlying XSLT transformations.
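To give a flavour of this, here is a minimal sketch of one stylesheet calling another. The stylesheet name toc.xsl and the output name toc.xhtml are hypothetical; the option names stylesheet-location and source-node and the result key output are the standard ones for fn:transform(), which needs a processor with XPath 3.1 support.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Run a second stylesheet (hypothetical toc.xsl) over the same input -->
    <xsl:variable name="result" select="
        transform(map {
          'stylesheet-location' : 'toc.xsl',
          'source-node'         : .
        })"/>
    <!-- With no base-output-uri option, the principal result
         is returned under the key 'output' -->
    <xsl:result-document href="toc.xhtml">
      <xsl:sequence select="$result?output"/>
    </xsl:result-document>
  </xsl:template>
</xsl:stylesheet>
```

A real EPUB build would presumably make several such calls, one per component, and collect the results before packaging them.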

Alongside the ability to run transformations is the ability to parse strings as XML with fn:parse-xml(), and to convert trees back to pointy-bracket XML or HTML with fn:serialize(). This means you can call fn:transform() to generate HTML for your chapters, then use fn:serialize() to turn the HTML into strings that you can later put into the zip archive for an EPUB book.
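A small round trip shows the idea, using the standard functions parse-xml() and serialize(). This is only a sketch: the sample markup is invented, and passing serialization options as a map assumes XPath 3.1 support in the processor.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- String to tree: the markup is escaped because it sits inside an attribute -->
    <xsl:variable name="tree"
        select="parse-xml('&lt;p&gt;Hello, &lt;em&gt;reader&lt;/em&gt;&lt;/p&gt;')"/>
    <!-- Tree back to string, without an XML declaration -->
    <xsl:variable name="markup" as="xs:string"
        select="serialize($tree, map {
                  'method' : 'xml',
                  'omit-xml-declaration' : true()
                })"/>
    <xsl:value-of select="$markup"/>
  </xsl:template>
</xsl:stylesheet>
```

The string in $markup is the kind of value you could hand to an archive extension when assembling an EPUB.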

Accumulators can simplify chapter and section numbering; this depends somewhat on the markup used for the input. There’s also improved support for formatting numbers according to a particular culture’s conventions, such as using commas or spaces to separate groups of digits in large numbers.
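Here is one way an accumulator could number chapters. It assumes a very simple, hypothetical input vocabulary: a book element containing chapter elements, each with a title. Real markup will usually need more rules.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Adds one each time a chapter start tag is passed, in document order -->
  <xsl:accumulator name="chapter-no" as="xs:integer" initial-value="0">
    <xsl:accumulator-rule match="chapter" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template match="book">
    <ol><xsl:apply-templates select="chapter"/></ol>
  </xsl:template>

  <xsl:template match="chapter">
    <li>
      <xsl:text>Chapter </xsl:text>
      <!-- The value already includes the current chapter -->
      <xsl:value-of select="accumulator-before('chapter-no')"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="title"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

Because the count is kept as the document is traversed, any template can ask for the current chapter number without recounting preceding chapters.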

There are features that make the language more concise, some introduced in XSLT 2 and some new in XSLT 3. For example, you can now use let along with if/then/else in XPath expressions, and, if expand-text="yes" is present on some ancestor, you can embed {$expressions} in messages and text.
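Both conveniences fit in a few lines. This sketch again assumes a hypothetical book element with title and chapter children; the let expression and the text in curly braces are the parts to look at.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes">

  <xsl:template match="book">
    <!-- let binds a value once; if/then/else picks the wording -->
    <xsl:variable name="label" select="
        let $n := count(chapter)
        return if ($n = 1) then 'one chapter' else concat($n, ' chapters')"/>
    <!-- With expand-text in effect, {...} in text is evaluated as XPath -->
    <p>{title} has {$label}.</p>
    <xsl:message>Processed {title}</xsl:message>
  </xsl:template>
</xsl:stylesheet>
```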

XSLT has even grown up enough to have a package system. And functions are now first-class objects and can be passed as parameters, held in variables, and so forth.
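As a small taste of first-class functions, the sketch below holds an inline function in a variable and then passes it to the standard for-each() function. The variable names and sample strings are invented for the example.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    expand-text="yes">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- An inline function held in a variable -->
    <xsl:variable name="shout" as="function(xs:string) as xs:string"
        select="function($s as xs:string) as xs:string { upper-case($s) }"/>
    <!-- The same function passed as an argument to for-each() -->
    <xsl:variable name="titles"
        select="for-each(('preface', 'chapter one'), $shout)"/>
    <xsl:text>{string-join($titles, ', ')}</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```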

By far the biggest impact of XSLT 3 on book publishers is that you can do more of the work with XSLT directly, reducing the skill-set needed, reducing the number of tools needed, reducing the interactions between libraries and programs and versions, and reducing time to market. That’s got to be worth an upgrade.

I’ll be talking about all this and more at ebookcraft 2019, so come along and meet the new XSLT.

If you'd like to hear more from Liam Quin about XSLT 3, register for ebookcraft on March 18 and 19, 2019 in Toronto. You can find more details on the conference website, or sign up for the mailing list to get all of the conference updates.