Epistemic Systems

[ cognition – information – knowledge – publishing – science – software ]

Books markup with XHTML5

To my embarrassment I see that it’s been nearly three years (!) since I last posted on here, and an eventful time it’s been. In 2012 I returned to STM journal publishing, where I was privileged to learn how BioMed Central (BMC) has made the Open Access model work so well. So well, indeed, that parent company Springer Science+Business Media decided to absorb BMC into the company as a brand. (We shall have to wait and see what the merger of most of Macmillan with Springer will mean.) But for the past six months or so I’ve been working with Manchester University Press – the third largest such press in England after those at Oxford and Cambridge – on the development of a new books production workflow. We’re not quite there yet, but trials are going well. I figure this is something that could be of interest to quite a few people, so here I’ll outline some of the thinking behind the new process.

Of course, XML is at the heart of things. XML standards and processes in STM journal publishing are by now quite mature – Elsevier’s journal production operations, for example, have been based on structured markup since even before the advent of XML in 1998 (if memory serves) – and the gotchas, if that’s the word, have by now generally been exposed and ironed out. The situation regarding books markup, however, is rather less settled. Markup options we considered included DocBook, TEI, NLM JATS (in its BITS variant for books), and HTML5. Central to our decision-making were the goals of expressiveness and extensibility. By expressiveness I mean the ability to capture the rich structural and semantic characteristics of content in a way that maximizes future downstream processing possibilities. (Who knows what functionality we might wish to deliver to content users in future?) By extensibility I mean the ability to develop the markup scheme as new requirements become apparent. And as if that wasn’t enough, we also sought relative simplicity, to make life as easy as possible for publishing staff as well as MUP’s suppliers.

In the end, after a lengthy content analysis phase, a certain amount of experimentation and some deliberation, we decided that HTML5, suitably XML-ified, offered the best combination of virtues relative to MUP’s needs. The (X)HTML5-based MUP markup specification is currently at the Version 1.1 stage, and trials with suppliers are under way to identify issues and areas where refinement is needed. Already, however, we have a markup language that handles well the sort of content for which MUP is noted, namely monographs in the humanities, and verse and drama editions.

Working with (X)HTML5 requires something of a gestalt switch if, like me, you are used to thinking of markup that captures overall content structure and semantics primarily in terms of elements rather than attributes. HTML’s elements are of very broad potential applicability, and are more oriented to the capture of visual layout requirements than to the encoding of abstract structure and semantics. Achieving the latter means making full use of the class attribute (and to a lesser extent the id attribute), which in HTML5 can be applied to any element. The <article> and <section> elements that are new to HTML5, as well as attribute names prefixed by ‘data-’, provide additional expressive power.
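To make the attribute-centred approach a little more concrete, here is a minimal Python sketch of how a downstream process might pick out content by class token and read a ‘data-’ attribute. The fragment, the class names (verse-stanza, verse-line, indent) and the data-stanza-num attribute are all invented for illustration and are not taken from the MUP specification; the namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: class names and data-stanza-num are illustrative
# assumptions, not part of the MUP markup specification.
FRAGMENT = """
<section class="chap-body">
  <section class="verse-stanza" data-stanza-num="1">
    <p class="verse-line">First line</p>
    <p class="verse-line indent">Second line</p>
  </section>
</section>
"""

def has_class(elem, name):
    """Return True if `name` is one of the space-separated class tokens."""
    return name in elem.get("class", "").split()

root = ET.fromstring(FRAGMENT)

# The semantics live in the class attribute, not in the element name:
# both <p> elements are verse lines, and one carries a second token.
verse_lines = [e for e in root.iter("p") if has_class(e, "verse-line")]
indented = [e for e in verse_lines if has_class(e, "indent")]

# data-* attributes carry extra values as ordinary XML attributes.
stanza = next(e for e in root.iter("section") if has_class(e, "verse-stanza"))
stanza_num = stanza.get("data-stanza-num")

print(len(verse_lines), len(indented), stanza_num)
```

Splitting the class value on whitespace matters because a class attribute can hold several tokens at once, so a plain string comparison would miss the second line.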

In subsequent posts I’ll describe some of the details of the MUP markup scheme, but to give a flavour, here is how top-level book structures are handled:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html lang="en" xmlns="…">
<head>
   <link rel="stylesheet" type="text/css" href="… [CSS stylesheet name] …" />
   <title>Smith and Jones | My First Sample MUP Book | Manchester Medieval Sources Series</title>
</head>
<body>
<article class="book" id="[MUP project identifier]" data-doi-book="[Book DOI]">
<section class="book-meta">
   <section class="book-series-info-sec"> … </section>
   <section class="book-title-page"> … </section>
   <section class="book-pub-rights"> … </section>
</section>
<section class="book-front" id="ABCD0000-front" data-doi-book="[Book DOI]">
   <section class="book-dedication"> … </section>
   <section class="book-pref-quotes"> … </section>
   <section class="book-toc"> … </section>
   <section class="book-inclusion-lists"> … </section>
   <section class="book-series-ed-preface"> … </section>
   <section class="book-preface"> … </section>
   <section class="book-contributors"> … </section>
   <section class="book-acks"> … </section>
   <section class="book-abbrevs"> … </section>
   <section class="book-permissions"> … </section>
   <section class="book-chronology"> … </section>
</section>
<section class="book-body">
<article class="chap" id="chap-0" data-chap-num="-1" data-doi-chap="[Chapter-level DOI]">
   <section class="chap-body"> … </section>
   <section class="chap-footnotes"> … </section>
   <section class="chap-endnotes"> … </section>
</article>
</section>
<section class="book-back" data-doi-back="[Book DOI]">
   <section class="book-glossary"> … </section>
   <section class="book-appendix"> … </section>
   <section class="book-bibliography"> … </section>
   <section class="book-index"> … </section>
</section>
</article>
</body>
</html>
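One practical consequence of carrying structure in class values is that a book’s outline can be recovered with very little code. The following Python sketch walks a cut-down, hypothetical version of the structure above and lists the class of each structural element by depth. The sample string, the id values and the outline() helper are assumptions made for illustration, not part of the MUP workflow, and the namespace is again left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# Cut-down, hypothetical book: ids and chapter number are placeholders.
SAMPLE = """
<article class="book" id="ABCD0000">
  <section class="book-front">
    <section class="book-toc"/>
    <section class="book-preface"/>
  </section>
  <section class="book-body">
    <article class="chap" id="chap-1" data-chap-num="1">
      <section class="chap-body"/>
      <section class="chap-endnotes"/>
    </article>
  </section>
  <section class="book-back">
    <section class="book-index"/>
  </section>
</article>
"""

def outline(elem, depth=0):
    """Yield (depth, class value) for each element that has a class."""
    cls = elem.get("class")
    if cls:
        yield depth, cls
    for child in elem:
        yield from outline(child, depth + 1)

book = ET.fromstring(SAMPLE)
rows = list(outline(book))

# Print an indented outline of the structural classes.
for depth, cls in rows:
    print("  " * depth + cls)

# Chapters are <article> elements distinguished only by their class.
chapter_ids = [e.get("id") for e in book.iter("article") if e.get("class") == "chap"]
```

Note that the book and its chapters are both article elements; only the class value tells them apart, which is why the chapter lookup filters on class rather than on element name.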

Written by Alex Powell

March 9, 2015 at 3:21 pm

Posted in Uncategorized

