Archive for March 2015
As mentioned in the previous post, I’ve recently been involved in the development of a markup scheme for Manchester University Press’s books publishing programme. That programme is especially strong in the humanities and literature, with the latter including plays and verse.
Having decided that (X)HTML5 was the way to go, the question arose of how to mark up poetry. An obvious first thought is that stanzas (verses) amount to paragraphs, so it seems natural to think in terms of something like this:
<p>Verse 1 line 1<br />
Verse 1 line 2<br />
Verse 1 line 3<br />
Verse 1 line 4</p>
<p>Verse 2 line 1<br />
Verse 2 line 2<br />
Verse 2 line 3<br />
Verse 2 line 4</p>
Blog posts like this one seemed to support this line of thinking. However, further investigation gave pause for thought. The second reply to this question on Stack Overflow seemed especially relevant, for lines of poetry may be indented in essentially unlimited and arbitrary ways. A line break inside a paragraph offers no hook for styling the line that follows it, so to accommodate this inconvenient fact it is necessary to mark up each line as a block element in its own right.
What we ended up with is shown in the example below. We obtain the control we require by applying a CSS style attribute directly to each line that needs to be indented.
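A minimal sketch of what line-level markup along these lines could look like follows. The class names (`stanza`, `verse-line`), the choice of `<div>` and `<p>` as the block elements, and the indent values are illustrative assumptions, not taken from the MUP specification.

```html
<!-- Sketch only: class names, element choices and indent values are assumptions -->
<div class="stanza">
  <p class="verse-line">Verse 1 line 1</p>
  <!-- Each line is its own block, so an indent applies to this line alone -->
  <p class="verse-line" style="margin-left: 2em;">Verse 1 line 2</p>
  <p class="verse-line">Verse 1 line 3</p>
  <p class="verse-line" style="margin-left: 4em;">Verse 1 line 4</p>
</div>
<div class="stanza">
  <p class="verse-line">Verse 2 line 1</p>
  <p class="verse-line" style="margin-left: 2em;">Verse 2 line 2</p>
  <p class="verse-line">Verse 2 line 3</p>
  <p class="verse-line" style="margin-left: 4em;">Verse 2 line 4</p>
</div>
```

Compared with the `<br />` approach, the stanza is now a wrapper rather than a paragraph, and any line can carry its own indent without affecting its neighbours.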
To my embarrassment I see that it’s been nearly three years (!) since I last posted on here, and an eventful time it’s been. In 2012 I returned to STM journal publishing, where I was privileged to learn how BioMed Central (BMC) has made the Open Access model work so well. So well, indeed, that parent company Springer Science+Business Media decided to absorb BMC into the company as a brand. (We shall have to wait and see what the merger of most of Macmillan with Springer will mean.) But for the past six months or so I’ve been working with Manchester University Press – the third-largest such press in England after those at Oxford and Cambridge – on the development of a new books production workflow. We’re not quite there yet, but trials are going well. I figure this is something that could be of interest to quite a few people, so here I’ll outline some of the thinking behind the new process.
Of course, XML is at the heart of things. XML standards and processes in STM journal publishing are by now quite mature – Elsevier’s journal production operations, for example, have been based on structured markup since even before the advent of XML in 1998 (if memory serves) – and the gotchas, if that’s the word, have by now generally been exposed and ironed out. The situation regarding books markup, however, is rather less settled. Markup options we considered included DocBook, TEI, NLM JATS (in its BITS variant for books), and HTML5. Central to our decision-making were the goals of expressiveness and extensibility. By expressiveness I mean the ability to capture the rich structural and semantic characteristics of content in a way that maximizes future downstream processing possibilities. (Who knows what functionality we might wish to deliver to content users in future?) By extensibility I mean the ability to develop the markup scheme as new requirements become apparent. And as if that wasn’t enough, we also sought relative simplicity, to make life as easy as possible for publishing staff as well as MUP’s suppliers.
In the end, after a lengthy content analysis phase, a certain amount of experimentation and some deliberation, we decided that HTML5, suitably XML-ified, offered the best combination of virtues relative to MUP’s needs. The (X)HTML5-based MUP markup specification is currently at the Version 1.1 stage, and trials with suppliers are under way to identify issues and areas where refinement is needed. Already, however, we have a markup language that handles well the sort of content for which MUP is noted, namely monographs in the humanities, and verse and drama editions.
Working with (X)HTML5 requires something of a gestalt switch if, like me, you are used to thinking of markup primarily in terms of elements rather than attributes when it comes to capturing overall content structure and semantics. HTML’s elements are of very broad potential applicability, and are more oriented to the capture of visual layout requirements than to the encoding of abstract structure and semantics. Achieving the latter means making full use of the class attribute (and to a lesser extent the id attribute), which in HTML5 can be applied to any element. The <article> and <section> elements that are new to HTML5, as well as attribute names prefixed by ‘data-’, provide additional expressive power.
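To make the contrast concrete, here is a small sketch of the same content marked up both ways. The element names in the first fragment, and the `id` value and `data-` attribute name in the second, are hypothetical and chosen purely for illustration; only the `book-dedication` class name comes from the MUP scheme.

```html
<!-- Element-centred markup: the element name itself carries the semantics
     (hypothetical element names, not drawn from any particular schema) -->
<dedication>
  <para>For my parents</para>
</dedication>

<!-- HTML5-style markup: generic elements, with the semantics carried by
     the class attribute; the id and data-* values here are hypothetical -->
<section class="book-dedication" id="ABCD0000-dedication" data-dedication-type="personal">
  <p>For my parents</p>
</section>
```

In the second form, extending the scheme means agreeing on a new class or `data-` name rather than defining a new element.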
In subsequent posts I’ll describe some of the details of the MUP markup scheme, but to give a flavour here is how top-level book structures are handled:
<?xml version="1.0" encoding="UTF-8"?>
<html lang="en" xmlns="…">
  <head>
    <link rel="stylesheet" type="text/css" href="… [CSS stylesheet name] …" />
    <title>Smith and Jones | My First Sample MUP Book | Manchester Medieval Sources Series</title>
  </head>
  <body>
    <article class="book" id="[MUP project identifier]" data-doi-book="[Book DOI]">
      <section class="book-series-info-sec"> … </section>
      <section class="book-title-page"> … </section>
      <section class="book-pub-rights"> … </section>
      <!-- Front matter -->
      <section class="book-front" id="ABCD0000-front" data-doi-book="[Book DOI]">
        <section class="book-dedication"> … </section>
        <section class="book-pref-quotes"> … </section>
        <section class="book-toc"> … </section>
        <section class="book-inclusion-lists"> … </section>
        <section class="book-series-ed-preface"> … </section>
        <section class="book-preface"> … </section>
        <section class="book-contributors"> … </section>
        <section class="book-acks"> … </section>
        <section class="book-abbrevs"> … </section>
        <section class="book-permissions"> … </section>
        <section class="book-chronology"> … </section>
      </section>
      <!-- Chapter -->
      <article class="chap" id="chap-0" data-chap-num="-1" data-doi-chap="[Chapter-level DOI]">
        <section class="chap-body"> … </section>
        <section class="chap-footnotes"> … </section>
        <section class="chap-endnotes"> … </section>
      </article>
      <!-- Back matter -->
      <section class="book-back" data-doi-back="[Book DOI]">
        <section class="book-glossary"> … </section>
        <section class="book-appendix"> … </section>
        <section class="book-bibliography"> … </section>
        <section class="book-index"> … </section>
      </section>
    </article>
  </body>
</html>