dimelab: shrinking the gap between talk and action.

Markup Metalanguage

Markup has been around about as long as Unix, starting long years ago with SGML. A big breakthrough came with the explosion of the WWW, when HTML came into widespread use. Deficiencies of SGML (mainly excessive generality) led to a revamping that gave us XML, and nowadays XML-based markup is ubiquitous. Only a fool would think to challenge the hegemony of XML, right?

Well, fools are celebrated here at dimelab! When we kicked off this misguided effort back at the dawn of the 21st century, our obsession with general-purpose representation was still a-simmer, but we took for granted that, whatever the details turned out to be, the broad picture was sure to include XML. So we started digging into XML with certain applications in mind. We were not particularly impressed with what we found.

We found the then-prevalent use of document type definitions (DTDs) somewhat less than elegant. We found the DOM API clumsy and awkward, and the SAX API rather low-level. We could not identify a coherent semantics for attributes, and we were discouraged by the reliance on hierarchy as the principal structuring mechanism in representation. We were also bothered by the fact that the concatenation of two XML files is not itself a valid XML file, since only a single root element is permissible. This is the sort of thing that makes old-time Unix users grind their teeth. So we grumbled and swore, and started to cope.
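
The concatenation complaint is easy to reproduce. Here is a minimal sketch using Python's standard xml.etree.ElementTree parser; the two tiny documents and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two tiny, hypothetical XML documents, each with its own root element.
first = "<note>buy milk</note>"
second = "<note>call home</note>"


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(first))           # each document parses on its own
print(is_well_formed(second))
print(is_well_formed(first + second))  # the concatenation has two roots
```

Each document parses alone, but the parser rejects the concatenation because of the second root element. That is quite unlike cat-ing two plain text files.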

First thing was to tackle the API issue. From an application-oriented programmer's perspective, DOM is a dog's breakfast. It isomorphically reproduces the XML file as a data structure, in which the elements literally represent the XML elements. No concession whatsoever to application-specific representation. Ugh. SAX at least doesn't waste your time reconstructing what you've already got, namely an XML file. It provides notifications of low-level parse events, and leaves you to make sense of it all. Naturally, suckers for notification that we are, we recognized that with some support infrastructure, we could use the SAX notifications to build our own application-specific object-oriented data structure in the course of the parse. We just needed a sensible idiom and an API which, while retaining the event-driven flavor of SAX (because we like things that way), also conceded the need to deal in application-specific objects as opposed to raw text or XML-oriented objects. The result of this enquiry was a new XML API which allowed for object-oriented processing of markup. We filed patent application No. 11/286914 for object-oriented processing of markup, which is still pending. For your reading convenience, here's a PDF of the published application (1.6M PDF).
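
To make the general idea concrete, here is a rough sketch of building application objects directly from SAX notifications. It uses Python's standard xml.sax module and is not the API described in the patent application; the Link class, the LinkBuilder handler, and the sample document are all hypothetical.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical application object; not part of any XML API."""
    href: str
    text: str = ""


class LinkBuilder(xml.sax.ContentHandler):
    """Turns low-level SAX events into a list of Link objects."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the Link under construction, if any

    def startElement(self, name, attrs):
        if name == "a":
            self._current = Link(href=attrs.get("href", ""))

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate it.
        if self._current is not None:
            self._current.text += content

    def endElement(self, name):
        if name == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def parse_links(xml_text: str):
    """Parse xml_text and return the Link objects found in it."""
    handler = LinkBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.links


links = parse_links(
    "<doc><a href='http://foo.com'>foo</a><a href='http://bar.com'>bar</a></doc>"
)
print(links)
```

The application never sees an XML-shaped tree. The handler consumes the events and hands back plain Link objects, which is the flavor of processing we were after.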

Our mucking about with APIs also clarified our sense of the deficiencies of XML. We found the limitation to a single root element quite odious. We found the verbosity of tagging annoying and inefficient. And we felt that some sort of attribute semantics should be provided, and issues of nullity, singularity, and plurality in attributes should be explicitly addressed. So we did some language hacking, too.

The limitation to a single root element is easily dispensed with. Of course any application can enforce that limitation where appropriate, but we cannot accept that this is a feature in the metalanguage context of a general-purpose markup syntax. The issue of tag verbosity is likewise easy to fix; we adopted a C-style curly bracket tagging syntax that dispenses with explicit naming of end tags, and we use a bar character to delimit the tag head from the tag body. So a tagged element like <a href='http://foo.com'>foo</a> becomes {a href 'http://foo.com'|foo}.
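
As an illustration only, here is a small sketch that rewrites an XML element into the curly bracket form shown above. It is not the reference implementation. The function name to_curly is made up, and several details are assumptions: multiple attributes are joined with "; ", nested elements are emitted inside the body, an empty body leaves nothing after the bar, and values are single-quoted without any escaping.

```python
import xml.etree.ElementTree as ET


def to_curly(elem: ET.Element) -> str:
    """Render an ElementTree element in curly bracket syntax (a sketch)."""
    head = elem.tag
    # Assumption: each XML attribute becomes "key 'value'", joined by "; ".
    preds = [f"{key} '{value}'" for key, value in elem.attrib.items()]
    if preds:
        head += " " + "; ".join(preds)
    # Assumption: child elements nest inside the body, text kept in order.
    body = elem.text or ""
    for child in elem:
        body += to_curly(child) + (child.tail or "")
    return "{" + head + "|" + body + "}"


print(to_curly(ET.fromstring("<a href='http://foo.com'>foo</a>")))
# prints {a href 'http://foo.com'|foo}
```

The closing brace does the work of the named end tag, which is where the savings in verbosity come from.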

Finally, consider the semantics of attributes. We like to think in terms of subjects and predicates. We can think of a tagged element as an implicit subject. Then the key of the attribute can correspond to the relation (verb) in a predicate, and the value of the attribute can correspond to the object in a triple-like predicate. Of course, we also like to consider characteristic-type predicates which lack an object, and predicates harboring multiple objects. So we extended our markup syntax to permit these variations on attributes as predicates. Following the tag identifier, optional predicates may appear, separated by semicolons. Each predicate consists of a relation and optional objects. The objects, if any, are separated by commas.
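
The predicate grammar just described can be sketched as a tiny tag-head parser. This is a guess at a workable reading of the rules, not the reference implementation. The name parse_head, the tokenizing regular expression, the single-quoted string form, and the second sample head (with its draft and author relations) are all assumptions.

```python
import re

# Tokens: a single-quoted string, a ';' or ',' separator, or a bare word.
TOKEN = re.compile(r"'[^']*'|[;,]|[^\s;,']+")


def parse_head(head: str):
    """Split a tag head into (tag, [(relation, [objects...]), ...])."""
    tokens = TOKEN.findall(head)
    if not tokens or tokens[0] in (";", ","):
        raise ValueError("tag head must start with a tag identifier")
    tag, rest = tokens[0], tokens[1:]
    predicates = []
    current = None  # the (relation, objects) pair being filled in
    for tok in rest:
        if tok == ";":
            current = None  # the next word starts a new predicate
        elif tok == ",":
            continue        # commas only separate objects (lenient)
        elif current is None:
            current = (tok, [])
            predicates.append(current)
        else:
            current[1].append(tok.strip("'"))
    return tag, predicates


print(parse_head("a href 'http://foo.com'"))
# prints ('a', [('href', ['http://foo.com'])])
print(parse_head("p draft; author 'ann', 'bob'"))
# prints ('p', [('draft', []), ('author', ['ann', 'bob'])])
```

The second head shows the two extensions side by side: draft is a characteristic-type predicate with no object, and author is a predicate with two objects. With the element as implicit subject, each (relation, object) pair reads as a triple.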

Our revised markup metalanguage has been implemented, and we thought it was original enough for a patent. The patent office agreed, and U.S. Patent 7,698,633 for Markup Metalanguage duly issued. Here's a PDF of the published patent (1.8M PDF). However, our debt of gratitude to the XML community is such that we are dedicating this patent to the public domain. Likewise the API patent application above, should it issue. We will provide reference implementations when we get a few moments to spare. Have at it, XML folks!