Thursday, 5 April 2012

Adding Inline SVG and MathML to DITA

One interesting feature of HTML5 is its ability to render inline SVG and MathML markup, which lets a page display 2D graphics, syntax diagrams, and equations. For example:
<html>
<body>
<h1>My first SVG</h1>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1">
<circle cx="100" cy="50" r="40" stroke="black" stroke-width="2" fill="red" />
</svg>
</body>
</html>
This sort of markup works in most modern browsers. It's called inline SVG because the SVG tags are embedded directly within the HTML code, in contrast to external SVG, in which an SVG file is referenced in exactly the same way as a GIF or JPEG:
<img src="images/myDiagram.svg" alt="An external SVG graphic"/>
My plan for the DITA to HTML5 plugin is to pass inline SVG or MathML markup directly through from DITA topics to HTML5. Unfortunately, getting inline SVG and MathML to work in DITA is not straightforward. In fact, I've just spent the last two days doing some specialization, the mysterious science of customizing the set of tags that authors can use in DITA, in order to get it to work.
The reason that native SVG and MathML support has never been included in the DITA Open Toolkit seems to be that there simply hasn't been much demand for it (and it was difficult to display in older browsers). SVG is still the only vector graphics format supported by DITA, and hopefully one day it'll be fully integrated into the DITA Open Toolkit.
My main sources of information about specialization have been Eliot Kimber's excellent DITA Configuration and Specialization tutorial and Introduction to DITA by Jennifer Linton and Kylene Bruski. Specialization can be used to modify DITA's original set of elements and attributes in several ways:
  • Removing a domain you don't need (a domain is a related set of tags, for example, the User Interface domain), so that authors no longer see any of the domain's tags in the list of available elements.
  • Modifying the properties of particular tags, for example, so that <p> must contain plain text only and none of the inline formatting tags like <b> or <i> that are normally available.
  • Creating new attributes for existing tags.
  • Adding new custom domains.
In DITA parlance, these techniques are called, respectively, "Document Type Shell", "Topic Constraint", "Attribute Specialization", and "Element Domain Specialization". Conclusion: you don't have to be a geek to specialize, but it certainly helps! We're going to be doing Element Domain Specialization.
To be honest, though, following Eliot's tutorial was a lot easier than I anticipated. Using oXygenXML, DITA Open Toolkit 1.5.3 and a few articles I found on Google, I got inline SVG and MathML working without too much trouble. I suspect I'll have more problems packaging it as a plugin so that others can use it, but that's for later. And there are still a lot of things I don't understand. For now, I'm going to switch to technical author mode to describe how to implement the specializations.

Preparing a Test Environment

  1. Copy the entire {dita-ot-root}/dtd/technicalContent folder (where {dita-ot-root} is the root folder of your DITA Open Toolkit installation) to a temporary folder.
  2. Create a new DITA concept topic and change the DOCTYPE line to point to the concept.dtd in your temporary folder, for example:
    <!DOCTYPE concept SYSTEM "C:/temp/technicalContent/dtd/concept.dtd">
    Note: If your editor adds a PUBLIC identifier as well as, or instead of, a SYSTEM identifier when it creates a new topic, I recommend removing it. A PUBLIC identifier that the editor can resolve takes precedence over the SYSTEM one, so your topic will validate even if the SYSTEM identifier is wrong or a problem occurs in the specialization files.
  3. Validate the topic to check that the DTDs in your working folder are being used.
  4. Save the topic with a .dita or .xml extension to any folder.
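For reference, a minimal test topic might look like the sketch below. The id, title, and paragraph text are placeholders I made up for illustration; only the DOCTYPE line comes from the steps above, and its path assumes you copied the folder to C:/temp.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- SYSTEM identifier only, so validation is forced to use the temporary copy of the DTDs -->
<!DOCTYPE concept SYSTEM "C:/temp/technicalContent/dtd/concept.dtd">
<concept id="inline_markup_test">
  <!-- Placeholder title and text; replace with anything you like -->
  <title>Inline markup test</title>
  <conbody>
    <p>The MathML and SVG samples will go inside a paragraph like this one.</p>
  </conbody>
</concept>
```

A quick way to confirm that the temporary DTDs really are the ones in use is to rename the temporary concept.dtd for a moment: validation should then fail.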

Adding Inline MathML Support to DITA

Domain specialization requires you to create two files, a .mod (module) and a .ent (entity), then update the DTD to reference them. This example only shows the concept DTD, but you'd need to make the same changes to the other topic types' DTDs too (there must be a way of doing it to the base DTD, ditabase.dtd, so that it works for all topic types, but I couldn't figure that out).
  1. Copy and paste this .mod file (which I've taken from a specialization article I found) and save it in your temporary technicalContent/dtd folder as mathmlDomain2.mod.
  2. Copy and paste this .ent file and save it in the technicalContent/dtd folder as mathmlDomain2.ent.
  3. Edit the concept.dtd file in your temporary folder and make the following changes:
    • Add these lines to the bottom of the DOMAIN ENTITY DECLARATIONS section:
      <!ENTITY % math-d-dec SYSTEM "mathmlDomain2.ent">
      %math-d-dec;
    • In the DOMAIN EXTENSIONS section, add the lines:
      <!ENTITY % foreign "foreign | %math-d-foreign;">
      <!ENTITY % unknown "unknown | %math-d-unknown;">
    • In the DOMAINS ATTRIBUTE OVERRIDE section, add this line inside the quoted value of the included-domains entity:
      &math-d-att;
    • In the DOMAIN ELEMENT INTEGRATION section, add the following lines:
      <!ENTITY % math-d-def SYSTEM "mathmlDomain2.mod">
      %math-d-def;
    Save the changes and validate your concept topic to check that you haven't messed things up.
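Taken together, the four edits might sit in concept.dtd roughly as sketched below. This is only an illustration of placement, not a copy of the real file: the section comments are abbreviated, and the &hi-d-att; and &ut-d-att; entries simply stand in for whatever your included-domains entity already lists (keep your existing entries as they are).

```xml
<!-- DOMAIN ENTITY DECLARATIONS: after the existing domain declarations.
     The .ent file has to be pulled in before its entities are referenced below. -->
<!ENTITY % math-d-dec SYSTEM "mathmlDomain2.ent">
%math-d-dec;

<!-- DOMAIN EXTENSIONS: the base element plus the elements that specialize it -->
<!ENTITY % foreign "foreign | %math-d-foreign;">
<!ENTITY % unknown "unknown | %math-d-unknown;">

<!-- DOMAINS ATTRIBUTE OVERRIDE: the new reference joins the existing list.
     The first two entries are stand-ins for the ones already in your file. -->
<!ENTITY included-domains
         "&hi-d-att;
          &ut-d-att;
          &math-d-att;">

<!-- DOMAIN ELEMENT INTEGRATION: after the existing domain modules -->
<!ENTITY % math-d-def SYSTEM "mathmlDomain2.mod">
%math-d-def;
```

The order matters because a DTD is read from top to bottom: a parameter entity such as %math-d-foreign; can only be referenced after the declaration that defines it has been read.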
That's it! Position the cursor within a <p> element in your concept topic and you should now see new elements like <equation> and <math> in the list of available elements. To test it on something meaningful, you can use the following sample code:
<math type="presentation">
   <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"
    display="block">
    <mml:semantics>
     <mml:mrow>
      <mml:mrow>
       <mml:mi mathvariant="bold">a</mml:mi>
       <mml:mo>=</mml:mo>
       <mml:mfrac>
        <mml:mrow>
         <mml:mi mathvariant="bold">F</mml:mi>
        </mml:mrow>
        <mml:mi>m</mml:mi>
       </mml:mfrac>
       <mml:mo>=</mml:mo>
       <mml:mfrac>
        <mml:mrow>
         <mml:mi>q</mml:mi>
         <mml:mo>[</mml:mo>
         <mml:mi mathvariant="bold">E</mml:mi>
         <mml:mo>+</mml:mo>
         <mml:mfenced>
          <mml:mrow>
           <mml:mi mathvariant="bold">v</mml:mi>
           <mml:mi>X</mml:mi>
           <mml:mi mathvariant="bold">B</mml:mi>
          </mml:mrow>
         </mml:mfenced>
         <mml:mo>]</mml:mo>
        </mml:mrow>
        <mml:mi>m</mml:mi>
       </mml:mfrac>
      </mml:mrow>
     </mml:mrow>
    </mml:semantics>
   </mml:math>
  </math>
Note: In my original post, I had wrapped <equation> tags around the above example. This was wrong. The equation element is meant to be used as the top-level element in a separate file and as a container for MathML markup. You would then include the markup in a topic using something like <xref type="eq" href="equation1.dita"/>. I have not been able to get this to work and it isn't even documented anywhere as far as I can tell.

Adding Inline SVG Support

SVG integration follows the same basic procedure as MathML: create .mod (module) and .ent (entity) files, then update the DTD file.
  1. Copy and paste this .mod file and save it in the technicalContent/dtd folder as svgDomain.mod.
  2. Copy and paste this .ent file and save it in the technicalContent/dtd folder as svgDomain.ent.
  3. If you don't already have it, do a Google search for the svg11.dtd file and copy it into the technicalContent/dtd folder.
  4. Edit the concept.dtd file in your temporary folder and make the following changes:
    • In the DOMAIN ENTITY DECLARATIONS section, add the lines:
      <!ENTITY % svg-d-dec SYSTEM "svgDomain.ent">
      %svg-d-dec;
    • In the DOMAIN EXTENSIONS section, modify the line that you previously edited for MathML to:
      <!ENTITY % foreign "foreign | %math-d-foreign; | %svg-d-foreign;">
    • In the DOMAINS ATTRIBUTE OVERRIDE section, add this line inside the quoted value of the included-domains entity:
      &svg-d-att;
    • In the DOMAIN ELEMENT INTEGRATION section, add the following lines:
      <!ENTITY % svg-d-def SYSTEM "svgDomain.mod">
      %svg-d-def;
    Save the changes and validate your concept topic again to check that everything works.
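The reason the foreign line is modified rather than duplicated is a general DTD rule: when the same entity is declared more than once, the first declaration is binding and later ones are ignored. A second declaration of foreign for SVG would therefore have no effect. With both domains in place, the two affected sections might look like this sketch, in which &hi-d-att; is only a stand-in for the entries your file already has:

```xml
<!-- DOMAIN EXTENSIONS: one declaration per extension point,
     listing every domain that specializes it -->
<!ENTITY % foreign "foreign | %math-d-foreign; | %svg-d-foreign;">
<!ENTITY % unknown "unknown | %math-d-unknown;">

<!-- DOMAINS ATTRIBUTE OVERRIDE: both domain tokens go in the same list.
     &hi-d-att; is a stand-in for the existing entries. -->
<!ENTITY included-domains
         "&hi-d-att;
          &math-d-att;
          &svg-d-att;">
```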
That's it! Position the cursor within a <p> element in your concept topic and you should now see the new <svg> element in the list of available elements. To test it on something meaningful, you can use the following sample code:

<svg>
   <svg:svg xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
      <svg:ellipse cx="300" cy="150" rx="200" ry="80"
            style="fill:rgb(200,100,50);stroke:rgb(0,0,100);stroke-width:2"/>
   </svg:svg>
</svg>
When you've finished testing, remember to:
  • Repeat the procedure for the Task and Reference topic types.
  • Copy the contents of your temporary technicalContent folder to {dita-ot-root}/dtd/technicalContent, replacing the original contents.

Conclusions and Next Steps

Just to prove it does work, here's a screenshot from oXygenXML showing a bit of inline SVG and MathML:

Inline MathML and SVG in a DITA Topic

The next steps are:
  1. To package this as a plugin so that anyone can add it to their DITA Open Toolkit.
  2. To update my DITA to HTML5 transformation so that the inline MathML and SVG appear in my HTML5 topics.
More on that at a later date...

1 comment:

Eliot Kimber said...

Nigel,

I had earlier proposed built-in MathML and SVG domains for DITA 1.3 but I hadn't done any work on implementing those proposals. I'm doing so now--your post was just the nudge I needed.

My solution is pretty much just like yours: a specialization of <foreign> that then contains the foreign element (math or svg).

Note that the DITA for Publishers project has had a MathML integration for a while (dita4publishers.sourceforge.net), but the D4P math domain is more extensive, providing various containers for equations, where the content of the equation may be many things, of which MathML is one.

I'll be updating that vocabulary module to use the separate MathML domain module once it's officially proposed.

Cheers,

Eliot