XML Workflow, A New Direction

A-Z 2016 "X"

A couple years ago, or just slightly more, I wrote about my workflow for extracting game information captured in Word. It’s kind of long:

  • Type (or copy and paste) into Word;
  • Convert Word files to ‘Filtered HTML’;
  • Fix character encoding;
  • Convert to XHTML;
  • Convert to XML closer and closer to the problem domain (game elements) using a series of XSLT scripts.

Once I’ve got the information encoded I can do other transformations to get my actual goal products:

  • Machine-generated diagrams:
    • Build a hierarchical model for each game element that has or is a prerequisite;
    • Convert that hierarchical model into DOT format (GraphViz input file; I’ve written about visualization using GraphViz before);
    • Render the DOT files into PNG and SVG format, giving me diagrams that show the relationships between game elements and that I can redraw (GraphViz is powerful, but its output isn’t always suitable for inclusion in my books).
  • PDFs:
    • Convert XML files (created as above, but using ‘book markup’) to LaTeX;
    • Convert LaTeX to PDF (this can incorporate diagrams redrawn as described above).
  • Index and analysis files; I sometimes create spreadsheets containing…
    • spell summary information;
    • master spell lists for all classes (and domains and bloodlines and patrons…);
    • summary monster stats;
  • I also sometimes create new Word files containing aggregated or reformatted content.
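The diagram steps above (hierarchical model, then DOT, then rendered image) can be sketched in a few lines. This is only an illustration, not the XSLT I actually use: the `to_dot` function, the dictionary layout, and the sample feat chain are all assumptions made up for the example.

```python
def to_dot(prereqs, name="prerequisites"):
    """Render a prerequisite mapping as GraphViz DOT text.

    `prereqs` maps each game element to the list of elements it requires.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    def quote(label):
        # Inside a DOT quoted ID, only the double quote needs escaping.
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"digraph {quote(name)} {{"]
    for element in sorted(prereqs):
        # Declare every element so those without prerequisites still appear.
        lines.append(f"    {quote(element)};")
        for required in sorted(prereqs[element]):
            # Edges point from the prerequisite to the element it unlocks.
            lines.append(f"    {quote(required)} -> {quote(element)};")
    lines.append("}")
    return "\n".join(lines)


# Illustrative sample data only.
FEATS = {
    "Power Attack": [],
    "Cleave": ["Power Attack"],
    "Great Cleave": ["Cleave"],
}

if __name__ == "__main__":
    print(to_dot(FEATS))
```

Saving the output as, say, `feats.dot` and running `dot -Tsvg feats.dot -o feats.svg` (or `-Tpng`) covers the rendering step.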

This has proven fairly effective over the last few years, but I think it’s time for a change. Word has an XML format, ‘WordprocessingML’, that represents, to a fairly large degree, the document as Word holds it in memory. There is a great deal of information there that can be discarded, and some ‘internal Wordisms’ I’ll need to work around, but I think this can get me past some niggling translation and encoding difficulties I’ve been having.

The new workflow will probably look much like:

  • Type (or copy and paste) into Word;
  • Convert Word files to WordprocessingML;
  • Convert to XML closer and closer to the problem domain (game elements) using a series of XSLT scripts.

This doesn’t look like it saves me a lot of steps, but in reality it does. The “Word -> Filtered HTML”, “Fix character encoding”, and “Filtered HTML -> XHTML” stages do useful work, but all three introduce some annoying data artifacts I need to work around. The new workflow should not only reduce the number of initial stages (five bullet points down to three), but should also make the processing after that much simpler.
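To give a feel for the second step of the new workflow, here is a minimal sketch of pulling paragraph text and style names out of WordprocessingML. It assumes the document is saved as a .docx package, where the main body lives in `word/document.xml`; the function names are made up for the example, and real documents will need more care (tables, text boxes, and the ‘Wordisms’ mentioned above).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML vocabulary (the usual "w:" prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraphs_from_root(root):
    """Return a (style_id, text) pair for every w:p element under `root`."""
    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # The paragraph style, if any, is recorded in w:pPr/w:pStyle/@w:val.
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else None
        # Text is split across runs; join every w:t inside the paragraph.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        result.append((style_id, text))
    return result


def read_paragraphs(docx_path):
    """Open a .docx (a zip archive) and read its main document part."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return paragraphs_from_root(root)
```

The style names are the useful part: a paragraph styled as a heading or a stat block line is a strong hint about which game element it belongs to, and that is what the later XSLT stages would key on.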
