XML Workflow, A New Direction

A couple of years ago (or slightly more), I wrote about my workflow for extracting game information captured in Word. It’s kind of long:

Type (or copy and paste) into Word;
Convert Word files to ‘Filtered HTML’;
Fix character encoding;
Convert Filtered HTML to XHTML;
Convert to XML closer and closer to the problem domain (game elements) using a series of XSLT scripts.
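
The “Fix character encoding” step can be illustrated with a minimal sketch. It assumes that Word’s Filtered HTML is saved as Windows-1252 (Python’s `cp1252` codec) and declares `charset=windows-1252` in a meta tag; both are assumptions, not guarantees for every Word version. The function names are hypothetical, and Python simply stands in for whatever tool actually does this job.

```python
import re
from pathlib import Path

# Assumption: Word's Filtered HTML is encoded as Windows-1252 (codec "cp1252")
# and declares that encoding with "charset=windows-1252" in a <meta> tag.
CHARSET_DECL = re.compile(r"charset=windows-1252", re.IGNORECASE)


def cp1252_html_to_utf8(raw: bytes) -> str:
    """Decode Windows-1252 bytes and point the charset declaration at UTF-8."""
    # Strict decoding: the few bytes cp1252 leaves undefined raise
    # UnicodeDecodeError instead of silently becoming replacement characters.
    text = raw.decode("cp1252")
    return CHARSET_DECL.sub("charset=utf-8", text)


def convert_file(src: Path, dst: Path) -> None:
    """Hypothetical helper: rewrite one HTML file as UTF-8."""
    dst.write_text(cp1252_html_to_utf8(src.read_bytes()), encoding="utf-8")
```

For example, `convert_file(Path("spells.htm"), Path("spells-utf8.htm"))` would re-encode a single file (the file names are placeholders). Typographic quotes and dashes such as bytes `0x93`, `0x94`, and `0x96` come out as the proper Unicode characters.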

Once I’ve got the information encoded, I can run other transformations to get my actual goal products:

Machine-generated diagrams:
- Build a hierarchical model for each game element that has or is a prerequisite;
- Convert that hierarchical model into DOT format (GraphViz input file; I’ve written about visualization using GraphViz before);
- Render the DOT files into PNG and SVG formats, giving me diagrams that show the relationships between game elements and that I can redraw (GraphViz is powerful, but its output isn’t always suitable for inclusion in my books).
PDFs:
- Convert XML files (created as above, but using ‘book markup’) to LaTeX;
- Convert LaTeX to PDF (this can incorporate diagrams redrawn as described above).
Index and analysis files: I sometimes create spreadsheets containing:
- spell summary information;
- master spell lists for all classes (and domains and bloodlines and patrons…);
- summary monster stats.
I also sometimes create new Word files containing aggregated or reformatted content.
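
The “hierarchical model to DOT” step for prerequisite diagrams can be sketched as follows. This is a minimal illustration, not the XSLT the workflow actually uses: it assumes the hierarchical model has already been reduced to a hypothetical mapping from each game element to the names of its prerequisites, and it draws an edge from each prerequisite to the element that requires it.

```python
from typing import Dict, List


def dot_id(name: str) -> str:
    """Quote a name as a DOT ID; inside a quoted DOT string only '"' needs escaping."""
    return '"' + name.replace('"', '\\"') + '"'


def prereqs_to_dot(prereqs: Dict[str, List[str]]) -> str:
    """Build a DOT digraph with an edge from each prerequisite to the element requiring it."""
    lines = ["digraph prerequisites {", "  rankdir=LR;"]
    for element in sorted(prereqs):
        # Declare every element so ones without prerequisites still appear as nodes.
        lines.append(f"  {dot_id(element)};")
        for req in prereqs[element]:
            lines.append(f"  {dot_id(req)} -> {dot_id(element)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative, partial data: feat names only, other prerequisites omitted.
    feats = {
        "Power Attack": [],
        "Cleave": ["Power Attack"],
        "Great Cleave": ["Cleave", "Power Attack"],
    }
    print(prereqs_to_dot(feats), end="")
```

Saving the output as, say, `prereqs.dot` (a placeholder name), GraphViz can then render it with `dot -Tpng prereqs.dot -o prereqs.png` or `dot -Tsvg prereqs.dot -o prereqs.svg`. The `rankdir=LR` attribute lays the chain out left to right, which tends to suit long prerequisite chains; that is a layout preference, not a requirement.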

This has proven fairly effective over the last few years, but I think it’s time for a change. Word can save documents as ‘WordProcessingML’, an XML format that mirrors, to a fairly large degree, Word’s internal in-memory representation of a document. Much of that information can be discarded, and there are some ‘internal Wordisms’ I’ll need to work around, but I think this route can get me past some niggling translation and encoding difficulties I’ve been having.
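
To give a feel for what “discarding” looks like, here is a minimal sketch that reduces WordProcessingML to (paragraph style, plain text) pairs. It assumes the Office Open XML namespace used by `word/document.xml` inside a `.docx`; Word 2003’s single-file XML format uses a different namespace, which can be passed in instead. The function name is hypothetical. One common Wordism it handles: Word often splits a single visible phrase across several runs (`w:r`), so the text of all `w:t` elements in a paragraph is joined.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple, Union

# Assumption: the Office Open XML namespace used by word/document.xml in a .docx.
# Word 2003's XML format uses "http://schemas.microsoft.com/office/word/2003/wordml"
# instead; pass it via the ns parameter.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs(xml_data: Union[str, bytes], ns: str = W_NS) -> List[Tuple[Optional[str], str]]:
    """Return (paragraph style ID, plain text) for every w:p, dropping run formatting."""
    w = "{" + ns + "}"
    root = ET.fromstring(xml_data)
    result = []
    for p in root.iter(w + "p"):
        style = p.find(f"{w}pPr/{w}pStyle")
        style_id = style.get(w + "val") if style is not None else None
        # Join the text of every w:t, since one phrase may be split across runs
        # (e.g. when formatting changes mid-word). Paragraphs nested inside text
        # boxes are not treated specially here.
        text = "".join(t.text or "" for t in p.iter(w + "t"))
        result.append((style_id, text))
    return result
```

Paragraph styles are what carry the useful structure here; a style such as a spell-name heading is far easier to match reliably than font and spacing artifacts in converted HTML.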

The new workflow will probably look something like this:

Type (or copy and paste) into Word;
Convert Word files to WordProcessingML;
Convert to XML closer and closer to the problem domain (game elements) using a series of XSLT scripts.
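
The “closer and closer to the problem domain” idea is a chain of small stages, each consuming the previous stage’s output. In the workflow itself those stages are XSLT scripts; the sketch below uses Python instead, purely to show the shape of the chain. It assumes an earlier stage has already produced (paragraph style, text) pairs, and the `SpellName` style, the `<spells>`/`<spell>`/`<line>` elements, and the stage names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Any, Callable, List, Optional, Sequence, Tuple

# (paragraph style ID, text) pairs, as an earlier stage might extract from WordProcessingML.
Para = Tuple[Optional[str], str]


def drop_empty(paras: List[Para]) -> List[Para]:
    """Stage: trim whitespace and discard blank paragraphs."""
    return [(style, text.strip()) for style, text in paras if text.strip()]


def group_spells(paras: List[Para]) -> ET.Element:
    """Stage: a paragraph with the hypothetical style 'SpellName' starts a new <spell>;
    the paragraphs after it become that spell's <line> children. Paragraphs before
    the first spell name are dropped."""
    root = ET.Element("spells")
    current: Optional[ET.Element] = None
    for style, text in paras:
        if style == "SpellName":
            current = ET.SubElement(root, "spell", name=text)
        elif current is not None:
            ET.SubElement(current, "line").text = text
    return root


def run_pipeline(data: Any, stages: Sequence[Callable[[Any], Any]]) -> Any:
    """Feed each stage's output into the next, in the spirit of chained XSLT transforms."""
    return reduce(lambda acc, stage: stage(acc), stages, data)
```

Calling `run_pipeline(paras, [drop_empty, group_spells])` yields a `<spells>` tree; further stages (splitting each `<line>` into named fields, for instance) would be appended to the list. Keeping each stage small makes it easier to see which one introduced a given artifact.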

This doesn’t look like it saves me many steps, but in practice it does. The “Word -> Filtered HTML”, “Fix character encoding”, and “Filtered HTML -> XHTML” stages do useful work, but all three introduce annoying data artifacts that I then have to work around. The new workflow should not only reduce the number of stages (the first few steps in the list) but also make the processing after them much simpler.
