A little more than a couple of years ago I wrote about my workflow for extracting game information captured in Word documents. It’s kind of long:
- Type (or copy and paste) into Word;
- Convert Word files to ‘Filtered HTML’;
- Fix character encoding;
- Convert to XHTML;
- Convert to XML closer and closer to the problem domain (game elements) using a series of XSLT scripts.
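To make the "closer and closer to the problem domain" step concrete, here is a minimal sketch of one such stage. It uses Python's standard `xml.etree.ElementTree` as a stand-in for XSLT, and it assumes (hypothetically) that Word paragraph styles such as `SpellName` and `SpellSchool` survive the HTML export as `class` attributes on `<p>` elements. It groups a flat run of styled paragraphs into `<spell>` records.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph styles (which typically survive
# the HTML export as class attributes) to fields of a <spell> element.
FIELD_FOR_CLASS = {
    "SpellName": "name",
    "SpellSchool": "school",
    "SpellLevel": "level",
    "SpellText": "description",
}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix, if present."""
    return tag.rsplit("}", 1)[-1]


def xhtml_to_spells(xhtml: str) -> ET.Element:
    """Group styled <p> elements into <spell> records.

    Each paragraph with class "SpellName" starts a new <spell>; later styled
    paragraphs become child elements of the most recent spell. Paragraphs
    with no recognized class are dropped.
    """
    root = ET.fromstring(xhtml)
    spells = ET.Element("spells")
    current = None
    for el in root.iter():
        if local_name(el.tag) != "p":
            continue  # only paragraphs carry the styled content
        field = FIELD_FOR_CLASS.get(el.get("class", ""))
        if field is None:
            continue  # unstyled or unrecognized paragraph: discard
        text = "".join(el.itertext()).strip()
        if field == "name":
            current = ET.SubElement(spells, "spell", name=text)
        elif current is not None:
            ET.SubElement(current, field).text = text
    return spells
```

In an XSLT 2.0 stylesheet the same grouping would more naturally be written with `xsl:for-each-group` and `group-starting-with`; chaining several small stages like this one is what moves the data step by step from "styled paragraphs" to "game elements".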
Once I’ve got the information encoded, I can do other transformations to get my actual goal products:
- Machine-generated diagrams:
  - Build a hierarchical model for each game element that has or is a prerequisite;
  - Convert that hierarchical model into DOT format (the GraphViz input format; I’ve written about visualization using GraphViz before);
  - Render the DOT files into PNG and SVG format, giving me diagrams of the relationships between game elements that I can then redraw (GraphViz is powerful, but its output isn’t always suitable for inclusion in my books).
- PDFs:
  - Convert XML files (created as above, but using ‘book markup’) to LaTeX;
  - Convert LaTeX to PDF (this can incorporate diagrams redrawn as described above).
- Index and analysis files; I sometimes create spreadsheets containing…
  - spell summary information;
  - master spell lists for all classes (and domains and bloodlines and patrons…);
  - summary monster stats.
- I also sometimes create new Word files containing aggregated or reformatted content.
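The "hierarchical model to DOT" step can be sketched roughly as follows. This is an illustration only: it assumes the prerequisite model is a plain dictionary mapping each game element to the elements it requires (the function name and the feat names are hypothetical), and it emits one edge per requirement, pointing from prerequisite to dependent.

```python
def prerequisites_to_dot(prereqs: dict[str, list[str]],
                         graph_name: str = "prereqs") -> str:
    """Render a prerequisite map as a GraphViz DOT digraph.

    prereqs maps each game element to the list of elements it requires.
    Nodes and edges are sorted so the output is stable between runs.
    """
    def quote(s: str) -> str:
        # Quote every ID so names with spaces or punctuation are valid DOT;
        # embedded double quotes must be escaped.
        return '"' + s.replace('"', '\\"') + '"'

    lines = [f"digraph {quote(graph_name)} {{", "    rankdir=LR;"]
    nodes = set(prereqs)
    for reqs in prereqs.values():
        nodes.update(reqs)
    for node in sorted(nodes):
        lines.append(f"    {quote(node)};")
    for element in sorted(prereqs):
        for req in prereqs[element]:
            lines.append(f"    {quote(req)} -> {quote(element)};")
    lines.append("}")
    return "\n".join(lines)
```

Saved to a file such as `prereqs.dot`, the result can be rendered with the standard GraphViz command line, e.g. `dot -Tpng prereqs.dot -o prereqs.png` and `dot -Tsvg prereqs.dot -o prereqs.svg`. `rankdir=LR` lays the chains out left to right, which tends to suit long prerequisite chains.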
This has proven fairly effective over the last few years, but I think it’s time for a change. Word has an XML format, ‘WordProcessingML’, that represents, to a fairly large degree, the internal in-memory representation of a document. There is a great deal of information there that can be discarded, and some ‘internal Wordisms’ I’ll need to work around, but I think this can get me past some niggling translation and encoding difficulties I’ve been having.
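One well-known Wordism is that a single word can be split across several `<w:r>` runs (because of formatting changes, spell-check marks, or revision tracking), so paragraph text has to be reassembled run by run while the formatting is discarded. A minimal sketch with the standard library, assuming the usual `w:` namespace for WordprocessingML; the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace normally bound to the "w:" prefix in WordprocessingML.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_text(p: ET.Element) -> str:
    """Concatenate the text of every run in a <w:p>, ignoring formatting.

    Only direct children of runs are inspected, so tab-stop definitions in
    <w:pPr> are not mistaken for tab characters, and deleted text (stored
    as <w:delText>, not <w:t>) is skipped.
    """
    parts = []
    for run in p.iter(W + "r"):
        for el in run:
            if el.tag == W + "t":
                parts.append(el.text or "")
            elif el.tag == W + "tab":
                parts.append("\t")
            elif el.tag in (W + "br", W + "cr"):
                parts.append("\n")
    return "".join(parts)


def document_paragraphs(document_xml) -> list[str]:
    """Return the plain text of each paragraph in a document.xml string."""
    root = ET.fromstring(document_xml)
    return [paragraph_text(p) for p in root.iter(W + "p")]
```

This handles only text; real documents also contain fields, hyperlinks, and text boxes that may need their own treatment.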
The new workflow will probably look something like this:
- Type (or copy and paste) into Word;
- Convert Word files to WordProcessingML;
- Convert to XML closer and closer to the problem domain (game elements) using a series of XSLT scripts.
This doesn’t seem like it saves me a lot of steps, but in reality it does. The “Word -> Filtered HTML”, “Fix character encoding”, and “Filtered HTML -> XHTML” stages do useful work, but all three introduce annoying data artifacts I need to work around. The new workflow should not only reduce the number of stages (the initial bullet points) but also make the processing after that much simpler.
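Part of why the new first stage is shorter is that a `.docx` file is already a ZIP package whose main WordprocessingML part is conventionally `word/document.xml`, so no HTML export or re-encoding is needed before the domain-level transforms start. The sketch below (standard library only; the function name is hypothetical) reads that part directly and yields each paragraph's style ID with its text, which is roughly the raw material the first domain-level stage would work from.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def styled_paragraphs(docx):
    """Yield (style_id, text) for each paragraph in a .docx package.

    docx may be a file path or a binary file object. style_id is Word's
    internal style ID (e.g. "Heading1"), or None for unstyled paragraphs.
    """
    with zipfile.ZipFile(docx) as package:
        # Assumes the conventional location of the main document part.
        root = ET.fromstring(package.read("word/document.xml"))
    for p in root.iter(W + "p"):
        style = p.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val") if style is not None else None
        # Join text across runs; Word may split one word over several.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield style_id, text
```

Strictly, the main part's location is declared in the package's `_rels/.rels` file; hard-coding `word/document.xml` is a simplification that matches what Word normally writes.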