Transformations

My publishing workflow covers many stages, which I’ll document here.

One of my quirks is that, as much as possible, I like transformations to be named in a monotonically increasing manner. Between the source files (DocX and XlsX) and output processing, I like the stages to be named so that sorting the names tells me the processing order.

Weird quirk perhaps, but it does make my life easier.
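
A minimal sketch of why this helps, assuming interim files follow the ‘name.stage.wml’ pattern described under File Naming Conventions. The helper names `stage_of` and `in_processing_order` are hypothetical, and the file list is a made-up subset used only for illustration; the real pipeline is driven by XSLT and Make, not Python.

```python
def stage_of(filename: str) -> str:
    """Return the stage part of a 'name.stage.wml' file name."""
    # 'doc.group.wml' splits into ['doc', 'group', 'wml'].
    return filename.rsplit(".", 2)[1]


def in_processing_order(filenames):
    """Sort interim files by stage name.

    If stage names are chosen to sort in pipeline order, this ordering
    is also the processing order.
    """
    return sorted(filenames, key=stage_of)


# Hypothetical interim files for one document, listed out of order.
files = [
    "egd-draconic-bloodlines.link.wml",
    "egd-draconic-bloodlines.base.wml",
    "egd-draconic-bloodlines.group.wml",
    "egd-draconic-bloodlines.dename.wml",
    "egd-draconic-bloodlines.clean.wml",
]

for name in in_processing_order(files):
    print(name)
```

A plain directory listing gives the same effect without any code, which is the point of the convention.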

Transformation Stages

The data gets transformed in a series of stages, with some branches to create interim output (index files, link graph files to use in creating prerequisite diagrams, etc.).

| Stage | Type | Description/Notes |
| --- | --- | --- |
| DocX | Input | My documents start as Word files (DocX extension). These are ZIP files containing XML. |
| XlsX | Input | Some helper files are stored as Excel workbooks (XlsX extension). These are also ZIP files containing XML. |
| Base | Single | First stage of transformation: content is extracted from ZIP (DocX or XlsX) files, to be processed using XSLT. |
| Clean | Single | Word files contain many elements I don’t care about. This stage discards those elements. |
| Cleaning | Single | Processes an Excel workbook to create a file to be used in denaming (cleaning) Product Identity from my data set. |
| Dename | Single | Replaces Product Identity (proper nouns, mostly) with different strings. Also corrects common errors typically introduced as (de)hyphenation problems. For instance, if ‘full-round’ spans the end of a line, copying and pasting into Word often will strip the hyphen and leave me with ‘fullround’. I can fix those here also. The rules are system-specific; Product Identity is not shared between systems. |
| Group | Single | Rebuilds the document hierarchy. Word stores blocks all at the same level: Heading 1 and Paragraph are shown hierarchically in the navigation pane in Word, but are stored as siblings in document.xml. Similarly, nested list elements all are stored as siblings (at the same level as Heading 1 and Paragraph). This stage rearranges those. It also cracks document and object headings (see Markup) and starts building metadata. |
| Helper | Single | Taxonomy and Section 15 information files are created here and prepared for use in later stages. Section 15 is global (used by all systems); Taxonomy is system-specific. |
| ID | Single | Builds object ID strings and copies information from the taxonomy. Local metadata overrides inherited data. That is, if the taxonomy says to categorize an object one way, explicitly setting that field in the object will overrule the taxonomy. |
| IdKey | Single | Prepares content for integration into the Index file. This mostly means discarding all elements that are not game objects (or business domain objects) and building search keys for each object. |
| Index | Aggregate | Gathers search information for all game objects and builds an index to be used to resolve references. Domain-specific file (may have several per system). |
| Lexicon | Aggregate | Counts all words by source, ignoring all words that appear in a dictionary. Outputs an Excel XML file I can turn into an Excel workbook for examination. Entries I find here will be copied to a file that will be used to build the Cleaning file above. System-specific file. |
| Link | Single | Resolves all implied links: mod-ref and alt-ref for things like subdomains and archetypes (mod-ref is the object being modified, alt-ref is a contextual aspect such as race for a racial class archetype) and prerequisites. Uses the Index file (master-sys-all.index.wml) but does not have a Make dependency. |
| LinkText | Single | Searches body text for implied links based on convention (feat names and skill names are capitalized, so seeing ‘Fly’ in the middle of a sentence likely means the Fly skill, while ‘fly’ might be a verb or refer to an insect; magic spells and magic items are typically italicized, and so on). Uses the Index file but does not have a Make dependency. |
| List | Aggregate | Creates an Excel XML file (to be converted to an Excel workbook) identifying all the objects in the domain. Includes some other useful content such as all prerequisites, all mod- and alt- relationships, and so on. |
| LinkGraph | Aggregate | Identifies all potential nodes and edges based on object prerequisites, and bundles them for processing into DOT (GraphViz) files and presentation as prerequisite diagrams. |
| Master | Aggregate | Gathers all objects that could be packaged for publication. This means all elements that have IDs (all game objects, supplementary objects that add to these, and all document objects with IDs) and their child elements. Domain-specific file. |
| Package | Single | Prepares the document for output: gathers information from the Master file for the domain being built, including supplementary information, and arranges it for conversion to output formats. |
| Pics | Aggregate | Processes diagrams defined in the LinkGraph file and outputs an XML manifest and a large number of DOT files. Classified as ‘aggregate’ because it is built specifically from the LinkGraph aggregate file, but it is actually a single-input transformation. |
| DOT | Output | Thousands of DOT files, to be processed by GraphViz to produce images and diagram placement details. |
| JSON | Output | One file per diagram created, defining how the nodes and edges are arranged. To be consumed (by a Perl script?) to automatically generate files in other formats (PGF/TikZ in particular). |
| PNG | Output | Raster graphic file showing the prerequisite diagram for the game object. |
| SVG | Output | Scalable Vector Graphics file showing the prerequisite diagram for the game object. Nodes are hyperlinks to the matching object diagram. |
| PubHTML | Output | Prepares the Package file for transformation to HTML. Future work. Likely will consume the Pics manifest file so either PNG or SVG can be used to display prerequisite diagrams. |
| PubTex | Output | Prepares the Package file for transformation to LaTeX. Includes things like reworking tables to align with LaTeX needs. Likely will consume the Pics manifest file and automatically generated PGF/TikZ diagrams. |
| PubWord | Output | Prepares the Package file for transformation to WordML. Future work. I haven’t decided if this should automatically include prerequisite diagrams; that could get immense. |
| TeX | Output | LaTeX file, to be compiled into PDF. |
| PDF | Output | Final output file for documents, ready for print or reading. |
| WordML | Output | Word XML file containing the output document, to be converted to a DocX file for distribution. |
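
The regrouping done in the Group stage can be illustrated with a small sketch. This is Python rather than the XSLT the pipeline uses, and it assumes a simplified input: a flat list of (style, text) pairs where headings are styled ‘Heading N’. The `Node` class and the function names are hypothetical, and nested lists and heading cracking are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    style: str  # e.g. "Heading 1" or "Paragraph"
    text: str
    children: list = field(default_factory=list)


def heading_level(style: str) -> Optional[int]:
    """Return N for 'Heading N', or None for non-heading blocks."""
    if style.startswith("Heading "):
        return int(style.split()[1])
    return None


def group_blocks(blocks):
    """Nest a flat list of (style, text) siblings under their headings."""
    root = Node("Document", "")
    stack = [(0, root)]  # (heading level, node); the root acts as level 0
    for style, text in blocks:
        node = Node(style, text)
        level = heading_level(style)
        if level is None:
            # Body blocks attach to the most recently opened heading.
            stack[-1][1].children.append(node)
            continue
        # Close any open headings at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((level, node))
    return root


# Flat siblings, as Word stores them (sample content is made up).
flat = [
    ("Heading 1", "Feats"),
    ("Paragraph", "Intro"),
    ("Heading 2", "Combat Feats"),
    ("Paragraph", "Details"),
    ("Heading 1", "Skills"),
]
tree = group_blocks(flat)
print([child.text for child in tree.children])
```

The stack holds the chain of currently open headings, so each block is attached to the nearest enclosing heading in a single pass over the siblings.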

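As a rough illustration of how prerequisite data might become a DOT file, the sketch below turns a mapping of objects to their prerequisites into DOT text. The function name `to_dot`, the sample data, and the edge direction (prerequisite pointing to the object that requires it) are all assumptions, not details taken from the LinkGraph or Pics stages.

```python
def to_dot(graph_name, prerequisites):
    """Render {object: [prerequisite, ...]} as DOT text, one edge per prerequisite."""

    def quote(label):
        # DOT string IDs are double-quoted; escape embedded double quotes.
        return '"' + label.replace('"', '\\"') + '"'

    # Every object and every prerequisite becomes a node.
    nodes = sorted(
        set(prerequisites) | {p for ps in prerequisites.values() for p in ps}
    )
    lines = [f"digraph {quote(graph_name)} {{"]
    lines += [f"  {quote(n)};" for n in nodes]
    for obj in sorted(prerequisites):
        for prereq in prerequisites[obj]:
            # Assumed direction: prerequisite -> object that requires it.
            lines.append(f"  {quote(prereq)} -> {quote(obj)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data: object name -> list of prerequisite names.
sample = {
    "Feat B": ["Feat A"],
    "Feat C": ["Feat B"],
}
dot_text = to_dot("prerequisites", sample)
print(dot_text)
```

The resulting text can be handed to GraphViz to lay out and render the diagram.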
File Naming Conventions

With the exception of the Input and Output files, all processing above is done using XML. By convention, files are named ‘stage.wml’ (compact XML, with no extraneous spaces or line breaks) and can be converted to ‘stage.xml’ (‘pretty printed’, with white space and line breaks) to make them readable. All processing is done using the WML files; the XML files are a courtesy to developers when troubleshooting code.

For example, when processing egd-draconic-bloodlines.docx, after the document hierarchy is built I have a file called ‘egd-draconic-bloodlines.group.wml’. This file is hard for a human to read, so I run it through xmllint to get ‘egd-draconic-bloodlines.group.xml’.
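
The WML-to-XML step can be approximated in a few lines. The sketch below uses Python’s standard `xml.dom.minidom` as a stand-in for xmllint (whose exact invocation is not given here), and the helper names `pretty_name` and `pretty_print` are hypothetical.

```python
from pathlib import Path
from xml.dom import minidom


def pretty_name(wml_name: str) -> str:
    """Map 'name.stage.wml' to 'name.stage.xml'."""
    # with_suffix replaces only the final suffix, so the stage part is kept.
    return str(Path(wml_name).with_suffix(".xml"))


def pretty_print(compact_xml: str) -> str:
    """Return an indented copy of a compact XML string."""
    return minidom.parseString(compact_xml).toprettyxml(indent="  ")


print(pretty_name("egd-draconic-bloodlines.group.wml"))
print(pretty_print("<doc><section><p>text</p></section></doc>"))
```

Indenting adds whitespace-only text nodes between elements, so the pretty-printed copy is for reading rather than for further processing.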

Aggregation files are named either by data set or by domain. Each aggregate stage has ‘files-sys-set’ and ‘master-sys-domain’ files. The former aggregates data for a system data set; the latter aggregates data for a system domain. For example, ‘files-pf1-prd.index.wml’ contains the aggregate data for the PRD set, and ‘master-pf1-all.index.wml’ contains the index for the entire Pathfinder 1e data domain (‘master-pf1-pzo.index.wml’ excludes third-party content, but does include the PRD, PZO, and supporting content).
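
Reading the examples above as ‘files-sys-set.stage.wml’ and ‘master-sys-domain.stage.wml’, an aggregate file name can be split into its parts as sketched below. The `AggregateName` type, its field names, and the regular expression are assumptions for illustration; they are inferred from the three example names and are not taken from the pipeline itself.

```python
import re
from typing import NamedTuple


class AggregateName(NamedTuple):
    kind: str    # 'files' (data set) or 'master' (domain)
    system: str  # e.g. 'pf1'
    scope: str   # data set or domain, e.g. 'prd', 'all', 'pzo'
    stage: str   # e.g. 'index'


# Assumed pattern: kind-system-scope.stage.wml
_PATTERN = re.compile(r"^(files|master)-([^-.]+)-([^-.]+)\.([^.]+)\.wml$")


def parse_aggregate_name(filename: str) -> AggregateName:
    """Split an aggregate file name into kind, system, scope, and stage."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an aggregate file name: {filename!r}")
    return AggregateName(*match.groups())


for name in ("files-pf1-prd.index.wml", "master-pf1-all.index.wml"):
    print(parse_aggregate_name(name))
```

Single-document files such as ‘egd-draconic-bloodlines.group.wml’ do not match the pattern and raise `ValueError`, which keeps the two naming schemes apart.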

Other file types (XSLT, Word, Excel, etc.) use file extensions consistent with their standards.

Workflow Diagram

This diagram has seen many changes over the years, but should still be fairly recognizable.

This is the most recent version, from October 3, 2024. Previous versions, with varying amounts of supporting documentation, can be seen at
