Often during the A-Z Blog Challenge I use ‘M Day’ as a ‘Midpoint Check-In’, listing the posts up to this point. This time I think I’ll summarize my current design position.
Overall, I have a sense of where I’m going and how to get there, but there is much that is deliberately “to be determined”.
- For the data model I have identified the major entities and have a sense of what they will look like, but the actual definitions are pending. Even so, I am likely to ship a built-in model so there is something to start with, which the user can modify later if desired.
- Document is nominally the central entity type, since those represent something a user could actually have in hand. This started with books (or rather, ebooks) but quickly grew to incorporate other types of ‘document’.
- Documents may have subdocuments. Whether a subdocument is a section of an ebook (chapter of a book, essay in a journal, article in a magazine) or a song on an album (which is one way to model this, not the only way possible), it is possible for one document to have a set of documents (probably ordered) within it.
- File is ultimately what prompted this exercise, since managing files is what I really need to do and ‘Documents’ are just what the files contain. This is probably the only entity that is likely to have some hard definition… and even then those hard definitions are only for what I need in order to store the object. This entity can be expanded on as needed.
- Files may have subfiles. I know I want to support files contained within an archive of some sort (ZIP file, say), but there could be other cases.
- Creator is a ‘Person-type’ entity involved in the creation of a Document or (potentially) File. These have roles such as ‘Author’, ‘Artist’, ‘Editor’, and so on.
- Character is a ‘Person-type’ entity ‘from inside a Document’, and while Characters exhibit many of the same attributes as Creators I probably want to keep them separate. It is possible that a Document’s Character also connects to a Creator with the ‘Actor’ role.
- Organization is a non-Person entity involved in the creation or production of a Document, such as a publisher. This is a lower-priority entity for implementation, but as it’s a relatively simple one I will probably include it.
- Series is a collection (optionally ordered) of Documents.
- Series may have subseries. For instance, the Midkemia series by Raymond Feist has many subseries in it (Riftwar, Empire, etc.).
- A Document may be part of multiple Series. The Discworld series by Terry Pratchett has several sets of related stories (The Watch books, The Witch books, the Death books, and so on).
- When a Document belongs to multiple Series, those Series need not be connected in any other way. The Secret Wars crossover event from the Marvel Universe could have a Series to identify all the issues from all titles involved in the crossover, despite them otherwise not intersecting. That is, the Secret Wars Series could have Documents from each of the Fantastic Four, Avengers, X-Men, and Spider-Man Series.
- Similarly, subdocuments can be members of Series. For example, ‘The Wizards Three’ was a recurring, if sporadic, column in Dragon Magazine. Each issue of the magazine would be part of the “Dragon Magazine” series, and each of the articles (subdocument of the issue it’s in) could be a member of the ‘The Wizards Three’ Series. It may be that ‘The Wizards Three’ is itself a subseries of ‘Dragon Magazine’ since it was a recurring column in that magazine. I have seen other columns move from one magazine to another, though, and those would not be modeled well this way.
- Tag is a user-defined value attached to another entity. These should be lightweight markers (basically just a label and optionally a ‘description’ that explains what the tag means). Any entity type should be allowed to take Tags, but I haven’t decided if Tags are shared between entity types (i.e. Document and File can both have Tags, but the Tags might not be shared between them — even when they have the same label). I suspect an entity might have multiple tag-like attributes (for my RPG books I might use Tags in the regular sense, but also use something like Tags for rules system, setting, content types, and so on).
- I don’t know if it makes sense for Tags to have subtags. Part of me says no, these should be lightweight attributes… but another part says yes, because they can be grouped: game system broken down by edition, setting by region, ‘monster’ content type by type of monster (‘bestiary’ has content type/tag ‘monster’, but Ye Big Booke of Dragones has “monster / dragon” content tag/subtag).
- (Not actually an entity) The connectors between entities can have attributes also. This is where the role of a Document’s Creator is identified (the list of roles is defined in a table elsewhere).
- The graphical user interface has more of a strategy than a design at this point. The core will be a framework in which the rest works, but as with the data model, much is yet to be defined concretely. Things are still evolving as I work with my existing tools.
- Displays and editors are configurable.
- Each entity type and connector can have multiple possible displays and editors. I haven’t decided if the choice of display/editor is to be made automatically at runtime by rules built into the definitions (e.g. “if this is an image, show the image-specific view”) or if the user should specify what view they want: “I’m looking for a particular magazine article, show me the list of magazines with this keyword (or with subdocuments with this keyword) and the list of subdocuments (articles) within the selected issue”.
- Ideally these will not be fixed in place, or even have fixed locations with dynamically-assigned panes. I’d prefer to have them dockable so I can move and position them as I like… including having them undocked so I can put them on another monitor.
- Certain implementation characteristics are pretty much set, unless I think of something better. Performance is a significant consideration here, given the volumes of data I’m looking at.
- It will be possible to have multiple instances of the application running concurrently. It might be nice to have multiple instances open on the same library, but it will absolutely be possible to work in more than one library at once.
- Files will be stored in the repository in a metadata-agnostic fashion. Changing information about a Document will not change the associated files on disk.
- A single Document may have multiple Files (a stock art collection might have a file for each image, plus one for the license).
- A single File may be associated with multiple Documents (such as a license file applying to multiple stock art collections). Ideally there will be no duplicated files. The ‘original file name’ is probably an attribute on the connector between Document and File.
- Database and file storage can be on separate drives. The metadata database should be as fast as feasible, but the actual files could be on much slower drives: put the database on an SSD, and the documents on a really big and slow USB HDD.
- A single library can have multiple file repositories, such as spreading the content across multiple drives because of the amount of storage used, or to put content you expect to use a lot on a faster drive.
- This is probably the only situation I can think of that could cause the repository copy of a file to move: shifting content from one drive to another to manage it better. I don’t yet know if this would be done automatically as a file is accessed more frequently, by a rule configured by the user (‘put all anime on that drive’), or even manually (‘I know I want to use this a lot, put it somewhere I can get at it quickly’). I can even imagine a combination, where standard rules manage things normally but the user can decide ‘all files like this are to be moved to this other drive’, where ‘this’ could be based on some rule or on file attributes (“I’m replacing that drive, time to move everything to another one”).
- The core engine is almost certain to be its own layer, apart from any front end I care to put on it.
- I’ll definitely want a command line or other programmatic interface for batch operation. I haven’t talked about it much because for me, it’s relatively clear what’s needed (i.e. “expose pretty much everything the engine can do”).
- I’ll definitely want a graphical user interface for the user to interact with. Even I don’t want to do everything from the console; I want a more convenient interface.
- I may want a web-based interface, a web application I can interact with remotely. It’s entirely possible I’ll build a simple (even simplistic) one as an exercise, but I likely won’t go to the trouble of making (and securing) a ‘good one’. I don’t truly have a need for this.
- I may want a RESTful API or the like, for much the same reason as I’d want a command line version. As with the web app version, I may create one as an exercise but likely wouldn’t go to the trouble of making a ‘good one’. I don’t truly have a need for this.
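
To make the data model and storage bullets above a bit more concrete, here is one way they might look as a relational schema. This is only a sketch under assumptions, not a decision: SQLite, every table and column name, the helper functions, and identifying a File by its SHA-256 hash (one possible way to get ‘no duplicated files’) are all mine for illustration. Character, Organization, and Tag are left out to keep it short.

```python
import hashlib
import sqlite3

# Hypothetical schema: all names here are illustrative assumptions.
SCHEMA = """
CREATE TABLE document (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    parent_id INTEGER REFERENCES document(id),  -- set for a subdocument
    position  INTEGER                           -- optional order within the parent
);
CREATE TABLE file (
    id        INTEGER PRIMARY KEY,
    sha256    TEXT NOT NULL UNIQUE,             -- assumption: identity by content hash
    parent_id INTEGER REFERENCES file(id)       -- set for a subfile, e.g. inside a ZIP
);
CREATE TABLE creator (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE series (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES series(id)     -- set for a subseries
);
-- Connectors are tables of their own so they can carry attributes.
CREATE TABLE document_creator (
    document_id INTEGER NOT NULL REFERENCES document(id),
    creator_id  INTEGER NOT NULL REFERENCES creator(id),
    role        TEXT NOT NULL,                  -- Author, Artist, Editor, and so on
    PRIMARY KEY (document_id, creator_id, role)
);
CREATE TABLE document_file (
    document_id   INTEGER NOT NULL REFERENCES document(id),
    file_id       INTEGER NOT NULL REFERENCES file(id),
    original_name TEXT,                         -- lives on the connector, not the file
    PRIMARY KEY (document_id, file_id)
);
CREATE TABLE series_member (
    series_id   INTEGER NOT NULL REFERENCES series(id),
    document_id INTEGER NOT NULL REFERENCES document(id),
    position    INTEGER,                        -- NULL when the series is unordered
    PRIMARY KEY (series_id, document_id)
);
"""


def open_library() -> sqlite3.Connection:
    """Create an in-memory library database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, title, parent_id=None, position=None) -> int:
    """Insert a Document (or subdocument, if parent_id is given) and return its id."""
    cur = conn.execute(
        "INSERT INTO document (title, parent_id, position) VALUES (?, ?, ?)",
        (title, parent_id, position),
    )
    return cur.lastrowid


def add_file(conn, content: bytes) -> int:
    """Return the File id for this content, inserting a row only if it is new."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute("INSERT OR IGNORE INTO file (sha256) VALUES (?)", (digest,))
    row = conn.execute("SELECT id FROM file WHERE sha256 = ?", (digest,)).fetchone()
    return row[0]


def attach_file(conn, document_id, file_id, original_name) -> None:
    """Connect a Document to a File, recording the name the file arrived with."""
    conn.execute(
        "INSERT INTO document_file (document_id, file_id, original_name) VALUES (?, ?, ?)",
        (document_id, file_id, original_name),
    )


# Two stock art collections sharing a single license file.
conn = open_library()
art_a = add_document(conn, "Stock Art A")
art_b = add_document(conn, "Stock Art B")
license_id = add_file(conn, b"license text")
attach_file(conn, art_a, license_id, "LICENSE.txt")
attach_file(conn, art_b, license_id, "license.txt")
```

Because the file name sits on the connector rather than on the File row, renaming or re-describing a Document never touches the stored file. Functions like these would belong to the core engine layer, with the command line and graphical front ends calling into them.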
In some cases I might have things that are different enough to mandate separate libraries. In others, I hope a split won’t be necessary. I’d really prefer to keep the number of libraries down, partly so I don’t have to switch between them as often and partly to minimize the time I need to spend keeping them configured consistently.
- Number of entries in the library: Hopefully won’t cause a split. I want to be able to comfortably deal with hundreds of thousands of titles (and potentially more than that — if I track subfiles, archives could increase things by at least an order of magnitude).
- Storage space requirements: Hopefully won’t cause a split. If I need more space I can add a new repository location and start filling that up.
- Nature of the content: Hopefully won’t cause a split, in that I want to be able to mix file types and topics in a single library (my ‘game development’ library should be able to contain image resources, sound resources, technical books, and even training videos). Still, I might choose to split the libraries, such as splitting ‘roleplaying games’ from ‘technical’ from ‘comic books’.
I think I now have a pretty good sense of where this is going. There’s still a lot of thinking and a lot more work to be done, but I’m pretty pleased with how things are shaping up.