My last post looked at some abstractions that might do good things for me, but as with any design decision (or even design think-hard-about) it’s a good idea to validate it. Even if it’s not possible to prove correct, it can at least be possible to determine it’s not a fundamentally bad idea.
The model from the previous post does a few things for me. And a few things to me, I suppose you might say.
- Entity table has all entities represented, so the Entity_Link table can have enforceable foreign key references. Entity.rowid is monotonically increasing and unique, which means I have unique ID values throughout the system. I very much like this, it means that given an ID value it can refer to one and only one thing within the system.
- Caveat: if I ignore the metadata tables (Entity_Type, Entity_Link_Type, Entity_Attribute), at least… I could do the same with them, I suppose, but I’m not convinced I want to.
- Having a single place to look for entity links means I can build the engine for that once, and it’ll always work the same way. Regular RDBMS modeling would have different cardinality joins implemented in different ways, and while that generally is the correct answer, in this case because the tables are dynamically defined by the user I’m not sure that’s the best approach. Changing a 1:many to a many:many in a regular database sucks, here it might be as simple as changing the entity link’s cardinality definition.
- Where my earlier model had complex XM (linking) tables with additional attributes such as roles, and additional tables to define those roles, it looks like a simple XM table can handle almost all of it, and adding new connections expands on the use of this table rather than introducing more tables.
- For example, I might want to add a ‘Game_System’ table to my RPG media library. I add one table and a record to Entity_Link saying ‘Document has a many:many with Game_System’ and I’m done. I don’t need to add Game_System and Game_System_XM, and the metadata to connect them.
- With the old model I might have had moderately complex logic for splitting up the (Document, Creator) links by role. Adding new roles might be easy (add record to the Document_Creator_Role table), but now I add a slightly more complex configuration item (new link between Document and Creator) and it just works like all the others. I haven’t written the code yet but I have the sense this will be easier to create and maintain because it is the same for all such links.
- All items are connected via a single table, which might lead to some performance issues. Every time a new item is added to the database that table could conceivably be locked, which can slow things down, and this table will have a number of records equal to the total number of entities.
- This actually might not be so bad a problem. Having a single object to lock can make it easy to head off the starving philosophers: if one process is adding a document and files, and another process is adding a file and documents, locking the Entity table until complete means each can be considered more or less atomic.
- The data volume probably isn’t a big deal either, since when I look at the Document table I don’t really need to look at the Entity table.
- However! I do run the chance of referential integrity errors because Entity_Link only needs to connect to a record in the Entity table, not necessarily a Document record in the Entity table. It will be on the developer to ensure these never run agly.
Okay, that right there answers my question. I’m not going to use exactly this model.
However, I think I will use a model quite like it. Probably get rid of the Entity table, and have as many Entity_Link-style tables as I need. This increases the number of tables overall, compared to having only a single Entity_Link table, but the application of each one is simpler. Unlike the regular RDBMS practice of having 1:many implemented as a foreign key in one table and then using different logic depending on cardinality, always using a link table like this means I can easily change cardinality later (as long as it’s 1:many to many:many; going many:many to 1:many always sucks).
I think this means I am likely to have:
- Creator
- Document
- Document_Artist_XM (links Document to Creator, as ‘artist’)
- Document_Author_XM (links Document to Creator, as ‘author’)
- Document_Editor_XM (links Document to Creator, as ‘editor’)
- Document_File_XM (links Document to File; may now have additional attributes)
- Document_Game_System_XM (links Document to Game_System)
- Document_Series_XM (links Document to Series)
- Document_Tree (hierarchical table connecting Documents to subdocuments)
- File
- File_Tree (hierarchical table connecting File to subfiles)
- Game_System (lightweight, hierarchical)
- Game_System_Tree (hierarchical table connecting Game_System to Subsystems — usually be edition, I expect)
- Series
- Series_Tree (hierarchical table connecting Series to subseries)
- Document_Tag (tags specifically intended for Documents)
- Document_Tag_Tree (hierarchical table connecting Document_Tag to subtag)
- Document_Tag_XM (links Document to Document_Tag — by pure naming convention this might be ‘Document_Document_Tag_XM’ but that seems silly to me)
Hmm. Not sure about hierarchical relationships. I could put them on basically everything, which came almost free in the abstract model above (just the cost of defining them), but I’m not sure if that’s the right move. On the other hand, just about all significant entity types, and even lightweight entities like Game_System, could reasonably be hierarchical.
Eh, for the cost of a nullable foreign key and a nullable child_order field, and a bit of logic to handle the reporting, maybe it’ll be easier overall to give it to everything and let it sort itself out.
TL;DR: the more-abstract model I explored in the previous post looks like it won’t quite work for me, but exploring the question gave me some insight into something that might work better for me in the end. Conclusion at this point: not mad, I’ll take it.