Media Library: Pondering Abstraction, Possible Disaster

In my last post, regarding Operations, at one point I started to write “and everything can have Tags”… when I realized that can happen, but takes a little extra work.

What if they could, though? Early in this series when I wrote about data modeling I recognized that many entities could have ‘subentities’: documents have subdocuments, files have subfiles, series have subseries, even tags can be hierarchical!

At the same time, most entities have almost no built-in attributes. They all have IDs, but except for Files (which have some special characteristics I need to be able to count on) almost everything about the entities could be configured and changed.

What happens if we abstract some of those characteristics?

New Core Entities

I see two primary entity types: Entity and Entity_Link. Everything else can be implemented in terms of these two. In fact, I think it likely Entity and Entity_Link might be the only two data tables (as opposed to metadata tables used to define how the rest work) that won’t have more attributes added by the user.

That said, ‘File’ is going to be an important entity type, if not quite a primary entity type like the two above, and if it doesn’t look like I expect it to then bad things could happen.

Entity

Every object that exists in the application, outside the application entity metadata definitions, is present in this table. It is a very simple table, but every item is here because it serves as the anchor for so many other things.

FieldTypeNotes
rowidINTEGER PRIMARY KEY(created implicitly, auto increment)
entity_type_fkINTEGER NOT NULLReferences Entity_Type.
stateSTRING(possibly unneeded) Indicates state of the object, most notably allowing soft deletes.

Entity_Link

Connects two entities in potentially arbitrary ways. The relationships are pretty arbitrary, but I may want to define in the application metadata exactly what connections are allowed.

There should be a convention. Document:File is a many:many relationship and the structure here allows you to use (left, right) as either (Document, File) or as (File, Document), but life will be easier if a convention of one or the other is followed. That is, I would always use (Document, File), just so I always know ‘left_entity_fk’ points at the Document. Similarly, for parent-child relationships I’m likely to always define (left, right) as (parent, child).

Entity_Link only makes sense if I don’t need other attributes associated with a link. Almost all of the links I have need only the left and right edges, and possibly a sort key if the links are ordered. The only exception that comes to mind is when I want to capture the original file name and path I loaded the file from, in context of a document… and even then I could get away with treating those as another pair of links. Kludgy but perhaps acceptable, if it means the rest of the model still works cleanly. Or perhaps a ‘third_link’ to supplement left_link and right_link.

FieldTypeNotes
rowidINTEGER PRIMARY KEY(created implicitly, auto increment) Does not need to exist in Entity table.
left_entity_fkINTEGER NOT NULLReferences Entity table, left side of the link.
right_entity_fkINTEGER NOT NULLReferences Entity table, right side of the link.
link_type_fkINTEGER NOT NULLReferences Entity_Link_Type
link_orderINTEGER NULLABLEIdentifies the order of the right entity within the left. May be null if the relationship is unordered (such as tags), and it’s possible that a collection of links for an object will be a mix of ordered and unordered links.
cardinalitySTRING1:1, 1:*, *:1, *:* (default *:*)

Entity Metadata

These tables do not contain ‘object data’ such as documents and files and whatnot. Instead, these tables contain metadata describing how the various tables and whatnot interact within the system.

Metadata records do not connect to the Entity table.

Entity_Attribute

This table defines the attributes attached to each entity type. I suspect it will use my column type definitions. Unsure at this time if it will make any use of the Entity_Link_Type table… the existence of a the link definition implies in my mind that the relationships exist on both sides, neither truly owns the definition. Even when customarily the link might be defined from one side to the other (a 1:many usually has a foreign key in the ‘many’ table of the relationship) the information often is presented via both sides: the child presents the parent, and the parent presents the children. Cardinality is a consideration of the link more than the attribute (an entity:subentity relationship almost always is 1:many).

I doubt this will have presentation information, but will focus more on things that will affect the internal database definition. UI considerations will be driven by another layer dedicated to presentation.

I am not presenting this table here now, I want to think more about how the attribute definitions will work.

Entity_Link_Type

This table defines how entities can be linked.

The ‘names’ are how the entity on the other side sees it. For instance, in an entity link type defining document authors (left is Document and right is Creator, indicating who wrote the document), left_name might be ‘Document’ and right_name might be ‘Authors’ (since those are the objects on that side). The alternative would be to use the name as seen by that side (i.e. left_name is what the link looks like to the Document, therefore ‘Authors’) but I thought that would be even more confusing.

It might make more sense for the name to be a UI consideration, but I have the sense this could be helpful in understanding the nature of the link. I can imagine many (Document, Creator) links, thereby replacing the ‘roles’ mechanism I originally considered. Instead of having a small roles table, I could have multiple records in Entity_Link_Type that indicate author, artist, editor, etc.

FieldTypeNotes
rowidINTEGER PRIMARY KEY(created implicitly, auto increment) Does not need to exist in Entity table.
left_entity_type_fkINTEGER NOT NULLReferences Entity_Type table, type of entity on the left side of the link.
right_entity_type_fkINTEGER NOT NULLReferences Entity_Type table, type of entity on the right side of the link.
nameNAMEName of the relationship type (might not keep).
left_nameSTRINGName of relationship to the left. Could be by type (‘Document’) or nature of relationship (‘Parent’).
right_nameSTRINGName of relationship to the right. Could be by type (‘Creator’) or nature of relationship (‘Authors’ or ‘Child’ or ‘Subseries’).
descriptionTEXTExplains the nature of the entity link. Should usually be simple but specific.

Entity_Type

Defines concrete entity types, especially with regard to attributes and allowable links.

FieldTypeNotes
rowidINTEGER PRIMARY KEY(created implicitly, auto increment)
nameNAMEName of the entity type.
table_nameSTRINGName of the RDBMS table containing the records of this entity type.
descriptionTEXTExplains the nature of the entity and what it represents.

Closing Comments

Now that I’ve written this, I’m not entirely sure how much really will change. I think the main driver will be whether I need to have additional attributes in the links between tables, or if (left, right, [optionally] order) will be sufficient. If so, I can probably go with this model. The more abstract and flexible I can make things, the less code I (probably) need to write.

Then again, when you have a hammer… I admit to a bias toward rules-based systems and design, and if that works here I may be predisposed to doing so. On the other hand, I like to think I have that bias because it has worked so well for me and saved me from so much specific-purpose code… I’ll need to see how it goes.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top