It seems almost certain that the data engine for this application will be a relational database. My analysis so far suggests the data is fairly structured.
Most applications I’ve worked on have had pretty specific requirements for their data. These have mostly been business applications with fairly static requirements.
This application’s basic needs imply a level of adaptability I don’t normally see at work. That’s fine; the pieces will tie together nicely.
The basic needs outline several entity types and, to me at least, imply some others. The top-level entries in the list below are entities, and each entity lists several attributes. These attributes are something of a brainstorm: not all will be implemented, and others will almost certainly come up during analysis, design, and implementation.
- Document is the conceptual object we’re tracking: a book, a song, a movie, whatever. A document might in fact have several sets of metadata: from the file itself, downloaded from a metadata repository such as Amazon, or entered by the user.
  - Title. The name of the thing.
  - Publishers. The organization (probably a company) that published the thing.
  - Files. The data being stored that represents the thing.
  - Creators. People who crafted the thing. Creators may have different roles, and a creator may have multiple roles on the same document.
  - Description. Freeform text about the document.
- File is a collection of bytes on disk containing the data that makes up a document… or part of it. A document may have more than one file, in different formats or in the same format.
  - Location. Where to find the file.
  - Size. Number of bytes making up the file.
  - CRC64. CRC-64 checksum of the file.
  - SHA256. SHA-256 checksum of the file.
  - MIME Type. Type of file, i.e. how its contents are structured (application/epub+zip, application/pdf, image/jpeg, audio/ogg, etc.). This should not depend on file extensions; it will likely be identified using libmagic.
  - File Name. I suspect this will not be the original file name but an internally assigned one, because multiple documents could in fact have bitwise-identical files.
  - Description. Freeform text about the file. (I expect this will be null most of the time… but I’ll include it for now.)
- Person holds information about a meat popsicle. This could almost be a primitive (at its base, all it needs is a name), but I can imagine many other fields being added to this data type. These are used for a document’s Creators.
  - Name. Everyone has a name. Even ‘The Man With No Name’ is a name, of a sort.
  - Description. Freeform text about the person.
- Series is a collection of documents. Usually this would be ordered (‘set’ is probably a better term for an unordered ‘series’), but I think I can use series for this case also.
- Tag is an important element, allowing the user to add arbitrary classifications and markers to entities. On the face of it a tag is just a string (and a single entity may have multiple tags), but there could be many kinds of tags with similar but slightly different behavior. As with Person, a particular library might choose to extend the tag type definition.
  - Label. The text to display in a tag list.
  - Description. Freeform text about the tag.
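To get a feel for how a few of these entities might map onto relational tables, here is a minimal sketch using Python’s built-in `sqlite3` module. SQLite is only a stand-in for whatever engine gets chosen, and every table, column, and function name (`CORE_SCHEMA`, `creators_of`, `document_creator`, `series_member`) is hypothetical. The interesting parts are the many:many Creators link table, where the role is part of the key, and the Series membership table, where a nullable position also covers an unordered ‘set’.

```python
import sqlite3

# Hypothetical DDL for a few of the entities above; names are illustrative only.
CORE_SCHEMA = """
CREATE TABLE person (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);
CREATE TABLE document (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT
);
-- Creators are many:many. Including role in the key lets one person
-- hold several roles on the same document.
CREATE TABLE document_creator (
    document_id INTEGER NOT NULL REFERENCES document(id),
    person_id   INTEGER NOT NULL REFERENCES person(id),
    role        TEXT NOT NULL,
    PRIMARY KEY (document_id, person_id, role)
);
CREATE TABLE series (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- position may be NULL, so the same table can also represent an unordered 'set'.
CREATE TABLE series_member (
    series_id   INTEGER NOT NULL REFERENCES series(id),
    document_id INTEGER NOT NULL REFERENCES document(id),
    position    INTEGER,
    PRIMARY KEY (series_id, document_id)
);
"""


def creators_of(conn: sqlite3.Connection, document_id: int) -> list[tuple[str, str]]:
    """Return (person name, role) pairs for a document, one row per role."""
    return conn.execute(
        """SELECT p.name, dc.role
           FROM document_creator AS dc
           JOIN person AS p ON p.id = dc.person_id
           WHERE dc.document_id = ?
           ORDER BY p.name, dc.role""",
        (document_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(CORE_SCHEMA)
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ann Author')")
    conn.execute("INSERT INTO document (id, title) VALUES (1, 'Some Book')")
    # The same person holds two roles on the same document.
    conn.executemany(
        "INSERT INTO document_creator VALUES (1, 1, ?)",
        [("author",), ("illustrator",)],
    )
    print(creators_of(conn, 1))
    # [('Ann Author', 'author'), ('Ann Author', 'illustrator')]
```

Putting the role in the primary key rejects a duplicate (person, role) pair on a document while still allowing several distinct roles, which matches the Creators description without a separate roles table.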
These entity types likely will not be built directly into the application. Instead, they will probably be implemented as plugins (shipped with the application and loaded by default… but still not actually hardcoded).
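Whichever plugin ends up providing the File entity, its Size, CRC64, and SHA256 attributes have to be computed from the bytes on disk. Here is a minimal Python sketch that computes all three in one pass over a file. The post doesn’t name a CRC-64 variant, so this assumes CRC-64/XZ (the ECMA-182 polynomial, as used by xz); Python’s standard library has no CRC-64, so it is hand-written. The names `crc64_xz`, `FileFingerprint`, and `fingerprint` are hypothetical.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO

# Assumption: CRC-64/XZ, i.e. polynomial 0x42F0E1EBA9EA3693 processed bit-reflected,
# with initial value and final XOR both all ones.
_CRC64_POLY_REFLECTED = 0xC96C5795D7870F42
_MASK64 = 0xFFFFFFFFFFFFFFFF


def _make_crc64_table() -> list[int]:
    """Precompute the CRC of every possible byte value (table-driven CRC)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC64_POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC64_TABLE = _make_crc64_table()


def crc64_xz(data: bytes, crc: int = 0) -> int:
    """Update a CRC-64/XZ value with data; pass the previous result to continue."""
    crc ^= _MASK64  # undo the final XOR so results can be chained
    for byte in data:
        crc = _CRC64_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ _MASK64


@dataclass(frozen=True)
class FileFingerprint:
    size: int    # number of bytes
    crc64: int   # CRC-64/XZ value
    sha256: str  # hex digest


def fingerprint(stream: BinaryIO, chunk_size: int = 1 << 16) -> FileFingerprint:
    """Read the stream once, updating the size and both checksums per chunk."""
    sha = hashlib.sha256()
    crc = 0
    size = 0
    while chunk := stream.read(chunk_size):
        sha.update(chunk)
        crc = crc64_xz(chunk, crc)
        size += len(chunk)
    return FileFingerprint(size=size, crc64=crc, sha256=sha.hexdigest())


if __name__ == "__main__":
    # For a real file, pass open(path, "rb") instead of an in-memory stream.
    fp = fingerprint(io.BytesIO(b"123456789"))
    print(fp.size, f"{fp.crc64:016x}", fp.sha256)
```

The string "123456789" is the conventional CRC check input, and CRC-64/XZ’s published check value for it is 0x995DC9BBDF1939FA, which is a quick way to confirm the table is right. A pure-Python byte loop will be slow on multi-gigabyte media files, so a real implementation would probably lean on a compiled library for the CRC.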
The tables below hold not metadata about the objects being described (name, location, etc.), but metadata used internally by the application to interpret the data entities. Table names are tentative and subject to change, but I know what they do.
- ML_Entity defines the entity types used internally.
  - Table Name. The table where content for this entity type is stored.
- ML_Attribute defines the attributes of an entity, including cardinality and certain constraints. For instance, a ‘rating’ attribute might be defined as a real number in the range [0, 5]. I haven’t yet fully worked out how this table will be implemented.
  - Attribute Name. Name of the attribute, used in queries and the like.
  - Attribute Label. How to label the attribute in the user interface.
  - Attribute Type. ‘Primitive’ data type of the attribute.
  - Column Name. Name of the table column that stores the attribute’s value. May be null if the attribute is many:many (i.e., defined by a link table between this entity’s table and another).
- ML_TagType defines a type of tag.
  - Entity Type. The entity type that tags of this type can be applied to.
  - Tag Delimiter. How to separate tags in a list (often ‘,’, but it might be useful to separate on strings that don’t appear in values we might want to use).
  - Hierarchy Delimiter. How to separate hierarchy levels in the tag browser, which groups tags under their parents (as with the tag delimiter, one value will probably be common, but there may be times another value works better).
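A quick sketch of how the two ML_TagType delimiters might be applied: split a tag string on the tag delimiter, then split each tag on the hierarchy delimiter to get the levels a tag browser would nest. The `TagType` dataclass, `parse_tags`, and `build_tree` are hypothetical, and ‘/’ as the default hierarchy delimiter is just an assumption for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagType:
    """Hypothetical in-memory mirror of an ML_TagType row."""
    entity_type: str
    tag_delimiter: str = ","
    hierarchy_delimiter: str = "/"  # assumed default; the post doesn't pick one


def parse_tags(text: str, tag_type: TagType) -> list[tuple[str, ...]]:
    """Split on the tag delimiter, then split each tag into hierarchy levels."""
    tags = []
    for raw_tag in text.split(tag_type.tag_delimiter):
        levels = tuple(
            level.strip()
            for level in raw_tag.split(tag_type.hierarchy_delimiter)
            if level.strip()
        )
        if levels:  # skip empty entries, e.g. from a trailing delimiter
            tags.append(levels)
    return tags


def build_tree(tags: list[tuple[str, ...]]) -> dict:
    """Nest parsed tags the way a tag browser might group them."""
    tree: dict = {}
    for levels in tags:
        node = tree
        for level in levels:
            node = node.setdefault(level, {})
    return tree


if __name__ == "__main__":
    genres = TagType("document")
    tags = parse_tags("Fiction/Science Fiction, Fiction/Fantasy, Classics", genres)
    print(build_tree(tags))
    # {'Fiction': {'Science Fiction': {}, 'Fantasy': {}}, 'Classics': {}}

    # Values that contain commas need a different tag delimiter.
    names = TagType("document", tag_delimiter=";")
    print(parse_tags("Smith, John; Doe, Jane", names))
    # [('Smith, John',), ('Doe, Jane',)]
```

The second call shows why a per-type tag delimiter is worth having: with ‘,’ the names would be split in half.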
These metadata tables likely will be built directly into the application, and the application will have a facility to modify their content (and the database underneath them). Adding or removing ML_Entity records should add or remove entity types; adding or removing ML_Attribute records should add or remove columns from the entity’s table (or add or remove link tables, for many:many attributes).
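As a rough illustration of metadata records driving the schema, the Python sketch below (again using `sqlite3`, with hypothetical table layouts and function names) records a new attribute in `ml_attribute` and adds the matching column in the same transaction, so a failed `ALTER TABLE` rolls the metadata row back as well. The ‘rating’ example from ML_Attribute becomes a `CHECK` constraint. Link tables for many:many attributes and attribute removal (SQLite 3.35 and later support `ALTER TABLE ... DROP COLUMN`) are left out.

```python
import sqlite3

# Hypothetical layout for the ML_Entity and ML_Attribute tables described above.
META_DDL = """
CREATE TABLE ml_entity (
    entity_name TEXT PRIMARY KEY,
    table_name  TEXT NOT NULL UNIQUE
);
CREATE TABLE ml_attribute (
    entity_name     TEXT NOT NULL REFERENCES ml_entity(entity_name),
    attribute_name  TEXT NOT NULL,
    attribute_label TEXT NOT NULL,
    attribute_type  TEXT NOT NULL,
    column_name     TEXT,  -- NULL would mean a many:many link table (not handled here)
    PRIMARY KEY (entity_name, attribute_name)
);
"""

# Assumed mapping from 'primitive' attribute types to SQLite column types.
SQL_TYPES = {"text": "TEXT", "integer": "INTEGER", "real": "REAL"}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier; DDL can't use ? placeholders for names."""
    return '"' + name.replace('"', '""') + '"'


def add_entity(conn: sqlite3.Connection, entity_name: str, table_name: str) -> None:
    """Register an entity type and create its (initially empty) table."""
    with conn:  # commits on success, rolls back both statements on error
        conn.execute("INSERT INTO ml_entity VALUES (?, ?)", (entity_name, table_name))
        conn.execute(f"CREATE TABLE {quote_ident(table_name)} (id INTEGER PRIMARY KEY)")


def add_attribute(conn, entity_name, attribute_name, label, attribute_type,
                  column_name, minimum=None, maximum=None):
    """Record an attribute in ml_attribute and add its column to the entity table."""
    row = conn.execute(
        "SELECT table_name FROM ml_entity WHERE entity_name = ?", (entity_name,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown entity type: {entity_name}")
    column = quote_ident(column_name)
    column_def = f"{column} {SQL_TYPES[attribute_type]}"
    if minimum is not None and maximum is not None:
        # e.g. a 'rating' restricted to [0, 5]
        column_def += f" CHECK ({column} BETWEEN {float(minimum)!r} AND {float(maximum)!r})"
    with conn:  # the metadata row and the schema change succeed or fail together
        conn.execute(
            "INSERT INTO ml_attribute VALUES (?, ?, ?, ?, ?)",
            (entity_name, attribute_name, label, attribute_type, column_name),
        )
        conn.execute(f"ALTER TABLE {quote_ident(row[0])} ADD COLUMN {column_def}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(META_DDL)
    add_entity(conn, "document", "document")
    add_attribute(conn, "document", "rating", "Rating", "real", "rating", 0, 5)
    print([r[1] for r in conn.execute('PRAGMA table_info("document")')])
    # ['id', 'rating']
```

Keeping both statements inside one `with conn:` block relies on SQLite’s transactional DDL, which is one reason a relational engine fits this metadata-driven design reasonably well.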
This post is likely to see a fair bit of revision as I dig deeper into how this application’s data will be represented. In fact, I know it will because there are other metadata tables I haven’t touched on yet.