Media Library: Simulating Use Cases

A recent post included some quick validation of a tentative design… and failed it! I argue this is a successful validation exercise: if the ‘first-best results’ prove a design or implementation is correct, ‘second-best results’ find a problem before it matters.

Second-best results are just fine by me; they keep me from wasting time on something that won’t work.

Another form of validation is to walk through the steps of doing the work, and seeing where things come together and where they fall apart. I’m going to take a run at a few of my use cases now.

Capturing Images

I have a slew of images I’ve downloaded from various stock art sites. I was consistent enough to organize them by stock art source and date (well, month) I downloaded, but I’ve otherwise really not paid much attention. I am confident I have multiple copies of the same images, because I tend to do that when I find something cool. Not on purpose! But because I didn’t bother to see if I already had it.

Assuming I start with an empty media library, and actually don’t care about Document right now (I’m all about the files), the following things might happen:

  • Define entity types.
    • I will start with the stock File, Creator, Media Type, and Organization definitions.
      • This is a stock art library; I might extend the File attributes to hold image information (resolution, color depth, etc.)
      • I might extend Organization with a URL (for the Organization’s store home page).
    • I will add a File_Store_xm table connecting File and Organization, identifying the organization I downloaded the file from. I could put a URL here if I could figure out exactly what store page I got it from… which isn’t necessarily impossible, if the store has a habit of putting a stock ID number in the file name, as some do.
    • I probably want a File_Artist_xm or File_Photographer_xm or File_Poster_xm table connecting File to the Creator.
    • I will almost certainly want File_Tag and File_Tag_xm to help categorize the images.
  • I’ll create an Organization for each stock art site I’ve downloaded from and name the folders I’m loading from to match. Since there aren’t all that many, I’ll also grab my landing page URL for each site and store that in the Organization as well.
  • For each top-level folder in my download directory, I’ll do a recursive sweep for image files. Image files will be loaded into the media library, non-image files will be ignored.
    • Add a new File (or find an existing one if I have a bitwise-identical file).
    • Connect to the Media Type appropriate to the file format.
    • Connect it to the Organization I downloaded it from, as identified by the top-level directory.
    • Possible oversight: Assuming I want to be able to recreate the original file, I’ll need to store the original file path and location. These can’t go on the File record because there could be more than one. I’ll need a subsidiary table. This could be simulated using a subfile mechanism, but I feel that might not be a great way to do it.
    • Populate the image metadata (read image dimensions and dates and whatnot from the file, JFIF fields if I have a place to put them, and so on).

Once this is done, I will have a set of bitwise-unique files, with the information needed to write them back to their original locations on disk and to find the site where they came from. If I got bitwise-identical stock images from multiple sites I might need to manually intervene to say which downloaded copy came from which site, but I probably don’t really care. They’re the same files, after all.

The files are now functionally deduplicated and I can assign tags as I want to categorize them.
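A rough sketch of that sweep, in Python, might look like the following. It is only an illustration under some assumptions: the names (`sweep`, `file_digest`, `IMAGE_SUFFIXES`) are made up for this example, the list of image extensions is a guess, and a SHA-256 digest stands in for a true bitwise comparison. Each unique file ends up with every (organization, original path) pair it was found under, which is the subsidiary information flagged above as a possible oversight.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the library should accept.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp", ".webp"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file contents in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sweep(download_root: Path) -> dict:
    """Map each content digest to the (organization, original path) pairs found.

    The top-level folder name is treated as the Organization name.
    Non-image files are ignored.
    """
    found = defaultdict(list)
    for top in sorted(p for p in download_root.iterdir() if p.is_dir()):
        for path in sorted(top.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                original = path.relative_to(download_root).as_posix()
                found[file_digest(path)].append((top.name, original))
    return dict(found)
```

Any digest with more than one entry is a file I downloaded more than once; the list of entries is what I would need to write the copies back where they came from.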

Capturing Images, Mark II

Much the same as above, but slightly more sophisticated. I’ve got bundles of the same files from Humble Bundle, itch.io, Unity Asset Store, and Eldamar Studios, among others. I didn’t store them locally by when I downloaded them; they’re grouped by bundle, and the bundles might have subfolders. I want to be able to navigate them all.

  • Same implementation as above, plus:
    • File_Path hierarchical entity (root is the download source folder, entries are delimited with ‘/’ because I’m a UNIX guy at heart and because ‘\’ is a pain to code around).
    • File_Path_xm connecting the File to the leaf folder of the original path and holding the original file name.
  • On loading a File I’ll create any File_Path records needed, and connect the File to the File_Path leaf node via File_Path_xm.

This gives me a navigable structure for the original source paths and a cleaner place to store the original file name. I likely still need to resolve the Organization-Folder/Filename thing manually, but since the top-level File_Path is the organization name it probably isn’t too difficult.
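To check that the hierarchy behaves the way I expect, here is a minimal in-memory sketch in Python. The class and method names (`FilePath`, `FilePathTree`, `get_or_create`, `link`) are hypothetical, and a real implementation would sit on database tables rather than dictionaries. The only things taken from the design are the ‘/’ delimiter, the get-or-create behaviour on load, and the File_Path_xm row carrying the original file name.

```python
class FilePath:
    """One folder in the File_Path hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}

    def full_path(self):
        """Rebuild the '/'-delimited path by walking up to the root."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


class FilePathTree:
    def __init__(self):
        self.roots = {}   # top-level folders (the download source folders)
        self.links = []   # stand-in for File_Path_xm: (file_id, leaf, file name)

    def get_or_create(self, folder):
        """Return the leaf FilePath for a folder, creating missing levels."""
        level, parent, node = self.roots, None, None
        for part in folder.split("/"):
            node = level.get(part)
            if node is None:
                node = FilePath(part, parent)
                level[part] = node
            parent, level = node, node.children
        return node

    def link(self, file_id, original_path):
        """Record where a File was loaded from, keeping its original name."""
        folder, _, file_name = original_path.rpartition("/")
        if not folder:
            raise ValueError("expected at least one folder in the original path")
        leaf = self.get_or_create(folder)
        self.links.append((file_id, leaf, file_name))
        return leaf
```

Linking the same File from two bundle folders creates one shared root node and two leaves, with each File_Path_xm entry keeping its own file name.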

Capturing RPG Books

I have quite a large number of RPG books, from DriveThru, from Open Gaming Store, from Humble Bundle, and more.

  • Same implementation as Capturing Images, Mark II, but this time I will
    • Add Document
    • Add Document_Store_xm (joining Document to Organization — I do get some files/documents directly from the publisher, so I want to say as much)
    • Add Document_Publisher_xm (joining Document to Organization)
    • Add Document_Tag and Document_Tag_xm
    • Add Game_System and Document_Game_System_xm (Game_System is a fairly lightweight hierarchical table)
    • Add Document_Author_xm and Document_Author_xm (each joining Document to Creator)
    • Remove File_Artist_xm, File_Photographer_xm, etc. from the earlier implementation
  • To deal with DriveThru Downloader files
    • Create an Organization for DriveThru/OBS (all documents will have a Document_Store_xm connecting to it)
    • For each subfolder, create an Organization if needed (all documents loaded from that subfolder will have a Document_Publisher_xm connecting to it)
    • For each subsubfolder, create a Document if needed; all files loaded from that subsubfolder will have a Document_File_xm connecting them to that Document. Also create a File_Path if needed.
    • For each file in each subsubfolder, create a File if needed or reuse an existing bitwise-identical File, and connect the File to the Document.
  • When uploading from the DriveThru downloader file store,
    • Create the File_Path as before, but also
    • Create an

Observations and Possible Changes

There are a few things that could use refinement, but they are largely consistent with the rest of the model.

  • Consider ‘extension tables’ rather than embedding directly in a base table. This is mostly of value if an Entity can have different — or multiple — additional fields. This is prompted by Files of different types having different extension fields: image fields, video fields, audio fields, etc. An image file doesn’t need audio metadata, a sound file doesn’t need book metadata such as page count, and so on. These extension tables could even have non-singular cardinality, such as a TIFF file having multiple pages or any file being loaded from more than one source path and file name.
  • Even inasmuch as File captures bitwise-unique files once, it seems I have a common need to capture source location and name. I expect that normally when exporting files from the library I’ll want to create a filename using a template that pulls metadata from various related entities, but I feel like at the least I should have the original file name. Or at least an original file name.
  • I had previously defined Document_File_xm as having a link referencing the File_Path record, but considering the download directory might be associated with the publisher, it might not be a bad idea to connect the path that way also.

Potential specific counterexample to that last point: Tome of Adventure Design… I have it from DriveThru, I have it from Frog God, and I’m pretty sure I have it via Humble Bundle. Assuming these are all bitwise-identical files ultimately published by Frog God, I might have:

  • DriveThru copy in DriveThruFiles/Frog God Games/Tome of Adventure Design/Tome of Adventure Design.pdf
  • FGG copy in Frog God Games/tome-of-adventure-design.pdf
  • Humble Bundle copy in Humble Bundle/RPG Worldbuilding Megabundle/Tome of Adventure Design.pdf
  • Humble Bundle copy in Humble Bundle/RPG Stuff from Frog God and Friends/Matt Finch’s Tome of Adventure Design.pdf

If these are all different documents with bitwise-identical files, I suppose I have some simple redundancy… and that’s fine, that’s the simple case. If I merge them, though, I have

  • Three different stores.
  • One publisher.
  • Four paths.
  • Three file names.
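Those counts are easy to check mechanically. The snippet below is just a quick Python tally over the four example paths, assuming the top-level folder identifies the store (the helper name `summarize` is made up for this example).

```python
def summarize(paths):
    """Count distinct stores (top-level folder), folders, and file names."""
    stores = {p.split("/")[0] for p in paths}
    folders = {p.rpartition("/")[0] for p in paths}
    names = {p.rpartition("/")[2] for p in paths}
    return len(stores), len(folders), len(names)


copies = [
    "DriveThruFiles/Frog God Games/Tome of Adventure Design/Tome of Adventure Design.pdf",
    "Frog God Games/tome-of-adventure-design.pdf",
    "Humble Bundle/RPG Worldbuilding Megabundle/Tome of Adventure Design.pdf",
    "Humble Bundle/RPG Stuff from Frog God and Friends/Matt Finch’s Tome of Adventure Design.pdf",
]

store_count, folder_count, name_count = summarize(copies)
```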

It looks like I cannot reasonably collapse Document_Store_xm to one entry and have the File_Path connected (Humble Bundle follows two different paths to different file names), and I can’t have the Document_Publisher_xm have File_Path connected because Frog God had four different paths here. If I want to track these all, if anything I’d have to connect the File_Path to the Store and/or Publisher. Thankfully I don’t think I’d need to actually do this. I’d keep the original paths simply so I’ve got something I can use to navigate to the document (because I know where I loaded it from) and if I want to create output I’d probably do something like ‘{publisher.name}/{document.title}.{file:extension}’ (pretending I know what my template language looks like) to get a canonical output filename.
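As it happens, Python’s own `str.format` can evaluate a template shaped like that one, which makes for a cheap experiment. This is only a sketch: the classes `Publisher`, `Document`, and `FileRef` are hypothetical stand-ins for the real entities, and treating the part after the colon as a field name (via `__format__`) is my assumption about what such a template language might mean, not a decision about the real one.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Publisher:
    name: str


@dataclass
class Document:
    title: str


class FileRef:
    """Hypothetical File wrapper so '{file:extension}' resolves via __format__."""

    def __init__(self, original_name):
        self.original_name = original_name

    def __format__(self, spec):
        path = PurePosixPath(self.original_name)
        if spec == "extension":
            return path.suffix.lstrip(".")
        if spec in ("", "name"):
            return path.name
        raise ValueError(f"unknown file field: {spec!r}")


def render(template, **entities):
    """Fill the template from the named entities."""
    return template.format(**entities)


canonical = render(
    "{publisher.name}/{document.title}.{file:extension}",
    publisher=Publisher("Frog God Games"),
    document=Document("Tome of Adventure Design"),
    file=FileRef("tome-of-adventure-design.pdf"),
)
```

Whichever original file name I start from, the output name comes from the publisher and document metadata, so all four copies would collapse to the same canonical path.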

A few tweaks needed, but nothing too horrendous. As always, it’s when you get into it that you find out these little things.
