Media Library: Thinking and Rethinking Identity and Redundancy.

My last post explored a few of my use cases and how they might work… and ended with a situation that puzzled me: how do I resolve the ‘original file name’ of a document file, when I have merged multiple copies of the same document (and file) from different sources?

For example, I have bought four copies of Matt Finch’s Tome of Adventure Design (and backed the recent Kickstarter for the revised version, but that is clearly a different document because it has different content).

  • From Frog God Games’ online store, I have a copy in Downloads/FrogGodGames/TomeOfAdventureDesign.pdf
  • From Humble Bundle, I have a copy in Bundles/Humble Bundle/RPGs from Frog God and Friends/Tome Of Adventure Design.pdf
  • From Humble Bundle, I have a copy in Bundles/Humble Bundle/RPG World Building/TomeOfAdventureDesign.pdf
  • From DriveThruRPG, I have a copy in DriveThruFiles/Frog God Games/Tome of Adventure Design/Tome of Adventure Design.pdf (actually paid for via a Bundle of Holding, but since it was delivered via DTRPG that’s where I have it)

(Yes, I bought it originally from the publisher, then picked up multiple copies via bundles. It happens.)

If I load them all into the media library I’ll have four Documents. Each one has a single file, each one has a single download source, it’s all good.

As far as it goes, at least. However, one could argue that they are all representations of the same physical object, a most excellent game book by Matt Finch. It would be reasonable to think that almost all metadata would be the same. After all, each copy has the same author, same artists, same tags, and so on. It could make sense to merge them into a single Document, so that all the document metadata matches.

The exceptions? Possibly the files; if they are not bitwise-identical I might want to keep all different copies or I might want to choose one and discard the others… that’s where things get complicated.
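Whether two copies are bitwise-identical is easy to check before deciding what to keep. Here is a minimal sketch, assuming the copies are ordinary files on disk and that a SHA-256 digest is an acceptable stand-in for a byte-by-byte comparison. The function names sha256_of and group_identical are mine, invented for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_identical(paths):
    """Group paths by content digest; each group holds bitwise-identical copies."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(Path(path))
    return dict(groups)
```

If the four copies above all land in one group, there is really only one file and the choice is trivial. If they land in several groups, that is when I have to decide which one to keep.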

The simplest thing to do is keep a single copy of each piece. That is, keep my favorite version of each file in the package (in this case there’s one, so I would keep only the one I like best) and discard the rest. Similarly, I could keep only my favorite ‘store’ information and discard the others (or even discard them all, if I don’t care where I got them).

The more complicated thing would be to retain all the source details (i.e. which version of the files I downloaded from which store). I keep only a single copy of each bitwise-unique file, so I can ‘duplicate’ a file cheaply by referencing it more than once. In this case I would probably not use a Document_File_xm table, but instead a Document_Store_xm table (joining Document to Organization as a many:many) and a Document_Store_File_xm table that connects that to the File table.
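To make that concrete, here is a rough sketch of the two join tables using Python's sqlite3 module. The table names come from the paragraph above. Every column name is an assumption, as are the surrogate document_store_id key and the original_path column. The digests are placeholders, and the split into two distinct files is made up for illustration (I have not actually compared the four copies here).

```python
import sqlite3

SCHEMA = """
CREATE TABLE Document (
    document_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE Organization (
    organization_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE
);
CREATE TABLE File (
    file_id INTEGER PRIMARY KEY,
    sha256  TEXT NOT NULL UNIQUE   -- one row per bitwise-unique file
);
-- One row per acquisition of a Document from a store (assumed surrogate key).
CREATE TABLE Document_Store_xm (
    document_store_id INTEGER PRIMARY KEY,
    document_id       INTEGER NOT NULL REFERENCES Document(document_id),
    organization_id   INTEGER NOT NULL REFERENCES Organization(organization_id)
);
-- Which File each acquisition delivered, and under what original name.
CREATE TABLE Document_Store_File_xm (
    document_store_id INTEGER NOT NULL
        REFERENCES Document_Store_xm(document_store_id),
    file_id           INTEGER NOT NULL REFERENCES File(file_id),
    original_path     TEXT NOT NULL,
    PRIMARY KEY (document_store_id, file_id)
);
"""

# (document_store_id, organization_id, file_id, original_path)
ACQUISITIONS = [
    (1, 1, 1, "Downloads/FrogGodGames/TomeOfAdventureDesign.pdf"),
    (2, 2, 1, "Bundles/Humble Bundle/RPGs from Frog God and Friends/"
              "Tome Of Adventure Design.pdf"),
    (3, 2, 1, "Bundles/Humble Bundle/RPG World Building/"
              "TomeOfAdventureDesign.pdf"),
    (4, 3, 2, "DriveThruFiles/Frog God Games/Tome of Adventure Design/"
              "Tome of Adventure Design.pdf"),
]


def build_example():
    """Create an in-memory database holding one merged Document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Document VALUES (1, 'Tome of Adventure Design')")
    conn.executemany(
        "INSERT INTO Organization VALUES (?, ?)",
        [(1, "Frog God Games"), (2, "Humble Bundle"), (3, "DriveThruRPG")],
    )
    # Placeholder digests: pretend the copies reduce to two distinct files.
    conn.executemany(
        "INSERT INTO File VALUES (?, ?)",
        [(1, "placeholder-digest-a"), (2, "placeholder-digest-b")],
    )
    for store_id, org_id, file_id, path in ACQUISITIONS:
        conn.execute(
            "INSERT INTO Document_Store_xm VALUES (?, 1, ?)", (store_id, org_id)
        )
        conn.execute(
            "INSERT INTO Document_Store_File_xm VALUES (?, ?, ?)",
            (store_id, file_id, path),
        )
    return conn


def sources_for(conn, document_id):
    """Return (store name, original path, file_id) rows for one Document."""
    return conn.execute(
        """
        SELECT o.name, dsf.original_path, dsf.file_id
        FROM Document_Store_xm ds
        JOIN Organization o ON o.organization_id = ds.organization_id
        JOIN Document_Store_File_xm dsf
          ON dsf.document_store_id = ds.document_store_id
        WHERE ds.document_id = ?
        ORDER BY ds.document_store_id
        """,
        (document_id,),
    ).fetchall()
```

In this sketch the ‘original file name’ lives on the Document_Store_File_xm row rather than on the File, so one stored file can carry a different name for each source. The surrogate key is there because I got the book from Humble Bundle twice, and a plain (document, organization) key would not allow that.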

Closing Comments

I can end up with multiple files all representing the same physical object.

Simplest is to treat them all as distinct entities in their own right (i.e. each one has its own Document and Files), but that gives me redundant copies… and potentially discrepancies and inconsistency between them.

Next simplest would be to merge them. That means merging some redundant metadata (such as the author: if I merge two or more Documents, I would expect to keep either the authors of a chosen master Document or the union of all authors, but either way I end up with one set) and discarding other redundant information (such as different files: keep only the ‘best’ File and discard the others). I might decide to discard some attributes entirely, rather than pick a definitive or canonical version (such as getting rid of the Store information: I have the definitive Files and Document, and don’t care about the Store).
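The merge rules above can be sketched in a few lines. This is only an illustration under my own assumptions: the Document class, its fields, and the merge_documents function are hypothetical, and it assumes the master Document's files are the ‘best’ ones.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    authors: set = field(default_factory=set)
    files: list = field(default_factory=list)
    stores: list = field(default_factory=list)


def merge_documents(master, others, *, union_authors=True, keep_stores=False):
    """Merge duplicate Documents into one, following the rules sketched above."""
    # Authors: either the master's set alone, or the union across all copies.
    authors = set(master.authors)
    if union_authors:
        for doc in others:
            authors |= doc.authors

    # Stores: dropped entirely unless asked for; duplicates removed in order.
    stores = []
    if keep_stores:
        for doc in [master, *others]:
            for store in doc.stores:
                if store not in stores:
                    stores.append(store)

    # Files: keep only the master's; the other copies' files are discarded.
    return Document(
        title=master.title,
        authors=authors,
        files=list(master.files),
        stores=stores,
    )
```

Either way I end up with one Document and one set of authors. The only choices are which copy is the master and how much of the rest I throw away.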

Both options above can work with the model I’ve described in previous posts. I can make a more complex model that allows me to keep all the information. I see how to do it and know it can be done.

The real question is… do I care that much?
