Media Library: Just What Metadata Is There?

In looking at the data models, I’ve focused mostly on ‘user-defined columns’ in the expectation that I’d use the same mechanism for ‘built-in columns’.

Which raises the possibility of someone taking the library and bending it in strange ways, doing things I had no reason to think they would.

If you know me, you’ll recognize that I’m totally okay with this. One of the signs of a good tool is that it does well what it’s made to do… but another sign is that it can do well things that were not imagined when it was built.

Regardless, what I haven’t done is look specifically at what metadata I’m starting with (via calibre) and what I might do to extend this.

Calibre Built-In Fields

Calibre has the following built-in fields. These all apply to books; there is no built-in mechanism for other entity types (which basically are strings, just applied in various ways). The ‘order’ is the default order in which they are displayed, but this can be overridden in places.

In fact, it appears order is the only field that can be changed in the built-in fields. Everything else is unchangeable.

| Order | Column Header | Lookup Name | Type | Description |
|---|---|---|---|---|
| 0 | On Device | ondevice | Yes/No with text | Flag indicating whether the book is present on the attached device (i.e. ebook reader). |
| 1 | Title | title | Text | Title of the book. |
| 2 | Authors | authors | Ampersand separated text, shown in the Tag browser | List of authors, presented in (firstname lastname) order, separated by ampersands. |
| 3 | Date | timestamp | Date | Date (timestamp) book was loaded into calibre. |
| 4 | Size | size | Floating point numbers | Size of the largest book file in megabytes (i.e. if a book has a 0.90MB epub and a 0.62MB text file, this will show 0.9MB). Not shown in the document metadata editor (it is not editable) but is shown for each book file. |
| 5 | Rating | rating | Ratings, shown with stars | ‘No rating’ or 1..5 stars. No decimal values. |
| 6 | Tags | tags | Comma separated text, like tags, shown in the Tag browser | List of user-specified tags (may be taken from document metadata, such as PDF keywords, but can be edited by the user). |
| 7 | Series | series | Text column for keeping series-like information | Name of the series and the book’s position within the series. |
| 8 | Publisher | publisher | Text, column shown in the Tag browser | Name of the book’s publisher. |
| 9 | Published | pubdate | Date | Date the book was published (shown as ‘MMM yyyy’ by default, can be tweaked via ‘gui_pubdate_display_format’). |
| 10 | Modified | last_modified | Date | Date the book entry was last updated. Not shown in the metadata editor (it is an internal value and not editable) but can be displayed in the book list and in book details pane. |
| 11 | Languages | languages | Comma separated text, like tags, shown in the Tag browser | Behaves much like tags, but does have some additional uses (such as the ‘Find Duplicates’ plugin taking languages into account when looking for duplicates). |

There also are some other columns added to support those above.

| Order | Column Header | Lookup Name | Type | Description |
|---|---|---|---|---|
| ? | Author sort | author_sort | Text | List of author names in (last name, given names) order, separated by ampersands. |
| ? | Comments | comments | Long text | Freeform text (HTML markup, I think) about the book. |
| ? | | cover | Boolean | True if the book has a cover image, otherwise False. |
| ? | Identifiers | identifiers | Comma-separated text | Acts something like tags, but with special handling. Each value is ‘idtype:idvalue’ (isbn:isbn-number-goes-here), each id type may be present only once for a book (but IDs of multiple types may be present for a single book), and there may be arbitrary types. There might be special handling (ISBN numbers are validated via check digit, and IDs may be used by Find Duplicates to look for duplicated books). |
| ? | | marked | Boolean | True if the book is ‘marked’ (transient status effect), otherwise False. |
| ? | | series_index | Floating point number | Index of the book in its series, if any. |
| ? | Title Sort | title_sort | Text | Book title in ‘sorted’ format (such as leading ‘The’ and ‘A’ moved to the end, “A Brief History of Time” -> “Brief History of Time, A”). |
| ? | uuid | uuid | Text | Machine-generated Universally Unique Identifier. |

Order is shown as ‘?’ because these are not presented in the list of fields. I found them via the regular expression tab in the bulk metadata editor (powerful tool, but be careful: it comes with no safeties). Not all have column headers; in fact, none of them can be displayed as columns in the book list, but the ones with ‘Column Header’ entries can be presented in the book details pane.
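Two of the supporting fields have behavior that is easy to sketch in code: ‘title_sort’ moves a leading article to the end, and ‘identifiers’ is a list of ‘idtype:idvalue’ pairs with each type allowed only once. The following is a minimal sketch of those two rules as described in the table, not calibre's actual implementation. The function names, the list of articles, and the choice to lowercase id types are all assumptions made for illustration.

```python
from typing import Dict

# Assumed list of leading articles; calibre's real list may differ.
ARTICLES = ("the", "a", "an")


def title_sort(title: str) -> str:
    """Move a leading article to the end of the title."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    return title


def parse_identifiers(text: str) -> Dict[str, str]:
    """Parse 'idtype:idvalue' pairs separated by commas.

    Raises ValueError if an item is malformed or an id type repeats.
    Lowercasing the id type is an assumption of this sketch.
    """
    identifiers: Dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so values may contain colons.
        id_type, sep, id_value = item.partition(":")
        id_type, id_value = id_type.strip().lower(), id_value.strip()
        if not sep or not id_type or not id_value:
            raise ValueError(f"expected 'idtype:idvalue', got {item!r}")
        if id_type in identifiers:
            raise ValueError(f"duplicate identifier type {id_type!r}")
        identifiers[id_type] = id_value
    return identifiers


print(title_sort("A Brief History of Time"))  # Brief History of Time, A
```

Note that this sketch would break on identifier values that themselves contain commas; a real implementation would need an escaping rule for that.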

Calibre Custom Fields

Of course, calibre doesn’t come with any custom fields built in. If they were built in, they’d be built-in fields.

That said, calibre does support custom field definitions. The following metadata types are supported.

| Column Type | Description | Additional Fields |
|---|---|---|
| Text, column shown in the Tag browser | | Show checkmarks |
| Comma separated text, like tags, shown in the Tag browser | | Contains names |
| Long text, like comments, not shown in the Tag browser | | Column heading, Interpret this column as |
| Text column for keeping series-like information | Has associated [Column]_index field | |
| Text, but with a fixed set of permitted values | Hardcoded (in the database and can be changed, but it’s a hardcoded list in the column definition) | Show checkmarks, Values, Colors |
| Date | | Format for dates |
| Floating point numbers | | Format for numbers, Decimals when editing |
| Integers | | Format for numbers |
| Ratings, shown with stars | | Allow half stars |
| Yes/No | | Show |
| Column built from other columns | Uses the template engine to calculate new values based on other book fields. Be warned, ‘column built from other columns’ can do horrible things to performance. | Template, Sort/search column by, Show in Tag browser, Show as HTML in Book details, Show in comments in book details, Column heading |
| Column built from other columns, behaves like tags | | Template, Sort/search column by, Show in Tag browser, Show as HTML in Book details |

Each custom column has the following configuration information. All custom columns have ‘Lookup name’, ‘Column heading’, ‘Column type’, ‘Description’, and ‘Default value’. The first three must be specified; the last two may be blank.

| Built-In Field | Description/Notes |
|---|---|
| Lookup name | Name of the column internally, used for queries and templates. For custom columns this is presented here without adornment, but when referenced in a template or the like will have ‘#’ prefixed. When accessed via the calibredb command line application, will have ‘*’ prefixed. |
| Column heading | Heading to use when presenting the field (in the book list or in the book details pane). |
| Column type | Column type as defined above. |
| Description | Text describing/explaining the field’s purpose and intent. May be blank. |
| Default value | Value to assign to the book if no value is provided by the user/on input. May be blank. |

Each custom column may have additional configuration fields, as described below.

| Additional Field | Description/Notes |
|---|---|
| Allow half stars | Used by ‘Ratings’, indicates whether half-star values (3.5 stars) are allowed. |
| Column heading | Label to use when displaying the custom column. |
| Colors | A ‘Text, fixed set of values’ column can have the field colored based on the value picked. May be blank or have one named color for each value. Does not affect other columns (use the ‘column coloring’ rules via ‘Look & feel’). |
| Contains names | Indicates that the comma separated text contains a list of people’s names; when presented in the Tag browser they are grouped by first initial of last name and presented in order of (last name, given names). |
| Decimals when editing | How many digits to allow after the decimal when editing. |
| Format for dates | How to present the date/time, using calibre’s date format codes (such as ‘MMM yyyy’). |
| Format for numbers | How to present the number (integer or floating point), per Python formatting syntax. |
| Interpret this column as | How to render the text in a long text column: ‘short text, like a title’ (no linebreaks), ‘plain text’ (raw text with linebreaks), ‘HTML’, or ‘markdown’. |
| Show | How to render a Yes/No field: icon (check/X), text (yes/no), both. |
| Show as HTML in Book details | Indicates whether a derived column should be formatted as HTML (mostly useful to allow hyperlinks to be opened in the browser). |
| Show checkmarks | Shows a green checkmark if the value is ‘checked’, ‘true’, or ‘yes’; shows a red X for ‘unchecked’, ‘false’, or ‘no’; otherwise shows nothing. |
| Show in comments in book details | Indicates whether a derived column should be presented as a comment in the book details pane. |
| Show in Tag browser | Indicates whether a derived column should be displayed in the Tag browser. It appears almost all other types can be, but these ones might need to be included explicitly because of what they do to performance: populating the Tag browser requires calculating the values for all books. |
| Sort/search column by | How to interpret a derived column (text, number, date, yes/no). |
| Template | Python template used to create a column built from other fields. May call functions or formatting instructions. If you want to nest functions, such as calling ‘f(g(x))’, it appears you need to have another field for the intermediate value. |
| Values | Comma-separated list of values allowed in a constrained column. Stored in the column definition in the database, not as a database lookup table. |
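The ‘Show checkmarks’ rule is effectively a three-state mapping, which can be sketched as below. This is an illustration of the rule as described in the table, not calibre's code. The function name is hypothetical, and treating the comparison as case-insensitive is an assumption.

```python
from typing import Optional

CHECKED = {"checked", "true", "yes"}
UNCHECKED = {"unchecked", "false", "no"}


def checkmark_state(value: Optional[str]) -> Optional[bool]:
    """Return True (green check), False (red X), or None (show nothing)."""
    if value is None:
        return None
    # Assumption: matching ignores case and surrounding whitespace.
    normalized = value.strip().lower()
    if normalized in CHECKED:
        return True
    if normalized in UNCHECKED:
        return False
    return None


for text in ("Yes", "unchecked", "maybe"):
    print(text, "->", checkmark_state(text))
```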

This is a fairly rich set of options, when it comes to defining fields. However, I believe I see ways to make it a little more generic.

Media Library Field Definitions

It is not yet time to define actual fields, but I think I can see some common elements in the field types and definitions above that could make things more abstract and more powerful.

Primitive Data Types

All the usual suspects: string, number (integer or floating point), date, date/time (timestamp), boolean.

Complex Data Types

The only complex data type I have at the moment is ‘name’. This is used for people names, book titles, and so on. Has a ‘display value’ and a ‘sort value’.
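A rough sketch of what a ‘name’ value might look like, assuming the sort value is optional and falls back to the display value when it is missing. The class and attribute names here are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Name:
    """A value with a form for display and a form for sorting."""
    display: str
    sort: Optional[str] = None  # assumption: optional, defaults to display

    @property
    def sort_key(self) -> str:
        # casefold() gives a case-insensitive ordering.
        return (self.sort or self.display).casefold()


names = [
    Name("Ursula K. Le Guin", "Le Guin, Ursula K."),
    Name("Stephen Hawking", "Hawking, Stephen"),
    Name("Anonymous"),
]
for name in sorted(names, key=lambda n: n.sort_key):
    print(name.display)
```

The same shape would cover book titles: the display value is “A Brief History of Time” and the sort value is “Brief History of Time, A”.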

I considered including ‘list’ here, but I feel that might be better as a data type modifier.

Lists

An entity may have a list of values of almost any type. A book might have a list of titles or of print dates. A file might have a list of fully-qualified file names (these should be unique, but would let me restore the original file if needed… or identify that the file at that URI is already loaded). Most of the other examples I thought of tonight would probably be better as a list of references (i.e. many:many relationship to another entity), but I can see uses for this.

A list should be able to have arbitrary delimiters. For some, commas might be appropriate, for others ‘:::’, others might like a ‘pipe’ (‘|’). This should probably be configurable in the field definition.
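One way to treat ‘list’ as a modifier rather than a type is to wrap an item parser and a delimiter in the field definition. The sketch below assumes that approach; `ListField` and its attributes are hypothetical names, and escaping of delimiters inside values is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ListField(Generic[T]):
    """A field holding a list of items, with a configurable delimiter."""
    name: str
    parse_item: Callable[[str], T]
    format_item: Callable[[T], str] = str
    delimiter: str = ","

    def parse(self, text: str) -> List[T]:
        # Split on the delimiter, trim whitespace, and skip empty items.
        parts = (part.strip() for part in text.split(self.delimiter))
        return [self.parse_item(part) for part in parts if part]

    def format(self, values: Iterable[T]) -> str:
        return self.delimiter.join(self.format_item(v) for v in values)


titles = ListField("titles", parse_item=str, delimiter=":::")
print_years = ListField("print_years", parse_item=int, delimiter="|")

print(titles.parse("Dune ::: Dune Messiah"))
print(print_years.format(print_years.parse("1965|1984")))
```

Because the item type is just a parser, the same modifier works for strings, numbers, dates, or a complex type such as ‘name’.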

References

Some attributes will be connections to other entities. A book has authors (or ‘many people in many roles’: authors, editors, developers, etc.) and ‘book tags’. In both cases these could be many:many relationships (just as a book has multiple authors, an author may have multiple books). A book in a series would not have ‘series’ and ‘series_index’ fields; it likely would be a reference to a ‘series’ entity with an index value on the relationship.

Speaking of which, the relationship likely should identify any secondary values stored with that relationship. In the case of a series, I should be able to ask “what is the first book in this series?” as easily as “what series is this book in?”
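A minimal in-memory sketch of a relationship that carries its own secondary value, assuming books and series are identified by plain strings. All class and method names are hypothetical. The point is only that storing the index on the link makes both questions a simple filter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SeriesMembership:
    """One book's place in one series; the index lives on the relationship."""
    book: str
    series: str
    index: float


class SeriesLinks:
    def __init__(self) -> None:
        self._links: List[SeriesMembership] = []

    def link(self, book: str, series: str, index: float) -> None:
        self._links.append(SeriesMembership(book, series, index))

    def series_of(self, book: str) -> List[Tuple[str, float]]:
        """What series is this book in, and at what position?"""
        return [(m.series, m.index) for m in self._links if m.book == book]

    def books_in(self, series: str) -> List[str]:
        """Books in a series, ordered by their index on the relationship."""
        members = [m for m in self._links if m.series == series]
        return [m.book for m in sorted(members, key=lambda m: m.index)]

    def first_book(self, series: str) -> Optional[str]:
        """What is the first book in this series?"""
        books = self.books_in(series)
        return books[0] if books else None


links = SeriesLinks()
links.link("Book B", "Example Series", 2.0)
links.link("Book A", "Example Series", 1.0)
print(links.first_book("Example Series"))  # Book A
print(links.series_of("Book B"))
```

In a database this would likely be a link table with a column for the index, which is also what allows a book to belong to more than one series.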

As with lists, I should be able to define any delimiters needed. And possibly a template for parsing/presenting the information.

Hierarchies

Almost any entity can be hierarchical: documents have subdocuments, files have subfiles, series have subseries, and so on.
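Hierarchy could be handled the same way for every entity type by giving each entity an optional parent. The sketch below assumes a simple tree (one parent per entity); `Node` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Any entity (document, file, series, ...) that may have a parent."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the root down to this node."""
        names: List[str] = []
        node: Optional[Node] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


series = Node("Example Series")
subseries = series.add_child("Example Subseries")
print(" / ".join(subseries.path()))  # Example Series / Example Subseries
```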

Closing Comments

Calibre has a rich set of options when it comes to custom columns. Where calibre builds quite a bit of behavior into the column type, I think I can get something a little more powerful by going to a combination of primitives and modifiers to their behavior. This should give me more options overall, while also allowing me to generalize the implementation.
