Media Library: Just What Metadata Is There?

In looking at the data models, I’ve focused mostly on ‘user-defined columns’ in the expectation that I’d use the same mechanism for ‘built-in columns’.

Which raises the possibility of someone taking the library and bending it in strange ways, doing things I had no reason to think they would.

If you know me, you’ll recognize that I’m totally okay with this. One of the signs of a good tool is that it does well what it’s made to do… but another sign is that it can do well things that were not imagined when it was built.

Regardless, what I haven’t done is look specifically at what metadata I’m starting with (via calibre) and what I might do to extend this.

Calibre Built-In Fields

Calibre has the following built-in fields. These all apply to books; there is no built-in mechanism for other entity types (which basically are strings, just applied in various ways). The ‘order’ is the default order in which they are displayed, but this can be overridden in places.

In fact, it appears order is the only attribute of a built-in field that can be changed. Everything else is fixed.

Order	Column Header	Lookup Name	Type	Description
0	On Device	ondevice	Yes/No with text	Flag indicating whether the book is present on the attached device (e.g. an ebook reader).
1	Title	title	Text	Title of the book.
2	Authors	authors	Ampersand separated text, shown in the Tag browser	List of authors, presented in (firstname lastname) order, separated by ampersands.
3	Date	timestamp	Date	Date (timestamp) book was loaded into calibre.
4	Size	size	Floating point numbers	Size of the largest book file in megabytes (e.g. if a book has a 0.90MB epub and a 0.62MB text file, this will show 0.9MB). Not shown in the document metadata editor (it is not editable) but is shown for each book file.
5	Rating	rating	Ratings, shown with stars	‘No rating’ or 1..5 stars. No decimal values.
6	Tags	tags	Comma separated text, like tags, shown in the Tag browser	List of user-specified tags (may be taken from document metadata, such as PDF keywords, but can be edited by the user).
7	Series	series	Text column for keeping series-like information	Name of the series and the book’s position within the series.
8	Publisher	publisher	Text, column shown in the Tag browser	Name of the book’s publisher.
9	Published	pubdate	Date	Date the book was published (shown as ‘MMM yyyy’ by default, can be tweaked via ‘gui_pubdate_display_format’).
10	Modified	last_modified	Date	Date the book entry was last updated. Not shown in the metadata editor (it is an internal value and not editable) but can be displayed in the book list and in book details pane.
11	Languages	languages	Comma separated text, like tags, shown in the Tag browser	Behaves much like tags, but does have some additional uses (such as the ‘Find Duplicates’ plugin taking languages into account when looking for duplicates).

There also are some other columns added to support those above.

Order	Column Header	Lookup Name	Type	Description
?	Author sort	author_sort	Text	List of author names in (last name, given names) order, separated by ampersands.
?	Comments	comments	Long text	Freeform text (HTML markup, I think) about the book.
?		cover	Boolean	True if the book has a cover image, otherwise False.
?	Identifiers	identifiers	Comma-separated text	Acts something like tags, but with special handling. Each value is ‘idtype:idvalue’ (isbn:isbn-number-goes-here), each id type may be present only once for a book (but IDs of multiple types may be present for a single book), and there may be arbitrary types. There might be special handling (ISBN numbers are validated via check digit, and IDs may be used by Find Duplicates to look for duplicated books).
?		marked	Boolean	True if the book is ‘marked’ (transient status effect), otherwise False.
?		series_index	Floating point number	Index of the book in its series, if any.
?	Title Sort	title_sort	Text	Book title in ‘sorted’ format (such as leading ‘The’ and ‘A’ moved to the end, “A Brief History of Time” -> “Brief History of Time, A”).
?	uuid	uuid	Text	Machine-generated Universally Unique Identifier.

Order is shown as ‘?’ because these are not presented in the list of fields. I found them via the regular expression tab in the bulk metadata editor (powerful tool, but be careful: it comes with no safeties). Not all have column headers. In fact, none of them can be displayed as columns in the book list, but the ones with ‘Column Header’ entries can be presented in the book details pane.
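The ‘identifiers’ rules in the table above (comma-separated ‘idtype:idvalue’ pairs, one value per type, check-digit validation for ISBNs) are easy to sketch. This is a minimal illustration in Python, not calibre’s own code: `parse_identifiers` and `isbn13_is_valid` are hypothetical names, only the ISBN-13 check is shown, and values containing commas are not handled.

```python
def parse_identifiers(text: str) -> dict[str, str]:
    """Parse 'isbn:9780306406157, goodreads:12345' into {idtype: idvalue}."""
    identifiers: dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so values may contain colons.
        id_type, sep, id_value = item.partition(":")
        id_type, id_value = id_type.strip(), id_value.strip()
        if not sep or not id_type or not id_value:
            raise ValueError(f"expected 'idtype:idvalue', got {item!r}")
        # Each id type may be present only once per book.
        if id_type in identifiers:
            raise ValueError(f"duplicate identifier type {id_type!r}")
        identifiers[id_type] = id_value
    return identifiers


def isbn13_is_valid(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = [int(c) for c in isbn if c in "0123456789"]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


ids = parse_identifiers("isbn:9780306406157, goodreads:12345")
print(ids, isbn13_is_valid(ids["isbn"]))
```

Keying the result by id type is what enforces the ‘once per book’ rule; a plain list of pairs would need a separate check.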

Calibre Custom Fields

Of course, calibre doesn’t come with any custom fields built in. If they were built in, they’d be built-in fields.

That said, calibre does support custom field definitions. The following metadata types are supported.

Column Type	Description	Additional Fields
Text, column shown in the Tag browser		Show checkmarks
Comma separated text, like tags, shown in the Tag browser		Contains names
Long text, like comments, not shown in the Tag browser		Column heading, Interpret this column as
Text column for keeping series-like information	Has associated [Column]_index field
Text, but with a fixed set of permitted values	The permitted values are a hardcoded list in the column definition (it is stored in the database and can be changed, but it is not a lookup table)	Show checkmarks, Values, Colors
Date		Format for dates
Floating point numbers		Format for numbers, Decimals when editing
Integers		Format for numbers
Ratings, shown with stars		Allow half stars
Yes/No		Show
Column built from other columns	Uses the template engine to calculate new values based on other book fields. Be warned, ‘column built from other columns’ can do horrible things to performance.	Template, Sort/search column by, Show in Tag browser, Show as HTML in Book details, Show in comments in book details, Column heading
Column built from other columns, behaves like tags		Template, Sort/search column by, Show in Tag browser, Show as HTML in Book details

Each custom column has the following configuration information. All custom columns have ‘Lookup name’, ‘Column heading’, ‘Column type’, ‘Description’, and ‘Default value’. The first three must be specified, the last two may be blank.

Configuration Field	Description/Notes
Lookup name	Name of the column internally, used for queries and templates. For custom columns this is presented here without adornment, but when referenced in a template or the like it is prefixed with ‘#’. When accessed via the calibredb command line application, it is prefixed with ‘*’.
Column heading	Heading to use when presenting the field (in the book list or in the book details pane).
Column type	Column type as defined above.
Description	Text describing/explaining the field’s purpose and intent. May be blank.
Default value	Value to assign to the book if no value is provided by the user/on input. May be blank.
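To make the shape of this configuration concrete, here is a rough sketch of how I might model it. The `CustomColumn` class and its property names are my own invention for illustration, not anything calibre exposes; only the five fields and the ‘#’ and ‘*’ prefixes come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomColumn:
    """Hypothetical model of the configuration common to all custom columns."""
    lookup_name: str
    column_heading: str
    column_type: str
    description: str = ""     # may be blank
    default_value: str = ""   # may be blank

    def __post_init__(self) -> None:
        # The first three fields must be specified.
        for name in ("lookup_name", "column_heading", "column_type"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be specified")

    @property
    def template_name(self) -> str:
        """Name as referenced in a template: '#' prefixed."""
        return "#" + self.lookup_name

    @property
    def calibredb_name(self) -> str:
        """Name as referenced via calibredb: '*' prefixed."""
        return "*" + self.lookup_name


genre = CustomColumn("genre", "Genre", "Text, column shown in the Tag browser")
print(genre.template_name, genre.calibredb_name)
```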

Each custom column may have additional configuration fields, as described below.

Additional Field	Description/Notes
Allow half stars	Used by ‘Ratings’, indicates whether half-star values (3.5 stars) are allowed.
Column heading	Label to use when displaying the custom column.
Colors	A ‘Text, but with a fixed set of permitted values’ column can have the field colored based on the value picked. May be blank or have one named color for each value. Does not affect other columns (use the ‘column coloring’ rules via ‘Look & feel’).
Contains names	Comma separated text contains a list of people’s names; when presented in the Tag browser they are grouped by first initial of last name and presented in order of (last name, given names).
Decimals when editing	How many digits to allow after the decimal when editing.
Format for dates	How to present the date/time, using calibre’s date format codes (such as ‘dd MMM yyyy’).
Format for numbers	How to present the number (integer or floating point), per Python formatting syntax.
Interpret this column as	How to render the text in a long text column: ‘short text, like a title’ (no linebreaks), ‘plain text’ (raw text with linebreaks), ‘HTML’, or ‘markdown’.
Show	How to render a Yes/No field: icon (check/X), text (yes/no), both.
Show as HTML in Book details	Indicates whether a derived column should be formatted as HTML (mostly useful to allow hyperlinks to be opened in the browser).
Show checkmarks	Shows a green checkmark if the value is ‘checked’, ‘true’, or ‘yes’; shows a red X for ‘unchecked’, ‘false’, or ‘no’; otherwise shows nothing.
Show in comments in book details	Indicates whether a derived column should be presented as a comment in the book details pane.
Show in tag browser	Indicates whether a derived column should be displayed in the tag browser. It appears almost all other types can be, but these ones might need to be included explicitly because of what they do to performance — populating the tag browser requires calculating the values for all books.
Sort/search column by	How to interpret a derived column (text, number, date, yes/no).
Template	Template (written in calibre’s template language) used to create a column built from other fields. May call functions or formatting instructions. If you want to nest functions, such as calling ‘f(g(x))’, it appears you need to have another field for the intermediate value.
Values	Comma-separated list of values allowed in a constrained column. Stored in the column definition in the database, not as a database lookup table.

This is a fairly rich set of options, when it comes to defining fields. However, I believe I see ways to make it a little more generic.

Media Library Field Definitions

It is not yet time to define actual fields, but I think I can see some common elements in the field types and definitions above that could make things more abstract and more powerful.

Primitive Data Types

All the usual suspects: string, number (integer or floating point), date, date/time (timestamp), boolean.
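As a first pass, the primitives could be an enumeration with a parser per type. This is only a sketch under my own assumptions: the `Primitive` and `parse_value` names are hypothetical, dates are assumed to arrive in ISO 8601 form, and the boolean words are borrowed from the ‘Show checkmarks’ behavior described earlier.

```python
from datetime import date, datetime
from enum import Enum


class Primitive(Enum):
    STRING = "string"
    INTEGER = "integer"
    FLOAT = "float"
    DATE = "date"
    TIMESTAMP = "timestamp"
    BOOLEAN = "boolean"


TRUE_WORDS = ("checked", "true", "yes")
FALSE_WORDS = ("unchecked", "false", "no")


def parse_value(kind: Primitive, raw: str):
    """Convert raw input text to a Python value of the given primitive type."""
    if kind is Primitive.STRING:
        return raw
    raw = raw.strip()
    if kind is Primitive.INTEGER:
        return int(raw)
    if kind is Primitive.FLOAT:
        return float(raw)
    if kind is Primitive.DATE:
        return date.fromisoformat(raw)        # assumes 'YYYY-MM-DD'
    if kind is Primitive.TIMESTAMP:
        return datetime.fromisoformat(raw)    # assumes ISO 8601 date/time
    if kind is Primitive.BOOLEAN:
        lowered = raw.lower()
        if lowered in TRUE_WORDS:
            return True
        if lowered in FALSE_WORDS:
            return False
        raise ValueError(f"not a recognized boolean: {raw!r}")
    raise ValueError(f"unsupported primitive: {kind!r}")


print(parse_value(Primitive.DATE, "2021-03-14"), parse_value(Primitive.BOOLEAN, "Yes"))
```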

Complex Data Types

The only complex data type I have at the moment is ‘name’. It is used for people’s names, book titles, and so on, and has a ‘display value’ and a ‘sort value’.

I considered including ‘list’ here, but I feel that might be better as a data type modifier.
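A ‘name’ with separate display and sort values might look something like the sketch below. The `Name` class and the two helper functions are hypothetical, and the derivation rules are deliberately naive: the title rule only moves a leading ‘A’, ‘An’, or ‘The’, and the person rule simply treats the last word as the family name.

```python
from dataclasses import dataclass

LEADING_ARTICLES = ("a", "an", "the")


def title_sort(title: str) -> str:
    """Move a leading article to the end: 'The X' -> 'X, The'."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in LEADING_ARTICLES:
        return f"{rest}, {first}"
    return title


def person_sort(name: str) -> str:
    """Naive '(last name, given names)' form; last word is the family name."""
    given, _, last = name.rpartition(" ")
    return f"{last}, {given}" if given else name


@dataclass(frozen=True)
class Name:
    display: str
    sort: str   # stored, not computed on demand, so it can be corrected by hand

    @classmethod
    def for_title(cls, title: str) -> "Name":
        return cls(title, title_sort(title))

    @classmethod
    def for_person(cls, name: str) -> "Name":
        return cls(name, person_sort(name))


titles = [Name.for_title(t) for t in ("The Hobbit", "A Brief History of Time", "Dune")]
for name in sorted(titles, key=lambda n: n.sort.lower()):
    print(f"{name.sort!r} (shown as {name.display!r})")
```

Storing the sort value rather than always deriving it matters because the naive rules get multi-word family names wrong; the derived value is only a default.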

Lists

An entity may have a list of values of almost any type. A book might have a list of titles or of print dates. A file might have a list of fully-qualified file names (these should be unique, but would let me restore the original file if needed… or identify that the file at that URI is already loaded). Most of the other examples I thought of tonight would probably be better as a list of references (i.e. many:many relationship to another entity), but I can see uses for this.

A list should be able to have arbitrary delimiters. For some, commas might be appropriate, for others ‘:::’, others might like a ‘pipe’ (‘|’). This should probably be configurable in the field definition.
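Treating ‘list’ as a modifier with a configurable delimiter could be sketched like this. `ListField` is a hypothetical name, and the item parser is just any callable that turns text into a value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ListField:
    """Hypothetical list modifier: a delimiter plus a parser for each item."""
    delimiter: str = ","
    parse_item: Callable[[str], object] = str

    def __post_init__(self) -> None:
        if not self.delimiter:
            raise ValueError("delimiter must not be empty")

    def parse(self, text: str) -> list:
        """Split on the delimiter, trim whitespace, and drop empty items."""
        parts = (part.strip() for part in text.split(self.delimiter))
        return [self.parse_item(part) for part in parts if part]

    def format(self, values) -> str:
        """Join values back into delimited text."""
        return self.delimiter.join(str(value) for value in values)


authors = ListField("&")
print(authors.parse("Jane Doe & John Smith"))   # made-up names
print(ListField(":::").parse("one:::two:::three"))
print(ListField("|", int).parse("1 | 2 | 3"))
```

Because the delimiter lives in the field definition, the same modifier covers ampersand-separated authors and comma-separated tags without a separate column type for each.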

References

Some attributes will be connections to other entities. A book has authors (or ‘many people in many roles’: authors, editors, developers, etc.) and ‘book tags’. In both cases these could be many:many relationships (just as a book has multiple authors, an author may have multiple books). A book in a series would not have ‘series’ and ‘series_index’ fields; it likely would be a reference to a ‘series’ entity with an index value on the relationship.

Speaking of which, the relationship likely should identify any secondary values stored with that relationship. In the case of a series, I should be able to ask “what is the first book in this series?” as easily as “what series is this book in?”
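Here is a small in-memory sketch of a relationship that carries its own secondary value, so that both questions can be answered from the same data. The class and method names are hypothetical and the book and series names are made up; a real implementation would presumably sit on a database join table instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeriesMembership:
    """One book-to-series link; the index lives on the relationship itself."""
    book: str
    series: str
    index: float


class SeriesLinks:
    def __init__(self) -> None:
        self._links: list[SeriesMembership] = []

    def add(self, book: str, series: str, index: float) -> None:
        self._links.append(SeriesMembership(book, series, index))

    def series_of(self, book: str) -> list[tuple[str, float]]:
        """'What series is this book in?' (a book may be in several)."""
        return [(link.series, link.index) for link in self._links if link.book == book]

    def books_in(self, series: str) -> list[str]:
        """Books of a series, ordered by their index within it."""
        members = [link for link in self._links if link.series == series]
        return [link.book for link in sorted(members, key=lambda link: link.index)]

    def first_book(self, series: str) -> Optional[str]:
        """'What is the first book in this series?'"""
        books = self.books_in(series)
        return books[0] if books else None


links = SeriesLinks()
links.add("Second Book", "Example Saga", 2)
links.add("First Book", "Example Saga", 1)
links.add("Prequel Novella", "Example Saga", 0.5)
print(links.first_book("Example Saga"), links.series_of("Second Book"))
```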

As with lists, I should be able to define any delimiters needed, and possibly a template for parsing/presenting the information.

Hierarchies

Almost any entity can be hierarchical: documents have subdocuments, files have subfiles, series have subseries, and so on.
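One simple way to model this is a parent/child link that any entity type could share. The `Node` class below is a hypothetical sketch, not a design decision, and the entity names are made up.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Node:
    """Hypothetical hierarchical entity: a series, document, file, and so on."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Names from the root down to this node."""
        names = []
        node: Optional["Node"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


series = Node("Example Saga")
subseries = series.add_child("Example Saga: The Early Years")
book = subseries.add_child("First Book")
print(" > ".join(book.path()))
print([node.name for node in series.walk()])
```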

Closing Comments

Calibre has a rich set of options when it comes to custom columns. Where calibre builds quite a bit of behavior into the column type, I think I can get something a little more powerful by going to a combination of primitives and modifiers to their behavior. This should give me more options overall, while also allowing me to generalize the implementation.