In looking at the data models, I’ve focused mostly on ‘user-defined columns’ in the expectation that I’d use the same mechanism for ‘built-in columns’.
That raises the possibility of someone taking the library and bending it in strange ways, doing things I had no reason to expect.
If you know me, you’ll recognize that I’m totally okay with this. One of the signs of a good tool is that it does well what it’s made to do… but another sign is that it can do well things that were not imagined when it was built.
Regardless, what I haven’t done is look specifically at what metadata I’m starting with (via calibre) and what I might do to extend this.
Calibre Built-In Fields
Calibre has the following built-in fields. These all apply to books; there is no built-in mechanism for other entity types (which basically are strings, just applied in various ways). The ‘order’ is the default order in which they are displayed, but this can be overridden in places.
In fact, it appears order is the only property of a built-in field that can be changed. Everything else is fixed.
Order | Column Header | Lookup Name | Type | Description |
0 | On Device | ondevice | Yes/No with text | Flag indicating whether the book is present on the attached device (i.e. ebook reader). |
1 | Title | title | Text | Title of the book. |
2 | Authors | authors | Ampersand separated text, shown in the Tag browser | List of authors, presented in (firstname lastname) order, separated by ampersands. |
3 | Date | timestamp | Date | Date (timestamp) book was loaded into calibre. |
4 | Size | size | Floating point numbers | Size of the largest book file in megabytes (i.e. if a book has a 0.90MB epub and a 0.62MB text file, this will show 0.9MB). Not shown in the document metadata editor (it is not editable) but is shown for each book file. |
5 | Rating | rating | Ratings, shown with stars | ‘No rating’ or 1..5 stars. No decimal values. |
6 | Tags | tags | Comma separated text, like tags, shown in the Tag browser | List of user-specified tags (may be taken from document metadata, such as PDF keywords, but can be edited by the user). |
7 | Series | series | Text column for keeping series-like information | Name of the series and the book’s position within the series. |
8 | Publisher | publisher | Text, column shown in the Tag browser | Name of the book’s publisher. |
9 | Published | pubdate | Date | Date the book was published (shown as ‘MMM yyyy’ by default, can be tweaked via ‘gui_pubdate_display_format’). |
10 | Modified | last_modified | Date | Date the book entry was last updated. Not shown in the metadata editor (it is an internal value and not editable) but can be displayed in the book list and in book details pane. |
11 | Languages | languages | Comma separated text, like tags, shown in the Tag browser | Behaves much like tags, but does have some additional uses (such as the ‘Find Duplicates’ plugin taking languages into account when looking for duplicates). |
There also are some other columns added to support those above.
Order | Column Header | Lookup Name | Type | Description |
? | Author sort | author_sort | Text | List of author names in (last name, given names) order, separated by ampersands. |
? | Comments | comments | Long text | Freeform text (HTML markup, I think) about the book. |
? | | cover | Boolean | True if the book has a cover image, otherwise False. |
? | Identifiers | identifiers | Comma-separated text | Acts something like tags, but with special handling. Each value is ‘idtype:idvalue’ (isbn:isbn-number-goes-here), each id type may be present only once for a book (but IDs of multiple types may be present for a single book), and there may be arbitrary types. There might be special handling (ISBN numbers are validated via check digit, and IDs may be used by Find Duplicates to look for duplicated books). |
? | | marked | Boolean | True if the book is ‘marked’ (transient status effect), otherwise False. |
? | | series_index | Floating point number | Index of the book in its series, if any. |
? | Title Sort | title_sort | Text | Book title in ‘sorted’ format (such as leading ‘The’ and ‘A’ moved to the end, “A Brief History of Time” -> “Brief History of Time, A”). |
? | uuid | uuid | Text | Machine-generated Universally Unique Identifier. |
Order is shown as ‘?’ because these are not presented in the list of fields. I found them via the regular expression tab in the bulk metadata editor (powerful tool, but be careful: it comes with no safeties). Not all have column headers. In fact, none of them can be displayed as columns in the book list, but the ones with a ‘Column Header’ entry can be presented in the book details pane.
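The identifier rules above (each value is ‘idtype:idvalue’, each type at most once per book, ISBNs validated by check digit) can be sketched as follows. This is my own illustration, not calibre's code: the function names are hypothetical, lowercasing the id type is an assumption, and only the ISBN-13 check digit is covered.

```python
from __future__ import annotations


def parse_identifiers(text: str) -> dict[str, str]:
    """Parse 'isbn:9780306406157, goodreads:3869' into {id_type: id_value}.

    Raises ValueError when an identifier type appears more than once.
    """
    identifiers: dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only; the value itself may contain colons.
        id_type, sep, id_value = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in identifier {item!r}")
        id_type = id_type.strip().lower()  # assumption: types are case-insensitive
        if id_type in identifiers:
            raise ValueError(f"duplicate identifier type {id_type!r}")
        identifiers[id_type] = id_value.strip()
    return identifiers


def is_valid_isbn13(isbn: str) -> bool:
    """Check an ISBN-13 check digit: digits weighted 1, 3, 1, 3, ... sum to a multiple of 10."""
    digits = [int(c) for c in isbn if c.isdigit()]  # hyphens and spaces are ignored
    if len(digits) != 13:
        return False
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

Storing identifiers as a mapping keyed by type makes the ‘only once per book’ rule fall out of the data structure rather than needing a separate check at every write.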
Calibre Custom Fields
Of course, calibre doesn’t come with any custom fields built in. If they were built in, they’d be built-in fields.
That said, calibre does support custom field definitions. The following metadata types are supported.
Column Type | Description | Additional Fields |
Text, column shown in the Tag browser | | Show checkmarks |
Comma separated text, like tags, shown in the Tag browser | | Contains names |
Long text, like comments, not shown in the Tag browser | | Column heading, Interpret this column as |
Text column for keeping series-like information | Has associated [Column]_index field | |
Text, but with a fixed set of permitted values | Hardcoded (in the database and can be changed, but it’s a hardcoded list in the column definition) | Show checkmarks, Values, Colors |
Date | | Format for dates |
Floating point numbers | | Format for numbers, Decimals when editing |
Integers | | Format for numbers |
Ratings, shown with stars | | Allow half stars |
Yes/No | | Show |
Column built from other columns | Uses the template engine to calculate new values based on other book fields. Be warned, ‘column built from other columns’ can do horrible things to performance. | Template, Sort/search column by, Show in Tag browser, Show as HTML in Book details, Show in comments in book details, Column heading |
Column built from other columns, behaves like tags | | Template, Sort/search column by, Show in Tag browser, Show as HTML in Book details |
Each custom column has the following configuration information. All custom columns have ‘Lookup name’, ‘Column heading’, ‘Column type’, ‘Description’, and ‘Default value’. The first three must be specified, the last two may be blank.
Common Field | Description/Notes |
Lookup name | Name of the column internally, used for queries and templates. For custom columns this is presented here without adornment, but when referenced in a template or the like will have ‘#’ prefixed. When accessed via the calibredb command line application, will have ‘*’ prefixed. |
Column heading | Heading to use when presenting the field (in the book list or in the book details pane). |
Column type | Column type as defined above. |
Description | Text describing/explaining the field’s purpose and intent. May be blank. |
Default value | Value to assign to the book if no value is provided by the user/on input. May be blank. |
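To make the five common settings concrete, here is a minimal sketch of how I might model them. The `CustomColumn` class and its property names are hypothetical (nothing here is calibre's own API); the ‘#’ and ‘*’ prefixes are the ones described in the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomColumn:
    """Hypothetical model of the settings every custom column has."""
    lookup_name: str                      # required
    column_heading: str                   # required
    column_type: str                      # required, e.g. "Integers"
    description: str = ""                 # may be blank
    default_value: Optional[str] = None   # may be blank

    def __post_init__(self) -> None:
        # The first three settings must be specified.
        for name in ("lookup_name", "column_heading", "column_type"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be specified")

    @property
    def template_name(self) -> str:
        """Name as referenced in templates and the like: '#' prefix."""
        return "#" + self.lookup_name

    @property
    def calibredb_name(self) -> str:
        """Name as referenced from the calibredb command line: '*' prefix."""
        return "*" + self.lookup_name
```

For example, `CustomColumn("pages", "Pages", "Integers").template_name` gives `"#pages"`, while `calibredb_name` gives `"*pages"`.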
Each custom column may have additional configuration fields, as described below.
Additional Field | Description/Notes |
Allow half stars | Used by ‘Ratings’, indicates whether half-star values (3.5 stars) are allowed. |
Column heading | Label to use when displaying the custom column. |
Colors | A ‘Text, but with a fixed set of permitted values’ column can have the field colored based on the value picked. May be blank or have one named color for each value. Does not affect other columns (use the ‘column coloring’ rules via ‘Look & feel’). |
Contains names | Comma separated text contains a list of people’s names; when presented in the Tag browser they are grouped by first initial of last name and presented in order of (last name, given names). |
Decimals when editing | How many digits to allow after the decimal when editing. |
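Format for dates | How to present the date/time, using calibre’s date format codes (such as ‘dd MMM yyyy’, the same style as the ‘MMM yyyy’ pubdate default above) rather than Python’s strftime syntax. |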
Format for numbers | How to present the number (integer or floating point), per Python formatting syntax. |
Interpret this column as | How to render the text in a long text column: ‘short text, like a title’ (no linebreaks), ‘plain text’ (raw text with linebreaks), ‘HTML’, or ‘markdown’. |
Show | How to render a Yes/No field: icon (check/X), text (yes/no), both. |
Show as HTML in Book details | Indicates whether a derived column should be formatted as HTML (mostly useful to allow hyperlinks to be opened in the browser). |
Show checkmarks | Shows a green checkmark if the value is ‘checked’, ‘true’, or ‘yes’; shows a red X for ‘unchecked’, ‘false’, or ‘no’; otherwise shows nothing. |
Show in comments in book details | Indicates whether a derived column should be presented as a comment in the book details pane. |
Show in tag browser | Indicates whether a derived column should be displayed in the tag browser. It appears almost all other types can be, but these ones might need to be included explicitly because of what they do to performance — populating the tag browser requires calculating the values for all books. |
Sort/search column by | How to interpret a derived column (text, number, date, yes/no). |
Template | Python template used to create a column built from other fields. May call functions or formatting instructions. If you want to nest functions, such as calling ‘f(g(x))’, it appears you need to have another field for the intermediate value. |
Values | Comma-separated list of values allowed in a constrained column. Stored in the column definition in the database, not as a database lookup table. |
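Two of these options are simple enough to sketch. The checkmark rule is taken straight from the table; treating the comparison as case-insensitive is my assumption. For ‘Format for numbers’ I am assuming the format is a Python `str.format` string in which `{0}` stands for the value. Both function names are hypothetical.

```python
from typing import Optional, Union

CHECKED = {"checked", "true", "yes"}
UNCHECKED = {"unchecked", "false", "no"}


def checkmark_for(value: str) -> Optional[str]:
    """Return 'check', 'cross', or None, following the 'Show checkmarks' rule."""
    normalized = value.strip().lower()  # assumption: matching ignores case
    if normalized in CHECKED:
        return "check"   # would be drawn as a green checkmark
    if normalized in UNCHECKED:
        return "cross"   # would be drawn as a red X
    return None          # any other value shows nothing


def format_number(value: Union[int, float], format_spec: str) -> str:
    """Apply a Python format string such as '{0:.2f}' to a number."""
    return format_spec.format(value)
```

With these, `format_number(3.14159, "{0:.2f}")` yields `"3.14"` and `format_number(1234567, "{0:,d}")` yields `"1,234,567"`.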
This is a fairly rich set of options, when it comes to defining fields. However, I believe I see ways to make it a little more generic.
Media Library Field Definitions
It is not yet time to define actual fields, but I think I can see some common elements in the field types and definitions above that could make things more abstract and more powerful.
Primitive Data Types
All the usual suspects: string, number (integer or floating point), date, date/time (timestamp), boolean.
Complex Data Types
The only complex data type I have at the moment is ‘name’. This is used for people’s names, book titles, and so on. It has a ‘display value’ and a ‘sort value’.
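A minimal sketch of what a ‘name’ might look like, assuming an English-only list of leading articles and the naive rule that a person's last word is their surname. The `Name` class and its constructors are hypothetical, not a settled design.

```python
from dataclasses import dataclass

LEADING_ARTICLES = ("a", "an", "the")  # assumption: English titles only


@dataclass(frozen=True)
class Name:
    display: str  # value shown to the user
    sort: str     # value used for ordering

    @classmethod
    def from_title(cls, title: str) -> "Name":
        """Build a Name whose sort value moves a leading article to the end."""
        first, _, rest = title.partition(" ")
        if rest and first.lower() in LEADING_ARTICLES:
            return cls(display=title, sort=f"{rest}, {first}")
        return cls(display=title, sort=title)

    @classmethod
    def from_person(cls, full_name: str) -> "Name":
        """Build a Name sorted as 'last name, given names' (naive split on the last space)."""
        given, _, last = full_name.rpartition(" ")
        if given:
            return cls(display=full_name, sort=f"{last}, {given}")
        return cls(display=full_name, sort=full_name)
```

`Name.from_title("A Brief History of Time").sort` gives `"Brief History of Time, A"`, matching the title_sort behavior described earlier. Keeping both values on one object means a list of names can be ordered with `sorted(names, key=lambda n: n.sort.lower())` while still displaying the original text.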
I considered including ‘list’ here, but I feel that might be better as a data type modifier.
Lists
An entity may have a list of values of almost any type. A book might have a list of titles or of print dates. A file might have a list of fully-qualified file names (these should be unique, but would let me restore the original file if needed… or identify that the file at that URI is already loaded). Most of the other examples I thought of tonight would probably be better as a list of references (i.e. many:many relationship to another entity), but I can see uses for this.
A list should be able to have arbitrary delimiters. For some, commas might be appropriate, for others ‘:::’, others might like a ‘pipe’ (‘|’). This should probably be configurable in the field definition.
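Treating ‘list’ as a modifier with a configurable delimiter might look something like this. `ListField` is a hypothetical name, and dropping empty items while parsing is my assumption.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class ListField(Generic[T]):
    """Hypothetical 'list' modifier wrapping a primitive element type."""
    delimiter: str                         # configured per field definition
    parse_item: Callable[[str], T]         # converts one text item to a value
    format_item: Callable[[T], str] = str  # converts one value back to text

    def parse(self, text: str) -> List[T]:
        """Split on the delimiter, trim whitespace, and skip empty items."""
        parts = (part.strip() for part in text.split(self.delimiter))
        return [self.parse_item(part) for part in parts if part]

    def format(self, values: List[T]) -> str:
        """Join the values back into a single delimited string."""
        return self.delimiter.join(self.format_item(v) for v in values)
```

The same class then covers a comma-separated tag list (`ListField(",", str)`), a ‘:::’-separated list of years (`ListField(":::", int)`), or a pipe-separated list, with the element type and delimiter both living in the field definition.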
References
Some attributes will be connections to other entities. A book has authors (or ‘many people in many roles’: authors, editors, developers, etc.) and ‘book tags’. In both cases these could be many:many relationships (just as a book may have multiple authors, an author may have multiple books). A book in a series would not have ‘series’ and ‘series_index’ fields; it likely would have a reference to a ‘series’ entity with an index value on the relationship.
Speaking of which, the relationship likely should identify any secondary values stored with that relationship. In the case of a series, I should be able to ask “what is the first book in this series?” as easily as “what series is this book in?”
As with lists, I should be able to define any delimiters needed, and possibly a template for parsing/presenting the information.
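A rough sketch of a relationship that carries its own secondary value, using the series example. All of the names here (`SeriesMembership`, `Library`, and the methods) are hypothetical, and books and series are plain strings only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SeriesMembership:
    """One relationship row: links a book to a series and carries the index."""
    book: str
    series: str
    index: float


@dataclass
class Library:
    memberships: List[SeriesMembership] = field(default_factory=list)

    def add(self, book: str, series: str, index: float) -> None:
        self.memberships.append(SeriesMembership(book, series, index))

    def series_of(self, book: str) -> List[str]:
        """Answer 'what series is this book in?'"""
        return [m.series for m in self.memberships if m.book == book]

    def books_in(self, series: str) -> List[str]:
        """Books of a series, ordered by the index stored on the relationship."""
        members = [m for m in self.memberships if m.series == series]
        return [m.book for m in sorted(members, key=lambda m: m.index)]

    def first_book(self, series: str) -> Optional[str]:
        """Answer 'what is the first book in this series?'"""
        books = self.books_in(series)
        return books[0] if books else None
```

Because the index lives on the relationship rather than on the book, both questions are answered from the same rows, and a book can belong to more than one series with a different index in each.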
Hierarchies
Almost any entity can be hierarchical: documents have subdocuments, files have subfiles, series have subseries, and so on.
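One simple way to get this is a parent reference on every entity. The sketch below is an assumption about how I might do it, with a hypothetical `Node` class standing in for any entity type.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Node:
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Create a child entity and link it in both directions."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> Iterator["Node"]:
        """Yield the parent, grandparent, and so on up to the root."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def path(self) -> str:
        """Full path from the root down to this entity."""
        names = [self.name] + [a.name for a in self.ancestors()]
        return " / ".join(reversed(names))
```

With a series, a subseries, and a sub-subseries chained through `add_child`, calling `path()` on the innermost one returns all three names from the root down, joined by ‘ / ’.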
Closing Comments
Calibre has a rich set of options when it comes to custom columns. Where calibre builds quite a bit of behavior into the column type, I think I can get something a little more powerful by going to a combination of primitives and modifiers to their behavior. This should give me more options overall, while also allowing me to generalize the implementation.