Accessing the Calibre database with Python


Summary

Calibre provides a graphical user interface (GUI) for all of the common ebook library tasks, and it also ships a command line interface that covers much of the same functionality and is sometimes quicker to use than the GUI. But sometimes it is faster still to write a short script to get the job done, especially when you have to deal with thousands of ebooks at a time. In this post I document some of the short Python programs I have used, before I forget them.

Using sqlite3

Let's say our Calibre library is at '/mnt/sdc1/Library'. To access the Calibre library database you would type:

sqlite3 /mnt/sdc1/Library/metadata.db

To get a list of available commands, type .help at the prompt. Here are some useful commands:

Command             Info
.databases          List names and files of attached databases
.dump ?TABLE? ...   Dump the database in an SQL text format. If TABLE is
                    specified, only dump tables matching LIKE pattern TABLE.
.exit               Exit this program
.header(s) ON|OFF   Turn display of headers on or off
.help               Show this message
.quit               Exit this program
.schema ?TABLE?     Show the CREATE statements. If TABLE is specified, only
                    show tables matching LIKE pattern TABLE.
.tables ?TABLE?     List names of tables. If TABLE is specified, only list
                    tables matching LIKE pattern TABLE.


Here is a Python script that connects to a given Calibre database and walks through a directory tree, visiting each file in turn. The program does not do anything useful by itself, but it can be used as a template.

#!/bin/python
import sys, os, time, string, re
import sqlite3

def process_file(con, cursor, fname, dname):
    # Placeholder: real work on each file goes here.
    print "\n"
    print "Processing: " + fname + " in " + dname

if (len(sys.argv) != 3):
    print "Usage: test1.py <starting_dir> <calibre_dir>"
    sys.exit(0)

sdir = sys.argv[1]
db = sys.argv[2] + '/metadata.db'

con = sqlite3.connect(db)
cursor = con.cursor()

if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)

# Walk the directory tree and hand every file to process_file().
for root, dirs, files in os.walk(sdir):
    print "(%s, %s, %s) " % (root, dirs, files)
    for f in files:
        process_file(con, cursor, f, root)

cursor.close()
con.close()

Database Structure

If you type the .tables command at the sqlite3 prompt, you will see the list of tables in the Calibre database. The main table is the books table, and .schema books shows its structure. The first column of every table is an integer column called 'id'. It is incremented automatically for each entry and is guaranteed to be unique.
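
As a quick illustration, here is a minimal sketch (the library path is an assumption) that prints the id, title and path columns of the first few rows of the books table:

#!/bin/python
import sqlite3

# The library path below is an assumption; adjust it to your own library.
con = sqlite3.connect('/mnt/sdc1/Library/metadata.db')
cursor = con.cursor()

# Every row in books has a unique integer id, a title and the relative
# path of the book's directory inside the library.
cursor.execute("select id, title, path from books order by id limit 5")
for b_id, title, path in cursor.fetchall():
    print("%d | %s | %s" % (b_id, title, path))

cursor.close()
con.close()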

Note that we cannot assume that the last ebook loaded has the highest id in the books table (as queried by 'select max(id) from books'). If ebooks at the end of the table have been deleted, the most recently loaded ebook can end up with a smaller id than a previously used max(id).
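
If you need the most recently added ebook, a more reliable approach is to sort on the timestamp column, which records when the ebook was added to the library. A minimal sketch, again assuming the library path used above:

#!/bin/python
import sqlite3

# The library path below is an assumption; adjust it to your own library.
con = sqlite3.connect('/mnt/sdc1/Library/metadata.db')
cursor = con.cursor()

# The timestamp column records when the ebook was added, so sorting on it
# does not depend on how ids were assigned or reused.
cursor.execute("select id, title, timestamp from books order by timestamp desc limit 1")
print(cursor.fetchone())

cursor.close()
con.close()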

Custom Columns

When we create a new custom column, a new entry is made in the 'custom_columns' table. For example, the following shows two custom columns:

sqlite> select * from custom_columns;
1|shadigest|Digest|text|0|1|{"use_decorations": 0}|0|1
2|orgpath|Old Path|text|0|1|{"use_decorations": 0}|0|1

The first field is the custom column id, the second is the lookup name, and the third is the display name shown in the tag browser. For each custom column, two other tables are created: custom_column_ID and books_custom_column_ID_link, where ID is the id from the custom_columns table above.

The actual data is stored in the custom_column_ID table, and the _link table connects each value to an ebook by storing the ebook id (book) and the id of the value row (value).
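
As a sketch of how the two tables fit together, the following lists the value stored for each ebook in custom column 1 (the shadigest column from the example above). The library path and the column id are assumptions, so substitute the id from your own custom_columns table:

#!/bin/python
import sqlite3

# Assumed library path and custom column id (1, as in the example above).
con = sqlite3.connect('/mnt/sdc1/Library/metadata.db')
cursor = con.cursor()

# custom_column_1 holds the values; books_custom_column_1_link ties each
# value row back to a row in books via its book and value columns.
cursor.execute("""
    select books.id, books.title, custom_column_1.value
    from books
    join books_custom_column_1_link on books_custom_column_1_link.book = books.id
    join custom_column_1 on custom_column_1.id = books_custom_column_1_link.value
""")
for b_id, title, value in cursor.fetchall():
    print("%d | %s | %s" % (b_id, title, value))

cursor.close()
con.close()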

Calculating sha256 hash value

Here is a Python script that calculates the sha256 hash value of each file. The Secure Hash Algorithm creates a virtually unique ID for each ebook from its contents. This value can be used as an identifier to prevent loading duplicate ebooks, or to identify a specific ebook in the Calibre library.

#!/bin/python
import sys, os, time, string, re
import sqlite3, hashlib
 
def hashfile(afile, hasher, blocksize=65536):
    # Read the file in fixed-size blocks so large ebooks need not fit in memory.
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()
 
 
def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    hexd = hashfile(file(dname + '/' + fname, 'rb'), hashlib.sha256())  # open in binary mode for hashing
    print hexd
    return hexd
 
if (len(sys.argv) != 3):
    print "Usage: test.py <sdir> <db>"
    sys.exit(0)
 
sdir = sys.argv[1]
db = sys.argv[2]
 
con = sqlite3.connect(db)
cursor = con.cursor()
 
 
if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)
 
for root, dirs, files in os.walk(sdir):
    print "(%s, %s, %s) " % (root, dirs, files)
    for f in files:
        shasum = process_file(con, cursor, f, root)
cursor.close()

Finding the ebook ID for each ebook in the Calibre library

The following Python script visits each file in the library and uses a regular expression to extract the book ID from the path to the ebook (Calibre stores each book in a directory whose name ends with the book id in parentheses, e.g. 'Some Title (123)').

#!/bin/python
import sys, os, time, string, re
import sqlite3
 
def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    tkns = string.split(dname + '/' + fname, '/')
    m = re.search(r".+\((\d+)\)$", tkns[-2])
    print m.group(1)
    return m.group(1)
 
if (len(sys.argv) != 3):
    print "Usage: test.py <sdir> <calibre db>"
    sys.exit(0)
 
sdir = sys.argv[1]
db = sys.argv[2]
 
con = sqlite3.connect(db)
cursor = con.cursor()
 
 
if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)
 
for root, dirs, files in os.walk(sdir):
    print "(%s, %s, %s) " % (root, dirs, files)
    for f in files:
        if ((f == 'metadata.db') or (f[-4:] == '.jpg') or (f[-4:] == '.opf')):
            continue
        process_file(con, cursor, f, root)
 
cursor.close()

Updating Secure Hash Algorithm custom column for each ebook

The following Python script visits each ebook in the library and calculates its sha256 checksum. If the ebook does not already have that value set, the script makes an entry in a custom column called shadigest. Note: create a custom column with the lookup name shadigest before running this script.

#!/bin/python
import sys, os, time, string, re, copy
import sqlite3, hashlib
 
def hashfile(afile, hasher, blocksize=65536):
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()
 
 
def find_digest(con, cursor, fname, dname):
    return hashfile(file(dname + '/' + fname, 'rb'), hashlib.sha256())
 
def query_db(con, cursor, val, db, what):
    cursor.execute("select %s from %s where %s" % (val, db, what))
    rows = cursor.fetchone()
    return rows
 
def insertPair(con, cursor, b_id, digest, CC, BCCL):
    # Insert the digest value, then link it to the book in the _link table.
    cursor.execute("insert into "+ CC + " (value) values ('%s')" % (digest))
    con.commit()
    time.sleep(0)
    cursor.execute("select max(id) from " + CC)
    row = cursor.fetchone()
    cursor.execute("insert into "+ BCCL + " (book,value) values (%s, %s)" % (b_id, row[0]))
    con.commit()
    time.sleep(0)
 
def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    tkns = string.split(dname + '/' + fname, '/')
    m = re.search(r".+\((\d+)\)$", tkns[-2])
    return m.group(1)
 
if (len(sys.argv) != 2):
    print "Usage: test.py <calibredir>"
    sys.exit(0)
 
sdir = sys.argv[1]
db = sys.argv[1] + '/' + 'metadata.db'
 
con = sqlite3.connect(db)
cursor = con.cursor()
 
 
if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)
 
def getColumn(con, cursor, label):
    cus_col = query_db(con, cursor, 'id', 'custom_columns', 'label="%s"' % (label))
    if ((cus_col == None) or (len(cus_col) == 0)):
        print "Can not access custom column "+ label + ", exiting!"
        sys.exit(0)
 
    return str(cus_col[0])
 
col = getColumn(con, cursor, 'shadigest')
BCCL = 'books_custom_column_' + col + '_link'
CC = 'custom_column_' + col
books = []
for root, dirs, files in os.walk(sdir):
    for f in files:
        if ((f == 'metadata.db') or (f[-4:] == '.jpg') or (f[-4:] == '.opf')):
            continue
        b_id = process_file(con, cursor, f, root)
 
        val = query_db(con, cursor, 'value', BCCL,  'book ='+b_id)
        print "==>", val
        if ((val != None) and (len(val))):
            print "found ", f
        else:
            digest = find_digest(con, cursor, f, root)
            print b_id, digest
            books.append(b_id)
            try:
                insertPair(con, cursor, b_id, digest, CC, BCCL)
            except:
                print "Could not insert, sorry", b_id, digest
 
f = open("/tmp/booksToAdd.txt", 'w')
for i in books:
    f.write(i + "\n")
f.close()
 
cursor.close()

Updating orgpath Custom Column field

The following script takes two arguments: the Calibre directory and the starting directory of a hierarchy of ebooks that have already been loaded into the library. Every ebook should already have its shadigest field set, so that each file can be matched to the same ebook in the Calibre library. Once the ebook is found, the custom column orgpath (original path) is set from the directory path that currently holds the ebook.

Sometimes there may be hundreds of ebooks already nicely categorized in separate directories. When these ebooks are loaded into the Calibre library, that original directory hierarchy is lost. This script stores the original path information in the orgpath custom column. If that column is marked as hierarchical, the tag browser will display the original directory hierarchy.

#!/bin/python
import sys, os, time, string, re
import sqlite3, hashlib
 
def hashfile(afile, hasher, blocksize=65536):
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()
 
def insertCustomColumn(con, cursor, OCC, entry):
    cursor.execute("select id from " + OCC +" where value='%s'" % (entry))
    row = cursor.fetchone()
    if ((row == None) or (len(row) == 0)):
        cursor.execute("insert into " + OCC +" (value) values ('%s')" % (entry))
        con.commit()
        time.sleep(0)
        cursor.execute("select id from " + OCC +" where value='%s'" % (entry))
        row = cursor.fetchone()
    return row[0]
 
def find_digest(con, cursor, fname, dname):
    return hashfile(file(dname + '/' + fname, 'rb'), hashlib.sha256())
 
def query_db(con, cursor, val, db, what):
    cursor.execute("select %s from %s where %s" % (val, db, what))
    rows = cursor.fetchone()
    return rows
 
def insertPair(con, cursor, b_id, path, OCC, OCCL):
    p_id = insertCustomColumn(con, cursor, OCC, path)
    cursor.execute("insert into "+OCCL+" (book,value) values (%s, %s)" % (b_id, p_id))
    con.commit()
    time.sleep(0)
 
def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    tkns = string.split(dname + '/' + fname, '/')
    m = re.search(r".+\((\d+)\)$", tkns[-2])
    return m.group(1)
 
def getColumn(con, cursor, label):
    cus_col = query_db(con, cursor, 'id', 'custom_columns', 'label="%s"' % (label))
    if ((cus_col == None) or (len(cus_col) == 0)):
        print "Can not access custom column "+ label + ", exiting!"
        sys.exit(0)
 
    return str(cus_col[0])
 
 
 
if (len(sys.argv) != 3):
    print "Usage: test.py <calibredir> <srcdir> "
    sys.exit(0)
 
db = sys.argv[1] + '/' + 'metadata.db'
sdir = sys.argv[2]
 
con = sqlite3.connect(db)
cursor = con.cursor()
 
 
if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)
 
col = getColumn(con, cursor, 'shadigest')
DCC = 'custom_column_' + col
DCCL = 'books_custom_column_' + col + '_link'
col = getColumn(con, cursor, 'orgpath')
OCC = 'custom_column_' + col
OCCL = 'books_custom_column_' + col + '_link'
 
for root, dirs, files in os.walk(sdir):
    dotroot = string.replace(root, '/', '.')
    for f in files:
        if ((f == 'metadata.db') or (f[-4:] == '.jpg') or (f[-4:] == '.opf')):
            continue
        digest = find_digest(con, cursor, f, root)
        val = query_db(con, cursor, 'id', DCC,  'value ="'+digest+ '"')
        print "==>", val
        if ((val != None) and (len(val))):
            b_id = query_db(con, cursor, 'book', DCCL,  'value =' + str(val[0]))
            if ((b_id != None) and (len(b_id))):
                print "found ", f
                print "path  ", root
                try:
                    insertPair(con, cursor, b_id[0], dotroot, OCC, OCCL)
                except:
                    print "Could not insert ", b_id[0]
                print
cursor.close()