
Summary
Calibre provides a user interface for all of the common ebook library functions, but sometimes it is much faster to write a short script to get the job done, especially if you have to deal with thousands of ebooks at a time. In this post I document some of the short Python programs I have used, before I forget them. Calibre also provides a command line interface that covers some of the same ground as the user interface, and sometimes it is faster to use it instead of the GUI (Graphical User Interface).
Using sqlite3
Let's say our Calibre library is at ‘/mnt/sdc1/Library’. To open the Calibre library database you would type:
sqlite3 /mnt/sdc1/Library/metadata.db
To get a list of available commands, type .help at the prompt. Here are some useful commands:
Command | Info |
---|---|
.databases | List names and files of attached databases |
.dump ?TABLE? ... | Dump the database in an SQL text format. If TABLE specified, only dump tables matching LIKE pattern TABLE. |
.exit | Exit this program |
.header(s) ON\|OFF | Turn display of headers on or off |
.help | Show this message |
.quit | Exit this program |
.schema ?TABLE? | Show the CREATE statements. If TABLE specified, only show tables matching LIKE pattern TABLE. |
.tables ?TABLE? | List names of tables. If TABLE specified, only list tables matching LIKE pattern TABLE. |
Here is a Python script that connects to a given Calibre database and walks a directory tree, visiting each file in turn. The program does not do anything useful by itself, but it can be used as a template.
```python
#!/bin/python
import sys, os, time, string, re
import sqlite3

def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname

if (len(sys.argv) != 3):
    print "Usage: test1.py <starting_dir> <calibre_db>"
    sys.exit(0)

sdir = sys.argv[1]
db = sys.argv[2] + '/metadata.db'

con = sqlite3.connect(db)
cursor = con.cursor()

if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)

for root, dirs, files in os.walk(sdir):
    print "(%s, %s, %s) " % (root, dirs, files)
    for f in files:
        process_file(con, cursor, f, root)

cursor.close()
```
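Assuming the library from the earlier example, the template takes the starting directory and the Calibre library directory (the one that contains metadata.db) as arguments, for example: python test1.py /mnt/sdc1/Library /mnt/sdc1/Library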
Database Structure
If you type the .tables command at the sqlite3 prompt, you will see the list of tables in the Calibre database. The main table is the books table, and .schema books shows its structure. The first column of every table is an integer column called ‘id’. It is incremented automatically for each entry and is guaranteed to be a unique number.
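The same information can also be inspected from Python with the standard sqlite3 module. Here is a minimal sketch, assuming the library path from the earlier example:

```python
import sqlite3

con = sqlite3.connect('/mnt/sdc1/Library/metadata.db')
cursor = con.cursor()

# Equivalent of .tables: list the table names in the database
cursor.execute("select name from sqlite_master where type='table' order by name")
print([row[0] for row in cursor.fetchall()])

# Equivalent of .schema books: list the columns of the books table
cursor.execute("pragma table_info(books)")
for cid, name, ctype, notnull, dflt, pk in cursor.fetchall():
    print(name + " " + ctype)

cursor.close()
```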
Note that we cannot assume that the last ebook loaded will have the highest id in the books table, i.e. the id returned by ‘select max(id) from books’. When books near the end of the table have been deleted, it is possible for a newly added ebook to end up with an id smaller than max(id).
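If you need the most recently added ebook, the books table keeps a timestamp column that records when each entry was created, and sorting on it is a safer alternative. A minimal sketch, assuming the same library path (check .schema books to confirm the column exists in your version):

```python
import sqlite3

con = sqlite3.connect('/mnt/sdc1/Library/metadata.db')
cursor = con.cursor()

# Sort by the date the row was added instead of relying on max(id)
cursor.execute("select id, title, timestamp from books order by timestamp desc limit 1")
print(cursor.fetchone())

cursor.close()
```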
Custom Columns
When we create a new custom column, a new entry is made in the ‘custom_columns’ table. For example, the following listing shows two custom columns.
```
sqlite> select * from custom_columns;
1|shadigest|Digest|text|0|1|{"use_decorations": 0}|0|1
2|orgpath|Old Path|text|0|1|{"use_decorations": 0}|0|1
```
The first field is the custom column number (id), the second is the lookup name, and the third is the display name shown in the tag browser. For each custom column, two additional tables are created: custom_column_ID and books_custom_column_ID_link, where ID is the id from the custom_columns table above.
The actual data is stored in the custom_column_ID table, and the books_custom_column_ID_link table links each ebook id (its book column) to the id of the data row (its value column).
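As an illustration, here is a minimal sketch that reads a custom column value for every book by joining the three tables. It assumes the custom column id is 1 (the shadigest column from the listing above) and the library path from the earlier example:

```python
import sqlite3

con = sqlite3.connect('/mnt/sdc1/Library/metadata.db')
cursor = con.cursor()

# books.id -> books_custom_column_1_link.book, and
# books_custom_column_1_link.value -> custom_column_1.id
cursor.execute("""select b.id, b.title, c.value
                    from books b
                    join books_custom_column_1_link l on l.book = b.id
                    join custom_column_1 c on c.id = l.value""")
for row in cursor.fetchall():
    print(row)

cursor.close()
```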
Calculating sha256 hash value
Here is a Python script that calculates the SHA-256 hash value of each file. The Secure Hash Algorithm produces a virtually unique ID for each ebook from the contents of the ebook. This value can be used as an identifier to prevent loading duplicate ebooks, or to identify a specific ebook in the Calibre library.
```python
#!/bin/python
import sys, os, time, string, re
import sqlite3, hashlib

# Read the file in blocks and feed each block to the hasher
def hashfile(afile, hasher, blocksize=65536):
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()

def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    # Open in binary mode so the digest is computed over the raw bytes
    hexd = hashfile(file(dname + '/' + fname, 'rb'), hashlib.sha256())
    print hexd
    return hexd

if (len(sys.argv) != 3):
    print "Usage: test.py <sdir> <db>"
    sys.exit(0)

sdir = sys.argv[1]
db = sys.argv[2]

con = sqlite3.connect(db)
cursor = con.cursor()

if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)

for root, dirs, files in os.walk(sdir):
    print "(%s, %s, %s) " % (root, dirs, files)
    for f in files:
        shasum = process_file(con, cursor, f, root)

cursor.close()
```
Finding ebook ID for each ebook in Calibre library
The following Python script visits each file in the library and uses a regular expression to decode the book ID from the path to the ebook.
```python
#!/bin/python
import sys, os, time, string, re
import sqlite3

def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    tkns = string.split(dname + '/' + fname, '/')
    m = re.search(r".+\((\d+)\)$", tkns[-2])
    print m.group(1)
    return m.group(1)

if (len(sys.argv) != 3):
    print "Usage: test.py <sdir> <calibre db>"
    sys.exit(0)

sdir = sys.argv[1]
db = sys.argv[2]

con = sqlite3.connect(db)
cursor = con.cursor()

if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)

for root, dirs, files in os.walk(sdir):
    print "(%s, %s, %s) " % (root, dirs, files)
    for f in files:
        if ((f == 'metadata.db') or (f[-4:] == '.jpg') or (f[-4:] == '.opf')):
            continue
        process_file(con, cursor, f, root)

cursor.close()
```
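To illustrate the regular expression above: Calibre stores each ebook in a folder whose name ends with the book id in parentheses, and the script reads that id from the second-to-last path component. Here is a small sketch with a made-up path:

```python
import re

# Hypothetical library path of the form .../Author/Title (ID)/file.epub
path = '/mnt/sdc1/Library/Isaac Asimov/Foundation (42)/Foundation - Isaac Asimov.epub'
tkns = path.split('/')                      # same tokens the script builds with string.split
m = re.search(r".+\((\d+)\)$", tkns[-2])    # match the "(42)" at the end of the folder name
print(m.group(1))                           # prints: 42
```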
Updating Secure Hash Algorithm custom column for each ebook
The following Python script visits each ebook and calculates its SHA-256 checksum. If the ebook does not already have that value set, the script makes an entry in a custom column called shadigest. Note: create the shadigest custom column before running this script.
```python
#!/bin/python
import sys, os, time, string, re, copy
import sqlite3, hashlib

def hashfile(afile, hasher, blocksize=65536):
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()

def find_digest(con, cursor, fname, dname):
    # Open in binary mode so the digest is computed over the raw bytes
    return hashfile(file(dname + '/' + fname, 'rb'), hashlib.sha256())

def query_db(con, cursor, val, db, what):
    cursor.execute("select %s from %s where %s" % (val, db, what))
    rows = cursor.fetchone()
    return rows

def insertPair(con, cursor, b_id, digest, CC, BCCL):
    # Store the digest in the custom column table, then link it to the book
    cursor.execute("insert into " + CC + " (value) values ('%s')" % (digest))
    con.commit()
    cursor.execute("select max(id) from " + CC)
    row = cursor.fetchone()
    cursor.execute("insert into " + BCCL + " (book,value) values (%s, %s)" % (b_id, row[0]))
    con.commit()

def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    # The book id is the number in parentheses in the folder name
    tkns = string.split(dname + '/' + fname, '/')
    m = re.search(r".+\((\d+)\)$", tkns[-2])
    return m.group(1)

def getColumn(con, cursor, label):
    cus_col = query_db(con, cursor, 'id', 'custom_columns', 'label="%s"' % (label))
    if ((cus_col == None) or (len(cus_col) == 0)):
        print "Can not access custom column " + label + ", exiting!"
        sys.exit(0)
    return str(cus_col[0])

if (len(sys.argv) != 2):
    print "Usage: test.py <calibredir>"
    sys.exit(0)

sdir = sys.argv[1]
db = sys.argv[1] + '/' + 'metadata.db'

con = sqlite3.connect(db)
cursor = con.cursor()

if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)

col = getColumn(con, cursor, 'shadigest')
BCCL = 'books_custom_column_' + col + '_link'
CC = 'custom_column_' + col

books = []
for root, dirs, files in os.walk(sdir):
    for f in files:
        if ((f == 'metadata.db') or (f[-4:] == '.jpg') or (f[-4:] == '.opf')):
            continue
        b_id = process_file(con, cursor, f, root)
        val = query_db(con, cursor, 'value', BCCL, 'book =' + b_id)
        print "==>", val
        if ((val != None) and (len(val))):
            print "found ", f
        else:
            digest = find_digest(con, cursor, f, root)
            print b_id, digest
            books.append(b_id)
            try:
                insertPair(con, cursor, b_id, digest, CC, BCCL)
            except:
                print "Could not insert, sorry", b_id, digest

# Record the ids of the books that received a new digest
f = open("/tmp/booksToAdd.txt", 'w')
for i in books:
    f.write(i + "\n")
f.close()

cursor.close()
```
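The script takes the Calibre library directory as its only argument, for example: python test.py /mnt/sdc1/Library. The ids of the books that received a new digest are also written to /tmp/booksToAdd.txt.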
Updating orgpath Custom Column field
The following script takes two arguments: the Calibre directory, and the starting directory of a hierarchy of ebooks that have already been loaded into the library. Every ebook should already have its shadigest field set, so that each file can be matched to the corresponding ebook in the Calibre library. Once the ebook is found, the custom column orgpath (original path) is set to the current directory path of the ebook.
Sometimes there may be hundreds of ebooks already nicely categorized in separate directories. When these ebooks are loaded into the Calibre library, the original directory hierarchy is lost. This script stores the original path information in the orgpath custom column. If this column is marked as hierarchical, the tag browser will display the original directory hierarchy.
```python
#!/bin/python
import sys, os, time, string, re
import sqlite3, hashlib

def hashfile(afile, hasher, blocksize=65536):
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.hexdigest()

def insertCustomColumn(con, cursor, OCC, entry):
    # Return the id of the existing value row, inserting it first if needed
    cursor.execute("select id from " + OCC + " where value='%s'" % (entry))
    row = cursor.fetchone()
    if ((row == None) or (len(row) == 0)):
        cursor.execute("insert into " + OCC + " (value) values ('%s')" % (entry))
        con.commit()
        cursor.execute("select id from " + OCC + " where value='%s'" % (entry))
        row = cursor.fetchone()
    return row[0]

def find_digest(con, cursor, fname, dname):
    # Open in binary mode so the digest is computed over the raw bytes
    return hashfile(file(dname + '/' + fname, 'rb'), hashlib.sha256())

def query_db(con, cursor, val, db, what):
    cursor.execute("select %s from %s where %s" % (val, db, what))
    rows = cursor.fetchone()
    return rows

def insertPair(con, cursor, b_id, path, OCC, OCCL):
    p_id = insertCustomColumn(con, cursor, OCC, path)
    cursor.execute("insert into " + OCCL + " (book,value) values (%s, %s)" % (b_id, p_id))
    con.commit()

def process_file(con, cursor, fname, dname):
    print "\n"
    print "Processing: " + fname + " in " + dname
    tkns = string.split(dname + '/' + fname, '/')
    m = re.search(r".+\((\d+)\)$", tkns[-2])
    return m.group(1)

def getColumn(con, cursor, label):
    cus_col = query_db(con, cursor, 'id', 'custom_columns', 'label="%s"' % (label))
    if ((cus_col == None) or (len(cus_col) == 0)):
        print "Can not access custom column " + label + ", exiting!"
        sys.exit(0)
    return str(cus_col[0])

if (len(sys.argv) != 3):
    print "Usage: test.py <calibredir> <srcdir>"
    sys.exit(0)

db = sys.argv[1] + '/' + 'metadata.db'
sdir = sys.argv[2]

con = sqlite3.connect(db)
cursor = con.cursor()

if (not os.path.isdir(sdir)):
    print "Starting path must be a directory.."
    sys.exit(0)

# Tables for the shadigest custom column (used to find the book)
col = getColumn(con, cursor, 'shadigest')
DCC = 'custom_column_' + col
DCCL = 'books_custom_column_' + col + '_link'

# Tables for the orgpath custom column (used to store the original path)
col = getColumn(con, cursor, 'orgpath')
OCC = 'custom_column_' + col
OCCL = 'books_custom_column_' + col + '_link'

for root, dirs, files in os.walk(sdir):
    dotroot = string.replace(root, '/', '.')
    for f in files:
        if ((f == 'metadata.db') or (f[-4:] == '.jpg') or (f[-4:] == '.opf')):
            continue
        digest = find_digest(con, cursor, f, root)
        val = query_db(con, cursor, 'id', DCC, 'value ="' + digest + '"')
        print "==>", val
        if ((val != None) and (len(val))):
            b_id = query_db(con, cursor, 'book', DCCL, 'value =' + str(val[0]))
            if ((b_id != None) and (len(b_id))):
                print "found ", f
                print "path ", root
                try:
                    insertPair(con, cursor, b_id[0], dotroot, OCC, OCCL)
                except:
                    print "Could not insert ", b_id[0]

cursor.close()
```
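The script is run with the Calibre library directory first and the original ebook hierarchy second, for example: python test.py /mnt/sdc1/Library /data/sorted_ebooks (the second path here is hypothetical). Note that the stored orgpath is simply the os.walk root with the slashes replaced by dots, so it includes whatever prefix the starting directory was given with.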
