Script to create and update BibJSON file

Users asking other users for AppleScripts that work with Bookends.
zvh
Posts: 32
Joined: Sun Aug 27, 2017 12:47 am

Script to create and update BibJSON file

Post by zvh »

If you've seen the previous script I posted to continuously update a .bib file library, you'll know that one of my strategies for this is to use bib2json and jq to return a list of citekeys from my file against which to compare current Bookends references.

Now, I only ever create a .bib file for the purpose of converting it to a .json file anyway. Why? Because JSON also works with Pandoc (indeed, Pandoc processes it faster than a .bib file) and is easier to parse in general. There are also many tools, on both macOS and iOS, for working with JSON and relatively few for working with BibTeX.

Because of this, I've asked myself why I bother at all with the BibTeX intermediary step. Why not just export my citations directly to a JSON file and continuously update this file? Due to the greater parsability, I also reckoned that, were I to use JSON, I'd be able to not only add and subtract references, but also to change recently modified references.

So here is my first attempt at doing this:

https://gist.github.com/zverhope/1b3906 ... b99088a257

This script will create a new JSON file from scratch, which can take a while if you have a large library. The initial run on my 3500-item library took about 90 minutes. Updates, however, are much, much faster. A test update of 80 references, which modified 10 refs and added 70, took a couple of minutes. The resulting JSON was perfectly formatted, easily parsable, and worked great with Pandoc.

Two things: you need somewhere to store the date you last updated, which is the parameter that the script uses to perform an SQL search via Bookends in order to retrieve the references that have been modified since you last ran the script. As you'll see, the script gets this variable from Keyboard Maestro when it starts to run and then fills this variable with the current date when it finishes running. If you don't have Keyboard Maestro or something similar, you could save the date in a text file that you then read and write to when running the script, or even in a given Bookends field if you prefer.
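
A minimal sketch of the text-file option, written in Python rather than AppleScript. The function names and the timestamp format are assumptions made for illustration; the script linked above does not use them.

```python
from datetime import datetime
from pathlib import Path

# Assumed timestamp layout for the stamp file; any unambiguous format would do.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_last_run(stamp_file):
    """Return the stored datetime, or None if the file is missing or unparsable."""
    path = Path(stamp_file)
    if not path.exists():
        return None
    try:
        return datetime.strptime(path.read_text(encoding="utf-8").strip(), STAMP_FORMAT)
    except ValueError:
        return None


def write_last_run(stamp_file, when=None):
    """Store the given datetime (default: now) as a single line of text."""
    when = when or datetime.now()
    Path(stamp_file).write_text(when.strftime(STAMP_FORMAT) + "\n", encoding="utf-8")
```

A caller would read the stamp at the start of a run, treat `None` as "export everything", and write a fresh stamp only after the export has finished successfully.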

Also, the script requires the "JSON Helper" application, which I use to convert JSON to AppleScript objects (and then back again). You can download this for free through the Mac App Store.

This is only a first version, but I just wanted to put it out there in case it was useful for anyone.
iandol
Posts: 465
Joined: Fri Jan 25, 2008 2:31 pm

Re: Script to create and update BibJSON file

Post by iandol »

If anyone wants to get this to work: Pandoc has deprecated pandoc-citeproc and now includes citeproc directly, and pandoc itself can now convert BibTeX to CSL-JSON, so this script requires a small update to work...
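
As a rough sketch of that newer route, the conversion can be driven from Python. This assumes a pandoc build recent enough to include citeproc and that `pandoc` is on the PATH; the helper names are hypothetical.

```python
import subprocess


def pandoc_args(bib_path, json_path, pandoc="pandoc", source_format="bibtex"):
    """Build the argument list for a BibTeX (or biblatex) to CSL-JSON conversion."""
    return [pandoc, bib_path, "-f", source_format, "-t", "csljson", "-o", json_path]


def convert_bib_to_csljson(bib_path, json_path, **kwargs):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_args(bib_path, json_path, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Use `source_format="biblatex"` if the .bib file uses biblatex fields.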

@zvh -- if you're still around, a question: could you save the date as an entry (like a fake reference) directly inside the JSON file to remove the need to save it somewhere else? Also, is it possible to "bundle" the JSON Helper locally, or does it need to be properly installed?
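
To make the fake-reference idea concrete, here is a minimal Python sketch. The marker id `_last_updated` and the use of the `note` field are assumptions chosen for illustration; nothing in the existing script defines them.

```python
import json

MARKER_ID = "_last_updated"  # hypothetical id for the fake reference


def split_marker(entries):
    """Separate real references from the marker; return (refs, stamp or None)."""
    refs = [entry for entry in entries if entry.get("id") != MARKER_ID]
    markers = [entry for entry in entries if entry.get("id") == MARKER_ID]
    stamp = markers[0].get("note") if markers else None
    return refs, stamp


def with_marker(refs, stamp):
    """Return a new list of references with the marker entry appended."""
    return refs + [{"id": MARKER_ID, "type": "article", "note": stamp}]


# Round trip through JSON text, as it would be stored on disk.
library = [{"id": "smith2019", "type": "book", "title": "An Example"}]
stored = json.dumps(with_marker(library, "20201229160152"))
refs, stamp = split_marker(json.loads(stored))
```

One trade-off: any tool that lists every entry in the file will also see the marker, so it needs an id that will never be cited.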
zvh
Posts: 32
Joined: Sun Aug 27, 2017 12:47 am

Re: Script to create and update BibJSON file

Post by zvh »

Thanks so much for this @iandol - my year (but especially the last 4 months) has been a bit crazy, which I take to be the general experience, and I haven’t kept up to date with the forum or these scripts. I actually have a few more that I wrote up a while back that I think may be useful for other members, so I’ll revisit this script and the others (posted and unposted) to refine and update them, including for the changes in pandoc for converting BibTeX to JSON.

Your idea of saving the date in the JSON is simple and elegant. I can’t believe I didn’t think of that! I’ll be sure to include this in my update. I should have something within a week or so!
iandol
Posts: 465
Joined: Fri Jan 25, 2008 2:31 pm

Re: Script to create and update BibJSON file

Post by iandol »

I had another idea, which was to use metadata for the JSON file. You can add custom metadata very easily using xattr and read it back:

Write:

Code:

set myJSONFile to "/Users/ian/Desktop/Test.json"
set lastUpdate to (current date)
set dateString to lastUpdate as string
-- quoted form of protects against spaces or apostrophes in the date string and path
set cmd to "xattr -w 'com.apple.metadata:kMDSyncDate#S' " & quoted form of dateString & " " & quoted form of myJSONFile
do shell script cmd
Read:

Code:

set cmd to "xattr -p 'com.apple.metadata:kMDSyncDate#S' " & quoted form of myJSONFile
try
	set syncDate to do shell script cmd
	set lastUpdate to (date syncDate)
on error
	-- do shell script raises an error when xattr exits non-zero (e.g. the attribute is missing)
	set lastUpdate to (current date)
end try
The #S suffix means Apple should sync this tag. Other cloud systems like Dropbox don't, but locally this is stable and reliable (you can move the file across local drives without issue). Both options store the date with the file itself. Metadata may be a bit more compatible (no need to write a "fake" JSON reference to store the date) but is less robust if the file passes through an intermediate location such as an SMB drive.
zvh
Posts: 32
Joined: Sun Aug 27, 2017 12:47 am

Re: Script to create and update BibJSON file

Post by zvh »

Ok, so I revisited my AppleScript to update for deprecated code, while also attempting to reliably update (rather than completely regenerating) the bib file on each run. I found dealing with the text to be a bit cumbersome in AppleScript, however, so decided to write a Python script that does the bulk of the work and passes off to an AppleScript helper to communicate with Bookends. You can access both scripts here: https://gist.github.com/zverhope/e48195 ... a21265902c

I've written the scripts in such a way that they can be run without any user modifications, provided the user has the requisite Python modules installed (see the imports at the top of the .py file for reference). (All of the modules except numpy are part of the Python standard library.) If you put these two scripts in any folder, they will generate Library.bib and Library.json files in that folder. If you run the script directly in Terminal (`python3 /path/to/your/directory/bookends-generate_bib.py`), it will also print its progress as it goes through your references. It will generate everything from scratch on the first run (which could take a little while - my library of 3000 references took about 20 minutes) and on subsequent runs will only update references modified since the script was last run.

I solved the problem of saving the date of the last run by writing the date into the beginning of the script as a variable at the end of each run. The script replaces the fourth line of its own text with the current date string, which is then used on the next run to determine when the bib and json files were last updated. This seems quite reliable on my end, but I'm not a Python professional, so I can't say whether this reflects best practices.
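
The self-rewriting step can be sketched in isolation. This version is an illustration rather than the code from the gist: the function names are hypothetical, and the check that line 4 really looks like the timestamp assignment is an added safeguard.

```python
import re
from datetime import datetime

STAMP_LINE_INDEX = 3  # the fourth line, counting from zero
STAMP_FORMAT = "%Y%m%d%H%M%S"


def rewrite_stamp_line(script_path, stamp):
    """Replace line 4 of the file with a new `then = "..."` assignment."""
    with open(script_path, "r", encoding="utf-8") as handle:
        lines = handle.readlines()
    # Refuse to overwrite anything that is not the expected assignment.
    if len(lines) <= STAMP_LINE_INDEX or not re.match(r'then = "\d{14}"', lines[STAMP_LINE_INDEX]):
        raise ValueError("line 4 does not look like the timestamp assignment")
    lines[STAMP_LINE_INDEX] = f'then = "{stamp}"\n'
    with open(script_path, "w", encoding="utf-8") as handle:
        handle.writelines(lines)


def seconds_since(then, now):
    """Whole seconds elapsed between two YYYYMMDDHHMMSS strings."""
    delta = datetime.strptime(now, STAMP_FORMAT) - datetime.strptime(then, STAMP_FORMAT)
    return int(delta.total_seconds())
```

Because the approach depends on the assignment staying on line 4, adding or removing lines above it in the script will break it; the check above turns that into an error instead of a silently corrupted file.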

The script will also backup your current bib and json files at the beginning of the run (by copying your current files into new files with `_backup` appended to the file names) as insurance in case something goes wrong.

The one thing you may have to change here is the path to your pandoc, which you can check by typing `which pandoc` into Terminal and then changing line 88 (where it reads `/usr/local/bin/pandoc`) to reflect whatever path that command returns.
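
As an alternative to editing the path by hand, the lookup could be done at run time. A minimal sketch, assuming pandoc is either on the PATH seen by the script or at the default location used above; `find_pandoc` is a hypothetical helper, not part of the script.

```python
import shutil


def find_pandoc(fallback="/usr/local/bin/pandoc", search_path=None):
    """Return the pandoc executable found on the search path, else the fallback.

    search_path=None means "use the PATH environment variable".
    """
    return shutil.which("pandoc", path=search_path) or fallback
```

Note that scripts launched from tools such as Keyboard Maestro may see a shorter PATH than Terminal does, which is why the fallback is kept.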

Warning: I haven't done any testing of this program outside of my own system and use case. This generated perfectly formatted bib and json files for my library of around 3000 references, and reliably updated these records when changes were made to them in Bookends. That said, be cautious (and use backups) if implementing these scripts into your own system.

Here's the code for reference:

First, the Python file:

Code:

#!/usr/bin/env -S PATH="${PATH}:/usr/local/bin" PYTHONIOENCODING=UTF-8 LC_ALL=en_US.UTF-8 python3
# -*- coding: utf-8 -*-

then = "20201229160152"

import re, shutil, sys
import os.path, time
from datetime import datetime as dt
import json
from subprocess import Popen, PIPE
import numpy as np
# Note: "then" must remain on line 4; the end of this script rewrites that line in place.

file = os.path.basename(__file__)
__location__ = os.path.realpath(
	os.path.join(os.getcwd(), os.path.dirname(__file__)))

now = dt.today().strftime("%Y%m%d%H%M%S")

time_diff = int((dt.strptime(now, "%Y%m%d%H%M%S") - dt.strptime(then, "%Y%m%d%H%M%S")).total_seconds())
time_diff = str(time_diff)

get_bibs_all = Popen(['osascript', __location__ + '/bookends-generate_bib.scpt', "all", ""], stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True)
get_bibs_mod = Popen(['osascript', __location__ + '/bookends-generate_bib.scpt', "mod", time_diff], stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True)

bibs_all, bibs_all_err = get_bibs_all.communicate()
bibs_mod, bibs_mod_err = get_bibs_mod.communicate()

bibpath = __location__ + '/Library.bib'
jsonpath = __location__ + '/Library.json'
if not os.path.exists(bibpath):
	open(bibpath, 'w').close()
	mybib = ''
else:
	f = open(bibpath, "r")
	mybib = f.read()
	f.close()
	shutil.copy(bibpath, os.path.join(__location__, "Library_backup.bib"))
	
if not os.path.exists(jsonpath):
	open(jsonpath, 'w').close()
	myjson = '[]'
else:
	g = open(jsonpath, "r")
	myjson = g.read()
	g.close()
	shutil.copy(jsonpath, os.path.join(__location__, "Library_backup.json"))

data = json.loads(myjson)
citekeys = [datum['id'] for datum in data]  # a list, so it can be reused (map() is single-use)
not_in_bibfile = np.setdiff1d(bibs_all.rstrip("\n").split(", "), citekeys)
new_bibs = list(filter(None, not_in_bibfile))
changed_in_bibfile = np.setdiff1d(bibs_mod.rstrip("\n").split(","), new_bibs)
mod_bibs = list(filter(None, changed_in_bibfile))
# removed_from_bibfile = np.setdiff1d(citekeys, bibs_all.rstrip("\n").split(", "))

def get_bib(citekey):
	get_bib_record = Popen(['osascript', __location__ + '/bookends-generate_bib.scpt', "get_bib", citekey], stdin=PIPE, stdout=PIPE, stderr=PIPE, universal_newlines=True)
	formatted_bib, formatted_bib_err = get_bib_record.communicate()
	return formatted_bib

print()
print('Getting new bibliography records...')
print()
len_new_bibs = len(new_bibs)
processed_num = 1
for citekey in new_bibs:
	bib_new = get_bib(citekey)
	print(f'Processed record {str(processed_num)} of {str(len_new_bibs)} - {citekey}')
	processed_num = processed_num + 1
	mybib = mybib + '\n' + bib_new

print()
print('Updating recently modified records...')
print()
len_mod_bibs = len(mod_bibs)
processed_num = 1
for citekey in mod_bibs:
	updated_bib = get_bib(citekey)
	mybib = re.sub(r'@\w+\{' + re.escape(citekey) + r',.*?\}\}', lambda m: updated_bib.rstrip("\n"), mybib, flags=re.DOTALL)  # escape the key; a function replacement keeps BibTeX backslashes literal
	print(f'Processed record {str(processed_num)} of {str(len_mod_bibs)} - {citekey}')
	processed_num = processed_num + 1

f = open(bibpath, "w")
f.write(mybib)
f.close()

os.system('cat "' + bibpath + '" | /usr/local/bin/pandoc -f biblatex -t csljson > "' + jsonpath + '"')  # both paths quoted so folders with spaces work

now_becomes_then = dt.today().strftime("%Y%m%d%H%M%S")

file = os.path.join(__location__, os.path.basename(__file__))  # full path, so this works from any working directory

with open(file, "r") as this_file:
	data = this_file.readlines()
	
data[3] = "then = \"" + now_becomes_then + "\"\n"

with open(file, "w") as this_file:
	this_file.writelines( data )
And then the AppleScript helper:

Code:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

on run argv
	tell application "Bookends" to tell front library window
		if item 1 of argv is "mod" then
			set matchingPubs to sql search "dateModified > datediff( now(), '01/01/1904 00:00:00', 'second' ) - " & (item 2 of argv)
			set citekeys to {}
			repeat with thePub in matchingPubs
				set end of citekeys to citekey of thePub
			end repeat
			set AppleScript's text item delimiters to ","
			return citekeys as string
		else if item 1 of argv is "all" then
			return citekey of publication items of group all
		else if item 1 of argv is "get_bib" then
			set theItem to (publication items whose citekey is (item 2 of argv))
			return format theItem using "BibTeX.fmt"
		end if
	end tell
end run
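
For anyone who would rather not install numpy, the citekey comparison in the Python file can be approximated with plain sets. This is a sketch of a swapped-in technique, not the script's own code: `classify_citekeys` is a hypothetical name, and unlike `np.setdiff1d` it keeps the input order instead of sorting.

```python
def classify_citekeys(all_keys, modified_keys, existing_keys):
    """Split citekeys into new, modified, and removed.

    all_keys:      every citekey currently in Bookends
    modified_keys: citekeys changed since the last run
    existing_keys: citekeys already present in Library.json
    """
    existing = set(existing_keys)
    # New: in Bookends but not yet in the JSON file (empty strings are skipped).
    new = [key for key in all_keys if key and key not in existing]
    new_set = set(new)
    # Modified: recently changed, excluding those that will be added as new anyway.
    modified = [key for key in modified_keys if key and key not in new_set]
    # Removed: in the JSON file but no longer in Bookends.
    removed = sorted(existing - set(all_keys))
    return new, modified, removed


new, modified, removed = classify_citekeys(
    all_keys=["adams2001", "baker2010", "clark2020"],
    modified_keys=["baker2010", "clark2020"],
    existing_keys=["adams2001", "baker2010", "davis1999"],
)
```

Here `clark2020` is new, `baker2010` is modified, and `davis1999` has been removed from Bookends.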