Script to update (rather than overwriting) bib file

zvh
Posts: 28
Joined: Sun Aug 27, 2017 12:47 am

Script to update (rather than overwriting) bib file

Post by zvh » Fri Mar 22, 2019 8:40 pm

I've written up a quick and mostly untested script in response to Dellu and iandol's conversation from another thread about updating rather than overwriting a bib file. I thought I'd post it here to foreground the explanation for anyone who might want to use it.

The script itself can be found here: https://gist.github.com/zverhope/1d4d77 ... e049623738

Rather than using a separate index of citekeys to determine what keys to add and/or remove, this script derives its citekeys from the bib file itself, using pandoc-citeproc to provide the list that the script then compares against the citekeys from your open Bookends library. If a citekey is present in Bookends but not in the bib file, the script will add the record in BibTeX format to the end of your bib file.
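
To make the comparison concrete, here is a minimal shell sketch of the "in Bookends but not in the bib file" step. It is not the code from the gist: it assumes the two citekey lists have already been written to plain text files, one key per line, and the function name and file names are hypothetical.

```shell
#!/bin/sh
# missing_keys BOOKENDS_KEYS BIB_KEYS
# Prints every citekey found in BOOKENDS_KEYS that is absent from BIB_KEYS.
# Both arguments are (hypothetical) text files holding one citekey per line.
missing_keys() {
    # First pass: remember every key already in the bib file.
    # Second pass: print Bookends keys that were not remembered.
    # FILENAME == ARGV[1] is used instead of NR == FNR so that an
    # empty bib key list is handled correctly.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$2" "$1"
}
```

Calling `missing_keys bookends_keys.txt bib_keys.txt` would list the keys whose records still need to be appended, in the order they appear in the Bookends list.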

Two things: to make this work, you need to have both pandoc-citeproc and jq installed, which is usually done via Homebrew. You also need to specify the paths to these executables in the script. You'll notice that both are located in /usr/local/bin/ on my system. To find out where they are on yours (if installed), type

which pandoc-citeproc

and

which jq

into your Terminal and then update these in the script (along with the path to your bib file) accordingly.
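
If you would rather check both tools in one go, something like the following should work. It is only a sketch: `find_tool` is a hypothetical helper name, and it relies on the standard `command -v` builtin instead of `which`. The full paths matter because AppleScript's `do shell script` runs with a minimal PATH that normally does not include /usr/local/bin.

```shell
#!/bin/sh
# find_tool NAME
# Prints the full path of NAME, or reports on stderr that it is missing
# and returns a non-zero status.
find_tool() {
    command -v "$1" || {
        echo "$1 not found; is it installed and on your PATH?" >&2
        return 1
    }
}
```

Running `find_tool pandoc-citeproc` and `find_tool jq` in Terminal then gives you the two paths to paste into the script, or a clear message about which one is missing.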

Finally, and perhaps most importantly: the most difficult part of making the script was figuring out how to make Bookends print newlines in the output so that these could be read by grep in a shell script. This was necessary to locate a citekey in the bib file so that its entry can be removed if the record is no longer present in your Bookends library. I also didn't want to have to include a custom BibTeX.fmt file with this, which would unnecessarily complicate things. So, as it currently works (for me), each new entry in the bib file occupies a single line. This means that the remove function searches for the citekey, returns the line, and then removes the line, which is the entire bib entry for that record.

BUT if your current bib file doesn't have one line for each entry, this won't work properly on your end. In that case, you might try regenerating the bib file from scratch (which this script will do) or turning off the remove function. You can also try it out to see what happens. The script should back up your bib file to "your.bib.bak" before attempting to remove anything, but make your own backup as well before using this for the first time. Perhaps someone else can come up with a more elegant solution for the removal function, but this works for me right now.
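
For anyone curious what line-based removal looks like in isolation, here is a rough sketch. It is not the gist's actual removal code. It assumes every entry sits on one line and starts in the usual BibTeX way (`@type{citekey, ...`), and the function name is made up for illustration.

```shell
#!/bin/sh
# remove_entry BIBFILE CITEKEY
# Deletes the single-line BibTeX entry whose key is CITEKEY.
# Assumption: one entry per line, written as "@type{citekey, ...}".
remove_entry() {
    bib=$1
    key=$2
    # Back up first, mirroring the ".bak" safety copy described above.
    cp "$bib" "$bib.bak" || return 1
    # Keep every line that does not contain "{key,".
    # -F treats the pattern as a fixed string; the trailing comma stops
    # "smith2019" from also matching "smith2019a".
    grep -vF "{$key," "$bib.bak" > "$bib"
    # grep exits 1 when it prints nothing (every line was removed);
    # only a status of 2 or more is a real error.
    [ $? -le 1 ]
}
```

The same assumption that makes this simple is what makes it fragile: with multi-line entries, only the first line of the entry would be deleted.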

Hope others find this useful. Enjoy!

(Also, credit to iandol and kseggleton for a few of the bones -- particularly iterations by counts of 25 -- of the current script.)

Dellu
Posts: 143
Joined: Sun Mar 27, 2016 5:30 am

Re: Script to update (rather than overwriting) bib file

Post by Dellu » Sun Mar 24, 2019 10:50 am

First of all: thank you so very much for your script, dear Zach. This is going to be a very useful script.

I have been playing with it. I couldn't get it to finish the process because I have a large library (about 12,000 references).

The script ran for over 46 minutes on my references, at which point I gave up and stopped it.

Here is a screenshot of the timer in Script Debugger.
[Image: screenshot of the Script Debugger timer]

From your description, I understand that the script collects the citekeys of the main database and compares them with those in BE:
"Rather than using a separate index of citekeys to determine what keys to add and/or remove, this script derives its citekeys from the bib file itself, using pandoc-citeproc to provide the list that the script then compares against the citekeys from your open Bookends library. If a citekey is present in Bookends but not in the bib file, it will add the record in BibTeX format to the end of your bib file."

I was wondering whether the query could be reversed. The objective here is to push a small set of updated/added references in BE into a large bib database. The references that get pushed would be only those created or modified after a specific point in time (a perfect workflow would run the script only on those references that have been modified or created since the previous run of this same script).


So I felt that the process would be more efficient if the script:
a) targets a specific group in BE: for example, a group that contains all the references modified or added in the last 2 weeks;
b) then collects the keys inside this group and compares them with those in the main bib database. If KeyX exists in the main database, it is skipped or overwritten; if KeyY doesn't exist, the reference is added to the database.
Because the search goes from a small set of keys/references to the large set, it could be faster (more efficient).
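
Steps a) and b) could be sketched in shell roughly as follows. This is only an illustration under stated assumptions: the recent group has already been exported from BE to a small bib file with one entry per line, entries look like `@type{citekey, ...`, existing keys are skipped rather than overwritten, and both function names are hypothetical.

```shell
#!/bin/sh
# bib_has_key BIBFILE KEY
# Succeeds if BIBFILE already contains an entry opening with "{KEY,".
bib_has_key() {
    grep -qF "{$2," "$1"
}

# push_recent RECENT_BIB MAIN_BIB
# Appends to MAIN_BIB each one-line entry of RECENT_BIB whose citekey is
# not there yet; entries whose key already exists are skipped.
push_recent() {
    while IFS= read -r entry; do
        # Pull the citekey out of "@type{key, ...".
        key=$(printf '%s\n' "$entry" | sed -n 's/^@[^{]*{\([^,]*\),.*/\1/p')
        # Ignore blank lines and anything that is not an entry.
        [ -n "$key" ] || continue
        bib_has_key "$2" "$key" || printf '%s\n' "$entry" >> "$2"
    done < "$1"
}
```

With this shape, the large bib file is scanned once per recent key, so the cost grows with the size of the small group and not with the 12,000 references in the library.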

(Please correct me if I have misunderstood what the script does, or if there is a simpler way of speeding it up.)
"turning off the remove function"
I don't understand which part of the script does that. I want to turn it off if it can speed up the process. I am not interested in deleting the references that are not in my library. A less costly way to remove them would be to do a complete overwrite of the bib library once every couple of years.

zvh
Posts: 28
Joined: Sun Aug 27, 2017 12:47 am

Re: Script to update (rather than overwriting) bib file

Post by zvh » Sun Mar 24, 2019 4:55 pm

The version of the script below won't try to delete any entries from your bib file; however, this won't speed the script up very much. The only time the script should take a long, long time is if you're building your bib file with every record from scratch. Otherwise, the script does precisely what you're thinking it should do: it only adds whatever entries aren't already in the bib file, which shouldn't take more than a minute or two at most, assuming there are only a dozen or so new references each time.

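-- Path to the bib file to update; change this to match your system
set myBibFile to "/Users/zhope/Dropbox/Sundry/Library.bib"

-- Start with an empty list so the comparison below still works when the bib file does not exist yet
set currentBibs to {}
tell application "System Events" to set bibExists to exists file myBibFile
if bibExists then
	-- Collect the citekeys already present in the bib file, one per line
	set currentBibs to do shell script "/usr/local/bin/pandoc-citeproc --bib2json " & quoted form of myBibFile & " | /usr/local/bin/jq -r ' .[] | \"\\(.id)\"'"
	set currentBibs to paragraphs of currentBibs
end if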

set bibFile to POSIX file myBibFile

my write_to_file("", bibFile, true)

set bibs2add to {}

tell application "Bookends"
	tell front library window
		set theIDs to get citekey of publication items of group all
		repeat with x from 1 to count of items of theIDs
			set n to item x of theIDs
			-- n always comes from theIDs, so only the bib file check is needed
			if n is not in currentBibs then set end of bibs2add to n
		end repeat
		set bibCount to count of bibs2add
		set steps to 25
		set nLoop to round (bibCount / steps) rounding up
		set thisLoop to 1
		repeat while thisLoop is less than or equal to nLoop
			-- set the batch index range
			set startindex to (steps * thisLoop) - (steps - 1)
			set endindex to (steps * thisLoop)
			if endindex is greater than bibCount then
				set endindex to bibCount
			end if
			set thisListItems to items startindex thru endindex of bibs2add
			set myBibs to ""
			repeat with theKey in thisListItems
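				-- theKey is a reference to a list item inside the repeat loop; "contents of" yields the plain citekey
				set matchingPub to (publication items whose citekey is (contents of theKey))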
				-- set theID to id of first item of matchingPub
				set theRecord to (format matchingPub using "BibTeX.fmt") as string
				-- set theRecord to «event ToySGUID» theID given «class RRTF»:"false", string:"bibtex"
				set myBibs to myBibs & theRecord & linefeed
			end repeat
			my write_to_file(myBibs, bibFile, true)
			set thisLoop to thisLoop + 1
		end repeat
	end tell
end tell

-- Sub-Routine: credit to https://www.macosxautomation.com/applescript/sbrt/sbrt-09.html but modified for utf-8 encoding

on write_to_file(this_data, target_file, append_data)
	try
		set the target_file to the target_file as string
		set the open_target_file to open for access file target_file with write permission
		if append_data is false then set eof of the open_target_file to 0
		write this_data to the open_target_file as «class utf8» starting at eof
		close access the open_target_file
		return true
	on error
		try
			close access file target_file
		end try
		return false
	end try
end write_to_file
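
As a quick sanity check of the batch arithmetic in the main loop (steps of 25, with the last batch clipped to the number of keys), the same index ranges can be reproduced in a few lines of shell. This is just a sketch for checking the numbers, not part of the script, and `batch_ranges` is a hypothetical name.

```shell
#!/bin/sh
# batch_ranges COUNT STEP
# Prints "start end" (1-based, inclusive) for each batch of at most STEP
# items out of COUNT, matching the startindex/endindex logic of the loop.
batch_ranges() {
    count=$1
    step=$2
    start=1
    while [ "$start" -le "$count" ]; do
        end=$((start + step - 1))
        # Clip the final batch so it never runs past the last item.
        if [ "$end" -gt "$count" ]; then
            end=$count
        fi
        echo "$start $end"
        start=$((end + 1))
    done
}
```

For example, `batch_ranges 60 25` prints the three ranges 1 25, 26 50 and 51 60, which is what the AppleScript loop computes for 60 new citekeys.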
