Retrieving Call Num at LoC
Posted: Thu Apr 04, 2013 8:57 pm
I wanted to update the "Call Num" field (user5) in my database with the Library of Congress Classification Number (LCCN). Sorting on "Call Num" would then put my database into subject heading order.
I first built a a new format in the Format Manager to output just the ISBN. I created a text file (ISBN.txt) that held these numbers each on their own line.
I then built a Perl script that used the Z39.50 gateway to interrogate the Library of Congress (LoC) and return a MARC record, which was parsed to extract the LCCN. The ISBN and the LCCN were combined as output to an SQL file. The appropriate SQL command is: <UPDATE thereferences SET user5 = 'LCCN' WHERE user6 = 'ISBN' ; >. This SQL file can be used by Valentina-DB (http://www.valentina-db.com/) which is the underlying data engine to Bookends. There is one line in the SQL file for each successful retrieval.
The Perl script was very difficult to write, the Z39.50 module was difficult to install, the script ran slowly (over 4 hours for 3221 ISBNs with 2359 successful LCCNs retrieved). The success rate suggested that I needed to look elsewhere for a solution. [4.5 hrs / 3221 records = 5 seconds/query]
My neighbor, who is a librarian, suggested I research WorldCat. I found an experimental API (http://oclc.org/developer/documentation ... /using-api).
I built an Applescript that moves through the ISBNs (just like before) and queries the OCLC url. WorldCat returns XML code that can be parsed to extract the LCCN. I built another set of ISBNs (most of which did not have records at the LoC). These 828 ISBNs were used by my Applescript which after a successful retrieval built the appropriate SQL command (just like above). The Applescript ran in 20 minutes. [ 20 min / 828 records = 1.5 seconds/query]
The Applescript routine is easily to understand and modify; the underlying XML modules are freely available on the web; and it's so much faster!
The SQL files run in less that 1 second! The Valentina-DB engine is just awesome!
There's are several drawbacks: (1) You need to buy Valentina-DB ($199), (2) you need to find version 4.9.1 to work with Bookends 11 (easy to do), and (3) you need to work with a copy of your database just incase something doesn't work (cumbersome but necessary).
Here is the Applescript itself (the comments should clarify the installation):
# This script needs XML Tools.osax installed in ~/Library/ScriptingAdditions.
# It is freely available from: http://www.latenightsw.com/freeware/XML ... index.html.
# the following routines were cut and pasted from:
# http://www.latenightsw.com/freeware/XML ... ities.html
on getAnElement(theXML, theElementName)
-- find and return a particular element (this presumes there is only one instance of the element)
repeat with anElement in XML contents of theXML
if class of anElement is XML element and ¬
XML tag of anElement is theElementName then
return contents of anElement
end if
end repeat
return missing value
end getAnElement
on getElementFromPath(theXML, theElementPath)
if theElementPath is {} then
return theXML
else
local foundElement
set foundElement to getAnElement(theXML, item 1 of theElementPath)
if foundElement is not missing value and ¬
class of foundElement is XML element then
return getElementFromPath(foundElement, rest of theElementPath)
else
return missing value
end if
end if
end getElementFromPath
on getElementValue(theXML)
if theXML is missing value or theXML is {} then
return ""
else if class of theXML is string then
return theXML
else
try
return item 1 of XML contents of theXML
on error number -1728
return ""
end try
end if
end getElementValue
# This Applescript assumes that all the files are on the Desktop
# I like the Desktop because it encourages you to trash files that are no longer being used
# You will need to change 'Fritz' (my account name) to your account name
# ISBN.txt should already have been created by Bookends
# Work with a small number of ISBNs at first, say 10-20
# so that you know this routine will work for you
# Temporary.XML will vanish nicely in the last iteration of looping over the ISBNs
# SQLquery.sql will be the input file for 'Load DUMP" in Valentina
set input_id to open for access ¬
file "Macintosh HD:Users:Fritz:Desktop:ISBN.txt"
set output_id to open for access ¬
file "Macintosh HD:Users:Fritz:Desktop:SQLquery.sql" with write permission
set eof output_id to 0
set fileContents to paragraphs of (read input_id) # read whole input file into a list with a final empty paragraph
set ISBN_count to length of fileContents
repeat with i from 1 to ISBN_count - 1 #.........start major loop
set ISBN to item i of fileContents
if i mod 10 = 0 then # print out progress every 10th record
display alert "Working on ISBN record #" & i buttons {} giving up after 1 # very cheap progress meter
end if
# See e.g. http://oclc.org/developer/documentation ... /using-api
set theURL to "http://classify.oclc.org/classify2/Classify?isbn=" & ISBN & "&summary=true"
# curl -s (silent no progress meter) -L (redirection) -o (output)
# curl will create a file for output but it will append data to that file on each iteration
# hence unix rm to delete after each use
do shell script "curl -s -L -o /Users/Fritz/Desktop/Temporary.xml " & theURL
set theXML to parse XML alias "Macintosh HD:Users:Fritz:Desktop:Temporary.xml"
set responseCode to code of XML attributes of getAnElement(theXML, "response")
if responseCode as integer ≤ 2 then # good XML response back; otherwise skip
set elementXML to getAnElement(getAnElement(theXML, "recommendations"), "lcc")
if elementXML is not missing value then # no LCC tag and therefore no LCC
set elementXML to getAnElement(elementXML, "mostPopular")
set LCCN to sfa of XML attributes of elementXML
set LCCN to LCCN & " OCLC" # add OCLC suffix to identify source
set output_string to "UPDATE thereferences SET user5 = '" & LCCN & "' where user6 = '" & ISBN & "' ;"
write output_string & return to output_id
end if
end if
do shell script "rm /Users/Fritz/Desktop/Temporary.xml" # clean up for next iteration
end repeat #..........................................end major loop
close input_id
close output_id
display alert "Successful Completion!"
I first built a a new format in the Format Manager to output just the ISBN. I created a text file (ISBN.txt) that held these numbers each on their own line.
I then built a Perl script that used the Z39.50 gateway to interrogate the Library of Congress (LoC) and return a MARC record, which was parsed to extract the LCCN. The ISBN and the LCCN were combined as output to an SQL file. The appropriate SQL command is: <UPDATE thereferences SET user5 = 'LCCN' WHERE user6 = 'ISBN' ; >. This SQL file can be used by Valentina-DB (http://www.valentina-db.com/) which is the underlying data engine to Bookends. There is one line in the SQL file for each successful retrieval.
The Perl script was very difficult to write, the Z39.50 module was difficult to install, the script ran slowly (over 4 hours for 3221 ISBNs with 2359 successful LCCNs retrieved). The success rate suggested that I needed to look elsewhere for a solution. [4.5 hrs / 3221 records = 5 seconds/query]
My neighbor, who is a librarian, suggested I research WorldCat. I found an experimental API (http://oclc.org/developer/documentation ... /using-api).
I built an Applescript that moves through the ISBNs (just like before) and queries the OCLC url. WorldCat returns XML code that can be parsed to extract the LCCN. I built another set of ISBNs (most of which did not have records at the LoC). These 828 ISBNs were used by my Applescript which after a successful retrieval built the appropriate SQL command (just like above). The Applescript ran in 20 minutes. [ 20 min / 828 records = 1.5 seconds/query]
The Applescript routine is easily to understand and modify; the underlying XML modules are freely available on the web; and it's so much faster!
The SQL files run in less that 1 second! The Valentina-DB engine is just awesome!
There's are several drawbacks: (1) You need to buy Valentina-DB ($199), (2) you need to find version 4.9.1 to work with Bookends 11 (easy to do), and (3) you need to work with a copy of your database just incase something doesn't work (cumbersome but necessary).
Here is the Applescript itself (the comments should clarify the installation):
# This script needs XML Tools.osax installed in ~/Library/ScriptingAdditions.
# It is freely available from: http://www.latenightsw.com/freeware/XML ... index.html.
# the following routines were cut and pasted from:
# http://www.latenightsw.com/freeware/XML ... ities.html
on getAnElement(theXML, theElementName)
-- find and return a particular element (this presumes there is only one instance of the element)
repeat with anElement in XML contents of theXML
if class of anElement is XML element and ¬
XML tag of anElement is theElementName then
return contents of anElement
end if
end repeat
return missing value
end getAnElement
on getElementFromPath(theXML, theElementPath)
if theElementPath is {} then
return theXML
else
local foundElement
set foundElement to getAnElement(theXML, item 1 of theElementPath)
if foundElement is not missing value and ¬
class of foundElement is XML element then
return getElementFromPath(foundElement, rest of theElementPath)
else
return missing value
end if
end if
end getElementFromPath
on getElementValue(theXML)
if theXML is missing value or theXML is {} then
return ""
else if class of theXML is string then
return theXML
else
try
return item 1 of XML contents of theXML
on error number -1728
return ""
end try
end if
end getElementValue
# This Applescript assumes that all the files are on the Desktop
# I like the Desktop because it encourages you to trash files that are no longer being used
# You will need to change 'Fritz' (my account name) to your account name
# ISBN.txt should already have been created by Bookends
# Work with a small number of ISBNs at first, say 10-20
# so that you know this routine will work for you
# Temporary.XML will vanish nicely in the last iteration of looping over the ISBNs
# SQLquery.sql will be the input file for 'Load DUMP" in Valentina
set input_id to open for access ¬
file "Macintosh HD:Users:Fritz:Desktop:ISBN.txt"
set output_id to open for access ¬
file "Macintosh HD:Users:Fritz:Desktop:SQLquery.sql" with write permission
set eof output_id to 0
set fileContents to paragraphs of (read input_id) # read whole input file into a list with a final empty paragraph
set ISBN_count to length of fileContents
repeat with i from 1 to ISBN_count - 1 #.........start major loop
set ISBN to item i of fileContents
if i mod 10 = 0 then # print out progress every 10th record
display alert "Working on ISBN record #" & i buttons {} giving up after 1 # very cheap progress meter
end if
# See e.g. http://oclc.org/developer/documentation ... /using-api
set theURL to "http://classify.oclc.org/classify2/Classify?isbn=" & ISBN & "&summary=true"
# curl -s (silent no progress meter) -L (redirection) -o (output)
# curl will create a file for output but it will append data to that file on each iteration
# hence unix rm to delete after each use
do shell script "curl -s -L -o /Users/Fritz/Desktop/Temporary.xml " & theURL
set theXML to parse XML alias "Macintosh HD:Users:Fritz:Desktop:Temporary.xml"
set responseCode to code of XML attributes of getAnElement(theXML, "response")
if responseCode as integer ≤ 2 then # good XML response back; otherwise skip
set elementXML to getAnElement(getAnElement(theXML, "recommendations"), "lcc")
if elementXML is not missing value then # no LCC tag and therefore no LCC
set elementXML to getAnElement(elementXML, "mostPopular")
set LCCN to sfa of XML attributes of elementXML
set LCCN to LCCN & " OCLC" # add OCLC suffix to identify source
set output_string to "UPDATE thereferences SET user5 = '" & LCCN & "' where user6 = '" & ISBN & "' ;"
write output_string & return to output_id
end if
end if
do shell script "rm /Users/Fritz/Desktop/Temporary.xml" # clean up for next iteration
end repeat #..........................................end major loop
close input_id
close output_id
display alert "Successful Completion!"