The script below extracts metadata from the frontmost Safari page and creates a new Bookends publication from that metadata.
The createBookendsPublication() handler shows how a Bookends publication can be created & populated from AppleScript.
The script should work well with journal article pages from publishers like PLoS, Nature, BioMed Central, PubMed, PubMed Central, etc., and it should work OK with news article publishers like the New York Times. However, since every site offers different metadata & value formats, the script could benefit from a lot more testing & tweaking. The JavaScript query patterns will likely need to be adapted to your most-used sources.
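To make the fallback behaviour concrete, here is a minimal JavaScript sketch of what the script does with each query list. Every pattern is tried in order. A pattern that throws counts as "no result", and so does one that returns nothing. A throw typically happens when `querySelector()` finds no matching `<meta>` tag, returns `null`, and `.content` then fails. The first non-empty value wins. The names `firstResult`, `makeDocument` and `pageTitleQueries` are hypothetical. `makeDocument` is only a stand-in for Safari's real `document` so the sketch runs under Node.js, and it understands no selectors beyond `meta[name="..."]` and `meta[property="..."]`.

```javascript
// Hypothetical sketch of the script's fallback logic (its `executeJavascript` handler):
// try each query in order and keep the first non-empty result.
function firstResult(queries) {
  for (const query of queries) {
    let value;
    try {
      value = query();
    } catch (err) {
      // e.g. querySelector() found no tag and returned null, so `.content` threw
      continue;
    }
    if (value !== undefined && value !== null && value !== "") return String(value);
  }
  return "";
}

// Minimal stand-in for `document` so the sketch runs in Node.js, which has no DOM.
// It only understands selectors of the form meta[name="..."] and meta[property="..."].
function makeDocument(metaTags, title = "") {
  return {
    title,
    querySelector(selector) {
      const m = selector.match(/^meta\[(name|property)="([^"]+)"\]$/);
      if (!m) return null;
      const tag = metaTags.find((t) => t[m[1]] === m[2]);
      return tag ? { content: tag.content } : null;
    },
  };
}

// The script's `pageTitleQuery` patterns, written as functions instead of strings.
function pageTitleQueries(document) {
  return [
    () => document.querySelector('meta[name="dc.title"]').content,
    () => document.querySelector('meta[name="citation_title"]').content,
    () => document.querySelector('meta[property="og:title"]').content,
    () => document.title,
  ];
}

const page = makeDocument([{ property: "og:title", content: "Open Graph title" }], "HTML title");
console.log(firstResult(pageTitleQueries(page))); // Open Graph title
```

Because the order of the patterns decides which tag wins, putting a site's most reliable tag first is usually the main tweak when adapting a query list. Safari's Web Inspector console is a convenient place to try a single pattern on a live page before adding it.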
As an alternative approach, if you're able to extract a common bibliographic identifier (like a DOI, PMID, etc.) from the web page, it may be better to use Bookends' own "Quick Add" feature, which can also be scripted.
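If you go the identifier route, the values pulled from the page may need some cleaning first. The script's `doiQuery` only strips a leading `doi:`. The sketch below uses the hypothetical helper names `bareDOI` and `barePMID`. It also accepts doi.org URLs, which is an assumption about formats some sites put into their metadata, and it rejects values that don't look like a DOI or a numeric PMID. How the cleaned identifier is then handed to Quick Add is not shown here.

```javascript
// Hypothetical helper: reduce a DOI-like meta value to the bare DOI.
// Accepts "doi:10.xxxx/...", "https://doi.org/10.xxxx/...", "https://dx.doi.org/10.xxxx/..."
// or a bare "10.xxxx/..." value; returns "" for anything else.
function bareDOI(value) {
  if (typeof value !== "string") return "";
  const m = value
    .trim()
    .match(/^(?:doi:\s*|https?:\/\/(?:dx\.)?doi\.org\/)?(10\.\d{4,9}\/\S+)$/i);
  return m ? m[1] : "";
}

// Hypothetical helper: PMIDs are plain integers, so anything else
// (e.g. a PMCID like "PMC1234567") is rejected.
function barePMID(value) {
  const s = typeof value === "string" ? value.trim() : "";
  return /^\d+$/.test(s) ? s : "";
}

console.log(bareDOI("doi:10.1186/s13059-024-03351-2")); // 10.1186/s13059-024-03351-2
console.log(barePMID("PMC1234567") === ""); // true
```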
-- Extracts page metadata for the frontmost web page that's currently displayed in Safari and
-- creates a new Bookends "Internet" or "Journal article" publication with these metadata.
-- by Matthias Steffens, keypoints.app
-- TODO: better extraction of multiple authors & keywords
-- TODO: create publication types other than "Internet" or "Journal article" based on the given metadata
-- TODO: support more metadata, e.g. from <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03351-2>
-- Defines the JavaScript query patterns for website metadata to be extracted.
-- Each list may contain several patterns which will be executed first to last until
-- there's a pattern that returns content.
property pageTitleQuery : {"document.querySelector('meta[name=\"dc.title\"]').content", "document.querySelector('meta[name=\"citation_title\"]').content", "document.querySelector('meta[property=\"og:title\"]').content", "document.title"}
property authorQuery : {"document.querySelector('meta[name=\"citation_authors\"]').content.replace(/( [A-Z]+)\\b/g, ',$1').split(';').map(s => s.trim()).join('\\n').trim()", "document.querySelector('meta[name=\"author\"]').content", "document.querySelector('meta[name=\"dc.creator\"]').content", "document.querySelector('meta[name=\"citation_author\"]').content", "document.querySelector('meta[name=\"dc.contributor\"]').content"} -- TODO: multiple authors must be on separate lines & ideally formatted as: Surname, First name(s) or Initials
property writerQuery : {"document.querySelector('meta[name=\"writer\"]').content"} -- TODO: currently not used by `createBookendsPublication()`
property institutionQuery : {"document.querySelector('meta[name=\"institution\"]').content", "document.querySelector('meta[name=\"citation_author_institution\"]').content"} -- TODO: currently not used by `createBookendsPublication()`
property publicationQuery : {"document.querySelector('meta[name=\"prism.publicationName\"]').content", "document.querySelector('meta[name=\"citation_journal_title\"]').content"}
property publicationYearQuery : {"document.querySelector('meta[name=\"dc.date\"]').content.match(/^[0-9]{4}/)[0]", "document.querySelector('meta[property=\"article:published_time\"]').content.split('-')[0]", "document.querySelector('meta[name=\"pubdate\"]').content.match(/^[0-9]{4}/)[0]", "document.querySelector('meta[name=\"citation_publication_date\"]').content.match(/^[0-9]{4}/)[0]"}
property publicationDateQuery : {"document.querySelector('meta[name=\"dc.date\"]').content", "document.querySelector('meta[property=\"article:published_time\"]').content.match(/^[0-9]{4}-[0-9]{2}-[0-9]{2}/)[0]", "document.querySelector('meta[name=\"pubdate\"]').content.replace(/^([0-9]{4})([0-9]{2})([0-9]{2})/, '$1-$2-$3')", "document.querySelector('meta[name=\"citation_date\"]').content"}
property publicationVolumeQuery : {"document.querySelector('meta[name=\"prism.volume\"]').content", "document.querySelector('meta[name=\"citation_volume\"]').content"}
property publicationIssueQuery : {"document.querySelector('meta[name=\"prism.number\"]').content", "document.querySelector('meta[name=\"citation_issue\"]').content"}
property publicationFirstPageQuery : {"document.querySelector('meta[name=\"prism.startingPage\"]').content", "document.querySelector('meta[name=\"citation_firstpage\"]').content"}
property publicationLastPageQuery : {"document.querySelector('meta[name=\"prism.endingPage\"]').content", "document.querySelector('meta[name=\"citation_lastpage\"]').content"}
property pageDescriptionQuery : {"document.querySelector('meta[name=\"dc.description\"]').content", "document.querySelector('meta[name=\"description\"]').content", "document.querySelector('meta[property=\"og:description\"]').content"}
property pageKeywordsQuery : {"document.querySelector('meta[name=\"keywords\"]').content.split(/[,;] */).map(s => s.trim()).join('\\n')", "document.querySelector('meta[name=\"news_keywords\"]').content.split(/[,;] */).map(s => s.trim()).join('\\n')", "document.querySelector('meta[name=\"dc.subject\"]').content"}
property publisherQuery : {"document.querySelector('meta[name=\"dc.publisher\"]').content", "document.querySelector('meta[name=\"DC.Publisher\"]').content", "document.querySelector('meta[name=\"citation_publisher\"]').content", "document.querySelector('meta[property=\"og:site_name\"]').content", "document.querySelector('meta[name=\"publisher\"]').content"}
property issnQuery : {"document.querySelector('meta[name=\"prism.issn\"]').content", "document.querySelector('meta[name=\"citation_issn\"]').content"}
property doiQuery : {"document.querySelector('meta[name=\"DOI\"]').content", "document.querySelector('meta[name=\"citation_doi\"]').content", "document.querySelector('meta[name=\"prism.doi\"]').content.replace(/^doi:(.+)/, '$1')", "document.querySelector('meta[name=\"dc.identifier\"]').content.replace(/^doi:(.+)/, '$1')"}
property pmidQuery : {"document.querySelector('meta[name=\"citation_pmid\"]').content"}
-- These two lists map the metadata keys to their corresponding JavaScript query patterns, i.e.,
-- the first item in `keysList` defines the metadata key name for the first item in `queriesList` etc.
-- NOTES:
-- - Both lists must have an equal item count.
-- - If you add more keys & patterns, you also need to add support for these in `createBookendsPublication()`
property keysList : {"pageTitle", "author", "writer", "institution", "publication", "publicationYear", "publicationDate", "publicationVolume", "publicationIssue", "publicationFirstPage", "publicationLastPage", "pageDescription", "pageKeywords", "publisher", "issn", "doi", "pmid"}
property queriesList : {pageTitleQuery, authorQuery, writerQuery, institutionQuery, publicationQuery, publicationYearQuery, publicationDateQuery, publicationVolumeQuery, publicationIssueQuery, publicationFirstPageQuery, publicationLastPageQuery, pageDescriptionQuery, pageKeywordsQuery, publisherQuery, issnQuery, doiQuery, pmidQuery}
use framework "Foundation"
use scripting additions
on run
if (count of keysList) ≠ (count of queriesList) then
display alert "Incorrect metadata <-> query mapping" message "Please open this script and edit the properties `keysList` and `queriesList` so that they have matching elements." as critical buttons {"OK"} default button "OK" giving up after 10
return
end if
set pageMetadata to my pageMetadataFromSafari()
if pageMetadata is not {} then
set bookendsPublication to my createBookendsPublication(pageMetadata)
end if
end run
-- Extracts page metadata for the frontmost web page that's currently displayed in Safari.
on pageMetadataFromSafari()
set accessDate to my formattedDateString(current date)
set pageMetadata to {accessDate:accessDate}
tell application "Safari"
try
set pageURL to front document's URL
on error
set pageURL to missing value -- e.g. no Safari window is open
end try
if pageURL is missing value then
display alert "Missing Safari content" message "Please open a website in Safari and run this script again." as critical buttons {"OK"} default button "OK" giving up after 10
return {}
end if
set pageMetadata to pageMetadata & {pageURL:pageURL}
set metadataValues to {}
repeat with theQueries in queriesList
set theResult to my executeJavascript(theQueries)
copy theResult to end of metadataValues
end repeat
set pageMetadata to pageMetadata & (my recordFromKeys:keysList andValues:metadataValues)
end tell
return pageMetadata
end pageMetadataFromSafari
-- Creates a new Bookends "Internet" or "Journal article" publication with the given metadata.
on createBookendsPublication(pubData)
tell application "Bookends"
tell front library window
set aPub to make new publication item with properties {type:16, user3:pubData's accessDate, url:pubData's pageURL}
end tell
set pubTitle to my valueForKey:"pageTitle" inRecord:pubData
if pubTitle is not missing value and pubTitle is not "" then set aPub's title to pubTitle
set pubAuthors to my valueForKey:"author" inRecord:pubData
if pubAuthors is not missing value and pubAuthors is not "" then set aPub's authors to pubAuthors
set pubDate to my valueForKey:"publicationDate" inRecord:pubData
if pubDate is missing value or pubDate is "" then set pubDate to my valueForKey:"publicationYear" inRecord:pubData
if pubDate is not missing value and pubDate is not "" then set aPub's publication date string to pubDate
set pubJournal to my valueForKey:"publication" inRecord:pubData
if pubJournal is not missing value and pubJournal is not "" then
set aPub's journal to pubJournal
set aPub's type to 9
end if
set pubVolume to my valueForKey:"publicationVolume" inRecord:pubData
set pubIssue to my valueForKey:"publicationIssue" inRecord:pubData
if pubVolume is not missing value and pubVolume is not "" then
if pubIssue is not missing value and pubIssue is not "" then set pubVolume to pubVolume & "(" & pubIssue & ")"
set aPub's volume to pubVolume
end if
set publicationPages to my valueForKey:"publicationFirstPage" inRecord:pubData
set publicationLastPage to my valueForKey:"publicationLastPage" inRecord:pubData
if publicationPages is not missing value and publicationPages is not "" then
if publicationLastPage is not missing value and publicationLastPage is not "" then set publicationPages to publicationPages & "-" & publicationLastPage
set aPub's pages to publicationPages
end if
set pubAbstract to my valueForKey:"pageDescription" inRecord:pubData
if pubAbstract is not missing value and pubAbstract is not "" then set aPub's abstract to pubAbstract
set pubKeywords to my valueForKey:"pageKeywords" inRecord:pubData
if pubKeywords is not missing value and pubKeywords is not "" then set aPub's keywords to pubKeywords
set pubPublisher to my valueForKey:"publisher" inRecord:pubData
if pubPublisher is not missing value and pubPublisher is not "" then set aPub's publisher to pubPublisher
set pubISSN to my valueForKey:"issn" inRecord:pubData
if pubISSN is not missing value and pubISSN is not "" then set aPub's user6 to pubISSN
set pubDOI to my valueForKey:"doi" inRecord:pubData
if pubDOI is not missing value and pubDOI is not "" then set aPub's doi to pubDOI
set pubPMID to my valueForKey:"pmid" inRecord:pubData
if pubPMID is not missing value and pubPMID is not "" then set aPub's user18 to pubPMID
end tell
return aPub -- the newly created Bookends publication
end createBookendsPublication
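-- Executes the given JavaScript snippet(s) in the frontmost Safari document and returns
-- the first non-empty result. Returns an empty string if none of the snippets returned anything.
-- NOTE: This requires Safari's "Allow JavaScript from Apple Events" setting (Develop menu or
-- Safari's Developer settings); otherwise every snippet fails and only the URL & access date get filled in.
on executeJavascript(theQueries)
set theQueries to contents of theQueries -- dereference the list item passed in by `repeat with ... in`
if theQueries is {} then return ""
repeat with aQuery in theQueries
set theQuery to contents of aQuery -- compare & pass the string itself, not a reference to it
if theQuery is not "" then
tell application "Safari"
tell front document
try
set theResult to do JavaScript theQuery
if theResult is missing value then error
on error
set theResult to ""
end try
if theResult is not "" then return theResult
end tell
end tell
end if
end repeat
return ""
end executeJavascript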
-- Returns the given date as a string formatted as "YYYY-MM-DD".
on formattedDateString(theDate)
if theDate is missing value then set theDate to current date
set accessYear to year of theDate
set accessMonth to (month of theDate as integer)
if length of (accessMonth as string) is 1 then set accessMonth to "0" & accessMonth
set accessDay to day of theDate
if length of (accessDay as string) is 1 then set accessDay to "0" & accessDay
return "" & accessYear & "-" & accessMonth & "-" & accessDay
end formattedDateString
-- Creates a Cocoa dictionary using the given lists of keys and values
-- and returns the resulting dictionary as an AppleScript record.
on recordFromKeys:keys andValues:values
set theResult to current application's NSDictionary's dictionaryWithObjects:values forKeys:keys
return theResult as record
end recordFromKeys:andValues:
-- Returns the value of the given key in the given record, or `missing value` if the key was not found.
-- NOTE: This currently only works with text (NSString) values.
on valueForKey:theKey inRecord:theRecord
set theDict to current application's NSDictionary's dictionaryWithDictionary:theRecord
set theResult to theDict's valueForKey:theKey
if theResult = missing value then
return missing value
else
return theResult as text
end if
end valueForKey:inRecord:
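Several of the query patterns above also clean up the extracted values inside the JavaScript strings, which is awkward to test in Safari. Below, the same transformations are pulled out into plain functions so they can be checked against sample values in Node.js. All function names are hypothetical. `formatCitationAuthors` follows the first `authorQuery` pattern but trims each name individually. `joinAuthorTags` is one possible approach to the multiple-authors TODO. It assumes the site provides one `<meta name="citation_author">` tag per author, collected in Safari with `querySelectorAll()`.

```javascript
// Hypothetical pure-JS versions of the value clean-ups embedded in the script's queries.

// First `authorQuery` pattern: "Smith JA; Doe B" -> "Smith, JA\nDoe, B".
// Each name is trimmed and empty entries (e.g. from a trailing ";") are dropped.
function formatCitationAuthors(content) {
  return content
    .replace(/( [A-Z]+)\b/g, ",$1") // comma before trailing upper-case initials
    .split(";")
    .map((name) => name.trim())
    .filter((name) => name !== "")
    .join("\n");
}

// Possible approach to the multiple-authors TODO: one line per citation_author tag.
// In Safari the input array could come from:
//   Array.from(document.querySelectorAll('meta[name="citation_author"]')).map(m => m.content)
function joinAuthorTags(contents) {
  return contents.map((c) => c.trim()).filter((c) => c !== "").join("\n");
}

// `pageKeywordsQuery`: "a, b;c" -> "a\nb\nc" (empty entries dropped).
function formatKeywords(content) {
  return content
    .split(/[,;] */)
    .map((s) => s.trim())
    .filter((s) => s !== "")
    .join("\n");
}

// `publicationDateQuery` (combined): compact "YYYYMMDD" values (as in `pubdate`) become
// "YYYY-MM-DD"; ISO timestamps (as in `article:published_time`) are cut to the date part.
function formatPublicationDate(content) {
  const dashed = content.replace(/^([0-9]{4})([0-9]{2})([0-9]{2})/, "$1-$2-$3");
  const iso = dashed.match(/^[0-9]{4}-[0-9]{2}-[0-9]{2}/);
  return iso ? iso[0] : dashed;
}

// `publicationYearQuery`: the leading four digits, or "" if there are none.
function publicationYear(content) {
  const m = content.match(/^[0-9]{4}/);
  return m ? m[0] : "";
}

console.log(JSON.stringify(formatCitationAuthors("Smith JA; Doe B"))); // "Smith, JA\nDoe, B"
console.log(formatPublicationDate("20240715")); // 2024-07-15
```

To use one of these in the script, inline the expression into the corresponding query string. Escape double quotes as `\"` and backslashes as `\\`, as the existing patterns do (e.g. `'\\n'` for a newline in the JavaScript).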