Export PDF Notes and Highlights

Users asking other users for AppleScripts that work with Bookends.
Ken
Posts: 17
Joined: Mon Aug 03, 2015 5:22 am

Export PDF Notes and Highlights

Post by Ken »

Hi all,

I am experimenting with a way of annotating PDFs and taking notes in Bookends and having them available in Obsidian. I use the Highlights app to create a sidecar markdown file with the highlights and notes, and Hazel to sync the sidecar files to my Obsidian vault folder.

It works OK, but I wonder if there is an easier way to do this (without needing the Highlights app and Hazel, both of which I have to pay for). So my question is whether anyone knows of, or can create, a script to export PDF Notes and PDF Highlights from Bookends to a markdown file in a specified folder?

I have looked at the thread on exporting note cards viewtopic.php?f=6&t=4078. The script by kseggleton there comes close. But what I would like is to export the PDF notes and highlights (not Bookends notecards), plus I'd like to have the folder it saves to specified in the script so I don't have to choose a folder every time I run it.

Thanks in advance for any help.

Cheers,
Ken
postdoctoral
Posts: 23
Joined: Mon Dec 27, 2021 9:38 pm

Re: Export PDF Notes and Highlights

Post by postdoctoral »

Ken -- I wonder if you're still looking for an answer to this and/or using this workflow.

I just wanted to say that I am in a similar situation. Recently I have been looking to find solutions for the same problem and, like you, I have come up with Highlights 2 and Hazel as part of my workflow.

To my knowledge the weak spot of the whole equation is the PDF-annotations-to-markdown transition. I only know of two decent ways: Highlights's sidecar file and Zotero's extract annotations function. The latter requires that you use Zotero, which has tons of other problems. Otherwise, I think in Automator there is a very poor extract PDF annotation function. Other apps that do that don't use standard PDF annotations, so they're banned from my computer. In Bookends, you have to manually add each annotation to the note stream, which is obviously not workable (especially if you want to retrieve annotations from 1000s of old annotated PDFs).

What I've come up with for myself is this:

1) As a one-off, an AppleScript bulk imports my entire Bookends library (or whatever selection I make in it) to Obsidian by creating a markdown literature note for each BE item, with custom fields, link backs, author links... the works. Each note has a section for notes and comments that I write from Obsidian, and another one for annotations and highlights extracted from the PDF.

2) I read and annotate PDFs in whatever app I want, mostly Bookends for iPad. The PDFs are in Bookends' iCloud folder.

3) When a change is made to a PDF (for example, I highlight something on the iPad), Hazel on the iMac notices it and fires off a quick script that tells Highlights for Mac to launch, create the sidecar file, and close again.

4) Another Hazel rule detects the sidecar file and runs a script that reads it, trims it, edits it automatically to my liking, takes the extracted highlights, locates the corresponding markdown literature note in Obsidian, and non-destructively edits it by placing the extracted annotations in the dedicated section, without messing up any other comments that I may have written in other sections of the literature note.

5) When the Bookends metadata is amended, or a new item is added to the Bookends library, another AppleScript checks to see if there is already a literature note for it in Obsidian. If there is, it non-destructively updates it while leaving the comments and extracted annotations alone. If there isn't, it makes it from scratch.
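
The non-destructive update in step 4 could be sketched roughly as follows. This is only an illustration in Python, not the script described above: the function name `replace_section` and the `## Annotations` heading are assumptions, and it presumes the literature note uses ordinary markdown headings to separate its sections.

```python
import re


def replace_section(note_text: str, heading: str, new_body: str) -> str:
    """Replace the body under `heading`, leaving every other section untouched."""
    lines = note_text.splitlines()
    # Heading level = number of leading '#' characters (e.g. 2 for "## Annotations").
    level = len(heading) - len(heading.lstrip("#"))
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == heading)
    except StopIteration:
        # Heading not found: append a fresh section at the end of the note.
        return note_text.rstrip("\n") + f"\n\n{heading}\n\n{new_body.strip()}\n"
    # The section ends at the next heading of the same or a higher level.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        match = re.match(r"(#+)\s", lines[i])
        if match and len(match.group(1)) <= level:
            end = i
            break
    replaced = lines[: start + 1] + ["", new_body.strip(), ""] + lines[end:]
    return "\n".join(replaced).rstrip("\n") + "\n"
```

Only the lines between the chosen heading and the next heading of equal or higher level are rewritten, so hand-written comments in other sections survive each refresh.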

All of this could be simplified greatly if only:
a) Highlights became scriptable
b) Bookends implemented an automatic way to have PDF annotations move into the note stream
c) the Keypoints.app project, which looked so promising, got going again.

Meanwhile, all of the above has been working well for me.

The main pain points have been 1) that I cannot find a good way to automatically trigger an AppleScript when there is a change in the Bookends library, like a correction in the metadata for an item and 2) that the process of batch creating over 1000 sidecar files from Highlights has been tricky (I managed to overload my M1 iMac a couple of times!).
Jon
Site Admin
Posts: 10038
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Re: Export PDF Notes and Highlights

Post by Jon »

Sorry, I missed this when it was first posted.

As of version 14.0.4 Bookends lets you export PDF annotations. This isn't exactly what you are asking for, but it seems like it gets you much of what you want.

Jon
Sonny Software
Jon
Site Admin
Posts: 10038
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Re: Export PDF Notes and Highlights

Post by Jon »

Bookends now has a robust annotation extraction feature as well as the ability to format citations in markdown syntax. You might look at this recent thread for ways to utilize this with Obsidian.

viewtopic.php?t=5944

Jon
Sonny Software
postdoctoral
Posts: 23
Joined: Mon Dec 27, 2021 9:38 pm

Re: Export PDF Notes and Highlights

Post by postdoctoral »

Jon, I've been away from work for a few months (first baby!) and it looks like a lot has changed in the Bookends & PDF workflow "scene". I look forward to exploring what's new, and no doubt I'll reach out to you and the forum members here with any follow-up questions. Meanwhile, thank you as always for the relentless improvements to the app!
Jon
Site Admin
Posts: 10038
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Re: Export PDF Notes and Highlights

Post by Jon »

Congratulations on the baby! When you catch up on your sleep deficits you should check out the Version History page and view our video tutorial on hypertext links. And there is a nice array of UI/usability improvements coming to annotations and the note stream next month.

Jon
Sonny Software
postdoctoral
Posts: 23
Joined: Mon Dec 27, 2021 9:38 pm

Re: Export PDF Notes and Highlights

Post by postdoctoral »

Thank you!

A quick first question: is the new "extract PDF annotations" command scriptable (other than via UI scripting)?
Jon
Site Admin
Posts: 10038
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Re: Export PDF Notes and Highlights

Post by Jon »

Except for your caveat, no. In another thread DrJJWMac mentioned creating a KM script and/or an AppleScript to do this, though, presumably with UI scripting.

Jon
Sonny Software
postdoctoral
Posts: 23
Joined: Mon Dec 27, 2021 9:38 pm

Re: Export PDF Notes and Highlights

Post by postdoctoral »

Got it. Well, at first glance it seems to me that the new features are awesome and I am 99% confident that I’ll be able to eliminate the Highlights app from my automated workflow. I will have to see how reliably I can use AppleScript to get BE to generate the Extracted Annotations text file, copy the text from that front window, and close that window before continuing, in an orderly manner. UI scripting with Highlights was a nightmare, but that’s in part because I don’t ordinarily have Highlights running on the Mac, so when the script kicked into action it would yank me away from whatever I was doing to open Highlights. Bookends at least is always running on the Mac already, so it should work better.

Still I will say that, for my use case in particular, either one of the two following tweaks or new features would really make a big difference. I feel they’re tantalizingly close now:

Option 1) The easiest, cleanest thing would be to have an option to always have PDF annotations automatically copied/synced into the notecard. This could be a library-wide setting. That would be HUGE for me, because then I could use the note field in AppleScript to pull annotations. This would simplify my workflow enormously by cutting out all the unpleasant and finicky steps in one go. Of course others may want to keep such an option switched off, if they think of annotations and notes as different concepts. And I see that with the BE for iPad app in the picture, this would have to be implemented across both iPad and Mac apps to be usable (i.e., new annotations made on the iPad app would also have to find their way to the notecard).

Option 2) would be to streamline the creation of the extracted annotation text file a little. Right now you have to first create it, and then save it, and the file name is not prepopulated. That is two separate commands to UI-script, and you also have to set the file name manually. It would be awesome if all that could be replaced by a single “make extracted annotation text file” click; the parameters such as file name and formats of the contents would have to be set in much the same way as you do now, but in a preference pane rather than on an ad hoc basis. This streamlined command would be easier and more reliable to UI-script, even if it wasn’t natively scriptable. (Of course, the ability to do an AppleScript such as tell bookends to make extracted annotation text file for reference X would be even better!)

Just some ideas — but in the meantime I will say that the recent changes are already huge for me!
Jon
Site Admin
Posts: 10038
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Re: Export PDF Notes and Highlights

Post by Jon »

Thanks for the suggestions, I'll add them to the requested features list. The idea of having the configuration window setting moved into preferences is interesting, perhaps as a "Configure" submenu, like we have for the watch folder.

You say that the save dialog is not prepopulated with the file name. I'm not sure what you mean. The proposed file name is the library name followed by "PDF annotations". What do you see?

Finally, automatically adding annotation contents to the Bookends database is not as straightforward as you might think. For one thing, once the data are in the library they no longer are associated with that PDF or that annotation. So, for example, you might import an annotation's content. But if you edit the annotation in the PDF, what will happen? It can't be updated because there is no link between them. Also, you might have edited the note in Bookends and wouldn't want to overwrite it. It could be automatically added as a new note, but now you have two notes that are largely similar from the same annotation. Of course if the annotation has content, you can edit it in the note stream and the PDF will be updated. But it's still in the PDF, not the Bookends database. That's one simple example, there are dozens of problems. FWIW, you can import content from multiple annotations at once by selecting them all in the note stream and right-clicking. But it is user-initiated and user-curated.

Jon
Sonny Software
postdoctoral
Posts: 23
Joined: Mon Dec 27, 2021 9:38 pm

Re: Export PDF Notes and Highlights

Post by postdoctoral »

Jon wrote: Tue Jul 04, 2023 8:57 am You say that the save dialog is not prepopulated with the file name. I'm not sure what you mean. The proposed file name is the library name followed by "PDF annotations". What do you see?
The proposed filename I see is "[Library Name] PDF Annotations" and the default location is the Desktop. It would be neat to be able to preset some preferences. In my case, I would love to be able to have the extracted annotations file be generated as "[citekey of selected reference].md" and to specify a folder where it should go.
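
For illustration, the naming scheme described here (one `[citekey].md` file per reference in a preset folder) could look something like the Python sketch below. Nothing in it is a Bookends feature: the function names, the choice of which characters to replace, and the assumption that the citekey and extracted text have already been obtained by some other means are all hypothetical.

```python
import re
from pathlib import Path


def annotation_path(citekey: str, folder: Path) -> Path:
    """Build '<folder>/<citekey>.md', replacing characters unsafe in file names."""
    # Slashes, backslashes and colons are awkward in macOS file names.
    safe = re.sub(r"[/\\:]", "-", citekey).strip()
    if not safe:
        raise ValueError("citekey is empty")
    return folder / f"{safe}.md"


def save_annotations(citekey: str, text: str, folder: Path) -> Path:
    """Write the extracted text to the preset folder, overwriting any older copy."""
    folder.mkdir(parents=True, exist_ok=True)
    target = annotation_path(citekey, folder)
    target.write_text(text, encoding="utf-8")
    return target
```

Because the target path is derived entirely from the citekey, re-running the extraction for the same reference simply refreshes the same file.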
Jon wrote: Tue Jul 04, 2023 8:57 am Finally, automatically adding annotation contents to the Bookends database is not as straightforward as you might think. For one thing, once the data are in the library they no longer are associated with that PDF or that annotation. So, for example, you might import an annotation's content. But if you edit the annotation in the PDF, what will happen? It can't be updated because there is no link between them. Also, you might have edited the note in Bookends and wouldn't want to overwrite it. It could be automatically added as a new note, but now you have two notes that are largely similar from the same annotation. Of course if the annotation has content, you can edit it in the note stream and the PDF will be updated. But it's still in the PDF, not the Bookends database. That's one simple example, there are dozens of problems. FWIW, you can import content from multiple annotations at once by selecting them all in the note stream and right-clicking. But it is user-initiated and user-curated.
I don't pretend that I understand even a fraction of the complexities involved here. For what it's worth, though, in my own scripts and routine I also came to the conclusion that I have to separate what I call "notes" and what I call "annotations," but I do want both of those copied into my writing environment, which is a library of markdown notes, automatically. When my scripts scan Bookends for any updates to the library and they scan the PDF folder for any new annotations, they know to always preserve the "notes" section. But the "annotations" section is deleted and overwritten with the latest extraction from the PDF.

The context for this is that I am in the humanities (mostly! and a bit in neuroscience). Many scientists, I think, read a paper, take what they need to know from it, and then they are potentially done with it forever. This scenario lends itself to an "annotation->note card" one-off transcription. But I find that I have been re-reading some chapters and papers for two decades now, and as such there are many layers of annotations in the PDF. Each of those layers is valuable and tells a story in its own right. So I am never "done" with the PDF, but I do want to have all annotations be readable and accessible from my writing platform, and know that if I go back to the PDF and annotate more, those new thoughts will be carried over too. All this material is separate from my "notes" about a PDF, which are often secondary reflections about what I say in the annotations and/or the way the article in question fits within my thinking in general.

In principle, there is no reason why going forward I could not get into the habit of manually adding notes to the notecard in Bookends. But you can see how, re-reading a 300-page book that has already been heavily annotated, it could get hard to track down what has already been copied into the note card and what hasn't, to say nothing of how difficult it would be to retroactively do this for ~1500 PDFs, many of which have been annotated.

So personally for me, it looks like what I would have to do is something like:

1. have the script detect when a PDF in the Bookends library has been annotated (already up and running, via Hazel)
2. script the BookEnds UI to fetch that PDF, generate the extracted annotations text, copy it (this was the part that I used to do via Highlights)
... and then I can either:
3A) have the script manipulate that copied text into my notes library directly (using my existing scripts) or
3B) have the script put that text in the notes field of Bookends, where it would be detected by other scripts I have set up whose job it is to detect any changes in the BE library and update my markdown notes library accordingly. These scripts are currently set to ignore the notes field in BE.

There are no substantial differences between A and B, other than with B I would keep a constantly updated record of the PDF annotations within BE, which could give me more flexibility in the future.
Anything that BE could do to streamline step 2 -- basically making the process of generating the extracted annotations text scriptable or at least faster via UI -- would be awesome.
DrJJWMac
Posts: 342
Joined: Sat Jun 22, 2019 8:04 am
Location: Alabama USA

Re: Export PDF Notes and Highlights

Post by DrJJWMac »

@postdoctoral -- Reading your processing steps on the PDF annotations, you could script this way

1) Detect that a PDF has been annotated (already completed)
2) Fire off a script that ...
2a) runs the extract PDF annotations menu using selected references as default ... Set this up by selecting a PDF, running the Extract PDF Annotations menu, and choosing the options that you want for automation. All future extractions will start with those settings by default ... NOTE to Jon -- perhaps the Extract PDF Annotations dialog box could have a check box option to "Save settings as default ...". This would allow automation while preventing changes made during a manual processing step from overwriting the desired automation default.
2b) pauses for a moment (to allow the extraction time to show the results in the preview window)
2c) copies all text from frontmost window (now the Preview window)
2d) pastes the pasteboard text into your notes library (with OVERWRITE permission on)

One comment: The only reason I can see to automatically update an external file with annotations + notes *immediately after the PDF is changed* is if you are viewing that external file side-by-side with BE, for example in Obsidian. Otherwise, I suggest that a better approach is to avoid looking for changes to a PDF to fire a script that updates the external annotations + notes files. Instead, immediately after you finish annotating a PDF, add the annotated PDF to the Hits list in BE (manually or via AppleScript). Then, before you close the library, fire off a script that collects annotations and notes on all references in the Hits list (clearing the Hits list at the end of the script).
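
The batching idea can be pictured with a small Python sketch. It is not a Bookends API: the `PendingList` class below is a hypothetical file-backed stand-in for the Hits list, and the reference identifiers are whatever the surrounding scripts already use.

```python
import json
from pathlib import Path


class PendingList:
    """File-backed list of references awaiting extraction (a stand-in for the Hits list)."""

    def __init__(self, path: Path):
        self.path = path

    def _load(self) -> list:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return []

    def add(self, ref_id: str) -> None:
        """Queue a reference once, however many times its PDF is annotated."""
        items = self._load()
        if ref_id not in items:
            items.append(ref_id)
            self.path.write_text(json.dumps(items), encoding="utf-8")

    def drain(self) -> list:
        """Return everything queued and clear the list, ready for one batch run."""
        items = self._load()
        self.path.write_text("[]", encoding="utf-8")
        return items
```

Each annotation session calls `add`, and a single end-of-session script calls `drain` and processes the whole batch, which mirrors clearing the Hits list at the end of the script.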

I personally no longer prefer to put PDF Notes (annotations) also into the BE notes. I understand that, in the past, when extracting PDF annotations directly to markdown was not possible, this was probably the only way to get the annotations out (other than to use a different app such as Highlights). I suggest that you can now remove this step from your processing workflow. The overhead of managing information in two places can be cumbersome (as you may agree since you imply that you have multiple AppleScripts just to do the various parts of all this). The abilities in BE have greatly improved to generate a markdown file that contains annotations, notes, and hyperlinks back to the BE source as well as into the source PDF itself.

I agree with you that streamlining the Extract + Save PDF Annotations would be helpful. I previously suggested including a checkbox that said [ ] Save Extracted File (Directly) to PDF (to avoid the preview window entirely). Whether this checkbox is or is not included in a future update to BE, the menu sequences to Extract + Save can be set up via AppleEvents in AppleScript to create only one button click. The bonus in using AppleScript for Extract + Save is that you can pre-populate the desired filename and path to save the extracted file.

My only other thought on your comments is tangential to your request that BE have additional automation, e.g. to detect specific events and run subsequent actions after them. This idea reminded me that some programming-level apps allow users to attach scripts on "hooks" to key events in their processing. Following on this, perhaps Jon would entertain the idea for users to set AppleScripts to run on the following two "hook events" in BE:

* After Library Opens --> run this AppleScript after a library is opened
* Before Library Closes --> run this AppleScript before the library closes

I can see for example utility in the second hook event being used to fire off an AppleScript that updates a markdown file containing extracted annotations and notes across the most-recently processed references in the library.
--
JJW
postdoctoral
Posts: 23
Joined: Mon Dec 27, 2021 9:38 pm

Re: Export PDF Notes and Highlights

Post by postdoctoral »

DrJJWMac wrote: Wed Jul 05, 2023 10:53 am 1) Detect that a PDF has been annotated (already completed)
2) Fire off a script that ...
2a) runs the extract PDF annotations menu using selected references as default ... Set this up by selecting a PDF, running the Extract PDF Annotations menu, and choosing the options that you want for automation. All future extractions will start with those settings by default ... NOTE to Jon -- perhaps the Extract PDF Annotations dialog box could have a check box option to "Save settings as default ...". This would allow automation while preventing changes made during a manual processing step from overwriting the desired automation default.
2b) pauses for a moment (to allow the extraction time to show the results in the preview window)
2c) copies all text from frontmost window (now the Preview window)
2d) pastes the pasteboard text into your notes library (with OVERWRITE permission on)
That is exactly what I plan to do; it's also very similar to what my scripts do as of today, but with Highlights (they make Highlights.app open the PDF that has been modified, wait 1 sec, save it, and close it, thereby generating the "sidecar" md file).
In fact, I think you forget the important final step: to tell BE to close that frontmost window, so the cycle can be repeated with the next PDF, should there be one in the queue.
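
The point about always closing the window before moving on can be sketched as a loop with a `finally` clause. This Python outline is only an assumption-laden illustration: `extract`, `store`, and `close_window` are hypothetical callables standing in for whatever UI-scripting steps are actually used, and none of them corresponds to a real Bookends command.

```python
from typing import Callable, Iterable, List


def process_queue(
    pdfs: Iterable[str],
    extract: Callable[[str], str],
    store: Callable[[str, str], None],
    close_window: Callable[[], None],
) -> List[str]:
    """Run extract -> store for each PDF, always closing the window afterwards.

    Returns the PDFs that failed so they can be retried later.
    """
    failed = []
    for pdf in pdfs:
        try:
            text = extract(pdf)   # e.g. trigger extraction and copy the text
            store(pdf, text)      # e.g. write it into the notes library
        except Exception:
            failed.append(pdf)    # remember the failure, keep the queue moving
        finally:
            close_window()        # runs on success and on failure alike
    return failed
```

Closing in `finally` means one failed extraction does not leave a stray window in front of the next PDF in the queue.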
DrJJWMac wrote: Wed Jul 05, 2023 10:53 am One comment: The only reason I can see to automatically update an external file with annotations + notes *immediately after the PDF is changed* is if you are viewing that external file side-by-side with BE, for example in Obsidian. Otherwise, I suggest that a better approach is to avoid looking for changes to a PDF to fire a script that updates the external annotations + notes files. Instead, immediately after you finish annotating a PDF, add the annotated PDF to the Hits list in BE (manually or via AppleScript). Then, before you close the library, fire off a script that collects annotations and notes on all references in the Hits list (clearing the Hits list at the end of the script).
It sounds like our use cases are different. I made these scripts because I tend to be annotating PDFs or writing somewhere using my iPad Pro 12.9", while the iMac is back at home. So the apps I have access to on mobile, both Obsidian and Bookends, are severely limited compared to their Mac counterparts.
But what happens is that when I close a PDF on my iPad after having annotated it, the change is picked up by Hazel on the Mac at home, and it fires off Highlights, generates the sidecar file, and integrates the annotations in my Obsidian library. In practice, if it all goes smoothly, I already have the new annotations in my Obsidian iPad app by the time I open it.

When the process fails, it's usually Highlights' fault: either the app has forgotten that I have a Pro subscription, which it seems to forget every few weeks, and so refuses to generate a Sidecar file, or the Apple Script interacting with the Highlights UI fails, perhaps because some other pop-up from Mac OS has gotten in the way. The fact that this is all happening remotely means that it can be frustrating, if I don't see the updates coming and I have to log into the iMac remotely to check what's going on.
This is also why I am disappointed that I will have to continue using UI interactions in my AppleScript after I cut out Highlights from the process.
Having the AppleScript command to extract annotations from a reference would be the neatest thing, it would mean no more UI manipulation at all.
DrJJWMac wrote: Wed Jul 05, 2023 10:53 am I personally no longer prefer to put PDF Notes (annotations) also into the BE notes. I understand that, in the past, when extracting PDF annotations directly to markdown was not possible, this was probably the only way to get the annotations out (other than to use a different app such as Highlights). I suggest that you can now remove this step from your processing workflow. The overhead of managing information in two places can be cumbersome (as you may agree since you imply that you have multiple AppleScripts just to do the various parts of all this). The abilities in BE have greatly improved to generate a markdown file that contains annotations, notes, and hyperlinks back to the BE source as well as into the source PDF itself.
Yes, I don't use the native BE notes. Sometimes I brainstorm whether I should use them as a way to get the annotations out of BE, since the notes field is properly scriptable. But as per the conversation above, it always turns out to be impractical -- the BE notes were simply built for a different use.
As of right now I use Highlights for that part of the sequence. We'll see if I can get better reliability by transferring that step over to BE.
DrJJWMac
Posts: 342
Joined: Sat Jun 22, 2019 8:04 am
Location: Alabama USA

Re: Export PDF Notes and Highlights

Post by DrJJWMac »

You have a rather sophisticated workflow. Nicely done!

I expect one limitation may eventually be in the reliability of your iCloud sync.

I do not see a way to bring a reference on the iPad into the Hits list on macOS. For this reason, I foresee a complicated process in the effort to create an automatic workflow to annotate PDFs on the iPad and fire off Extract PDF Annotations and Save PDF Annotations on BE/macOS.

* annotate PDF on BE/iPad
* close annotation on PDF on BE/iPad
* put a star on the reference or put a special label on the reference (see below for why)
* !! force iCloud sync between BE/iPad and BE/macOS !!
* (Hazel on macOS recognizes change in PDF annotation date)
* Hazel fires off the AppleScript to work on the frontmost BE library
* AppleScript finds references with a star or a special label
* AppleScript moves found references to the Hits list (IMPORTANT STEP)
* AppleScript calls Extract PDF Annotations and so forth using AppleEvents menu calls, where the defaults have been set to Extract PDF Annotations from Hits list
* ...

The best improvement that Jon could consider is to combine Extract PDF Annotations and Save PDF Annotations into one AppleScript call:

ExportPDFAnnotations from All/Hits/Selected/ReferenceList to filepath with/without overwriting

The next best could be to have the dialog for Extract PDF Annotations include the ability to extract PDF annotations from references using more selection criteria. One option might be to include labels as an additional selection criterion, e.g. using my current labels All/Hits/Selected//Not Relevant/Active/Significant/Issues/Orphaned/New. Another option (my preference) would be to allow the extraction dialog to work from a selected Static Group. In this case, before you sync your iPad to macOS, you would make sure that your newly annotated PDFs on your iPad are in the static group that should be extracted on macOS.
--
JJW