Posted: Tue Aug 22, 2006 7:31 am
by Jon
Hi nicka,
This thread is about finding and attaching pdfs that already exist on your hard drive but aren't attached to references you entered into Bookends. I'm not sure that's particularly useful, but some people clearly think it is.
Since you mentioned it, we do have some nice things coming for PubMed users, though...
Jon
Sonny Software
Posted: Tue Aug 22, 2006 7:58 am
by nicka
Jon wrote:
This thread is about finding and attaching pdfs that already exist on your hard drive but aren't attached to references you entered into Bookends. I'm not sure that's particularly useful, but some people clearly think it is.
That's what I meant. I have quite a few pdfs on my hard drive that I would attach to Bookends references if I had time. The current workflow is:
1) open pdf,
2) look for title and author,
3) search on Google Scholar,
4) import reference details from Google Scholar into Bookends (luckily a one-click operation with Firefox, but has to be done one reference at a time)
5) attach pdf to the new reference
6) repeat stages 1 to 5 several hundred times!
7) tidy up new references
It would be fantastic if there were a way to drag a load of pdfs onto Bookends and say: go look these up on Google Scholar, import papers that match and attach the correct pdf to the reference. That would reduce the whole operation to two steps:
1) dump pdfs on Bookends
2) tidy up new references
I don't know if this is feasible -- as you say, the info in pdfs is unstructured. But the author and title are almost always near the beginning.
It would be a very good feature to bring new users to Bookends, since almost every academic now has lots of pdfs on their hard drive.
It would also be wonderful for pdfs that are emailed to you. Intuitively, the workflow is the wrong way round at the moment: you get a pdf and want to 'put it into' Bookends, but currently it is necessary to create the reference manually (or import it from Google Scholar or another database), then attach the pdf to the new reference. It would be better if it were possible to drag the pdf from the email straight onto Bookends, which would then create a new reference, attach it to that new reference and go find the details of the reference online.
I hope that clears up what I meant.
Nick
Posted: Tue Aug 22, 2006 8:21 pm
by timyu
nicka wrote:
It would be a very good feature to bring new users to Bookends, since almost every academic now has lots of pdfs on their hard drive.
Exactly -- every one of my colleagues has many folders of dozens to hundreds of PDFs, and they would love a tool like this. To my knowledge this feature doesn't exist in any reference management package.
So how to implement this? Granted that it may be at this point still hard to AUTOMATICALLY extract a PDF file's contents and retrieve the relevant record, how about this for an intermediate solution:
1. Drag and drop a batch of PDFs into Bookends. These become automatically incorporated as attachments to "placeholder" references which are given arbitrary names.
2. Select one of these PDF/placeholder references, and the attachment inspector comes up displaying the first page in a separate window. Underneath, a search form also pops up for PubMed, Google Scholar, [your favorite search engine] with fields for title, author, journal volume/page, etc.
3. Look at the PDF which is open and in full view, and type in the relevant fields. Upon hitting "Retrieve," the topmost hit is automatically imported as the reference for this PDF (or you are presented with a menu of the top five hits, and you choose the one you want).
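The three steps above can be sketched as a small data model. This is only an illustration of the idea, not how Bookends works: `PlaceholderReference`, `make_placeholders` and `apply_hit` are hypothetical names, and the search hits are represented as plain title strings rather than real PubMed or Google Scholar records.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class PlaceholderReference:
    title: str              # arbitrary name until a real record is retrieved
    attachment: Path        # the dropped PDF
    resolved: bool = False  # True once a retrieved record replaced the placeholder


def make_placeholders(pdf_paths: List[str]) -> List[PlaceholderReference]:
    """Step 1: one placeholder reference per dropped PDF."""
    refs = []
    for i, p in enumerate(pdf_paths, start=1):
        path = Path(p)
        refs.append(PlaceholderReference(f"Placeholder {i}: {path.stem}", path))
    return refs


def apply_hit(ref: PlaceholderReference, hits: List[str], choice: int = 0) -> PlaceholderReference:
    """Step 3: use the chosen hit (topmost by default); keep the placeholder if there are none."""
    if hits:
        ref.title = hits[choice]
        ref.resolved = True
    return ref
```

Unresolved placeholders keep `resolved == False`, so they are easy to list for manual checking later.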
Tim
Posted: Tue Aug 22, 2006 8:59 pm
by nicka
timyu wrote:
Granted that it may be at this point still hard to AUTOMATICALLY extract a PDF file's contents and retrieve the relevant record
I'm not sure that it is so hard for typical academic papers. I just did a test. I opened a pdf at random -- it happened to be a recent paper from the journal Mind and Language -- and selected the first ten lines. I copied and pasted that into TextWrangler and removed the line breaks, so that I had all of the words from the first ten lines of the paper on one line. I copied and pasted that mess into the search field on Google Scholar's page in Firefox and pressed return.
The paper in question was the top result. Better still, it was the only result.
By doing this I also found out that Google Scholar limits its searches to 32 words, not counting words that it ignores, like 'of', 'the' and 'this'. So it would probably be best if Bookends gave Google the first 50 or so words; if the title and author are among them, success seems quite likely.
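The manual test above (copy the first lines, remove the line breaks, paste the result into the search field) could be sketched roughly as follows. It assumes the text of the first page has already been extracted from the pdf by some other means. The ignored-word list contains only the three examples mentioned above, so the real list is unknown, and the function names are made up for illustration.

```python
# Words the post says Google Scholar ignores. Only these three are named there;
# the full list is an assumption.
IGNORED_WORDS = {"of", "the", "this"}


def build_query(pdf_text, max_words=50):
    """Collapse line breaks and return the first max_words words on one line."""
    return " ".join(pdf_text.split()[:max_words])


def words_counted_by_limit(query, limit=32):
    """Return the words that would fit within the limit, skipping ignored ones."""
    counted = [w for w in query.split() if w.lower() not in IGNORED_WORDS]
    return counted[:limit]
```

For example, `build_query("The Theory\nof  Mind\n\nJane Doe")` gives `"The Theory of Mind Jane Doe"`, of which only `Theory`, `Mind`, `Jane` and `Doe` would count toward the 32-word limit.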
Anyway, something of this sort seems feasible. Still, there will be lots of cases where Google (or PubMed or whatever) gets it wrong or produces no result at all, so there would need to be prompts to the user to check each reference against its pdf, and to enter details manually or search further where the result is wrong or nothing was found.
Nick
Posted: Tue Aug 22, 2006 10:12 pm
by timyu
Thanks Nick. I just tried the same with Google Scholar and four articles from the cancer genomics literature. Four for four! Better than the etBLAST tool that I had posted about earlier.
Now, has Google opened up their search API for Scholar? I just did a quick search that suggested that they have not. However, if you visit www.libx.org you see an example of a Firefox extension that interfaces with Google Scholar. So it ought to be possible...
Tim
Posted: Tue Aug 22, 2006 11:43 pm
by K1
I didn't think such a thing would be feasible, but I would love something like this as well. I've got hundreds of academic papers on my HD, and when I download papers, I don't always bother exporting and then importing the reference (often from JSTOR, and most of the time I won't need it anyway). Any way to make it easier to get them into Bookends would indeed be a killer feature.
Posted: Wed Aug 23, 2006 5:08 am
by tom
There is a small application called "cb2bib06" (GPL, freeware) that converts pdf files into references (BibTeX) with more or less success (actually you have to edit each reference by hand: an editor and an internet search tool make it easier...). However, doing a quick PubMed search with the first two author names is more convenient for me.
tom
Posted: Wed Aug 23, 2006 6:06 am
by rward
I've been concentrating more on getting pdfs for new references in Bookends, not entering new references and finding existing pdfs on the hard disk. Is this really something that's generally wanted?
Very much! Colleagues and postdocs send pdfs to me, or I get them from Google Scholar or directly from the author when they are unavailable at our library. Plus I had a collection before I started using Bookends.
Also, either I'm doing something wrong, or there still seems to be an extra step in the new reference -> pdf process. In Bookends, I do the PubMed search, find a reference, copy it into my database, good so far. I request the full text, and then usually have to do a click or two to download the pdf. That's good too. But then manually linking the downloaded pdf to the reference is a slight pain: I have to find the file, which means going to the download manager and clicking "Show", or Command-clicking the icon in the Preview window showing the pdf. Then I frequently have to organize my windows so that I can carefully drag the pdf icon to the reference list entry. I know it doesn't sound like much, but it's enough of an obstacle that I frequently don't have my pdfs attached to the reference entries in Bookends as I would like.
A feature like the above would let me use Bookends as a reliable way to get at my library of pdfs. I find relevant pdfs, drag them into Bookends, and I'm done.
Posted: Wed Aug 23, 2006 8:12 am
by Jon
timyu wrote:
2. Select one of these PDF/placeholder references, and the attachment inspector comes up displaying the first page in a separate window. Underneath also pops up a search form for Pubmed, Google Scholar, [your favorite search engine] with fields for title, author, journal volume/page, etc.
Hi,
Just so you know, you can more or less do this now. Open the pdf, say in Preview, and with it in view enter the search terms in the Internet Search window. Download the reference you find. Then drag and drop the proxy icon from the pdf window onto the reference and Bookends will attach it.
I am aware that this is not the final solution that people are asking for, just pointing out that drag and drop of the proxy icon makes it easier to attach an existing pdf than might otherwise be the case.
Jon
Sonny Software
Posted: Wed Aug 23, 2006 8:16 am
by Jon
rward wrote:
But then to manually link up the downloaded pdf to the reference is a slight pain: I have to find the file, which means going to the download manager and clicking "Show", or command clicking the icon in the preview window showing the pdf.
FYI, you can often just drag and drop the proxy icon for the URL (in the browser window) onto Bookends and Bookends will download and attach the pdf (the URL must end in .pdf for this to work -- the User Guide has details).
Also, the next Bookends update will have a new feature for dealing with pdfs for PubMed users that I'm sure you will like.
Jon
Sonny Software
linking PDF files
Posted: Fri Aug 25, 2006 12:45 am
by rmelamed
Hi Jon, I have talked to you about linking existing pdfs for a few years. Since others are requesting this feature, I will again suggest that Bookends could perhaps replicate how most humans (I think) recognize the title: it is set in the largest font. Searching PubMed with the title works well.
Also, OS X now lets you save files with really long names. I find it useful to see these titles as file names. When I rename old files as I manually link them, copying and pasting the title from the PDF to use as the file name gets screwed up by line breaks, but copying from Bookends works well.
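The line-break problem when pasting a title as a file name could be handled along these lines. This is just a sketch: `title_to_filename` is a made-up helper, the 200-character cap is an arbitrary assumption chosen to stay under typical file-name limits, and replacing `/` and `:` is a cautious guess at which characters cause trouble in Mac file names.

```python
import re


def title_to_filename(title, ext=".pdf", max_len=200):
    """Turn a pasted title (possibly containing line breaks) into a one-line file name."""
    name = " ".join(title.split())     # collapse line breaks and runs of whitespace
    name = re.sub(r"[/:]", "-", name)  # assumed-problematic characters in file names
    name = name[: max_len - len(ext)].rstrip(" .")
    return name + ext
```

For example, a title pasted as `"Mind and\nLanguage:  a review"` becomes `Mind and Language- a review.pdf`.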
Posted: Sat Aug 26, 2006 9:50 am
by Gerson
Just another plea for some means of aiding accession of PDFs on disk.
Posted: Sat Aug 26, 2006 11:14 am
by Jon
There are some very nice features coming for pdf retrieval and attachment in the next update. Stay tuned.
Jon
Sonny Software
Existing PDFs, but not PubMed
Posted: Fri Sep 22, 2006 8:11 pm
by jeremydouglass
Like nicka, I think it would be a nice feature, although not a huge priority.
autolinking pdfs and BE would be great
Posted: Sun Sep 24, 2006 10:31 pm
by Harry Lime
Having a way to automagically link one's large library of pdfs with the Bookends database would be great. I am sure that most of the pdfs I want linked are already present in my database, just unlinked. So how about taking the abstract field from Bookends and searching (with Spotlight?) the folder of pdfs and looking for a match?
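A rough sketch of that abstract-matching idea, assuming the text of each pdf has already been extracted into a dictionary (by Spotlight or any other tool; that part is not shown). The scoring is a plain word-overlap measure and the 0.8 threshold is an arbitrary assumption; `best_match` is a hypothetical name, not a Bookends function.

```python
import re


def words(text):
    """Lower-cased set of alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(abstract, pdf_texts, threshold=0.8):
    """Return the path whose text contains the largest share of the abstract's
    words, or None if no pdf reaches the threshold."""
    target = words(abstract)
    if not target:
        return None
    best_path, best_score = None, 0.0
    for path, text in pdf_texts.items():
        score = len(target & words(text)) / len(target)
        if score > best_score:
            best_path, best_score = path, score
    return best_path if best_score >= threshold else None
```

Returning `None` below the threshold leaves doubtful cases for the user to link by hand rather than attaching the wrong pdf.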
I would like some help with a related issue. If I have pdfs attached and I use Bookends on my desktop and notebook computers, and synchronize the database and a folder full of pdfs, the links get messed up since the files do not have the exact same path on both machines.
Thanks!
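One general way around the path problem described above is to store each attachment's path relative to the pdf folder and resolve it against that folder's location on whichever machine is in use. The sketch below only illustrates that idea; it says nothing about how Bookends actually stores links, and the folder locations shown are invented examples.

```python
from pathlib import PurePosixPath


def to_relative(absolute_path, attachments_root):
    """Path of the pdf relative to the attachments folder on this machine."""
    return PurePosixPath(absolute_path).relative_to(attachments_root)


def resolve(relative_path, attachments_root):
    """Rebuild the full path using the attachments folder of the current machine."""
    return PurePosixPath(attachments_root) / relative_path


# Hypothetical folder locations on two machines.
rel = to_relative("/Users/desk/Papers/PDFs/smith.pdf", "/Users/desk/Papers/PDFs")
on_notebook = resolve(rel, "/Users/laptop/Docs/PDFs")
```

Here `rel` is `smith.pdf` and `on_notebook` is `/Users/laptop/Docs/PDFs/smith.pdf`, so the link survives as long as the folder contents are synchronized.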