To make this a killer app: convert PDF to reference

Post by **timyu** » Thu Aug 17, 2006 10:57 am

I have just d/l'd the demo for Bookends and love it. The groups feature is essential for anyone who is not only maintaining a reference database for bibliography purposes, but wants to be able to use their library to maintain course syllabi, topical papers, lists of papers to read, etc.

One feature that I think would absolutely seal the deal would be if one could drag and drop a PDF file onto the app, and have it (1) retrieve the pubmed abstract, (2) create the reference link and (3) attach the PDF file to the resulting record.

This would allow me to convert several hundred PDF files I already have stored on my machine to a truly useful database. I'm sure many, many other users would love a feature like this.

It would require the program to be able to search a PDF file and extract useful identifiers, then query the search engine of choice (which would be user-definable; in my case, pubmed).

Does this exist? Is this feasible?

Thanks.

Post by **Jon** » Thu Aug 17, 2006 11:04 am

The problem is that pdfs contain unstructured data. There is no unambiguous way to distinguish one bit of information from another that will work with any particular pdf.

BTW, Bookends does give you tools to go the other way -- locate the reference in PubMed and then retrieve and attach the pdf.

Jon
Sonny Software

Post by **timyu** » Thu Aug 17, 2006 11:14 am

Jon wrote: The problem is that pdfs contain unstructured data. There is no unambiguous way to distinguish one bit of information from another that will work with any particular pdf.

BTW, Bookends does give you tools to go the other way -- locate the reference in PubMed and then retrieve and attach the pdf.

Jon
Sonny Software

Would it be feasible to implement this in a "fuzzy" way? To take the first 300 words of the PDF file, search it against pubmed, and take the highest relevance hit? I would still be really excited with a "sloppy" version with even 90% accuracy.
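A rough sketch of that "fuzzy" idea in Python, for illustration only. It assumes the PDF text has already been extracted by some other tool (the extraction step is not shown), and it only builds a search URL rather than sending it. The endpoint is NCBI's E-utilities ESearch service; the function names and the choice of `retmax` and `sort` values are assumptions made for this sketch, not anything Bookends does.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint. Using it this way is an assumption
# for this sketch, not a description of how Bookends talks to PubMed.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def first_words(text, limit=300):
    """Return the first `limit` whitespace-separated words of `text`."""
    return " ".join(text.split()[:limit])


def build_pubmed_query_url(text, limit=300):
    """Build (but do not send) a PubMed search URL from the leading words."""
    params = {
        "db": "pubmed",
        "term": first_words(text, limit),
        "retmax": 1,          # only the top hit is wanted
        "sort": "relevance",  # ask for the best match first
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical text standing in for the start of an extracted PDF.
    sample = "Synaptic plasticity in the hippocampus depends on receptor trafficking"
    print(build_pubmed_query_url(sample, limit=5))
```

Whether a query this long returns anything useful is exactly the open question here; the sketch only shows how little glue code the "sloppy" version would need.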

Post by **Jon** » Thu Aug 17, 2006 12:06 pm

It's something to think about. But in my experience PubMed is pretty fussy -- if you throw 300 words at it I think it's unlikely to find a match. I'll play with it, but try it yourself and see.

Jon
Sonny Software

Post by **timyu** » Thu Aug 17, 2006 3:17 pm

Hi Jon,

Thanks for your reply. You're right, pubmed does tend to choke on those.

There is actually a very nice academic service called "ETBlast" that allows one to search pubmed on the basis of entire paragraphs of text, which could be perfect for what we're discussing. It's available via UT Southwestern for free:

http://invention.swmed.edu/etblast/index.shtml

I've tested it with four random PDF files -- I pasted in the abstracts and it was able to find the correct reference for 3 out of the 4 articles that I tried. I'll try some more and see if the settings can be tweaked at all to improve the accuracy. Promising?

Tim

Post by **Jon** » Thu Aug 17, 2006 3:25 pm

Well, it's interesting. But we can't hook into their service for this (and if we could, I wouldn't). I'll keep thinking about it.

Jon
Sonny Software

Post by **timyu** » Fri Aug 18, 2006 5:50 pm

Thanks for your quick responses. I've been thinking further about how to convert my PDF library to Bookends format -- had one other idea to run by you.

It would be nice if there were a streamlined way to put existing PDF files into the Bookends database. Right now, this requires me to look up citations for all of my existing PDFs, enter them / import them into Bookends, select each one individually, and then drag the PDF file onto the correct reference to make the link. This means I have to pair up the Bookends entry with an individual PDF file for every article.

One way to make this a little bit faster (short of the fully automated way we've been discussing) would be to make use of the PubMed ID. There is an OS X app out there (that will go unnamed) that lets you rename a PDF file "abcdefg.pdf", where abcdefg is the unique PubMed ID for the paper in question. You can then drag & drop a whole batch of PDFs named with that convention, and it will automatically import them into the library and link them to the corresponding PubMed reference.

That would make my conversion much faster, although it might still require a few steps (open my PDF file, find the corresponding PMID from www.pubmed.org, rename the PDF file, then when done, drag a whole batch of PDF files to your app).
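To make the naming convention concrete, here is a minimal Python sketch of the matching step, assuming files are named with nothing but the numeric PMID plus ".pdf". The function name and the sample paths are hypothetical; fetching the reference for each PMID and attaching the file are left out.

```python
import re
from pathlib import Path

# A file name counts as a match only if it is all digits plus ".pdf".
PMID_FILENAME = re.compile(r"^(\d+)\.pdf$", re.IGNORECASE)


def pmids_from_filenames(paths):
    """Split paths into {pmid: path} matches and a list of skipped paths."""
    matched = {}
    skipped = []
    for path in map(Path, paths):
        m = PMID_FILENAME.match(path.name)
        if m:
            matched[m.group(1)] = path
        else:
            skipped.append(path)
    return matched, skipped


if __name__ == "__main__":
    # Hypothetical batch of dropped files.
    batch = ["papers/16912345.pdf", "notes.pdf", "123.PDF"]
    matched, skipped = pmids_from_filenames(batch)
    for pmid, path in matched.items():
        print(pmid, "->", path)
    print("skipped:", [str(p) for p in skipped])
```

Files that do not follow the convention are reported rather than guessed at, so a batch import would never attach a PDF to the wrong reference silently.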

Is the PMID field currently included in the Bookends database? This would predominantly be for the bioscience community (and not particularly the humanities; I don't know what the equivalent document identifier would be).

Thx, Tim Yu

Post by **Jon** » Fri Aug 18, 2006 6:32 pm

Hi,

I've been concentrating more on getting pdfs for new references in Bookends, not entering new references and finding existing pdfs on the hard disk. Is this really something that's generally wanted?

To answer your questions: the PMID is entered into the URL field when you use the PubMed import filter. That's where Bookends looks for it.

As for using the PMID as the file name -- it seems kind of cumbersome. I'd probably let Spotlight do the work (kind of how OpenURL works on the Internet).

Jon
Sonny Software

Post by **timyu** » Sat Aug 19, 2006 2:32 am

Jon, thanks for your reply and pointers. I'll read a little about OpenURL as I'm not sure what you were referring to (about using the PMID and Spotlight). With respect to the first question,

Jon wrote: Hi,
I've been concentrating more on getting pdfs for new references in Bookends, not entering new references and finding existing pdfs on the hard disk. Is this really something that's generally wanted?
**clip**

Personally, I tend to keep up with the literature in my field (neuroscience & medicine) by either

#1 browsing tables of contents directly on journal websites (where I can be enticed by pretty pictures and catchy titles, and login with my personal or institutional subscription to open PDFs), or

#2 via web portals to pubmed, or pubmed-derivative search engines. Some of these allow for very slick, sophisticated search functionality (e.g., www.hubmed.org, or etBLAST); others I use because they are my institutional portals to journals licensed by our academic library.

I find that I rarely lean on search features of my reference management software -- I do use them, but usually just to pull up articles that I already know exist via #1 and #2 (and therefore may already have PDFs for). So I often have the literature in hand, and am looking to use reference manager software to organize it.

That's my personal workflow -- can't say definitively how representative it is.

Best,
Tim

Post by **Jon** » Sat Aug 19, 2006 7:15 am

Hi Tim,

I do much the same, but I do download the pdf at the time I download the reference info into the database. So I was just wondering how widespread the need is for people to find and attach pdfs they already had before actually using Bookends (I know you have it).

One thing that was added in 9.0.7 and that may be helpful to you -- there is an option (Refs -> PubMed menu) to immediately see the full text of an article in your browser (assuming you have permission to see it) when the reference is selected. In some (many?) cases that may obviate the need to actually store the pdf locally.

Jon
Sonny Software

Post by **thecritic** » Mon Aug 21, 2006 1:15 pm

Jon wrote: So I was just wondering how widespread the need is for people to find and attach pdfs they already had before actually using Bookends (I know you have it).

I definitely have this need, too.

Would it be possible to download a reference AND an associated pdf (if available) at the same time?

Post by **Jon** » Mon Aug 21, 2006 1:52 pm

Not in the current release.

Jon
Sonny Software

Post by **joewiz** » Mon Aug 21, 2006 2:41 pm

Jon wrote: I was just wondering how widespread the need is for people to find and attach pdfs they already had before actually using Bookends (I know you have it).

I have this need.

Post by **ozean** » Mon Aug 21, 2006 4:28 pm

joewiz wrote:
Jon wrote: I was just wondering how widespread the need is for people to find and attach pdfs they already had before actually using Bookends (I know you have it).
I have this need.

Me too

Post by **nicka** » Tue Aug 22, 2006 6:30 am

Me too, but the vast majority of the papers I have as pdfs are not on PubMed. If this could work with Google Scholar or some other comprehensive database it would be fantastic.