Harvesting references from web pages

MBurer
Posts: 21
Joined: Fri Nov 12, 2010 9:34 am

Harvesting references from web pages

Post by MBurer »

I am a new Bookends user coming from Nota Bene. NB had a great feature called Archiva, which was essentially a reference harvester. It would watch the clipboard, and any time a bibliographic reference was copied there (from Amazon, LOC, EBSCO, etc.) it would put that reference into a temporary database, so the user could later find it, move it into a permanent database, and so on. Does Bookends have anything like this?
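
(To give a sense of what I mean, the core of such a harvester is tiny. The Python sketch below just illustrates the idea of polling the clipboard and appending anything new to a holding file; the file name is made up and this is of course not Archiva's or Bookends' code.)

# Minimal clipboard-harvester sketch for macOS (illustrative only).
# It polls the clipboard via pbpaste and appends any new content to a
# holding file that could later be imported into a reference manager.
import subprocess
import time

HOLDING_FILE = "harvested_references.txt"  # hypothetical destination

def read_clipboard():
    # pbpaste is the standard macOS command-line clipboard reader
    return subprocess.run(["pbpaste"], capture_output=True, text=True).stdout

def watch(poll_seconds=1.0):
    last = read_clipboard()
    while True:
        current = read_clipboard()
        if current and current != last:
            with open(HOLDING_FILE, "a", encoding="utf-8") as f:
                f.write(current.rstrip() + "\n\n")
            last = current
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()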

Many thanks in advance for the help.
Jon
Site Admin
Posts: 10296
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD
Contact:

Re: Harvesting references from web pages

Post by Jon »

Bookends has a variety of ways of fetching references and PDFs from the Internet.

1. Direct Online Search and import.
2. Import from a text file exported from a site like EBSCO.
3. Import reference info from the clipboard, copied there from a site like EBSCO.
4. Bookends Browser, which will let you import references from JSTOR, Web of Science, and any site that supports COinS (such as CiteULike).
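
For the curious, COinS just means the page embeds its citation data in a span tag with class "Z3988", whose title attribute is a URL-encoded OpenURL ContextObject. A rough Python sketch of what an importer has to do (purely illustrative, not Bookends' actual code):

# Rough sketch of a COinS harvester: find <span class="Z3988" title="...">
# tags and decode the URL-encoded OpenURL metadata in the title attribute.
from html.parser import HTMLParser
from urllib.parse import parse_qs
from urllib.request import urlopen

class CoinsParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "Z3988" in (attrs.get("class") or ""):
            # Keys are OpenURL fields, e.g. rft.atitle (article title),
            # rft.jtitle (journal), rft.aulast (author surname), rft.date
            self.records.append(parse_qs(attrs.get("title") or ""))

def harvest_coins(url):
    # Fetch a page and return one dict of citation fields per COinS span
    parser = CoinsParser()
    with urlopen(url) as resp:
        parser.feed(resp.read().decode("utf-8", errors="replace"))
    return parser.records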

Jon
Sonny Software
Jon
Site Admin
Posts: 10296
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD
Contact:

Re: Harvesting references from web pages

Post by Jon »

I just looked at the Archiva online video, and it's similar to Bookends Browser. Bookends Browser covers fewer aggregators, but will let you import a PDF from any site. I'd recommend you use Bookends Online Search as your major tool for getting references/PDFs, and Bookends Browser as your second. For situations where neither works, you can use a regular browser and "export to citation manager" (the export is almost always in RIS format, which Bookends can import).
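
For reference, those "export to citation manager" links usually hand you a small tagged text file in RIS format: two-letter tags, one field per line, ER closing each record. A toy Python reader with an invented sample record, just to show the shape of the data (the record and DOI below are made up):

# The tags (TY, AU, TI, JO, PY, DO, ER) are standard RIS field tags.
sample_ris = """\
TY  - JOUR
AU  - Doe, Jane
TI  - An Invented Article Title
JO  - Journal of Examples
PY  - 2009
DO  - 10.1000/example.doi
ER  -
"""

def parse_ris(text):
    # Tiny RIS reader: returns one dict of tag -> list of values per record
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("ER"):
            records.append(current)
            current = {}
        elif len(line) > 6 and line[2:6] == "  - ":
            tag, value = line[:2], line[6:].strip()
            current.setdefault(tag, []).append(value)
    return records

print(parse_ris(sample_ris))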

Jon
Sonny Software
nicka
Posts: 226
Joined: Thu Feb 03, 2005 6:56 pm
Location: Oslo
Contact:

Re: Harvesting references from web pages

Post by nicka »

Depending on the academic field you work in, Bookends' online search may not find very many papers, although it is excellent for scholarly books, if you set it to search a university library. See here for some recent tips on getting reference info into Bookends.

Short summary:
For papers (unless you are in the biological/medical sciences). Best: download a pdf with a doi in it and let Bookends fill in the reference info when you attach the pdf (see the doi-lookup sketch at the end of this post);
Fall-back: download both the pdf and ris citation data from the journal's site. Import the reference data from the ris file, then attach the pdf.

For books: Bookends Online Search, choosing Cambridge U[niversity], for example.
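
(If you're wondering how the doi route works under the hood: doi.org supports HTTP content negotiation, so you can ask it for citation data directly. A rough Python sketch, purely illustrative and not necessarily how Bookends resolves dois internally:)

# Ask https://doi.org/<doi> for citation data via content negotiation.
# Requesting RIS here; CSL JSON and BibTeX are also available this way.
from urllib.request import Request, urlopen

def ris_for_doi(doi):
    req = Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-research-info-systems"},
    )
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage: print(ris_for_doi("10.xxxx/your-doi-here"))  # any real doi works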
Jon
Site Admin
Posts: 10296
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD
Contact:

Re: Harvesting references from web pages

Post by Jon »

Hm, for non-biomedical papers JSTOR is pretty good, no? And Google Scholar?

Jon
Sonny Software
nicka
Posts: 226
Joined: Thu Feb 03, 2005 6:56 pm
Location: Oslo
Contact:

Re: Harvesting references from web pages

Post by nicka »

In my experience, JSTOR has only partial coverage. While Google Scholar is great for finding papers, it's not good as a source of reference info: it has a lot of horribly messed-up metadata, including things like incorrect issue numbers, but mainly just basic data missing (issue numbers, sometimes dates, often author first names).

As far as I know, there is still nothing comparable to PubMed for academic papers in general. But the option to look for metadata via doi lookup has more or less solved the problem of gathering metadata for recent papers. In my experience (again) all journals now put dois in their pdfs.

It's great that Bookends has a lot of different ways of getting reference info in. Best practice (i.e. what produces the best results for reasonable effort) seems to have changed every couple of years over the last decade, if I'm remembering correctly, and it probably still varies a bit across academic fields. So it's good to have the options.
ozean
Posts: 461
Joined: Fri Mar 04, 2005 11:53 am
Location: Norway
Contact:

Re: Harvesting references from web pages

Post by ozean »

My experience is similar to nicka’s. When I need to import stuff I use a library database for books (either Columbia U or University of California, since these offer tables of contents more often than other libraries), and for articles I download and import a .ris file and then the PDF. As nicka says, the bibliographic data in Google Scholar is a mess; it is much better to use .ris files.

Both procedures usually need clean up too, however:

It seems, for example, that all SAGE-provided RIS files cause a double entry of the journal name, which is kind of a bother to remove.
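
If anyone wants to script around it, something like the following Python sketch would drop the duplicate before import. It assumes the duplication shows up as the journal title repeated under tags such as JO/JF/T2, which is only a guess about how the SAGE files are structured; adjust to what you actually see:

# Hypothetical clean-up for RIS files where the journal title is duplicated.
def drop_duplicate_journal(ris_text):
    lines, seen_titles = [], set()
    for line in ris_text.splitlines():
        tag, value = line[:2], line[6:].strip().lower()
        if tag in ("JO", "JF", "T2") and value:
            if value in seen_titles:
                continue  # skip the second copy of the journal name
            seen_titles.add(value)
        if tag == "ER":
            seen_titles.clear()  # start fresh for the next record
        lines.append(line)
    return "\n".join(lines) + "\n"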

In addition (sorry for going off topic here), I have the impression that library database / Z39.50 / LOC imports have started to often include dots at the end of several fields – something I did not notice before. (For example, there will be output like "2009." in the date field or "Heidegger, Martin." in the name field. I wouldn’t have any objections to automatically removing these dots/full stops at the end of a field…)
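
In the meantime, here is a rough Python sketch of the kind of automatic clean-up I have in mind: strip one trailing full stop from each field line before importing. Illustrative only, and note it will also catch legitimate trailing dots (abbreviations, initials), so treat it as a starting point:

# Strip a single trailing period from each line of an import file,
# e.g. "2009." -> "2009" and "Heidegger, Martin." -> "Heidegger, Martin".
# Leaves ellipses ("...") alone.
import re

def strip_trailing_dots(text):
    cleaned = []
    for line in text.splitlines():
        cleaned.append(re.sub(r"(?<!\.)\.\s*$", "", line))
    return "\n".join(cleaned) + "\n"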
macula
Posts: 167
Joined: Mon Oct 19, 2009 1:14 pm

Re: Harvesting references from web pages

Post by macula »

If only EBSCO (and other repositories of similar magnitude, e.g. ProQuest) would support COinS, life would be so much easier…

As I told Jon off-forum, let's create an online petition urging our institutional libraries and their partner organizations to adopt COinS.
Any ideas as to the best way to go about this (without sending people spam mail)?