Harvesting references from web pages

MBurer
Posts: 21
Joined: Fri Nov 12, 2010 9:34 am

Harvesting references from web pages

Post by MBurer »

I am a new Bookends user coming from Nota Bene. NB had a great feature called Archiva, which was essentially a reference harvester. It would watch the clipboard, and any time a bibliographic reference was copied there (from Amazon, LOC, EBSCO, etc.) it would put that reference into a temporary database, so the user could later find it, move it into a permanent database, and so on. Does Bookends have anything like this?
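
(To give a sense of what I mean, the core of such a harvester is tiny. The Python sketch below just illustrates the idea of polling the clipboard and appending anything new to a holding file; the file name is made up and this is of course not Archiva's or Bookends' code.)

# Minimal clipboard-harvester sketch for macOS (illustrative only).
# It polls the clipboard via pbpaste and appends any new content to a
# holding file that could later be imported into a reference manager.
import subprocess
import time

HOLDING_FILE = "harvested_references.txt"  # hypothetical destination

def read_clipboard():
    # pbpaste is the standard macOS command-line clipboard reader
    return subprocess.run(["pbpaste"], capture_output=True, text=True).stdout

def watch(poll_seconds=1.0):
    last = read_clipboard()
    while True:
        current = read_clipboard()
        if current and current != last:
            with open(HOLDING_FILE, "a", encoding="utf-8") as f:
                f.write(current.rstrip() + "\n\n")
            last = current
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()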

Many thanks in advance for the help.
Jon
Site Admin
Posts: 10296
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD
Contact:

Re: Harvesting references from web pages

Post by Jon »

Bookends has a variety of ways of fetching references and PDFs from the Internet.

1. Direct Online Search and import.
2. Import from a text file exported from a site like EBSCO.
3. Import reference info from the clipboard, copied there from a site like EBSCO.
4. Bookends Browser, which will let you import references from JSTOR, Web of Science, and any site that supports COinS (such as CiteULike).
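
For the curious, COinS just means the page embeds its citation data in a span tag with class "Z3988", whose title attribute is a URL-encoded OpenURL ContextObject. A rough Python sketch of what an importer has to do (purely illustrative, not Bookends' actual code):

# Rough sketch of a COinS harvester: find <span class="Z3988" title="...">
# tags and decode the URL-encoded OpenURL metadata in the title attribute.
from html.parser import HTMLParser
from urllib.parse import parse_qs
from urllib.request import urlopen

class CoinsParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "Z3988" in (attrs.get("class") or ""):
            # Keys are OpenURL fields, e.g. rft.atitle (article title),
            # rft.jtitle (journal), rft.aulast (author surname), rft.date
            self.records.append(parse_qs(attrs.get("title") or ""))

def harvest_coins(url):
    # Fetch a page and return one dict of citation fields per COinS span
    parser = CoinsParser()
    with urlopen(url) as resp:
        parser.feed(resp.read().decode("utf-8", errors="replace"))
    return parser.records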

Jon
Sonny Software
Jon
Site Admin
Posts: 10296
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD
Contact:

Re: Harvesting references from web pages

Post by Jon »

I just looked at the Archiva online video, and it's similar to Bookends Browser. Bookends Browser covers fewer aggregators, but will let you import a PDF from any site. I'd recommend you use Bookends Online Search as your major tool for getting references/PDFs, and Bookends Browser as your second. For situations where neither works, you can use a regular browser and "export to citation manager" (the export is almost always in RIS format, which Bookends can import).
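
For reference, those "export to citation manager" links usually hand you a small tagged text file in RIS format: two-letter tags, one field per line, ER closing each record. A toy Python reader with an invented sample record, just to show the shape of the data (the record and DOI below are made up):

# The tags (TY, AU, TI, JO, PY, DO, ER) are standard RIS field tags.
sample_ris = """\
TY  - JOUR
AU  - Doe, Jane
TI  - An Invented Article Title
JO  - Journal of Examples
PY  - 2009
DO  - 10.1000/example.doi
ER  -
"""

def parse_ris(text):
    # Tiny RIS reader: returns one dict of tag -> list of values per record
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("ER"):
            records.append(current)
            current = {}
        elif len(line) > 6 and line[2:6] == "  - ":
            tag, value = line[:2], line[6:].strip()
            current.setdefault(tag, []).append(value)
    return records

print(parse_ris(sample_ris))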

Jon
Sonny Software
nicka
Posts: 226
Joined: Thu Feb 03, 2005 6:56 pm
Location: Oslo
Contact:

Re: Harvesting references from web pages

Post by nicka »

Depending on the academic field you work in, Bookends' online search may not find very many papers, although it is excellent for scholarly books, if you set it to search a university library. See here for some recent tips on getting reference info into Bookends.

Short summary:
For papers (unless you are in the biological/medical sciences). Best: download a pdf with a doi in it and let Bookends fill in the reference info when you attach the pdf (see the doi-lookup sketch at the end of this post);
Fall-back: download both the pdf and ris citation data from the journal's site. Import the reference data from the ris file, then attach the pdf.

For books: Bookends Online Search, choosing Cambridge U[niversity], for example.
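
(If you're wondering how the doi route works under the hood: doi.org supports HTTP content negotiation, so you can ask it for citation data directly. A rough Python sketch, purely illustrative and not necessarily how Bookends resolves dois internally:)

# Ask https://doi.org/<doi> for citation data via content negotiation.
# Requesting RIS here; CSL JSON and BibTeX are also available this way.
from urllib.request import Request, urlopen

def ris_for_doi(doi):
    req = Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-research-info-systems"},
    )
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Usage: print(ris_for_doi("10.xxxx/your-doi-here"))  # any real doi works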
Jon
Site Admin
Posts: 10296
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD
Contact:

Re: Harvesting references from web pages

Post by Jon »

Hm, for non-biomedical papers JSTOR is pretty good, no? And Google Scholar?

Jon
Sonny Software
nicka
Posts: 226
Joined: Thu Feb 03, 2005 6:56 pm
Location: Oslo
Contact:

Re: Harvesting references from web pages

Post by nicka »

In my experience, JSTOR has only partial coverage. While Google Scholar is great for finding papers, it's not good as a source of reference info: it has a lot of horribly messed-up metadata, including things like incorrect issue numbers, but mainly just basic data missing (issue numbers, sometimes dates, often author first names).

As far as I know, there is still nothing comparable to PubMed for academic papers in general. But the option to look for metadata via doi lookup has more or less solved the problem of gathering metadata for recent papers. In my experience (again) all journals now put dois in their pdfs.

It's great that Bookends has a lot of different ways of getting reference info in. Best practice (i.e. what produces the best results for reasonable effort) seems to have changed every couple of years over the last decade, if I'm remembering correctly, and it probably still varies a bit across academic fields. So it's good to have the options.
ozean
Posts: 461
Joined: Fri Mar 04, 2005 11:53 am
Location: Norway
Contact:

Re: Harvesting references from web pages

Post by ozean »

My experience is similar to nicka’s. When I need to import stuff I use a library database for books (either Columbia U or University of California, since these offer tables of contents more often than other libraries), and for articles I download and import a .ris file and then the PDF. As nicka says, the bibliographic data in Google Scholar is a mess; it is much better to use .ris files.

Both procedures usually need clean up too, however:

It seems, for example, that all SAGE-provided RIS files cause a double entry of the journal name, which is kind of a bother to remove.
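
If anyone wants to script around it, something like the following Python sketch would drop the duplicate before import. It assumes the duplication shows up as the journal title repeated under tags such as JO/JF/T2, which is only a guess about how the SAGE files are structured; adjust to what you actually see:

# Hypothetical clean-up for RIS files where the journal title is duplicated.
def drop_duplicate_journal(ris_text):
    lines, seen_titles = [], set()
    for line in ris_text.splitlines():
        tag, value = line[:2], line[6:].strip().lower()
        if tag in ("JO", "JF", "T2") and value:
            if value in seen_titles:
                continue  # skip the second copy of the journal name
            seen_titles.add(value)
        if tag == "ER":
            seen_titles.clear()  # start fresh for the next record
        lines.append(line)
    return "\n".join(lines) + "\n"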

In addition (sorry for going off topic here), I have the impression that library database / Z39.50 / LOC imports have started to often include dots at the end of several fields – something I did not notice before. (For example, there will be output like "2009." in the date field or "Heidegger, Martin." in the name field. I wouldn’t have any objections to automatically removing these dots/full stops at the end of a field…)
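
In the meantime, here is a rough Python sketch of the kind of automatic clean-up I have in mind: strip one trailing full stop from each field line before importing. Illustrative only, and note it will also catch legitimate trailing dots (abbreviations, initials), so treat it as a starting point:

# Strip a single trailing period from each line of an import file,
# e.g. "2009." -> "2009" and "Heidegger, Martin." -> "Heidegger, Martin".
# Leaves ellipses ("...") alone.
import re

def strip_trailing_dots(text):
    cleaned = []
    for line in text.splitlines():
        cleaned.append(re.sub(r"(?<!\.)\.\s*$", "", line))
    return "\n".join(cleaned) + "\n"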
macula
Posts: 167
Joined: Mon Oct 19, 2009 1:14 pm

Re: Harvesting references from web pages

Post by macula »

If only EBSCO (and other repositories of similar magnitude, e.g. ProQuest) would support COinS, life would be so much easier…

As I told Jon off-forum, let's create an online petition urging our institutional libraries and their partner organizations to adopt COinS.
Any ideas as to the best way to go about this (without sending people spam mail)?