Problem with pdf file import

Posted: Mon May 02, 2016 7:13 am
by Reinhard
Since scientific journals do not make DOI extraction easy, and the DOI of an article can be on the first or last page (or missing), it would be great if Bookends could provide a higher level of control over imports, for example an import log.
Currently, references and pdfs are imported automatically from the watch folder, and if something goes wrong, you cannot undo the changes to your library because you don't know which changes were made.

Today I imported 50 pdfs. Some resulted in wrong reference data; others did not show up in the reference list at all, although Bookends added the pdfs to the attachment folder (with wrong names). For example, multiple pdfs were named after a wrong reference + a varying number, and these references did not show up in the Bookends library. I assume that some references cause Bookends to hiccup and that things go wrong after that.

To repair the references with wrong metadata, one has to delete the URL, the ISSN, the PMID and the DOI, add the correct DOI or PMID, and run an "autocomplete from internet". In addition, one has to use the rename pdf option.
This is a complex procedure, and I wonder if a new feature "repair wrong metadata" could do the job. It should open a window where one can enter either the DOI or the PMID, and Bookends would then do the repair automatically (delete the other "critical" identifiers and rename the pdf).

I don't know a good solution for the abandoned pdfs with wrong names. Maybe it would be better if Bookends put all pdfs from a recent import into one folder, so that one can check what has been done. Once the user confirms that everything is ok, the pdfs would be moved to the main Bookends attachment folder.

Currently, it is better not to use the watchfolder and to add pdfs manually one by one...

Re: pdf import

Posted: Mon May 02, 2016 7:34 am
by Jon
Given the range of problems you had, I'd Rebuild your library.

Jon
Sonny Software

Re: pdf import

Posted: Mon May 02, 2016 7:41 am
by Reinhard
Ok...I did a rebuild/repair...just to be sure.
Next time, I will report again ;-)

Re: pdf import

Posted: Mon May 02, 2016 9:34 am
by Reinhard
Just an update...

I imported the same 54 pdfs into an empty Bookends library (v12.7) on another Mac...same problems.
A message appeared during the import: "cant search this pdf for a DOI. Please check to see if this pdf is corrupted". However, the message did not name the file, and it appeared 10-20 times. All pdfs are ok; most of them are Nature and Science pdfs, and all open fine.

54 pdfs were imported and renamed. Strangely, the Bookends library showed 74 references. Some had no metadata at all, several contained wrong data, and a few references appeared several times with the same wrong metadata.

So, I think the pdf import might not work as intended...

Re: pdf import

Posted: Mon May 02, 2016 9:41 am
by Jon
Why not pare that down to, say, 3 pdfs that give a problem when importing from the watch folder. Then send them to me. Do not send 54 pdfs, just 3 (at most). Thank you.

Jon
Sonny Software

Re: pdf import

Posted: Mon May 02, 2016 10:13 am
by Reinhard
If I add the same pdf files to the watch folder in groups of 5 to 10 files (and wait), it works fine for all 54 pdfs (except for a few that have no metadata)!
If I add them all at once, it doesn't. Tested on two different Macs!

So, I won't be able to send you a pdf that causes the problem ;-)

EDIT: Is it possible that it is a matter of quantity not quality?
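
For anyone who wants to automate the workaround above, here is a minimal Python sketch that feeds pdfs to the watch folder in small groups with a pause in between. The function names, the group size of 5, and the 60-second pause are assumptions for illustration; nothing here is a Bookends feature, and the right pause depends on how long Bookends needs per group.

```python
import shutil
import time
from pathlib import Path


def batches(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def feed_watch_folder(source, watch_folder, size=5, pause_seconds=60.0):
    """Move pdfs from `source` into `watch_folder` in small groups.

    `size` and `pause_seconds` are guesses, not values recommended by Bookends.
    Returns the number of files moved.
    """
    pdfs = sorted(Path(source).glob("*.pdf"))
    groups = list(batches(pdfs, size))
    for index, group in enumerate(groups):
        for pdf in group:
            shutil.move(str(pdf), str(Path(watch_folder) / pdf.name))
        if index < len(groups) - 1:
            # Give the importer time to finish this group before adding more.
            time.sleep(pause_seconds)
    return len(pdfs)
```

Calling `feed_watch_folder` with the folder that holds the new pdfs and the path of the watch folder would move them over group by group; both paths are up to the user.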

Re: Problem with pdf file import

Posted: Sun May 08, 2016 6:59 am
by Reinhard
Ok...I think I found a pdf file that might cause Bookends to hiccup.

The attached file Saunders-2007 caused a problem when I tried to import 41 pdf files simultaneously (via the watch folder).
When I examined the pdf file, I found a format issue with the DOI within the document.
If you manually copy the DOI from the bottom of the first page, you get "10.1073 pnas.0611347104" instead of "10.1073/pnas.0611347104".
Yet, it displays correctly within the pdf document. Could this be the problem?
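
To illustrate the kind of damage described above, here is a small Python sketch that restores the missing slash when a copied DOI has whitespace between prefix and suffix. The function name `repair_doi` and the pattern are assumptions for illustration; this is not how Bookends parses DOIs, and it only handles this one failure mode.

```python
import re

# A DOI prefix ("10." plus 4-9 digits) followed by whitespace where the
# slash should be, e.g. "10.1073 pnas.0611347104".
_BROKEN_DOI = re.compile(r"^(10\.\d{4,9})\s+(\S+)$")


def repair_doi(text):
    """Return the DOI with the slash restored, or the stripped input if it does not match."""
    candidate = text.strip()
    match = _BROKEN_DOI.match(candidate)
    if match:
        return f"{match.group(1)}/{match.group(2)}"
    return candidate


print(repair_doi("10.1073 pnas.0611347104"))  # prints 10.1073/pnas.0611347104
```

A DOI that already contains the slash is returned unchanged.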

In my case, the reference was not displayed in my library, but the pdf file was added. The following file in the import was named "Saunders-2007", although it wasn't that article. The third file was added to the attachment folder, but not to the library (Suh et al). The fourth file was again named Su et al + a number, although it wasn't that article either, and it wasn't added to the library.

After that, the import of the following files worked without issues.
Yet, I ended up with 38 newly added references (out of 41), whereas 41 pdf files showed up in the attachment folder.

Again, I don't know if this is a problem of specific files or the consequence of the import of a large number of files.

Re: Problem with pdf file import

Posted: Sun May 08, 2016 8:38 am
by Jon
The issue with old PNAS PDFs being incorrectly encoded, which results in the loss of the slash and therefore an incorrect DOI, is known. They fixed it at some point, and modern PNAS PDFs have correct DOIs. I'll take a look at the issue of this affecting subsequent attachments.

Jon
Sonny Software

Re: Problem with pdf file import

Posted: Sun May 08, 2016 8:53 am
by Jon
It must be more complicated than that. I added your pdf and one of mine via the watch folder. Yours was imported to an empty reference with the title set to

Attached file “Saunders-2007 2.pdf”. Reference metadata not found online.

which is correct. Mine was imported correctly.

If you want to follow up on this please do so directly with tech support, off-forum.

Jon
Sonny Software

Re: Problem with pdf file import

Posted: Sun May 08, 2016 10:08 am
by Reinhard
Thanks Jon for your quick reply on a Sunday!!!
Yes, it seems the problem is not caused by specific files.
Anyway...some sort of control over the pdf import would make Bookends even more superior to EndNote (than it already is) ;-)

Currently, it is hard to find any errors related to the pdf import:
One has to list the pdfs in the attachment folder by "date added" and compare the list with every reference in the Bookends library (which has to be ordered by ref#).
I just assumed that it is Saunders-2007...
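
The manual comparison above could be sketched in Python as a set difference between the files in the attachment folder and the attachment names known to the library. How the list of attachment names gets out of the library is left open here and is an assumption; the function names are hypothetical and not part of Bookends.

```python
from pathlib import Path


def pdf_names(folder):
    """Return the names of all pdf files directly inside `folder`."""
    return [path.name for path in Path(folder).glob("*.pdf")]


def find_orphans(attachment_names, library_attachment_names):
    """Return files in the attachment folder that no library reference points to."""
    return sorted(set(attachment_names) - set(library_attachment_names))


# Hypothetical example data: three files on disk, one known to the library.
on_disk = ["Saunders-2007.pdf", "Saunders-2007 2.pdf", "Suh-2007.pdf"]
in_library = ["Suh-2007.pdf"]
print(find_orphans(on_disk, in_library))
```

Any name that comes back is a pdf that was added to the folder without a matching reference.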

Have a nice weekend!