Suggestion: "Clean-up" of Attachments folder

prop
Posts: 64
Joined: Sun Feb 06, 2005 6:36 pm

Post by prop »

Jon,

I have quite a few orphaned PDFs in my Attachments folder, from refs that I've deleted but whose PDFs were left behind.

Is it possible for Bookends to scan the folder and delete those PDFs that are not linked to any of the refs within Bookends?

Let me know what you think of this idea.
Jon
Site Admin
Posts: 10291
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

Hi,

It's not a bad idea, but of course you can have many databases. If the attachment isn't found in the db you are using to search, it may belong to another.

Jon
Sonny Software
prop
Posts: 64
Joined: Sun Feb 06, 2005 6:36 pm

Post by prop »

I suppose this could be dealt with by including a warning in a dialog box, shown before any deletions occur, that reminds the user that other databases may still link to the presumed orphaned PDFs.

For those with only one database, this would be a useful feature. Those with several would probably never use it, but the same must be true of several features in any software program, no?

Thanks for considering it!
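
A minimal sketch of the check discussed so far, written as a standalone Python script rather than anything Bookends itself provides. It assumes you can produce, for each database, a list of the attachment file names it references; how that list is obtained is left open, and the names `find_orphans` and `report_orphans` are hypothetical. A PDF only counts as an orphan candidate if no listed database references it, and the report is a dry run that deletes nothing.

```python
from pathlib import Path


def find_orphans(attachments_dir, referenced_names_per_db):
    """Return PDFs in attachments_dir that no listed database references.

    referenced_names_per_db holds one collection of attachment file names
    per database (hypothetical input, not a Bookends export format).
    """
    referenced = set()
    for names in referenced_names_per_db:
        referenced.update(names)
    return sorted(
        p for p in Path(attachments_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf" and p.name not in referenced
    )


def report_orphans(attachments_dir, referenced_names_per_db):
    """Dry run: print a warning and the candidates, delete nothing."""
    orphans = find_orphans(attachments_dir, referenced_names_per_db)
    print("Warning: a database not listed here may still link to these files.")
    for path in orphans:
        print("orphan candidate:", path.name)
    return orphans
```

Passing the lists for every database you use addresses the multiple-database concern; any database left out of the input can still make a linked file look orphaned, which is why the sketch only reports.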
ricodelfuego
Posts: 11
Joined: Mon Jun 12, 2006 2:48 am
Location: SoCal

Post by ricodelfuego »

I was about to post with this exact question when I found this old one....

So here is my situation; maybe I'm doing things inefficiently (??). I have a database called PDF_library that contains all references for which I have actual papers on disk. I consider this to be my "master database". I also create other databases for individual projects, and drag any desired refs from PDF_library into the project database.

When doing PubMed searches, I generally have an "Inbox" database. Although some refs that I grab from PubMed come with PDFs, others don't. After I grab all the refs from PubMed and copy them to Inbox, I'll copy them all to the project database, and then copy all that have attachments to my PDF_library database.

Doing it this way, I often end up with duplicates in both my PDF_library and project databases, which I then remove from both. However, now I've got tons of duplicate PDFs in my attachments folder. I use Skim to mark them up, so it can be tricky to tell which copies to throw away (I generally want to keep the oldest / original one that has all my notes etc.).

Since this is kind of a pain, and since nobody but me and prop seems to have this problem, there must be a better way to deal with it. Any ideas?
ricodelfuego
Posts: 11
Joined: Mon Jun 12, 2006 2:48 am
Location: SoCal

Post by ricodelfuego »

Jon wrote: Hi,

It's not a bad idea, but of course you can have many databases. If the attachment isn't found in the db you are using to search, it may belong to another.

Jon
Sonny Software

When I have Bookends download a PDF, it puts it in the attachments folder and gives it a label, say "Hartman et al 2002 12451108.pdf". If, sometime later, I have another database open, do a search, and download it again (this often happens during "mass" downloads), I end up with multiple duplicate files in the attachments folder:
Hartman et al 2002 12451108.pdf
Hartman et al 2002 12451108 838.pdf
Hartman et al 2002 12451108 3317.pdf
Hartman et al 2002 12451108 4252.pdf

Is there a way to tell Bookends to just not download a PDF that is already in the attachments folder? And what is the number AFTER the PMID in those file names?
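
A rough sketch of how the numbered copies listed above could be grouped by name alone, outside Bookends. It assumes a copy is named exactly like another file in the folder plus a space and a trailing number, which matches the four file names above but is not a documented naming rule; `group_numbered_copies` is a hypothetical helper.

```python
import re
from collections import defaultdict

# "<base> <digits>": a stem that ends in a space followed by a number.
_TRAILING_NUMBER = re.compile(r"^(?P<base>.+) \d+$")


def group_numbered_copies(filenames):
    """Map each base PDF name to the numbered copies found beside it.

    A file counts as a copy only when the name without its trailing
    number also exists in the list, so a PMID at the end of the base
    name is not mistaken for a copy number.
    """
    pdfs = {name[:-4]: name for name in filenames if name.lower().endswith(".pdf")}
    groups = defaultdict(list)
    for stem in sorted(pdfs):
        match = _TRAILING_NUMBER.match(stem)
        if match and match.group("base") in pdfs:
            groups[pdfs[match.group("base")]].append(pdfs[stem])
    return dict(groups)
```

The grouping only says the names are related. It does not show that the files have the same content, and if the unnumbered original has been deleted the remaining copies are not grouped at all.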
Jon
Site Admin
Posts: 10291
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

No, Bookends has no idea if the pdf already exists or not. You have to tell Bookends not to download the pdf again by unchecking Get PDF.

The number after the PMID is a random number to distinguish the pdfs.

Jon
Sonny Software
ricodelfuego
Posts: 11
Joined: Mon Jun 12, 2006 2:48 am
Location: SoCal

Post by ricodelfuego »

Jon wrote: No, Bookends has no idea if the pdf already exists or not. You have to tell Bookends not to download the pdf again by unchecking Get PDF.

The number after the PMID is a random number to distinguish the pdfs.

Jon
Sonny Software

Is that random number assigned by Bookends? If so, then Bookends "knows" that the file is already there, right? Maybe I'm misunderstanding?

I guess the bottom line is that managing a large folder of downloaded PDFs will take a considerable amount of manual effort, unless I'm missing something obvious...
Jon
Site Admin
Posts: 10291
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

Bookends assigns the random number, and that is how it finds the file when you open an attachment.

You seem to think that Bookends "knows" that 2 pdfs are the same. It does not. All it knows are the names, which can be assigned arbitrarily. If you don't want duplicate pdfs, don't download them twice. And if you do, it does no harm.

Jon
Sonny Software
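
Since names alone cannot show that two PDFs are the same, one way to check outside Bookends is to compare file contents. The sketch below is not a Bookends feature; it groups PDFs in a folder by SHA-256 digest, and `file_digest` and `identical_pdf_groups` are hypothetical names. It assumes that byte-for-byte equality is the test you want: a copy whose file contents have been modified will no longer match its original.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_pdf_groups(folder):
    """Return lists of PDF names in folder whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ".pdf":
            by_digest[file_digest(path)].append(path.name)
    return [names for names in by_digest.values() if len(names) > 1]
```

Each returned group is a set of candidates to review by hand; deciding which copy to keep (for example, the one carrying notes) is still up to the user.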
ricodelfuego
Posts: 11
Joined: Mon Jun 12, 2006 2:48 am
Location: SoCal

Post by ricodelfuego »

OK, I guess that's how it has to be... but it either (1) causes a huge buildup of duplicate PDFs in my attachments folder, or (2) substantially increases my workload by not allowing me to simply grab all the references (with PDFs) that PubMed pulls down and then use "remove duplicates" to get rid of any dups from the database. It seems like a file with a PMID in its name should be uniquely identifiable, but I guess that's why I'm a biologist and not a computer programmer.
Jon
Site Admin
Posts: 10291
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

I'm a biologist, too.

You're thinking just of pdfs downloaded from PubMed automatically. There are thousands of sources of pdfs, including pdfs downloaded by others and sent to you, or pdfs you downloaded with your browser. They won't have the PMID as part of the name.

Jon
Sonny Software
aGautier
Posts: 1
Joined: Fri Dec 07, 2012 3:39 pm

Re:

Post by aGautier »

Jon wrote: Hi,

It's not a bad idea, but of course you can have many databases. If the attachment isn't found in the db you are using to search, it may belong to another.

Jon
Sonny Software

Well, BE would just need to know which databases I am using to figure out whether a given PDF can be deleted. It's hard to imagine someone needing thousands of DBs. Any news on the request for "cleaning up" attachment folders?

Another suggestion would be to let BE update ("touch") the "last accessed" date of the PDF files attached in a given DB. The user would do that for each relevant DB and then use the Finder to delete PDFs with an older "last accessed" date.

What do you think?

P.S. A quick hack that seems to do it: from BE, export all attachments to a new folder (repeat for all DBs), close BE, and replace the attachment folder with the new one.
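
The "last accessed" suggestion in this post could be tried outside BE with a small script. This is only a sketch under assumptions: the list of referenced file names per DB has to come from somewhere (not specified here), `mark_in_use` and `not_accessed_since` are hypothetical names, and other programs that read the files can also change their access times. The modification time is deliberately left unchanged.

```python
import os
import time
from pathlib import Path


def mark_in_use(folder, referenced_names, now=None):
    """Set the access time of each referenced file, keeping its mtime."""
    now = time.time() if now is None else now
    for name in referenced_names:
        path = Path(folder) / name
        if path.is_file():
            mtime = path.stat().st_mtime
            os.utime(path, (now, mtime))  # (access time, modification time)


def not_accessed_since(folder, cutoff):
    """Return names of files whose access time is older than cutoff."""
    return sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_atime < cutoff
    )
```

Run `mark_in_use` once per relevant DB, then call `not_accessed_since` with a timestamp taken just before the first run to list the files that no DB touched.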
Jon
Site Admin
Posts: 10291
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Re: Suggestion: "Clean-up" of Attachments folder

Post by Jon »

Since that request was initially made, we added an option to the confirmation dialog shown when deleting references: move attachments to the Trash.

As for Bookends setting the last modified date of a pdf you read to the date read, that doesn't seem like a good idea. The last modified date has a purpose outside of Bookends.

Your workaround is a good one, BTW.

Jon
Sonny Software