More flexible citekey generation

Jon
Site Admin
Posts: 10070
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

More flexible citekey generation

Post by Jon »

A user in another thread requested that Bookends generate citekeys that will be unique even if there is more than one library.

viewtopic.php?f=2&t=5734

I'm certainly open to making citekey generation flexible, simple, and effective. I'm proposing to provide these options in a preferences pop-up menu:

Author+Year -> Smith2022
Author+Year+UniqueID -> Smith2022_32345
Author+Year+Title(15) -> Smith2022First-15-chars
Author+Year+LibraryName -> Smith2022_Library1

All options will be unique within a library (Bookends appends letters to make them so). Options 2 and 4 will always be unique between libraries as well.
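
To make the four proposed patterns concrete, here is a minimal Python sketch. It is not Bookends' implementation: the `Ref` fields, the function names, the hyphen-for-space rule in the title option, and the exact letter-appending order (a, b, c, ...) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from string import ascii_lowercase


@dataclass
class Ref:
    # Hypothetical record; field names are illustrative, not Bookends' schema.
    author: str
    year: int
    title: str
    unique_id: int


def base_key(ref: Ref, pattern: str, library: str = "") -> str:
    """Build the not-yet-deduplicated key for one of the four proposed patterns."""
    stem = f"{ref.author}{ref.year}"
    if pattern == "author_year":
        return stem
    if pattern == "author_year_id":
        return f"{stem}_{ref.unique_id}"
    if pattern == "author_year_title":
        # Assumption: first 15 characters of the title, spaces shown as hyphens.
        return stem + ref.title[:15].replace(" ", "-")
    if pattern == "author_year_library":
        return f"{stem}_{library}"
    raise ValueError(f"unknown pattern: {pattern}")


def unique_key(key: str, taken: set[str]) -> str:
    """Append a, b, c, ... until the key is unused within this library."""
    if key not in taken:
        taken.add(key)
        return key
    for letter in ascii_lowercase:
        candidate = key + letter
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError("ran out of single-letter suffixes")


taken: set[str] = set()
a = Ref("Smith", 2022, "First results", 32345)
b = Ref("Smith", 2022, "Second results", 32346)
print(unique_key(base_key(a, "author_year"), taken))     # Smith2022
print(unique_key(base_key(b, "author_year"), taken))     # Smith2022a
print(unique_key(base_key(a, "author_year_id"), taken))  # Smith2022_32345
```

The appended letter only guarantees uniqueness inside one library; two libraries can still both contain `Smith2022`, which is why only the UniqueID and LibraryName patterns are described as unique across libraries.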

Does anyone feel strongly about another candidate?

Jon
Sonny Software
Dellu
Posts: 268
Joined: Sun Mar 27, 2016 5:30 am

Re: More flexible citekey generation

Post by Dellu »

These are wonderful options; and I am pretty sure they are going to make any user satisfied.

I am very satisfied already.

I have a suggestion about the 3rd choice.

1) Is it possible to replace the 15 characters with what JabRef calls [VeryShortTitle] (https://docs.jabref.org/setup/citationkeypatterns)? That is, the first substantive word of the title. I am pretty sure that the 3rd choice is going to be the most popular of all, because having the first word in the key also has a mnemonic function when inserting citations. It would be even more beautiful if BE could suffix the first substantive word to the key. Even if filtering for substantive words is a problem, I think picking the first word is still a better choice than a character count. The reason for this is spell checkers: truncated-and-concatenated words are going to annoy the spell checkers in the LaTeX document.

2) If that is difficult to implement, it would be nice to reduce the number a bit: to 5 or 6, and at most 10 characters, because too many characters make the key complex and the LaTeX ugly. \cite{John2012Noun-p} is much shorter and nicer than \cite{John2012Noun-phrase-argum}. The probability of having a duplicate after suffixing 5 or 6 letters is pretty low. If that happens, it will probably be with tens out of thousands of entries. A smaller number of characters also reduces the spell checker problem I mentioned above.

Re: More flexible citekey generation

Post by Jon »

Finding the first substantive word isn't trivial, especially given that any language can be used. I suppose it could be something like the first word with 4 or more characters. But this is getting way beyond the problem this was meant to resolve. I'm not asking for anyone to suggest a more sophisticated naming algorithm, simply whether there is some other combination of data that anyone thinks would be essential. I don't see any that are needed; I'm just asking so I don't miss something important.

Jon
Sonny Software

Re: More flexible citekey generation

Post by Dellu »

Jon wrote: Sun Apr 10, 2022 2:00 pm Finding the first substantive word isn't trivial, especially given that any language can be used.

That is fair; I understand.
I am sorry if I am being nosy here. This feature is extremely important for me; I spend a lot of time juggling reference stuff. That is why I am asking too much. I want this one to be solved once and for all.

Yes, the first word plus a few characters is a great idea.

How about just the first word? I think that could be sufficient.

I hope other experienced latex users can comment on this.
Last edited by Dellu on Sun Apr 10, 2022 2:42 pm, edited 1 time in total.

Re: More flexible citekey generation

Post by Jon »

As I said, starting with the first 4+ letter word might suffice in most cases. At least it will avoid starting with most prepositions.
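
The "first word of 4 or more letters" rule could be sketched roughly as follows. This is only an illustration of the idea, not what Bookends does; the function name, the fallback to the very first word, and the use of a Unicode-aware regular expression are assumptions.

```python
import re


def first_long_word(title: str, min_len: int = 4) -> str:
    """Return the first word of at least `min_len` letters, else the first word."""
    # [^\W\d_]+ matches runs of letters only; in Python 3 this is
    # Unicode-aware, so titles in other languages are split the same way.
    words = re.findall(r"[^\W\d_]+", title)
    for word in words:
        if len(word) >= min_len:
            return word
    # Assumption: if no word is long enough, fall back to the first one.
    return words[0] if words else ""


print(first_long_word("The noun phrase in Amharic"))  # noun
print(first_long_word("On the Origin of Species"))    # Origin
print(first_long_word("Of art"))                      # Of
```

A length threshold skips "the", "of", "on" and similar words without any language-specific stop list, but four-letter prepositions such as "with" or "from" still pass, which fits the "most prepositions" caveat.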

Jon
Sonny Software

Re: More flexible citekey generation

Post by Dellu »

Jon wrote: Sun Apr 10, 2022 2:40 pm As I said, starting with the first 4+ letter word might suffice in most cases. At least it will avoid starting with most prepositions.

That is even better. If you can program it so that the first word will be 4 or more letters, that is wonderful. You don't need anything else. Mostly, it is "the" which often shows up in titles. Your algorithm will filter it out.
DrJJWMac
Posts: 348
Joined: Sat Jun 22, 2019 8:04 am
Location: Alabama USA

Re: More flexible citekey generation

Post by DrJJWMac »

I like the options you are providing with this caveat. A citation key with author + year + 15 characters can be a pain to track manually when building a LaTeX document. Please consider cutting the title to be no more than 5 characters. I also suggest a suffix with the journal abbreviation.

Thank you for your efforts on this.
--
JJW

Re: More flexible citekey generation

Post by Jon »

I'll truncate the title to the first 5 characters. Here's a typical example:

Awram2001Ident

The problem with the short journal name is that (1) it requires that you use a glossary, and (2) the short name must be there. Otherwise, Bookends would resort to the full journal name, which can be quite long. If appending the journal name is important to you, I suggest making the option like the title, appending the first 5 characters of the name entered in the journal field. Of course, this wouldn't be helpful for names beginning with Journal or Annals or the like.
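
A small sketch of the 5-character truncation, and of why the same rule is less useful for journal names. The helper names are hypothetical, the Awram title is invented to reproduce the example key, and stripping whitespace before truncating is an assumption.

```python
def short_suffix(text: str, length: int = 5) -> str:
    """First `length` characters of a field, after removing whitespace."""
    return "".join(text.split())[:length]


def title_key(author: str, year: int, title: str) -> str:
    """Author + Year + first 5 characters of the title."""
    return f"{author}{year}{short_suffix(title)}"


# Hypothetical title chosen so the result matches the example above.
print(title_key("Awram", 2001, "Identification of a protein"))  # Awram2001Ident

# Journal names often share a generic first word, so the suffix says little.
for journal in ("Journal of Cell Biology", "Journal of Biological Chemistry"):
    print(short_suffix(journal))  # Journ (both times)
```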

Jon
Sonny Software

Re: More flexible citekey generation

Post by DrJJWMac »

Fair enough. All this is a good improvement.
--
JJW
iandol
Posts: 465
Joined: Fri Jan 25, 2008 2:31 pm

Re: More flexible citekey generation

Post by iandol »

I personally don't have any requirement or preference (I prefer to use a single library and authoryear works well for me). But I thought I would at least add some comments about other attempts for cite key generation for context.

Papers (I think before they were assimilated by the Borg, um I mean acquired) made a "standard" method to generate unique keys which they made available to others:

https://github.com/cparnot/universal-citekey-js

Their original hope I think was a noble one, a universal citekey standard. The main problem they wanted to solve was to make the citekey "unique" to a particular reference. This means different users, or different libraries should be able to uniquely cite the same paper without collisions. Their solution was to hash the DOI or title. A criticism of this hashing was that two characters are not sufficient for guaranteeing no collisions...
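
To illustrate the hashing idea and the collision concern, here is a rough Python sketch. It is not the universal-citekey algorithm from the repository above: using CRC32, mapping the result to lowercase letters, and the example DOI are all assumptions for illustration only.

```python
import zlib
from string import ascii_lowercase


def hash_suffix(identifier: str, n_chars: int = 2) -> str:
    """Map a DOI or title to `n_chars` lowercase letters via CRC32."""
    # Normalise so the same DOI always yields the same suffix.
    value = zlib.crc32(identifier.strip().lower().encode("utf-8"))
    letters = []
    for _ in range(n_chars):
        value, digit = divmod(value, 26)
        letters.append(ascii_lowercase[digit])
    return "".join(letters)


def collision_probability(n_refs: int, n_chars: int = 2) -> float:
    """Birthday-problem chance that at least two of `n_refs` share a suffix."""
    slots = 26 ** n_chars
    if n_refs > slots:
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_refs):
        p_all_distinct *= (slots - i) / slots
    return 1.0 - p_all_distinct


# "10.1234/example.doi" is a made-up DOI, not a real reference.
print(hash_suffix("10.1234/example.doi"))
print(round(collision_probability(30), 2))
```

With two letters there are only 26 × 26 = 676 possible suffixes, so among a few dozen references that need the suffix to tell them apart, a clash somewhere becomes quite likely. That is the substance of the "two characters are not sufficient" criticism.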

Both Jabref and Zotero (via betterbibtex) defer the choice to the user by giving them a whole bunch of fields from which to compose:

https://docs.jabref.org/setup/citationkeypatterns
https://retorque.re/zotero-better-bibtex/citing/

This seems over-engineered; I imagine 99% of users stick to a similar pattern.

Re: More flexible citekey generation

Post by Dellu »

Thank you for implementing this feature.
This is so great.

I have been experimenting with the patterns. I find the pattern Author+year+LibraryName very attractive. I think I am going to settle on this pattern.

If it is not too much to ask, can you remove the underscore and directly suffix the library name to the year?

a) John2012Lib This is the title
b) John2012_Lib This is the title


The pattern in (a) appears simpler and more elegant than the one in (b).

Re: More flexible citekey generation

Post by Jon »

I'm not changing anything now. Personally, I think it makes a lot of sense to use the underscore because, unlike the author and date, the library is not part of the reference metadata. If others have opinions please voice them in this thread.

Jon
Sonny Software

Re: More flexible citekey generation

Post by DrJJWMac »

I would argue as Jon that author + year is essential and all else is supplemental. I would add for consistency that any supplemental information after the year should be prefaced by a dash. This includes ShortTitle. Finally, I would argue that white space be removed from all supplemental content to condense the cite key.

Author + Year: Smith2022
Author + Year + UniqueID: Smith2022_4444
Author + Year + ShortTitle: Smith2022_Somedaysago
Author + Year + LibraryName: Smith2022_MyLibrary

Multiple cite keys will stand out better.

As shown by various authors \cite{Smith2022, Jones2010_2222, Bocker2021_Astimegoeson, Fubar2020_HistoryofMan}, this style is easier to read.
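
The scheme above (Author + Year as the core, then one consistent separator and a whitespace-free supplement) could be sketched like this. The function name and the `sep` parameter are hypothetical; making the separator a parameter is only a way to compare the variants discussed in this thread.

```python
def compose_key(author: str, year: int, supplement: str = "", sep: str = "_") -> str:
    """Author+Year, optionally followed by `sep` and a condensed supplement."""
    stem = f"{author}{year}"
    # Remove all white space from the supplemental part to condense the key.
    condensed = "".join(supplement.split())
    return f"{stem}{sep}{condensed}" if condensed else stem


print(compose_key("Smith", 2022))                         # Smith2022
print(compose_key("Smith", 2022, "Some days ago"))        # Smith2022_Somedaysago
print(compose_key("Smith", 2022, "My Library"))           # Smith2022_MyLibrary
print(compose_key("Smith", 2022, "My Library", sep="-"))  # Smith2022-MyLibrary
print(compose_key("Smith", 2022, "My Library", sep=""))   # Smith2022MyLibrary
```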
--
JJW

Re: More flexible citekey generation

Post by Dellu »

Jon wrote: [...] makes a lot of sense to use the underscore because, unlike the author and date, the library is not part of the reference metadata.

I understand the logic of why you want it separated from the other units.

First, I would like to emphasize that this discussion is just an addendum; not a big deal; I am dropping it here just as a small remark...because the main task has been done quite effectively. I am very satisfied with the results.


I suggest the removal of the underscore, first on an aesthetic basis.

1. The camel case is more attractive:

The camel case is more attractive to look at in the BibTeX database.



Smith2022Title
Smith2022Mylib


They are also clear enough to tell that they are parts merged together. Note also that the library names can be fine tuned to match the required pattern.

I am renaming my libraries so that I will have simple Keys, for example.

My articles library is going to be named α, and my books library is going to be β. I am making the library names this short because I want the suffix to be simple.

2. Spell and grammar checkers:

- simpler is better because the spell checkers would not pick it up.

Long and complex CiteKeys cause problems for grammar and spelling checkers in the Latex document.

In the current system the Keys will look like the following:
Smith2022_α
Smith2022_β


This is already great. But, if we have the underscore removed:

Smith2022α
Smith2022β


this appears much simpler.

3. The underscore has special meaning in Latex:

The underscore also has a special meaning in LaTeX. One cannot write it directly in a TeX document; it needs to be escaped with \. Admittedly, it won't cause a direct error, but it can be confused with the special meaning in some instances. For that reason, I would be glad if it didn't make it into the keys. A hyphen (-) is probably better if we have to have a separator.
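
As a rough way to compare candidate separators, a key can be checked for characters that have a special meaning in LaTeX. This is only a sketch: the function name and the character sets are assumptions, and whether a flagged character actually causes trouble depends on the packages and BibTeX toolchain in use.

```python
# Characters with a special meaning in LaTeX source.
LATEX_SPECIAL = set("#$%&_{}~^\\")
# A comma separates keys inside \cite{...}; white space is best avoided too.
CITE_BREAKING = set(", \t")


def key_warnings(key: str) -> list[str]:
    """Return human-readable warnings for a candidate cite key."""
    warnings = []
    special = sorted(c for c in set(key) if c in LATEX_SPECIAL)
    if special:
        warnings.append("LaTeX special characters: " + "".join(special))
    if any(c in CITE_BREAKING for c in key):
        warnings.append("contains a comma or white space")
    if not key.isascii():
        # Assumption: flagged only because support varies between toolchains.
        warnings.append("non-ASCII characters")
    return warnings


for key in ("Smith2022_Lib", "Smith2022-Lib", "Smith2022Lib"):
    print(key, key_warnings(key))
```

Under these assumptions the hyphen and the camel-case variants produce no warnings, while the underscore variant is flagged.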


As to DrJJWMac's point about consistency, I don't think consistency is important here, because once a user decides on any of the four choices, all the libraries are going to follow that pattern anyway. All the choices are going to give consistent results for the end user.


For some of us, this setup is going to be once and for all. For me, once I decide on any of the options, I am not going to change it easily because my file-renaming system also depends on it.

Changing it means that I have to rename all of my PDFs, in addition to changing the keys. It also means that I have to re-index my files in my search software, such as Foxtrot and Devonthink. That is why I am not going to immediately jump into it and turn it on. It would wreck my whole system, and it would take me days to get everything back up and running.

After all, I am just a user: it is the majority who wins (thanks for giving us the option dear Jon). I will wait and see what others will say about it.

Re: More flexible citekey generation

Post by Jon »

Thank you both for this discussion. I'm agnostic on this; I'm happy to substitute a hyphen for the underscore or eliminate it altogether, as it's largely a matter of personal taste. I don't want to overcomplicate the options with lots of minor variations, so I'd like to settle on one. The default is to leave it as is, but if those who use BibTeX on a regular basis can come to a consensus, I'd be happy to adjust this to meet their wishes.

Jon
Sonny Software