CJK text and 'live' searches via main search field

shuyi
Posts: 7
Joined: Thu Apr 05, 2007 2:41 pm

Post by shuyi »

(Please note that this post has been moved from the diacritics and search thread)

A question for all of you that use CJK text frequently.

I use Chinese and Japanese source material quite frequently and have a number of these sources in Bookends. I've noticed that I am unable to consistently perform a search when inputting CJK text in the search field in the main window. Example: if I attempt to input 'yi' via the Simplified Chinese Pinyin input method in order to get a keyword that is a particular Chinese character, the search assumes I want to search on the Roman characters 'yi' and does not give me a chance to select the proper Chinese character. Thus, I cannot perform the search. If I try a similar search using a Traditional Chinese or a Japanese input method, I can sometimes get the search to work as expected, but only rarely, especially in the case of Traditional Chinese (it often defaults to the first character in the list that pops up after inputting, say, 'hua'; I have no chance to choose the character in question before Bookends performs the search).

Relatedly, when attempting to switch input methods when the cursor is in the search field in the main window, the input source will often change on its own. So if I choose, say, the simplified Chinese input method, as I start to type it will switch to a Japanese input method of its own accord (I could, of course, turn off the other input methods, but I use more than one on a regular basis and would prefer not to do this). This makes it difficult to attempt any search via the search field. Perhaps this is the reason I often can't perform CJK searches effectively when using the search field?

Neither of these is a problem, however, when searching a database via Refs menu -> Find... It only happens in the search field in the main window. This is a satisfactory workaround, but I'd prefer to use the search field in the main window if at all possible.

So, my question is, is there a way to run searches consistently via the search field? Do any of you that use CJK text searches regularly have the same issue that I do? Jon pointed out that this shouldn't be an input method issue (and I wouldn't think it is either), but my experiences are different for each input method for whatever reason. I haven't had this problem in other apps nor when performing searches via Spotlight. This problem seems to be related to the 'live' search in Bookends, at least as it works with CJK text, as Jon pointed out in response to my query in the original thread.

Thanks in advance for any insight you might be able to provide. If any clarification is needed, please let me know.

Chris
joewiz
Posts: 67
Joined: Sun Feb 27, 2005 2:27 pm

Post by joewiz »

My experience is that the title bar's search is not finding the same results as the Refs > Find function. Here's what I see (Bookends 10.1, Leopard):

I have many entries with the word "hygiene" in Chinese and Japanese characters. (The 2-character phrase is "weisheng" in Chinese and "eisei" in Japanese, and the word's Unicode values are the same in both languages. A number of my entries' titles begin with these characters, so they're not being missed due to their being in the middle of a long string. I won't type the actual characters here for fear of forum-mangling.)

When I type the word into the search bar, I get no results. When I type it into the Find dialog, I get 75 hits. This is repeatable whether I'm using the Japanese input method (Hiragana), the Chinese input method (QIM), or simply pasting the characters in while the "U.S." keyboard is selected.

I've tried several variations on this test with other characters, but they all boil down to this same issue.
Jon
Site Admin
Posts: 10066
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

I don't think this is the same issue at all. But in case it is related, try it with the next Bookends update and see if it persists.

Jon
Sonny Software
joewiz
Posts: 67
Joined: Sun Feb 27, 2005 2:27 pm

Post by joewiz »

Jon wrote: I don't think this is the same issue at all.
Yes, to be explicit, I had no problem typing characters into any search box in Bookends.
Jon
Site Admin
Posts: 10066
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

Hi,

I think I've tracked down this issue with Live Search, and the fix will be in the next update. Note that because of how text is indexed, certain Asian characters are treated as whole words. In these instances, Bookends will find a reference containing the character even if it appears in the middle of a string of characters. In any case, the Live Search will behave in the same way as the Find command.
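
To illustrate the idea of characters being indexed as whole words, here is a minimal Python sketch. It is not Bookends' actual implementation; the function names (`is_cjk`, `tokenize`, `build_index`, `search`), the Unicode ranges, and the sample titles are all assumptions made for illustration. Each CJK character becomes its own index entry, so a search finds it even in the middle of a longer string, while Latin text is matched only as whole words.

```python
def is_cjk(ch):
    # Assumed ranges: hiragana/katakana and the main CJK ideograph block.
    return "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"


def tokenize(text):
    """Split text into index words: one token per CJK character,
    one token per run of other letters/digits (lowercased)."""
    tokens, run = [], []
    for ch in text:
        if is_cjk(ch):
            if run:
                tokens.append("".join(run))
                run = []
            tokens.append(ch)
        elif ch.isalnum():
            run.append(ch.lower())
        elif run:
            tokens.append("".join(run))
            run = []
    if run:
        tokens.append("".join(run))
    return tokens


def build_index(refs):
    """Map each token to the set of reference ids whose title contains it."""
    index = {}
    for ref_id, title in refs.items():
        for tok in tokenize(title):
            index.setdefault(tok, set()).add(ref_id)
    return index


def search(index, query):
    """Return ids of references containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


# Hypothetical sample data; characters are written as escapes to avoid mangling.
refs = {
    1: "\u8fd1\u4ee3\u885b\u751f\u53f2",   # CJK title with the word mid-string
    2: "Public hygiene",
    3: "\u885b\u751f and the city",
}
index = build_index(refs)
print(search(index, "\u885b\u751f"))  # matches references 1 and 3
print(search(index, "hyg"))           # empty: Latin text matches whole words only
```

Note that this sketch matches all query tokens in any order; a real index would likely also check that the characters are adjacent.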

Jon
Sonny Software
shuyi
Posts: 7
Joined: Thu Apr 05, 2007 2:41 pm

Post by shuyi »

Jon wrote:Hi,

I think I've tracked down this issue with Live Search, and the fix will be in the next update. Note that because of how text is indexed, certain Asian characters are treated as whole words. In these instances, Bookends will find a reference containing the character even if it appears in the middle of a string of characters. In any case, the Live Search will behave in the same way as the Find command.

Jon
Sonny Software
That's great. As long as the search works, irrespective of whether or not it grabs a character (i.e. hanzi or kanji) in the middle of the word, that's all that matters to me. Thanks for finding a fix!

Just out of curiosity, and if it's not too much trouble to ask, was the issue that CJK languages are "double byte" languages? Search string requirements for alphabetic and non-alphabetic scripts are, I think, different. It seems that a different framework is used for languages like Chinese, Japanese, and Korean, since these languages are "double byte". Every "character" (kanji/hanzi, or the less satisfying "ideogram" or "graphic character" in English), Japanese "kana", or Korean "hangul" is encoded in two bytes rather than one due to the nature of these languages (a single "graphic character"/kanji/hanzi can represent a whole "word," for instance, and two bytes are required to represent the sounds, combinations of consonants and vowels, that make up kana and Hangul syllables). So, a search for a single "graphic character"/kanji/hanzi, a Japanese kana, or a Korean Hangul syllable is actually a search for "two bytes" and, sometimes, a search for a whole word. Is/was this the issue?

Thanks again for all your help on this!

Chris
Jon
Site Admin
Posts: 10066
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

Hi,

No, "two byte" fonts are a thing of the past. Bookends (and all modern apps) use Unicode now. As it happens, the characters I used to diagnose the problem actually used three bytes each.
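
The byte counts are easy to check. The short Python sketch below assumes UTF-8, which the post does not name but which is consistent with "three bytes each"; the helper `byte_lengths` and the sample word (the two characters U+885B U+751F, read "weisheng" or "eisei") are illustrative choices, not necessarily the characters used to diagnose the problem.

```python
def byte_lengths(text, encoding):
    """Return the number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# Sample word written as escapes to avoid forum mangling (assumed example).
word = "\u885b\u751f"

for ch in word:
    utf8 = len(ch.encode("utf-8"))
    utf16 = len(ch.encode("utf-16-be"))  # "-be" avoids counting a byte-order mark
    print(f"U+{ord(ch):04X}: {utf8} bytes in UTF-8, {utf16} bytes in UTF-16")
```

So "how many bytes" depends on the encoding form rather than on the script itself: the same code point takes three bytes in UTF-8 and two in UTF-16.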

In any case, the issue was the way in which Bookends creates indexes and how it determines word boundaries. It's too technical to go into detail, but I'll say that now the Live Search checks to make sure that it's using the correct search logic to find the Asian text in the index. The Find method was always correct in this regard.

Jon
Sonny Software
shuyi
Posts: 7
Joined: Thu Apr 05, 2007 2:41 pm

Post by shuyi »

Jon wrote:Hi,

No, "two byte" fonts are a thing of the past. Bookends (and all modern apps) use Unicode now. As it happens, the characters I used to diagnose the problem actually used three bytes each.

In any case, the issue was the way in which Bookends creates indexes and how it determines word boundaries. It's too technical to go into detail, but I'll say that now the Live Search checks to make sure that it's using the correct search logic to find the Asian text in the index. The Find method was always correct in this regard.

Jon
Sonny Software
Ah, I see. That makes sense. I wasn't thinking clearly re: Unicode. Thanks for the clarification, and I'm happy to see that this was resolved!

Chris