RE: z39.50 access to Japanese libraries

Users asking other users for import filters.
Shayne
Posts: 87
Joined: Fri Mar 11, 2005 1:35 pm

Post by Shayne »

Hi,

I am looking for filters or even port information for Japanese libraries. I would like to be able to download data from Japanese libraries (in Japanese). A quick internet search did not reveal much, so I thought I would ask the people that know.

Cheers,
Shayne
joewiz
Posts: 67
Joined: Sun Feb 27, 2005 2:27 pm

Re: RE: z39.50 access to Japanese libraries

Post by joewiz »

The amazon.co.jp filter works pretty well.

Most likely an 'internet search' filter for a Japanese library won't work, but if you can download records in MARC format, you can construct your own input filter to process that. So I would search the libraries in question for 'export as MARC', etc.

Please let us know what you find.
Shayne
Posts: 87
Joined: Fri Mar 11, 2005 1:35 pm

Post by Shayne »

Thanks for the suggestion, Joe.

I tried downloading MARC records containing Japanese from UC Melvyl. The advantage to using these over Japanese library records is that they also contain romanization. I seem to have hit a wall, however. Below is a sample record. This seems quite different from standard MARC. Jon suggested that it should be possible to create a filter when 9.0.7 comes out due to a change to be implemented therein.

Record number: 1
FMT BK
LDR 01146nam 2200301Ia 4500
001 GLAD184903711-B
005 20060225234926.0
008 020729s1962 ja 000 1 jpn d
035 |a ocm33644476
040 |a LWU |c LWU |d UIU |d CUY
066 |c $1
1001 |6 01 |a Kurata, Hyakuzō, |d 1891-1943.
1001 |6 01 |a 倉田百三, |d 1891-1943.
24510 |6 02 |a Shukke to sono deshi / |c Kurata Hyakuzō saku.
24510 |6 02 |a 出家とその弟子 / |c 倉田百三作.
250 |6 03 |a Kaihan.
250 |6 03 |a 改版.
260 |6 04 |a Tōkyō : |b Iwanami Shoten, |c Shōwa 37 [1962]
260 |6 04 |a 東京 : |b 岩波書店, |c 昭和 37 [1962]
300 |a 230 p. ; |c 15 cm.
4900 |6 05 |a Iwanami Bunko ; |v 63-64
4900 |6 05 |a 岩波文庫 ; |v 63-64
500 |a Based on the first ed. 1917.
500 |a Previously published in 1927.
60000 |6 06 |a Shinran, |d 1173-1263 |v Drama.
60004 |6 06 |a 親鸞, |d 1173-1263 |v Drama.
CAT |c 20051102 |l CDL90 |h 1827
CAT |c 20051120 |l CDL90 |h 0932
CAT |c 20060226 |l CDL90 |h 1551
SID |b GLAD |c 184903711
852 |a GLAD |b NRLF |h PL810.U7 |i S58 1962 |p B 4 741 921
901 |a GLAD |b 184903711
PL |a Tōkyō :
PL |a 東京 :
PU |b Iwanami Shoten,
PU |b 岩波書店,
DP |c Shōwa 37 [1962]
DP |c 昭和 37 [1962]
TYP |a BK |b Book
LNG |a Japanese
YR |a 1962
SYS 026308778
LO No. Regional Library Facility NRLF PL810.U7 S58 1962 B 4 741 921 NRLF
ZZ


This is really quite beyond me. If anybody else has any other suggestions, please post them. I note a similar thread for Chinese; I wonder if you have had any luck.

Kindest regards,
Shayne
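
For anyone who wants to pick a record like the one above apart outside Bookends, here is a minimal Python sketch. It is not a Bookends filter, and it assumes the layout seen in the pasted sample: a label, a space, then either raw text or subfields introduced by "|" plus a one-character code. It also assumes that a numeric label such as 1001 or 24510 is the three-digit MARC tag with the indicator digits fused onto it. The names Field, parse_line and parse_record are made up for this example.

```python
import re
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    tag: str                          # "100", "245", "LDR", "PL", ...
    indicators: str                   # digits fused onto the tag, e.g. "10"
    subfields: List[Tuple[str, str]]  # [(code, value), ...]
    raw: str                          # everything after the label


# A subfield starts with "|", one code character, whitespace, then its value.
SUBFIELD = re.compile(r"\|(\w)\s+([^|]*)")


def parse_line(line: str) -> Field:
    """Split one display line into tag, indicators and subfields."""
    label, _, rest = line.strip().partition(" ")
    if label[:3].isdigit():
        # Assumption: "24510" means tag 245 with indicators "10".
        tag, indicators = label[:3], label[3:]
    else:
        # Local labels such as LDR, CAT, PL have no indicators.
        tag, indicators = label, ""
    subfields = [(code, value.strip()) for code, value in SUBFIELD.findall(rest)]
    return Field(tag, indicators, subfields, rest.strip())


def parse_record(text: str) -> List[Field]:
    """Parse every non-blank line of a pasted record."""
    return [parse_line(ln) for ln in text.splitlines() if ln.strip()]
```

Calling parse_record on the pasted text gives one Field per line, so the romanized and Japanese versions of a field can be inspected side by side before deciding what a filter should keep. Header lines such as "Record number: 1" come back as fields too and would need to be skipped.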
joewiz
Posts: 67
Joined: Sun Feb 27, 2005 2:27 pm

Post by joewiz »

Shayne wrote:I seem to have hit a wall, however.
What's the wall that you hit? Is your filter not working at all? Are only some tags being recognized but not others? From what you've heard from Jon, is it best to just wait for 9.0.7, or are there things you can work on for this filter in the meantime?

- Joe
Shayne
Posts: 87
Joined: Fri Mar 11, 2005 1:35 pm

Post by Shayne »

joewiz wrote:What's the wall that you hit?
I suspect the wall of my own ignorance. The MARC records are very different to those explained in the Bookends guide, and that confused me. As you say, perhaps we should simply wait for 9.0.7.

I tried something else, however: importing an OCLC FirstSearch (WorldCat) record. This was a step forward, as I can import both characters and romanization into the appropriate fields. The problem (another wall) is that I seem to have trouble configuring the Source Tags.

The author appears as 三浦清宏, 1930-Miura, Kiyohiro, 1930-
I would like this to be imported as: Miura Kiyohiro 三浦清宏, dropping the dob and rearranging the order. I experimented with the Source Tags, but could not get them to change this.

Likewise, Title: 長男の出家 /Ch̄nan no shukke / (it imports in unicode, but not, seemingly, to the forum).
This should be rearranged. The same thing applies for publication details such as 東京 : 福武書店, 1988.Tōkyō : Fukutake, but I was left confused. Can one only use Source Tags if it is a MARC record?

Lost,
Shayne
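
The author rearrangement asked for above can at least be done on the text itself, before or after import. The following Python sketch is only an illustration and has nothing to do with how Bookends Source Tags work. It assumes the life dates always look like ", 1930-" or ", 1891-1943." and that anything containing kana or kanji is the original-script form. The names has_cjk, LIFE_DATES and format_author are made up for this example.

```python
import re

# Code point ranges treated as "original script" in this sketch.
CJK_RANGES = (
    (0x3040, 0x30FF),  # hiragana and katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def has_cjk(text):
    """True if any character falls in one of the ranges above."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)


# A comma, a four-digit year, a hyphen, then an optional end year and period.
LIFE_DATES = re.compile(r",\s*\d{4}-(?:\d{4})?\.?")


def format_author(raw):
    """Drop life dates and put the romanized name before the original."""
    parts = [p.strip() for p in LIFE_DATES.split(raw) if p.strip()]
    roman = [p.replace(",", "") for p in parts if not has_cjk(p)]
    original = [p for p in parts if has_cjk(p)]
    return " ".join(roman + original)
```

With the string quoted in the post, format_author("三浦清宏, 1930-Miura, Kiyohiro, 1930-") returns "Miura Kiyohiro 三浦清宏". Names without dates, or with dates in another form, would pass through unsplit, so this is a starting point rather than a general solution.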
joewiz
Posts: 67
Joined: Sun Feb 27, 2005 2:27 pm

Post by joewiz »

Shayne wrote:The MARC records are very different to those explained in the Bookends guide, and that confused me.
I'm hitting my head against this wall now too with some records from a library in Taiwan... I think the only way is to look for what you need from the record - and reconstruct a filter from that, giving you just what you need.
Shayne wrote:... I can import both characters and romanization into the appropriate fields. The problem (another wall) I have hit is that I seem to have trouble configuring the Source Tags.

The author appears as 三浦清宏, 1930-Miura, Kiyohiro, 1930-
I would like this to be imported as: Miura Kiyohiro 三浦清宏, dropping the dob and rearranging the order.
I think Bookends can only import in the order that the information appears, and since the tags for these two author lines are identical, there's really nothing Bookends can do to distinguish them. I noticed the exact same phenomenon when I looked at CJK records from Harvard. For example:

1001 |6 01 |a Li, Yuanhui.
1001 |6 01 |a 李園會.

On the web interface of this record, this section appears as:

Author: Li, Yuanhui.
李園會.

Ideally, I would like Bookends to know how to sort the romanized author information into one field and the CJK information into another field, but I don't really see how it possibly could.

I just read the section in the Bookends manual about Source Tags, and this sounds like a promising possibility. Perhaps it could split the author field after the first period. (There are currently only 2 Source Tags; I have set up separate fields for romanized and original CJK versions of: 1. author 2. article/book title 3. journal title 4. publisher, so it is only of limited utility for this application.)

Let's keep pounding at this wall.
Shayne
Posts: 87
Joined: Fri Mar 11, 2005 1:35 pm

RE: Partial success importing Japanese records from Melvyl

Post by Shayne »

Although still somewhat buggy, with the help of Jon I have been able to get a filter to import Japanese records from UC Melvyl whilst maintaining not only the diacritics but also the kanji. If I knew how to attach it to this forum I would.

Unfortunately, it is a somewhat lengthy process.
1. Export MARC record from Melvyl.
2. Copy the record into a plain-text document. I use JEditX, but other applications should also work.
3. Perform a global delete of the 5 spaces before every tag.
4. Save as UTF-8.
5. Import into BE.
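
Steps 2 to 4 can be scripted rather than done by hand in a text editor. The Python sketch below is one way to do it, assuming the exported record has already been saved to a text file and that every tag line is indented by exactly five spaces as described. The function names and the src_encoding parameter are made up for this example, and the encoding Melvyl actually uses for its export is not something this thread confirms.

```python
from pathlib import Path


def strip_tag_indent(text, width=5):
    """Remove `width` leading spaces from every line that has them."""
    prefix = " " * width
    cleaned = [
        line[width:] if line.startswith(prefix) else line
        for line in text.splitlines()
    ]
    return "\n".join(cleaned) + "\n"


def clean_file(src, dst, src_encoding="utf-8"):
    """Read the exported record, strip the indent, and save it as UTF-8.

    src_encoding is an assumption; change it if the export is not UTF-8.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(strip_tag_indent(text), encoding="utf-8")
```

For example, clean_file("melvyl_export.txt", "melvyl_clean.txt") (both file names hypothetical) would leave a UTF-8 file ready for step 5. Lines with fewer than five leading spaces are left untouched, so tags that already start in column 1 are not damaged.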

I include a few of Jon's suggestions for the filter:
MARC filtering should be on.
Fields end with any valid tag (should be character in or before column 6).
Tags should indicate a subfield (1001 should be 1001a).

Two problems that I do not seem to be able to resolve:

A single author's name such as "Nakanishi Inosuke 中西伊之助" ends up being imported as two lines (and two names) as follows:
Nakanishi, Inosuke
中西伊之助

I would like to know how to fix this. The MARC record is as follows:
1001 |6 01 |a Nakanishi, Inosuke, |d 1887-1958.
1001 |6 01 |a 中西伊之助, |d 1887-1958.

Even if I cannot delete the comma after Nakanishi, I would like to remove the carriage return which makes BE think that this is two authors.

I have the same problem with titles:

24510 |6 02 |a Buzaemon Ikki / |c Nakanishi Inosuke ; [kaisetsu Maeda Kakuzō].
24510 |6 02 |a 武左衛門一揆 / |c 中西伊之助 ; [解說前田角藏].

becomes:

Buzaemon Ikki
武左衛門一揆
whereas it should be Buzaemon Ikki 武左衛門一揆

The ability to use the Source Tag parser to put the Japanese into a separate field would also be great, but for the life of me I cannot get the parser to work (ever).

Other minor problems include the fact that the date of publication is imported twice:

260 |6 04 |a Tōkyō : |b Yumani Shobō, |c 2004.
260 |6 04 |a 東京 : |b ゆまに書房, |c 2004.

gives:

2004 2004
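
The split authors, split titles and doubled year all come from the same pairing of a romanized line with an original-script line. A pre-processing script could join each pair before the record ever reaches BE. The Python sketch below assumes that the two-digit value after |6 is what ties the two lines together, as it appears to in the samples above; that is an inference from these records, not a documented rule. It also strips trailing punctuation, including periods, which would damage a name ending in an abbreviation. The name merge_linked is made up for this example.

```python
import re

# A subfield starts with "|", one code character, whitespace, then its value.
SUBFIELD = re.compile(r"\|(\w)\s+([^|]*)")


def merge_linked(lines, code="a"):
    """Join the chosen subfield of lines that share a tag and |6 value."""
    groups = {}  # (tag, link) -> distinct values, in input order
    for n, line in enumerate(lines):
        label, _, rest = line.strip().partition(" ")
        subs = [(c, v.strip()) for c, v in SUBFIELD.findall(rest)]
        value = next((v for c, v in subs if c == code), None)
        if value is None:
            continue
        # Lines without |6 get a unique key so they are never merged.
        link = next((v for c, v in subs if c == "6"), f"unlinked-{n}")
        value = value.rstrip(" /,.;:")  # drop trailing MARC punctuation
        values = groups.setdefault((label[:3], link), [])
        if value not in values:  # identical values are kept only once
            values.append(value)
    return [(tag, " ".join(values)) for (tag, _), values in groups.items()]
```

Run on the 100 and 245 lines quoted above, this gives "Nakanishi, Inosuke 中西伊之助" and "Buzaemon Ikki 武左衛門一揆". With code="c" on the two 260 lines, the repeated year collapses to a single "2004" because identical values are kept only once.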

Apart from these (and other) minor import issues, it works quite well. Any suggestions on how it might be improved would be most welcome.

Kindest regards,
Shayne
joewiz
Posts: 67
Joined: Sun Feb 27, 2005 2:27 pm

Re: RE: Partial success importing Japanese records from Melv

Post by joewiz »

Shayne wrote:Although still somewhat buggy, with the help of Jon I have been able to get a filter to import Japanese records from UC Melvyl whilst maintaining not only the diacritics but also the kanji. If I knew how to attach it to this forum I would.
Congrats, Shayne. Would you mind sending your filter along with a test record (I assume you're working with file/clipboard importing as opposed to the Internet Search?) to cksweb@mac.com? I'd like to try it out.
ahmiller
Posts: 9
Joined: Wed Jun 29, 2005 4:12 pm

Post by ahmiller »

Shayne,

It seems like you've figured most of the problems out for MARC from US libraries, so you should be able to use the same filter for Japanese libraries. I know Waseda daigaku uses OCLC for creating their records, in other words, it uses MARC. I think you should be able to find a list of participating universities on WorldCat, and from there you can identify the Japanese libraries easiest to search using your new filter.

As for combining the kanji and romanization on one line in Bookends, good luck. I'd like that feature, too, but since MARC enters that information in distinct fields, it seems difficult to overcome this feature as BE relies on the return to recognize new tags.
ahmiller
Posts: 9
Joined: Wed Jun 29, 2005 4:12 pm

Post by ahmiller »

Shayne,

I had another thought about being able to merge the transcription and kanji on one line in BE. Perhaps Jon could add a feature in the filters that would allow one to specify how identical fields behave in certain kinds of records. I know the UCLA library is somehow able to distinguish the roman text from the kanji: the latter ends up in 880 fields in the Voyager catalog, while the former retains its original field number.

In MARC, the 066 field identifies if and what kind of non-roman text is present in the record, so maybe BE could use this field in filtering fields with identical information but with different characters.

For more information about MARC coding, see

http://www.oclc.org/bibformats/
Jon
Site Admin
Posts: 10071
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Post by Jon »

Hi,

Bookends will add data with identical tags to the same field. How it does so varies a bit for different fields. For authors, editors, and keywords, it should add them as new lines.

Jon
Sonny Software
Shayne
Posts: 87
Joined: Fri Mar 11, 2005 1:35 pm

Post by Shayne »

ahmiller wrote:It seems like you've figured most of the problems out for MARC from US libraries, so you should be able to use the same filter for Japanese libraries. I know Waseda daigaku uses OCLC for creating their records, in other words, it uses MARC. I think you should be able to find a list of participating universities on WorldCat, and from there you can identify the Japanese libraries easiest to search using your new filter.
Andrew,

I do not understand what is going on with Japanese libraries at all. I cannot find a single one that actually offers z39.50 access. How can this be so? I cannot find a valid z39.50 address for Waseda. Although they do make their records available through OCLC, have you found a way to use these?

I have also noticed that some Japanese universities seem to use the Limedio gateway:
http://s-opac.sap.hokkyodai.ac.jp/limed ... op_en.html

This would be great, as it provides UTF-8 access to NII, if only we could figure out a way to get the information into BE.

Cheers,
Shayne