File encodings on scan

freney
Posts: 12
Joined: Wed Dec 14, 2016 7:31 pm

Post by freney »

Hi.

I'm trying to put a document scan into a plain-text workflow (Markdown, actually). I'm running into two separate problems, which may well both be related to text encoding.

1) If I put an en dash (or another Unicode character) in the page range of a temporary citation, e.g. #6543@23–34, it gets saved on output as Halliday, *An Introduction to Functional Grammar*, 23‚Äì34.

Now, the input file is UTF-8 encoded, as is the output. Other Unicode characters in the file, and Unicode in the bibliography entries themselves, generally come through properly. But the page range gets garbled.

(I believe this is the same issue as viewtopic.php?f=2&t=3116&p=14003&hilit= ... ash#p14003, but it doesn't seem to be resolved for a plain text scan.)

2) If I save the bibliography to disk, it gets saved with a non-UTF-8 encoding.

$ file Bibliography.txt 
Bibliography.txt: Non-ISO extended-ASCII text, with very long lines
So I can't reliably concatenate it onto the previous file (which is what I want to do in order to run it through a pandoc workflow), or necessarily open it properly.

Instead of, for example:
Himes, Paul A, “Why Did Peter Change the Septuagint? A Reexamination of the Significance of the Use of Τίθημι in 1 Peter 2:6.” *BBR* 26/2 (2016): 227–44
I get:
Himes, Paul A, ?Why Did Peter Change the Septuagint? A Reexamination of the Significance of the Use of ?????? in 1 Peter 2:6.? *BBR* 26/2 (2016): 227?44.
Now I can work around both of these issues (the first by using '--' rather than '–'; the second by manually pasting the bibliography into the file from the clipboard), but I'm trying to make the workflow a bit more bulletproof than it currently is.
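One way to make the concatenation step less fragile might be to check each file's encoding before joining. The following is only a sketch: the function names and the `mac_roman` fallback are assumptions for illustration, since nothing here confirms which encoding the saved bibliography actually uses.

```python
from pathlib import Path


def read_as_text(path, fallback_encoding="mac_roman"):
    """Return the file's text, decoding as UTF-8 when the bytes are valid UTF-8.

    fallback_encoding is a guess (hypothetical default), not a known fact
    about the exporting app; adjust it to the encoding actually used.
    """
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so fall back to the assumed legacy encoding.
        return raw.decode(fallback_encoding)


def concatenate_as_utf8(paths, out_path):
    """Join several text files and write the result as a single UTF-8 file."""
    parts = [read_as_text(p) for p in paths]
    Path(out_path).write_text("\n".join(parts), encoding="utf-8")
```

A call such as `concatenate_as_utf8(["scan.md", "Bibliography.txt"], "combined.md")` (hypothetical file names) would then yield a file that is UTF-8 throughout. Note that if the saved file contains literal question marks where the Greek should be, those characters were already lost when the file was written, and no transcoding will bring them back.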

Any suggestions?

Thanks,

Sam.
Jon
Site Admin
Posts: 10291
Joined: Tue Jul 13, 2004 6:27 pm
Location: Bethesda, MD

Re: File encodings on scan

Post by Jon »

In case 1, what app are you opening the file in to read it? If the file has (correct) Unicode encoding and the other app thinks it's ASCII, it will render like that. If you don't want the Unicode, use a plain dash in your cited pages.
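The garbling reported in case 1 is consistent with correct UTF-8 bytes being decoded by a reader that assumes a legacy single-byte encoding. A minimal Python sketch of that effect, assuming the reader falls back to Mac OS Roman (an assumption; the thread does not say which app or encoding is involved):

```python
# Reproduce the garbled page range by decoding UTF-8 bytes with the wrong codec.
# Assumption: the reading app treats the bytes as Mac OS Roman.
page_range = "23\u201334"  # "23", en dash (U+2013), "34"

utf8_bytes = page_range.encode("utf-8")    # the en dash becomes bytes E2 80 93
garbled = utf8_bytes.decode("mac_roman")   # each byte is read as one character

print(garbled)  # prints 23‚Äì34

# Reversing the mistake recovers the original text, so the file itself is intact.
restored = garbled.encode("mac_roman").decode("utf-8")
print(restored == page_range)  # prints True
```

If this is what is happening, the saved file is fine and only the reading app's encoding setting needs to change.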

Jon
Sonny Software