File encodings on scan
Posted: Wed Dec 14, 2016 7:47 pm
Hi.
I'm trying to put a document scan into a plain text workflow (Markdown, actually). I'm running into two separate problems, which may well both be related to text encoding.
1) If I put an en dash (or another Unicode character) in the page range of a temporary citation, e.g. #6543@23–34, it gets saved on output as Halliday, *An Introduction to Functional Grammar*, 23‚Äì34.
The input file is UTF-8 encoded, as is the output. Other Unicode characters in the file, and Unicode in the bibliography entries themselves, generally come through properly. But the page range gets mangled.
(This is the same issue as viewtopic.php?f=2&t=3116&p=14003&hilit= ... ash#p14003, I believe, but it doesn't seem to be resolved in a plain text scan)
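One way to test whether this is an encoding mix-up: the garbled string 23‚Äì34 is exactly what the UTF-8 bytes of an en dash look like when they are decoded as Mac OS Roman. The sketch below reproduces that and reverses it. It only demonstrates the byte-level pattern; that the scan step actually does this internally is an assumption, and the function names are made up for illustration.

```python
def utf8_misread_as(text: str, wrong_encoding: str = "mac_roman") -> str:
    """Show how UTF-8 text looks when its bytes are decoded with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(text: str, wrong_encoding: str = "mac_roman") -> str:
    """Undo the misreading, assuming it happened exactly once with this codec."""
    return text.encode(wrong_encoding).decode("utf-8")


garbled = utf8_misread_as("23–34")
print(garbled)          # the en dash turns into three unrelated characters
print(repair(garbled))  # round-trips back to the original page range
```

If the round trip works on the real output, that points at a single UTF-8 to Mac OS Roman misinterpretation somewhere in the page-range handling rather than at the file as a whole.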
2) If I save the bibliography to disk, it gets saved with a non-UTF-8 encoding.
So I can't then reliably concatenate that onto the previous file (which is what I want to do in order to run it through a pandoc workflow), or necessarily open it up properly.
$ file Bibliography.txt
Bibliography.txt: Non-ISO extended-ASCII text, with very long lines
Instead of, for example:
Himes, Paul A, “Why Did Peter Change the Septuagint? A Reexamination of the Significance of the Use of Τίθημι in 1 Peter 2:6.” *BBR* 26/2 (2016): 227–44
I get:
Himes, Paul A, ?Why Did Peter Change the Septuagint? A Reexamination of the Significance of the Use of ?????? in 1 Peter 2:6.? *BBR* 26/2 (2016): 227?44.
Now I can work around both of these issues (the first by using '--' rather than '–'; the second by manually pasting the bibliography into the file from the clipboard), but I'm trying to make the process a bit more bulletproof than it currently is.
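For the concatenation step, a small script could normalise the saved bibliography to UTF-8 before appending it. This is a minimal sketch, not a confirmed fix: the fallback encoding (Mac OS Roman) is a guess, since the file command only reports "Non-ISO extended-ASCII", and the function names are hypothetical. Characters the legacy encoding cannot represent (such as the Greek word above), if they were already written out as "?", cannot be recovered this way.

```python
from pathlib import Path


def read_as_text(path, fallback_encoding="mac_roman"):
    """Return the file's text, trying UTF-8 first and a legacy codec second."""
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the export used this legacy encoding.
        return raw.decode(fallback_encoding)


def append_bibliography(draft_path, bibliography_path, fallback_encoding="mac_roman"):
    """Append the bibliography to the draft, always writing UTF-8."""
    text = read_as_text(bibliography_path, fallback_encoding)
    with open(draft_path, "a", encoding="utf-8") as draft:
        draft.write("\n" + text)
```

Calling append_bibliography("draft.md", "Bibliography.txt") (draft.md being a placeholder name) would leave a single UTF-8 file to hand to pandoc. If the curly quotes or dashes come out wrong, a different fallback such as "cp1252" is worth trying.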
Any suggestions?
Thanks,
Sam.