1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
|
_I_n_p_u_t_/_o_u_t_p_u_t_ _o_f_ _b_i_n_a_r_y_ _d_a_t_a
From version 2.044 Miranda stdenv.m includes a function
readb :: [char]->[char]
and new sys-message constructors
Stdoutb :: [char]->sys_message
Tofileb :: [char]->[char]->sys_message
Appendfileb :: [char]->[char]->sys_message
These behave similarly to (respectively) read, Stdout, Tofile,
Appendfile but are needed in a UTF-8 locale for reading/writing binary
data (for further explanation see below). In a non UTF-8 locale they do
not behave differently from read, Stdout etc but you might still prefer
to use them for handling binary data, for portability reasons.
The notation $:- is used for the binary version of the standard input.
In a non UTF-8 locale $:- and $- will produce the same results. It is
an error to access both $:- and $- in the same evaluation.
_E_x_p_l_a_n_a_t_i_o_n
The locale of a UNIX process is a collection of settings in the
environment which specify, among other things, what character encoding
is in use. To see this information use `locale' as a shell command.
The analogous concept in Windows is called a "code page".
UTF-8 is a standard for encoding text from a wide variety of languages
as a byte stream, in which ascii characters (codes 0..127) are
represented by themselves while other symbols are represented by a
sequence of two or more bytes: a `multibyte character'.
The Miranda type `char' consists of characters in the range (0..255)
where the codes above 127 represent various accented letters etc
according to the conventions of Latin-1 (i.e. ISO-8859-1, commonly used
for West European languages). There are national variants on Latin-1
but since Miranda source, outside comments and string and character
constants, uses only ascii this does not normally cause a problem.
In a UTF-8 locale: on reading string/character literals or text files
Miranda has to translate multibyte characters to the corresponding point
in the Latin-1 range (128-255). If the text does not conform to the
rules of UTF-8, or includes a character not present in Latin-1, an
"illegal character" error occurs. On output, Miranda strings are
translated back to UTF-8.
If data being read/written is not text, but binary data of some kind,
translation from/to UTF-8 is not appropriate and could cause "illegal
character" errors, and/or corruption of data. Whence the need for the
byte oriented I/O functions readb etc, which transfer data without any
conversion from/to UTF-8.
In a non UTF-8 locale read and readb, Tofile and Tofileb, etc. do not
differ in their results.
|