author Jakob Kaivo <jkk@ung.org> 2022-03-04 12:32:20 -0500
committer Jakob Kaivo <jkk@ung.org> 2022-03-04 12:32:20 -0500
commit 55f277e77428d7423ae906a8e1f1324d35b07a7d
tree 5c1c04703dff89c46b349025d2d3ec88ea9b3819 /miralib/manual/31/9
import Miranda 2.066 from upstream
Diffstat (limited to 'miralib/manual/31/9')
-rw-r--r-- miralib/manual/31/9 | 54
1 files changed, 54 insertions, 0 deletions
diff --git a/miralib/manual/31/9 b/miralib/manual/31/9
new file mode 100644
index 0000000..8e6bb2d
--- /dev/null
+++ b/miralib/manual/31/9
@@ -0,0 +1,54 @@
+Input/output of binary data
+
+From version 2.044 onwards, Miranda's stdenv.m includes a function
+ readb :: [char]->[char]
+and new sys-message constructors
+ Stdoutb :: [char]->sys_message
+ Tofileb :: [char]->[char]->sys_message
+ Appendfileb :: [char]->[char]->sys_message
+
+These behave like read, Stdout, Tofile and Appendfile respectively, but
+are needed in a UTF-8 locale for reading and writing binary data (see
+the explanation below). In a non-UTF-8 locale they behave exactly like
+read, Stdout etc., but you might still prefer to use them for handling
+binary data, for portability between locales.
+
+The notation $:- is used for the binary version of the standard input.
+In a non-UTF-8 locale $:- and $- produce the same results. It is
+an error to access both $:- and $- in the same evaluation.
+
+Explanation
+
+The locale of a UNIX process is a collection of settings in the
+environment which specify, among other things, what character encoding
+is in use. To see this information use `locale' as a shell command.
+The analogous concept in Windows is called a "code page".
+
+UTF-8 is a standard for encoding text from a wide variety of languages
+as a byte stream, in which ASCII characters (codes 0..127) are
+represented by themselves while other symbols are represented by a
+sequence of two or more bytes: a `multibyte character'.
+
+The Miranda type `char' consists of characters in the range (0..255)
+where the codes above 127 represent various accented letters etc
+according to the conventions of Latin-1 (i.e. ISO-8859-1, commonly used
+for West European languages). There are national variants on Latin-1,
+but since Miranda source, outside comments and string and character
+constants, uses only ASCII, this does not normally cause a problem.
+
+In a UTF-8 locale, when reading string/character literals or text
+files, Miranda has to translate each multibyte character to the
+corresponding point in the Latin-1 range (128-255). If the text does
+not conform to the rules of UTF-8, or includes a character not present
+in Latin-1, an "illegal character" error occurs. On output, Miranda
+strings are translated back to UTF-8.
+
+If data being read/written is not text, but binary data of some kind,
+translation from/to UTF-8 is not appropriate and could cause "illegal
+character" errors and/or corruption of data. Hence the need for the
+byte-oriented I/O functions readb etc., which transfer data without any
+conversion from/to UTF-8.
+
+In a non-UTF-8 locale read and readb, Tofile and Tofileb, etc. do not
+differ in their results.
+
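
Note on the added manual page: the difference between text-style and
byte-oriented I/O in a UTF-8 locale can be sketched outside Miranda. The
Python below is only an illustration under stated assumptions. The four
helper names are hypothetical and are not Miranda functions, and
Python's "latin-1" codec (codes 0..255 mapped one-to-one onto bytes) is
used as a stand-in for the Miranda char range. It is not a description
of how the Miranda runtime is implemented.

```python
# Illustrative sketch only: these helpers are hypothetical names, not part
# of Miranda. Python's "latin-1" codec maps codes 0..255 one-to-one onto
# bytes, which stands in for the Miranda char range.

def read_text(raw: bytes) -> str:
    """Text-style read in a UTF-8 locale: decode, then require Latin-1."""
    text = raw.decode("utf-8")   # UnicodeDecodeError if not valid UTF-8
    text.encode("latin-1")       # UnicodeEncodeError if a char is above 255
    return text

def write_text(chars: str) -> bytes:
    """Text-style write: characters above 127 become two UTF-8 bytes."""
    return chars.encode("utf-8")

def read_binary(raw: bytes) -> str:
    """Byte-oriented read: each byte becomes the char with the same code."""
    return raw.decode("latin-1")

def write_binary(chars: str) -> bytes:
    """Byte-oriented write: each char (0..255) becomes exactly one byte."""
    return chars.encode("latin-1")

data = bytes([0x63, 0xE9, 0xFF])                  # arbitrary non-UTF-8 bytes
assert write_binary(read_binary(data)) == data    # bytes survive unchanged
assert write_text(read_binary(data)) != data      # text-style output alters them
```

In this sketch, passing `data` to `read_text` raises an error, which
corresponds to the "illegal character" case, while mixing a byte-oriented
read with a text-style write silently changes the bytes, which
corresponds to the data corruption case.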