Diffstat (limited to 'miralib/manual/12')
-rw-r--r-- | miralib/manual/12 | 86 |
1 files changed, 86 insertions, 0 deletions
diff --git a/miralib/manual/12 b/miralib/manual/12
new file mode 100644
index 0000000..a464473
--- /dev/null
+++ b/miralib/manual/12
@@ -0,0 +1,86 @@
+Tokenisation and layout
+
+A Miranda script or expression is regarded as being composed of tokens,
+separated by layout.
+
+A token is one of the following - an identifier, a literal, a type
+variable, or a delimiter. Identifiers and literals each have their own
+manual section. A type variable is a sequence of one or more stars,
+thus * ** *** etc. (see basic type structure). Delimiters are the
+miscellaneous symbols, such as operators, parentheses, and keywords. A
+formal definition of the syntax of tokens, including a list of all the
+delimiters is given under `Miranda lexical syntax'.
+
+RULES ABOUT LAYOUT
+
+Layout consists of white space characters (spaces, tabs, newlines and
+formfeeds), and comments. A comment consists of a pair of adjacent
+vertical bars, together with all the text to the right of the bars on
+the same line. Thus
+        || this is a comment
+Layout is not permitted inside tokens (except in char and string
+constants, where it is significant) but may be inserted freely between
+tokens to make scripts more readable. Layout is ignored by the compiler
+except in two respects:
+
+1) At least one space (or other layout characters) must be present
+between two tokens that would otherwise form an instance of a single
+larger token. For example in
+        f 19 'b'
+we have a function, f, applied to a number and a character, but if we
+were to omit the two intervening spaces, the compiler would read this as
+a single six-character identifier, because both digits and single-quotes
+are legal characters in an identifier. (Where it is not required to
+force the correct tokenisation, or because of the offside rule, see
+below, the presence of layout between tokens is optional.)
+
+2) Certain syntactic objects (roughly, the right hand sides of
+declarations -- for an exact account see those entities followed by a
+`(;)' in the formal syntax) obey Landin's offside rule [Landin 1966].
+This requires that every token of the object lie either directly below
+or to the right of its first token. A token which breaks this rule is
+said to be `offside' with respect to that object and terminates its
+parse. For example in
+        x = 2 < a
+        y = f q
+the 'y' is offside with respect to the right hand side of the definition
+of 'x' (because it is to the left of the initial '2'). In such a case
+the trailing semicolon may be omitted from the right hand side of the
+equation for x.
+
+It is because of the offside rule that Miranda scripts do not normally
+contain explicit semicolons as terminators for definitions. The same
+rule enables the compiler to determine the scopes of nested where's by
+looking at their indentation levels. For example in
+        f x = g y z
+              where
+              y = (x+1)*(x-1)
+              z = p x (q y)
+        g r = groo (r+1)
+
+it is the offside rule which makes it clear that the definition of 'g'
+is not local to the right hand side of the definition of 'f', but those
+of 'y' and 'z' are.
+
+It is always possible to terminate a right hand side by an EXPLICIT
+semicolon, instead of relying on the offside rule. For example the
+above script could be written all in one line, as
+        f x = g y z where y = (x+1)*(x-1); z = p x (q y);; g r = groo (r+1);
+
+Notice that we need TWO semicolons after the definition of z - the first
+terminates the rhs of the definition of `z', and the second terminates
+the larger rhs of which it is a part, namely that of the definition of
+`f'. If we put only one semicolon at this point, the definition of `g'
+would be local to that of `f'.
+
+This example should convince the reader that code using layout
+information to show the block structure is much more readable, and this
+is the normal practice.
+
+[Reference P.J. Landin "The Next 700 Programming Languages", CACM vol 9
+pp157-165 (March 1966).]
+
+Note that an additional comment convention applies in scripts whose
+first character is a `>'. See separate manual entry on `literate
+scripts'.
+
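The offside rule described in the manual page can be illustrated with a small sketch. This is not how the Miranda compiler works; it is a simplified Python model written for this note, and the function names `rhs_column` and `group_definitions` are hypothetical. It assumes each definition starts on a line containing an `=`, that indentation uses spaces only, and that it is enough to compare the first token of each following line against the column of the first token of the right hand side.

```python
def rhs_column(line):
    """Column of the first token after the first '=' sign in `line`.

    Assumption: the line contains an '=' (it opens a definition).
    """
    eq = line.index("=")
    rest = line[eq + 1:]
    return eq + 1 + (len(rest) - len(rest.lstrip()))


def group_definitions(lines):
    """Group script lines into definitions using a simplified offside rule.

    A line stays inside the current right hand side if its first token is
    directly below or to the right of that right hand side's first token;
    otherwise it is offside and opens a new definition.
    """
    groups = []
    limit = None  # column of the first token of the current right hand side
    for line in lines:
        if not line.strip():
            continue  # blank lines are layout and carry no tokens
        indent = len(line) - len(line.lstrip())
        if limit is not None and indent >= limit:
            groups[-1].append(line)  # still inside the current right hand side
        else:
            groups.append([line])    # offside: terminates the previous parse
            limit = rhs_column(line)
    return groups


script = [
    "f x = g y z",
    "      where",
    "      y = (x+1)*(x-1)",
    "      z = p x (q y)",
    "g r = groo (r+1)",
]
groups = group_definitions(script)
```

With the `where` example from the page, `groups` holds two definitions: the four lines belonging to `f`, and the single line defining `g`. The sketch only separates top-level definitions; it does not model nested right hand sides or explicit semicolons.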