diff options
Diffstat (limited to 'miralib/manual/30')
-rw-r--r-- | miralib/manual/30 | 261 |
1 files changed, 261 insertions, 0 deletions
diff --git a/miralib/manual/30 b/miralib/manual/30 new file mode 100644 index 0000000..b28e6ee --- /dev/null +++ b/miralib/manual/30 @@ -0,0 +1,261 @@ +_S_o_m_e_ _g_u_i_d_e_l_i_n_e_s_ _o_n_ _g_o_o_d_ _p_r_o_g_r_a_m_m_i_n_g_ _s_t_y_l_e_ _i_n_ _M_i_r_a_n_d_a + +We give here a series of suggested guidelines for good programming style +in Miranda. The list is not meant to be exhaustive. These rules are +also not intended to be followed in all cases, regardless of conflicting +considerations. That is why they are only suggestions for good style +and not grammar rules. + +_A_v_o_i_d_ _t_h_e_ _i_n_d_i_s_c_r_i_m_i_n_a_t_e_ _u_s_e_ _o_f_ _r_e_c_u_r_s_i_o_n + A Miranda script that consists of large number of functions which call +each other in an apparently random fashion is no easier to understand +than, say, a piece of FORTRAN code which is written as a rat's nest of +GOTO statements. An excessive reliance on recursion (especially mutual +recursion) can be an indication of a weak programming style. Some +pointers: + +Use list comprehensions, `..' lists, and library functions, in +preference to ad-hoc recursion. For example it is probably clearer to +define factorial by writing + fac n = product[1..n] + +than to define it from first principles, as + fac 0 = 1 + fac (n+1) = (n+1) * fac n + +and to define the cartesian product of two lists by a list +comprehension, thus + cp x y = [(a,b)|a<-x;b<-y] + +is certainly a lot clearer than the recursive definition, + cp (a:x) y = f y ++ cp x y + where + f (b:y) = (a,b): f y + f [] = [] + cp [] y = [] + +The standard environment contains a number of useful list processing +functions (eg map filter reverse foldr foldl) with whose properties it +is worth becoming familiar. They capture common patterns of recursion +over lists, and can often be used to simplify your code, and reduce the +reliance on `ad-hoc' recursion. Programs using list comprehensions and +standard functions are also likely to run faster (on the current +implementation) than equivalent programs using ad-hoc recursion. + +The standard environment is only a basic collection of useful general +purpose functions. As you get used to programming in Miranda you will +probably begin to discover other useful functions that express common +patterns of recursion (perhaps over data structures other than lists). +It is a good practice to collect such functions in libraries (together +with some explanations of their properties) so that you can reuse them, +and share them with others. Not all of them will survive the test of +time, but it cannot hurt to experiment. + +To cause the definitions from such a library to be in scope in another +script you would use a `%include' directive (see manual section on +library directives). + +_A_v_o_i_d_ _u_n_n_e_c_e_s_s_a_r_y_ _n_e_s_t_i_n_g_ _o_f_ _d_e_f_i_n_i_t_i_o_n_s + Scripts that get deeply nested in where-clauses are harder to +understand, harder to reason about formally, harder to debug (because +functions defined inside where's cannot be exercised seperately) slower +to compile, and generally more difficult to work with. + +A well structured script will consist of a series of top-level +definitions, each of which (if it carries a where-clause at all) has a +fairly small number of local definitions. A third level of definition +(where inside where) should be used only very occasionally. [And if you +find yourself getting nested four and five levels deep in block +structure you can be pretty sure that your program has gone badly out of +control.] + +A function should normally be placed inside a where clause only if it is +logically necessary to do so (which will be the case when it has a free +variable which is not in scope outside the where clause). If your +script consists, of say six functions, one of which solves a problem and +the other five of which are auxiliary to it, it is probably not a good +style to put the five subsidiary functions inside a where clause of the +main one. It is usually better to make all six top level definitions, +with the important one written first, say. + +There are several reasons for this. First that it makes the program +easier to read, since it consists of six separate chunks of information +rather than one big one. Second that the program is much easier to +debug, because each of its functions can be exercised separately, on +appropriate test data, within a Miranda session. Third that this +program structure is more robust for future development - for example if +we later wish to add a second `main' function that solves a different +problem by using the same five auxiliary functions in another way, we +can do so without having to restructure any existing code. + +There is a temptation to use `where' to hide information that is not +relevant at top-level. This may be misguided (especially if it leads to +code with large and complex where-clauses). If you don't wish all of +your functions or data structures to be "visible" from outside, the +proper way to do this is to include a `%export' directive in the script. + +Note also that (in the current implementation) functions defined inside +a "where" clause cannot have their types explicitly specified. This is +a further reason to avoid putting structure inside a where clause that +does not logically have to be there. + +_S_p_e_c_i_f_y_ _t_h_e_ _t_y_p_e_s_ _o_f_ _t_o_p_ _l_e_v_e_l_ _i_d_e_n_t_i_f_i_e_r_s + The Milner type discipline is an impressive advance in compiler +technology. It is also a trap for the unwary. The fact that the +Miranda compiler will accept several hundred lines of code without a +single type specification, and correctly infer the types of all the +identifiers does NOT mean that it is sensible to write code with no type +information. (Compare: compilers will also accept large programs with +no comments in, but that doesn't make such programs sensible.) + +For other than fairly small scripts it is good style to insert an +explicit specification of the type of any top level identifier whose +type is not immediately apparent from its definition. Type +specifications look like this + ack::num->num->num +says that `ack' is a function taking two numbers and returning a number. +A type specification can occur anywhere in a script, either before or +after the definition of the corresponding identifier, but common sense +suggests that the best place for it is just before the corresponding +definition. + +If in doubt it is always better to put in a type specification than to +leave it out. The compiler may not need this extra type information but +human beings definitely do. The extra type information becomes +particularly important when your code reaches the level of complexity at +which you start to make type errors. + +If your script contains a type error it is unreasonable to expect the +compiler to correctly locate the real source of the error in the absence +of explicit type declarations. A type error means different parts of +your code are inconsistent with one another in their use of identifiers +- if you have not given the compiler any information about the intended +use of an identifier, you cannot expect it to know which of several +conflicting uses are the `wrong' ones. In such a case it can only tell +you that something is wrong, and indicate the line on which it first +deduced an inconsistency - which may be many lines later than the `real' +error. Explicit type declarations make it much more likely that the +compiler will spot the `real error' on the line where it actually +occurs. + +Code containing explicit type information is also incomparably easier +for other people to read. + +_U_s_e_ _s_a_f_e_ _l_a_y_o_u_t + This is a point to do with the operation of the offside rule. It is +most easily explained by means of an example. Consider the following +definition, here assumed to be part of a larger script + + hippo = (rhino - swan)/piglet + _w_h_e_r_e + piglet = 17 + rhino = 63 + swan = 29 + +Some time after writing this we carry out a global edit to expand +`hippo' to `hippopotamus'. The definition now looks like this. + + hippopotamus = (rhino - swan)/piglet + _w_h_e_r_e + piglet = 17 + rhino = 63 + swan = 29 + +the where-clause has become offside, and the definition will no longer +compile. Worse, it is possible (with a little ingenuity) to construct +examples of layout where changing the length of an identifier will move +a definition from one level of scope to another, so that the script +still compiles but now has a different meaning!!! Replacing an +identifier by a shorter one can cause similar difficulties with layout. + +The layout of the `hippo' definition was unsafe, because the level of +indentation depended on the length of an identifier. There are several +possible styles of `safe' layout. The basic rule to follow is: + + Whenever a right hand side goes on for more than one line + (because it consists of a set of guarded cases, or because it + carries a where clause, or just because it is an expression too + big to fit on one line), you should take a newline BEFORE + starting the rhs, and indent by some standard amount (not + depending on the width of the lhs). + +There are two main styles of safe layout, depending on whether you take +the newline before or after the `=' of the definition. Here are two +possible safe layouts for the `hippo' definition + + hippo = + (rhino - swan)/piglet + _w_h_e_r_e + piglet = 17 + rhino = 63 + swan = 29 + + hippo + = (rhino - swan)/piglet + _w_h_e_r_e + piglet = 17 + rhino = 63 + swan = 29 + +The reason that either style can be used is that the boundary, for +offside purposes, of a right hand side, is set by the first symbol of +the rhs itself, and not by the preceding `=' sign. + +Both of these layouts have the property that the parse cannot be +affected by edits which alter the lengths of one or more identifiers. +Either of these layout styles also have the advantage that successive +levels of indentation can move to the right by a fixed step - this makes +code easier to read and lessens the danger that your layout will `fall +off' the right hand edge of the screen (although if you follow the +advice given earlier about avoiding deeply nested block structure this +is in any case unlikely to be a problem). + +It would be convenient if there was a program for reformatting Miranda +scripts with a standard layout. Apart from ensuring that the layout was +`safe' in the above sense, it might make it easier for people to read +each other's code. A layout program of this kind may be provided in +later releases of the system. + +Acknowledgement: The `hippopotamus' example (and the problem of unsafe +layout) was first pointed out by Mark Longley of the University of Kent. + +_W_r_i_t_e_ _o_r_d_e_r_ _i_n_d_e_p_e_n_d_e_n_t_ _c_o_d_e + When defining functions by pattern matching it is best (except in a few +cases where it leads to real clumsiness of expression) to make sure the +patterns are mutually exclusive, so it does not matter in what order the +cases are written. + +For the same reason it is better style to use sets of guards which are +composed of mutually exclusive boolean expressions. The keyword +`otherwise' sometimes helps to make this less painful. + +By way of illustration of some of the issues here is a definition of a +function `merge' which combines two already sorted lists into a single +sorted result, eliminating duplicates in the process + merge [] y = y + merge (a:x) [] = (a:x) + merge (a:x) (b:y) + = a:merge x (b:y), if a<b + = b:merge (a:x) y, if a>b + = a:merge x y, otherwise + +Note the use of mutually exclusive sets of patterns (it was tempting to +write `merge x [] = x' as the second case, but the above is probably +better style). + +A related issue to these is that where a function is not everywhere +defined on its argument type, it is good practice to insert an explicit +error case. For example the definition given in the standard +environment for `hd', the function which extracts the first element of a +list, is + hd (a:x) = a + hd [] = error "hd []" + +Of course if a function is applied to an argument for which no equation +has been given, the Miranda system will print an error message anyway, +but one advantage of putting in an explicit call to `error' is that the +programmer gets control of the error message. The other (and perhaps +main) advantage is that for someone else reading the script, it +explicitly documents the fact that a certain use of the function is +considered an error. + |