1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
|
_S_o_m_e_ _g_u_i_d_e_l_i_n_e_s_ _o_n_ _g_o_o_d_ _p_r_o_g_r_a_m_m_i_n_g_ _s_t_y_l_e_ _i_n_ _M_i_r_a_n_d_a
We give here a series of suggested guidelines for good programming style
in Miranda. The list is not meant to be exhaustive. These rules are
also not intended to be followed in all cases, regardless of conflicting
considerations. That is why they are only suggestions for good style
and not grammar rules.
_A_v_o_i_d_ _t_h_e_ _i_n_d_i_s_c_r_i_m_i_n_a_t_e_ _u_s_e_ _o_f_ _r_e_c_u_r_s_i_o_n
A Miranda script that consists of large number of functions which call
each other in an apparently random fashion is no easier to understand
than, say, a piece of FORTRAN code which is written as a rat's nest of
GOTO statements. An excessive reliance on recursion (especially mutual
recursion) can be an indication of a weak programming style. Some
pointers:
Use list comprehensions, `..' lists, and library functions, in
preference to ad-hoc recursion. For example it is probably clearer to
define factorial by writing
fac n = product[1..n]
than to define it from first principles, as
fac 0 = 1
fac (n+1) = (n+1) * fac n
and to define the cartesian product of two lists by a list
comprehension, thus
cp x y = [(a,b)|a<-x;b<-y]
is certainly a lot clearer than the recursive definition,
cp (a:x) y = f y ++ cp x y
where
f (b:y) = (a,b): f y
f [] = []
cp [] y = []
The standard environment contains a number of useful list processing
functions (eg map filter reverse foldr foldl) with whose properties it
is worth becoming familiar. They capture common patterns of recursion
over lists, and can often be used to simplify your code, and reduce the
reliance on `ad-hoc' recursion. Programs using list comprehensions and
standard functions are also likely to run faster (on the current
implementation) than equivalent programs using ad-hoc recursion.
The standard environment is only a basic collection of useful general
purpose functions. As you get used to programming in Miranda you will
probably begin to discover other useful functions that express common
patterns of recursion (perhaps over data structures other than lists).
It is a good practice to collect such functions in libraries (together
with some explanations of their properties) so that you can reuse them,
and share them with others. Not all of them will survive the test of
time, but it cannot hurt to experiment.
To cause the definitions from such a library to be in scope in another
script you would use a `%include' directive (see manual section on
library directives).
_A_v_o_i_d_ _u_n_n_e_c_e_s_s_a_r_y_ _n_e_s_t_i_n_g_ _o_f_ _d_e_f_i_n_i_t_i_o_n_s
Scripts that get deeply nested in where-clauses are harder to
understand, harder to reason about formally, harder to debug (because
functions defined inside where's cannot be exercised seperately) slower
to compile, and generally more difficult to work with.
A well structured script will consist of a series of top-level
definitions, each of which (if it carries a where-clause at all) has a
fairly small number of local definitions. A third level of definition
(where inside where) should be used only very occasionally. [And if you
find yourself getting nested four and five levels deep in block
structure you can be pretty sure that your program has gone badly out of
control.]
A function should normally be placed inside a where clause only if it is
logically necessary to do so (which will be the case when it has a free
variable which is not in scope outside the where clause). If your
script consists, of say six functions, one of which solves a problem and
the other five of which are auxiliary to it, it is probably not a good
style to put the five subsidiary functions inside a where clause of the
main one. It is usually better to make all six top level definitions,
with the important one written first, say.
There are several reasons for this. First that it makes the program
easier to read, since it consists of six separate chunks of information
rather than one big one. Second that the program is much easier to
debug, because each of its functions can be exercised separately, on
appropriate test data, within a Miranda session. Third that this
program structure is more robust for future development - for example if
we later wish to add a second `main' function that solves a different
problem by using the same five auxiliary functions in another way, we
can do so without having to restructure any existing code.
There is a temptation to use `where' to hide information that is not
relevant at top-level. This may be misguided (especially if it leads to
code with large and complex where-clauses). If you don't wish all of
your functions or data structures to be "visible" from outside, the
proper way to do this is to include a `%export' directive in the script.
Note also that (in the current implementation) functions defined inside
a "where" clause cannot have their types explicitly specified. This is
a further reason to avoid putting structure inside a where clause that
does not logically have to be there.
_S_p_e_c_i_f_y_ _t_h_e_ _t_y_p_e_s_ _o_f_ _t_o_p_ _l_e_v_e_l_ _i_d_e_n_t_i_f_i_e_r_s
The Milner type discipline is an impressive advance in compiler
technology. It is also a trap for the unwary. The fact that the
Miranda compiler will accept several hundred lines of code without a
single type specification, and correctly infer the types of all the
identifiers does NOT mean that it is sensible to write code with no type
information. (Compare: compilers will also accept large programs with
no comments in, but that doesn't make such programs sensible.)
For other than fairly small scripts it is good style to insert an
explicit specification of the type of any top level identifier whose
type is not immediately apparent from its definition. Type
specifications look like this
ack::num->num->num
says that `ack' is a function taking two numbers and returning a number.
A type specification can occur anywhere in a script, either before or
after the definition of the corresponding identifier, but common sense
suggests that the best place for it is just before the corresponding
definition.
If in doubt it is always better to put in a type specification than to
leave it out. The compiler may not need this extra type information but
human beings definitely do. The extra type information becomes
particularly important when your code reaches the level of complexity at
which you start to make type errors.
If your script contains a type error it is unreasonable to expect the
compiler to correctly locate the real source of the error in the absence
of explicit type declarations. A type error means different parts of
your code are inconsistent with one another in their use of identifiers
- if you have not given the compiler any information about the intended
use of an identifier, you cannot expect it to know which of several
conflicting uses are the `wrong' ones. In such a case it can only tell
you that something is wrong, and indicate the line on which it first
deduced an inconsistency - which may be many lines later than the `real'
error. Explicit type declarations make it much more likely that the
compiler will spot the `real error' on the line where it actually
occurs.
Code containing explicit type information is also incomparably easier
for other people to read.
_U_s_e_ _s_a_f_e_ _l_a_y_o_u_t
This is a point to do with the operation of the offside rule. It is
most easily explained by means of an example. Consider the following
definition, here assumed to be part of a larger script
hippo = (rhino - swan)/piglet
_w_h_e_r_e
piglet = 17
rhino = 63
swan = 29
Some time after writing this we carry out a global edit to expand
`hippo' to `hippopotamus'. The definition now looks like this.
hippopotamus = (rhino - swan)/piglet
_w_h_e_r_e
piglet = 17
rhino = 63
swan = 29
the where-clause has become offside, and the definition will no longer
compile. Worse, it is possible (with a little ingenuity) to construct
examples of layout where changing the length of an identifier will move
a definition from one level of scope to another, so that the script
still compiles but now has a different meaning!!! Replacing an
identifier by a shorter one can cause similar difficulties with layout.
The layout of the `hippo' definition was unsafe, because the level of
indentation depended on the length of an identifier. There are several
possible styles of `safe' layout. The basic rule to follow is:
Whenever a right hand side goes on for more than one line
(because it consists of a set of guarded cases, or because it
carries a where clause, or just because it is an expression too
big to fit on one line), you should take a newline BEFORE
starting the rhs, and indent by some standard amount (not
depending on the width of the lhs).
There are two main styles of safe layout, depending on whether you take
the newline before or after the `=' of the definition. Here are two
possible safe layouts for the `hippo' definition
hippo =
(rhino - swan)/piglet
_w_h_e_r_e
piglet = 17
rhino = 63
swan = 29
hippo
= (rhino - swan)/piglet
_w_h_e_r_e
piglet = 17
rhino = 63
swan = 29
The reason that either style can be used is that the boundary, for
offside purposes, of a right hand side, is set by the first symbol of
the rhs itself, and not by the preceding `=' sign.
Both of these layouts have the property that the parse cannot be
affected by edits which alter the lengths of one or more identifiers.
Either of these layout styles also have the advantage that successive
levels of indentation can move to the right by a fixed step - this makes
code easier to read and lessens the danger that your layout will `fall
off' the right hand edge of the screen (although if you follow the
advice given earlier about avoiding deeply nested block structure this
is in any case unlikely to be a problem).
It would be convenient if there was a program for reformatting Miranda
scripts with a standard layout. Apart from ensuring that the layout was
`safe' in the above sense, it might make it easier for people to read
each other's code. A layout program of this kind may be provided in
later releases of the system.
Acknowledgement: The `hippopotamus' example (and the problem of unsafe
layout) was first pointed out by Mark Longley of the University of Kent.
_W_r_i_t_e_ _o_r_d_e_r_ _i_n_d_e_p_e_n_d_e_n_t_ _c_o_d_e
When defining functions by pattern matching it is best (except in a few
cases where it leads to real clumsiness of expression) to make sure the
patterns are mutually exclusive, so it does not matter in what order the
cases are written.
For the same reason it is better style to use sets of guards which are
composed of mutually exclusive boolean expressions. The keyword
`otherwise' sometimes helps to make this less painful.
By way of illustration of some of the issues here is a definition of a
function `merge' which combines two already sorted lists into a single
sorted result, eliminating duplicates in the process
merge [] y = y
merge (a:x) [] = (a:x)
merge (a:x) (b:y)
= a:merge x (b:y), if a<b
= b:merge (a:x) y, if a>b
= a:merge x y, otherwise
Note the use of mutually exclusive sets of patterns (it was tempting to
write `merge x [] = x' as the second case, but the above is probably
better style).
A related issue to these is that where a function is not everywhere
defined on its argument type, it is good practice to insert an explicit
error case. For example the definition given in the standard
environment for `hd', the function which extracts the first element of a
list, is
hd (a:x) = a
hd [] = error "hd []"
Of course if a function is applied to an argument for which no equation
has been given, the Miranda system will print an error message anyway,
but one advantage of putting in an explicit call to `error' is that the
programmer gets control of the error message. The other (and perhaps
main) advantage is that for someone else reading the script, it
explicitly documents the fact that a certain use of the function is
considered an error.
|