summaryrefslogtreecommitdiff
path: root/miralib/manual/30
blob: b28e6ee4e60f2ff1b5cb8c38ce3f42e9edf4bf28 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
_S_o_m_e_ _g_u_i_d_e_l_i_n_e_s_ _o_n_ _g_o_o_d_ _p_r_o_g_r_a_m_m_i_n_g_ _s_t_y_l_e_ _i_n_ _M_i_r_a_n_d_a

We give here a series of suggested guidelines for good programming style
in  Miranda.   The  list is not meant to be exhaustive.  These rules are
also not intended to be followed in all cases, regardless of conflicting
considerations.   That  is  why they are only suggestions for good style
and not grammar rules.

_A_v_o_i_d_ _t_h_e_ _i_n_d_i_s_c_r_i_m_i_n_a_t_e_ _u_s_e_ _o_f_ _r_e_c_u_r_s_i_o_n
  A Miranda script that consists of large number of functions which call
each  other  in  an apparently random fashion is no easier to understand
than, say, a piece of FORTRAN code which is written as a rat's  nest  of
GOTO  statements.  An excessive reliance on recursion (especially mutual
recursion) can be an indication  of  a  weak  programming  style.   Some
pointers:

Use  list  comprehensions,  `..'  lists,  and  library   functions,   in
preference  to  ad-hoc recursion.  For example it is probably clearer to
define factorial by writing
	fac n = product[1..n]

than to define it from first principles, as
	fac 0 = 1
	fac (n+1) = (n+1) * fac n

and  to  define  the  cartesian  product  of  two  lists   by   a   list
comprehension, thus
	cp x y = [(a,b)|a<-x;b<-y]

is certainly a lot clearer than the recursive definition,
	cp (a:x) y = f y ++ cp x y
		     where
		     f (b:y) = (a,b): f y
		     f [] = []
	cp [] y = []

The standard environment contains a number  of  useful  list  processing
functions  (eg  map filter reverse foldr foldl) with whose properties it
is worth becoming familiar.  They capture common patterns  of  recursion
over  lists, and can often be used to simplify your code, and reduce the
reliance on `ad-hoc' recursion.  Programs using list comprehensions  and
standard  functions  are  also  likely  to  run  faster  (on the current
implementation) than equivalent programs using ad-hoc recursion.

The standard environment is only a basic collection  of  useful  general
purpose  functions.   As you get used to programming in Miranda you will
probably begin to discover other useful functions  that  express  common
patterns  of  recursion (perhaps over data structures other than lists).
It is a good practice to collect such functions in  libraries  (together
with  some explanations of their properties) so that you can reuse them,
and share them with others.  Not all of them will survive  the  test  of
time, but it cannot hurt to experiment.

To cause the definitions from such a library to be in scope  in  another
script  you  would  use  a  `%include'  directive (see manual section on
library directives).

_A_v_o_i_d_ _u_n_n_e_c_e_s_s_a_r_y_ _n_e_s_t_i_n_g_ _o_f_ _d_e_f_i_n_i_t_i_o_n_s
 Scripts  that  get  deeply  nested  in  where-clauses  are  harder   to
understand,  harder  to  reason about formally, harder to debug (because
functions defined inside where's cannot be exercised seperately)  slower
to compile, and generally more difficult to work with.

A  well  structured  script  will  consist  of  a  series  of  top-level
definitions,  each  of which (if it carries a where-clause at all) has a
fairly small number of local definitions.  A third level  of  definition
(where inside where) should be used only very occasionally.  [And if you
find yourself  getting  nested  four  and  five  levels  deep  in  block
structure you can be pretty sure that your program has gone badly out of
control.]

A function should normally be placed inside a where clause only if it is
logically  necessary to do so (which will be the case when it has a free
variable which is not in scope  outside  the  where  clause).   If  your
script consists, of say six functions, one of which solves a problem and
the other five of which are auxiliary to it, it is probably not  a  good
style  to put the five subsidiary functions inside a where clause of the
main one.  It is usually better to make all six top  level  definitions,
with the important one written first, say.

There are several reasons for this.  First that  it  makes  the  program
easier  to read, since it consists of six separate chunks of information
rather than one big one.  Second that the  program  is  much  easier  to
debug,  because  each  of  its functions can be exercised separately, on
appropriate test data,  within  a  Miranda  session.   Third  that  this
program structure is more robust for future development - for example if
we later wish to add a second `main' function that  solves  a  different
problem  by  using  the same five auxiliary functions in another way, we
can do so without having to restructure any existing code.

There is a temptation to use `where' to hide  information  that  is  not
relevant at top-level.  This may be misguided (especially if it leads to
code with large and complex where-clauses).  If you don't  wish  all  of
your  functions  or  data  structures  to be "visible" from outside, the
proper way to do this is to include a `%export' directive in the script.

Note also that (in the current implementation) functions defined  inside
a  "where" clause cannot have their types explicitly specified.  This is
a further reason to avoid putting structure inside a where  clause  that
does not logically have to be there.

_S_p_e_c_i_f_y_ _t_h_e_ _t_y_p_e_s_ _o_f_ _t_o_p_ _l_e_v_e_l_ _i_d_e_n_t_i_f_i_e_r_s
 The Milner  type  discipline  is  an  impressive  advance  in  compiler
technology.   It  is  also  a  trap  for  the unwary.  The fact that the
Miranda compiler will accept several hundred lines  of  code  without  a
single  type  specification,  and  correctly  infer the types of all the
identifiers does NOT mean that it is sensible to write code with no type
information.   (Compare:  compilers will also accept large programs with
no comments in, but that doesn't make such programs sensible.)

For other than fairly small scripts  it  is  good  style  to  insert  an
explicit  specification  of  the  type of any top level identifier whose
type  is  not  immediately   apparent   from   its   definition.    Type
specifications look like this
	ack::num->num->num
says that `ack' is a function taking two numbers and returning a number.
A type specification can occur anywhere in a script,  either  before  or
after  the  definition of the corresponding identifier, but common sense
suggests that the best place for it is  just  before  the  corresponding
definition.   

If in doubt it is always better to put in a type specification  than  to
leave it out.  The compiler may not need this extra type information but
human  beings  definitely  do.   The  extra  type  information   becomes
particularly important when your code reaches the level of complexity at
which you start to make type errors.

If your script contains a type error it is unreasonable  to  expect  the
compiler to correctly locate the real source of the error in the absence
of explicit type declarations.  A type error means  different  parts  of
your  code are inconsistent with one another in their use of identifiers
- if you have not given the compiler any information about the  intended
use  of  an  identifier,  you  cannot expect it to know which of several
conflicting uses are the `wrong' ones.  In such a case it can only  tell
you  that  something  is  wrong, and indicate the line on which it first
deduced an inconsistency - which may be many lines later than the `real'
error.   Explicit  type  declarations  make it much more likely that the
compiler will spot the `real  error'  on  the  line  where  it  actually
occurs.

Code containing explicit type information is  also  incomparably  easier
for other people to read.

_U_s_e_ _s_a_f_e_ _l_a_y_o_u_t
 This is a point to do with the operation of the offside  rule.   It  is
most  easily  explained  by means of an example.  Consider the following
definition, here assumed to be part of a larger script

	hippo = (rhino - swan)/piglet
		_w_h_e_r_e
		piglet = 17
		rhino = 63
		swan = 29

Some time after writing this we  carry  out  a  global  edit  to  expand
`hippo' to `hippopotamus'.  The definition now looks like this.

	hippopotamus = (rhino - swan)/piglet
		_w_h_e_r_e
		piglet = 17
		rhino = 63
		swan = 29

the where-clause has become offside, and the definition will  no  longer
compile.   Worse,  it is possible (with a little ingenuity) to construct
examples of layout where changing the length of an identifier will  move
a  definition  from  one  level  of scope to another, so that the script
still  compiles  but  now  has  a  different  meaning!!!   Replacing  an
identifier by a shorter one can cause similar difficulties with layout.

The layout of the `hippo' definition was unsafe, because  the  level  of
indentation  depended on the length of an identifier.  There are several
possible styles of `safe' layout.  The basic rule to follow is:

	Whenever a right hand side goes on for more than one line
	(because it consists of a set of guarded  cases, or because it
	carries a where clause, or just because it is an expression too
	big to fit on one line), you should take a newline BEFORE
	starting the rhs, and indent by some standard amount (not
	depending on the width of the lhs).

There are two main styles of safe layout, depending on whether you  take
the  newline  before  or  after the `=' of the definition.  Here are two
possible safe layouts for the `hippo' definition

	hippo = 
	    (rhino - swan)/piglet
	    _w_h_e_r_e
	    piglet = 17
	    rhino = 63
	    swan = 29

	hippo 
	  = (rhino - swan)/piglet
	    _w_h_e_r_e
	    piglet = 17
	    rhino = 63
	    swan = 29

The reason that either style can be  used  is  that  the  boundary,  for
offside  purposes,  of  a right hand side, is set by the first symbol of
the rhs itself, and not by the preceding `=' sign.

Both of these layouts  have  the  property  that  the  parse  cannot  be
affected  by  edits  which alter the lengths of one or more identifiers.
Either of these layout styles also have the  advantage  that  successive
levels of indentation can move to the right by a fixed step - this makes
code easier to read and lessens the danger that your layout  will  `fall
off'  the  right  hand  edge  of  the screen (although if you follow the
advice given earlier about avoiding deeply nested block  structure  this
is in any case unlikely to be a problem).

It would be convenient if there was a program for  reformatting  Miranda
scripts with a standard layout.  Apart from ensuring that the layout was
`safe' in the above sense, it might make it easier for  people  to  read
each  other's  code.   A  layout program of this kind may be provided in
later releases of the system.

Acknowledgement: The `hippopotamus' example (and the problem  of  unsafe
layout) was first pointed out by Mark Longley of the University of Kent.

_W_r_i_t_e_ _o_r_d_e_r_ _i_n_d_e_p_e_n_d_e_n_t_ _c_o_d_e
 When defining functions by pattern matching it is best (except in a few
cases  where it leads to real clumsiness of expression) to make sure the
patterns are mutually exclusive, so it does not matter in what order the
cases are written.

For the same reason it is better style to use sets of guards  which  are
composed   of  mutually  exclusive  boolean  expressions.   The  keyword
`otherwise' sometimes helps to make this less painful.

By way of illustration of some of the issues here is a definition  of  a
function  `merge'  which combines two already sorted lists into a single
sorted result, eliminating duplicates in the process
	merge [] y = y
	merge (a:x) [] = (a:x)
	merge (a:x) (b:y)
	  =  a:merge x (b:y), if a<b
	  =  b:merge (a:x) y, if a>b
	  =  a:merge x y,     otherwise

Note the use of mutually exclusive sets of patterns (it was tempting  to
write  `merge  x  []  = x' as the second case, but the above is probably
better style).

A related issue to these is that where  a  function  is  not  everywhere
defined  on its argument type, it is good practice to insert an explicit
error  case.   For  example  the  definition  given  in   the   standard
environment for `hd', the function which extracts the first element of a
list, is
	hd (a:x) = a
	hd [] = error "hd []"

Of course if a function is applied to an argument for which no  equation
has  been  given, the Miranda system will print an error message anyway,
but one advantage of putting in an explicit call to `error' is that  the
programmer  gets  control  of the error message.  The other (and perhaps
main) advantage  is  that  for  someone  else  reading  the  script,  it
explicitly  documents  the  fact  that  a certain use of the function is
considered an error.