FROM SIGPLAN NOTICES, 21(12):158-166, December 1986. (C) D.A.Turner _A_n_ _O_v_e_r_v_i_e_w_ _o_f_ _M_i_r_a_n_d_a David Turner Computing Laboratory University of Kent Canterbury CT2 7NF ENGLAND Miranda is an advanced functional programming system which runs under the UNIX operating system (*). The aim of the Miranda system is to provide a modern functional language, embedded in a convenient programming environment, suitable both for teaching and as a general purpose programming tool. The purpose of this short article is to give a brief overview of the main features of Miranda. The topics we shall discuss, in order, are: Basic ideas The Miranda programming environment Guarded equations and block structure Pattern matching Currying and higher order functions List comprehensions Lazy evaluation and infinite lists Polymorphic strong typing User defined types Type synonyms Abstract data types Separate compilation and linking Current implementation status (*) _N_o_t_e: UNIX is a trademark of AT&T Bell Laboratories, Miranda is a trademark of Research Software Ltd. _B_a_s_i_c_ _i_d_e_a_s The Miranda programming language is purely functional - there are no side effects or imperative features of any kind. A program (actually we don't call it a program, we call it a "script") is a collection of equations defining various functions and data structures which we are interested in computing. The order in which the equations are given is not in general significant. There is for example no obligation for the definition of an entity to precede its first use. Here is a very simple example of a Miranda script: z = sq x / sq y sq n = n * n x = a + b y = a - b a = 10 b = 5 Notice the absence of syntactic baggage - Miranda is, by design, rather terse. There are no mandatory type declarations, although (see later) the language is strongly typed. There are no semicolons at the end of definitions - the parsing algorithm makes intelligent use of layout. Note that the notation for function application is simply juxtaposition, as in "sq x". In the definition of the sq function, "n" is a formal parameter - its scope is limited to the equation in which it occurs (whereas the other names introduced above have the whole script for their scope). The most commonly used data structure is the list, which in Miranda is written with square brackets and commas, eg: week_days = ["Mon","Tue","Wed","Thur","Fri"] days = week_days ++ ["Sat","Sun"] Lists may be appended by the "++" operator. Other useful operations on lists include infix ":" which prefixes an element to the front of a list, "#" which takes the length of a list, and infix "!" which does subscripting. So for example 0:[1,2,3] has the value [0,1,2,3], #days is 7, and days!0 is "Mon". There is also an operator "--" which does list subtraction. For example [1,2,3,4,5] -- [2,4] is [1,3,5]. There is a shorthand notation using ".." for lists whose elements form an arithmetic series. Here for example are definitions of the factorial function, and of a number "result" which is the sum of the odd numbers between 1 and 100 (sum and product are library functions): fac n = product [1..n] result = sum [1,3..100] The elements of a list must all be of the same type. A sequence of elements of mixed type is called a tuple, and is written using parentheses instead of square brackets. Example employee = ("Jones",True,False,39) Tuples are analogous to records in Pascal (whereas lists are analogous to arrays). Tuples cannot be subscripted - their elements are extracted by pattern matching (see later). _T_h_e_ _p_r_o_g_r_a_m_m_i_n_g_ _e_n_v_i_r_o_n_m_e_n_t The Miranda system is interactive and runs under UNIX as a self contained subsystem. The basic action is to evaluate expressions, supplied by the user at the terminal, in the environment established by the current script. For example evaluating "z" in the context of the first script given above would produce the result "9.0". The Miranda compiler works in conjunction with an editor (by default this is "vi" but it can be set to any editor of the user's choice). Scripts are automatically recompiled after edits, and any syntax or type errors signalled immediately. The polymorphic type system permits a high proportion of logical errors to be detected at compile time. There is quite a large library of standard functions. There is also an online reference manual. The interface to UNIX permits Miranda programs to take data from, and send data to, UNIX files and it is also possible to invoke Miranda programs directly from the UNIX shell and to combine them, via UNIX pipes, with processes written in other languages. _G_u_a_r_d_e_d_ _e_q_u_a_t_i_o_n_s_ _a_n_d_ _b_l_o_c_k_ _s_t_r_u_c_t_u_r_e An equation can have several alternative right hand sides distinguished by "guards" - the guard is written on the right following a comma. For example the greatest common divisor function can be written: gcd a b = gcd (a-b) b, _i_f a>b = gcd a (b-a), _i_f a0 _w_h_e_r_e delta = b*b - 4*a*c radix = sqrt delta Where clauses may occur nested, to arbitrary depth, allowing Miranda programs to be organised with a nested block structure. Indentation of inner blocks is compulsory, as layout information is used by the parser. (*) _N_o_t_e: In early versions of Miranda the keyword _i_f was not required. _P_a_t_t_e_r_n_ _m_a_t_c_h_i_n_g It is permitted to define a function by giving several alternative equations, distinguished by the use of different patterns in the formal parameters. This provides another method of doing case analysis which is often more elegant than the use of guards. We here give some simple examples of pattern matching on natural numbers, lists and tuples. Here is (another) definition of the factorial function, and a definition of Ackermann's function: fac 0 = 1 fac (n+1) = (n+1) * fac n ack 0 n = n+1 ack (m+1) 0 = ack m 1 ack (m+1) (n+1) = ack m (ack (m+1) n) Here is a (naive) definition of a function for computing the n'th Fibonacci number: fib 0 = 0 fib 1 = 1 fib (n+2) = fib (n+1) + fib n Here are some simple examples of functions defined by pattern matching on lists: sum [] = 0 sum (a:x) = a + sum x product [] = 1 product (a:x) = a * product x reverse [] = [] reverse (a:x) = reverse x ++ [a] Accessing the elements of a tuple is also done by pattern matching. For example the selection functions on 2-tuples can be defined thus fst (a,b) = a snd (a,b) = b As final examples we give the definitions of two Miranda library functions, take and drop, which return the first n members of a list, and the rest of the list without the first n members, respectively take 0 x = [] take (n+1) [] = [] take (n+1) (a:x) = a : take n x drop 0 x = x drop (n+1) [] = [] drop (n+1) (a:x) = drop n x Notice that the two functions are defined in such a way that that the following identity always holds - "take n x ++ drop n x = x" - including in the pathological case that the length of x is less than n. _C_u_r_r_y_i_n_g_ _a_n_d_ _h_i_g_h_e_r_ _o_r_d_e_r_ _f_u_n_c_t_i_o_n_s Miranda is a fully higher order language - functions are first class citizens and can be both passed as parameters and returned as results. Function application is left associative, so when we write "f x y" it is parsed as "(f x) y", meaning that the result of applying f to x is a function, which is then applied to y. The reader may test out his understanding of higher order functions by working out what is the value of "answer" in the following script: answer = twice twice twice suc 0 twice f x = f (f x) suc x = x + 1 Note that in Miranda every function of two or more arguments is actually a higher order function. This is very useful as it permits partial application. For example "member" is a library function such that "member x a" tests if the list x contains the element a (returning True or False as appropriate). By partially applying member we can derive many useful predicates, such as vowel = member ['a','e','i','o','u'] digit = member ['0','1','2','3','4','5','6','7','8','9'] month = member ["Jan","Feb","Mar","Apr","Jun","Jul","Aug","Sep", "Oct","Nov","Dec"] As another example of higher order programming consider the function foldr, defined foldr op k [] = k foldr op k (a:x) = op a (foldr op k x) All the standard list processing functions can be obtained by partially applying foldr. Examples sum = foldr (+) 0 product = foldr (*) 1 reverse = foldr postfix [] _w_h_e_r_e postfix a x = x ++ [a] _L_i_s_t_ _c_o_m_p_r_e_h_e_n_s_i_o_n_s List comprehensions give a concise syntax for a rather general class of iterations over lists. The syntax is adapted from an analogous notation used in set theory (called "set comprehension"). A simple example of a list comprehension is: [ n*n | n <- [1..100] ] This is a list containing (in order) the squares of all the numbers from 1 to 100. The above expression can be read as "list of all n*n such that n is drawn from the list 1 to 100". Note that "n" is a local variable of the above expression. The variable-binding construct to the right of the bar is called a "generator" - the "<-" sign denotes that the variable introduced on its left ranges over all the elements of the list on its right. The general form of a list comprehension in Miranda is: [ body | qualifiers ] Each qualifier is either a generator, of the form var<-exp, or else a filter, which is a boolean expression used to restrict the ranges of the variables introduced by the generators. When two or more qualifiers are present they are separated by semicolons. An example of a list comprehension with two generators is given by the following definition of a function for returning a list of all the permutations of a given list, perms [] = [[]] perms x = [ a:y | a <- x; y <- perms (x--[a]) ] The use of a filter is shown by the following definition of a function which takes a number and returns a list of all its factors, factors n = [ i | i <- [1..n _d_i_v 2]; n _m_o_d i = 0 ] List comprehensions often allow remarkable conciseness of expression. We give two examples. Here is a Miranda statement of Hoare's "Quicksort" algorithm, as a method of sorting a list, sort [] = [] sort (a:x) = sort [ b | b <- x; b<=a ] ++ [a] ++ sort [ b | b <- x; b>a ] Next is a Miranda solution to the eight queens problem. We have to place eight queens on chess board so that no queen gives check to any other. Since any solution must have exactly one queen in each column, a suitable representation for a board is a list of integers giving the row number of the queen in each successive column. In the following script the function "queens n" returns all safe ways to place queens on the first n columns. A list of all solutions to the eight queens problem is therefore obtained by printing the value of (queens 8) queens 0 = [[]] queens (n+1) = [ q:b | b <- queens n; q <- [0..7]; safe q b ] safe q b = and [ ~checks q b i | i <- [0..#b-1] ] checks q b i = q=b!i \/ abs(q - b!i)=i+1 _L_a_z_y_ _e_v_a_l_u_a_t_i_o_n_ _a_n_d_ _i_n_f_i_n_i_t_e_ _l_i_s_t_s Miranda's evaluation mechanism is "lazy", in the sense that no subexpression is evaluated until its value is known to be required. One consequence of this is that is possible to define functions which are non-strict (meaning that they are capable of returning an answer even if one of their arguments is undefined). For example we can define a conditional function as follows, cond True x y = x cond False x y = y and then use it in such situations as "cond (x=0) 0 (1/x)". The other main consequence of lazy evaluation is that it makes it possible to write down definitions of infinite data structures. Here are some examples of Miranda definitions of infinite lists (note that there is a modified form of the ".." notation for endless arithmetic progressions) ones = 1 : ones repeat a = x _w_h_e_r_e x = a : x nats = [0..] odds = [1,3..] squares = [ n*n | n <- [0..] ] perfects = [ n | n <- [1..]; sum(factors n) = n ] primes = sieve [ 2.. ] _w_h_e_r_e sieve (p:x) = p : sieve [ n | n <- x; n _m_o_d p > 0 ] One interesting application of infinite lists is to act as lookup tables for caching the values of a function. For example our earlier naive definition of "fib" can be improved from exponential to linear complexity by changing the recursion to use a lookup table, thus fib 0 = 1 fib 1 = 1 fib (n+2) = flist!(n+1) + flist!n _w_h_e_r_e flist = map fib [ 0.. ] Another important use of infinite lists is that they enable us to write functional programs representing networks of communicating processes. Consider for example the Hamming numbers problem - we have to print in ascending order all numbers of the form 2^a*3^b*5^c, for a,b,c>=0. There is a nice solution to this problem in terms of communicating processes, which can be expressed in Miranda as follows hamming = 1 : merge (f 2) (merge (f 3) (f 5)) _w_h_e_r_e f a = [ n*a | n <- hamming ] merge (a:x) (b:y) = a : merge x (b:y), _i_f ab = a : merge x y, _o_t_h_e_r_w_i_s_e _P_o_l_y_m_o_r_p_h_i_c_ _s_t_r_o_n_g_ _t_y_p_i_n_g Miranda is strongly typed. That is, every expression and every subexpression has a type, which can be deduced at compile time, and any inconsistency in the type structure of a script results in a compile time error message. We here briefly summarise Miranda's notation for its types. There are three primitive types, called num, bool, and char. The type num comprises integer and floating point numbers (the distinction between integers and floating point numbers is handled at run time - this is not regarded as being a type distinction). There are two values of type bool, called True and False. The type char comprises the Latin-1 character set - character constants are written in single quotes, using C escape conventions, e.g. 'a', '$', '\n' etc. If T is type, then [T] is the type of lists whose elements are of type T. For example [[1,2],[2,3],[4,5]] is of type [[num]], that is list of lists of numbers. String constants are of type [char], in fact a string such as "hello" is simply a shorthand way of writing ['h','e','l','l','o']. If T1 to Tn are types, then (T1, ... ,Tn) is the type of tuples with objects of these types as components. For example (True,"hello",36) is of type (bool,[char],num). If T1 and T2 are types, then T1->T2 is the type of a function with arguments in T1 and results in T2. For example the function sum is of type [num]->num. The function quadsolve, given earlier, is of type num->num->num->[num]. Note that "->" is right associative. Miranda scripts can include type declarations. These are written using "::" to mean is of type. Example sq :: num -> num sq n = n * n The type declaration is not necessary, however. The compiler is always able to deduce the type of an identifier from its defining equation. Miranda scripts often contain type declarations as these are useful for documentation (and they provide an extra check, since the typechecker will complain if the declared type is inconsistent with the inferred one). Types can be polymorphic, in the sense of [Milner 1978]. This is indicated by using the symbols * ** *** etc. as an alphabet of generic type variables. For example, the identity function, defined in the Miranda library as id x = x has the following type id :: * -> * this means that the identity function has many types. Namely all those which can be obtained by substituting an arbitrary type for the generic type variable, eg "num->num", "bool->bool", "(*->**) -> (*->**)" and so on. We illustrate the Miranda type system by giving types for some of the functions so far defined in this article fac :: num -> num ack :: num -> num -> num sum :: [num] -> num month :: [char] -> bool reverse :: [*] -> [*] fst :: (*,**) -> * snd :: (*,**) -> ** foldr :: (*->**->**) -> ** -> [*] -> ** perms :: [*] -> [[*]] _U_s_e_r_ _d_e_f_i_n_e_d_ _t_y_p_e_s The user may introduce new types. This is done by an equation in "::=". For example a type of labelled binary trees (with numeric labels) would be introduced as follows, tree ::= Nilt | Node num tree tree This introduces three new identifiers - "tree" which is the name of the type, and "Nilt" and "Node" which are the constructors for trees - note that constructors must begin with an upper case letter. Nilt is an atomic constructor, while Node takes three arguments, of the types shown. Here is an example of a tree built using these constructors t1 = Node 7 (Node 3 Nilt Nilt) (Node 4 Nilt Nilt) To analyse an object of user defined type, we use pattern matching. For example here is a definition of a function for taking the mirror image of a tree mirror Nilt = Nilt mirror (Node a x y) = Node a (mirror y) (mirror x) User defined types can be polymorphic - this is shown by introducing one or more generic type variables as parameters of the "::=" equation. For example we can generalise the definition of tree to allow arbitrary labels, thus tree * ::= Nilt | Node * (tree *) (tree *) this introduces a family of tree types, including tree num, tree bool, tree (char->char), and so on. The types introduced by "::=" definitions are called "algebraic types". Algebraic types are a very general idea. They include scalar enumeration types, eg color ::= Red | Orange | Yellow | Green | Blue | Indigo | Violet and also give us a way to do union types, for example bool_or_num ::= Left bool | Right num It is interesting to note that all the basic data types of Miranda could be defined from first principles, using "::=" equations. For example here are type definitions for bool, (natural) numbers and lists, bool ::= True | False nat ::= Zero | Suc nat list * ::= Nil | Cons * (list *) Having types such as "num" built in is done for reasons of efficiency - it isn't logically necessary. _N_o_t_e: In versions of Miranda before release two (1989) it was possible to associate "laws" with the constructors of an algebraic type, which are applied whenever an object of the type is built. For details see [Turner 1985, Thompson 1986]. This feature was little used and since has been removed from the language. _T_y_p_e_ _s_y_n_o_n_y_m_s The Miranda programmer can introduce a new name for an already existing type. We use "==" for these definitions, to distinguish them from ordinary value definitions. Examples string == [char] matrix == [[num]] Type synonyms are entirely transparent to the typechecker - it is best to think of them as macros. It is also possible to introduce synonyms for families of types. This is done by using generic type symbols as formal parameters, as in array * == [[*]] so now eg `array num' is the same type as `matrix'. _A_b_s_t_r_a_c_t_ _d_a_t_a_ _t_y_p_e_s In addition to concrete types, introduced by "::=" or "==" equations, Miranda permits the definition of abstract types, whose implementation is "hidden" from the rest of the program. To show how this works we give the standard example of defining stack as an abstract data type (here based on lists): _a_b_s_t_y_p_e stack * _w_i_t_h empty :: stack * isempty :: stack * -> bool push :: * -> stack * -> stack * pop :: stack * -> stack * top :: stack * -> * stack * == [*] empty = [] isempty x = (x=[]) push a x = (a:x) pop (a:x) = x top (a:x) = a We see that the definition of an abstract data type consists of two parts. First a declaration of the form "abstype ... with ...", where the names following the "with" are called the _s_i_g_n_a_t_u_r_e of the abstract data type. These names are the interface between the abstract data type and the rest of the program. Then a set of equations giving bindings for the names introduced in the abstype declaration. These are called the _i_m_p_l_e_m_e_n_t_a_t_i_o_n _e_q_u_a_t_i_o_n_s. The type abstraction is enforced by the typechecker. The mechanism works as follows. When typechecking the implementation equations the abstract type and its representation are treated as being the same type. In the whole of the rest of the script the abstract type and its representation are treated as two separate and completely unrelated types. This is somewhat different from the usual mechanism for implementing abstract data types, but has a number of advantages. It is discussed at somewhat greater length in [Turner 85]. _S_e_p_a_r_a_t_e_ _c_o_m_p_i_l_a_t_i_o_n_ _a_n_d_ _l_i_n_k_i_n_g The basic mechanisms for separate compilation and linking are extremely simple. Any Miranda script can contain one or more directives of the form %include "pathname" where "pathname" is the name of another Miranda script file (which might itself contain include directives, and so on recursively - cycles in the include structure are not permitted however). The visibility of names to an including script is controlled by a directive in the included script, of the form %export names It is permitted to export types as well as values. It is not permitted to export a value to a place where its type is unknown, so if you export an object of a locally defined type, the typename must be exported also. Exporting the name of a "::=" type automatically exports all its constructors. If a script does not contain an export directive, then the default is that all the names (and typenames) it defines will be exported (but not those which it acquired by %include statements). It is also permitted to write a _p_a_r_a_m_e_t_r_i_s_e_d script, in which certain names and/or typenames are declared as "free". An example is that we might wish to write a package for doing matrix algebra without knowing what the type of the matrix elements are going to be. A header for such a package could look like this: %free { element :: type zero, unit :: element mult, add, subtract, divide :: element->element->element } %export matmult determinant eigenvalues eigenvectors ... || here would follow definitions of matmult, determinant, || eigenvalues, etc. in terms of the free identifiers zero, || unit, mult, add, subtract, divide In the using script, the corresponding %include statement must give a set of bindings for the free variables of the included script. For example here is an instantiation of the matrix package sketched above, with real numbers as the chosen element type: %include "matrix_pack" { element == num; zero = 0; unit = 1 mult = *; add = +; subtract = -; divide = / } The three directives %include, %export and %free provide the Miranda programmer with a flexible and type secure mechanism for structuring larger pieces of software from libraries of smaller components. Separate compilation is administered without user intervention. Each file containing a Miranda script is shadowed by an object code file created by the system and object code files are automatically recreated and relinked if they become out of date with respect to any relevant source. (This behaviour is similar to that achieved by the UNIX program "make", except that here the user is not required to write a makefile - the necessary dependency information is inferred from the %include directives in the Miranda source.) _C_u_r_r_e_n_t_ _i_m_p_l_e_m_e_n_t_a_t_i_o_n_ _s_t_a_t_u_s An implementation of Miranda is available for a range of UNIX machines including SUN-4/Sparc, DEC Alpha, MIPS, Apollo, Sequent Symmetry, Sequent Balance, Silicon Graphics, IBM RS/6000, HP9000, PC/Linux. This is an interpretive implementation which works by compiling Miranda scripts to an intermediate code based on combinators. It is currently running at 550 sites (as of August 1996). Licensing information can be obtained from the world wide web at http://miranda.org.uk REFERENCES Milner, R. "A Theory of Type Polymorphism in Programming" Journal of Computer and System Sciences, vol 17, 1978. Thompson, S.J. "Laws in Miranda" Proceedings 4th ACM International Conference on LISP and Functional Programming, Boston Mass, August 1986. Turner, D.A. "Miranda: A non-strict functional language with polymorphic types" Proceedings IFIP International Conference on Functional Programming Languages and Computer Architecture, Nancy France, September 1985 (Springer Lecture Notes in Computer Science, vol 201). [Note - this overview of Miranda first appeared in SIGPLAN Notices, December 1986. It has here been revised slightly to bring it up to date.]