A Quick Tour of Haskell Syntax

If you’re a programmer who wants to parse Haskell for a blog post or wants a cursory overview of the language, this post is for you. It’s going to be somewhat longer than the learnxinyminutes style, as it will go a little more in depth. Without further ado, let’s get started.

Starting with the typical basic types:

-- inline comment
{- block comment -}
7 -- numbers
True -- booleans
'p' -- characters
"The slow brown fox." -- strings
[1, 2, 3] -- lists

Strings are actually just a list of characters. "Hello" is the same as ['H','e','l','l','o']. Moving on to basic functions.

7 + 12 -- 19
3 * 11 -- 33
6 - 2 -- 4
11 / 8 -- 1.375

-- Integer division
11 `div` 8 -- 1

not True -- False
10 == 9 -- False
10 /= 9 -- True
True && False -- False
False || True -- True
4 > 2 -- True
18 <= 13 -- False

Couple of things to notice. not is a function: it takes a boolean value, and negates it. Functions in Haskell do not require parentheses. The general syntax is func arg1 arg2 arg3 .... However, parentheses may be required in some expressions: not (3 < 5), evaluates to False, but not 3 < 5 is invalid, as Haskell will interpret the expression as (not 3) < 5, erroring out when it tries to boolean negate a number.

Also, 11 `div` 8 is the same thing as div 11 8. Backticks around any function call will allow you to use it in infix style, as opposed to prefix. It’s just syntactic sugar for some functions that read better in infix style.

Here are some functions on lists:

1 : [2, 3] -- [1, 2, 3]
'C' : "at" -- "Cat" (Remember strings are lists of characters)
True : [] -- [True]
1 : 2 : 3 : [] == [1, 2, 3] -- True

elem 3 [1, 4, 73, 12] -- False
'H' `elem` "Highgarden" -- True 
[1, 2] ++ [3, 4] -- [1, 2, 3, 4]
"Hello " ++ "World" -- "Hello World" 
head [1, 2, 3] -- 1
tail [1, 2, 3] -- [2, 3]
last [1, 2, 3] -- 3
init [1, 2, 3] -- [1, 2]
"Casterly Rock" !! 3 -- 't'

null [] -- True
null [1, 2, 3] -- False
length "Dorne" -- 5

(1, 2)
("Hello", True, 2)
fst (6, "Six") -- 6
snd (6, "Six") -- "Six" 

Notice how 1 : 2 : 3 : [] is the same as [1, 2, 3]. Lists are just elements : together, with an empty list ([]) at the end, similar to Lisp. (elem1, elem2, elem3, ...) are called tuples. A tuple can contain different types, whereas a list may only contain the same type for all its elements.

Let’s define our own function.

id :: a -> a
id x = x

This function takes a value, and returns the value untouched.

The top line is the type declaration. The name of the function (id), followed by ::, and finally the type signature. It says, “Take a value of type a, which can be anything, and return another value of the same type a”.

The bottom line is the computation. It takes a parameter and returns the parameter untouched. This might be a little confusing so let’s look at another example.

const :: a -> b -> a
const x y = x

This function takes two values, and returns the first one, ignoring the second.

The type declaration says, “Take a value of any type a, take another value of any type b (can be the same or different type as a), and return a value of type a”. The type after the final -> is the return type, and all the types before it are parameter types, also delimited by ->s.

The function takes in two variables, x and y, and just returns x untouched, similar to id above. These functions have been extremely simple, so let’s look at a couple of slightly more challenging functions.

concatenate3 :: String -> String -> String -> String
concatenate3 x y z = x ++ y ++ z

allEqual :: (Eq a) => a -> a -> a
allEqual x y z = x == y && y == z

In concatenate3, the type signature says it takes 3 Strings and returns a String. Notice how String is a specific type, whereas a and b were general. You can mix and match specific and general types in type signatures. The function just concatenates the 3 Strings using ++. Again, we do not need parentheses, as Haskell will interpret the statement as (x ++ y) ++ z, which is valid.

In allEqual, we see: (Eq a) =>. For now, just think of this as restricting the domain of types a to be types that can be compared for equality. Multiple restrictions are also allowed, i.e. (Eq a, Ord a, Num b, ...) =>. The function just compares the parameters to check if they are all equal.

Haskell makes creating anonymous functions simple.

(\x -> x + x)
(\x y -> x + y) 
(\x -> 10 + x) 5 -- 15

Anonymous functions are wrapped by parentheses. They lead with a \, then the arguments delimited by spaces, then a ->, and finally the computation.

In Haskell, functions are first-class, meaning functions can be passed in as arguments to other functions, returned from functions, assigned from variables, and held in data structures, such as lists.

applyAndConcat :: (String -> String) -> (String -> String) -> String -> String
applyAndConcat f g x = (f x) ++ (g x)

applyAndConcat tail init "Rickon" -- "ickonRicko"
-- (tail "Rickon") ++ (init "Rickon")
-- "ickon" ++ "Ricko"
-- "ickonRicko"

applyAndConcat takes in two functions, each of which take a String and return a String, takes in a String, and returns a String. Functions in type signatures are wrapped by parentheses: (String -> String). applyAndConcat applies both functions to the same input String, and returns the concatentation of the results.

(.) :: (b -> c) -> (a -> b) -> (a -> c)
(.) f g = (\x -> f (g x)) 
-- alternatively
f . g = (\x -> f (g x))

The function . is called function composition, similar to the definition you see in math (“fog” notation). It returns a function that takes in an input, passes it to the function g, and then pipes the result of g into f. Looking at the type signature, it makes sense. Given a value of type a, you pass it into function g, which returns a value of type b, and the value of type b gets passed into the function f, which returns a value of type c. Looking at it like it’s a black box, the type of the function composition is (a -> c).

($) :: (a -> b) -> a -> b
f $ x = f x

You often see $ in Haskell code. Looking at the definition, it seems as if $ does nothing. For the most part, $ is used not for computation, but for reducing the amount of parentheses needed. not $ 3 < 5 is the exact same as not (3 < 5). It’s just a matter of style. Also note that $ is not a special operator - it’s just a simple function, just like everything else.

add x y = x + y

addToThree = add 3
addToThree 6 -- 9
-- add 3 6

multiplyThreeNums x y z = x * y * z

times32 = multiplyThreeNums 8 4
times32 10 -- 320
-- multiplyThreeNums 8 4 10

twoNumsTimes10 = multiplyThreeNums 10
twoNumsTimes10 6 2 -- 120
-- multiplyThreeNums 10 6 2

In Haskell, you can partially apply a function. That is, given a function that takes n arguments, you can partially apply k arguments (where k < n), and you’ll end up with a function that takes n-k arguments. Concretely, in the example, we see add, which takes two arguments, and adds them together. addToThree is a partially applied add, where the first argument of add is predefined as 3. Since the first argument is already defined, addToThree only requires one more argument. Same idea for multiplyThreeNums, where you can partially apply 1 or 2 arguments.

Note that I didn’t add a type signature to any of the functions. In Haskell, if you don’t add a type signature, the compiler will automatically derive the type, allowing you the choice of whether or not to document the function.

addFour w x y z = 
  let a = w + x
      b = y + a
  in z + b

addFour w x y z =
   z + b
   a = w + x
   b = y + a

let ... in ... and ... where ... let you build up computation and store them in local variables. Picking between let and where is a matter of preference - they achieve the same thing.

fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

Here we see the typical fibonacci sequence using recursion. It’s no different compared to most other languages; the function calls itself. The fib 0 = 1 and fib 1 = 1 are interesting though. This is a case of “pattern matching”. Haskell goes down the list and tries to find a matching definition. It first checks if n is 0, and if so, returns the value associated with it (fib 0 = 1). If n is not 0, then it goes down the list, and checks if n is 1, and returns the associated value if so (fib 1 = 1). Otherwise, it goes to the catch-all definition, and returns that value (fib n = fib (n - 1) + fib (n - 2)). Pattern matching is extremely useful in Haskell, and you’re bound to see it crop up in many places.

fib n
  | n < 2 = 1
  | otherwise = fib (n - 1) + fib (n - 2)

fib n = 
  case n of
    0 -> 1
    1 -> 1
    _ -> fib (n - 1) + fib (n - 2)

fib n =
  if n < 2
   then 1
   else fib (n - 1) + fib (n - 2)

These are just variations of the same fib function, using different syntax. The first one is called “guard expressions”, and uses | followed by different cases and their values. otherwise is just a catch-all value (in fact, otherwise is the exact same as True). Note that there is no = before a guard expression. case ... of ... is also pattern matching, except that you can also use it in the middle of a computation. if ... then ... else ... is just a regular if statement. Note that every if must have a corresponding else in Haskell.

Let’s define some bread-and-butter Haskell functions.

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = (f x) : (map f xs)

addToThree x = x + 3

map addToThree [1, 8, 17] -- [4, 11, 20]
-- (addToThree 1) : (map addToThree [4, 5])
-- 4 : (addToThree 8) : (map addToThree [5])
-- 4 : 11 : (addToThree 5) : (map addToThree [])
-- 4 : 11 : 20 : []
-- 4 : 11 : [20]
-- 4 : [11, 20]
-- [4, 11, 20]

map takes a function and a list, and applies the function to every element of the list. _ is just ignoring the value of an argument. If we’ve reached the end of the list (the empty list []), there’s no need to bother with the value of the function. _ is not strictly necessary, but it adds to the readability of the pattern. (x:xs) is the pattern matching notation of separating the head of the list (x) and the rest of the list (xs). So we see that in the case of the empty list, when there are no more elements to apply the function to, we return the empty list. Otherwise, we apply the function to the head of the list, and : it onto the result of mapping the tail of the list.

filter :: (a -> Bool) -> [a] -> [a]
filter _ [] = []
filter valid (x:xs) 
  | valid x = x : filter valid xs
  | otherwise = filter valid xs

gtThree x = x > 3

filter gtThree [10, 2, 5] -- [10, 5]

filter is similar to map, except that it takes a function that converts an a into a Bool, and keeps a value in the list if valid x returns True. Otherwise it skips over the element.

There are more list functions, including the big daddy foldr, but researching and implementing them is left as an exercise for the reader ™. Let’s move onto types. There are three ways to create new types.

type Email = String
type Name = String
type BigFunction = (c -> d) -> (b -> c) -> (a -> b) -> (a -> d)

validForms :: Name -> Email -> Bool
-- validForms :: String -> String -> Bool
crazyFunction :: BigFunction -> BigFunction -> Bool
-- crazyFunction :: ((c -> d) -> (b -> c) -> (a -> b) -> (a -> d)) ->
--                  ((c -> d) -> (b -> c) -> (a -> b) -> (a -> d)) ->
--                  Bool

This is called “type synonyms”. type followed by the name of your type (must begin with a capital letter), followed by =, and ended by an already defined type (such as String or a function). Type synonyms do nothing other than provide extra documentation for the type signature, and reduce the amount of typing needed if you have a long type. The type synonyms can be used interchangeably with the types the represent. If you think that’s useless, just remember type String = [Char].

data Bool = True | False
data Maybe a = Nothing | Just a

not :: Bool -> Bool
not True  = False
not False = True

fromMaybe :: a -> Maybe a -> a
fromMaybe x Nothing  = x
fromMaybe _ (Just y) = y

Here we see Algebraic Data Types - we’re creating brand new types. Everything after the data and before the = is going to be part of the type signature (func :: type1 -> type2 -> ... -> returntype). The parts after the = are called value constructors, and will be part of the function computation. The | works like an “or”: a Bool can be either a True or a False. Bool is always part of the type signature, whereas during the computation, you work with True and False. Notice how you can both pattern match on the value constructors and use them during the computation (not True = False).

For Maybe, Maybe String, Maybe Int, and Maybe a are all valid, but just Maybe is invalid. The a after the Maybe type in data Maybe a = ... is called a type parameter. Type parameters allow types to be polymorphic (i.e. Maybe String and Maybe Int are both supported). Any values after the value constructor name (a in Just a) are fields. The value constructors do not need to have the same amount of fields under it. Nothing has zero fields, while Just has one field of type a. Again, you can pattern match on value constructors and their fields.

data Tree a = Empty | Node (Tree a) a (Tree a)

treeMap :: (a -> b) -> Tree a -> Tree b
treeMap _ Empty = Empty
treeMap f (Node l v r) = Node (treeMap f l) (f v) (treeMap f r)

data Tree a = Empty | Node { left :: Tree a, value :: a, right :: Tree a}

-- left  :: Tree a -> Tree a
-- value :: Tree a -> a
-- right :: Tree a -> Tree a 

Algebraic Data Types can be self referencing. Node has three fields, two of which are Tree as. The trees will be built up with Nodes, and Emptys will be at the end of every tree. In treeMap, we check if we’re at the end of a branch (Empty) and return an Empty if so. Otherwise we pattern match on the Node to separate the subtrees and the value, apply the function on the value, apply the function recursively on the subtrees, and join them together with a Node.

The second definition of Tree is the same, except “record syntax” is used. Record syntax is an easy way to access fields. Instead of pattern matching to get the left subtree, you can just call the automatically generated function left on a Node: left tree = leftSubtree. Record syntax is a great convenience that makes it easier to work with value constructors.

newtype State s a = State { runState :: s -> (s, a) }

newtype is used to generate a thin wrapper over another type. It can only have one value contructor, which can only have one field. There are other subtleties, but I’ll skip over them for the sake of brevity newtype is extremely useful when dealing with typeclasses. What is a typeclass?

It’s similar to an interface: a set of behaviors that a type must follow. For example, if a type is part of the Eq typeclass, which deals with equality, it must implement either == or /=:

data Rectangle = Rect Int Int

instance Eq Rectangle where
  (Rect w1 h1) == (Rect w2 h2) = (w1 == w2) && (h1 == h2)

Here we define a new data type called Rect, and make Rect an instance of Eq by comparing the two fields of different Rects. This is quite a simplification, and typeclasses can get a little complex, but the gist is: typeclasses define a set of rules that other types must follow to be an instance of the typeclass. We can use typeclasses in type signatures if specific behaviors are needed. func :: (Eq a) => a -> b. Here, the type for a must be an instance of Eq for the program to compile.

That’s it for basic Haskell. This tutorial probably went a bit fast, so if you don’t quite understand something, you might have to look it up. Good resources include Learn You a Haskell, Real World Haskell, and Hoogle. Hoogle allows you to search using either the function name (map) or its type signature ((a -> b) -> [a] -> [b]). Hopefully now you can skim some Haskell code and understand it on a rudimentary level.

Thanks to /u/dave4420, /u/ingolemo, and /u/azeeem for correcting errors!

Reddit discussion

comments powered by Disqus