2010-06-05

Haskell From Scratch - 2. Starting Somewhere

I said that I need to figure out where to start and I was a bit annoyed by the piecemeal fashion in which the tutorial I was reading was presenting it to me. However, it was a good primer. Having read the first page or two I at least have some idea of the important syntax points in Haskell, which is useful coming from a Perl background because I know what's different. I used Learn You A Haskell. I'll assume you, the reader of this, will have read a page or two of that, but I don't think it's that necessary because you should be able to keep up as we go along.

OK, Go

So this game is going to be entirely based on the call-and-response idea of the old text adventure games. You type something, and it says something back. If you type the right thing you progress and score points; if you type the wrong thing you die or something; and if you type gobbledegook (or perfectly valid English that it wasn't expecting) it tells you that it doesn't understand.

Anyway it should be obvious from this somewhat trite description that the first thing we are going to need to learn to do is to read in stuff and print out stuff.

Haskell's read-in function is called getLine and its print-out function is, remarkably, putStrLn.

Examine the following code, courtesy of tchakkazulu.


main :: IO ()
main = do
  name <- prompt "Hello. Who are you?"
  putStrLn $ "Hello, " ++ name ++ ". Nice to meet you."

prompt :: String -> IO String
prompt line = do
  putStrLn line
  getLine

You should be familiar enough with Haskell just from the first bit of LYAH that you recognise how functions are created and called. That said, the basics of how functions are created and called are not all that abundant in this example. That's because IO is, apparently, the sin bin of Haskell, in which all the impure code gets put. So we've embarked on a journey, that's for sure.

In this example we've created the prompt function. Using a function instead of doing this process in main itself makes more sense because we are likely to do this a fair amount in this program.

The prompt function is defined as taking a String and returning an IO String. Since IO is the sin bin of Haskell it should suffice to say that an IO String is a thing that returns a String when you ask for one.

So the prompt function takes a String, prints it, and then reads a string in. See that we define a function in Haskell by first its name, then whitespace, then what in other languages would be the formal parameter list, whitespace-separated, then =, then the body of the function.

We will talk about what do does later.

Whitespace

Whitespace in Haskell is the comma in any C-like language, or C-inspired language, you care to mention. It separates arguments, or parameters, to functions. It also separates the function name from its list.

f a b = f(a, b)

The rationale is that your most common operator is the shortest, and applying data to functions is the raison d'etre of Haskell.

The other use of whitespace is to line up the blocks. You can use the old brace-and-semicolon syntax if you prefer, in which case Haskell will not whine about whitespace, but if you are going to use whitespace you must line your stuff up. That's because if it's not lined up Haskell isn't prepared to start making guesses about what you meant.

$

The $ is also known as 'apply' and has the effect of applying the things after it to the thing before it.

The $ is an infix function, which means the thing that appears to the left of it is the first argument, and the thing on the right is its second. This is denoted by putting the function name in parentheses when you define it:

($) :: ( a -> b ) -> a -> b

This is the type signature of the function. There is a section on type signatures below, and we'll discuss it there.

You can also use the same syntax in order to use the function as a prefix function:

f $ g = ($) f g

This is sometimes useful, especially when you are experimenting and new to it and don't want to be confused by both the syntax of Haskell and whatever it is you are trying to figure out in the first place.

$ is basically a separator, and changes the order in which the statement is "parsed".

(It doesn't actually affect the parsing so much as it affects what you have actually written in the first place. The parentheses analogy should clear it up.)

Mathematically speaking, you can say:

f g h = f(g, h)

f $ g h = f(g(h))

More complexly:

f g h i j = f(g, h, i, j)

f $ g $ h i $ j = f(g(h, i(j)))

Lined up:

f $ g $ h i $ j

f ( g ( h,i ( j )))

With any luck you can begin to see why it is called 'apply'. It has the effect of treating the first thing on its right as a function to apply as the argument to the function on the left.

Remember that because Haskell is function-oriented, g is a function. The $ is deciding whether the function is being passed as a paramter to f along with h, or whether we are running g with h as a parameter, and passing the result of g as the only parameter to f.

The practical difference is easily expressed in terms of a language that uses parentheses the same way mathematical notation does: It is the difference between


print("Hello, ") . $name . ". Nice to meet you.";

and


print("Hello, " . $name . ". Nice to meet you.");

You can think of the $ as introducing a new set of parentheses, nested in any previous set, that closes at the end of the line. In many cases, the exact outcome of $ will depend on the type signatures of the functions involved, but we get an idea here. Later we will explore type signatures.

If you think of how f $ g h would work, consider how $ is defined. f will be its first argument and g h its second. Thus it fulfils its purpose, which is to separate g h from f; g h is forced to be seen as a function g with the argument h instead of two arguments, g and h to the function f. Later we will explore the way we define functions in Haskell, and delve deeper into the meaning of the type signature above.

It is possible to use parentheses when calling functions in Haskell. Consider that you wanted to express f(g(h), i(j)) in Haskell.

You couldn't use the $ notation to separate g(h) and i(j) because the $ is shorthand for starting parentheses that close at the end of the line. So we can't use $ to wrap them around h

You'd have to use parentheses:

f (g h) (i x) or f (g h) $ i x

Because the $ has the effect of nesting our analagous parenthetical sections, this is the way we have for concatenating them instead.

In fact parentheses instead of $ is valid in Haskell, but you will usually find $ used instead because $ is an actual function.

Type signatures

Let's look at the type signature of prompt. It says that prompt takes a String and returns an IO String.

prompt :: String -> IO String

The double-colon notation is used to specify that we are giving the type signature of prompt, rather than its definition. This is familiar to anyone familiar with a formal OO language, where function signatures are defined first, then their implementations later.

A list of types then follows. Types start with capital letters. This type list maps to the parameter list; the last type is the return type of the function. That means that a function's type is the same as the last type in the list.

(This is not strictly true, however. Later we will see how we can interpret the type list in different ways, and apply the new knowledge to the discussion on $ above.)

In C++ you might write:


int main(int, char**)

This tells us that main returns an integer and accepts an integer and a pointer to a char* (a string array). The arguments are separated by commas (in this case one comma because there are only two arguments), and the return type goes before the name of the function. In this simple snippet we start to see the real difference between C-like languages and function-oriented ones. In C++, this function signature is basically telling us that this function can be treated exactly as though it were an integer, provided we give it another integer and a string array.

In Haskell it is slightly different. Because the function is the fundamental unit, not the object or variable or whatever, the last type in the list is the return type iff we provide values for all the others. If you don't, you end up with a closure. We'll talk about this more under partial parameterisation.

I have not named the parameters here: this signature is only suitable for a function's declaration. In Haskell, this is always how you do it. In C++ you will name these parameters when you define the function; this concept holds for Haskell.

IO Strings are types that interact with the outside world. That is what IO usually means. An IO String is a type that will return a String when it is probed for one.

You can tell the type of a procedure or variable with the :t construct when you are running ghci.


ghci> :t getLine
getLine :: IO String

In C++, the function's parameters and their types both form the signature of the function, and names are given to the variables themselves when you repeat the whole signature when you write the function definition of the function we declared above.


int main(int argc, char** argv) { return 0; }

Here we have the definition of the function, and so the int and char** are given names. These names are necessary if you actually want to use the parameters.

In Haskell, the names of the parameters are given after the function name and before its definition.


prompt :: String -> IO String
prompt line = do
  putStrLn line
  getLine

You can see that these in tandem tell Haskell that line is a String. The compiler also knows that the do block must return an IO String, which is the type of getLine.

The do-block in our definition is basically cheating and turns the function body into a list of things to do, basically like a plain old procedural language. This is pretty much what you need to do when IO is around. The do-block returns the last thing in it, like in Perl. So we can see that the last thing in the prompt function is getLine, which, being in a do-block, is the return value of the function. :t getLine tells us it is an IO String, which is what the function returns, so Haskell is happy.


getLine :: IO String

Type variables

The final thing you should know about Haskell is that the types can be variables too. The type signature of the apply function uses letters to define the types. Where the same letter appears, it is the same type.

The type of $, as we saw above, uses lowercase letters to define the types. These simply mean that any types can be used. Type a can be the same as type b but it is not required: however, it is the case that all instances of a will be the same type, and this type can be fixed as soon as the function is used in context - i.e. with another function. Just like algebra!

This allows functions to be genericised across all types. A similar thing is available to C++: templating. In C++ the template type has the same basic rules as these here in Haskell: that to be a valid type, they must support certain operations. In the apply function, there are no restrictions, and so any types can be used.

This is often very useful when you know that your function is going to be completely generic.

Functions as parameters

Haskell is function-oriented, as we keep saying. That means that everything is a function, so everything you're doing here is defining what functions you can pass around.

It may have struck you that it would be useful to be able to specify that an input function have, itself, a specific signature.

You can do that with parentheses, like this:

stringFromInt :: ( Int -> String ) -> Int -> String

This function will return a String, given an Int and a function. The function it is given must have the type Int -> String and therefore can be considered a mapping function of sorts.

map :: Int -> String

Then we can say stringFromInt map 1 and the stringFromInt function will take the map function and the number 1 and presumably apply one to the other and return the String.

This is a stupid example you will never use but it gets across the point that a function's signature can be specified in the type signature of the function that is accepting it simply by using parentheses.

Partial Parameterisation

Let's revisit the concept of the last item in the type signature being the return type of the function.

Let's say you define a function called add


add :: Int -> Int -> Int
add x y = x + y

As with any function in Haskell, its use can always be replaced with the actual contents of the function.

Your function is a first-class citizen because Haskell is function-oriented. That means that it is itself a variable that can be passed around. In fact, when you call a function in Haskell, it basically replaces the function call with the body of the function. Therefore, you can think of any situation where add is seen as simply x + y.

But of course, the x and the y need to be given values. We should amend what we just said to saying that the use of a function can always be replaced with the parameterised contents of the function. add 1 2 is exactly equivalent to 1 + 2 - which is exactly what the function definition says! add 1 2 = 1 + 2.

In Haskell, the difference is entirely up to the compiler, and so any real difference is an optimisation thing rather than a grammatical thing. For the writer of Haskell code, these two things are completely equivalent.

What happens when we see add 1?

If you're wondering why you would see add with only one parameter, consider that in Haskell, it being function-oriented, it is sensible to define functions in terms of other functions, just like we define objects in terms of other objects in OO languages. Therefore you might define something like this:


addOne :: Int -> Int
addOne x = add 1 x

In other languages you may refer to this concept as a closure. It is a copy of the function where some or all of the variables are given values: all that remains is for the function to actually be run, in this case by being given the remaining parameter.

In this situation the second parameter to add is still given in code, but it is still a variable. But the replacement of add with its contents is still applicable: this function is indistinguishable from 1 + x.

This is an example of partial parameterisation. The add function is in this case replacable with a version of the function with one of its parameters specified.

To follow the logic, note that addOne 1 is completely indistinguishable from add 1 1. This is the mathematician's friend: it's a problem that's already been solved. Since we know that add 1 1 = 1 + 1 and now we know that addOne 1 = add 1 1 we can safely say that addOne 1 = 1 + 1. We have used commutative logic to show this, but Haskell knows it too. As soon as we provide a value for the parameter to addOne, we also provide it to the partially parameterised add, and hence we collapse the whole stack into 1 + 1.

All this helps us to understand the difference it makes to the type signature of the $ function. Briefly, it means that any number of the items at the end of the type signature can be grouped together and considered to be a function of that type signature

Let's look again at the add function.


add :: Int -> Int -> Int
add x y = x + y

The function signature, from what we know so far, says that it accepts two ints and returns a single int. However, what we have just done has shown us that it means another thing! It also means that it accepts one int, and returns a function that takes one int and returns an int.

That means that these two function signatures are equivalent.


add :: Int -> Int -> Int
add :: Int -> ( Int -> Int )

The parentheses define a single return value, and in this case the return value is a function whose signature is Int -> Int. This equivalence holds true for all functions and for all values at the end of the signature list. That is, you can replace any number of the types at the end of the type signature with a single function that has that as its type signature.

If you replace all three, you simply get back the add function in the first place. If you replace two, you get the partial parameterised version. If you replace just one, it doesn't really make sense, but you still get the original add function again.

Now we see why there is no distinction between the return type and the parameter types. Any of the types in the list can be considered the first parameter of a function whose type signature comprises the rest of the type list.

You can play with ghci to see this. If you ask for the type of a function that is partially parameterised you will find that the answer is the rest of the parameter list:


*Main> :t add 1
add 1 :: Int -> Int

You can load the add function by putting it in add.hs; then invoke ghci from the same directory and type :l add

I don't know whether there's a name for this equivalence, but it is interesting, and you should remember that it holds, even if you don't remember why it holds. (Although I do find that remembering why it holds helps to remember that it holds in the first place).

Now let's put the apply function in the light of this new-found knowledge.

Recall the signature of the apply function:

($) :: ( a -> b ) -> a -> b

We have discussed enough now to understand it. First we know that the letters are type variables, meaning that they can hold any type. We also know that the first parameter is a function of signature (a -> b). Third, we know that the function is infix because it is defined with parentheses around its name.

These three things tell us that whenever we see a $, the thing on its left is the first argument and the thing on the right is its second. The thing on its left must be a function because this is Haskell, and its signature defines both a and b in the type signature of $.

The unnamed equivalence principle described above means that the type b (the return type of the first argument to $) can, itself, be a whole function with its own signature: the variable b can encompass this whole type signature and so we can ignore its details and concentrate on just 'b'.

Let's take an example.

f $ g h

If we define the signature of f, we can define the types in the signature of $.

f :: String -> IO

With this definition we can say that

($) :: (String -> IO) -> String -> IO

for this particular use of $

Since the second argument is g h, that means the type of g h must be String; the IO returned will be the IO returned by f.

In order not to be bothered by what type h is, we have to allow for h to be any type. That means we can define the type of g as:

g :: c -> String.

If we then wanted to go back, we could define f and g in terms of $:

($) :: ( a -> b ) -> a -> b

f :: a -> b

g :: c -> a

The astute among us will have noticed that I chose the signature of f anything but arbitrarily. In fact f's signature exactly matches that of putStrLn and g is a loose analogy for (++).

Since (++) takes two arguments, we couldn't make a perfect analogy.

But note that

(++) :: [a] -> [a] -> [a]

and since we know that

($) :: ( a -> b ) -> a -> b

and

putStrLn :: String -> IO

then

putStrLn $ "Hello " ++ name

is

($) :: ( String -> IO ) -> String -> IO

(++) :: [Char] -> [Char] -> [Char]

This is because Haskell knows how to convert between String and [Char]. Strings are just lists of Chars. We've not looked at lists yet, but it should be plain enough that String ++ String concatenates them.

So you can see that Haskell knows what to do when you give it parameters for its functions. Try it in ghci:


Prelude> :t ($) putStrLn
($) putStrLn :: String -> IO

You can easily see that because its first parameter is putStrLn, its second parameter must be a string, and it must return an IO. (I am not currently sure of the significance of the parentheses in this example so I have omitted them for now).

This sort of knowing-what-you-mean is part and parcel of Haskell and later I think we will be doing all sorts of crazy stuff with it. Let's leave it there for now, though.

Brevitometer TM

tl;dr

moar infos

Posts on this page

About Podcats

A blog about things that are interesting. Alastair McGowan-Douglas and his wife Dee often stumble upon things on the internet that other people should know about and write it down so that then they do.

Podcats blogs

Advertisement

See our privacy policy.

Wishlist

If you like my blog and want to help, I would very much appreciate something from my Amazon wishlist!

If you don't like my blog and want to help, I would very much appreciate feedback to al@podcats.in. In fact, if you want to leave feedback of any sort, send it along!