Author: Eiko

Tags: Haskell, Haskell Magic, eta reduction, function, newtype, data, recursion

Haskell Magic Introduction Lecture 3 - Basic Functions, newtype, data (continued)

As a ‘functional’ programming language, how can we not have functions? We will introduce more commonly used basic functions, as well as operations on basic functions!

Eta Reduction

addXY :: (Num a) => a -> a -> a
-- addXY x y :: (Num a) => a
addXY x y = (+) x y 

-- addXY x :: (Num a) => a -> a
addXY x = (+) x  
-- y can be removed, as functions of type (a -> a), addXY x and (+) x are equal

addXY = (+) -- a -> a -> a
-- well, by the same spirit, x can also be removed! as functions of type (a -> a -> a)
-- they are equal!

When we write the definition f x = g x, x can be eliminated, resulting in f = g. This process is called Eta reduction. Eta reduction can help us write more abstract and concise code. (In some cases, using eta reduction can also make the compiler notice more optimizations, improving code reuse and efficiency.)

Let’s look at more examples:

getFirsts :: [(a, b)] -> [a]
getFirsts list = map fst list

-- --> reduction: remove the 'list' at both ends
getFirsts :: [(a, b)] -> [a]
getFirsts = map fst
showLength :: [a] -> String
showLength list = "The length of the list is " ++ show (length list)

showLength :: [a] -> String
showLength = ("The length of the list is " ++) . show . length 

Lambda Expressions

Lambda abstraction can be seen as the reverse operation of Eta-reduction: Eta-expansion, which “adds” a variable to the expression after it, allowing you to use a nameless function where needed, usually as a parameter to another function:

(\x -> 2*x) 3 = 2*3 -- lambda expression with one variable

addXY = \x y -> x + y -- lambda expression with two variables

\x1 x2 x3 ... xn -> f x1 x2 x3 .. xn -- you can have arbitrary many variables

Lambda abstraction can also be seen as the inverse operation of eta-reduction:

getFirsts = map fst

-- eta expansion:
getFirsts = \list -> map fst list -- they are equal as functions
-- well, in that case, just write 
-- getFirsts list = map fst list

Main Event: Recursion (Important)

In Haskell, there is no such thing as loops. For functions, all loops can be implemented using recursion. In functional programming, recursion is as common as eating and walking.

Example: Guessing Game

import Text.Read (readMaybe)
import Data.Ord 

main :: IO ()
main = do
  putStrLn "Guess the number:"
  guessed <- getLine
  case (compare secretNumber) <$> (readMaybe guessed) of -- monad magic, please ignore for now
    Just LT -> putStrLn "Too large!" >> main    -- recursion happens!
    Just EQ -> putStrLn "You win!"              -- the program ends here, no more recursion
    Just GT -> putStrLn "Too small!" >> main    -- recursion happens!
    Nothing -> putStrLn "Wrong format!" >> main -- recursion happens!
    where secretNumber = 88

-- ignore the monad magics for now owo, we will explain them next class!
-- getLine :: IO String -- feed in user input
-- (>>) :: IO a -> IO b -> IO b -- sequencing events in a monad
-- do notation: a syntax sugar form of a monad binding operator (>>=)

Example: Quick Sort

Recursion must have a termination condition, unless you intend to write an infinite loop.

-- this function pattern matches on the input list
-- if the input is empty, it returns empty
-- if the input is non-empty, recursion happens and it calls itself!
quickSort :: (Ord a) => [a] -> [a]
quickSort [] = [] -- without this line, the recursion will loop forever (in fact, runtime error)
quickSort (x:xs) = quickSort [y | y <- xs, y < x] ++ [x] ++ quickSort [y | y <- xs, y >= x]

An example of an infinite loop:

repeat :: a -> [a]      -- this is a standard function
repeat x = x : repeat x -- infinite list of x's

Tail Recursion

Let’s look at a simple example, calculating 1+2+…+100

sum [1..100] :: Int -- you can use list, of course

What if we don’t use the sum function? Or how do we write our own sum function?

mySum :: (Num a) => [a] -> a
mySum [] = 0
mySum (x:xs) = x + mySum xs  -- this works

mySym [1,3,4] = mySum 1:[3,4]
    = 1 + mySum [3,4] -- use the definition, repeatedly
    = 1 + 3 + mySum [4]
    = 1 + 3 + 4 + mySum []
    = 1 + 3 + 4 + 0
    ```
    But...
```Haskell
GHCi, version 9.6.3: https://www.haskell.org/ghc/  :? for help
ghci> :{
ghci| mySum [] = 0
ghci| mySum (x:xs) = x + mySum xs
ghci| :}
ghci> mySum [1..10000000]
50000005000000
ghci> mySum [1..100000000]
*** Exception: stack overflow
ghci> sum [1..100000000]
5000000050000000
ghci> 

What happened? why is mySum different from sum? Because of lazy evaluation, the expression 1+2+3+4+…+100000000+0 is not evaluated immediately, causing a stack overflow.

-- the right way
mySum2 = sum2 0  -- eta reduction happens here (mySum2 list = sum2 0 list)
  where sum2 !acc [] = acc 
        sum2 !acc (x:xs) = sum2 (acc+x) xs 
        -- when sum2 traverses the list, it accumulates all the list elements in acc
        -- ! is the Bang pattern, it forces evaluation, avoids laziness in the variable acc
        -- without the (!), evaluating it on [1..100000000] still gives you a stack overflow
        -- functions and 'variables' defined in 'where' clause are local
        -- they will not affect other functions.
        -- advanced tip : ! will only force evaluation to weak head normal form (WHNF).
        -- if you need to fully evaluate a complex data structure, you need rdeepseq

Effect:

ghci> :{
ghci| mySum2 = sum2 0
ghci|   where sum2 !acc [] = acc
ghci|         sum2 !acc (x:xs) = sum2 (acc+x) xs
ghci| :}
ghci> mySum2 [1..100000000]
5000000050000000

This technique, where a recursion is called directly by the function body itself, is called tail recursion. (the ! is called bang pattern, it triggers eager evaluation, avoids laziness in this variable acc).

Although tail recursion is a type of recursion, it will be optimized by the compiler into a loop during compilation.

Example: Fibonacci numbers in different ways

Naive recursion

fib :: Int -> Integer
fib 1 = 1
fib 2 = 1
fib n = fib (n-1) + fib (n-2) -- the naive version
-- computing time is exponential and this wastes a lot of computation
-- fib 5 = fib 4 + fib 3
-- = (fib 3 + fib 2) + (fib 2 + fib 1) -- wastes the information about fib 3
-- = ((fib 2 + fib 1) + 1) + (1 + 1)
-- = ((1 + 1) + 1) + (1 + 1)
--    ^^^^^^^        ^^^^^^^  redundant computation! should only be computed once

The function fib calculates the Fibonacci number at a given index using naive recursion. The computation time is exponential and results in a lot of wasted computation. For example, when calculating fib 5, the function will compute fib 3 twice, leading to redundant computation.

Use Tail recursion

fib2 n = fibT 0 1 1
  where fibT previous current n = current
        fibT previous current m = fibT current (previous + current) (m+1)

This code uses tail recursion to calculate the Fibonacci sequence. It starts with the first two numbers, 0 and 1, and then uses a recursive function to calculate the next number by adding the previous two numbers. This process continues until the desired number in the sequence is reached.

Use Infinite List

fibList = 0 : 1 : zipWith (+) fibList (drop 1 fibList) -- infinite list!
-- 0 : 1 : (0 : 1   : a2  : ...)
--         (+   +      +       )
--         (1 : a2  : a3  : ...)
-- ||  ||  ||   ||       
-- 0 : 1 : a2 : a3  : ...

!! :: [a] -> Int -> a -- get the i-th term of a list, O(i) time since linked list
-- this is a standard function, but we kindly provide its definition for you 
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] 
zipWith operator (x:xs) (y:ys) = operator x y : zipWith xs ys
zipWith operator _ _ = []
-- zipWith f [1,2,3] [4,5,6] = [f 1 4, f 2 5, f 3 6]

Common Functions

List Operations (map, fold, concat, filter, zip, zipWith …)

Fold

map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs   -- recursion!

map (*2) [1..5] = [2,4,6,8,10]
-- foldl f z0 [x1,...,xn] = ((((z0 `f` x1) `f` x2) `f` x3) `f`) ... `f` xn
foldl :: (a -> b -> b) -> a -> [b] -> b
foldl f z0 [] = z0
foldl f z0 (x:xs) = foldl f (z0 `f` x) xs

-- foldl' is the strict version of foldl
foldl' :: (a -> b -> b) -> a -> [b] -> b
foldl' f !z0 [] = z0
foldl' f !z0 (x:xs) = foldl' f (z0 `f` x) xs
-- in fact, 
-- sum = foldl' (+) 0
-- product = foldl' (*) 1

-- foldr f z0 [x1,...,xn] = x1 `f` (x2 `f` (x3 `f` (... `f` (xn `f` z0))))
foldr :: (b -> a -> a) -> a -> [b] -> a
foldr f z0 [] = z0
foldr f z0 (x:xs) = foldl f (x `f` z0) xs
-- efficient if f is lazy at right argument

Exercise: what is this function? without computing the code, figure it out its type and what does it do?

foldr (:) []

This function takes a list and returns the same list.

Filter

filter :: (a -> Bool) -> [a] -> [a]
filter condition [] = []
filter condition (x:xs) = case condition x of
    True  -> x:filter condition xs
        False ->   filter condition xs

        filterNonZero = filter (/= 0) -- hard exercise: what is the type of this function?

The filter function takes in a condition and a list and returns a new list with only the elements that satisfy the condition. If the list is empty, it returns an empty list. Otherwise, it checks the condition for the first element and adds it to the new list if it satisfies the condition. Then, it recursively calls itself on the rest of the list. The filterNonZero function uses the filter function with the condition of not equal to 0, and its type is (Num a, Eq a) => [a] -> [a].

Concat

(++) :: [a] -> [a] -> [a]
[] ++ ys     = ys
(x:xs) ++ ys = x:(xs ++ ys)

concat :: [[a]] -> [a]
concat = foldr (++) [] -- you should not use foldl, because foldr behaves better with laziness
-- foldr is very efficient if the operator is lazy on its right argument

concat [[1],[2,3],[4,5,6],[]] = [1,2,3,4,5,6]

Exercise: Verify that filter can be defined as concat:

filter condition = concat . map (\x -> if condition x then [x] else [])

The Use of Newtype and an Introduction to Type Safety

Newtype is a type keyword similar to data, used to create new types. It can have any number of type variables, but unlike data, newtype can only have one data type. This essentially creates a copy of the data type, which is represented in memory exactly the same as the original type, but is differentiated by the compiler in the type system.

This gives newtype the following characteristics:

  1. Newtype is a zero-cost abstraction, meaning it does not add any additional performance overhead. After compilation, it behaves the same as the type it copies.

  2. Newtype can be used to prevent confusion between similar types, increasing readability and type safety.

getUserInput :: IO String 
getUserInput = getLine

filterUserInput :: String -> String
filterUserInput string = filterFunction string

useUserInput :: String -> IO ()
useUserInput str = (...)
-- bad design, what are these strings? you might accidentally mix them

getUserInput >>= useUserInput -- compiles without problem, but this is not what we want
-- ignore the monad and functor magics for now. 
-- If you are curious, think (>>=) :: IO a -> (a -> IO b) -> IO b
-- (In fact, (>>=) :: Monad m => m a -> (a -> m b) -> m b , here m = IO)
newtype UserInput = UserInput String
newtype FilteredString = FilteredString String

getUserInput :: IO UserInput
getUserInput = UserInput <$> getLine 

filterUserInput :: UserInput -> FilteredString -- much more clear what it does
filterUserInput (UserInput string) = FilteredString (filterFunction string)

useUserInput :: FilteredString -> IO ()  -- much more clear that you need to do filtering before use
useUserInput (FilteredString str) = (...)
-- this will prevent anyone from passing UserInput directly to useUserInput without filtering

getUserInput >>= useUserInput -- type error : UserInput is not FilteredString 
-- ignore the monad and functor magics (>>=, <$>) for now. 
-- If you are curious, (<$>) :: (a -> b) -> IO a -> IO b
-- in fact, (<$>) :: (Functor f) => (a -> b) -> f a -> f b , here f = IO

When your data only has one item, it is better to use newtype instead.

data Name = Name String -- why not use newtype

newtype Name = Name String

Usage of data Enum, and (coproduct)

Example: Shapes

type Radius = Double 
type Height = Double 
type Width = Double  -- types are just synonyms, they are identified by the compiler
                     -- i.e. Radius = Height = Width = Double as types
                     -- they do not provide extra type safety, but they provide clarity
                             
      --| Shape is the type constructor
data Shape = Circle Radius | Rectangle Width Height 
              --| Circle :: Radius -> Shape 
              --| these are data constructors
                              --| Rectangle :: Width -> Height -> Shape
                              --| these are data constructors

Circle 5 :: Shape

Rectangle 3 4 :: Shape

area :: Shape -> Double
area (Circle r) = pi * r**2
area (Rectangle x y) = x * y
-- enumerated data constructors can be easily pattern-matched

You can use type variables in newtype and data.

Example: Maybe a, Possibly Missing Information

Maybe is a standard type in Haskell that is used to safely express information that may not exist or computations that may fail.

data Maybe a = Just a | Nothing -- you don't have to define this, this is already defined
-- Just :: a -> Maybe a
-- Nothing :: Maybe a  -- requires no field

safeSqrt :: Double -> Maybe Double
safeSqrt x = if x >= 0 then Just (sqrt x) else Nothing
-- this function avoids runtime errors!

In general, you can use any number of type variables.

data InterestingData a b c d = InterestingData a | WhatIsThis b c | Nothing
-- d is a 'phantom' type variables here, they only exists at type level, 
-- no values corresponds to them. This is valid

newtype MyTaggedType a b = CreateMyTaggedType b
-- a is phantom variable. Remember newtype can only have one field on the constructor side,
-- not at the left side of type definition.

Example: List [a]

In fact, the list [a] is a data type.

data [a] = [] | a:[a]
[] :: [a] -- empty list
(:) :: a -> [a] -> [a] -- list constructor

-- 1:2:3:[] = 1:(2:(3:[])) = [1,2,3]

However, [] is already predefined, so the above code is not usable. We can use a custom List instead:

data List a = Nil | Cons a (List a)

Nil :: List a                 -- is the empty list
Cons :: a -> List a -> List a -- list constructor

Cons 1 (Cons 2 (Cons 3 Nil))  -- a list with three terms. Compare it with 1:2:3:[]

myLength :: List a -> Int
myLength Nil = 0
myLength (Cons _ xs) = 1 + myLength xs  
-- this code is acceptable, but for best performance you should write it in tail recursion

Use Hoogle to find your functions

(Demonstration in browser) hoogle.haskell.org

Summary

  1. Eta reduction can be used to simplify definitions, allowing for more abstract code.

  2. Recursion is the workhorse in functional programming.

  3. Tail recursion is a special form of recursion.

  4. Lambda expressions can create functions without giving them a name.

  5. Functions such as map, foldl, foldr, foldl’, foldr’, filter, and concat are commonly used.

  6. The keyword newtype can create distinct type copies, while type is used to create synonyms.

  7. The keyword data can create complex data types, allowing for a mix of sums and products.

Homework

  1. Exercise: try to eta-reduce the following functions, eliminate their input variable ```Haskell removeOdds :: [Int] -> [Int] removeOdds list = filter odd list

    doubleSum :: [Int] -> [Int] doubleSum list = sum $ map (*2) list

    apply :: (a -> b) -> a -> b apply f x = f x – you can eliminate them all, not just x ```

  2. Exercise: what is this function? without computing the code, figure it out its type and what does it do? Haskell foldr (:) []

  3. Exercise: Verify that filter can be defined as concat: Haskell filter condition = concat . map (\x -> if condition x then [x] else [])

  4. Prove that List a and [a] are isomorphic. That is, write a function to convert List a to [a] and a function from [a] to List a. Their composition should be id.

    toList :: [a] -> List a
    
    toHaskellList :: List a -> [a]
  5. Can you write a function to chain multiple possibly failing computations? For example, computing sqrt(1-sqrt(x)).

    sequenceMaybe :: (a -> Maybe b) -> (b -> Maybe c) -> a -> Maybe c
    sequenceMaybe f g x = case f x of
      Just y  -> g y
      Nothing -> Nothing
    
    safeSqrt :: Double -> Maybe Double
    safeSqrt x
      | x >= 0    = Just (sqrt x)
      | otherwise = Nothing
  6. Consider the following definition of a binary tree type: ```Haskell data Tree a = Nil | Tree a (Tree a) (Tree a) deriving Show

    – for example, you can have exampleTree = Tree 2 (Tree 3 Nil Nil) (Tree 1 Nil Nil) :: Tree Int – 2 – 3 1

    – Task: try to implement the mapTree function : – that maps every element inside the tree by using a given function. mapTree :: (a -> b) -> Tree a -> Tree b mapTree = undefined

    main = print $ mapTree (*2) exampleTree ```

  7. Use recursion to implement merge sort.

    mergeSort :: (Ord a) => [a] -> [a]
    mergeSort [] = []
    mergeSort [x] = [x]
    mergeSort list = merge (mergeSort part1) (mergeSort part2)
        where (part1, part2) = splitEvenly list
              merge (x:xs) (y:ys)
                | x <= y    = x:merge xs (y:ys)
                | otherwise = y:merge (x:xs) ys
              merge [] ys = ys
              merge xs [] = xs
              splitEvenly (x0:x1:xs) = (x0:rest1, x1:rest2)
                where (rest1, rest2) = splitEvenly xs
              splitEvenly [x] = ([x], [])
              splitEvenly [] = ([], [])