Author: Eiko

Tags: Haskell, Haskell Magic, eta reduction, function, newtype, data, recursion

Haskell 魔法入门 Lecture 3 - 基本函数, newtype, data (continued)

作为’函数式’编程,怎么能少了函数呢?我们将介绍更多基本常用的函数,以及对于基本的函数的操作!

Eta Reduction

addXY :: (Num a) => a -> a -> a
-- addXY x y :: (Num a) => a
addXY x y = (+) x y 

-- addXY x :: (Num a) => a -> a
addXY x = (+) x  
-- y can be removed, as functions of type (a -> a), addXY x and (+) x are equal

addXY = (+) -- a -> a -> a
-- well, by the same spirit, x can also be removed! as functions of type (a -> a -> a)
-- they are equal!

当我们写下定义f x = g x 时,x可以被消除,得到f = g. 这个过程就是Eta reduction. Eta reduction可以帮助我们写下更抽象和简洁的代码。(一些情况下,使用eta-reduction还可以使编译器注意到更多优化,提升代码复用和运行效率)

让我们看看更多例子:

getFirsts :: [(a, b)] -> [a]
getFirsts list = map fst list

-- --> reduction: remove the 'list' at both ends
getFirsts :: [(a, b)] -> [a]
getFirsts = map fst
showLength :: [a] -> String
showLength list = "The length of the list is " ++ show (length list)

showLength :: [a] -> String
showLength = ("The length of the list is " ++) . show . length 

Lambda Expressions (匿名函数)

Lambda abstraction可以视为Eta-reduction的反向操作:Eta-expansion,它会为它后面的表达式“增加”一个变量,这使得你可以在需要的地方使用一个没有名字的函数,通常是作为参数传入别的函数:

(\x -> 2*x) 3 = 2*3 -- lambda expression with one variable

addXY = \x y -> x + y -- lambda expression with two variables

\x1 x2 x3 ... xn -> f x1 x2 x3 .. xn -- you can have arbitrary many variables

Lambda abstraction也可以视为eta-reduction的逆操作:

getFirsts = map fst

-- eta expansion:
getFirsts = \list -> map fst list -- they are equal as functions
-- well, in that case, just write 
-- getFirsts list = map fst list

重头戏:Recursion (重要)

在Haskell中,并没有循环这种东西。对于函数来讲,所有的循环都可以使用递归来实现。在函数式编程中,递归就和吃饭走路一样常见。

Example: 猜数字游戏

import Text.Read (readMaybe)
import Data.Ord 

main :: IO ()
main = do
  putStrLn "Guess the number:"
  guessed <- getLine
  case (compare secretNumber) <$> (readMaybe guessed) of -- monad magic, please ignore for now
    Just LT -> putStrLn "Too large!" >> main    -- recursion happens!
    Just EQ -> putStrLn "You win!"              -- the program ends here, no more recursion
    Just GT -> putStrLn "Too small!" >> main    -- recursion happens!
    Nothing -> putStrLn "Wrong format!" >> main -- recursion happens!
    where secretNumber = 88

-- ignore the monad magics for now owo, we will explain them next class!
-- getLine :: IO String -- feed in user input
-- (>>) :: IO a -> IO b -> IO b -- sequencing events in a monad
-- do notation: a syntax sugar form of a monad binding operator (>>=)

Example: 快速排序

递归一定要有终止条件,除非你本来就打算写无限循环

-- this function pattern matches on the input list
-- if the input is empty, it returns empty
-- if the input is non-empty, recursion happens and it calls itself!
quickSort :: (Ord a) => [a] -> [a]
quickSort [] = [] -- without this line, the recursion will loop forever (in fact, runtime error)
quickSort (x:xs) = quickSort [y | y <- xs, y < x] ++ [x] ++ quickSort [y | y <- xs, y >= x]

无限循环的一个例子:

repeat :: a -> [a]      -- this is a standard function
repeat x = x : repeat x -- infinite list of x's

Tail Recursion

我们来看一个简单的例子,计算1+2+…+100

sum [1..100] :: Int -- you can use list, of course

如果我们不使用sum函数呢?或者如何写出我们自己的sum函数?

mySum :: (Num a) => [a] -> a
mySum [] = 0
mySum (x:xs) = x + mySum xs  -- this works

mySym [1,3,4] = mySum 1:[3,4]
    = 1 + mySum [3,4] -- use the definition, repeatedly
    = 1 + 3 + mySum [4]
    = 1 + 3 + 4 + mySum []
    = 1 + 3 + 4 + 0

但是…

GHCi, version 9.6.3: https://www.haskell.org/ghc/  :? for help
ghci> :{
ghci| mySum [] = 0
ghci| mySum (x:xs) = x + mySum xs
ghci| :}
ghci> mySum [1..10000000]
50000005000000
ghci> mySum [1..100000000]
*** Exception: stack overflow
ghci> sum [1..100000000]
5000000050000000
ghci> 

What happened? why is mySum different from sum? 因为惰性计算,表达式 1+2+3+4+…+100000000+0 并不会立即求值,造成stack overflow.

-- the right way
mySum2 = sum2 0  -- eta reduction happens here (mySum2 list = sum2 0 list)
  where sum2 !acc [] = acc 
        sum2 !acc (x:xs) = sum2 (acc+x) xs 
        -- when sum2 traverse the list, it accumulates all the list elements in acc
        -- ! is the Bang pattern, it forces evaluation, avoids laziness in the variable acc
        -- without the (!), evaluting it on [1..100000000] still give you stack overflow
        -- functions and 'variables' defined in 'where' clause are local
        -- they will not affect other functions.
        -- advanced tip : ! will only force evaluation to weak head normal form (WHNF).
        -- if you need to fully evaluate a complex data structure, you need rdeepseq

效果:

ghci> :{
ghci| mySum2 = sum2 0
ghci|   where sum2 !acc [] = acc
ghci|         sum2 !acc (x:xs) = sum2 (acc+x) xs
ghci| :}
ghci> mySum2 [1..100000000]
5000000050000000

This technique, where a recursion is called directly by the function body itself, is called tail recursion. (the ! is called bang pattern, it triggers eager evaluation, avoids laziness in this variable acc).

Tail recursion 虽然是递归的一种,但是它会被编译器在编译时优化成一个循环。

Example: Fibbonacci numbers in different ways

Naive recursion

fib :: Int -> Integer
fib 1 = 1
fib 2 = 1
fib n = fib (n-1) + fib (n-2) -- the naive version
-- computing time is exponential and this wastes a lot of computation
-- fib 5 = fib 4 + fib 3
-- = (fib 3 + fib 2) + (fib 2 + fib 1) -- wastes the information about fib 3
-- = ((fib 2 + fib 1) + 1) + (1 + 1)
-- = ((1 + 1) + 1) + (1 + 1)
--    ^^^^^^^        ^^^^^^^  redundant computation! should only be computed once

Use Tail recursion

fib2 n = fibT 0 1 1
  where fibT previous current n = current
        fibT previous current m = fibT current (previous + current) (m+1)

Use Infinite List

fibList = 0 : 1 : zipWith (+) fibList (drop 1 fibList) -- infinite list!
-- 0 : 1 : (0 : 1   : a2  : ...)
--         (+   +      +       )
--         (1 : a2  : a3  : ...)
-- ||  ||  ||   ||       
-- 0 : 1 : a2 : a3  : ...

!! :: [a] -> Int -> a -- get the i-th term of a list, O(i) time since linked list
-- this is a standard function, but we kindly provide its definition for you 
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] 
zipWith operator (x:xs) (y:ys) = operator x y : zipWith xs ys
zipWith operator _ _ = []
-- zipWith f [1,2,3] [4,5,6] = [f 1 4, f 2 5, f 3 6]

常用函数

列表操作 (map, fold, concat, filter, zip, zipWith …)

Fold

map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs   -- recursion!

map (*2) [1..5] = [2,4,6,8,10]
-- foldl f z0 [x1,...,xn] = ((((z0 `f` x1) `f` x2) `f` x3) `f`) ... `f` xn
foldl :: (a -> b -> b) -> a -> [b] -> b
foldl f z0 [] = z0
foldl f z0 (x:xs) = foldl f (z0 `f` x) xs

-- foldl' is the strict version of foldl
foldl' :: (a -> b -> b) -> a -> [b] -> b
foldl' f !z0 [] = z0
foldl' f !z0 (x:xs) = foldl' f (z0 `f` x) xs
-- in fact, 
-- sum = foldl' (+) 0
-- product = foldl' (*) 1

-- foldr f z0 [x1,...,xn] = x1 `f` (x2 `f` (x3 `f` (... `f` (xn `f` z0))))
foldr :: (b -> a -> a) -> a -> [b] -> a
foldr f z0 [] = z0
foldr f z0 (x:xs) = foldl f (x `f` z0) xs
-- efficient if f is lazy at right argument

Exercise: what is this function? without computing the code, figure it out its type and what does it do?

foldr (:) []

In practice, people only use foldl’ and foldr. Rarely do you need to use foldl and foldr’, you can think about why this is the case :)

Filter

filter :: (a -> Bool) -> [a] -> [a]
filter condition [] = []
filter condition (x:xs) = case condition x of
    True  -> x:filter condition xs
    False ->   filter condition xs

filterNonZero = filter (/= 0) -- hard exercise: what is the type of this function?

Concat

(++) :: [a] -> [a] -> [a]
[] ++ ys     = ys
(x:xs) ++ ys = x:(xs ++ ys)

concat :: [[a]] -> [a]
concat = foldr (++) [] -- you should not use foldl, because foldr behaves better with laziness
-- foldr is very efficient if the operator is lazy on its right argument

concat [[1],[2,3],[4,5,6],[]] = [1,2,3,4,5,6]

Exercise: Verify that filter can be defined as concat:

filter condition = concat . map (\x -> if condition x then [x] else [])

Newtype的使用与类型安全初探

newtype 是一个类似于data的类型关键字,用于创造新的类型。它可以有任意数量的类型变量,但与data不同的是,newtype只能拥有一个数据类型,这使得它本质上是为这个数据类型创造了一个copy, 这个copy在内存中表达与原类型完全相同,但是在类型系统中编译器将对它们加以区分。

这使得newtype具有如下特点:

  1. newtype是一个0成本抽象,它并不会带来额外的性能开销。在编译后它的表现会和其所copy的类型一致。

  2. newtype可以用来防止相似的类型混淆,增加可读性和类型安全性。

getUserInput :: IO String 
getUserInput = getLine

filterUserInput :: String -> String
filterUserInput string = filterFunction string

useUserInput :: String -> IO ()
useUserInput str = (...)
-- bad design, what are these strings? you might accidentally mix them

getUserInput >>= useUserInput -- compiles without problem, but this is not what we want
-- ignore the monad and functor magics for now. 
-- If you are curious, think (>>=) :: IO a -> (a -> IO b) -> IO b
-- (In fact, (>>=) :: Monad m => m a -> (a -> m b) -> m b , here m = IO)
newtype UserInput = UserInput String
newtype FilteredString = FilteredString String

getUserInput :: IO UserInput
getUserInput = UserInput <$> getLine 

filterUserInput :: UserInput -> FilteredString -- much more clear what it does
filterUserInput (UserInput string) = FilteredString (filterFunction string)

useUserInput :: FilteredString -> IO ()  -- much more clear that you need to do filtering before use
useUserInput (FilteredString str) = (...)
-- this will prevent anyone from passing UserInput directly to useUserInput without filtering

getUserInput >>= useUserInput -- type error : UserInput is not FilteredString 
-- ignore the monad and functor magics (>>=, <$>) for now. 
-- If you are curious, (<$>) :: (a -> b) -> IO a -> IO b
-- in fact, (<$>) :: (Functor f) => (a -> b) -> f a -> f b , here f = IO

当你的data只有一项时,应该使用newtype来替代它。

data Name = Name String -- why not use newtype

newtype Name = Name String

data的枚举用法,和(coproduct)

例子:形状

type Radius = Double 
type Height = Double 
type Width = Double  -- types are just synonyms, they are identified by the compiler
                     -- i.e. Radius = Height = Width = Double as types
                     -- they do not provide extra type safety, but they provide clearity
        
      --| Shape is the type constructor
data Shape = Circle Radius | Rectangle Width Height 
              --| Circle :: Radius -> Shape 
              --| these are data constructors
                              --| Rectangle :: Width -> Height -> Shape
                              --| these are data constructors

Circle 5 :: Shape

Rectangle 3 4 :: Shape

area :: Shape -> Double
area (Circle r) = pi * r**2
area (Rectangle x y) = x * y
-- enumerated data constructors can be easily pattern-mathced

你可以在newtype和data中使用类型变量

例子:Maybe a, 可能丢失的信息

Maybe 是一个Haskell中的标准类型,它用来安全的表达可能不存在的信息,或者可能失败的计算

data Maybe a = Just a | Nothing -- you don't have to define this, this is already defined
-- Just :: a -> Maybe a
-- Nothing :: Maybe a  -- requires no field

safeSqrt :: Double -> Maybe Double
safeSqrt x = if x >= 0 then Just (sqrt x) else Nothing
-- this function avoids runtime errors!

一般来说,你可以使用任意多个类型变量

data InterestingData a b c d = InterestingData a | WhatIsThis b c | Nothing
-- d is a 'phantom' type variables here, they only exists at type level, 
-- no values corresponds to them. This is valid

newtype MyTaggedType a b = CreateMyTaggedType b
-- a is phantom variable. Remember newtype can only have one field on the constructor side,
-- not at the left side of type definition.

例子:列表[a]

事实上,列表[a]是一个data

data [a] = [] | a:[a]
[] :: [a] -- empty list
(:) :: a -> [a] -> [a] -- list constructor

-- 1:2:3:[] = 1:(2:(3:[])) = [1,2,3]

但是[]是已经预定义好的,上面的代码显然用不了。我们可以使用一个自定义的List:

data List a = Nil | Cons a (List a)

Nil :: List a                 -- is the empty list
Cons :: a -> List a -> List a -- list constructor

Cons 1 (Cons 2 (Cons 3 Nil))  -- a list with three term. compare it with 1:2:3:[]

myLength :: List a -> Int
myLength Nil = 0
myLength (Cons _ xs) = 1 + myLength xs  
-- this code is acceptable. but for best performance you should write it in tail recursion

Use Hoogle to find your functions

(在浏览器中演示) hoogle.haskell.org

小结

  1. Eta reduction can be used to simplify definition, allowing for more abstract code

  2. Recursion is the workhorse in functional programming.

  3. Tail recursion is a special form of recursion.

  4. Lambda expression can create functions without giving function a name

  5. functions map, foldl, foldr, foldl’, foldr’, filter, concat ..

  6. newtype can create distinct type copies, while type is used to create synonym.

  7. data can create quite complex data types, allowing sum and products be mixed together.

作业

  1. Exercise: try to eta-reduce the following functions, eliminate their input variable ```Haskell removeOdds :: [Int] -> [Int] removeOdds xs = filter odd xs

    doubleSum :: [Int] -> [Int] doubleSum xs = sum (map (*2) xs)

    apply :: (a -> b) -> a -> b apply f x = f x – you can eliminate them all, not just x ```

  2. Exercise: what is this function? without computing the code, figure it out its type and what does it do? Haskell foldr (:) []

  3. Exercise: Verify that filter can be defined as concat: Haskell filter condition = concat . map (\x -> if condition x then [x] else [])

  4. 为List a 和[a]证明“同构”。即写一个转换List a 到[a]的函数,和一个[a]到List a 的函数。它们之间的(两个)复合应该是id.

  5. 你能写一个函数,用于连接多个可能失败的计算吗?比如计算sqrt(1-sqrt(x))

    sequenceMaybe :: (a -> Maybe b) -> (b -> Maybe c) -> a -> Maybe c
    sequenceMaybe (...?)
    
    myfunction = safeSqrt `sequenceMaybe` (safeSqrt . (1-)) :: Double -> Maybe Double
    
    -- hint: don't forget that you can use (case ... of) expression to pattern match
    -- case safeSqrt x of
    --   Just y  -> (...)
    --   Nothing -> (...)
  6. 考虑如下定义的一个二叉树类型: ```Haskell data Tree a = Nil | Tree a (Tree a) (Tree a) deriving Show

    – for example, you can have exampleTree = Tree 2 (Tree 3 Nil Nil) (Tree 1 Nil Nil) :: Tree Int – 2 – 3 1

    – Task: try to implement the mapTree function : – that maps every element inside the tree by using a given function. mapTree :: (a -> b) -> Tree a -> Tree b mapTree (…?)

    main = print $ mapTree (*2) exampleTree ```

  7. 使用递归,尝试实现归并排序. Haskell mergeSort :: (Ord a) => [a] -> [a] (什么是归并排序?归并排序是这样一种排序算法,它先将输入分成两个部分,对两个部分分别进行归并排序(recursion),然后通过“归并操作”,即不断比较两个数列的头元素,先取其中较小的那个,将两个已经排好的序列在O(n)的时间合并起来。)

    mergeSort [] = []
    mergeSort [x] = [x]
    mergeSort list = merge (mergeSort part1) (mergeSort part2)
        where (part1, part2) = splitEvenly list
              merge (x:xs) (y:ys)
                | x <= y    = x:merge xs (y:ys)
                | otherwise = y:merge (x:xs) ys
              merge [] ys = ys
              merge xs [] = xs
              splitEvenly (x0:x1:xs) = (x0:rest1, x1:rest2)
                where (rest1, rest2) = splitEvenly xs
              splitEvenly [x] = ([x], [])
              splitEvenly [] = ([], [])