Author: Eiko

Tags: Haskell, Haskell Magic, type, class, type variable, polymorphism, patterm matching, currying, associativity

Haskell魔法入门 Lecture 2 - 基本类型, Class, 类型变量 和 Polymorphism

基本类型

Haskell作为一个静态的强类型语言,有如下一些最基本的类型。使用它们我们可以构造复杂的程序和类型。

单位

最简单的类型是如下称为”单位”的类型

() :: ()

这里右边的()是一个类型,而左边的()是类型()中的一个值。(你可以理解为一个只有一个元素的集合\(\{*\}\), 虽然type通常不作集合理解)这个类型()只有()这一个取值,不能取其他任何东西。它通常用来代表没有任何信息量的返回值,或者一个占位符。

main :: IO () -- main returns a '()' value

数字

-- Int, Integer, Double, Float -- for numbers

123 :: Int

a :: Int -- 64 bit integer, 'long' -- in fact, not necessarily 64 bit
a = 5

b :: Integer -- integer with no length limit
b = 19387128905274021394721394728930147208913472839104723190847219034 

c :: Double -- we also have Float, for single precision 
c = 1.399e-2 -- 0.01399

import Data.Ratio
22 % 7 :: Rational -- rational numbers

一元函数

有基础类型之后,我们可以通过箭头(->)来构造新的类型,我们在它左边和右边分别放一个类型,就会得到一个类型a->b, 这便是从类型a到类型b的一元函数的类型

f :: Int -> Int -- the type of a function from Int to Int
f = (+1)        -- this is the definition of the function f

f 5 :: Int      -- once you apply an Int with (Int -> Int), you get an Int
-- f 5 = 6

isEven :: Int -> Bool
isEven n = mod n 2 == 0

--isEven 6 = True
--isEven 7 = True

类型变量使得你可以用一个定义处理大量的不同类型

id :: a -> a  -- identity function, it does nothing, always return the same value
id x = x      -- 'a' is a type variable here, it can take any type, no matter how complex 
              -- in category theory, this is the canonical element in End(1_C), 
              -- 1_C is the identity natural transform

二元函数,运算符

你可以有二元,或者定义有更多输入变量的函数。它们的类型是 a -> b -> c. 这种写法事实上是Currying, 将一个二元函数看成一个把第一个变量映射到一个一元函数的函数。

(没错!二元函数可以看成一个取值为一元函数的一元函数!owo)

比如数学上有一个二元函数\(f(x, y):X\times Y\to Z\), 当你固定\(x\)时,\(y\mapsto f(x,y)\)就会成为一个\(Y\)上的一元函数。 \[f(x,) : Y\to Z\] 于是\(f\)就是一个将\(X\)映射到\(Y\to Z\)的函数 \[f : X \to (Y \to Z).\] 在Haskell中,这样的二元函数就用类型a -> b -> c来表达,它与a -> (b -> c)是等价的,这就是说,(->)是右结合的。

每当我们写下a -> b -> c时,我们实际的意思是a -> (b -> c). 因为(->)是右结合的,这个括号可以忽略。

f :: a -> b -> c -- this is the type of a general two-variable function

addXY :: Int -> Int -> Int  
addXY x y = x + y

-- addXY 5 6 = 11

常用的运算符(\(), (.) ```Haskell -- function application, it takes a function, a value, and apply it (\)) :: (a -> b) -> a -> b

f $ x = f x – this function has lowest precedence, it can be used as a ‘bracket’ print x + y – type error, you cannot add print x to y print $ x + y – correct, equivalent to print (x + y), + have higher precedence

– composition of two functions, f . g means first apply g, then apply f – compare the composition in category theory: they are identical – : Hom(B, C) x Hom(A, B) -> Hom(A, C) (.) :: (b -> c) -> (a -> b) -> (a -> c) (f . g) x = f (g x)


### 两种二元函数的表达的转换:currying和uncurrying
事实上,你当然可以定义基于笛卡尔乘积的二元函数
```Haskell
addXY' :: (Num a) => (a, a) -> a
addXY' (x, y) = x + y

addXY :: (Num a) => a -> a -> a
addXY x y = x + y

如之前所说的,这两者是同构的,我们有一种标准的方式将两者转换。

curry :: ((a, b) -> c) -> a -> b -> c -- this bracket cannot be removed
curry fc xa yb = fc (xa, yb)

uncurry :: (a -> b -> c) -> (a, b) -> c -- this bracket cannot be removed
uncurry f2 (xa, yb) = f2 xa yb

curry . uncurry = id -- these two identity means the two types 
uncurry . curry = id --(a -> b -> c) and (a, b) -> c are isomorphic

二元函数的运算符形式

任何一个二元函数f :: a->b->c都可以写成运算符形式`f`. 运算符形式是指一种将第一个参数放在左边,第二个参数放在右边的表达形式,类似于我们常见的各种二元运算符.

-- a two-variable function can be partially applied to get a one variable function
addXY 5 :: Int -> Int 
-- (addXY 5) 6 = addXY 5 6 = 11

-- any two-variable function can be written as an operator by using the operator syntax
5 `addXY` 6 :: Int -- the same as addXY 5 6

-- Note the following two partial applications are applied to different variables.
(`addXY` 6) :: Int -> Int  -- (`addXY` 6) 5 = 5 `addXY` 6 = 11
(5 `addXY`) :: Int -> Int  -- (5 `addXY`) 6 = 5 `addXY` 6 = 11

二元运算符的函数形式

二元运算符比如+, 除了作为运算符一样使用,你还可以把它当成二元函数使用。注意数学上运算符本身和二元函数没有区别,它只是函数的一种特殊写法.

要将运算符作为函数使用,只需将它用括号包围。

5 + 2 = 7

(+) 5 2 = 7

f $ x = f x
($) f x = f x

使用符号定义自己的运算符

除了把二元函数写成运算符形式,你还可以定义自己的运算符(但是不能和Haskell保留的符号重合)

(<:) :: Int -> Int -> Int
a <: b = a*a + b*2

-- (optional) you can use infixr, infixl to declare your symbol's default 
-- associativity and priority
infixl 5 <: -- this means <: is left associative, has priority 5

5 <: 6 :: Int -- computes 37

左结合与右结合

当我们写下一连串的相同的运算符时,如果不加括号并且该运算符并不一定满足结合律,可能会产生歧义.这时为了有一个明确的运算顺序,我们就需要区分左结合和右结合

infixl 5 :> -- declare left associative
a :> b :> c :> d = ((a :> b) :> c) :> d

infixr 5 :> -- declare right associative
a :> b :> c :> d = a :> (b :> (c :> d))

如果一个运算符满足结合律(比如(.)), 那么它是左结合还是右结合的就无关紧要(除非在计算的表现上有不同)。因为无论你怎么添加或者移除括号,得到的值是相同的。

一个有用的例子:定义右作用运算符

所谓右作用,是将函数作用在右边。在代数学中,左作用和右作用之间乘法的顺序是相反的,因此还需要定义一个反向的合成(右合成). 这不是标准函数,但是你可以自己定义,尤其是你喜欢右作用的话owo

( # ) :: a -> (a -> b) -> b -- right action
x # f = f x

(.>) :: (a -> b) -> (b -> c) -> a -> c -- right composition
f .> g = g . f
-- equivalently,
(.>) = flip (.)
-- if you prefer
(f .> g) x = g (f x)

-- compare the left composition (.) :: (b -> c) -> (a -> b) -> a -> c
-- (f . g) x = f (g x)

x # f # g = x # (f .> g) = x # (g . f) = g (f x)

infixl 9 # -- left associative: x # f # g = (x # f) # g
-- x # f # g = x # (f .> g)

-- example: f computes (x^2+1)^3
f x = x # (^2) # (+1) # (^3)
-- eta-reduction:
f = (^2) .> (+1) .> (^3)

-- compare the left action way of writing:
-- f $ x = f x
-- infixr 0 $ -- $ is right associative, f $ g $ x is viewed as f $ (g $ x)
-- f $ (g $ x) = (f . g) $ x
f x = (^3) $ (+1) $ (^2) $ x
-- eta-reduction:
f = (^3) . (+1) . (^2)

注意唯一的一条语法规则:函数作用永远优先于运算符。

calculate x y `op` calculate z t -- always = (calculate x y) `op` (calculate z t)

f1 . f2 f3 -- will always be understood as f1 . (f2 f3)

(+1) . (*2) 3 -- type error, you cannot compose (+1) with (*2) 3, which is a number 6
(+1) $ (*2) 3 -- 7
(+1) . (*2) $ 3 -- 7

布尔值

x :: Bool
x = True

not :: Bool -> Bool
not True = False
not False = True  -- Pattern matching 

(||) :: Bool -> Bool -> Bool -- the 'or' operator
(&&) :: Bool -> Bool -> Bool -- the 'and' operator

列表

-- [], the list type constructor.
-- [a] is the type for the list whose elements are all of type a
-- [] is a linked list, it is the default lazy data structure in haskell
as :: [Int]
as = [1,2,3,4,5]
-- alternatively, it can be written as
as = [1..5] -- in fact, this notation will work for any type that implements Enum class

-- you can have infinite lists
infl :: [Integer]
infl = [1..]

bs :: [Double] -- note that a list can only contain elements of the same type
bs = [5.5, 0.0, 1.333]

More about list:

-- things you should know about lists
[] :: [a] -- the empty list
-- a is a type variable here, it can represent any type

(:) :: a -> [a] -> [a] -- the list constructor, 
-- it takes two arguments, an element of type a and a list of the same type
-- examples:
2:[] = [2]
3:[1] = [3, 1]
[1]:[2] --type error
1:2 -- type error
1:['a'] -- type error

(:)可以用来Pattern match, 解构一个列表

-- (:) is right associative (:是右结合的运算符,意思是默认先算右边而不是从左往右计算)
[1,2,3,4,5] = 1:2:3:4:5:[] = 1:(2:(3:(4:(5:[]))))

-- (:) can be used to destruct a list in pattern matching
head :: [a] -> a -- obtain the first element of a list
head (x:xs) = x

head [1,2,3] = 1
head [] -- run time error


tail :: [a] -> [a]
tail (x:xs) = xs
tail [] = [] -- without this line, tail [] will give run-time error


tail [1,2,3] = [2,3]

字符和字符串

-- Char, String

x :: Char -- 单个字符,表示一个Unicode字符
x = 'o' :: Char -- single quote instead of double quote

type String = [Char]  -- 类型别名,String只是[Char]的别名
xs :: String -- 字符串是字符的列表
xs = ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!']
-- alternatively
xs = "hello world!" -- they are equivalent. Use double quote for string

z :: Char
z = "!" -- type error, String (=[Char]) is not Char

Tuples (笛卡尔乘积, categorical product)

-- When you have two types a, b, you can construct a type (a, b) 
t :: (Int, Char)
t = (5, 'a')

-- to get the first element, use fst
fst :: (a, b) -> a
fst (x, y) = x -- this is the definition of this function

fst t = 5

-- to get the second element, use snd
snd :: (a, b) -> b
snd (x, y) = y

snd t = 'a'
-- You can have as many items as you want

y :: (Int, Char, Double, Integer, Float, String)
y = (0, 'b', 9.99, 123, 8e-2, "oh no")

简单的自定义类型(数据结构)

为了不要一上来就把大家搞晕,我们先介绍最简单的自定义数据类型,它与上面出现的tuples(笛卡尔乘积)是完全等价的。

data MyType = MyTypeConstructor Type1 Type2 ... TypeN
-- this is basically the same as (Type1, Type2, ..., TypeN)
-- MyType is the type name here, or the type constructor
-- MyTypeConstructor is the data constructor
-- it is a function that gives you a value of type MyType
-- when you filled in its parameters
-- MyTypeConstructor :: Type1 -> Type2 -> ... -> MyType
-- ***data constructor can have the same name as type constructor*** (which is a common practice)

举个例子:我可以定义一个用来记录名字和id的类型

     --| this is the type, or type constructor
     --|
data NameId = NameId String Int
              --|
              --| this is the data constructor, they are allowed to have the same name

    -- the data constructor is a function of the type String -> Int -> NameId

alice = NameId "Alice" 123 :: NameId
-- NameId is used to both represent the type and the data constructor.
-- alice is now a value (also viewed as a function with zero arguments) of type NameId

-- we can also use the data constructor to destruct and pattern match a value
printName :: NameId -> IO ()
printName (NameId name int) = putStrLn name
-- if a record is not used, you can use _ to place hold it without reference it
-- printName (NameId name _) = putStrLn name    -- works

putStrLn :: String -> IO () -- function in the standard library, prints a line with given string

Record Syntax

data NameId = NameId { name :: String, userId :: Int } -- equivalent to the above example
-- but now you will get two extra functions generated for you
name :: NameId -> String
userId :: NameId -> Int


alice = NameId "Alice" 123 -- fine
bob = NameId { name = "Bob", userId = 124 } -- Record Syntax gives you another way to write it

-- of course you can also write them yourself (if not using Record Syntax)
-- name (NameId str int) = str
-- userId (NameId str int) = int

printName = putStrLn . name

用data定义和类型(coproduct)

data 除了可以定义笛卡尔乘积之外,还可以用来定义数据类型的’和’,最简单的例子是枚举类型:

data UserId = PrivateId Int | GroupId Int --
-- UserId is the type
-- PrivateId :: Int -> UserId is the data constructor
-- GroupId   :: Int -> UserId is another data constructor

data Day = Mon | Tue | Wed | Thu | Fri | Sat | Sun deriving Show
-- Mon :: Day
-- ... this type Day have 7 data constructors

dayToInt :: Dat -> Int
dayToInt Mon = 1
dayToInt Tue = 2
dayToInt Wed = 3
dayToInt Thu = 4
dayToInt Fri = 5
daytoint Sat = 6
daytoint Sun = 7
-- actually, you can derive Bounded and Enum for this class, you do not have to write this

Pattern Match的三种方式

data MagicalMaterial = Potion String | Dust deriving Show

-- Guards
printMaterial :: MagicalMaterial -> IO ()
printMaterial input 
    | Potion s <- input   = putStrLn $ "this is a potion named " ++ s
    | Dust     <- input   = putStrLn $ "this is dust"

-- Pattern-mathch style function definition
printMaterial :: MagicalMaterial -> IO ()
printMaterial (Potion s) = putStrLn $ "this is a potion named " ++ s
printMaterial Dust       = putStrLn $ "this is dust"

-- Case expression
printMaterial :: MagicalMaterial -> IO ()
printMaterial input = case input of
    Potion s -> putStrLn $ "this is a potion named " ++ s
    Dust     -> putStrLn $ "this is dust"

Guards 通常被用来处理条件判断

rejectIfDivisibleBy2Or3 :: Int -> String
rejectIfDivisibleBy2Or3 x
    | x `mod` 2 == 0 = "reject, it is even"
    | x `mod` 3 == 0 = "reject, it is a multiple of 3"
    | otherwise      = "accepted: " ++ show x
    -- otherwise = True

你需要对上述操作非常熟悉,因为Pattern Matching是最基本的操作,大家基本上不会使用if (偶尔使用)

if (condition of type Bool) then (expression of type a) else (expression of type a)
-- the whole expression have type a

Class

Haskell中的class和面向对象中的class有所不同。如果你了解rust,class就是traits. 它描述了一些类型的共有性质,从而可以通过这些共有性质出发来定义这些类型的统一行为,编写统一的函数。

Num, Integral class

让我们来看第一个例子,Num class. 如果我们使用(+)运算符,我们会发现,它可以对Int和Double使用

5 + 3 = 8 :: Int

1.1 + 8.8 = 9.9 :: Double

那么(+)的类型应该是什么呢?难道是a -> a -> a?但是这是不可能的,因为并不是所有类型都可以实现加法。事实上,如果你检查(+)的类型签名,你应该可以得到(可以在ghci中通过 :t (+) 来检查一个函数的类型)

(+) :: (Num a) => a -> a -> a   -- a function with type variable types is called polymorphic function

5 :: (Num a) => a  -- in fact, the number '5' is a polymorphic function

这里类型签名::后面多了一个(Num a), 还有一个双箭头=>. 这个(Num a)就是class constraint, 它限制类型变量a只能在一类“数字类型”中取值,而不能取任何类型。=>前面的部分叫做Context. 一个类型a要成为Num中的一元,它需要实现以下几个函数:

class (Num a) where
  (+) :: a -> a -> a          -- 加法
  negate :: a -> a            -- 加法逆
  (*) :: a -> a -> a          -- 乘法,不需要实现乘法逆。因此Num更加类似于一个‘环’。
  fromInteger :: Integer -> a -- 将整数嵌入到a类中
  abs :: a -> a               -- 绝对值
  signum :: a -> a            -- 符号函数

除了需要定义符号函数和绝对值,剩下的四条基本上在说a是一个环。(当然,如果你定义的加法不交换或者乘法不结合之类那就不是环了(那样是危险的)) 这上面的六行实际上是class Num的定义,只要实现了上述函数,你自定义的类型也可以成为Num class的一元并且可以使用所有对Num class可用的函数,比如整数次方指数函数

(^) :: (Num a, Integral b) => a -> b -> a  -- computes x^n, with the (*) defined by you.
x^0 = 1             -- Integral is another class which contains things like Int, Integer.
a^n = a * a^(n-1)   -- defined by induction (well, recursion)

比如,当我定义了一个矩阵数据结构,将矩阵实现为了Num class,那么我就可以使用这个(^)函数而无需对矩阵再写一次。

data Matrix = ... some definition

instance Num Matrix where -- syntax for writing a class instance
  mat1 + mat2 = (... some definition)
  negate mat  = (... some definition)
  mat1 * mat2 = (... some definition)
  (...)

mat^5 -- will work

sum [mat1, mat2, mat3] -- will work
-- sum :: (Num a) => [a] -> a

product [mat1, mat2, mat3] -- will work
-- product :: (Num a) => [a] -> a

Ord, Eq class

如果要使用(>), (<), (==), (>=), (/=)等比较运算符,你的类型首先需要是有序关系的,或者是可以比较的。这就是说,它需要成为Ord或者Eq class的元素(实例,instance).

(<) :: (Ord a) => a -> a -> Bool
(>) :: (Ord a) => a -> a -> Bool
(>=) :: (Ord a) => a -> a -> Bool
(<=) :: (Ord a) => a -> a -> Bool

compare :: (Ord a) => a -> a -> Ordering  
-- Ordering is a type with only three possible values, LT, EQ, GT
-- compare 5 6 = LT
-- compare 1 1 = EQ
-- compare 7 2 = GT

(==) :: (Eq a) => a -> a -> Bool  -- test whether two values of the same type are equal
(/=) :: (Eq a) => a -> a -> Bool  -- test whether two values of the same type are not equal

为了实现Eq class,你只需要定义(==)和(/=)函数其中的一个。另一个会自动根据你定义的这一个生成!

为了实现Ord class, 你的type首先需要实现Eq class,然后你只需要定义compare或者(<=)其中的任何一个。

事实上,大部分由基本类型定义的自定义类型都可以自动导出Ord, Eq class,你只需要加一句

data NameId = NameId String Int deriving (Eq, Ord)

Show, Read class

Show, Read class是实现了值如何转变成字符串,以及如何从字符串变回值的函数的类型。

class Show a where
    show :: a -> String

class Read a where
    (...a bit complicated...)

read :: (Read a) => String -> a
readMaybe :: (Read a) => String -> Maybe a -- we will talk about this in later classes
                                           -- import Text.Read(readMaybe)   to use it

我们常见的基本类型都已经实现了Show和Read class:

-- show 123 = "123"
-- show (1,'a') = "(1,'a')"
-- show "hello" = "\"hello\""

display = putStrLn . show

display (1,'a',[5,6]) -- will print "(1, 'a', [5, 6])"
-- in fact, the standard 'print' function works similarly.

print :: (Show a) => a -> IO ()

-- ideally, we should have   read . show = id
-- show . read however is typically a bit far from id, and can fail 
-- since read may throw runtime error
-- so we usually use readMaybe to avoid them 

data MyType = MyType String deriving Show 
-- you can let compiler derive Show, Read instances for you

print (MyType "hello") -- will print MyType "hello" on your screen


read "5" :: Int -- equals 5

Enum, Bounded class

Enum class描述了可以用整数来枚举类型中的元素的类型

class Enum a where
    toEnum :: Int -> a
    fromEnum :: a -> Int

-- once Enum class is implemented, you can use the following standard functions
enumFrom :: (Enum a) => a -> [a]                 -- equivalent to [x..]
enumFromThen :: (Enum a) => a -> a -> [a]        -- equivalent to [x0,x1..]
enumFromThenTo :: (Enum a) => a -> a -> a -> [a] -- equivalent to [x0,x1..xN]
enumFromTo :: (Enum a) => a -> a -> [a]          -- equivalent to [a..b]
-- We don't really use these functions, we always use its syntax sugar form
-- [1..] = [1,2,3,...]
-- [1,3..100] = [1,3,5,7,...,99]
-- ['a'..'z'] = ['a', 'b', .., 'z']
-- Int, Char, these types are standard instances of Enum

Bounded class描述了取值范围有界的一类类型。

class Bounded a where
    minBound :: a
    maxBound :: a

instance Bounded Char  -- these types are instances of Bounded
instance Bounded Bool
instance Bounded Int

-- minBound :: Bool = False
-- maxBound :: Bool = True

-- minBound :: Int = -9223372036854775808
-- maxBound :: Int = 9223372036854775807

-- minBound :: Char = '\NUL'
-- maxBound :: Char = '\1114111'

小结

  1. 基本类型(单位,数字,Bool), 列表和tuple, 最简单的一些函数

  2. 一元和多元函数, (->), currying

  3. Pattern matching, 解构列表, tuple, data constructor

  4. 用data定义简单的积类型和和类型

  5. class的概念,常见class: Num, Integral, Eq, Ord, Show, Read, Enum, Bounded.

试一试

  1. 在ghci中实验这些函数!

  2. 定义一个复数Complex类型

data Complex = Complex { real :: Double, imaginary :: Double } deriving Show

为它实现Num class. (注意复数的乘法不是分量乘法)

  1. 定义Person数据类型,包含姓名,年龄等信息。为它手动实现Show和Eq class

  2. 尝试定义函数

myMap :: (a -> b) -> [a] -> [b]
-- hint: use pattern match on list (i.e. (x:xs) or []) to destruct a list
--       you will need to use 'recursion', i.e. call the function itself in its definition

下集预告

基本函数,data的更多用法,newtype和类型安全初探。