Author: Eiko

Time: 2025-02-22 18:12:07 - 2025-02-22 18:12:07 (UTC)

References: Hasktorch-Tutorial

Hasktorch

Hasktorch is a Haskell library built on libtorch, the same C++ library that powers PyTorch.

Tensors

The tensors spoken of here are not tensors in the mathematical sense; they are just multidimensional arrays (vectors written in coordinates) whose dimension factors as a product of some numbers. For example, a 2×3 matrix is an array of 6 elements and is also a tensor of shape 2×3.

But you can see why it is called a tensor: a linear transform $f \in \mathrm{Hom}(W, V)$, i.e. a mathematical tensor in $V \otimes W^*$, can be viewed as a tensor (array) of shape $\dim V \times \dim W$ once you pin down bases of $V$ and $W$.

For example, there are functions in Torch.TensorFactories that create tensors filled with zeros

zeros' :: [Int] -> Tensor

zeros :: [Int] -> TensorOptions -> Tensor

There are also two cute maps for converting between ordinary Haskell values (or lists) and tensors

asTensor :: (TensorLike a) => a -> Tensor

asValue :: (TensorLike a) => Tensor -> a

For example, you can make a scalar as a zero-dimensional tensor (of shape []) using asTensor (3 :: Float). You can also convert a list of lists to a tensor using asTensor ([[1,2],[3,4]] :: [[Float]]).
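
Putting the two together, here is a minimal sketch of a round trip (the names scalar, matrix, and back are only illustrative):

import Torch

-- round-tripping values through asTensor / asValue
scalar :: Tensor
scalar = asTensor (3 :: Float)                    -- shape [], dtype Float

matrix :: Tensor
matrix = asTensor ([[1,2],[3,4]] :: [[Float]])    -- shape [2,2]

back :: [[Float]]
back = asValue matrix                             -- [[1.0,2.0],[3.0,4.0]]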

Tensor options can be constructed via the following functions

defaultOpts :: TensorOptions

withDType   :: DType  -> TensorOptions -> TensorOptions

withDevice  :: Device -> TensorOptions -> TensorOptions

withLayout  :: Layout -> TensorOptions -> TensorOptions

-- where DType is used to specify data type
data DType = Bool | UInt8 | Int8 | Int16 | Int32 | Int64 | Half | Float | Double | ComplexHalf | ComplexFloat | ComplexDouble | QInt8 | QUInt8 | QInt32 | BFloat16

-- Device is used to specify the device the tensor is stored on
data Device = Device { deviceType :: DeviceType, deviceIndex :: Int }

data DeviceType = CPU | CUDA | MPS
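
As a quick sketch of combining these (assuming the Torch umbrella module re-exports the factory and option functions):

import Torch

-- a 2×3 tensor of zeros, stored as Double on CPU device 0
zerosDouble :: Tensor
zerosDouble = zeros [2,3] (withDType Double (withDevice (Device CPU 0) defaultOpts))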

Operations

  • Tensors are instances of Num, so you can do ring operations on them component-wise (see the sketch after this list).

  • There are component-wise operations like relu :: Tensor -> Tensor, which applies the ReLU function to each component of the tensor. There are many more functions available in Torch.Functional and Torch.Typed.Functional.

  • The function select :: Int -> Int -> Tensor -> Tensor in Torch.Tensor slices the input tensor along the selected dimension at the given index.

    The first parameter specifies the dimension to slice on, counted from 0.

    The second parameter specifies the index to slice at, counted from 0.

    $$\operatorname{select}_{i,j} \colon \mathbb{R}^{d_0 \times d_1 \times \cdots \times d_{n-1}} \to \mathbb{R}^{d_0 \times \cdots \times d_{i-1} \times \{j\} \times d_{i+1} \times \cdots \times d_{n-1}}$$

    $$(x_{i_0,\ldots,i_{n-1}}) \mapsto (x_{i_0,\ldots,i_{i-1},\,j,\,i_{i+1},\ldots,i_{n-1}})$$

    This is actually a projection map at the i-th coordinate, choosing the j-th slice.
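
A minimal sketch combining these operations (the concrete numbers are only illustrative):

import Torch

t :: Tensor
t = asTensor ([[1,-2],[3,-4]] :: [[Float]])

componentwise :: Tensor
componentwise = t * t + t     -- Num instance: ring operations, applied component-wise

rectified :: Tensor
rectified = relu t            -- [[1,0],[3,0]]

firstRow :: Tensor
firstRow = select 0 0 t       -- slice dimension 0 at index 0, i.e. the row [1,-2]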

Random Numbers

import Control.Monad.State
import Torch
import Torch.Internal.Managed.Type.Context (manual_seed_L)

Without specifying an RNG generator, you have impure random number generation

randIO' :: [Int] -> IO Tensor
-- ^ impure, but there is a hack function `manual_seed_L` to set the seed

example = do
  manual_seed_L 12345
  randIO' [2,3]

which gives a tensor of the given shape filled with uniform random numbers in [0, 1).

There is also a pure version that threads a generator explicitly

rand' :: [Int] -> Generator -> (Tensor, Generator)
-- ^ which is essentially
--   [Int] -> State Generator Tensor

example = do
  rng0 <- mkGenerator (Device CPU 0) 12345
  -- ... use rng0, potentially threading it through State ...
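
For instance, it can be wrapped into the State monad directly (a minimal sketch; the helpers randM and twoTensors are only illustrative, assuming the rand' signature above):

randM :: [Int] -> State Generator Tensor
randM dims = state (rand' dims)

twoTensors :: Generator -> (Tensor, Tensor)
twoTensors = evalState $ (,) <$> randM [2,3] <*> randM [3]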

Differentiation

There are two main functions in Torch.Autograd

  • makeIndependent :: Tensor -> IO (IndependentTensor)

    newtype IndependentTensor = IndependentTensor { toDependent :: Tensor }

    this is just a newtype wrapper, but the IO action makeIndependent implicitly marks the corresponding array in libtorch as requiring gradients, so operations applied to it build a compute graph that we can later call grad on. Think of IndependentTensors as free variables you can use to compose functions and take partial derivatives with respect to.

  • The grad function is used to compute the gradient of a (composed) tensor w.r.t. some independent tensors.

    grad
      :: Tensor 
          -- ^ a tensor that requires gradient (requiresGrad = True)
          -- this tensor is a function of the independent tensors
      -> [IndependentTensor]
          -- ^ the "free variables" that the tensor depends on
      -> [Tensor]            
          -- ^ gradient of the tensor w.r.t. each of the free variables
          --   evaluated at the current value of the independent tensors
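
A minimal sketch of how the two fit together (the concrete function is just an example):

import Torch

-- gradient of sum(x^2) with respect to x is 2x
gradExample :: IO ()
gradExample = do
  x <- makeIndependent (asTensor [1, 2, 3 :: Float])
  let xt   = toDependent x
      loss = sumAll (xt * xt)   -- a scalar built from the independent tensor
  print (grad loss [x])         -- one gradient tensor per free variable: here 2*x = [2,4,6]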

Differentiable Programs

class Parameterized f where
  flattenParameters :: f -> [Parameter]
  -- type Parameter = IndependentTensor

  default flattenParameters
    :: (Generic f, Parameterized' (Rep f))
    => f -> [Parameter]

  flattenParameters = flattenParameters' . from
  -- recall that from :: a -> Rep a   is the unit
  --             to   :: Rep a -> a   is the counit

  replaceOwnParameters :: f -> ParamStream f
  -- type ParamStream a = State [Parameter] a

The use of generics allows automatic generation of the flattenParameters function for any type that is an instance of Generic and avoids the need to write boilerplate code for each type.

The class methods and their generic derivation work for tensors, containers of tensors, other types built on top of tensors, and so on.
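
For example, a custom model type can get a Parameterized instance for free via Generic (a minimal sketch; the MLP type and its fields are made up for illustration):

{-# LANGUAGE DeriveGeneric, DeriveAnyClass #-}
import GHC.Generics (Generic)
import Torch

-- a hypothetical two-layer model; its Parameterized instance is derived
-- generically, collecting the parameters of both Linear layers
data MLP = MLP { layer1 :: Linear, layer2 :: Linear }
  deriving (Generic, Show, Parameterized)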

Example: Linear Regression

I rewrote the example to use my own RST monad, which conveniently handles the stateful and reader-like computations.

module Main where

import Control.Monad.RST
import Control.Monad
import Torch

groundTruth :: Tensor -> Tensor
groundTruth t = squeezeAll $ matmul t a + b
  -- the squeezeAll removes redundant dimensions after a contraction of tensors
  where a = asTensor [1,2,3 :: Float]
        b = full' [1] (5 :: Float)

linearModel
  :: Linear -- ^ represents a linear layer; it is an instance of Parameterized
  -> Tensor -> Tensor
linearModel a x = squeezeAll $ linear a x

randnM' :: (Monad m) => [Int] -> RST '[] '[Generator] m Tensor
randnM' dims = do
  gen <- get
  let (t, gen') = randn' dims gen
  put gen'
  return t

runStepM :: (Parameterized model, Optimizer optim)
  => Loss -> RST '[LearningRate] '[model, optim] IO ()
runStepM loss = do
  model <- getsE EZ
  optim <- getsE (ES EZ)
  learn <- queriesE EZ
  (model', optim') <- liftIO $ runStep model optim loss learn
  putsE EZ model'
  putsE (ES EZ) optim'

train :: RST '[LearningRate] '[Linear, GD, Generator, Int] IO ()
train = do
  model <- getsE EZ
  input <- embedRST $ randnM' [5, 3]
  count <- modifyThenGet @Int (+1)
  let loss = mseLoss (groundTruth input) (linearModel model input)
  when (count `mod` 100 == 0) $ liftIO $ putStrLn $ "train Loss:" <> show loss
  embedRST $ runStepM @Linear @GD loss

main :: IO ()
main = do
  initModel <- sample $ LinearSpec { in_features = 3, out_features = 1 }
  randGen <- mkGenerator (Device CPU 0) 99
  let learningRate = 5e-3 :: LearningRate
  (_, model' :* _) <- runRST (replicateM 2000 train) (learningRate :* Nil) (initModel :* GD :* randGen :* 0 :* Nil)
  print model'

Typed Tensor - Using Static Verification