So far we have explored neural networks almost in a vacuum, building everything ourselves and illustrating the ideas along the way. Relying on an existing framework would allow us to benefit from the work of previous contributors. One such framework is Hasktorch. One practical reason to use Hasktorch is that it builds on the mature Torch tensor library. Another good reason is strong GPU acceleration, which is necessary for almost any serious deep learning project. Finally, using standard interfaces rather than reinventing the wheel helps to reduce boilerplate.

Fun fact: one of the Hasktorch contributors is Adam Paszke, the original author of PyTorch.

**Today's post is also based on**

- Day 2: What Do Hidden Layers Do?
- Day 4: The Importance Of Batch Normalization
- Day 5: Convolutional Neural Networks Tutorial

The source code from this post is available on GitHub.

## The Basics

The easiest way to start with Hasktorch is via Docker:

```
docker run --gpus all -it --rm -p 8888:8888 \
  -v $(pwd):/home/ubuntu/data \
  htorch/hasktorch-jupyter:latest-cu11
```

Now, you may open `localhost:8888` in your browser to access the JupyterLab notebooks. Note that you need to select the `Haskell` kernel when creating a new notebook.

If you have never used the Torch library before, you may also want to review this tutorial.

## MNIST Example

Let's take the familiar MNIST example and see how it can be implemented in Hasktorch.

### Imports

```
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception.Safe
  ( SomeException (..),
    try,
  )
import Control.Monad ( forM_, when, (<=<) )
import Control.Monad.Cont ( ContT (..) )
import GHC.Generics
import Pipes hiding ( (~>) )
import qualified Pipes.Prelude as P
import Torch
import Torch.Serialize
import Torch.Typed.Vision ( initMnist )
import qualified Torch.Vision as V
import Prelude hiding ( exp )
```

The most notable import is the `Torch` module itself. There are also related helpers such as `Torch.Vision` to handle image data. The function `initMnist` has the type

`initMnist :: String -> IO (MnistData, MnistData)`

It loads the MNIST train and test datasets, similar to `loadMNIST` from the previous posts.
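
As a rough sketch, and assuming the raw MNIST files have already been downloaded into a `data` directory, loading the data and wrapping the test set into a batched dataset (as the full program below does with `V.MNIST`) looks like this:

```
-- A minimal sketch: load MNIST and fetch the first test image with its label
loadExample :: IO ()
loadExample = do
  (trainData, testData) <- initMnist "data"
  let testMnist = V.MNIST {batchSize = 1, mnistData = testData}
  (_img, label) <- getItem testMnist 0
  print label
```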

It is also worth paying attention to the `Pipes` module. It is an alternative to the previously used `Streamly`, which also allows building streaming components.

We also import functions from `Control.Monad`, which are useful for IO operations.

Finally, we hide the Prelude `exp` function in favor of Torch's `exp`, which operates on tensors (arrays)^{1} rather than floating-point scalars:

`Torch.exp :: Tensor -> Tensor`
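
As a quick illustration (a sketch; `asTensor` builds a tensor from a Haskell list):

```
-- With the Prelude exp hidden, exp below is Torch's elementwise exponential
expExample :: IO ()
expExample = do
  let t = asTensor ([0.0, 1.0, 2.0] :: [Float])
  print (exp t) -- roughly [1.0, 2.72, 7.39]
```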

### Defining Neural Network Architecture

First we define a neural network data structure that contains trained parameters (neural network weights). In the simplest case, it can be a multilayer perceptron (MLP).

```
data MLP = MLP
  { fc1 :: Linear,
    fc2 :: Linear,
    fc3 :: Linear
  }
  deriving (Generic, Show, Parameterized)
```

This MLP contains three linear layers. Next, we may define a data structure that specifies the number of neurons in each layer:

```
data MLPSpec = MLPSpec
  { i :: Int,
    h1 :: Int,
    h2 :: Int,
    o :: Int
  }
  deriving (Show, Eq)
```
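
For MNIST we will use 784 input features (28 × 28 pixels), two hidden layers, and 10 output classes; for example:

```
-- 784 inputs -> 64 hidden -> 32 hidden -> 10 output classes
let spec = MLPSpec { i = 784, h1 = 64, h2 = 32, o = 10 }
```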

Now, we can define the neural network as a function, similar to what we did on Day 5, using a "reversed" composition operator `(~>)`.

```
(~>) :: (a -> b) -> (b -> c) -> a -> c
f ~> g = g . f

mlp :: MLP -> Tensor -> Tensor
mlp MLP {..} =
  -- Layer 1
  linear fc1
    ~> relu
    -- Layer 2
    ~> linear fc2
    ~> relu
    -- Layer 3
    ~> linear fc3
    ~> logSoftmax (Dim 1)
```

We finish with a (log) softmax layer applied along the tensor's dimension 1 (`Dim 1`). The derivatives of `linear`, `relu`, and `logSoftmax` are already handled by the Torch library.
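
For example, the log-probabilities can be turned back into predicted digit classes with the same `exp`/`argmax` combination used by the `displayImages` helper later in this post (a sketch; `net` is an `MLP` whose initialization is shown below):

```
-- Sketch: pick the most likely class for each row of a [batchSize, 784] input
predict :: MLP -> Tensor -> Tensor
predict net images = argmax (Dim 1) RemoveDim (exp (mlp net images))
```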

### Initial Weights

How do we generate initial random weights? As you may remember from Day 5, we could create a function such as this one:

```
randNetwork = do
  let [i, h1, h2, o] = [784, 64, 32, 10]
  fc1 <- randLinear (Sz2 i h1)
  fc2 <- randLinear (Sz2 h1 h2)
  fc3 <- randLinear (Sz2 h2 o)
  return $
    MLP { fc1 = fc1
        , fc2 = fc2
        , fc3 = fc3
        }
```

In our example we do almost the same, except we benefit from applicative functors and `Randomizable`.

```
instance Randomizable MLPSpec MLP where
  sample MLPSpec {..} =
    MLP
      <$> sample (LinearSpec i h1)
      <*> sample (LinearSpec h1 h2)
      <*> sample (LinearSpec h2 o)
```

This declares `MLP` as an instance of the `Randomizable` typeclass, parametrized by `MLPSpec`. All we needed to define this instance was to implement the `sample` function. To generate the initial MLP weights, we can later simply write

```
let spec = MLPSpec 784 64 32 10
net <- sample spec
```

### Train Loop

The core of neural network training is `trainLoop`, which performs a single training "epoch". Let us first inspect its type signature.

`trainLoop :: Optimizer o => MLP -> o -> ListT IO (Tensor, Tensor) -> IO MLP`

This signifies that the function accepts an initial neural network configuration, an optimizer, and a dataset. The optimizer can be gradient descent (GD), Adam, or another optimizer. The result is a new MLP configuration, wrapped in IO. IO is necessary, for instance, if we want to print the loss after each iteration. Now, let's take a look at the implementation:

`trainLoop model optimizer = P.foldM step begin done . enumerateData`

First, we enumerate the dataset with `enumerateData`. Then, we iterate over (fold) the batches. The `step` function corresponds to a single step of the gradient descent algorithm:

```
  where
    step :: MLP -> ((Tensor, Tensor), Int) -> IO MLP
    step model ((input, label), iter) = do
      let loss = nllLoss' label $ mlp model input
      -- Print loss every 50 batches
      when (iter `mod` 50 == 0) $ do
        putStrLn $ "Iteration: " ++ show iter ++ " | Loss: " ++ show loss
      (newParam, _) <- runStep model optimizer loss 1e-3
      return newParam
```

We calculate the negative log likelihood loss `nllLoss'` between the ground truth label and the output of our MLP. Note that `model` is the parameter, i.e. the weights of the MLP network. Then, we use the iteration number `iter` to print the loss every 50 iterations. Finally, we perform a gradient descent step using our optimizer via

`runStep :: ... => model -> optimizer -> Loss -> LearningRate -> IO (model, optimizer)`

and keep only the new model `newParam`. The learning rate here is `1e-3`, but it can be changed if needed.
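
Since `runStep` is polymorphic in the optimizer, swapping it out is a one-line change. A sketch, assuming `mkAdam` from `Torch.Optim` with purely illustrative hyperparameters:

```
-- Either optimizer can be passed to trainLoop / runStep
let gd   = GD                                         -- stateless gradient descent
    adam = mkAdam 0 0.9 0.999 (flattenParameters net) -- Adam keeps moment estimates
```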

The `done` function is the (in this case trivial) finalization of the `foldM` iterations over the MLP model, and `begin` holds the initial weights (we use `pure` to satisfy the `m x` type requirement).

```
done = pure
begin = pure model
```

### Putting It All Together

The remaining part is simple. We load the data into batches, specify the number of neurons in our MLP, choose an optimizer, and initialize the random weights.

```
main = do
  (trainData, testData) <- initMnist "data"
  let trainMnist = V.MNIST {batchSize = 256, mnistData = trainData}
      testMnist = V.MNIST {batchSize = 1, mnistData = testData}
      spec = MLPSpec 784 64 32 10
      optimizer = GD
  net <- sample spec
```

Then, we train the network for 5 epochs:

```
net' <- foldLoop net 5 $ \model _ ->
  runContT (streamFromMap (datasetOpts 2) trainMnist) $ trainLoop model optimizer . fst
```
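
Here `foldLoop` (re-exported by `Torch`) simply threads a value, in our case the model, through a fixed number of IO iterations. Roughly, it amounts to the following sketch:

```
-- foldM comes from Control.Monad; the iteration index runs from 1 to count
foldLoop :: a -> Int -> (a -> Int -> IO a) -> IO a
foldLoop x count block = foldM block x [1 .. count]
```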

Finally, we may examine the model on test images:

`forM_ [0 .. 10] $ displayImages net' <=< getItem testMnist`

For this purpose we may use a function such as:

```
displayImages :: MLP -> (Tensor, Tensor) -> IO ()
displayImages model (testImg, testLabel) = do
  V.dispImage testImg
  putStrLn $ "Model : " ++ (show . argmax (Dim 1) RemoveDim . exp $ mlp model testImg)
  putStrLn $ "Ground Truth : " ++ show testLabel
```

### Running

Running the program prints the training loss every 50 batches for each of the five epochs, then renders a few test digits together with the model's predictions and the ground truth labels:
```
Iteration: 0 | Loss: Tensor Float [] 12.3775
Iteration: 50 | Loss: Tensor Float [] 1.0952
Iteration: 100 | Loss: Tensor Float [] 0.5626
Iteration: 150 | Loss: Tensor Float [] 0.6660
Iteration: 200 | Loss: Tensor Float [] 0.4771
Iteration: 0 | Loss: Tensor Float [] 0.5012
Iteration: 50 | Loss: Tensor Float [] 0.4058
Iteration: 100 | Loss: Tensor Float [] 0.3095
Iteration: 150 | Loss: Tensor Float [] 0.4237
Iteration: 200 | Loss: Tensor Float [] 0.3433
Iteration: 0 | Loss: Tensor Float [] 0.3671
Iteration: 50 | Loss: Tensor Float [] 0.3206
Iteration: 100 | Loss: Tensor Float [] 0.2467
Iteration: 150 | Loss: Tensor Float [] 0.3420
Iteration: 200 | Loss: Tensor Float [] 0.2737
Iteration: 0 | Loss: Tensor Float [] 0.3054
Iteration: 50 | Loss: Tensor Float [] 0.2779
Iteration: 100 | Loss: Tensor Float [] 0.2161
Iteration: 150 | Loss: Tensor Float [] 0.2933
Iteration: 200 | Loss: Tensor Float [] 0.2289
Iteration: 0 | Loss: Tensor Float [] 0.2693
Iteration: 50 | Loss: Tensor Float [] 0.2530
Iteration: 100 | Loss: Tensor Float [] 0.1979
Iteration: 150 | Loss: Tensor Float [] 0.2616
Iteration: 200 | Loss: Tensor Float [] 0.1986
[ASCII renderings of the eleven test digits, drawn by dispImage, are omitted here; predictions vs. ground truth follow]
Model : Tensor Int64 [1] [ 7]
Ground Truth : Tensor Int64 [1] [ 7]
Model : Tensor Int64 [1] [ 2]
Ground Truth : Tensor Int64 [1] [ 2]
Model : Tensor Int64 [1] [ 1]
Ground Truth : Tensor Int64 [1] [ 1]
Model : Tensor Int64 [1] [ 0]
Ground Truth : Tensor Int64 [1] [ 0]
Model : Tensor Int64 [1] [ 4]
Ground Truth : Tensor Int64 [1] [ 4]
Model : Tensor Int64 [1] [ 1]
Ground Truth : Tensor Int64 [1] [ 1]
Model : Tensor Int64 [1] [ 4]
Ground Truth : Tensor Int64 [1] [ 4]
Model : Tensor Int64 [1] [ 9]
Ground Truth : Tensor Int64 [1] [ 9]
Model : Tensor Int64 [1] [ 6]
Ground Truth : Tensor Int64 [1] [ 5]
Model : Tensor Int64 [1] [ 9]
Ground Truth : Tensor Int64 [1] [ 9]
Model : Tensor Int64 [1] [ 0]
Ground Truth : Tensor Int64 [1] [ 0]
```

See the complete project on GitHub. For suggestions about the content, feel free to open a new issue.

## Summary

Today we have learned the basics of the Hasktorch library. Most importantly, the principles from our previous days still apply, so the transition to the new library was quite straightforward. With a few minor changes, this example could also be run on a GPU accelerator.
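
For instance, a tensor can be placed on a GPU with `toDevice`; a minimal sketch, assuming a CUDA device is available (moving the whole model is not shown here):

```
-- Sketch: create a tensor and move it to the first CUDA device
let gpu = Device { deviceType = CUDA, deviceIndex = 0 }
    t   = toDevice gpu (asTensor ([1.0, 2.0, 3.0] :: [Float]))
```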

## Further Reading

Hasktorch:

Docker containers: