Deconstructing Deep Learning + δeviations
Drop me an email
Format :
Date | Title
TL; DR
Using the library functions which we defined till now to run a simple Neural Network.
Now that we defined everything that a simple NN needs we will use the same functions from the DL library in Julia. (FluxML). The rule is to only use things we defined previously. Oh also I found out this really awesome interactive alternative to Jupyter notebook today -> Pluto.jl and I am in love. It auto updates outputs if you change functions and everything. Wow!!!!
Okay let us get to it. First let us import what we need. The main library is Flux. And from it we import the MNIST dataset and also other functionality that we will need and I will relate to what we did so far.
using Flux, Flux.Data.MNIST, Statistics
using Flux: onehotbatch, onecold, crossentropy, throttle
using Base.Iterators: repeated, partition
Let us load the data and perform the train test split. Since the labels are categorical, we one hot encode them as well. And we push them to their respective partitions while we are at it. The other thing which we have not done so far is GPU implementation (only done this for Convs). But well that comes in time.
imgs = MNIST.images()
labels = onehotbatch(MNIST.labels(), 0:9)
train = gpu.([(cat(float.(imgs[i])..., dims = 4), labels[:,i])
for i in partition(1:60_000, 32)]);
tX = cat(float.(MNIST.images(:test)[1:1000])..., dims = 4) |> gpu;
tY = onehotbatch(MNIST.labels(:test)[1:1000], 0:9) |> gpu;
Now to define our stupidly simple architecture. Dense == Linear layer which we defined. Everything else (Conv, maxpool, softmax) we defined before so we are allowed to use.
m = Chain(
Conv((3, 3), 1=>32, relu),
Conv((3, 3), 32=>32, relu),
x -> maxpool(x, (2,2)),
Conv((3, 3), 32=>16, relu),
x -> maxpool(x, (2,2)),
Conv((3, 3), 16=>10, relu),
x -> reshape(x, :, size(x, 4)),
Dense(90, 10), softmax) |> gpu
Let us also define the loss function and accuracy metric while we are at it. The loss function is crossentropy and the accuracy is basically the average of the number of predictions (m(x)) that are equal to the original labels in the test set (onecold(y)).
loss(x, y) = crossentropy(m(x), y)
accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))
Since we only did SGD till now (booo I wanted to use ADAM), let us just use that. We also train this for 10 epochs.
evalcb = throttle(() -> @show(accuracy(tX, tY)), 10)
opt = Descent()
Flux.train!(loss, params(m), train, opt, cb = evalcb)
So we get around 96.7% accuracy in a few epochs. It is MNIST but anyway yay!
Ah I really love the syntax. It is so clean and nice. Guess it is a little hard to get used to though but oh well. Its nice to see we already built so much. (Theres more but baby steps).
I am so enjoying this game. Build everything from scratch and when you are done then you can use the existing libraries. That way you get the best of both worlds. Also not giving up on Python but using it for other stuff to keep me motivated is just nice.