Deconstructing Deep Learning + δeviations
TL;DR
Here we will talk about convolution and how to implement it from scratch.
Before I begin, let me make something clear. I am trying to go from basic to optimized, so I will start with a naive implementation and eventually get to much better code. I could jump straight to the optimized version, but I feel like that defeats the whole point of this exercise. And who knows? Maybe I'll find something fun along the way :)
Okay. Now what is a convolution operation? Simply put, it slides a small grid of weights (a kernel) over the input and takes a weighted sum at every position, which gives us a different, often compressed, representation of the original information. It is the basic building block of deep learning for images and a big part of the reason it works so well. There are many high-level abstractions of the concept, so I want to look at the simplest version of it first and then move on to the more complex parts.
To be honest, I did not find many proper walkthroughs of this. Most of them abstract it away to such an extent that it is almost pointless to even try following along.
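Before we build it step by step, here is the whole operation written as a formula (strictly speaking this sliding weighted sum is cross-correlation, which is what deep learning frameworks call convolution). For a stride of 1:

$$result_{i,j} = \sum_{k=1}^{kernel_h} \sum_{l=1}^{kernel_w} img_{i+k-1,\, j+l-1} \cdot kernel_{k,l}$$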
Note that this is for a 2D image (grayscale, let's say).
```julia
3×3 Array{Int64,2}:
 0  1  0
 0  1  0
 0  1  0
```
Sure. Here is a filter and a smol image. (yes it has random values but we will get to applying it to a real image soon)
```julia
# Kernel/filter
2×2 Array{Int64,2}:
 0  1
 2  3

# Image
3×3 Array{Int64,2}:
 0  1  2
 3  4  5
 6  7  8
```
Now let us do the first bit of the convolution. We line the kernel up with the top-left patch of the image, so we have these two. If you do not get it, read the steps above once more.
```julia
# The kernel
0 1
2 3

# The top-left 2×2 patch of the image
0 1
3 4
```
Now we multiply them element-wise, aka position by position.
```julia
# element-wise products
0*0=0   1*1=1
2*3=6   3*4=12

# which gives
2×2 Array{Int64,2}:
 0   1
 6  12
```
Then we sum them all up and get 0 + 1 + 6 + 12 = 19.
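In Julia this whole step is a one-liner (assuming the `img` and `kernel` variables we define a little further down):

```julia
sum(img[1:2, 1:2] .* kernel)   # element-wise multiply the top-left patch, then sum -> 19
```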
Rinse and repeat for the remaining positions and we get:
```julia
2×2 Array{Float64,2}:
 19.0  25.0
 37.0  43.0
```
Fun right? (LOL)
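If you want to sanity check the whole sweep in one go, a comprehension does the trick (again assuming the `img` and `kernel` defined below):

```julia
[sum(img[i:i+1, j:j+1] .* kernel) for i in 1:2, j in 1:2]
# 2×2 Array{Int64,2}:
#  19  25
#  37  43
```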
Okay let's get to the code because I am getting bored now.
What do we need? We need to read images, load a test image and be able to display it as we go. We can do this using the Images, TestImages and ImageView packages.
```julia
using Images, TestImages, ImageView
img = testimage("mandrill");
```
We will not touch this image right now but we will create a test image and a kernel.
```julia
img = [0 1 2; 3 4 5; 6 7 8]
kernel = [0 1; 2 3]
```
Our Image is :
Our kernel is :
Okay, let us take stride = 1 as the default and give ourselves the option of setting the kernel and choosing a padding mode. "valid" means no padding, and "same" means the returned dimensions should be the same as the image's, i.e. there will be padding.
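For reference, the output size follows the usual formula (this is exactly what the `result = zeros(...)` line further down computes, with the padding terms equal to zero for "valid"):

$$output_h = \left\lfloor \frac{input_h + 2 \cdot pad_h - kernel_h}{stride} \right\rfloor + 1 \qquad output_w = \left\lfloor \frac{input_w + 2 \cdot pad_w - kernel_w}{stride} \right\rfloor + 1$$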
We take the sizes of the image and the kernel and store them away before we do anything else.
```julia
function conv2d(img, kernel, stride = 1, padding = "valid")
    input_h, input_w = size(img)
    kernel_h, kernel_w = size(kernel)
```
To add padding, we use the following formula. The number of rows added on each side (top and bottom) is

$$pad_h = \left\lfloor \frac{kernel_h - 1}{2} \right\rfloor$$

The width is the same, except we use the kernel width:

$$pad_w = \left\lfloor \frac{kernel_w - 1}{2} \right\rfloor$$
Now we create a zero matrix of size

$$(input_h + 2 \cdot pad_h) \times (input_w + 2 \cdot pad_w)$$

Why do this? Because pre-allocating the memory once is faster than growing the array as we go.
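Quick sanity check with hypothetical numbers (a 3×3 kernel rather than the 2×2 one we defined above): pad_h = (3-1) ÷ 2 = 1, so the 3×3 test image becomes a 5×5 zero-bordered matrix, and a stride-1 convolution then returns (5-3) + 1 = 3 rows and columns, i.e. the same size as the input, which is exactly what "same" promises. With our 2×2 kernel, (2-1) ÷ 2 = 0, so "same" adds nothing and the output stays 2×2; that is a known quirk of even-sized kernels with this formula.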
Now we loop over the image and add padding.
```julia
function conv2d(img, kernel, stride = 1, padding = "valid")
    input_h, input_w = size(img)
    kernel_h, kernel_w = size(kernel)

    if padding == "same"
        pad_h = (kernel_h - 1) ÷ 2
        pad_w = (kernel_w - 1) ÷ 2
        # pre-allocate the zero matrix and copy the image into its centre
        img_padded = zeros(input_h + 2 * pad_h, input_w + 2 * pad_w)
        for i in 1:input_h, j in 1:input_w
            img_padded[i + pad_h, j + pad_w] = img[i, j]
        end
        # from here on we work with the padded image
        img = img_padded
        input_h, input_w = size(img)
    elseif padding == "valid"
        # nothing to do, use the image as is
    else
        throw(DomainError(padding, "Invalid padding value"))
    end
```
Our padded image is
Now we will follow steps 7 to 9: slide the kernel over the image, multiply element-wise, and accumulate the sums.
```julia
    # output size: (input - kernel) ÷ stride + 1 in each dimension
    result = zeros((input_h - kernel_h) ÷ stride + 1, (input_w - kernel_w) ÷ stride + 1)
    result_h, result_w = size(result)

    ih, iw = 1, 1          # top-left corner of the current patch
    for i in 1:result_h
        for j in 1:result_w
            for k in 1:kernel_h
                for l in 1:kernel_w
                    result[i, j] += img[ih + k - 1, iw + l - 1] * kernel[k, l]
                end
            end
            iw += stride   # slide the window to the right
        end
        ih += stride       # move down to the next row of patches
        iw = 1
    end
    return result
end
```
```julia
conv2d(img, kernel)
```
We then get :
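Running the function on our 3×3 test image should print back the same 2×2 matrix we computed by hand earlier:

```julia
2×2 Array{Float64,2}:
 19.0  25.0
 37.0  43.0
```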
How did I save the images?
```julia
using Cairo, Gtk

function write_to_png(guidict, filename)
    # grab the canvas from the ImageView GUI dict and dump its Cairo surface to disk
    canvas = guidict["gui"]["canvas"]
    ctx = getgc(canvas)
    Cairo.write_to_png(ctx.surface, filename)
end

tm2 = imshow(conv2d(img, kernel))
write_to_png(tm2, "/home/subhaditya/Desktop/GITHUB/SubhadityaMukherjee.github.io/img/deconstrucImages/temp.png")
```
Putting it all together, here is the full function:

```julia
function conv2d(img, kernel, stride = 1, padding = "valid")
    input_h, input_w = size(img)
    kernel_h, kernel_w = size(kernel)

    if padding == "same"
        pad_h = (kernel_h - 1) ÷ 2
        pad_w = (kernel_w - 1) ÷ 2
        # pre-allocate the zero matrix and copy the image into its centre
        img_padded = zeros(input_h + 2 * pad_h, input_w + 2 * pad_w)
        for i in 1:input_h, j in 1:input_w
            img_padded[i + pad_h, j + pad_w] = img[i, j]
        end
        img = img_padded
        input_h, input_w = size(img)
    elseif padding == "valid"
        # nothing to do
    else
        throw(DomainError(padding, "Invalid padding value"))
    end

    result = zeros((input_h - kernel_h) ÷ stride + 1, (input_w - kernel_w) ÷ stride + 1)
    result_h, result_w = size(result)

    ih, iw = 1, 1
    for i in 1:result_h
        for j in 1:result_w
            for k in 1:kernel_h
                for l in 1:kernel_w
                    result[i, j] += img[ih + k - 1, iw + l - 1] * kernel[k, l]
                end
            end
            iw += stride
        end
        ih += stride
        iw = 1
    end
    return result
end
```
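To see the "same" option in action, here is a quick hypothetical call with a 3×3 kernel of ones (`k3` is just an illustration, it is not used anywhere else in this post). The padded output comes back the same size as the input:

```julia
k3 = [1 1 1; 1 1 1; 1 1 1]       # hypothetical 3×3 kernel of ones
out = conv2d(img, k3, 1, "same") # img is still the 3×3 test matrix here
size(out)                        # (3, 3) -> same size as the input
```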
Cool! Let us do it for a real image. Note that Julia stores images as arrays of colour values, and since our function only works on 2D matrices, we convert the image to grayscale first and then apply the channelview function to get a plain numeric array.
We are using a really cute mandrill for our example.
```julia
img = testimage("mandrill");
kernel_blur = [-2 -1 0; -1 1 1; 0 1 2]   # despite the name, this is an emboss-style kernel
imshow(conv2d(channelview(Gray.(img)), kernel_blur))
```
We get :
Wow. What happened to the poor thing? Notice the weird numbers in the kernel? They are a specific set of values chosen to produce exactly this effect. But why does this matter?
Simple. Right now we are choosing the kernel values by hand. In a network, these values are the weights that get learned, which is how it picks up specific features and lets us do everything computer vision does.
I will probably write another post on filters just because it is fun.