If you are in a hurry:

Say you have two arrays of numbers: \(I\) is the image and \(g\) is what we call the convolution kernel. They might look like^{1}

\[I=
\left( \begin{array}{ccc}
255 & 7 & 3 \\
212 & 240 & 4 \\
218 & 216 & 230
\end{array}\right)
\]
and
\[g=
\left( \begin{array}{cc}
-1 & 1
\end{array}\right).
\]
We define their convolution as^{2}
\[
I' = \sum_{u,v}{I(x-u, y-v)\; g(u,v)}.
\]

It means that you overlay at each position \((x, y)\) of \(I\) a mirror image of \(g\) looking *backwards*, so that its bottom right element is over the position of \(I\) you are considering; then you multiply overlapping numbers and put the sum of the results in the position \((x, y)\) of \(I'\). In the above example,
\[
\left( \begin{array}{cccc}
1\cdot 0 & \mathbf{-1\cdot 255} & 7 & 3 \\
& 212 & 240 & 4 \\
& 218 & 216 & 230
\end{array}\right)
\rightarrow
\left( \begin{array}{ccc}
\mathbf{-255} & 7 & 3 \\
212 & 240 & 4 \\
218 & 216 & 230
\end{array}\right),
\]
then
\[
\left( \begin{array}{cccc}
& 1\cdot 255 & \mathbf{-1 \cdot 7} & 3 \\
& 212 & 240 & 4 \\
& 218 & 216 & 230
\end{array}\right)
\rightarrow
\left( \begin{array}{ccc}
-255 & \mathbf{248} & 3 \\
212 & 240 & 4 \\
218 & 216 & 230
\end{array}\right),
\]
\[
\left( \begin{array}{cccc}
& 255 & 1 \cdot 7 & \mathbf{-1 \cdot 3} \\
& 212 & 240 & 4 \\
& 218 & 216 & 230
\end{array}\right)
\rightarrow
\left( \begin{array}{ccc}
-255 & 248 & \mathbf{4} \\
212 & 240 & 4 \\
218 & 216 & 230
\end{array}\right),
\]
and then on the next row,
\[
\left( \begin{array}{cccc}
& 255 & 1 \cdot 7 & 3 \\
1\cdot 0 & \mathbf{-1 \cdot 212} & 240 & 4 \\
& 218 & 216 & 230
\end{array}\right)
\rightarrow
\left( \begin{array}{ccc}
-255 & 248 & 4 \\
\mathbf{-212} & 240 & 4 \\
218 & 216 & 230
\end{array}\right).
\]
Note that we assume that the non-existing left neighbors of the first column are zero.

If \(g\) was two-dimensional, like \[g= \left( \begin{array}{cc} -1 & 1 \\ 2 & 3 \end{array}\right), \] we would mirror it in the two dimensions before overlaying, \[ \left( \begin{array}{cccc} 3 \cdot 0 & 2 \cdot 0 & & \\ 1\cdot 0 & \mathbf{-1\cdot 255} & 7 & 3 \\ & 212 & 240 & 4 \\ & 218 & 216 & 230 \end{array}\right). \]

There are three main packages you want to have around in Python for this kind of task:

- PIL, the Python Imaging Library.
- Numpy, numeric Python for efficient array manipulation. It is part of
- SciPy, scientific tools for Python.

If you are working in OS-X you probably only have Numpy around. The easiest way to install them all (and then some) is to download and install the wonderful Sage package. Once you have it you’ll be able to run a Python interpreter with all the scientific tools available by typing

sage -python

in your terminal. In OS-X `sage`

will most likely end up as `/Applications/sage/sage`

, so you might need to modify your PATH.

For 2D convolutions you want the `convolve`

function in the `scipy.signal`

package, as in^{3}:

from scipy import signal as sg print sg.convolve([[255, 7, 3], [212, 240, 4], [218, 216, 230]], [[1, -1]], "valid") # gives [[-248 -4] [ 28 -236] [ -2 14]]

The “valid” last argument is telling `convolve`

not to pad the image with zeros in order to be able to compute a value for each pixel, but to limit itself to the pixels where all valid neighbors are present.

This functions will take care of reading and writing files in a sensible way:

from PIL import Image import numpy as np def np_from_img(fname): return np.asarray(Image.open(fname), dtype=np.float32) def save_as_img(ar, fname): Image.fromarray(ar.round().astype(np.uint8)).save(fname) def norm(ar): return 255.*np.absolute(ar)/np.max(ar)

We make sure that the image we read is stored as an array of floats, and then we normalize and round before saving back as 8-bit integers.

This is all you need to start playing.

A convolution with a matrix \((1 -1)\) will find vertical edges, and a convolution with \[ \left( \begin{array}{c} 1 \\ -1 \end{array}\right) \] will find horizontal edges. Starting, for example, with

after doing

save_as_img(norm(sg.convolve(img, [[1.], [-1.]])), 'img/portal-h.png') save_as_img(norm(sg.convolve(img, [[1., -1.]])), 'img/portal-v.png')

we extract the horizontal and vertical edges,

Having the horizontal and the vertical edges we can easily combine them, for example by computing the length of the vector they would form on any given point, as in: \[ E = \sqrt{I_h^2 + I_v^2}. \] Doing this in Python is a bit tricky, because convolution has changed the size of the images. We need to be careful about how we combine them. One way to do it is to first define a function that takes two arrays and chops them off as required, so that they end up having the same size:

def common_size(a1, a2): """Chop-off the first rows and cols from the two numpy arrays a1 and a2 so that they end up having the same size. >>> print common_size(np.array([[0, 0], ... [1, 2], ... [3, 4]]), ... np.array([[0, 5, 6], ... [0, 7, 8]])) (array([[1, 2], [3, 4]]), array([[5, 6], [7, 8]])) """ (r1, c1) = a1.shape (r2, c2) = a2.shape return (a1[r1-r2 if r1>r2 else 0:, c1-c2 if c1>c2 else 0:], a2[r2-r1 if r2>r1 else 0::, c2-c1 if c2>c1 else 0:])

And now we are ready to write the gradient function:

def gradient(im): imv, imh = common_size(sg.convolve(im, [[1., -1.]]), sg.convolve(im, [[1.], [-1.]])) return np.sqrt(np.power(imv, 2)+np.power(imh, 2))

Applying it to the image above we get

Which is, I think, kind of neat for such a small amount of code.

My first formal introduction to convolutions was in 1998, when I took an “Introduction to the Fourier tranform and its applications” class from Stanford (EE261). Every week the nice people at Stanford sent to Barcelona a VHS tape with a recording of the week’s lessons, and my friend Gonzalo and I watched it. At the end they sent us the exam in an envelope, and we took it. It was not cheap, but HP paid for it. It was eye-opening; very seldom had I been exposed to teaching of the quality of what was coming in the videotapes.

It turns out that what looks like the same class is now available for free online, and I believe also in iTunes University. If you have any interest at all in the subject matter you should take it. The book that we followed, Bracewell’s The Fourier Tranform and its Applications^{4}, is a jewell and, from what I see, priced as such and out of print. I also own Two Dimensional Imaging, by the same author, but I didn’t enjoy it nearly as much. I haven’t read Fourier Analysis and Imaging, also by Bracewell, but from the references I’ve found online looks like one I’ll want to read. Another excellent book with very clear explanations is Peter Kraniauskas’ Transforms in Signals and Systems. It is probably a symptom of how old I am that this one is also out of print, and second-hand copies are rather expensive.

There are plenty of good resources available online for free, so you probably don’t need to buy anything. The one book I have found useful is the Python Essential Reference, by David M. Beazley: it is surprisingly easy to find in it what you are looking for.

This example comes from the quiz 16-20 of Stanford’s Artificial Intelligence class.

This is the conventional definition of convolution, and the one that Professor Thrun gives in the class. But what he actually applies looks more like \(I' = \sum_{u,v}{I(x+u, y+v)\, g(u,v)}\).

Note how I mirror-image the \(g\) matrix to produce the result Prof. Thrun expects.

Disclaimer: I do get a cut from your Amazon purchase. Thank you very much for your support.

blog comments powered by Disqus