I spent a few hours tonight learning the basics of image processing. In particular, I wanted to understand convolution, that is, a techique for applying various types of filters to images.
I’m not going to cover all the nitty gritty mathematical details of convolution here, a good intro can be found here, from Cornell University. Here I’m going to focus on providing an overview.
The basic idea for convolution is that you define a filter, a ‘kernel’, typically a 3 x 3 matrix, and the values of the 9 elements of that kernel are used as a sliding filter over the image you want to process. Then, you apply that kernel filter to each pixel of your image, where the kernel, in order to define a new value for each pixel of your image, takes into account the values of the neighboring elements, by multiplying each element of the kernel with the corresponding image pixel value, and summing those products. This sum becomes the new value for the pixel currently being processed.
A trivial kernel is the ‘identity kernel’, i.e a 3 x 3 matrix with zeros in every position except the middle (1,1). Applying this kernel to a window of the image will result in an unchanged pixel value.
Another trivial kernel is the ‘shifted identity’ kernel. For instance, if we want to shift the image one pixel left, we would employ a kernel with all zeros, exept the (1,0) position.
When playing a bit with various shifted identity kernels, I noticed something odd: for sure, the image shifted to left, but when doing a manual check for what the new matrix value should be, I noticed that instead of shifting left, the image should have shifted right.
After some consultation on Stackoverflow, I learned that the convention for adding up the cells is somewhat strange. So I decided to implement my own convolve-function, which would offset the ‘strange’ element mapping of the ‘official’ convolve-algoritm.
It didn’t take long before I realized that in order to get my ‘poor-mans-convolve’ to produce identical results as the official one, I’d have to transform my kernels by two reflexions, one lateral, one vertical. Having done so, my convolve produced identical results as the official one.
However, when applying my convolve-function with identical filters to an image, the results were far from identical:
Top left is the original image. After that (going left-to-right) comes 6 images transformed by scipy’s convolve function. After those six, comes six images transformed with my own convolve function. except for the blur filter (where the kernel is all one’s), there is not much similarity between each pair of transformed images….
I spent some time checking and double-checking the outcomes from my convolve function, comparing to Scipy’s, and the results were always identical.
Almost having given up, after some more or less ‘creative’ (or crazy) attempts to figure out what was going on, I noticed that my convolve function occasionally produced negative values for the transformed pixels. Hmm, as far as I know, pixel values should reside in [0..255]…
So, my first attempt to fix the problem was to do an abs() on all negative values. That changed the look of the images, but not to being identical to those of Scipy’s convolve function.
Then I for whatever reason applied the modulo operation to the pixelvalues and voila’: all of sudden the transformations done with my home grown convolve function are identical to those of Scipy’s convolve:
A priori, I had no theory or hyphothesis on why the modulo operator would fix the problem, and even now I can’t figure out why I decided to use it. Basically, the last few hours were spent on ‘just trying different things’. Fundamentally, the trouble shooting was not guided by any analytical approach, but by a gut feeling. For sure, the reason that gut feeling popped up in my mind (or gut) has probably a lot to do with long time of experience of programming, debugging and general trouble shooting, but since I have no real previous experience of image processing, my mind was not capable of helping out. Instead, it was the gut.
Moral of the story: with enough experience in the general domain, and a bit of stubbornness and creative thought, you can fix problems even though you are not fully aware of all the nitty gritty detail.
import matplotlib.pyplot as plt import numpy as np import matplotlib.image as mpimg import scipy.ndimage.filters as filter def normalize(matrix): sum = np.sum(matrix) if sum > 0.: return matrix / sum else: return matrix def neighbors(r,c,supermatrix): m = supermatrix[r:r+3,c:c+3] return m def convolve(n,kernel): sum = 0 for (rr,cc),value in np.ndenumerate(n): sum += n[rr,cc] * kernel[rr,cc] return sum % 255 def poor_mans_convolve(matrix,super,kernel,shape): result = np.ndarray(shape,dtype=np.float) for (r,c),value in np.ndenumerate(matrix): n = neighbors(r,c,super) result[r,c] = convolve(n,kernel) return result fig = plt.figure(figsize=(14, 6.5), dpi=100) kernel_edge_detect1 = np.array([[1.,0.,-1.], [0.,0.,0.], [-1.,0.,1.]]) kernel_edge_detect2 = np.array([[0.,1.,0.], [1.,-4.,1.], [0.,1.,0.]]) kernel_edge_detect3 = np.array([[-1.,-1.,-1.], [-1.,8.,-1.], [-1.,-1.,-1.]]) kernel_sharpen = np.array([[0.,-1.,0.], [-1.,5.,-1.], [0.,-1.,0.]]) kernel_sharpen2 = np.array([[-1.,-1.,-1.], [-1.,9.,-1.], [-1.,-1.,-1.]]) kernel_blur = np.array([[1.,1.,1.], [1.,1.,1.], [1.,1.,1.]]) kernel_list = [kernel_edge_detect1,kernel_edge_detect2,kernel_edge_detect3,kernel_sharpen,kernel_sharpen2,kernel_blur] title_list = ['edge-detect1','edge-detect2','edge-detect3','sharpen1','sharpen2','blur'] image=mpimg.imread('portrait.jpg')[:,:,0] shape = image.shape newimage = np.ndarray(shape,dtype=np.float) poorimage = np.ndarray(shape,dtype=np.float) supershape = (shape + 2,shape + 2) supermatrix = np.zeros(supershape,dtype=np.float) supermatrix[1:-1,1:-1] = image imagelist_std_convolve =  imagelist_poor_convolve =  fig.add_subplot(4,4,1) plt.title('Original image') plt.imshow(image,cmap=plt.cm.gray) plt.axis('off') for i in range(len(kernel_list)): kernel_list[i] = normalize(kernel_list[i]) newimage = filter.convolve(image,kernel_list[i],mode='constant', cval=0) imagelist_std_convolve.append(newimage) print 'kernel' print kernel_list[i] print fig.add_subplot(4,4,i+2) plt.title(title_list[i]) plt.imshow(newimage,cmap=plt.cm.gray) plt.axis('off') for i in range(len(kernel_list)): flipped_kernel = kernel_list[i].copy() flipped_kernel = np.fliplr(flipped_kernel) flipped_kernel = np.flipud(flipped_kernel) flipped_kernel = normalize(flipped_kernel) poorimage=poor_mans_convolve(image,supermatrix,flipped_kernel,shape) imagelist_poor_convolve.append(poorimage) print 'flipped_kernel' print flipped_kernel print fig.add_subplot(4,4,i+2+6) plt.title('poor mans ' + title_list[i]) plt.imshow(poorimage,cmap=plt.cm.gray) plt.axis('off') plt.tight_layout() plt.show()