Tuesday, December 02, 2008

Python: use Cython to make it faster

As an example, here is a function that I translated to Cython, together with the change in speed it made to my application. The function is part of an image processing application and analyzes a given image pixel by pixel. It contains four nested loops, which makes it very time consuming. The original code was:

def calculateVar(self):
    N=self.N
    I=self.ima

    [ans,xoffs,yoffs,dists]=self.getSearchRegion()
    noOfAngles=int(ans.shape[0])

    self.ROTs=zeros(xoffs.shape)

    diffMtx_size=N*N/self._jump/self._jump
    diffMtx=zeros(diffMtx_size,dtype=int)

    for ai in range(noOfAngles):
        for offi in range(xoffs.shape[1]+0):

            diffMtx.fill(0)
            ind=0

            xoff=xoffs[ai,offi]
            yoff=-yoffs[ai,offi]

            for y1 in range(0,I.shape[0],self._jump):
                for x1 in range(0,I.shape[1],self._jump):
                    x2=x1+xoff
                    y2=y1+yoff

                    if x2>=N or y2>=N or x2<0 [...]
                    ind=ind+1
            diffMtx2=diffMtx[diffMtx>0]
            self.ROTs[ai,offi]=diffMtx2.var()
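To make the inner two loops easier to follow, here is a minimal standalone sketch of what they appear to compute for a single (xoff, yoff) pair. It rests on an assumption the post does not confirm: that the missing part of the loop body skips neighbours outside the image and stores the absolute difference between a pixel and its shifted neighbour. The function name shifted_diff_variance is made up for this illustration, and the standard library's pvariance stands in for NumPy's var() (both are population variances).

```python
from statistics import pvariance

def shifted_diff_variance(image, xoff, yoff, jump=1):
    """Variance of the positive differences between sampled pixels and
    their neighbours at (x + xoff, y + yoff) in a square image.

    ASSUMPTION: the stored value is abs(I[y1][x1] - I[y2][x2]) and
    out-of-range neighbours are skipped; the post does not show this part.
    """
    n = len(image)
    diffs = []
    for y1 in range(0, n, jump):
        for x1 in range(0, n, jump):
            x2 = x1 + xoff
            y2 = y1 + yoff
            # skip neighbours that fall outside the image
            if x2 >= n or y2 >= n or x2 < 0 or y2 < 0:
                continue
            d = abs(image[y1][x1] - image[y2][x2])
            if d > 0:  # mirrors the diffMtx[diffMtx>0] filter
                diffs.append(d)
    # no positive differences: nothing to take the variance of
    return float(pvariance(diffs)) if diffs else float("nan")

image = [[0, 1],
         [3, 6]]
result = shifted_diff_variance(image, xoff=1, yoff=0)
```

In calculateVar the same computation is repeated for every angle and every offset, which is where the four levels of nesting come from.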

This function was changed to the following one:

def calculateVar(self):
    N=self.N
    I=self.ima
    I2=array(I,dtype=int)
    [ans,xoffs,yoffs,dists]=self.getSearchRegion()

    # this part is in Cython!
    self.ROTs=loopcore.loopcore(I2,ans.shape[0],xoffs.shape[1],
                                array(xoffs,dtype=int),array(yoffs,dtype=int),
                                N, self._jump)

where loopcore is a Cython module, loopcore.pyx, as follows:

import numpy as np
cimport numpy as np

DTYPE = np.int
ctypedef np.int_t DTYPE_t
ctypedef np.float_t DTYPE_t2

cdef inline int int_abs(int a, int b): return abs(a-b)

def loopcore(np.ndarray[DTYPE_t, ndim=2] I, int noOfAngles,
             int noOfpixels, np.ndarray[DTYPE_t, ndim=2] xoffs,
             np.ndarray[DTYPE_t, ndim=2] yoffs,
             int N, int jump):

    cdef int y1,x1,x2, y2, ind,ai,offi, xoff,yoff,array_size

    array_size=N*N/jump/jump

    cdef np.ndarray[DTYPE_t, ndim=1] p= np.zeros(array_size, dtype=DTYPE)
    cdef np.ndarray[DTYPE_t, ndim=1] p2= np.zeros(0, dtype=DTYPE)
    cdef np.ndarray[DTYPE_t2, ndim=2] ROTs= np.zeros([noOfAngles,noOfpixels], dtype=np.float)

    for ai in range(noOfAngles):
        for offi in range(noOfpixels):
            p.fill(0)
            ind=0
            xoff=xoffs[ai,offi]
            yoff=-yoffs[ai,offi]
            for y1 in range(0,N,jump):
                for x1 in range(0,N,jump):
                    x2=x1+xoff
                    y2=y1+yoff

                    if x2>=N or y2>=N or x2<0 [...]
                    ind=ind+1
            p2=p[p>0]
            ROTs[ai,offi]=p2.var()
    return ROTs

The gain in speed was huge. Executing this function in pure Python takes about 2.10 min, while the Cython version takes about 0.05 min, i.e. the Cython code is about 40 times faster (2.10/0.05 = 42). I'm sure that it can be made even faster than that!
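For anyone who wants to repeat this kind of comparison, here is a small sketch of how such timings could be taken and turned into a speedup factor. The helper names time_call and speedup are made up for this example; the post does not say how the original measurements were made.

```python
import time

def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def speedup(slow, fast):
    """How many times faster the fast version is (same unit for both)."""
    return slow / fast

# The figures quoted above, in minutes.
print(round(speedup(2.10, 0.05)))  # prints 42
```

Taking the best of several runs reduces the influence of whatever else the machine happens to be doing at the time.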


Note 1
I noticed that the more variables are declared with static C types (cdef int, cdef float, ...), the greater the gain in speed.

Note 2
In Ubuntu 8.04 and 8.10 I compiled the Cython pyx files without any problems using the following commands:

cython loopcore.pyx
gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o loopcore.so loopcore.c
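The include path in the gcc command is specific to Python 2.5 on Ubuntu. As a rough sketch, the same two commands can be assembled for whichever Python interpreter is running by asking the standard sysconfig module for its header directory. The function name build_commands is made up for this example, the flags are simply copied from the command in Note 2, and the script only prints the commands rather than running them.

```python
import sysconfig

def build_commands(module_name):
    """Return the cython and gcc argument lists for one .pyx module.

    ASSUMPTION: same gcc flags as in Note 2 (Linux, gcc); only the
    include directory is looked up instead of being hard-coded.
    """
    include_dir = sysconfig.get_paths()["include"]
    cython_cmd = ["cython", module_name + ".pyx"]
    gcc_cmd = [
        "gcc", "-shared", "-pthread", "-fPIC", "-fwrapv", "-O2", "-Wall",
        "-fno-strict-aliasing", "-I" + include_dir,
        "-o", module_name + ".so", module_name + ".c",
    ]
    return [cython_cmd, gcc_cmd]

# Print the commands so they can be inspected or pasted into a shell.
for cmd in build_commands("loopcore"):
    print(" ".join(cmd))
```

If the NumPy headers are not found under that directory, an extra -I option pointing at the path returned by numpy.get_include() may be needed.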

Note 3
When running any Python code from this post, remember to check the code indentation.