As an example, here is a function that I translated to Cython, together with the resulting change in the speed of my application. The function is part of an image processing application and analyzes a given image pixel by pixel. It contains four nested loops, which makes it very time consuming. The original code was:

    def calculateVar(self):
        N=self.N
        I=self.ima
        [ans,xoffs,yoffs,dists]=self.getSearchRegion()
        noOfAngles=int(ans.shape[0])
        self.ROTs=zeros(xoffs.shape)
        # one slot per sampled pixel (the image is sampled every self._jump pixels)
        diffMtx_size=N*N/self._jump/self._jump
        diffMtx=zeros(diffMtx_size,dtype=int)
        for ai in range(noOfAngles):
            for offi in range(xoffs.shape[1]):
                diffMtx.fill(0)
                ind=0
                xoff=xoffs[ai,offi]
                yoff=-yoffs[ai,offi]
                for y1 in range(0,I.shape[0],self._jump):
                    for x1 in range(0,I.shape[1],self._jump):
                        x2=x1+xoff
                        y2=y1+yoff
                        if x2>=N or y2>=N or x2<0 ...
                        # ... the rest of the bounds check and the assignment
                        # to diffMtx[ind] are missing here
                        ind=ind+1
                # keep only the non-zero entries and store their variance
                diffMtx2=diffMtx[diffMtx>0]
                self.ROTs[ai,offi]=diffMtx2.var()
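Part of the inner loop body did not survive in the listing above, so here is a minimal sketch of what one (angle, offset) iteration plausibly computes. It rests on two assumptions that are not confirmed by the post: pixels whose shifted partner falls outside the image are skipped, and the stored value is the absolute intensity difference between the two pixels. The function name `offset_variance` is hypothetical, and plain lists with `statistics.pvariance` stand in for NumPy arrays and `.var()` so that the sketch runs without NumPy.

```python
from statistics import pvariance


def offset_variance(image, xoff, yoff, jump=1):
    """Variance of the non-zero absolute differences between each sampled
    pixel and the pixel shifted by (xoff, yoff)."""
    n = len(image)  # assumes a square N x N image, as in the post
    diffs = []
    for y1 in range(0, n, jump):
        for x1 in range(0, n, jump):
            x2 = x1 + xoff
            y2 = y1 + yoff
            # Assumption: skip pairs whose second pixel is outside the image.
            if x2 >= n or y2 >= n or x2 < 0 or y2 < 0:
                continue
            # Assumption: the stored value is the absolute difference.
            d = abs(image[y1][x1] - image[y2][x2])
            if d > 0:  # mirrors diffMtx[diffMtx>0]
                diffs.append(d)
    # Population variance, like ndarray.var(); NaN when nothing was collected.
    return pvariance(diffs) if diffs else float("nan")


print(offset_variance([[0, 1], [3, 6]], xoff=1, yoff=0))
```

The full function would call something like this once for every entry of `xoffs` and `yoffs`, which is where the four nested loops come from.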

This function was changed to the following one:

    def calculateVar(self):
        N=self.N
        I=self.ima
        I2=array(I,dtype=int)
        [ans,xoffs,yoffs,dists]=self.getSearchRegion()
        # this is the part in Cython!
        self.ROTs=loopcore.loopcore(I2,ans.shape[0],xoffs.shape[1],
                                    array(xoffs,dtype=int),array(yoffs,dtype=int),
                                    N,self._jump)

where loopcore is a Cython module, loopcore.pyx, as follows:

    import numpy as np
    cimport numpy as np

    DTYPE = np.int
    ctypedef np.int_t DTYPE_t
    ctypedef np.float_t DTYPE_t2

    cdef inline int int_abs(int a, int b): return abs(a-b)

    def loopcore(np.ndarray[DTYPE_t, ndim=2] I, int noOfAngles,
                 int noOfpixels, np.ndarray[DTYPE_t, ndim=2] xoffs,
                 np.ndarray[DTYPE_t, ndim=2] yoffs,
                 int N, int jump):
        # all loop counters and temporaries are declared as C ints
        cdef int y1,x1,x2,y2,ind,ai,offi,xoff,yoff,array_size
        array_size=N*N/jump/jump
        # p collects the values for one (angle, offset) pair
        cdef np.ndarray[DTYPE_t, ndim=1] p=np.zeros(array_size, dtype=DTYPE)
        cdef np.ndarray[DTYPE_t, ndim=1] p2=np.zeros(0, dtype=DTYPE)
        cdef np.ndarray[DTYPE_t2, ndim=2] ROTs=np.zeros([noOfAngles,noOfpixels], dtype=np.float)
        for ai in range(noOfAngles):
            for offi in range(noOfpixels):
                p.fill(0)
                ind=0
                xoff=xoffs[ai,offi]
                yoff=-yoffs[ai,offi]
                for y1 in range(0,N,jump):
                    for x1 in range(0,N,jump):
                        x2=x1+xoff
                        y2=y1+yoff
                        if x2>=N or y2>=N or x2<0 ...
                        # ... the rest of the bounds check and the assignment
                        # to p[ind] are missing here
                        ind=ind+1
                # keep only the non-zero entries and store their variance
                p2=p[p>0]
                ROTs[ai,offi]=p2.var()
        return ROTs
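A comparison between two versions of a function can be made with a small timing helper. The sketch below is only an illustration of the idea, not the measurement procedure used for this post: `best_time`, `speedup`, and the two stand-in functions are hypothetical names, and in practice the two callables would be the pure-Python and the Cython versions of `calculateVar`.

```python
import time


def best_time(fn, repeats=3):
    """Smallest wall-clock time in seconds over a few runs of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(slow_fn, fast_fn, repeats=3):
    """How many times faster fast_fn runs compared with slow_fn."""
    return best_time(slow_fn, repeats) / best_time(fast_fn, repeats)


# Stand-ins for the two versions being compared (hypothetical workloads).
def slow_version():
    return sum(i * i for i in range(200000))


def fast_version():
    return sum(i * i for i in range(2000))


print("speed-up: %.1fx" % speedup(slow_version, fast_version))

# The same ratio computed from two measured run times given in minutes.
print("ratio of 2.10 min to 0.05 min: %.0f" % (2.10 / 0.05))
```

Taking the minimum over a few runs reduces the influence of other processes on the measurement.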

The gain in speed was huge. Executing this function in Python takes about 2.10 min, while the Cython version takes about 0.05 min, i.e. the Cython code is **40+** times faster than the Python code (2.10 / 0.05 ≈ 42). I'm sure that it can be made even faster than that!

Note 1

I noticed that the more variables are declared with C types (cdef int, float, ...), the greater the gain in speed.

Note 2

In Ubuntu 8.04 and 8.10 I compiled the Cython pyx files without any problems using the following command:

` cython loopcore.pyx`

` gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.5 -o loopcore.so loopcore.c`
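The include path in the gcc command is specific to Python 2.5 on those Ubuntu releases. As a hedged sketch, the standard `sysconfig` module can report the header directory of whichever interpreter is running, so the same command line can be assembled without hard-coding the path. `build_command` is a hypothetical helper; the flags are copied unchanged from the command above, and the sketch only prints the command rather than running it.

```python
import sysconfig


def build_command(module="loopcore"):
    """Assemble the gcc command for the C file that `cython` generated."""
    # Directory holding Python.h for the interpreter running this script.
    include_dir = sysconfig.get_paths()["include"]
    return [
        "gcc", "-shared", "-pthread", "-fPIC", "-fwrapv", "-O2", "-Wall",
        "-fno-strict-aliasing",
        "-I" + include_dir,
        "-o", module + ".so", module + ".c",
    ]


print(" ".join(build_command()))
```

Because loopcore.pyx cimports numpy, the NumPy header directory (reported by `numpy.get_include()`) may also have to be passed with another `-I` option on systems where those headers are not already on the include path.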

Note 3

When running any Python code from this post, remember to check the code indentation.
