Saturday, June 09, 2007

Ruby: reading Microsoft/OpenOffice spreadsheets

Reading of Excel files (*.xls) can be done using roo package for Ruby.One function that I write for myself was function that reads Excel file (fragment seen on the below picture) to a Hash table, where first column (image file name) is a key. Moreover, a key is only base name of the file,i.e., no extension. The line #(1) is very interesting, as I found out that roo reads x-rays in a way that each character is separated by '0'. Thus, to get correct String, it is necessary to remove zeros from the string. This is I think due to encoding shame. Excel uses Unicode 16, instead of ASCI; hence, each character is represented by two bites: firt byte is a character, and the second byte is '0'. Hence, to read the file name to ASCI, it is necessary to skip ever second byte. It must be done, because otherwise gsub does not want to work.




def readExcelToHash f
oo = Excel.new(f)
oo.default_sheet = 1
xcl=Hash.new
0.upto(oo.last_row) { |line|
r=oo.row(line)
next if r[0].to_s.length==0
rs= String.new r[0].to_s
a=String.new
rs.each_byte{|x| a+= x.chr if x!=0} #(1)
next if rs =~ /^#/
xcl[a.gsub(/\.tif+$/,'')]=r
}
xcl
end

As far as creating spreadsheets is concerned, roo does not provide this functionality.

No comments:

Post a Comment