This is the third installment of my notes on using Linux as a desktop OS. You can catch up in the Desktop Linux category if you’re interested in the background.
Real work kind of got in the way on Friday, and I found myself trying to do some things with OpenOffice on this new Linux box. Basically, the problem was this: I wanted to compare my school district’s per-pupil expenditures for 2006 with those of the other schools in the county. I also wanted to compare our district’s performance index with those same schools.
As a quick aside, Ohio schools have something called a performance index. It’s a number which quantifies how “good” a school is based on the students’ performance on achievement tests. Schools with a high performance index (like ours) can brag about how great they are. Schools with lower performance indexes can point out that it’s not necessarily the only, or even the most effective, way to measure a public school. But I digress….
All of the data I needed is on the Ohio Department of Education web site. They have some reasonably useful tools for extracting all kinds of data about Ohio’s schools. It only took a few minutes to generate a report that has the per pupil expenditures and performance indexes for all of the school districts.
The problem is that this file didn’t have any county information. So I couldn’t just select the ones I needed. I managed to find a separate report that listed all of the school districts with their counties, but this one also had the private schools in it. Without a one-to-one correspondance between the two data sets, the easiest way to combine this data is with a lookup.
No problem. I’ve been down this road before. In Excel, I would use “=vlookup(A1,'[somefile]!somesheet’!$A$1:$C:850,3,0).” Basically, take a look at the value in cell A1. Compare it to the values in the first column of this range on another spreadsheet. If you find it, return the value on the same line from the third column.
I know, I’m getting a little complicated, here. But these are the kinds of things our kids should be able to do.
The first problem I had was that the formula syntax was different. I sort of expected this. While there are functions to do the same things in Microsoft Office and OpenOffice, they’re going to work a little differently. I was really annoyed that OpenOffice uses a semicolon as a parameter delimiter instead of a comma.
I know. I’m whining. But I’ve been using parameter lists since 1988. Everybody uses commas. I’m pretty sure I’ve never done any programming in any language that didn’t use commas, with the exception of shell scripts and DOS batch files, which use spaces. And if I try to open an Excel file, it replaces all the commas with semicolons and just works. So nothing is broken in old files. But it’s still annoying.
The second problem was that the IRNs were in different formats. The IRN is a six-digit numeric code used to uniquely identify each educational institution in Ohio. This was the primary key I was using for the vlookup. I was using two CSV files generated using the same tool on ODE’s web site. In one, OpenOffice treated the IRN as a number. In the other, it was text. Since you can’t compare numbers and text, I wasn’t getting any matches. I worked around this by creating a new column and adding zero to the IRN. This forced it to be a numeric, and I could compare them again.
The next problem was that I couldn’t do lookups across multiple files. There weren’t any errors, I just didn’t get any results when I tried to do it this way. Copying the data from the two spreadsheets into two different sheets in the same file fixed the problem. So now, I was just doing a vlookup from one sheet to another in the same document. That worked well.
So, I got all of the data in one place. Next, I just needed to filter the data, so I only see my county. That was easy enough to do, and it worked pretty much just like Excel. My standard procedure at this point is to copy the filtered data and paste the values into a new sheet. Ultimately, this is the data I want. I don’t care about the formulas and data sources used to get it. I don’t want to worry about links to other sheets or what happens when I re-sort the data, or how things break if I change filter settings. I just want the data itself. In Excel, I do this by copying the data I want and then pasting the values into a new sheet. Paste special is a wonderful thing, and I have to remember to blog about it sometime.
In OpenOffice, there’s no way to do this. Sure, the paste special does have a lot of other neat features that aren’t in Excel, but this isn’t one of them. I worked around it by copying the data, pasting it into a text file, and then copying it back out and pasting it into a new spreadsheet.
After a few simple calculations, I had the data I needed. The whole process took about an hour. Using Excel, it would have taken about 20 minutes. If I do this type of thing once a week (a conservative estimate), that’s a TSPY of -34.67. Granted, I’ll get more efficient at this. Different isn’t necessarily better or worse, it’s just different. But it is a big price to pay for not using Microsoft Office.
By the way, our district spent $9862 per student last year. The district ranked 27th among the 31 school districts in Cuyahoga County, spending 16% less than the county average of $11738. At the same time, the district’s performance index is 106.9. This is the third highest in the county, and 12% above the average of 95.8.
I share your pain. A few years ago I was doing a web app that used Microsoft SQL Server. I really didn’t expect to find much difference between it an MySQL but I still remember having an issue with vendor provided commands, trying to pull the current date from the server. At the time it was totally different in each server. SQL actually provides these functions but most vendors ignore them for some reason, and have thier own methods. Very frustrating.