Sunday, March 20, 2011

Are technology advances damaging your data quality?



Way back on a Friday evening in 1989 I took home a computer, a software package called FoxPro and a manual. By Monday morning I had written a package to manage the employees and productivity of the telephone interviewing department I was running at the time.

I've been using FoxPro ever since. It's fast, powerful and easy to use, and it has unparalleled string manipulation capabilities. Unfortunately, Microsoft have stopped developing the product and will stop supporting it in a few years' time, so I recently started looking for a good replacement.

At first I thought I wasn't getting it. I checked site after site, product after product, expert after expert, and instead of finding products that were easier to use - more accessible, more flexible, more agile, more data-centric - I found products that were technically challenging: over-complex, cumbersome, putting wall after wall between me and my data, and requiring reams of code for the simplest actions, like adding a field, removing a field or expanding a field. Most were priced to suit the budgets of companies with turnovers matching the GDP of a large African country. Yes, there are some applications (very few) that try to make the process easier, but they are primitive and clunky.
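
For contrast, here is roughly what those "simplest actions" look like in FoxPro. The table and field names are invented for illustration, but the commands are the real thing:

    * add a field, widen a field, remove a field - one line each
    ALTER TABLE customers ADD COLUMN email C(60)
    ALTER TABLE customers ALTER COLUMN phone C(20)
    ALTER TABLE customers DROP COLUMN fax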

Most are based on SQL (that's a query language, ladies and gentlemen - the clue's in the name - and really very difficult to use for any kind of string processing) and on client-server setups that require a high level of technical knowledge. You can fetch a record, display a record and store the modifications (what SQL was designed to do), but if you want to do more than that it gets tough.
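
To make that concrete: tidying up a phone number column is a one-liner with FoxPro's string functions (again, invented table and field names), while in most SQL dialects the same job means nested functions or a round trip through an external language:

    USE customers
    * strip the dashes and spaces from every phone number in the table
    REPLACE ALL phone WITH STRTRAN(STRTRAN(phone, "-", ""), " ", "")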

I tried to work with some of them and just couldn't get my head around them. Never mind a whole application in a weekend - creating a database in a weekend was a challenge. My synapses may not be working at the speeds that they did in 1989, but I'm not in my dotage just yet.

Many support packages - data profilers and so on - have been created to work solely with these high-end packages, even the free and open-source variants, cutting out a huge chunk of the data market.

I wasn't getting it. And then I realized I was getting it: this is the state of play in the data industry at the moment. A chasm has grown between the easy-to-use, cheap but less scalable products (Microsoft Access, Visual FoxPro, FileMaker and so on) and the scalable but complex (and far too expensive) client-server applications.

So how does this work in practice? Joe from sales wants to create a database of his contacts. He can't manage the complexity of the SQL system the company runs on, so he'd have to send e-mails, set up meetings to request the database, get the IT department to put it into their planning, wait six months and watch his sales plummet. Or he could open Access or Excel and start typing. Guess which option most people take?
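
For the record, here is Joe's entire "project" as a FoxPro sketch (field names invented) - the two-minute option rather than the six-month one:

    CREATE TABLE contacts (fullname C(40), company C(40), phone C(20), email C(60))
    * the new table opens automatically - just start typing
    BROWSE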

These systems encourage the creation of data silos (more about those in my next post).

These databases also hurt data quality because they put a distance between the user and their data. Being unable to browse through the raw data is a real handicap in data quality terms: you cannot spot the anomalies a query has already filtered out. Data that is only ever seen through a filter - anything beyond a plain SELECT * - will be less useful and its quality harder to judge.
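
This is the difference I mean, sketched in FoxPro with illustrative names:

    USE customers
    BROWSE  && eyeball every field of every record, warts and all
    * versus seeing only what the query chooses to show you:
    SELECT fullname, phone FROM customers WHERE NOT EMPTY(email)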

The data software industry needs to take a close look at what it's up to and ask itself whether it should really be producing data products that can only be used and understood by highly trained technical staff, because they're not giving the data quality industry an iota of help. Vendors should open up their programs to the common file formats they're currently ignoring, such as .dbf. Their products need to be easier, more flexible, more agile and a good deal cheaper.

As for me, I'm back with FoxPro, and I have decided to stop apologising for using it. It allows me to produce top-quality data, and that's what it's all about.
