Tuesday, March 29, 2011

Data silos - learn to live with them.

Mention data silos (separate data, databases or sets of data files that are not part of an organization's enterprise-wide data administration) and the blood pressure of any self-respecting data quality grafter will rise. Data silos are unconnected to data elsewhere in an organisation and thus are unable to share common types of information, which may create inconsistencies across multiple enterprise applications and may lead to poor business decisions.

One thing, though, that we need to think about: in almost all cases data is in an organisation to mitigate the work that that organisation does. The organisation is not there to improve data quality. My proviso is that every organisation has an obligation to the data owners (usually its customers) to manage that data correctly. Thus any improvement in data quality must serve the organisation and its customers, not the data in and of itself.

Data silos are inevitable, and that will never change. Organisation-wide data management systems are overcomplex and put too great a distance between the staff and the data they need. An employee's main focus is (or should be) to do the work that they are being paid to do as well as they can. If an organisation cannot provide ease of access to the tools that they need (hardly possible unless every member of staff has an accompanying IT employee at their beck and call) then they will reach for the tools that they can use - pen and paper, local spreadsheets, documents and database tables. Few employees (or work groups or departments) are going to wait around and do nothing while an attempt is made to hook their requirements into the cumbersome company database systems.

So whilst a CEO depending on a central database system to provide his or her business intelligence may suffer from data silos (only may, because it depends what is being stored in those silos), the troops doing the real work find having the data they need at their fingertips a distinct advantage.

I've been on both sides on the silos debate. I dug my heals in hard when attempts were made to integrate my data island into the company database because I knew it would not fulfil requirements, it would not enable me to do my job, it would be a waste of money, and I would not use it. I was right on all counts. At the same time I was trying to get the sales staff to give up their data for integration into a central marketing database. I failed for the same reasons (and because the staff were not going to give up the data which gave them the edge on their colleagues, improved their commission and enabled them to spend two weeks every year in the Caribbean as a reward from the company - incentives can be very bad for data quality).

Naturally data silos in certain parts of a company are very damaging, without dispute. You can't have a call centre employee jotting a caller's details onto a piece of paper or into an Excel spreadsheet instead of into their centralised system, for example. But not all data silos are evil.

As Jim Harris points out here, it's not so much having data silos that is the problem: it's what is done with that data. A much more open and accepting attitude to data silos is required, because we may wail and gnash our teeth as much as we like - they're not going to go away.

Sunday, March 20, 2011

Are technology advances damaging your data quality?

Way back on a Friday evening in 1989 I took home a computer, a software package called Foxpro and a manual. By Monday morning I had written a package to manage the employees and productivity of the telephone interviewing department I was running at the time.

I've been using Foxpro ever since. It's fast, powerful, easy to use and it has unparalleled string manipulation capabilities. Unfortunately, Microsoft have stopped developing the product, and will stop supporting it in a few years time, so I recently started looking for a good replacement.

At first I thought I wasn't getting it. I checked site after site, product after product, expert after expert, and instead of finding products which were easier to use: more accessible, more flexible, more agile, more data-centric, I found products which were technically challenging: over-complex, cumbersome, which put wall after wall between me and my data, which required reams of coding to do the simplest action like add a field, remove a field, expand a field, and so on. And most of which were priced to suit the budgets of companies with turnovers matching the GDPs of a large African country. Yes, there are some applications (very few) that try to make process easier, but they are primitive and clunky.

Most are based on SQL (that's a query language, ladies and gentlemen - the clue's in the name - and really very difficult to use to do any type of string processing) and based on client-server setups that required a high level of technical knowledge. You can get a record and display a record and store the modifications (what SQL was designed to do), but if you want to do more than that it gets tough.

I tried to work with some of them and just couldn't get my head around them. Never mind a whole application in a weekend - creating a database in a weekend was a challenge. My synapses may not be working at the speeds that they did in 1989, but I'm not in my dotage just yet.

Many support packages - data profilers and so on - have been created to work solely with these high-end packages, even those free and open source variants, cutting out a huge chunk of the data market.

I wasn't getting it. And then I realized I was getting it. This is the state of play in the data industry at the moment. A chasm has grown between easy to use, cheap but less scaleable products (Microsoft Access, Visual Foxpro, FileMaker and so on) and those scaleable but complex (and far too expensive) client-server applications.

So how does this work in practice? Joe from sales wants to create a database of his contacts. He can't manage the complexity of the SQL system in which the company is working so he'd have to send e-mails, set up meetings to request this database, get the IT department to put it into their planning, wait six months and watch his sales plummet. Or he could open Access or Excel and start typing. Guess which is the option most people take?

These systems encourage the creation of data silos (more about those in my next post).

Data quality is adversely affected by these databases also because they put a distance between the user and their data. Being unable to actually browse through the raw data is a handicap in data quality terms. Data which is filtered through systems, even to the extent of any SQL query other than SELECT *, will be less useful and have less quality.

The data software industry needs to take a close look at what they're up to and ask themselves of they should really be producing data products which can only be used and understood by highly trained technical staff, because they're not giving the data quality industry an iota of help. They should open up their programs to those common file formats that they're currently ignoring, such as .dbf. They need to be made easier, more flexible, more agile and a good deal cheaper

As for me, I'm back with Foxpro, and I have decided to stop apologising for using it. It allows me to produce top quality data, and that's what it's all about.