Sunday, May 31, 2009

What about the rest of the data?

I'm currently reading a most worthy book about data quality, and, like most other books about data quality I've come across, its gaze is fixed completely on data held in large corporate entities. Large companies where data is amassed across myriad systems, current and legacy; where there is a separate IT department; where there are enough staff to create a data quality working group; where money is no object when it comes to tools to assess, process and cleanse that data.

But is that really where most of the world's data is held? Obviously it's where we most often come into contact with it - when a large utility messes up its invoicing procedure, we know about it very quickly - but I would guess that more data is held in small spreadsheets, databases and documents in small- and medium-sized companies than is held in large corporations. I'm obviously not typical, but my own databases hold around 40 million records. In these small companies, there are unlikely to be distinct marketing or IT departments, a budget for data tools, or enough staff to create DQ teams.

Has anybody ever estimated how much data is held outside large corporations, or written about how these smaller organisations might go about improving their data quality? It must be the sun, as this is playing on my mind ...

1 comment:

Henrik Liliendahl said...

You're right. The real world is crowded with organisations where bureaucratic hierarchies of data owners, stewards and custodians, comprehensive data governance policies and excessive technology implementations make no sense (and yield no ROI).

Nevertheless, many modest organisations do store – and, in my experience, also use – huge amounts of data. So we need more agile methods and tools to meet the needs of these organisations.