Tuesday, September 20, 2011

The Kaleidoscope of Data and Information

John Owens, whom I much admire, made this comment on a Henrik Liliendahl Sørensen blog post about the difference between data quality and information quality. Taken completely out of context, he said:

“…it is the quality of the information (i.e. data in a context) that is the real issue, rather than the items of data themselves. The data items might well be correct, but in the wrong context, thus negating the overall quality of the information, which is what the enterprises uses. It will be interesting to see how long it is before data quality industry arrives at this conclusion. But, if they ever do, who will be courageous enough to say so?”

I agree entirely, yet disagree profoundly. Data and information are not the same thing yet are inextricably linked – one without the other isn’t possible but they still must not be confused. Data and information are as different as chickens and eggs, but are equally dependent upon each other.

Basically, data is stored information whilst information is perceived data.
Data and information are immutably linked – I have never found data which isn’t stored information nor information that isn’t rooted in data – but as they are different parts of a cycle they need to be defined, understood and managed as two separate entities. The challenge with data is keeping it complete, accurate and consistent. The challenge with information is to perceive, without alteration and in a way that gives clarity, the information of which the data is a stored version. It is at the information stage that we should be thinking about fitness for purpose, not at the data stage.

Let me give you an example from a recent episode of the BBC’s science program Bang Goes the Theory. A presenter went to a shopping centre and prepared two plates of bacon sandwiches. One was accompanied by the message that regularly eating processed meats increases the chances of getting bowel cancer by 20%. The other was accompanied by the message that regularly eating processed meats increases the chances of getting bowel cancer from 5% to 6%. Though the data underlying both pieces of information is identical, as is the information provided, the audience were understandably worried when seeing the first message but happy to tuck in after seeing the second. The first message would be fit for the purposes of the health authority, the second for the bacon marketing board, but in neither case is the fitness for purpose related to the data - it is related to the information provision.
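
For anyone who wants to check that the two messages really do rest on the same data, here is a minimal sketch of the arithmetic in Python. The 5% and 6% figures are the ones quoted above; the variable names are my own and purely illustrative.

```python
# Figures quoted in the programme: baseline and raised risk of bowel cancer.
baseline_risk = 0.05
raised_risk = 0.06

# The same change expressed two ways.
absolute_increase = raised_risk - baseline_risk          # difference in risk
relative_increase = absolute_increase / baseline_risk    # fraction of baseline

print(f"Absolute increase: {absolute_increase * 100:.0f} percentage point")
print(f"Relative increase: {relative_increase * 100:.0f}%")
```

One stored pair of numbers yields both "20%" and "one percentage point"; which of the two is shown is a decision made at the information stage, not the data stage.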

It is at the points where information becomes data and data becomes information that the potential for corruption and misunderstanding of the data and its perception is at its highest. We also know that once data is stored, inert though it may appear to be, it cannot be ignored as the real world entities to which the data refers may change, and that change needs to be processed to update the data.
Those of a certain age may remember having kaleidoscopes as children. These were tubes of tin or cardboard with a clear bottom holding chips of coloured glass or plastic, a section of which could be viewed, with mirrors creating a symmetrical pattern from that section. Move the kaleidoscope and patterns form, patterns which change and are always different, though the coloured chips themselves never change their inherent properties when being viewed. Whether your data is a shopping list or a data warehouse containing hundreds of tables and millions of records, working with data and information is much like looking through the kaleidoscope.
Depending on how we view the data, we tend to see something different every time we look. Reports, dashboards, views, queries, forms, software, hardware, your cultural background and the way your brain is wired will all alter our perception of the data and thus have an enormous influence on the information we receive from it.
As with a kaleidoscope, we tend to extrapolate what we see to the whole universe. If a report shows a positive result in one part of the operation, the tendency is to assume this result is valid throughout. In these examples square green chips represent accurate data whilst red chips or chips of other shapes represent errant data.
Both human nature and data and information systems tend to filter out the negative and boost the positive, so data often looks better than it really is, and so, in turn, does the information derived from it.
But sometimes the data looks entirely bad, though it is not so. The way we look dictates the apparent quality of the data.
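
The effect of the view can be sketched in a few lines of Python. Everything here is hypothetical: the list of records is invented for illustration, with True standing for a green square chip (accurate data) and False for an errant one, and `apparent_quality` is simply my name for the share of accurate records in whatever slice is being looked at.

```python
# Hypothetical data: True marks an accurate record, False an errant one.
records = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8


def apparent_quality(view):
    """Return the share of accurate records within the slice being viewed."""
    return sum(view) / len(view)


whole = apparent_quality(records)             # every record in view
good_window = apparent_quality(records[:8])   # a view containing only accurate records
bad_window = apparent_quality(records[12:])   # a view containing only errant records

print(f"Whole data set: {whole:.0%} accurate")
print(f"First view:     {good_window:.0%} accurate")
print(f"Second view:    {bad_window:.0%} accurate")
```

The records never change between the three calls; only the window onto them does, and the apparent quality swings from entirely good to entirely bad.
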
Yet data has tangible and innate qualities, its accuracy, completeness and consistency, which together are an indication of its quality. And any data which has these qualities provides a foundation for better information quality because the perversion caused by the view of the data is ameliorated. In these examples the data has been made consistent and accurate – the coloured chips have the same colour and shape.
And regardless of how we view that data, we see green square data. Data quality and information quality are different and yet rooted in each other. Data cannot be good if it incorrectly represents the information of which it is the stored version. Information cannot be good if it is based on incomplete, inaccurate and inconsistent data.

Data quality ensures that the data represents its real world information entity accurately, completely and consistently. Information quality is working to ensure that the context in which the data is presented provides a realistic picture of the original information that the data is representing.

There’s a general feeling that only data which has a purpose should be stored. I would not agree, as purpose, like so much else, depends on context and our viewpoint. Data which has no purpose now may be required to fulfil an information requirement in the future or be related to occurrences in the past; whilst anyone who is paid to manage data, whether it is used or not, finds his or her salary very meaningful!

Ultimately data is used to source information, and information quality is important. But we should not confuse data quality with information quality. Both are essential, and they are separate disciplines.