Those of us who work as data quality professionals spend far too much time philosophising, particularly about how to define the term "data quality".
Some professionals, particularly those in a business environment, define data quality as data being fit for purpose. To me, far from clarifying anything, this definition throws up far too many new questions. Fit in what way, and for which purpose? Fit for my purpose or for his? Or both? Fit for the purpose I have now, or for the purposes I may have in the future?
I don't like chain definitions: phrases defined by other phrases which themselves have to be defined (for example, data quality = information quality = fit for purpose = ...). This simply obscures the issues and moves us away from their core.
I also think we should avoid forcing definitions under umbrella terms when we are blessed with thousands of languages, each containing thousands of words that can be used to describe each issue precisely. Instead of saying "This data has no quality, because it doesn't help me do what I want to do", wouldn't it be great if we said:
"The way this data has been provided to me is not fit for the purpose of calling all our customers, because the telephone area code is not shown on the interface/printout."
And wouldn't it be great to say it without feeling we needed to park the problem under one defining phrase or buzzword or another?
After a couple of decades of intensive work with data, I firmly believe that data quality is an inherent property of the data itself and cannot be defined by what can be achieved with that data. But while I juggle with this issue in my head, I am open to other input. For me, data has quality if it is a true representation of the real-world constructs to which it refers: that is, if it is accurate, relevant, complete and up-to-date.
To me, if your data fulfils those criteria, there is nothing that can't be done with it, and, if used properly, it could be fit for each and every purpose. In all my years working with data, I have not found a case where this was not true.
Do YOU know of a case, real or imaginary, in which data that is accurate, relevant, complete and up-to-date would not be fit for each and every purpose? Note: we're talking about the data here, not the information. If the telephone area code is not shown on your screen, that's an information quality problem. If the data itself is complete, relevant and up-to-date, it will include the telephone area code, which can then be used to make the information fit for purpose.
I'd love to hear of any examples! Leave a comment!