Friday, October 2, 2009

Data quality definitions: fit for purpose?

As data quality professionals, some of us spend far too much time philosophising, particularly about how to define the term "data quality".

Some professionals, particularly those in a business environment, define data quality as data which is fit for purpose. To me, far from clarifying, this definition throws up far too many new questions. Fit in what way, and for which purpose? Fit for my purpose or for his? Or both? Fit for the purpose I have now or those I may have in the future?

I don't like chain definitions: phrases that become defined by new phrases which themselves have to be defined (for example, data quality = information quality = fit for purpose = ...). This simply obfuscates the issues and moves us away from their core.

I also think we should avoid trying to bring definitions under umbrella terms when we are blessed with thousands of languages, each containing thousands of words, which can be used to define each issue. Instead of

"This data has no quality, because it doesn't help me do what I want to do"

wouldn't it be great if we said:

"The way this data has been provided to me is not fit for the purpose of calling all our customers as the telephone area code is not shown on the interface/printout"

without feeling we needed to park this under one or other defining phrase or buzz word?

After a couple of decades of intensive work with data, I firmly believe that data quality is an inherent property of the data itself and is not definable by what can be achieved with that data. But while I juggle with this issue in my head, I am open to other input. For me, data has quality if it is a true representation of the real-world constructs to which it refers; that is, if it is accurate, relevant, complete and up-to-date.

To me, if your data fulfils those criteria, there's nothing that can't be done with it and it could, if used properly, be fit for each and every purpose. In all my years working with data I've not found a case where this was not true.

Do YOU know of a case, real or imaginary, in which data that is accurate, relevant, complete and up-to-date would not be fit for each and every purpose? Note: we're talking about the data here, not information. If the data is not represented on your screen with the telephone area code, that's an information quality problem; but if the data is complete, relevant and up-to-date, the data will include the telephone area code which could therefore be used to make the information fit for purpose.
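To make the data/information distinction above concrete, here is a minimal Python sketch. The `Customer` record and both rendering functions are hypothetical, invented purely for illustration: the stored data includes the area code, and only the presentation decides whether the resulting information is fit for calling customers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical customer record: the data layer stores the area code separately.
    name: str
    area_code: str
    local_number: str


def render_without_area_code(c: Customer) -> str:
    # An "information" view that drops the area code: unfit for calling customers.
    return f"{c.name}: {c.local_number}"


def render_with_area_code(c: Customer) -> str:
    # The same underlying data, rendered so it is fit for the calling purpose.
    return f"{c.name}: ({c.area_code}) {c.local_number}"


customer = Customer(name="A. Smith", area_code="020", local_number="7946 0000")
print(render_without_area_code(customer))  # A. Smith: 7946 0000
print(render_with_area_code(customer))     # A. Smith: (020) 7946 0000
```

Nothing about the record changes between the two calls; only the view does, which is the sense in which the problem belongs to information rather than data.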

I'd love to hear of any examples! Leave a comment!

4 comments:

Anonymous said...

Graham,

First of all, thanks for starting this important discussion.

I agree that some of us data quality professionals spend too much time philosophizing, myself included, of course, since I currently have 29 posts on my blog tagged as Philosophy:

Data Quality Philosophy from the OCDQ Blog

I have blogged about my philosophical view that data and information are interrelated entities forming a single continuum. I use the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g. customers, vendors, suppliers).

Although a common definition for data quality is fitness for the purpose of use, the common challenge is that data has multiple uses – each with its own fitness requirements. Viewing each intended use as the information that is derived from data, I define information as data in use or data in action.

Data quality standards provide a highest common denominator to be used by all business units throughout the enterprise as an objective data foundation for their operational, tactical, and strategic initiatives. Starting from this foundation, information quality standards are customized to meet the subjective needs of each business unit and initiative. This approach leverages a consistent enterprise understanding of data while also providing the information necessary for day-to-day operations.

I believe that data's quality must be objectively measured separate from its many uses and that information's quality can only be subjectively measured according to its specific use.

Therefore, within this context, and to answer your question – if data has quality (i.e. is accurate, relevant, complete and up-to-date), then it is fit to serve as the basis for each and every purpose.
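Jim's point that data quality can be measured objectively, separately from any particular use, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the attribute list, the `last_verified` field and both metric functions are hypothetical. It covers only completeness and currency; accuracy and relevance require comparison against the real world, which cannot be computed from the record alone.

```python
from datetime import date

# Hypothetical enterprise-wide attribute standard for a customer record.
CUSTOMER_ATTRIBUTES = ("customer_id", "name", "billing_address", "delivery_address", "email")


def completeness(record: dict) -> float:
    """Share of the standard attributes that are present and non-empty."""
    filled = sum(1 for attr in CUSTOMER_ATTRIBUTES if record.get(attr))
    return filled / len(CUSTOMER_ATTRIBUTES)


def days_since_verified(record: dict, today: date) -> int:
    """Age of the record's last verification, in days (a simple currency measure)."""
    return (today - record["last_verified"]).days


record = {
    "customer_id": "C001",
    "name": "A. Smith",
    "billing_address": "1 High St",
    "delivery_address": "1 High St",
    "email": "",  # missing value lowers completeness
    "last_verified": date(2009, 9, 1),
}
print(completeness(record))                            # 0.8
print(days_since_verified(record, date(2009, 10, 2)))  # 31
```

Neither function asks what the record will be used for, which is what makes these measures use-independent in Jim's sense.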

Best Regards…

Jim

Unknown said...

Graham - this is a great post, so thanks!

I agree w/ everything you said, but I can think of one little addition that helps us define "fit for purpose." In short, to answer your Q: I can't think of an example where accurate data shouldn't be good for any purpose.

I've seen many different applications that each require a slightly different set of attributes about an entity to enable their processing requirements. For example, an order processing service might only require a unique Customer ID, the Customer Name, the Billing Address, and the Delivery Address, whereas a product registration service would only require the Customer ID, the Customer Name, and the Customer Email Address. So the data in the Customer Master Data Service are unique, accurate, consistent, complete, etc. (of high quality), but not all data are required for all transactions. So they are "fitted" for a specific transactional purpose. Does that align with what it means to be "fit for purpose?"
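Marty's "fitted" subsets can be sketched as projections of one high-quality master record. The service names and their required attributes below simply mirror his example; `SERVICE_REQUIREMENTS`, `fit_for` and `project` are hypothetical names, not part of any real master data product.

```python
# Hypothetical attribute requirements per consuming service, mirroring the example above.
SERVICE_REQUIREMENTS = {
    "order_processing": ("customer_id", "name", "billing_address", "delivery_address"),
    "product_registration": ("customer_id", "name", "email"),
}


def fit_for(record: dict, service: str) -> bool:
    """True if every attribute the service needs is present and non-empty."""
    return all(record.get(attr) for attr in SERVICE_REQUIREMENTS[service])


def project(record: dict, service: str) -> dict:
    """Return only the attributes the given service requires."""
    return {attr: record[attr] for attr in SERVICE_REQUIREMENTS[service]}


master = {
    "customer_id": "C001",
    "name": "A. Smith",
    "billing_address": "1 High St",
    "delivery_address": "1 High St",
    "email": "a.smith@example.com",
}
print(project(master, "product_registration"))
# {'customer_id': 'C001', 'name': 'A. Smith', 'email': 'a.smith@example.com'}
print(fit_for(master, "order_processing"))  # True
```

Each service sees only its slice, while the master record itself stays complete enough to serve all of them.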

Another use case I've seen is a particular app that requires historical data on an object, data that may not be accurate, up-to-date, consistent, etc. but is still "fit" for that specific purpose. One example is an inquiry about a historical order that was incorrectly sent to a low-quality shipping address. The current data holds the correct (high-quality) address and the historical address is bogus (sorry, am from California), but the historical address is still valid data: just not current and just not high quality, but still purposeful...
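The historical-address case is essentially a question of effective dating: the old address is wrong today but is what the order actually used. A minimal sketch, assuming a hypothetical `AddressVersion` history with half-open validity intervals (`valid_from` inclusive, `valid_to` exclusive); all names and addresses are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressVersion:
    # Hypothetical versioned shipping address; valid_to=None means "still current".
    address: str
    valid_from: date
    valid_to: Optional[date]


def address_as_of(history: List[AddressVersion], when: date) -> Optional[str]:
    """Return the address that was on record on the given date, or None."""
    for version in history:
        if version.valid_from <= when and (version.valid_to is None or when < version.valid_to):
            return version.address
    return None


history = [
    AddressVersion("99 Wrong Rd", date(2008, 1, 1), date(2009, 3, 1)),
    AddressVersion("1 High St", date(2009, 3, 1), None),
]
print(address_as_of(history, date(2008, 6, 15)))  # 99 Wrong Rd (historical order inquiry)
print(address_as_of(history, date(2009, 10, 2)))  # 1 High St (current address)
```

Keeping the old version rather than overwriting it is what lets the same data serve both the current-address purpose and the historical-inquiry purpose.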

Does this make sense?

Cheers!

Marty

Graham said...

Thanks Jim and Marty for your valuable input.

One thing I'd like to add is that any definition for data quality needs to be generic and not based on business/enterprise requirements. We often think of it in that way, but there is a huge amount of data washing around outside companies, not used for the creation of profit, and that needs quality too.

Ken said...

Graham,

You have started a very important debate - thank you. I agree with you - the quality of underlying data should be independent of the uses to which the data is put, and should be capable of supporting all uses.

One of the greatest challenges we face as Data Quality professionals is selling this message.

I believe we need to develop an armoury of case studies, analogies, ROI case studies etc. to help us make this case.

Simple analogies can help, such as: When building a wall, it is not necessary for a bricklayer to make his own bricks, since he knows that bricks are pre-cast to defined standards, to meet his needs.

Data are the bricks on which businesses depend, for CRM, BI, AML, BASEL II, etc. (Note: AML and BASEL II are examples of regulatory compliance requirements that could not have been considered when the original systems were being designed.)

If the data fails to meet "standard" data quality dimensions, the applications built on the data will fail to meet user expectations.

Rgds Ken