Monday, October 18, 2010

Robert the Carrot

A friend told me an anecdote from the time he was working in a Chinese restaurant. A customer, called Robert, wanted to get a tattoo of his name in Chinese and so asked one of the Chinese staff to write down his nickname - Robbo - as a pictogram.

There's no Western "r" sound in Chinese, and the staff member subtly altered the pronunciation of Robbo to something closer to Lobbo - which means "carrot" in Chinese. The pictogram was duly drawn, and Robbo happily went off to get himself tattooed large as being "Carrot".

This resonated with my data quality genes in two ways. The first is what I call the Chinese Whisper effect. You know that game - one person whispers a word or phrase into the next's ear, and so on down the line, and what comes out at the end is often completely at odds with how it started. Data quality is like that - at every interface between information and data and between data systems, the quality of data goes a little more awry.

The second has to do with ignorance. Most organisations think their data is great simply because they don't understand it, and that's especially the case with names and addresses. If you can't see the problem, or, indeed, see that there is a problem, you can't correct it.

Robbo lives in ignorance about his tattoo and is probably still mighty proud about it. He may get a few sniggers if he ever goes to China, but that's about it. Unfortunately, data quality issues arising from processes which work like this can be much more dangerous.

Sunday, October 10, 2010

Today we welcome ....

.... five new countries and territories, and we say goodbye to one. The net number of countries and territories in the world without a postal code system increases by four.

Did you know? Had you noticed? When I asked two days ago at a speech at Post*Expo in Copenhagen, none of the 50 or so participants admitted to having any clue about it.

Today we're waving goodbye to The Netherlands Antilles. We're welcoming Curaçao and Sint Maarten as largely independent territories within the Kingdom of The Netherlands (as Aruba now is); and Bonaire, Sint Eustatius and Saba as special municipalities within The Netherlands.

How many weeks, months, years or decades will it take before organisations reflect these changes within their databases, processes and customer-facing systems? After all, Saint Martin and Saint Barthélemy (2007) are still largely unknown and unused; Serbia and Montenegro are still far too often lumped together, though they split in 2006; and some organisations still have Yugoslavia as a country, though that died a death in 2003.

Manage your own country list

Twice in the past couple of weeks, when pointing at errors in country lists, organisations have let me know that they will be "looking for a new source" for that list. Far too many organisations use incomplete and unsuitable lists provided by organisations such as the World Bank, United Nations or the ISO. These organisations have their own reasons and imperatives for creating and maintaining lists, which will not be the same as yours, and they must adjust lists to political pressures which rarely reflect reality on the ground.

If you need to keep your country list up to date, and you do, then manage your own. Any country or territory which has a de facto existence needs to be on your list. Though Guadeloupe is part of France, its geographical location means that it needs to be listed separately to ensure correct address management. Saba may become part of The Netherlands but it won't use the same postal code system or, indeed, the same currency. Kosovo must be on your lists because of the linguistic, cultural and addressing differences, regardless of how you stand on its relationship to Serbia.

If you can't rely on the list you're using now, use your own. I'm looking forward to Sint Eustatius and its new neighbours appearing on your website form dropdown very soon.
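For those wondering what "manage your own" might look like in practice, here is a minimal sketch in Python. The Territory class, its valid_from and valid_to fields and the active_on function are my own illustrative names, not part of any standard, and the list holds only the territories mentioned in this post. The one date used is the change day described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Territory:
    name: str
    valid_from: Optional[date] = None  # None: no start date recorded in this sketch
    valid_to: Optional[date] = None    # None: still in use; otherwise the first day it is retired


# The day The Netherlands Antilles ceased to exist (the date of this post).
CHANGE_DAY = date(2010, 10, 10)

# Illustrative entries only; a real list would cover every de facto territory.
TERRITORIES = [
    Territory("Aruba"),
    Territory("Netherlands Antilles", valid_to=CHANGE_DAY),
    Territory("Curaçao", valid_from=CHANGE_DAY),
    Territory("Sint Maarten", valid_from=CHANGE_DAY),
    Territory("Bonaire", valid_from=CHANGE_DAY),
    Territory("Sint Eustatius", valid_from=CHANGE_DAY),
    Territory("Saba", valid_from=CHANGE_DAY),
]


def active_on(day: date) -> List[str]:
    """Return the names that belong in a dropdown on the given day."""
    return sorted(
        t.name
        for t in TERRITORIES
        if (t.valid_from is None or t.valid_from <= day)
        and (t.valid_to is None or day < t.valid_to)
    )
```

Retired entries are kept rather than deleted, so that older records which still refer to The Netherlands Antilles can be interpreted correctly.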

Tuesday, September 28, 2010

Avis - not trying hard enough

Before you clock off and look elsewhere, this is not going to be a rant about how bad Avis customer service is - it's an illustration of how broken processes (in this case of information exchange and actions related to that information) can affect a company's bottom line.

I was in Croatia in September and booked a car from Avis for a trip into the mountains. I chose Avis because their location was close to where I was staying. Come the day of the collection, we went to the address given (at the Sheraton Hotel) but couldn't find Avis. At the reception we asked and they informed us that Avis had moved to a location a few kilometres away on 1st July. They were turning away "tens" of customers every day, who were all coming to search for Avis.

It took us a while to get to the new location, and when I vented my frustration to the staff there they just shrugged their shoulders. "We've e-mailed head office - what more can we do?". They weren't bothered about the "tens" of customers they were losing.

Back in The Netherlands I searched the global Avis website for a complaint form. No, they don't have that - they are clearly confident of the quality of their services - so I sent "feedback". After some days the reply came back that complaints had to be dealt with in the country of my residence, The Netherlands, even though they have no responsibility for the website. After some weeks I called Avis in The Netherlands (on a premium rate number - this is me trying to help this company to get back its customers by spending my own money) and found that the complaint was being processed - by sending it on to the office in Croatia. Even if Croatia did nothing (and what could they do except send another e-mail to head office?) the Dutch office would then pass this information on to me, and nothing would have actually happened except that the time of a lot of employees and a customer had been wasted. When I explained the problem to the customer service representative at the Dutch office, she admitted it took many months to get any changes made to the website at all.

In the meantime the website is still sending people to the wrong place. If we reckon on Avis losing 10 customers per day because of this then, as of today, that's 900 customers.

Avis seems to think it's big enough and profitable enough to carry this sort of loss. A slight tweak in one of its processes (which would cost nothing - in fact, it would save money because staff wouldn't have to field complaints like mine) would have a huge effect on its bottom line.

Time to try harder, Avis?

Friday, September 10, 2010

The dangers of obscenity tagging ...

I was pulled up sharp by this report about an XBox user whose account was suspended because he used "offensive language" in his details, the offence being that he lived in a city in the USA called Fort Gay.

I'm continually astonished by the stilted sensibilities of some Anglo-Saxon communities (I have never come across policies banning "offensive language" in systems for any other language) - like most of my fellow Europeans I can't imagine why the word "gay" should be deemed offensive. But this incident highlights the problematic use of obscenity lists which cause problems all the time for large numbers of innocent Internet users.

Gay, for example, is a common given and family name in the English-speaking world, and there are many streets and places named after those people. There are many people with the surnames Duck or Mouse, and some are called Donald or Michael; and they have a constant battle to achieve anything online. The good burghers of Scunthorpe in England have a very trying time with web sites and spam filters (think about it); and imagine the issues that the inhabitants of Dildo in Newfoundland, Condom in France and Fucking in Austria have.

If you do feel the need to check for obscenities, it needs to be done in a more knowledgeable, culturally-aware way than is currently the case.
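To make the difference concrete, here is a rough Python sketch contrasting a blunt check with a slightly more careful one. Everything in it is an assumption for illustration: the one-word blocklist simply mirrors the reported incident (I don't know what list the real filter used), and the field names are invented.

```python
import re

# Hypothetical blocklist: the single term from the reported incident stands in
# for whatever list the real filter used. It is not a recommendation.
BLOCKED_TERMS = {"gay"}

# Hypothetical field names for data that describes real people and places,
# which the user cannot choose and which should not be screened this way.
EXEMPT_FIELDS = {"given_name", "family_name", "street", "city"}


def naive_flag(value: str) -> bool:
    """Substring match on any value: the approach that causes the trouble."""
    lowered = value.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def aware_flag(value: str, field: str) -> bool:
    """Skip name and address fields; elsewhere match whole words only."""
    if field in EXEMPT_FIELDS:
        return False
    words = re.findall(r"[^\W\d_]+", value.lower())
    return any(word in BLOCKED_TERMS for word in words)
```

With these definitions, naive_flag("Fort Gay") flags the city, while aware_flag("Fort Gay", "city") lets it through. Even the second version is still culturally blind; it only removes the most obvious false positives.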

Wednesday, September 8, 2010

How long is YOUR street name field?

I was walking down the main street of Bihać, Bosnia, as one does, and I noted the street name:



You'd think this was long enough for a street name, but the sign actually contains abbreviations. To write this street name in full would take 89 characters. It could be correctly abbreviated to 20 characters, but would you know how? In how many ways might this street name be written? I may be more than a little weird, but I love this kind of diversity in addressing!

Friday, July 16, 2010

Definition drift

A number of posts recently have drawn my attention again to the persistent problem of definitions and of definition drift. We are rarely able to agree a definition of any word or phrase in the data quality world before a new buzz term comes along to grab our attention. A great deal of this is due to fashion and marketing. Software and solution providers are constantly searching for new terms to launch in order to "persuade" (I'm being nice here) executives that they need to upgrade their current installations. Try as I might, for example, I cannot find a definition for Master Data Management that does not tally with what I understand to be good old plain data management, something we've been doing for years. Mark Goloboy noticed how some people are trying to replace the term "data quality" with "information quality". Though it is often not obvious to information workers, there is a huge difference between information and data (as we data workers know).

Altering terms in this way because of marketing, fashion or internal political needs is pernicious and does little to help data quality or, ultimately, your customers, for whom you should be working to improve your management of their data. The definition saga led me to start to build my own glossary of terms and to place that online for all to enjoy, in the hope that we can arrest some of the worst excesses before they take off. A recent post by Jim Harris exposed again how definitions can affect our working practices, and I felt the need to expand on his post and try to clarify my own thinking on data quality, what it is, what contributes to it, and how it affects information quality.

Data quality tends to have three main definitions: fitness for purpose (we'll come onto that later); data accurately representing the real world entity to which it refers (my own preference); and data being complete, current, consistent and accurate (or relevant, up to date and accurate; or complete, valid, consistent and timely; or accurate, correct, timely, complete, and relevant; or any of a number of similar properties...).

Without wishing to write a three-volume novel about this, let's have a look and see how some of these parameters do affect data quality, starting with continuing the discussion from Jim's post:



Validity versus Accuracy

Validity means that a piece of data satisfies a rule relating to the data itself. Accuracy indicates that the valid data applies to the entity for/about which the data is being gathered and stored. For example, US, CA, DE, and RU are all valid ISO 3166-1 alpha-2 country codes. XX is not. None of these are accurate country codes for the country in which I live - in that case, only NL would be an accurate code.

1st January 1833 is a valid (Gregorian) date. It is a valid date for the date of birth of a human being. It is (currently) not a valid date as a date of birth for a human being still alive. 1st January 1961 is a valid date of birth for a human being still alive, but it is not an accurate date of birth if it were to be applied to me.

For these reasons, dashboards and data profiling tools need to be used with caution. They can check that every country code or date within a data file is valid, but they cannot check their accuracy.
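The distinction can be sketched in a few lines of Python. This is only an illustration: the function names are mine, the code set is a tiny subset rather than the full ISO list, and the "actual" value has to come from some trusted source outside the data file, which is precisely what a profiling tool does not have.

```python
from datetime import date

# Illustrative subset only; a real check would load the full, maintained code list.
VALID_COUNTRY_CODES = {"US", "CA", "DE", "RU", "NL"}


def is_valid_country_code(code: str) -> bool:
    """Validity: the value satisfies a rule about the data itself."""
    return code in VALID_COUNTRY_CODES


def is_accurate_country_code(code: str, actual_code: str) -> bool:
    """Accuracy: the valid value also matches the real-world entity.

    actual_code is an assumption of this sketch: it must be supplied by an
    independent, trusted source, because the data file cannot verify itself.
    """
    return is_valid_country_code(code) and code == actual_code


def is_valid_date(year: int, month: int, day: int) -> bool:
    """Validity only: the date exists in the (proleptic) Gregorian calendar."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True
```

So is_valid_country_code("DE") passes, and is_valid_date(1833, 1, 1) passes, but neither tells you whether DE or 1833 is the right answer for the person in the record.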

Currency versus Timeliness versus Up to Date versus Of Its Time

Yes, even these terms vary in their definitions. For the most part Timeliness is regarded as a processing aspect, where data is made available to the worker at the time it is required - not a data quality issue, in my opinion. Currency is often regarded as synonymous with up-to-date, but data which is up to date (i.e. valid now) is not necessarily fit for a purpose. If your purpose is to know what I bought from your online shop in 2003, you'll need data from that time and of that time, rather than data from this year. This is also a definition of currency, but it does need to be mentioned. For me, this is all part of the accuracy and completeness of data - if I move then the address you may have for me is valid (the building still exists), the data is valid of its time (I did live there when you added it to your database), but I don't live there now, so it's not current and it's not accurate because the house is there but the entity for/about which you're collecting the data (me) isn't there any more.



Consistency and fitness for purpose

Consistency is NOT a pre-condition for data quality. If your database contains the information that I live in The Netherlands in a variety of forms (NL, NLD, The Netherlands, Nederland, Holland, Pays Bas, Niederlande) then the data is accurate, though represented in any number of ways. The data has quality but it is difficult to work with and process, and is therefore not fit for purpose. Counts to find the number of customers in The Netherlands will produce poor information leading to poor business decisions. Data which is made consistent can be used, and is therefore fit, for any purpose. If I know all records for entities within The Netherlands use the code NL, then I can print "The Netherlands" onto an envelope being sent from the USA, or "Niederlande" if it's from Germany, and that data need not be stored anywhere - it is derived from consistent data. I reject the definition of data quality being fitness for purpose - fitness for purpose is a consequence of data quality, not a definition of it.
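As a rough Python sketch of that idea: the variants below are the ones listed above, while the function names, the lookup tables and the language keys are my own assumptions for illustration.

```python
from collections import Counter
from typing import Optional

# The variants listed above, all mapped to one stored code.
VARIANTS = ("NL", "NLD", "The Netherlands", "Nederland", "Holland", "Pays Bas", "Niederlande")
VARIANT_TO_CODE = {variant.casefold(): "NL" for variant in VARIANTS}

# Hypothetical display names, keyed by (stored code, language of the sending country).
DISPLAY_NAMES = {
    ("NL", "en"): "The Netherlands",
    ("NL", "de"): "Niederlande",
}


def normalise_country(raw: str) -> Optional[str]:
    """Map a known variant to its code; return None for anything unrecognised."""
    return VARIANT_TO_CODE.get(raw.strip().casefold())


def display_name(code: str, language: str) -> str:
    """Derive the printed name from the consistent code; nothing extra is stored."""
    return DISPLAY_NAMES[(code, language)]


# Four accurate but inconsistent records for customers in the same country.
records = ["NL", "Holland", "Pays Bas", "Nederland"]
raw_counts = Counter(records)                                 # four separate "countries"
clean_counts = Counter(normalise_country(r) for r in records) # one country, four customers
```

Counting the raw values gives four countries with one customer each; counting the normalised values gives one country with four customers. The data was accurate in both cases, but only the consistent version can be used.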

OK, so I've tossed this off on a Friday morning, and may have missed some logical connections. What do you think? Please join the debate - I'll update this entry with any nuggets that are suggested.

Tuesday, July 6, 2010

Prevention or cure?


I was looking through a pile of my old and dusty university essays a few weeks ago, nicely typed on a 40-year-old manual typewriter (at that time the university's only computer was in a huge, well guarded room and the only way we were allowed to interact with it was with punch cards ...) and I found an essay with this title:

"Regional Water Authorities in Britain are Dominated by Engineers Trained Largely to Solve Water Supply Problems by Constructing New Facilities Rather Than by Minimising the Need for Them" (D.J. Parker and E.C. Penning-Rowsell). Discuss.

I had discussed, as directed, and agreed: instead of tempering our profligate and ever-increasing use of water, we just kept tapping into new resources to increase supply.

It wasn't just water that had this issue. And little has changed over the past 30 years.

Looking around, you see this pattern almost everywhere. Health services, for example, spend a little on prevention and a huge amount on curing. Police try to solve crimes after they occur but rarely attempt to prevent the crimes from occurring (and in many countries they are not allowed so to do). In fact, our whole society is based on the use/consume/experience now and the resolve/cure/clean up later paradigm.

So it's hardly surprising that businesses work the same way when it comes to data quality. Like the water authorities, they are dominated by people who are trained (and indoctrinated) to resolve problems as they arise rather than to prevent the problems from arising; and when they see the problems they envisage only technical solutions without any consideration for process or business structure changes. The bigger, more expensive and flashier the product, the more likely it is to be bought, regardless of its effectiveness at reducing the problem.

Shifting spending to prevention will reduce spending on the cure, and we'd be healthier for it - stopping us from taking up smoking will always be better than treating us for lung cancer. There'll always be a need for cures - like our bodies, data decays and we have to work on it to keep it healthy - but prevention works better. And is cheaper.

Thursday, June 3, 2010

Vodafone, Samsung and a company in Almere ...

Vodafone

Oh lovely - an e-mail from Vodafone offering me lots of discount if I extend my current mobile contract with them. "Just click here to see the goodies".

Right on time - my current 'phone had just thrown a wobbly. So I click on the link, log in, and ... nothing. "You cannot extend your contract. Call this number". Like the good automaton that I am, I call the number. "Oh yes, you can extend your contract, no problem. If you'd like to see the 'phones, pop down to your local Vodafone shop"

So I pop down and choose a 'phone just a little shinier and more sparkly than the previous one. And the nice gentleman says "But your current contract only runs out in November. You're not allowed to renew yet".

So I explain about the e-mail, he calls, somebody tweaks the system, and I renew.

And no criticism of the flunkies at Vodafone - "service" is not something the Dutch understand well (they don't even have a word for it - they use the English "service" - and they tend to equate "service" with "servitude", which doesn't sit well in this egalitarian society). But these flunkies were fine and resolved the issue without any fuss. What I do wonder about, though, is why Vodafone's data systems aren't talking to each other. The e-mail system and the contact centre system think one thing, the website and the shop system think something else entirely.

A company in Almere

So I get my nice, new, shiny and sparkly 'phone home, switch it on and (you'll like this Daragh) find that it may be shiny, but it's not new. 40 contact names, e-mails (about taxi expenses), voice messages, notes ... even pictures of the baby.

Looks like the 'phone had been tested and returned, but returned without any attempt being made to delete the data on it. I could even contact the company's e-mail server and check out the rest of their mails. I wasn't tempted (I may have been put off by the baby pictures, truth be told), but a message about data not leaving the company hasn't seeped through to all staff yet.

Samsung

So I take it back and get a new 'phone. Only this one is missing its little pen ...

OK, this cost me not a lot (a few telephone calls and some cycling around Amsterdam, which is good for my condition). It costs Vodafone more, I reckon. And as for that company in Almere ...

Wednesday, May 12, 2010

I have a dream ...

I have a dream. My dream is that one day those overpaid nincompoops who run many of our companies and organisations wake up to the importance of data, and start working with it accordingly.

If you're not persuaded of the importance of data, try imagining your organisation functioning without data (or its cousin, information, which is usually rooted in data) and see how far you get. No e-mails, no internet, no customer orders, no invoices. No telephone calls, no meetings, no discussions with colleagues, not even to discuss the weather, unless you're one of the very few organisations which is not affected by the weather (really, you'd be surprised).

How long would that situation be able to last? Minutes?

Why can't people understand the importance of data and its quality? Why don't we treat it in the same way that we treat other parts of our business? The very idea of an airline only maintaining its fleet when something went wrong with it would horrify all of us, but that's what we do with data. Few of us do not realise how preventing tooth decay not only saves us costly treatment and potentially a great deal of pain, but leaves us with far better teeth than any dentist's ministrations could produce on badly maintained teeth. (Read Jim Harris' blog post on that topic here.)

So why do we wait until the CEO is told that $1 billion PROFIT was made instead of the actual $1 billion LOSS, with the resultant chaos, before we take data seriously? Clearly, unmaintained aircraft falling from the sky make a greater immediate impact than data quality wrecks, but the results can be equally pernicious. Why must so many people waste so many hours trying to prove return on investment (ROI), when ANY and ALL data quality improvements are beneficial? I have yet to be persuaded that any improvement of data quality fails to give a return (in one form or another) on the investment. Sadly, most businesses make money DESPITE their data quality, not because of it. (See Henrik Liliendahl Sørensen's post showing how simple it can be to show ROI here.)



I have a dream of a revolution in data quality, where resources and focus are built into the prevention of data quality problems, rather than on trying to resolve them only when their detrimental effect becomes obvious; where as much control is put into data as is put into production, maintenance, finance, human resources and other aspects of organisations.

I have a dream. How long must it remain a dream?

Tuesday, May 4, 2010

IAIDQ Blog Carnival, April 2010



I've dusted down the blog today to host April's IAIDQ blog carnival for information/data quality bloggers, a look at some of the month's best blog posts.

In keeping with this blog's focus, I've decided to concentrate on posts about data quality (as a data issue) rather than on business or other practices, or on personnel issues; so I've largely bypassed posts about persuading executives to invest in improved data quality, data quality tools within businesses, return on investment and the like, though this is no reflection on the quality of those posts.



We can start with Daragh O Brien, a man who rarely utters a word I don't agree with, and he utters them always with great aplomb. His recounting of the difficulties of matching and moving his contact data from 'phone to 'phone in his post Do we have an App for that? shows well how real people have to grapple with data quality issues on a daily basis.

Also on the theme of data everywhere in our environment, and certainly not just within businesses, is the good Jim Harris' post Data, Data Everywhere, But Where Is Data Quality?. Jim, an obsessive compulsive blogger and independent consultant, speaker and writer, looks at the avalanche of data we contend with daily, why its quality matters, and how we need to manage it.

I can't let the carnival go on without a mention of Dylan Jones, editor of Data Quality Pro and a prime mover in getting the importance of data quality recognised. Dylan's post is an Expert Interview with Jill Wanless (author of the Data Quality from the Ground Up blog). So I get to mention two data quality scions in a single paragraph.



I won't generally eat anything more exotic than a chicken and mushroom pie, and you'd have to tie me up and use a cattle prod to get me anywhere near IKEA, but Henrik Liliendahl Sørensen has continued his posts about data diversity with Data Quality and World Food. Henrik's work with, amongst others, Omikron, has given him a good understanding of the importance of understanding global diversity, and he blogs about it regularly.



Finally I'd like to make an honorable mention of Julian Schwarzenbach's final entry in his series The Data Zoo - How data personalities interact. Though I'm breaking my rule here of avoiding blog entries which revolve around data quality within businesses rather than as something generic, Julian's work in sifting and identifying the personalities involved in data quality work is a remarkable series, though I'm stuck with the feeling that I actually belong in each one of the nine categories identified, which is a trifle worrying ...

Apologies again to the writers of the excellent blog entries I had to exclude from this carnival, and I'm looking forward to next month's batch already.

Tuesday, March 23, 2010

Data Quality is a DATA issue

In the blogosphere, on Twitter and at conferences I often come across mentions and discussions of whether data quality is a business issue or whether it is a technical/technological issue. It's normally an either/or question, with no other options considered.

And when I come across such discussions and statements I stick my fingers in my ears, sing la la la and try to think happy thoughts whilst I click on to something more to my liking. But sometimes I feel an overwhelming need to comment ...

Not that data quality CAN'T be a business issue - it can ... in businesses; and it can certainly be a technical issue. What gets me is the tunnel vision that surrounds this discussion. One would think that the only place that data exists is in businesses (and large ones at that), and that no data quality professionals existed outside them; and that data only exists on computers.

We are surrounded by data (and their cousins information and intelligence). It is in businesses but it is also to be found and used in huge quantities outside them in government, public utilities, health services. It is found in large businesses but is also to be found in huge quantities in small businesses where there is no talk of warehousing, OLAP, management buy-in or any other expression we can think of.

When a patient goes into surgery, for example, the purpose of data quality is not to make money but firstly to prevent a death (for example by transfusing blood of the wrong type) and secondly to achieve a health improvement to the patient.

Obviously, for those wrestling with data quality and company politics within large corporations every day, data quality can appear to be a business issue. But we need to have a much more generic outlook on data quality.

Maybe we just need to be more careful in our use of language. "Poor data quality has an effect on the economics of a business": absolutely!

Data quality CAN be a business issue.
Data quality CAN be a technical issue.
Data quality CAN be a customer issue.
Data quality CAN be a health issue.

Data quality is ALWAYS a DATA issue.

And that's how I think we should regard it.

Tuesday, January 19, 2010

A true story of how data quality issues can cripple a business.

I've just been told about a data quality issue which may not have cost millions to resolve, but illustrates very well the effect poor data quality (and lack of information quality) can have at every level.

Call 1: A café reports to its head office that its bank card payment terminal has stopped working. Head office assumes (assumption 1) that it is a technical issue and instructs the café (call 2) to call in the technical support from the equipment provider (call 3).

The equipment provider sends a technician (visit 1), finds no technical faults and assumes (assumption 2) that the problem lies with the telecommunications provider. They leave behind a hefty bill for the visit. Café calls (call 4) the telecommunications company, who check their systems and find no fault, and therefore assume (assumption 3) that the equipment manufacturer is at fault. Call 5 to the equipment manufacturer, who point the finger back to the telecommunications provider; call 6 to the telecommunications provider who .... well, you're getting the picture.

For 6 months this situation remained unresolved. Calls were made, technicians sent, invoices sent. More importantly, the café was losing customers that didn't have the cash to pay and wanted to use their bank cards. Trying to resolve the situation was costing their staff time and money and their good humour.

A new round of calls and eventually somebody at head office actually checked - and found that the bill had not been paid. The equipment was working again within 24 hours, but the costs of this situation were crippling for the café concerned.

Not only do we see how dangerous it is to make assumptions, but here also there's a clear information quality problem in the support department of the equipment manufacturers. When their technical service was called they could check on technical aspects of the equipment installed, but there was no link with the financial system, so nobody could tell the café that the switch had been thrown because the bill went unpaid.
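The missing link can be sketched very simply in Python. The classes, field names and messages here are all hypothetical; I have no knowledge of the systems actually involved. The only point is that the account status is checked before anybody starts blaming hardware or telephone lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Terminal:
    """What the technical support system knows (hypothetical fields)."""
    terminal_id: str
    hardware_ok: bool
    line_ok: bool


@dataclass
class Account:
    """What the financial system knows (hypothetical fields)."""
    terminal_id: str
    invoice_overdue: bool


def diagnose(terminal: Terminal, account: Optional[Account]) -> str:
    """Check the account status first, then the technical aspects."""
    if account is not None and account.invoice_overdue:
        return "Service suspended: unpaid invoice"
    if not terminal.hardware_ok:
        return "Hardware fault: send a technician"
    if not terminal.line_ok:
        return "Line fault: contact the telecommunications provider"
    return "No fault found: escalate"
```

In the café's case the terminal and the line were both fine, so a lookup like this would have given the answer on call 3 rather than six months later.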

Sometimes it doesn't matter how good your data is - you have to actually go and look at it and make sure you use it properly.