Thursday, December 31, 2009

Dear Mr Other ...

The year is ending and two calendars for 2010 arrive from "data quality" companies, each with strange ideas about my postal address.

The first is from a UK data quality exhibition organiser, who addresses me as:

Mr Other Graham Rhind

I can't imagine what is supposed to have been in the field which output as "Other", and I celebrate my individuality, but not in this particular way.

The address block (for my Dutch address) ends with a British postal code and "GB", struck out by the postal services to allow the mailing to reach me - eventually. I recognise the postal code as the one I use when I am provided with a web form which does not allow me to add my Dutch postal code. This company is happy to invite foreigners to its exhibitions; it just won't allow them to register without providing false data.

The second is from a Spanish company, which puts my name below the final address line, guaranteed to delay mail because it's where the sorting machines expect to find the country name or the postal code. I am addressed as "D. Graham Rhind" - that's D for Don - and the postal code line reads:

52000 1018 VV Amsterdam

I guess that this company uses a Spanish CRM system that only allows a Spanish postal code to be entered. To allow my (not at all Spanish) details to be entered they have added the least used Spanish code in the postal code field (for Melilla) and then put my Dutch code into the place name field.

Remember, these are both data quality companies. I can see that I still have plenty of work to do to bring the message of our cultural diversity to all in 2010.

Wednesday, December 2, 2009

Informatica - a step forward in web form quality

Yet another e-mail (from Informatica Netherlands this time) with news of a new white paper. Wearily (why wearily? check this blog entry to find out), I click to check the web form ...

Hey! Hang on a minute. Informatica have actually had the nous (British slang approximately meaning common sense, intelligence ...) to have pre-filled the form with my data (which they indeed already have in their system). And what's this? No state field? And yes, there's Montenegro in the country list, back in all its glory.

Could this be a result in my crusade for better Internet data collection? I gingerly change the country to Canada and yes! The province field appears! Not in a sensible place, unfortunately (if you're going to change fields, don't change them where the customer has already been in the form - the country should be asked beforehand).

But what can I say? RESULT!!

Tuesday, November 24, 2009

B-Eye Network - another web form of shame

Another invitation to download a white paper, this time from the B-Eye Network, another organisation with data quality at its heart - except when it comes to its web form.

What hit me first with this form is that there is no indication of which fields are required, though you can be sure that they are there. In fact, to find out which fields I should fill in (according to B-Eye) I have to fill in the fields I can complete, hit the send button, and then hope for the best (or pray, depending on your personal preference). Only at that point will the form tell you what is required. And what is required shows a grave lack of understanding of the world out there.

I don't have to fill in a state (or a province), and I am very grateful for small mercies. But I do have to fill in a "last" name (that's a culturally loaded field label, by the way), even if I don't have one, and fill in a Zip/Postal Code, again, even if I live in one of the 60 or so countries or territories without one.

And then we look at the country drop down, always very instructive. I'll gloss over the "Falkland Islands (Islas Malvinas)" ... touchy subject ... and land instead on Serbia and Montenegro, a country that hasn't existed since 2006.

I await eagerly the next invitation to download a white paper. Who dares?

Monday, November 23, 2009

Chapeau Talend!

Talend have corrected their web forms! Thanks to them for listening and reacting! Chapeau!

Saturday, November 21, 2009

Broken business processes

On the day that I posted about the poor web form design at Informatica I received an e-mail from Talend (another data quality company) inviting me to download a white paper. Inevitably, this wasn't free - I had to provide them with information. And on their drop down for country name they forgot Faeroe Islands but, more damagingly, included "Yugoslavia". Yugoslavia hasn't existed since 2003, and our memories shouldn't be so short that we forget the bloodshed which accompanied its disintegration.

Look, I know how difficult it is for companies to ensure that everything is known to all people in all departments. Information doesn't flow well - there are barriers everywhere, and though there are people at Talend and Informatica who know better than to make mistakes like this, they can't be everywhere checking everything before it gets posted.

But what really gets me down about these examples is that in both cases the companies concerned contacted me and let it be known that they saw the problems and would correct them. And yet in both cases the forms are still online and are still unchanged.

So how broken do your company processes have to be to allow such obvious embarrassments (people, you purport to be DATA QUALITY companies!) to remain online? What is standing in the way of actually correcting these forms? How many dissatisfied customers do you have to lose before anything changes? How bruised does my forehead have to become from bashing my head against these brick walls?

Somebody took the time to point out the errors. Do yourselves a favour Informatica and Talend - correct your forms!

Friday, October 30, 2009

Informatica and form rage

Dear reader,

you'll know by now that the best way to achieve data quality is not by cleansing data after it has been collected - that's just an expensive way of mitigating the effects of poor data quality. You'll know that preventing data quality issues at source is a more effective and ultimately more cost effective way to manage data quality.

There are hundreds of thousands of companies who continue to attempt reactive data quality cleansing rather than instigating preventative data quality, and that won't change in the short (or even middle) term; but when you're a company working within the data quality sphere you have to be very careful about how you manage your own data quality, because if you don't, ne'er-do-wells such as myself will be quick to pick up on it.

Informatica, a data quality company, posted a white paper - about data quality - here, and included with it a web form guaranteed to collect the worst quality data imaginable. (To me, a white paper is not free if I am expected to provide my information (which has value) in exchange for it - but that's another blog post ...). Now, don't get me wrong - I have nothing against Informatica as a company - they just seem (on this evidence) to have reached the size and structure which has stopped them being able to concentrate on data quality in all parts of their company, and with too many employees not understanding, or being part of, the data quality focus.

A quick look at the form and we can see some of the issues. Though I am allowed to enter data from my address in The Netherlands, I am forced to add an American state (or a Canadian province or territory) with which to pollute Informatica's data. The field labels suggest that my name is written in the same way as that of most Americans, that is given name first, family name last, and if I don't have a family name I'll have to make one up, because I can't leave that empty. I must add a postal code, even if my country doesn't have one, and though they have managed correctly to lose "Serbia and Montenegro" from their country list, they have lost Montenegro in the process.
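The problems listed above come down to one rule: which fields are required should depend on the country chosen, and a family name should never be compulsory. A minimal Python sketch of that rule follows. The function name `missing_required_fields`, the field names and the two country sets are illustrative assumptions, not Informatica's form logic, and a real form would need a researched list covering every country.

```python
# Hypothetical, deliberately tiny rule sets for illustration only.
STATE_REQUIRED = {"US", "CA"}                     # forms that need a state or province
POSTAL_CODE_REQUIRED = {"US", "CA", "GB", "NL"}   # postal code in everyday use


def missing_required_fields(form: dict) -> list:
    """Return the names of required fields that are empty for the chosen country."""
    country = form.get("country", "").strip().upper()
    # One free-text name field: nobody is forced to invent a family name.
    required = ["name", "country"]
    if country in STATE_REQUIRED:
        required.append("state")
    if country in POSTAL_CODE_REQUIRED:
        required.append("postal_code")
    return [field for field in required if not form.get(field, "").strip()]
```

With these assumed sets, a Dutch entry with a name and postal code passes without a state, and an entry for a country outside both sets passes with only a name and country.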

When I pointed these errors out to Informatica they promised to recreate their forms, and they may be so doing; but it shouldn't take 15 days to stop a web form "State" field being a required field, one of the most obvious and widespread errors any web form can make, and the cause of more form rage than anything else. I hope they manage to get it sorted before next Tuesday, when I am presenting about web forms to the DDMA in Amsterdam - I'd like to be able to show a success story.

So, dear reader, do yourself a favour. Prevent your data becoming polluted at source. Look at your web forms. I'll make it easy for you - download my free e-book "Better data quality from your web form - Effective international name and address Internet data collection" and learn how to avoid those common errors. And I don't even ask you to fill in a form to get it ...

Friday, October 2, 2009

Data quality definitions: fit for purpose?

As data quality professionals, some of us spend far too much time philosophising, particularly about how to define the term "Data quality".

Some professionals, particularly those in a business environment, define data quality as data which is fit for purpose. To me, far from clarifying, this definition throws up far too many new questions. Fit in what way, and for which purpose? Fit for my purpose or for his? Or both? Fit for the purpose I have now or those I may have in the future?

I don't like chain definitions, phrases that become defined by new phrases which themselves have to be defined - for example data quality = information quality = fit for purpose = ... This simply obfuscates the issues and moves us away from their core.

I also think we should avoid trying to bring definitions under umbrella terms when we are blessed with thousands of languages, each containing thousands of words, which can be used to define each issue. Instead of

"This data has no quality, because it doesn't help me do what I want to do"

wouldn't it be great if we said:

"The way this data has been provided to me is not fit for the purpose of calling all our customers as the telephone area code is not shown on the interface/printout"

without feeling we needed to park this under one or other defining phrase or buzz word?

After a couple of decades of intensive work with data, I firmly believe that data quality is an inherent property of the data itself and is not definable by what can be achieved with that data. But while I juggle with this issue in my head, I am open to other input. For me, data has quality if it is a true representation of the real world constructs to which it refers; being accurate, relevant, complete and up-to-date.

To me, if your data fulfils those criteria, there's nothing that can't be done with it and it could, if used properly, be fit for each and every purpose. In all my years working with data I've not found a case when this was not true.

Do YOU know of a case, real or imaginary, in which data that is accurate, relevant, complete and up-to-date would not be fit for each and every purpose? Note: we're talking about the data here, not information. If the data is not represented on your screen with the telephone area code, that's an information quality problem; but if the data is complete, relevant and up-to-date, the data will include the telephone area code which could therefore be used to make the information fit for purpose.

I'd love to hear of any examples! Leave a comment!

Tuesday, September 1, 2009

Data quality tools

I've noticed an increasing tendency, especially in blogs, to use the term "data quality tool" to refer only to a piece of software.

We are doing the cause of data quality no favours at all by continuing the myth that data quality can only be improved through the purchase and use of a piece of software. Almost everything you do and touch within a company can affect data quality, and therefore a tool for improving data quality can range from a pencil and a piece of paper all the way to that Cray supercomputer over there in the corner. In fact, by far the most useful and effective data quality tool is the human brain. If we used it a bit more effectively, a great deal of data quality improvement would result!

Saturday, July 11, 2009

Whois needs address validation

A blog post here about a scam where people are sent to a site selling false anti-virus software demonstrated to me how introducing address validation to the Internet Corporation for Assigned Names and Numbers (ICANN) Whois database could help in the fight against such scammers.

The blog documents the websites through which the user is sent by the scam, and shows the whois entry for each of the sites.

Here's the first:

I don't think there is a Booth Street in Edmonton, but the real giveaway is the 6-digit postal code - not even close to a Canadian format.

The second:

Again, not a valid Brazilian postal code format.

The third:

An English-language address, in Melbourne, but in Russia? And with a 5-digit postal code? Shouldn't there be alarm bells going off somewhere about this obvious fakery??

Wouldn't we be better served if ICANN introduced validation, ranging from simple postal code format validation, through "can this address be in that country" validation right down to address-level validation, into their web registration database? I'm not fool enough to think that having to add a real address would stop scammers, but it would slow them down, as they'd have to find somebody else's real address to add. Furthermore, having a database of only real addresses (i.e. of high data quality) instead of the current hodge-podge based on trust, would enable analysis of the data to improve identification of potential criminal activity. The current whois database may be fit for ICANN's purposes, but data quality it ain't.
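The simplest level of validation mentioned above, postal code format checking, can be sketched in a few lines of Python. This is a hedged illustration only: the function name `postal_code_format_ok` and the three-country table are assumptions made for the example, the patterns check shape rather than whether a code really exists, and nothing here reflects how ICANN or any registrar actually works.

```python
import re

# Format-only patterns for the three countries in the examples above.
POSTAL_CODE_FORMATS = {
    "CA": re.compile(r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]"),  # letter-digit-letter digit-letter-digit
    "BR": re.compile(r"[0-9]{5}-?[0-9]{3}"),                # eight digits, usually written 99999-999
    "RU": re.compile(r"[0-9]{6}"),                          # six digits
}


def postal_code_format_ok(country: str, postal_code: str):
    """Return True or False for a known country, or None when no pattern is on file."""
    pattern = POSTAL_CODE_FORMATS.get(country.strip().upper())
    if pattern is None:
        return None  # unknown country: cannot judge, so do not reject
    return pattern.fullmatch(postal_code.strip().upper()) is not None
```

Under these patterns a six-digit code given for Canada and a five-digit code given for Russia would both be flagged. Returning `None` for countries without a pattern is a deliberate choice, so that an incomplete table never blocks a legitimate registrant.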

Sunday, July 5, 2009

Citroen and their Balkan confusion

I was sent the link here, which shows a scan from a publication which appears to come from the car manufacturers Citroen and to date from 2009.

All looks fine until one looks more closely at the Balkans on this map.

Slovenia has expanded southwards, absorbing Croatia proper (including the capital, Zagreb) and Istria. Croatia has become but a rump of its former self, but has absorbed about half of Bosnia-Hercegovina, which is but a memory, as it's nowhere to be found on this map.

Citroen have succeeded where Slobodan Milošević failed, and created a greater Serbia, including the Croatian area of Slavonia (and thus giving it a common border with Slovenia), Montenegro, Kosovo and a big chunk of Bosnia. In fact, only Macedonia is shown correctly, and even then the border has been drawn very approximately.

In fact, it looks as though somebody has taken a pre-1991 map showing Yugoslavia, and drawn in some borders where they think they might be, clearly more in hope than in expectation. Is a reminder needed of the number of people who died or were pushed out of their homes to fix the current borders of those countries? Should Citroen be thoroughly ashamed of publishing such a map?

Should road users everywhere be worried that the level of quality control during the manufacture of their cars matches that applied to their publications?

I do wonder how such errors get through the net - are people too lazy to reach for an atlas?

Thursday, June 4, 2009

There goes another one

My initial reaction to hearing that AddressDoctor was being taken over by Informatica was a sinking feeling - another one bites the dust.

Acquisitions of companies specialising in international name and address data are hardly new: Postalsoft (as First Logic) went to Business Objects before that was subsumed into SAP, who also snapped up Fuzzy! Informatik. QAS went to Experian and Global Address to Harte-Hanks.

For me, the best outcome of a takeover is that the acquired company is allowed to continue to run its business as independently of its owner as possible. Small companies tend to be more dynamic and react more quickly to change than their larger rivals. Global name and address data quality is a specialised business, and it requires the freedom to concentrate upon it without worries about the needs of a mother company.

The worst outcome is the absorption of the company into the mother company, so that it becomes a cog within a much larger corporate wheel. In my experience, when this happens the focus is lost, and, I have to say, the quality of the solution suffers. The needs of the larger company and its customers take precedence. I weep when a mature product disappears in this way. Even when it becomes part of the mother company's data quality suite, access to it is lost for the vast majority of small- and medium-sized companies who have the need for data quality but not the budget.

The vacuum created by the disappearance of a good global data quality company is not quickly or easily filled. Global address processing is an extraordinarily complex business. It takes years of knowledge collection, development of algorithms, trial and error before a product even approaches acceptable levels of quality in international data. There are quicker and easier ways to make money in the data world. It is eternally frustrating to witness this cycle of re-inventing the wheel as companies form, develop, are taken over and are absorbed into other companies, effectively disappearing.

I sincerely hope that the independent global data quality company, producing good and affordable solutions, doesn't become an endangered species.

Sunday, May 31, 2009

What about the rest of the data?

I'm currently reading a most worthy book about data quality, and, like most other books about data quality I've come across, its gaze is fixed completely on data held in large corporate entities. Large companies where data is amassed across myriad systems, current and legacy; where there is a separate IT department; where there are enough staff to create a data quality working group; where money is no object when it comes to tools to assess, process and cleanse that data.

But is that really where most of the world's data is held? Obviously it's where we most come in contact with it - when a large utility messes up its invoicing procedure, we know about it very quickly - but I would guess that more data is held in small spreadsheets and databases and documents in small- and medium-sized companies than is held in large corporations. I'm obviously not typical, but my own databases hold around 40 million records. In these small companies, there are unlikely to be distinct marketing or IT departments, a budget for data tools, or enough staff to create DQ teams.

Has anybody ever estimated how much data is held outside large corporations, or written about how they will go about improving their data quality? It must be the sun, as this is playing on my mind ...

Thursday, May 21, 2009

How Microsoft Access can damage your data quality

OK, so I've bitten the bullet. Much against my better judgement I'm making a concerted effort to learn Microsoft Access 2007, to allow me to produce some data tables with Unicode characters.

Plugging away, I found an interesting aspect of field input masks which is guaranteed to produce data quality issues. When adding a field mask to Access 2007, it "helpfully" provides a number of ready-made options:

As I have a UK edition of the software, Microsoft have helpfully provided a telephone number mask and a postal code mask that they think cover the UK. Looking at the postal code mask itself:

and we see how it is made up. '0' indicates a required digit, 'L' a required letter, and '>' forces upper case.

Now, in the UK, AA99 9AA is indeed a valid postal code format. It is, however, only one of seven valid formats:

A9 9AA
A99 9AA
AA99 9AA

Thus, whilst this mask can take OX19 6RY, it can't take OX9 6RY or SW1A 4WW or S1 1AA or N45 1AP or .... well, most of them.
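The gap can be checked mechanically. The Python sketch below compares a regular expression equivalent to the single AA99 9AA shape with a looser, format-only pattern for UK postal codes. Both patterns and the helper names `fits_mask` and `fits_uk_shape` are assumptions made for illustration: the first is not Access's own mask syntax, and the second checks shape only, does not confirm that a code exists and does not try to cover every special case.

```python
import re

# Regex equivalent of the single AA99 9AA shape described above.
MASK_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2} [0-9][A-Z]{2}")

# Looser, format-only pattern: one or two letters, a digit, an optional
# digit or letter, a space, then digit-letter-letter.
UK_SHAPE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}")


def fits_mask(code: str) -> bool:
    """True if the code has exactly the AA99 9AA shape."""
    return MASK_SHAPE.fullmatch(code.strip().upper()) is not None


def fits_uk_shape(code: str) -> bool:
    """True if the code has any of the shapes the looser pattern allows."""
    return UK_SHAPE.fullmatch(code.strip().upper()) is not None


# The sample codes from the text above.
samples = ["OX19 6RY", "OX9 6RY", "SW1A 4WW", "S1 1AA", "N45 1AP"]
rejected_by_mask = [c for c in samples if fits_uk_shape(c) and not fits_mask(c)]
```

Of the five sample codes, only OX19 6RY fits the mask shape, while all five fit the looser pattern.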

Microsoft may feel that they are being helpful by adding this sample mask, but we all know that programmers, like most of us, will take any route that makes their lives easy, and are unlikely to make any attempt to alter this input mask to make it valid for the UK, let alone valid for postal codes in every other country. And we know that this happens - my God, don't we all suffer regularly at the hands of forms designed like this? Many programmers would not even be aware that the mask is not valid for most UK postal codes - they trust that the software provider has done their homework.

Back to the drawing board, please, Microsoft. This is not helping in the fight for better data quality.

Friday, May 15, 2009


A slight move off topic, if I may. It's that time of the year again when the Eurovision Song Contest takes place, accompanied, as always, by the sounds of weeping, wailing and gnashing of teeth from the losers, who always try to find anything to blame apart from their own performances or the quality of their songs.

One of the most common complaints is the idea of block-voting, accompanied by a rather embarrassing lack of geographical knowledge. This year's Dutch song is doubtless very suitable for, and popular in, the local pubs of The Jordaan in Amsterdam, but is hardly material that will be popular elsewhere in Europe. Inevitably it failed to get through the semi-final. René Froger, one of the Dutch singers, immediately complained that this was due to block voting: "it's so strange that all the Balkan countries are through to the final", he wailed. And worst of all, no report I've seen pulled him up on it, seeding and strengthening the myth that everybody hates us/loves them.

Grabbing a pencil and a scrap of paper, it took me no time at all to work out that six Balkan countries indeed made it to the final. And five did not. Not quite "all", and certainly no evidence of the great conspiracy so many profess to see. He'd have done better to point to the Nordic block, or the Caucasus block, or the Mediterranean block. If anybody should complain, it's the Central Europeans, none of whom got through.

Back to school, René!

For a map of those through, check Wikipedia here

Thursday, May 14, 2009

Flying share

Another great example of how ignorance affects data quality from web forms (and can lose customers and money!) has come my way, this time courtesy of the site Flying Share. Flying Share are offering their users in the US, UK, Canada and Australia, via this form, a free USB drive.

All well and good until you get to the "ZIP" field (ZIP? Not outside the USA, good people of Flying Share - you probably mean "Postal code"). And there you find that the field will accept five characters and no more. So anybody in the UK or Canada wanting a free drive must either give up at this point, or provide a truncated code.

As drives are being sent out postally, there will clearly be a huge problem of undeliverable and returned drives. I know that this error has been pointed out to the company concerned, but no action has yet been taken, though correcting it would take seconds.

I do wonder at what point anybody at Flying Share will feel the need to act on this. Perhaps when the cost of returns reaches astronomical proportions? How many (potential) customers might they have lost before then?

Tuesday, May 5, 2009

I made a blog entry in March about Vietnam, where a postal code exists but without the populace knowing about it.

Two more examples caught my eye this week, in both cases where a country announced the introduction of a postal code system: Nigeria and the Dominican Republic.

In both countries a postal code exists - in Nigeria's case since 2000 - and in both cases this information has been placed at one time or another on the national postal authority's website. It is very interesting that in all of the comments placed in reaction to these news items up to this point, only one person suggested that they thought that a postal code system was already in place.

Regardless of the reasons why the existing postal code systems in either country had not been fully publicised before now, this is another indicator of how careful form designers need to be in their use of required fields. Nigeria may have a postal code system, but if nobody knows about it or uses it, requiring that field on a form will only lead to customer loss and data quality reduction.

For more information about web forms for an international audience, download the free e-book from here.

Friday, April 24, 2009

Free web form e-book

I released today my new book: “Better data quality from your web form - Effective international name and address Internet data collection”. The good news is that it is completely free to download and use, so go to to download it. Spread the word!

Web forms are a source of immense and continuous frustration for many people as the forms almost universally fail to take any account of variations in personal name and addressing conventions used throughout the world, so that customers have to struggle to clear the hurdles placed in their way by the forms. Reports suggest that almost 9 out of 10 customers have problems when attempting to carry out online transactions, and the result is a huge loss of income from lost custom paired with the collection of very poor quality data.

This e-book (in Acrobat pdf format), which is free to download and use, attempts to fill the gap left by most works about usability by concentrating on the experience that your international customers have with your form, and how it affects them, their relationship with you and your data quality.

It does this through many examples, without going into too much depth about the idiosyncrasies of international personal names and addresses and without technical discussions – links to get more information are provided where appropriate in the text.

Any feedback is most welcome.


Thursday, April 2, 2009

New product releases at GRC Database Information

We've been busy in March preparing updates for most of our products. These include:

- an update to the Global Sourcebook for Address Data Management

- a new release for many of our data tables: place name/postal codes, address elements, job titles and others

- a new version of our address parsing, standardisation and formatting software GRCTools.

More information on our website.

Saturday, March 7, 2009

Now you have your postal code system, how about telling us about it?

My attention was caught today by an online discussion here hosted by the English-language Vietnam News, about introducing a postal code system to Vietnam to resolve the problem of duplicate thoroughfare names within cities.

Of interest to me is that Vietnam has had a postal code system for years, updating it from a 5-digit code to a 6-digit code in 2004. Clearly, though the system has been designed, it has not been implemented to the extent of informing the residents.

This is not as unusual as you might think. Some countries - Bahrain and Nicaragua spring to mind - are coy about their postal code systems, even to the extent of requesting people not to use them because sorting continues to be done manually. I've had people from Costa Rica swear blind to me that their country has no postal code system, even after I've pointed them towards the postal website describing it. Clearly, whilst some countries are happy to design postal code systems, the expense of implementing them - mechanised sorting systems, information dissemination and so on - tends to put a brake on following the projects through.

Tuesday, February 17, 2009

For the love of web forms!

It may seem to you, faithful reader, that I have become a little obsessed with web data entry forms lately. Apart from these forms being the front end of company databases that we crash into most regularly, my obsession also reflects the fact that I am currently working on a book about .... web data entry forms!

But I am not alone in this obsession. My colleagues across at the Data Value Talk blog have decided to try to think positively and to collect some examples of GOOD web data entry screens. If they get enough, they'll open up a competition for the best examples.

So, if you come across any well-designed (or badly designed!) web data entry forms, take a screen dump and scoot along to the Data Value Talk blog - I, for one, look forward to seeing them.

Tuesday, February 10, 2009

Stop. Step back. Think.

Has creating web input forms become automatic? Does anybody ever test the forms they put online? Does anybody ever stop, take a step back, and think things through?

Yesterday, faced with yet another clone form expecting of me a typical United States' address, I came to the inevitable drop down requesting (no, demanding!) my state. The company gave no option to clear this field, nor to choose a non-US variant, so, as they had kindly defaulted to Alabama, I moved on.

For the next field, country, I chose "Netherlands". And look what happened:

Those who know me and my work will know that I have hammered on for years about making web forms dynamic, and that input fields should change on the basis of country and language. But why (oh why?), when a company follows this mantra, are the fields that cause this dynamic change ALWAYS added AFTER the fields that they change? Is it a conspiracy to annoy the customer? To make the customer work at their very hardest in order to buy the products?

Or is somebody just not thinking things through?
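The ordering point can be made concrete with a small Python sketch in which the country is always the first field and everything after it is derived from the answer. The names `REGION_LABEL`, `fields_after_country` and `build_form` and the two-country label table are hypothetical, chosen only for illustration; a real form would need a researched table of per-country fields.

```python
# Hypothetical per-country extras; a real form would need a researched table.
REGION_LABEL = {"US": "State", "CA": "Province or territory"}


def fields_after_country(country: str) -> list:
    """Return the ordered field labels to show once the country is known."""
    fields = ["Name", "Address"]
    label = REGION_LABEL.get(country.strip().upper())
    if label:
        fields.append(label)  # only shown where the country uses one
    fields.append("Postal code")
    return fields


def build_form(country: str) -> list:
    """Country comes first, so no later answer changes a field already filled in."""
    return ["Country"] + fields_after_country(country)
```

Because the country sits at the top, choosing the Netherlands never shows a state drop-down at all, and choosing Canada adds the province field below the point the customer has reached rather than above it.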

Wednesday, January 7, 2009

Data Quality: Perception versus Reality

A new white paper, "Data Quality: Perception versus Reality", by Graham Rhind and sponsored by Capscan Ltd, reporting on the results of a data quality survey, is available for download here