Wednesday, September 17, 2014

Sod's law

A recent blog post from a data quality company, which shall remain nameless, was in the form of a quiz. Could the reader spot the errors in the address formats from various countries? The idea was good – trying to get the reader to appreciate that these differ throughout the world, whereas most companies think they are the same for everybody. Unfortunately Sod’s Law intervened – most of the “corrected” addresses were not correct, and the post was full of inconsistencies. A quiet tweet in their direction and the post was removed.
And that was the end of that. 

Except that this is a very common topic for blogs from data quality experts and providers alike. “Look”, they say, “this is wrong, and this is right, and we help you get from the state of being incorrect to the state of being correct.” 

Again, all good stuff. But there are a couple of important points which I rarely see addressed in those posts.

Firstly, correct according to whom? According to the local postal services? According to the local government? According to the emergency services? According to the bible of Graham Rhind? According to local cultural norms? Although postal services are often the managers of street address files, they may not originate that data, and increasingly alternative resources, such as land registries, are becoming available to use instead. Often “correct” is taken to mean the form that an address takes in the local postal address file, if one exists. Those files are often held for a single purpose – to facilitate the efficient handling of mail – and, as postal authorities face the same problems of data quality and management as the rest of us, they may differ substantially from how the local populace actually writes its addresses. The data may, for example, be stored only in capital letters, without punctuation and without diacritical marks.
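To make that last point concrete, here is a minimal Python sketch of what folding an address line into that kind of "postal file" form might look like: capitals only, no punctuation, no diacritical marks. The function name and the sample address are my own, for illustration only, and no real postal file is guaranteed to follow exactly these rules.

```python
import unicodedata


def to_postal_file_form(line: str) -> str:
    """Fold an address line to capitals, without punctuation or diacritics.

    Illustrative only: real postal files each have their own conventions.
    """
    # Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", line)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit or space with a space.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    # Upper-case and collapse runs of whitespace.
    return " ".join(cleaned.upper().split())


print(to_postal_file_form("12, rue de l'Église"))  # 12 RUE DE L EGLISE
```

The folding throws information away: the form people actually write cannot be recovered from the stored form, which is one reason the file is a poor judge of what is "correct". Letters that have no decomposition, such as "ø", pass through this sketch unchanged apart from capitalisation.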

The fact that postal address files are used primarily for mail delivery brings me to the second point I miss when companies talk about what they are able to do – what is the address to be used for? The blog post I mentioned suggested, for example, that an arrondissement (district) of Paris should be added to a French address and a county added to a British one. We know that a county isn’t required in a UK address used for mailing, provided the postal code is there, and an arrondissement is not a requirement in a French address on a letter, especially as that information is also included in the postal code. But if that address is being used in a travel guide, or on a website to show a business’s location, or to provide a route description for a person, then the additional data improves the usefulness of the address information. It won’t be wrong in the address unless the different pieces of information don’t match (for example, if the wrong county information is provided).
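The "don't match" test is the part that can be automated. As a rough Python sketch, assuming the usual scheme in which Paris postal codes 75001 to 75020 carry the arrondissement in their last two digits (with 75116 also used for part of the 16th), a stated arrondissement can be checked against the postal code. The function names are hypothetical, and CEDEX and other special codes are deliberately left as "cannot tell".

```python
from typing import Optional


def paris_arrondissement(postal_code: str) -> Optional[int]:
    """Return the arrondissement implied by a Paris postal code, or None."""
    code = postal_code.strip()
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        return None
    if code == "75116":  # second code used within the 16th arrondissement
        return 16
    if code.startswith("750"):
        number = int(code[3:])
        return number if 1 <= number <= 20 else None
    return None  # not a standard Paris code (assumption: CEDEX etc. ignored)


def arrondissement_matches(postal_code: str, stated: int) -> Optional[bool]:
    """True/False if the stated arrondissement can be checked, None if not."""
    implied = paris_arrondissement(postal_code)
    if implied is None:
        return None
    return implied == stated


print(arrondissement_matches("75008", 8))  # True
print(arrondissement_matches("75008", 9))  # False
print(arrondissement_matches("69001", 1))  # None
```

Returning None rather than False when nothing can be derived keeps "extra information that cannot be checked" separate from "extra information that is wrong", which is the distinction that matters here.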


I look forward to blog posts and articles about address data glitches. But is it time to move on from postal address files being regarded as the (only) holders of the golden record?