A recent blog post from a data quality company, which shall
remain nameless, was in the form of a quiz. Could
the reader spot the errors in the address formats from various countries? The idea was good: to get the reader
to appreciate that address formats differ throughout the world, whereas most companies
assume they are the same for everybody.
Unfortunately, Sod’s Law intervened: most of the “corrected” addresses were
not correct either, and the post was full of inconsistencies. A quiet tweet in their direction, and the post
was removed.
And that was the end of that.
Except that this is a very common topic for blogs from data
quality experts and providers alike. “Look”, they say, “this is wrong, and this
is right, and we help you get from the state of being incorrect to the state of
being correct.”
Again, all good stuff. But there are a couple of important
points which I rarely see addressed in those posts.
Firstly, correct
according to whom? According to the local postal services? According to the
local government? According to the emergency services? According to the bible of Graham Rhind? According to
local cultural norms? Although postal
services often manage street address files, they may not originate
that data, and alternative sources, such as land registries, are
increasingly available to use instead. Often “correct” is taken to mean the form
that an address takes in the local postal address file, if one exists. Those
files are often held for a single purpose – to facilitate the efficient
handling of mail – and, as postal authorities face the same problems of data
quality and management as the rest of us, they may differ substantially from
how local people actually write their addresses. The data may, for example, be stored only in
capital letters, without punctuation and without diacritical marks.
The fact that postal address files are used primarily for
mail delivery brings me to the second point I miss when companies talk about
what they are able to do – what is the
address to be used for? The blog
post I mentioned suggested, for example, that an arrondissement (district) of
Paris should be added to a French address and a county added to a British
one. We know that a county isn’t
required in a UK address used for mailing, provided the postal code is there,
and that an arrondissement isn’t required in a French address on a letter,
not least because that information is already included in the postal code. But if that address is being used in a travel
guide, or on a website to show a business’s location, or to provide a route description
for a person, then the additional data makes the address
information more useful, and it isn’t wrong unless the different pieces of
information contradict each other (for example, if the wrong county is
given).
I look forward to more blog posts and articles about address data
glitches. But is it time to stop regarding postal address files as
the (only) holders of the golden record?