Tuesday, February 21, 2012

Mouthfuls of common sense

I try to avoid getting into discussions about company structures and spoon-feeding executives with mouthfuls of common sense (known by the cognoscenti, I think, as getting C-level buy-in).  If you're going to pay me a couple of mill plus another couple as a bonus, regardless of my success rate, I'll pop over and sort out your company for you.  Otherwise, you're on your own!

But occasionally I come across something that does show how internal structures and communication can sound the death knell for data quality.  I was reading a technical book (published 2011), which will remain nameless.  As will its author, for whom I have high regard, and I don't plan to bad-mouth him for a couple of pages of loose advice.

His book explains to IT staff how to manage certain aspects of international data.  Then he gets on to postal addresses, and makes a number of recommendations that will make the blood of most non-technical staff run cold.  I paraphrase:

  • Postal addresses are only used for sending post [!]...
  • ... so, as validation is so difficult, don't bother with it [!!] - just collect them as a long text string.
  • But force an upper limit on the number of characters you allow - who needs the hassle of working with those long-winded addresses [!!!],
  • and strip out all those nasty accents, as they can only cause issues in your old, legacy mainframe ...
  • And getting country name drop downs correct is such a hassle [true] that you should just allow a free text box to collect a country name [eek].

What he didn't make clear was that his advice applies only to addresses collected for the purpose of sending post, and not for other purposes.  What is clear is that this is the approach taken by a good many IT staff when faced with the challenge of international data.  They are trying to fit the data to their tools and hardware instead of looking at what is required to accommodate the data to be collected.
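
To give a feel for what stripping accents costs, here is a minimal sketch in Python.  It is my own illustration and not anything from the book: the function name strip_accents is made up, and it assumes the common approach of decomposing each character and throwing away the combining marks.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD).

    Hypothetical helper for illustration only: this is one common way
    that accents get stripped, not a recommended practice.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Münster and Munster are two different German towns.
# Once the accents are gone, the data can no longer tell them apart.
print(strip_accents("Münster") == "Munster")

# Letters such as ø have no decomposition, so they pass through unchanged
# and the "cleaned" data is not even consistently accent-free.
print(strip_accents("Tromsø"))
```

The point of the sketch is that the stripping is both lossy and inconsistent, which is a poor trade for keeping an old mainframe happy.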

Never mind about executive buy-in - let's work at educating our staff in what they need to know not just to get the job done, but to get the job done well.

Tuesday, February 7, 2012

Blind Angel Egg The Dog

A little aside on the topic of linguistics - sort of.  I could think up some parable linking this to data quality, but I'll leave that to you.

Languages vary a lot.  In my mother tongue, English, we separate words with spaces.  In my second language, Dutch, words are grouped together into long strings.  These strings sometimes need a little time to decipher.

On a metro station a few days ago a poster caught my eye, especially the word BLINDENGELEIDEHOND.  I didn't immediately recognise it, so I automatically started splitting up the string in my head.

BLIND|ENGEL|EI|DE|HOND

Blind Angel Egg The Dog.  Sounds great, but it doesn't make a lot of sense.  Except that the poster has a Labrador puppy on it, so maybe the dog part is close.



Let's try again.

BLIND|EN|GELEI|DE|HOND

Blind and Jelly The Dog.  No, that doesn't make sense either.

BLINDEN|GELEI|DE|HOND

Blinds Jelly The Dog. No, not getting any warmer.

BLIND|EN|GE|LEI|DE|HOND

Blind And You Slate The Dog.  With a Flemish accent. No no no, unless somebody was on drugs when they made the poster.

Oh, hang on ....

BLINDEN|GELEIDEHOND

Guide Dog For The Blind!

It's not just me.  I know quite a number of people who see

BOMMELDING

and read BOMMEL|DING (something that putters along, like an old diesel locomotive) instead of BOM|MELDING (bomb alert).
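
For the programmers among you, what my head was doing on that platform can be sketched in a few lines of Python.  This is a rough illustration only: the function name segmentations and the tiny hand-picked VOCABULARY are my own assumptions, and a real word splitter would need a proper dictionary and some way of ranking the candidates.

```python
def segmentations(text, vocabulary):
    """Return every way to split text into words found in vocabulary."""
    if not text:
        # An empty string has exactly one segmentation: no words at all.
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocabulary:
            # Keep this word and segment whatever is left over.
            for tail in segmentations(text[end:], vocabulary):
                results.append([head] + tail)
    return results


# Hand-picked word list covering only the examples in this post (assumption).
VOCABULARY = {
    "BLIND", "BLINDEN", "ENGEL", "EN", "EI", "GE", "LEI", "GELEI",
    "DE", "HOND", "GELEIDEHOND", "BOM", "BOMMEL", "DING", "MELDING",
}

for word in ("BLINDENGELEIDEHOND", "BOMMELDING"):
    for split in segmentations(word, VOCABULARY):
        print("|".join(split))
```

The sketch finds every split that the word list allows, sensible or not.  Picking the one that actually means something is the part that took me until the train was nearly there.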

Well, it kept me amused until the train arrived!