Friday, October 17, 2014

Is the IAIDQ dead?

A recent post by Daragh O Brien about the International Association for Information and Data Quality (IAIDQ) and its future got me thinking.  I’ve never been deeply involved in the IAIDQ, unlike Daragh, but I was also a charter member, and I have experienced a definite change recently. Or, perhaps, experienced that I was no longer experiencing anything, if you get my drift.

Many of the people I knew who were involved in the early years of the IAIDQ have retired or moved on, and requests to me for information, articles and so on from the current leadership have dropped to nothing. Which may not be surprising, as they probably don’t know me from Adam. Indeed, as Daragh points out, a new edition of the Journal has become as rare as a web form which can correctly collect address data from more than one country (i.e. almost non-existent!). Members used to have a vote on members of the committee – that seems to have been quietly dropped too. 

I know that Daragh won’t agree, but I began to be concerned when the organisation started to busy itself creating an “Information Quality Certified Professional” (IQCP) qualification. What’s it for? I am a firm believer in educating people about data quality, but I don’t see how a qualification is a useful part of that apart from filling up space on a CV. I have no idea what the qualification entails – my services when it was being formulated weren’t required – but my impression is that it deals essentially with theory and not practice.  And it’s clear to me that those at the doing end of this data quality thing have no better understanding of data quality and how to achieve it than they did ten years ago. In fact, as businesses perceive that they need to obtain and manage ever larger amounts of data, even though often they don’t, accuracy and quality are diminishing – gather enough data and take a swipe at it, and you’ll hit a few targets on the way. Maybe the IQCP qualification is useful for some, and shouldn’t be harmful for others, but it does seem to me to have become the central focus of an organisation that should be doing more than counting the number of paying students they can hustle through an exam.

I don’t know if the IAIDQ is dead. I’m not close enough to it and they seem not to want to be too close to me. But one thing I do know. Much earlier this year I received an e-mail requesting that I renew my membership. Instead of immediately doing so, I cogitated on what I was getting out of the IAIDQ (nothing I could think of) and so, in straitened times, I decided to put off the decision on whether to renew until they sent a reminder.

I’m still waiting.

If an organisation dedicated to data quality can’t manage its own data, doesn’t that say something?

Wednesday, September 17, 2014

Sod's law

A recent blog post from a data quality company, which shall remain nameless, was in the form of a quiz. Could the reader spot the errors in the address formats from various countries?  The idea was good – trying to get the reader to appreciate that these differ throughout the world, whereas most companies think they are the same for everybody.  Unfortunately Sod’s Law intervened – the “corrected” addresses were mostly not, and the post was full of inconsistencies.  A quiet tweet in their direction and the post was removed. 
And that was the end of that. 

Except that this is a very common topic for blogs from data quality experts and providers alike. “Look”, they say, “this is wrong, and this is right, and we help you get from the state of being incorrect to the state of being correct.” 

Again, all good stuff. But there are a couple of important points which I rarely see addressed in those posts.

Firstly, correct according to whom? According to the local postal services? According to the local government? According to the emergency services? According to the bible of Graham Rhind? According to local cultural norms?  Although postal services are often the managers of street address files, they may not originate that data, and increasingly alternative resources, such as land registries, are becoming available to use instead. Often “correct” is taken to mean the form that an address takes in the local postal address file, if one exists. Those files are often held for a single purpose – to facilitate the efficient handling of mail – and, as postal authorities face the same problems of data quality and management as the rest of us, they may differ substantially from how the local populace actually write those addresses.  The data may, for example, be stored only in capital letters, without punctuation and without diacritical marks.

The fact that postal address files are used primarily for mail delivery brings me to the second point I miss when companies talk about what they are able to do – what is the address to be used for?  The blog post I mentioned suggested, for example, that an arrondissement (district) of Paris should be added to a French address and a county added to a British one.  We know that a county isn’t required in a UK address used for mailing, provided the postal code is there, and an arrondissement is not a requirement in a French address on a letter, especially as that information is also included in the postal code.  But if that address is being used in a travel guide, or on a website to show a business’s location, or to provide a route description for a person, then the additional data improves the usefulness of the address information and won’t be wrong in the address unless the different pieces of information don’t match (for example, if the wrong county information is provided).

I look forward to blog posts and articles about address data glitches. But is it time to move on from postal address files being regarded as the (only) holders of the golden record?

Thursday, July 10, 2014

Do I want online advertising to be relevant to me?

I recently read an article in Database Marketing Magazine by Paul Kennedy about the myths and reality of data (online version here). In it Kennedy suggests, and I paraphrase, that consumers would rather see offers and advertising online which are of relevance to them than generic advertisements, a point often made. Is this assertion true?

I don’t have any figures which support or refute this, but naturally the answer to a question depends on the question being asked. I suspect that given a choice most people would simply rather see less or no advertising than relevant advertising, or would rather see advertising of any type which is easier to distinguish from content than what is currently on offer. But most people also understand that the current financial model for online content is to provide it for “free”, paid for by advertising and often in exchange for people’s personal data. Without advertising the larger online companies wouldn’t be so rich and those of us with a smaller online presence wouldn’t still be in business.

Regardless, I’m not one of those who wants to see relevant advertising.  And I’ll tell you for why.

When I receive mail, or an e-mail, from a company, then I like the offer to be relevant to me, to be of interest, because I am offended by the waste involved, in time and resources, when it isn’t. But when it isn’t I can easily take action. I can dispose of the communication, which is a separate unit which I can choose to pick up and read when I want to, or discard, and then forget about. In many countries legislation exists which would allow me to turn these communications off. When the advertising block starts on the TV, I can turn it off, turn the sound down, or walk away for the duration. The advertising is isolated from the content (though increasingly less so), and that gives me, the consumer, the power of choice.

Upselling in mailings, such as with orders or statements, has been around for a while, but at least it is generally in distinct units – I can discard the guff and concentrate on the content. Up to now no company has tried to upsell to me on the same piece of paper as the invoice etc. with which it was enclosed, and let’s hope that that doesn’t happen.

Online advertising is different.  It is pervasive and invasive. It doesn’t form a separate unit which I can view or ignore, as appropriate.  It is woven into any content that I have actively sought out, and it is becoming increasingly difficult to distinguish as advertising. It is intrusive and sometimes so invasive that its purpose is defeated. A well-disguised audio advertisement on a page will have me backing out of that page as fast as my mouse can reach the button, surely to the detriment of the content provider. No legislation exists to allow me to view my content without advertising. It’s the equivalent of being sent a bank statement and then trying to find and view my account balance amongst the advertisements for fast cars and Ukrainian mail-order brides.  Unthinkable offline, but run of the mill online.

Online advertising is often dishonest. It lies or disguises itself as content to attract my click which, whilst profitable in the short-term for the pay-per-click provider, won’t help a brand in any way in the consumers’ eyes. I’ve seen pop-up advertisements in mobile apps with either no close button or one which is so small that a human finger will often miss it.

Emphasis on what adverts are shown is placed on the person viewing a page, which is why online advertisers are so keen to find out all they can about you and me. Why there isn’t more emphasis on the content we are looking for and looking at is a mystery to me.  If I’m looking at a page of reviews for hotels in London, then advertisements for hotels in London would probably be a better bet to get my click than ones trying to sell me a lawnmower. Once I leave those hotel pages and move on, though, I don’t want to be continuously subjected to adverts for hotels in London – that was then. I’ve moved on. Shouldn’t the advertising move on with me?

When I go online to look for something, a new watch for example, then I would like to see information about watches when I’m looking for it. Just as I would choose to go to a jewellers to find a watch when visiting my nearest shopping centre. Once I’ve left that shop/search, though, do I still want to be constantly marketed to about watches? Do I want to read about watches when I’m shopping for a fire extinguisher, or reading the news, or chatting to friends? Why would I welcome that distraction? Fine to see something while I’m looking for that product – it’s fair game that, if I’m looking for a watch you want me to buy yours – but afterwards? There are tracking cookies, more like stalking cookies actually, which keep presenting the items you viewed in one site on other pages you might visit.  Amazon does this.  It’s like walking out of the jewellers and having somebody follow you shouting a constant refrain of “BUY THE WATCH!  BUY THE WATCH! YOU KNOW YOU WANT TO! BUY THE WATCH” until you either give in or, like me, find and change the tracking preferences for that retailer.

So, as advertising is there and isn’t going away, do I want the advertising I see online to be relevant and “interesting” to me, in the same way as with direct mail?


I’m clearly not the target of most online advertising, which is aimed at people who are as lax with their purse strings as they are with their personal data, but I don’t want online advertising to be relevant to me because, if I can’t choose whether and when to view it, then I’d like to be able to block it out as easily as possible. Whilst the pages I view are full of advertisements for cars, singles matching sites, holidays in the sun, football tat and flat rentals, in language(s) I don’t speak and none of which have any relevance to me at all, I can concentrate on the site’s content without distraction. This also reassures me that either companies haven’t got much personal data about me, or they don’t know how to use it. Either way, that’s fine by me!

You may have it. But do you know it?

A country may have a postal code, but is it used? Do people know there is a code system, and, if they do, what their code is? And when is it safe to make "postal code" a required field for forms for that country? Read more here.
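As a rough illustration of the question above, here is a minimal sketch of a form check that only insists on a postal code where a rule table says it is safe to do so. The rule names, the `postal_code_errors` function and the country codes are all assumptions made for the example: "XA", "XB" and "XC" are placeholder codes from the ISO 3166 user-assigned range, not real countries, and a real table would have to be researched country by country.

```python
from enum import Enum


class PostalCodeRule(Enum):
    REQUIRED = "required"  # a code system exists and people know and use their code
    OPTIONAL = "optional"  # a code system exists but is not widely known or used
    ABSENT = "absent"      # the country has no postal code system


# Hypothetical rule table. The keys are placeholder country codes, not real data.
RULES = {
    "XA": PostalCodeRule.REQUIRED,
    "XB": PostalCodeRule.OPTIONAL,
    "XC": PostalCodeRule.ABSENT,
}


def postal_code_errors(country: str, postal_code: str) -> list[str]:
    """Return validation errors for the postal code field of a form.

    Countries missing from the table fall back to OPTIONAL, so the form
    never blocks a customer on the basis of a guess.
    """
    rule = RULES.get(country, PostalCodeRule.OPTIONAL)
    if rule is PostalCodeRule.REQUIRED and not postal_code.strip():
        return ["postal code is required for this country"]
    return []


print(postal_code_errors("XA", ""))  # required and missing: one error
print(postal_code_errors("XB", ""))  # optional: no errors
```

The default for unknown countries is deliberately the lenient one: wrongly demanding a code forces people to invent one, which does more damage to the data than an empty field.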

Monday, June 23, 2014

Worts' Causeway

The potential for data to be corrupted and polluted increases as it gets passed through interfaces and contact points, and as it passes from process to process and from system to system. This makes data hard to keep clean. Those of us in the data quality world often hammer at the point that getting the data right at source is the ideal for a high level of data quality. But not all data is correct or standardised at source. When the originator of that data, as much prone to data quality defects as the rest of us, can't decide on what form it takes, what chance is there of getting it right in your systems?

Read more in my blog post here.

Sorry, you are an invalid character

In a bid to promote equality, and to bring the country more into line with other European countries, a proposed new law in Belgium would automatically assign a baby the surnames of both the father and the mother (in that order) instead of only the surname of the father. However, the parents may also choose to give the child the surname of either of the parents, or to have the mother's surname precede that of the father.

Read more in my blog post here.

And your point is?

The costs of data quality failures are often very difficult to quantify, and, given the way that businesses operate, being unable to put a price tag on these failures usually means that they are given a low priority by organisations. Recently Amsterdam City Council made a data quality error which illustrates just how much they can cost.
Read more in my blog post here.

Wednesday, January 8, 2014

Are search engine policies creating a second class internet?

My main company website has been online now for almost 20 years. My motto has always been to keep things simple and honest, and it hasn't done me any harm. The site appears high up in most search engines (depending on the search terms) and I have an average Google PageRank.

I'm often asked to add links to help other sites up their search engine results, but I never do. I like to think of my sites as free resources, a place for the discerning visitor to use to get some quality information. They are not designed to manipulate visitors, be they human or bots, in any way. Whilst this may cost me some visitors in terms of numbers, I like to think that more of those that do come are looking for something rather than just clicking through.

Recently, though, I've been noticing a worrying search engine listing obsession which is threatening to turn much of the web into a second-class backwater. Sometimes, when one person from a company has asked me to add a link to their company, a few days later another asks me to remove it because it might negatively affect their search rankings. This has recently happened with a company with whom I have previously had very good relations and who directly benefits from being named on my site. Whilst the webmaster who requested the link removal won't know me from Adam, you can be sure that the CEO knows me well. When people ask my advice about companies working in the data quality field, I point them to a page with links.  If I were to remove the links those companies would get no referrals from me.

A shot in their own foot.

So why the obsession with numbers instead of quality?

The worrying aspect in the latter case was the threat to bad-mouth me to the search engines by posting a disavow report to them, which would have grouped my site with spammers, link farmers and those trying to manipulate search engine results in shady ways.  It appears that these companies are trying to associate themselves only with the highest ranking sites on the internet, and will do this by pushing the rest, by fair means or foul, out of the way and down the listings.

As you can tell, I'm not happy about this. But I still refuse to stop being honest and open with my sites. I don't add links to increase their rankings or mine, and I don't remove links for those reasons either.  Isn't there room on the internet any more for high quality sites with smaller numbers of high quality visitors?  What do you think?

Obamacare: a lesson in data entry design

The Patient Protection and Affordable Care Act health insurance system (better known as Obamacare) currently being rolled out in the United States is about as complex a project as any data professional is ever likely to face. State-specific portals must be consolidated into a federal data hub, and data from that hub is validated against a number of other data resources, including those holding social security, justice, security and financial data for each individual. Unfortunately, its implementation has been plagued by problems, widely reported and too often experienced by the people trying to register with the system.  Servers have crashed, websites have crawled, security has failed, coding has been poor.

Read more in my blog post here.

The politics of postal codes

Let’s face it, I’m a geek. After more than 20 years of minutely studying address systems, I still come across new information and it still interests me.  For example, you wouldn’t think that politics would affect postal codes, would you?  But in many parts of the world, it does.

Read more in my blog post here.

Out and outliers

Ms Keihanaikukauakahihuliheekahaunaele was an outlier. Her 36-letter Hawaiian surname couldn’t be entered into state data files or reproduced on her ID card or driving licence – it had to be truncated.  Her frustration was shared by other people whose cultures provide them with names or other details which systems designed for other cultures can’t deal with: Ms Pontes da Costa Granja James y Savill, Karl-Theodor Maria Nikolaus Johann Jacob Philipp Franz Joseph Sylvester Freiherr von und zu Guttenberg, Siddig El Tahir El Fadil El Siddig Abderrahman Mohammed Ahmed Abdel Karim El Mahd (better known as the actor Alexander Siddig) and His Imperial Highness Prince Şehzade Nazım Ziyaeddin Nazım Osmanoğlu (comedian Naz Osmanoglu), to name but a few. Ulrika Örtegren-Kärjenmäki, whose name is hardly gargantuan, was refused a flight because her name would not fit onto Ryanair’s boarding pass, leaving aside the issue of the confusion caused at security by the diaereses over the letters in her name.
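To make the problem concrete, here is a minimal sketch of the two system limitations described above: a fixed-length name field and a character set restricted to ASCII. The function names and the 30-character limit are assumptions chosen for illustration. They are not taken from any of the systems mentioned.

```python
import unicodedata


def fits_field(name: str, max_chars: int) -> bool:
    """Return True if the name fits a field limited to max_chars characters."""
    # Normalise to NFC first so an accented letter counts as one character.
    return len(unicodedata.normalize("NFC", name)) <= max_chars


def ascii_only(name: str) -> str:
    """Simulate a legacy system that silently drops anything outside ASCII."""
    # NFD splits a letter such as "ä" into "a" plus a combining mark.
    # The encode step then discards the mark, and any letter with no ASCII base.
    decomposed = unicodedata.normalize("NFD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# A hypothetical 30-character surname field is too short for this name.
print(fits_field("Keihanaikukauakahihuliheekahaunaele", 30))

# Diacritics are stripped, so the stored name no longer matches the passport.
print(ascii_only("Örtegren-Kärjenmäki"))

# The dotless "ı" has no ASCII base letter, so it disappears altogether.
print(ascii_only("Nazım"))
```

Neither function fixes anything. Both simply show how a design decision made for one culture's names quietly damages another's. The safer design is to store names as full Unicode text in a field generous enough not to need truncation.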

Read more in my blog post here.

The balancing act between business and customer

Anybody in business would have come across those companies whose internal procedures make buying from or selling to them a daunting undertaking.  Quotes have to be formatted just so, invoices laid out exactly thus, purchase orders routed through these channels and not those.  These procedures have the positive effect of enabling the company to control its finances and spending better. They have the negative effect of reducing its ability to do business.

Read more in my blog post here.

Faster to your Doormat with the Correct Format!

A statement from my British bank arrived recently with a label covering the address window.  The label, added by Deutsche Post, explained that the address used by the bank had been formatted in such a way that it could not be machine read and sorted (and, by implication, explaining why it may have been delayed).

Read more in my blog post here.

Dynamism and vanity

The need for organisations to collect, store and maintain consistent and accurate data cannot be overstated. Though it would seem self-evident that certain types of data are either correct or incorrect, accurate or inaccurate, in many cases a variation in data may be influenced by human perception or cultural and linguistic background, so that data referring to the same physical entity may be expressed in a number of ways, none of which are wrong.

Read more in my blog post here.

When the Golden Record is Tarnished

Golden records, single customer views, call them what you will, are the El Dorado for many organisations struggling with large amounts of data from multiple sources.  They’re a great asset when they’re accurate, but can cause a lot of problems in downstream data quality when they’re not.

Read more at my blog post here.

Are You in Debt to Data Quality?

Data quality should be a core concern for any business, but for many companies it isn’t – any success that those companies have could often be multiplied many times by attending to their data quality issues.
But for financial institutions, such as banks and insurance companies, the luxury of ignoring data quality is not there. They live or die by the quality of their data – its accuracy, completeness and currency.
Read more in my blog post here.