Monday, December 12, 2022

Claim inflation

 

I’ve specialised in international data and its management for 30 years now, and everything I learn (which I still do daily) reinforces the fact that the foundation anybody needs for successful international data management is knowledge. Knowledge, that is, of the world, its systems, conventions, languages, cultures and ways. Knowing how to code brilliantly is unhelpful unless you know what you need to achieve, and knowledge is essential when choosing a partner to help you with your international data quality.

I recently came across a provider of international address validation which claimed to support “250+ countries”. Defining what a country is is not as straightforward as you might suppose. It depends on who you are, where you are, and your political background. There are unrecognised de facto countries and non-existent de jure countries. Even so, however liberal your definition, you would not get anywhere near 250. If you’re counting the more accurate “countries and territories” then you’d get closer, but 250 remains claim inflation. There was a time when every address validation company was trying to outdo the others with country support number inflation. One supported 240, so the next claimed 250, and one even went for 300 plus, which is just ludicrous. This had calmed down, and I rather hope that this new claim is not the start of a new round of unsupportable claims. The company claiming 250+ includes uninhabited rocks (they may have an ISO code, but there are no addresses to validate) and non-existent political entities such as Antarctica. Check the claims in more detail, and they become more preposterous – they claim validation to postal code level even for countries and territories which do not have postal codes.

I would feel better about seeing claims like this if I thought that most people dealing with international data were well enough informed to be able to go to this company and say “you claim to support more countries than there are, how can we be expected to trust you with our data?” This wouldn’t have to happen often for providers of these services to sober up and start telling the truth. The company concerned claims 2800+ customers, including many large companies which should understand addresses. I understand the pressures that companies put themselves under to market and sell their products, but claims need to be based on truth. I did contact the company to ask about this – I received no response. If more people working with international data would educate themselves better in … international data … then that data would be better managed, cleaner and better governed. Let’s hope that things improve in the next 30 years.

Sunday, October 23, 2022

Yes, but Google ...

 

Yes, but Google …

That’s the start of so many sentences that I hear and read, and the prologue to having to explain, over and over, that the mighty Google is as prone to errors as other companies and has to follow the same data paradigms. Google is very good at some of what it does. In other fields it is average or, if I dare blaspheme, it is poor. Yet Google is constantly being held up as the arbiter of everything that is correct. If Google says it, it must be so. It is the law, even in aspects as esoteric as language translation. In some cases this is just ignorance. In other cases organisations know that Google is wrong, but follow anyway because they make a commercial decision that they cannot go against the direction of the unstoppable machine that is Google.

This is a worrying trend which goes against the dictates of data quality.

Every database contains errors. Every database contains duplicates. Every database. Including Google’s. Google also lacks knowledge, or lacks the ability or desire to apply knowledge, in many areas. Problems may be the result of poor data management practices, of which Google is the victim just as much as anybody else, and of the perennial and ubiquitous problem of lack of knowledge or lack of motivation to acquire the required knowledge.

Thinking very specifically now of Google Maps, at the time of writing you may see a lot of duplicate information where they have merged sources and been loose with their de-duplication. That single electric vehicle charge point at my local railway station? Google shows three. Those multiple building numbers on Hawaiian buildings on their maps? Duplication, because Google doesn’t have or apply the available knowledge about their format, so doesn’t realise that 91-123, 123 and 91123 are all the same building. The failure of Google to find addresses in the borough of Queens in New York? Again, a failure of knowledge about local variations in address systems. And, more often than not these days, the format of addresses displayed in Google Maps for many countries is demonstrably incorrect for that country.
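To make the Hawaiian example concrete, here is a minimal Python sketch of how that knowledge might be applied during de-duplication. It is only an illustration, not how Google or any address product works: the function name `local_building_number` and the `zone_prefix` parameter are hypothetical, and the sketch assumes you already know the area prefix for the street (here "91"). An unhyphenated number that merely happens to start with the prefix would be wrongly shortened, so matches should be treated as candidate duplicates for review rather than as proof.

```python
import re


def local_building_number(raw: str, zone_prefix: str) -> str:
    """Reduce a building number to its local part for duplicate matching.

    zone_prefix is a hypothetical input: the area prefix the caller
    already knows for the street in question (for example "91").
    """
    compact = re.sub(r"\s+", "", raw)

    # Hyphenated form, e.g. "91-123": the part after the hyphen is the local number.
    hyphenated = re.fullmatch(r"(\d+)-(\d+)", compact)
    if hyphenated:
        return hyphenated.group(2)

    # Run-together form, e.g. "91123": assume the known prefix was prepended.
    if (
        compact.isdigit()
        and compact.startswith(zone_prefix)
        and len(compact) > len(zone_prefix)
    ):
        return compact[len(zone_prefix):]

    # Anything else (e.g. "123") is returned as written.
    return compact


variants = ["91-123", "123", "91123"]
keys = {local_building_number(v, "91") for v in variants}
print(keys)  # a single key means the three listings are candidate duplicates
```

Comparing on the reduced key, together with the street name, would flag the three map entries as one building instead of three.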

That’s how things are now, and Google does change things around a lot so these aspects may no longer be an issue as you read this. Instead, other problems will pop up. Because Google makes mistakes, just like anybody else. What really worries me, though, is how people can’t see, or can’t accept, that Google is anything but perfect. Will Google’s errors cause institutions to start formatting addresses the wrong way, because “Google”? I hope not. In the meantime, I shall keep plugging away and explaining, every time I hear “but Google …”, that Google has a long way to go before they reach omnipotence in knowledge and its application. It’s not even close. So, please spare me the “Yes, but Google …”