You'll know by now that the best way to achieve data quality is not by cleansing data after it has been collected - that's just an expensive way of mitigating the effects of poor data quality. You'll know that preventing data quality issues at source is more effective and, ultimately, more cost-effective.
There are hundreds of thousands of companies that continue to attempt reactive data quality cleansing rather than instigating preventative data quality, and that won't change in the short (or even medium) term. But when you're a company working within the data quality sphere you have to be very careful about how you manage your own data quality, because if you don't, ne'er-do-wells such as myself will be quick to pick up on it.
Informatica, a data quality company, posted a white paper - about data quality - here, and included with it a web form guaranteed to collect the worst quality data imaginable. (To me, a white paper is not free if I am expected to provide my information, which has value, in exchange for it - but that's another blog post ...). Now, don't get me wrong - I have nothing against Informatica as a company - they just seem (on this evidence) to have reached a size and structure that stops them concentrating on data quality in all parts of the company, with too many employees not understanding, or not being part of, the data quality focus.
A quick look at the form and we can see some of the issues. Though I am allowed to enter data from my address in The Netherlands, I am forced to add an American state (or a Canadian province or territory) with which to pollute Informatica's data. The field labels suggest that my name is written in the same way as that of most Americans, that is, given name first and family name last; and if I don't have a family name I'll have to make one up, because I can't leave that field empty. I must add a postal code, even if my country doesn't have one. And though they have correctly managed to lose "Serbia and Montenegro" from their country list, they have lost Montenegro in the process.
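To make the point concrete, here is a minimal sketch in Python of validation that only demands a state or postal code where the country actually calls for one. Everything in it is my own assumption for illustration: the field names, the single "name" field, the function name, and the two deliberately tiny country sets (a real form would need properly researched lists). It is not how Informatica's form works, nor a complete solution.

```python
# Illustrative subsets only (assumptions): a real form needs researched lists.
STATE_REQUIRED = {"US", "CA"}               # state / province / territory
POSTAL_CODE_REQUIRED = {"US", "CA", "NL"}   # postal code expected


def _filled(form, key):
    """True if the form has a non-blank value for key."""
    return bool((form.get(key) or "").strip())


def validate_contact(form):
    """Return a list of error messages; an empty list means the form is acceptable."""
    errors = []
    country = (form.get("country") or "").strip().upper()

    if not country:
        errors.append("country is required")
    # One free-form name field: no assumption about given/family order,
    # and nobody has to invent a family name.
    if not _filled(form, "name"):
        errors.append("name is required")
    # State and postal code are only required where the country uses them.
    if country in STATE_REQUIRED and not _filled(form, "state"):
        errors.append(f"state is required for {country}")
    if country in POSTAL_CODE_REQUIRED and not _filled(form, "postal_code"):
        errors.append(f"postal code is required for {country}")
    return errors


# A Dutch address passes without a state; a US address without one does not.
dutch = {"name": "Example Person", "country": "NL", "postal_code": "1234 AB"}
american = {"name": "Example Person", "country": "US", "postal_code": "10001"}
print(validate_contact(dutch))
print(validate_contact(american))
```

The design choice is simply that "required" is a property of the field in combination with the country, never of the field alone.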
When I pointed these errors out to Informatica they promised to recreate their forms, and they may be doing so; but it shouldn't take 15 days to stop a web form's "State" field being required. It is one of the most obvious and widespread errors any web form can make, and the cause of more form rage than anything else. I hope they manage to get it sorted before next Tuesday, when I am presenting on web forms to the DDMA in Amsterdam - I'd like to be able to show a success story.
So, dear reader, do yourself a favour. Prevent your data becoming polluted at source. Look at your web forms. I'll make it easy for you - download my free e-book "Better data quality from your web form - Effective international name and address Internet data collection" and learn how to avoid those common errors. And I don't even ask you to fill in a form to get it ...