Data cleansing is a process whereby missing or invalid data is corrected before it is loaded into a data warehouse for reporting.
This process is normally carried out using data quality software, such as that offered by Trillium Software or, for address checking and verification, QAS. Custom code can also be written to check data for inconsistencies and errors. The software works by applying a set of business rules and checks to a specified data set: it loads the data, normally into its own temporary repository, and checks it for anomalies. Errors can then be fixed manually, with the erroneous data highlighted to the user, or the application can cleanse the data automatically.
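As a rough illustration of the rule-based approach described above, the sketch below applies a list of business rules to a set of records and either flags failures for manual review or, when an automatic fix is available, corrects them. It is a minimal sketch, not how Trillium or QAS work internally: the `Rule` and `cleanse` names and the two example rules (`name_present`, `country_code`) are hypothetical, and records are assumed to be plain Python dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]


@dataclass
class Rule:
    """A business rule: a check, plus an optional automatic fix (hypothetical)."""
    name: str
    check: Callable[[Record], bool]                   # True if the record passes
    fix: Optional[Callable[[Record], Record]] = None  # returns a corrected copy


def cleanse(records: List[Record], rules: List[Rule], auto_fix: bool = False
            ) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Apply every rule to every record.

    Returns (clean, flagged): records that pass all rules (after any automatic
    fixes), and (original record, failed rule names) pairs for manual review.
    """
    clean, flagged = [], []
    for original in records:
        rec = dict(original)  # work on a copy, like a staging area
        failed = []
        for rule in rules:
            if rule.check(rec):
                continue
            if auto_fix and rule.fix is not None:
                rec = rule.fix(rec)
                if rule.check(rec):  # re-check; flag the record if the fix did not help
                    continue
            failed.append(rule.name)
        if failed:
            flagged.append((original, failed))
        else:
            clean.append(rec)
    return clean, flagged


# Hypothetical example rules, for illustration only.
def has_name(r: Record) -> bool:
    return bool(str(r.get("name") or "").strip())


def valid_country(r: Record) -> bool:
    c = str(r.get("country") or "")
    return len(c) == 2 and c.isalpha() and c.isupper()


def fix_country(r: Record) -> Record:
    return {**r, "country": str(r.get("country") or "").strip().upper()}


RULES = [
    Rule("name_present", has_name),               # no automatic fix: needs a human
    Rule("country_code", valid_country, fix_country),
]

if __name__ == "__main__":
    rows = [
        {"name": "Alice", "country": "gb"},
        {"name": "", "country": "US"},
    ]
    clean, flagged = cleanse(rows, RULES, auto_fix=True)
    print(clean)    # [{'name': 'Alice', 'country': 'GB'}]
    print(flagged)  # [({'name': '', 'country': 'US'}, ['name_present'])]
```

With `auto_fix=False` every failure is flagged for a user to correct, which corresponds to the manual route; with `auto_fix=True` only failures that no rule can repair are left for review.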
This step in the load process is especially important when loading data from multiple sources, as there may be inconsistencies in data dictionary definitions, user entry errors or missing data.
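To make the data dictionary problem concrete, suppose two source systems record the same attribute (here, account status) with different codes. The sketch below translates each source's codes into a single warehouse definition and returns `None` for values it cannot map, so those rows can be routed for review. The source names `crm` and `billing`, their codes and the `harmonise` function are all hypothetical.

```python
from typing import Optional

# Hypothetical data dictionaries: each source system encodes account status
# differently. Every mapping targets the same warehouse code.
SOURCE_MAPPINGS = {
    "crm": {"A": "ACTIVE", "C": "CLOSED"},
    "billing": {"1": "ACTIVE", "0": "CLOSED"},
}


def harmonise(source: str, raw_value: object) -> Optional[str]:
    """Translate a source-specific code into the warehouse's standard code.

    Surrounding whitespace and letter case (common user-entry errors) are
    ignored. Returns None when the value is missing or has no mapping, so the
    row can be routed for review instead of loading a bad value. Raises
    KeyError for a source with no mapping defined.
    """
    if raw_value is None:
        return None
    code = str(raw_value).strip().upper()
    return SOURCE_MAPPINGS[source].get(code)


if __name__ == "__main__":
    print(harmonise("crm", " a "))   # ACTIVE
    print(harmonise("billing", 0))   # CLOSED
    print(harmonise("billing", "9")) # None
```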
Part of this process may involve matching a list of customer addresses against, for example, a post office address database. This ensures that addresses are correct, and that any missing data, such as post codes, can be added to the destination load.
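The address-matching step can be sketched as a lookup against reference data keyed on a normalised street and town. This is a deliberate simplification: commercial address-verification products typically use fuzzy matching against full national address files, whereas this sketch only does an exact match after normalisation. The `REFERENCE` table, its single made-up entry, and the `normalise` and `verify_address` functions are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical reference data standing in for a post office address database:
# (normalised street line, normalised town) -> post code. The entry is made up.
REFERENCE: Dict[Tuple[str, str], str] = {
    ("12 HIGH ST", "ANYTOWN"): "AB1 2CD",
}


def normalise(text: Optional[str]) -> str:
    """Upper-case, drop punctuation and collapse whitespace, so that
    '12 High St.' and '12  high st' produce the same key."""
    cleaned = re.sub(r"[^\w\s]", "", text or "")
    return " ".join(cleaned.upper().split())


def verify_address(addr: Dict[str, str]) -> Tuple[Dict[str, str], str]:
    """Match one address against REFERENCE.

    Returns the (possibly completed) address and a status:
      'filled'    - post code was missing and has been supplied
      'verified'  - supplied post code agrees with the reference data
      'mismatch'  - supplied post code disagrees (flag for review)
      'unmatched' - address not found in the reference data
    """
    key = (normalise(addr.get("street")), normalise(addr.get("town")))
    expected = REFERENCE.get(key)
    if expected is None:
        return addr, "unmatched"
    given = normalise(addr.get("postcode"))
    if not given:
        return {**addr, "postcode": expected}, "filled"
    if given == normalise(expected):
        return addr, "verified"
    return addr, "mismatch"


if __name__ == "__main__":
    print(verify_address({"street": "12 High St.", "town": "anytown"}))
    # ({'street': '12 High St.', 'town': 'anytown', 'postcode': 'AB1 2CD'}, 'filled')
```

Returning a status alongside the address lets the load process send `mismatch` and `unmatched` rows to a user for review rather than silently overwriting them.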
This step should not be underestimated or overlooked. The initial expense of the software normally pays for itself many times over by providing your users with accurate information.
Trillium Software
The Trillium Software solution is built on Avellino's Discovery product. Discovery was Avellino's flagship product and was the market leader in data profiling and analysis.
Avellino was acquired by Trillium Software in 2004.