ETL needs a C phase to migrate data
December 21, 2015
Extract, Transform, and Load make up the needed steps for a successful data migration. The larger an organization has become, or the longer its history, the greater the need to add a C, for Cleanse, to the ETL. Cleansing data is an essential part of decommissioning a 3000's data on the way to migration. MB Foster has used its UDACentral software for more than 10 years to perform the ECTL steps in preparation for decommissioning 3000s. Up to now the software has been used in MB Foster's own migration engagements, but it has recently gained new features on the way to becoming a product for sale to system integrators and consultants.
The product can now produce an Entity-Relationship diagram, a visual map that documents existing database structures. Because the diagram is a PDF document, it can be printed or shared via email.
During the ECTL process, UDACentral can now call a URL, passing in data and getting back values that are inserted into a virtual column. One customer, according to MB Foster's CEO Birket Foster, "had five scripts they ran in a row to clean up a phone number field. This enabled them to use those scripts. When transferring data, they're moving that column out, get the five scripts run, that place the result in that column." This kind of cleansing does slow down the transformation and loading, "but for most people that's not as big a thing as having clean data."
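The article describes UDACentral's URL call only at a high level, so the following is a local stand-in rather than its actual interface: a minimal Python sketch in which five hypothetical cleanup functions (the names and rules are assumptions, not the customer's scripts, and they assume North American phone numbers) run in a row, and the result lands in a new "virtual" column while the source column stays untouched. In a real setup, the body of `run_chain` might instead send the value to the cleansing URL and read back the result.

```python
import re
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical cleanup steps standing in for the customer's five scripts.
# The rules assume North American (NANP) numbers and are illustrations only.

def strip_whitespace(value: str) -> str:
    """Remove leading and trailing blanks, e.g. padding from fixed-width fields."""
    return value.strip()

def remove_extension(value: str) -> str:
    """Drop a trailing extension such as 'x12' or 'ext. 12'."""
    return re.sub(r"\s*(?:x|ext\.?)\s*\d+$", "", value, flags=re.IGNORECASE)

def digits_only(value: str) -> str:
    """Keep only the digits; punctuation, spaces and '+' are discarded."""
    return re.sub(r"\D", "", value)

def drop_country_code(value: str) -> str:
    """Strip a leading '1' country code from 11-digit numbers."""
    if len(value) == 11 and value.startswith("1"):
        return value[1:]
    return value

def format_nanp(value: str) -> str:
    """Format 10-digit numbers; anything else is left as digits for manual review."""
    if len(value) == 10:
        return f"({value[:3]}) {value[3:6]}-{value[6:]}"
    return value

Step = Callable[[str], str]
CLEANUP_STEPS: List[Step] = [
    strip_whitespace, remove_extension, digits_only, drop_country_code, format_nanp,
]

def run_chain(value: str, steps: Iterable[Step]) -> str:
    """Feed each step the output of the previous one, like scripts run in a row."""
    for step in steps:
        value = step(value)
    return value

def add_virtual_column(
    rows: Iterable[Dict[str, Any]], source: str, target: str, steps: List[Step]
) -> Iterator[Dict[str, Any]]:
    """Yield a copy of each row with the cleansed value in a new column."""
    for row in rows:
        out = dict(row)  # copy, so the source row and column are not modified
        out[target] = run_chain(row.get(source) or "", steps)
        yield out
```

For a row whose phone field holds `" +1 (303) 555-0142 ext. 7 "`, the chain writes `(303) 555-0142` to the new column. Each step is a separate function so that one rule can be changed or reordered without touching the others, which mirrors the customer's separate scripts.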
Clean data keeps errors from being carried over to the new platform. Data decommissioning typically occurs when:
• An application is being replaced, by a new application or an upgrade
• Hardware or an application no longer has support
• An OS vendor obsoletes a platform or chipset
• An operating system has reached the end of its usable lifecycle
• A company's status changes (it is merged into another company, acquired, or becomes insolvent) and an application will no longer be used
To begin such a decommissioning, IT managers should identify the data owners and key stakeholders. With the patience 3000 managers are known for, they must discover the data owners' requirements for moving and maintaining legacy application data. "It's not only important from a compliance perspective, but it may be critical to a line of business or department," Foster says. He advises IT managers to take a business-process and legal view of long-term requirements, rather than a purely technological one.
During this identification phase, understand that data is indelible. "They say there are two things to count on, death and taxes," Foster said. "Now there are three -- let's add data to the list." When planning which data must migrate to the new app, separate it into transaction categories. For active data, work out a data migration plan; for inactive data, determine a historical data plan. A single plan doesn't fit both types of data very well.
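One way to make the active/inactive split concrete is to classify records by the date of their last transaction. The Python sketch below assumes a hypothetical `Account` record and a cutoff window chosen by the data owners; neither the record layout nor any particular window comes from MB Foster, and real criteria may involve more than a date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Tuple

@dataclass(frozen=True)
class Account:
    """Hypothetical record shape: an ID and the date of its most recent transaction."""
    account_id: str
    last_transaction: date

def split_by_activity(
    records: Iterable[Account], as_of: date, active_window: timedelta
) -> Tuple[List[Account], List[Account]]:
    """Partition records into (active, inactive) relative to a cutoff date.

    A record counts as active if its last transaction falls on or after
    as_of - active_window. Active records feed the data migration plan;
    inactive ones feed the historical data plan.
    """
    cutoff = as_of - active_window
    active: List[Account] = []
    inactive: List[Account] = []
    for record in records:
        if record.last_transaction >= cutoff:
            active.append(record)
        else:
            inactive.append(record)
    return active, inactive
```

With `as_of=date(2015, 12, 21)` and an assumed two-year window of `timedelta(days=730)`, the cutoff is December 21, 2013, so an account last used on that date is still treated as active. The window itself is a business decision for the data owners, not a technical constant.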
Because the company has specialized in data and reporting for so many years, MB Foster reminds managers that reporting requirements are a key element of both kinds of plans. Longer-term reporting requirements are often overlooked during a data decommissioning process. When users need that kind of information later, how will it be presented to them? One best practice is to collect line-of-business and department-level reporting requirements before you decommission legacy applications and data. The apps and platform may change, but the reporting needs are likely to remain the same.
In some cases, HP 3000 apps are part of a larger, global IT structure. It's a good idea to document corporate data retention policies with that global perspective as a guide. Factor in the rules of individual countries, and remember that specific industries sometimes dictate their own compliance policies. The corporate internal groups that handle records management can be consulted about their policies.
Once the policies and processes have been established, it's time to standardize on an archiving system and work on unifying data and content. That process goes more smoothly if you can give data stakeholders and departments a complete view of the archiving system. Rules for access to the data will cover security, privacy, and compliance with regulations and standards such as HIPAA for health care or PCI DSS for credit card transactions.
A decommissioning plan and schedule will affect many departments and people. When presenting the plan, be prepared to answer questions such as "Where will the information go once it's decommissioned?" and "What new application replaces the legacy system?" You'll want to ask questions too, such as "What is an acceptable decommissioning timeframe?"
Like any project that affects a company asset such as data, yours will require a plan for validating and auditing data integrity. Define what quality data means, and share that definition across the enterprise. "The integrity of your data is vital," Foster said, "and there are many threats to data quality." These include hardware problems, old tape archive issues, data entry errors, and carelessness.
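A common starting point for integrity validation is to compare what was extracted with what was loaded, row by row. The Python sketch below hashes each row with SHA-256 and compares the source and target as multisets, so it reports missing and unexpected rows regardless of load order. This is one generic technique, not a description of how UDACentral validates data; the function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Sequence

def row_digest(row: Sequence[object]) -> str:
    """Hash one row. repr() keeps None, '' and 0 distinct and escapes control
    characters, so the unit-separator join cannot be forged by field contents."""
    payload = "\x1f".join(repr(value) for value in row).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def compare_tables(
    source_rows: Iterable[Sequence[object]], target_rows: Iterable[Sequence[object]]
) -> Dict[str, int]:
    """Compare two tables as multisets of rows, ignoring row order."""
    source = Counter(row_digest(row) for row in source_rows)
    target = Counter(row_digest(row) for row in target_rows)
    missing = source - target     # rows in the source but absent (or short) in the target
    unexpected = target - source  # rows in the target that the source never had
    return {
        "source_rows": sum(source.values()),
        "target_rows": sum(target.values()),
        "missing": sum(missing.values()),
        "unexpected": sum(unexpected.values()),
    }

def is_valid(report: Dict[str, int]) -> bool:
    """A load passes only if no rows are missing and none are unexpected."""
    return report["missing"] == 0 and report["unexpected"] == 0
```

Because the cleanse and transform steps change values on purpose, a comparison like this would be run against the expected post-cleanse output, or limited to columns that should pass through unchanged. The counts in the report can be kept as part of the audit trail.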
Projects to decommission legacy data must often consider who owns it. Regulatory and governance requirements must be met, including access to historical data over required periods of time. Alternatives in these cases include moving the data, providing searchable formatted data, and even keeping an auditable instance of the "system of record," the retiring HP 3000 application or server.