LENGTH: 24 Hours
Contains: PDF course guide, as well as a lab environment where students can work through demonstrations and exercises at their own pace.
This course steps you through the QualityStage data cleansing process. You will transform an unstructured data source into a format suitable for loading into an existing data target. You will cleanse the source data by building a custom rule set and using it to standardize the data. You will then build a reference match to relate the cleansed source data to the existing target data.
If you are enrolling in a Self-Paced Virtual Classroom or Web-Based Training course, please review the Self-Paced Virtual Classes and Web-Based Training Classes section of our Terms and Conditions page, as well as the system requirements, before you enroll to ensure that your system meets the minimum requirements for this course. http://www.ibm.com/training/terms
1: QualityStage Review • Course project • QualityStage review • Data Quality • Master Data Management • Investigate • Standardize • Match
2: Structure of a Rule Set • Rule Sets and Rule Set files • Classes and Classification tables • Thresholds • Dictionary files • Pattern action files • Optional tables
3: Creation of a Custom Rule Set • Custom Rule Set development cycle • Investigate data file • Parsing • SEPLIST/STRIPLIST updates
4: Initial Investigation of Data to Be Standardized • Word Investigation • Pattern report • Token report
5: Classification Table • Create the Classification Table • Classification schema • What to classify • Process • Resulting Classification File with Legend • Pattern review: refining the Classification Table
6: Pattern Action File • Pattern Action Language • Development of Pattern Action Sets • Refining Pattern Action Sets • Investigation of Standardized Results
7: Standardization Rules Designer • What is the Standardization Rules Designer (SRD)? • Using the SRD • SRD work areas • Rule Set revision and selection • Embedded assistance
8: Match Frequency • Match frequency job • Column mapping • Match frequency data set • Using match frequencies in a match job
9: Two-Source (Reference Match) Advanced Implementation • Create a reference match between standardized product data and warehouse data • Refine the match results using the description fields of the standardized product data and the warehouse data
The intended audience for this course is: • QualityStage programmers • Data Analysts responsible for data quality using QualityStage • Data Quality Architects • Data Cleansing Developers • Data Quality Developers needing to customize QualityStage rule sets
Participants should have: • Completed the QualityStage Essentials course, or have equivalent experience • Familiarity with Windows and a text editor • Familiarity with elementary statistics and probability concepts (desirable but not essential)
After completing this course, you should be able to: • Modify rule sets • Build custom rule sets • Standardize data using the custom rule set • Perform a reference match using standardized data and a reference data set • Use advanced techniques to refine a two-source match
Prior to enrolling, IBM Employees must follow their Division/Department processes to obtain approval to attend this public training class. Failure to follow Division/Department approval processes may result in the IBM Employee being personally responsible for the class charges.
GBS practitioners who use the EViTA system to request external training should use that same process for this course. Go to the EViTA site to start this process:
Once you enroll in a GTP class, you will receive a confirmation letter that should show:
03 Jun 2023
Self Paced Training