Errors in data can be (roughly) categorized as problems in either structure or consistency. Problems with structure may require a different approach to correction than problems with validity or consistency. For this reason, many users choose to implement a two-stage edit, in which the data are first rendered structurally correct and then passed through a second edit to ensure that all items are valid and consistent. Subject-matter specialists will normally establish the rules for validity and consistency, but the computer specialist may have equally important input into the definition of structural validity.
After your data have been keyed or scanned, the data file will contain errors. This is unavoidable; the errors will be a combination of human and computer error. It will therefore be necessary to correct the data by writing a series of edit routines (procedures) that systematically and consistently clean your data files.
The Batch Edit Designer module allows you to create and modify batch edit applications. A batch editing application contains logic which you can apply against one set of files to produce another set of files and reports. Batch editing applications can be used to gather information about a data file.
To create these edit routines, you will use CSPro to develop the batch editing application based on the dictionary that describes your data files. If you received this data file from someone else and do not have a dictionary that describes it, you will need to create a dictionary before you are ready to develop programming logic for it. You use CSBatch to run the application. For small surveys and for testing applications, you can run CSBatch directly from CSPro, on the same computer. For large surveys and censuses, which require a production environment, you can transfer the application files to other computers and run CSBatch on them.
You can have the following runtime features in your batch editing application:
Structure checks ascertain that all records that should be present for a particular questionnaire (case) are supplied and that no extra records are included.
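As an illustration of the idea only, the following Python sketch (not CSPro logic) checks one case against a table of expected record counts. The record-type codes `H` and `P`, the `REQUIRED` table, and the function name `structure_errors` are all hypothetical and do not come from CSPro.

```python
from collections import Counter

# Hypothetical layout (an assumption for this sketch): every case needs
# exactly one housing record "H" and at least one person record "P".
# Each entry maps a record type to (minimum, maximum) occurrences;
# a maximum of None means "no upper limit".
REQUIRED = {"H": (1, 1), "P": (1, None)}


def structure_errors(record_types):
    """Return a list of structural problems found in one case."""
    counts = Counter(record_types)
    errors = []
    for rec_type, (lo, hi) in REQUIRED.items():
        n = counts.get(rec_type, 0)
        if n < lo:
            errors.append(f"missing {rec_type} record")
        elif hi is not None and n > hi:
            errors.append(f"too many {rec_type} records ({n})")
    # Any record type not listed in REQUIRED counts as an extra record.
    for rec_type in sorted(set(counts) - set(REQUIRED)):
        errors.append(f"unexpected record type {rec_type}")
    return errors
```

A case that returns an empty list is structurally complete and could be passed on to the validity and consistency stage.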
Validity checks are designed to determine whether a response has a value that is inside or outside the valid limits for that response. Although these checks are normally performed during data entry, you can also perform them with CSBatch.
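A range check of this kind can be sketched in a few lines of Python (again, an illustration rather than CSPro logic). The item names and limits in `VALID_RANGES` are assumptions made up for the example; in practice the limits would come from the data dictionary.

```python
# Hypothetical valid limits for two items (assumed for this sketch).
VALID_RANGES = {"sex": (1, 2), "age": (0, 120)}


def invalid_items(person):
    """Return the names of items that are blank or outside their valid limits."""
    bad = []
    for name, (lo, hi) in VALID_RANGES.items():
        value = person.get(name)  # None stands for a blank response
        if not (isinstance(value, int) and lo <= value <= hi):
            bad.append(name)
    return bad
```

For example, `invalid_items({"sex": 3, "age": 34})` reports only `sex`, because 3 is outside the assumed limits of 1 to 2.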
Consistency checks compare two or more responses in a questionnaire for consistency. For example, a male reporting children born is an inconsistency between the response to sex and the response to children born, because only females report children born. The responses being compared may be in the same record or in different records within a questionnaire.
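The sex and children-born example can be expressed as a small Python sketch (not CSPro logic). The codes `MALE = 1` and `FEMALE = 2`, the item name `children_born`, and the choice to flag only values greater than zero are assumptions for illustration.

```python
MALE, FEMALE = 1, 2  # hypothetical codes for the sex item


def consistency_errors(person):
    """Compare related responses and return a list of inconsistencies."""
    errors = []
    children = person.get("children_born")  # None stands for a blank response
    # Only females report children born, so a male with a positive
    # count is inconsistent (assumption: blank or zero is acceptable).
    if person.get("sex") == MALE and children is not None and children > 0:
        errors.append("male reports children born")
    return errors
```

The check only reports the inconsistency; deciding which of the two responses to change is a separate editing decision.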
When a census or other large survey is being processed, it would be unduly cumbersome to make most data corrections by visually examining the errors. CSPro provides facilities not only for finding incorrect or inconsistent data but also for modifying the data. Needless to say, modifications are not always corrections. Moreover, any modifications to data that have been collected must be carefully thought out and monitored through CSPro's edit statistics reports.
CSPro generates comprehensive statistics about the edit tests carried out and the number of changes made to the data. The user may also create a customized report including or excluding any of the information generated by CSPro during the editing process.
You can write customized reports to a file.
CSPro allows you to match two files and gather information from both. This feature is useful, for example, when a file must be created that combines data from two other files.
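The general idea of matching two files on a common key can be sketched in Python as follows. This is a conceptual illustration, not how CSPro implements the feature; the function name `match_files`, the list-of-dictionaries representation, and the rule that the primary file's values win on conflict are all assumptions.

```python
def match_files(primary, secondary, key):
    """Combine records from two files that share the same key value.

    Returns (matched, unmatched): matched records carry the items of both
    files; unmatched holds primary records with no partner in secondary.
    """
    lookup = {rec[key]: rec for rec in secondary}
    matched, unmatched = [], []
    for rec in primary:
        other = lookup.get(rec[key])
        if other is None:
            unmatched.append(rec)
        else:
            # Items from the primary file take precedence on name clashes.
            matched.append({**other, **rec})
    return matched, unmatched
```

Keeping the unmatched records separate makes it easy to report how many cases could not be paired.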
During the editing process, values that are invalid, inconsistent, or otherwise unacceptable will need to be replaced with correct values.
Whether you choose to correct data using hot-deck or cold-deck methodology, the imputation arrays are easily defined, accessed, and updated.
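To make the hot-deck idea concrete, here is a minimal Python sketch (not CSPro logic) that imputes age from the most recent valid age seen for a person of the same sex. The starting values act as a cold deck. The item names, the valid age limits of 0 to 120, and the default starting value of 30 are assumptions for illustration, and the sketch assumes that sex has already been edited to a valid code.

```python
def hot_deck_impute(persons, cold_deck=None):
    """Impute invalid ages in place; return the number of imputations."""
    # The deck is indexed by sex. Its starting values (the cold deck) are
    # used until a valid donor has been seen for that sex.
    deck = dict(cold_deck or {1: 30, 2: 30})
    imputed = 0
    for person in persons:
        age = person["age"]
        if isinstance(age, int) and 0 <= age <= 120:
            deck[person["sex"]] = age          # valid response: update the deck
        else:
            person["age"] = deck[person["sex"]]  # invalid: take the donor value
            imputed += 1
    return imputed
```

Because the deck is updated as the file is read, each imputed value comes from the nearest preceding person with the same sex, which is the property that distinguishes a hot deck from a fixed cold deck.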
CSBatch will automatically keep track of changes, but users may choose to format reports to their own specifications. By specifying a denominator variable, users may obtain rates of imputation, including rates for individual values (e.g., male or female when imputing the variable Sex).
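The arithmetic behind an imputation rate is simply the number of imputations divided by the chosen denominator. The Python sketch below shows that calculation for the overall rate and for each imputed value; it is not the CSBatch report itself, and the function name `imputation_rates` and the `"all"` key are hypothetical.

```python
from collections import Counter


def imputation_rates(imputed_values, denominator):
    """Return the overall imputation rate and the rate for each imputed value."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    counts = Counter(imputed_values)
    rates = {"all": sum(counts.values()) / denominator}
    for value, n in counts.items():
        rates[value] = n / denominator
    return rates
```

With Sex imputed as male three times and female once among eight persons, the sketch gives an overall rate of 0.5, with 0.375 for male and 0.125 for female.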
Recoded or composite variables may be created during the editing process. The only requirement is that space be allocated (via the dictionary for the file) in the output record. The updated output file is automatically created when a file name is specified at the time of execution.
An essentially unlimited number of secondary or auxiliary files may be attached to an application and used as reference or lookup files. The application may read from and write to any of these files.