Introduction to Index Data

Index Data (CSIndex) is a utility that creates index files (.csidx) for text data files. It also identifies duplicate case IDs if they exist within one or more data files and, optionally, removes duplicate cases.

Text data files and lookup files must have a corresponding index file in order to be properly loaded by a data entry application and by some tools. When you run an application, CSPro automatically creates an index file when loading a text data file if the index file does not exist or if the data file is more recent than the existing index file. You can also use this tool to create this index file if it does not exist.

This utility is also useful for removing duplicate cases from a data file because an index cannot be created for a file if duplicate cases exist. When used to remove duplicates, Index Data can operate on text data files or other data files (such as CSPro DB files). Index Data has several modes of operating:

Output duplicate keys to listing: The tool will write information about all duplicates in a listing file.
View duplicate cases: The tool will display the case contents of each duplicate case.
Prompt to delete duplicates: The tool will display the case contents of each duplicate case, allowing the user to specify which case to keep in the output data file. The Delete identical duplicates automatically option will suppress the displaying of the contents of cases that are identical.
Delete duplicates automatically (keeping the first case): The tool will keep the first instance of a case within the data file and delete each subsequent case with the same case ID when writing the output data file.

Index Data can operate on one or more data files. When more than one input file is selected, the tool operates slightly differently from its behavior for one file. The tool attempts to create a temporary "global index" with the case IDs from each data file checked against the case IDs of the other files. This is useful for checking whether duplicate cases exist across several data files, and deleting specific cases if desired.

How to ...