Removing duplicates from a data file can occur manually or automatically.
If you elect to run Index Data in manual mode, you will be prompted on how you want to handle duplicate cases found in your data files:
The tool lists the name(s) of the data file(s) in which a duplicate was found. In the above image, two cases were found in the file codesC.txt that shared the same case ID. You can view the contents of the case by clicking on the line number where the case was found. Checkmarks indicate cases that you want to keep in your output data file, so to remove a case you must remove a checkmark next to the line number indicating the case to be removed. In the above example, the user is keeping the first incidence of case 151011 from the file codesC.txt.
The above example shows a "global duplicate" in an instance when the user has run the tool on multiple input data files. Case 193099 exists in both codesC.txt and codesD.txt but occurs only once in each file. In the above example, the user has decided to remove case 193099 from codesC.txt while keeping the case in codesD.txt.
The tool is not able to generate an index for output files with duplicates, so it is recommended that you remove a sufficient number of duplicates such that there will no longer be duplicate cases existing within the output data file.
If you run Index Data in automatic mode, the tool will keep the first instance of a duplicate case that it encounters and remove the remaining duplicate cases. This is true regardless of whether you run the tool on one or more data files. For instance, in automatic mode the following will occur:
FILE 1
111.........Glenn
222.........Greg
333.........Tom
111.........Bruce
444.........Joshua
FILE 2
555.........Oliver
222.........Jenny
666.........Savy
222.........Melissa
If the three digits above represent a case ID, in automatic deletion mode, the following data files will be produced:
FILE 1 (fixed)
111.........Glenn
222.........Greg
333.........Tom
444.........Joshua
FILE 2 (fixed)
555.........Oliver
666.........Savy
However, if you ran the indexing tool on File 2 only, the resulting data file would be:
FILE 2 (fixed)
555.........Oliver
222.........Jenny
666.........Savy
It is important to consider the order in which the input files are listed. If File 2 had been listed before File 1, the resulting data files would be:
FILE 2 (fixed)
555.........Oliver
222.........Jenny
666.........Savy
FILE 1 (fixed)
111.........Glenn
333.........Tom
444.........Joshua