Impute Function

Format

d = impute(item_name, new_value)
ʃtitle(frequency_table_title)ʅ
ʃvalueset(valueset_name)ʅ
ʃspecificʅ
ʃstat(ʃitem_name1, ..., item_nameNʅ)ʅ;

Description

The impute function assigns a new value to an item. The item_name is a dictionary item, either numeric or alphanumeric, and new_value is an expression that matches the type of the item. The function is similar to using the assignment operator:

item_name = new_value;

However, unlike the assignment operator, the impute function keeps track of these assignments and generates a report on the frequency of the values used in the imputations. These imputation statistics are useful when cleaning data in a batch application. If your program contains any impute statements, the results of this function will be written to a frequencies file. The default file extension is .impute_freq.lst, but you can use whatever extension you prefer.

The function has several optional arguments:

Specify a frequency title (title): If you supply a string expression as frequency_table_title, it will be used as the title when the frequency writer creates the imputation frequencies. If no title is specified, a default title such as "Imputed Item SEX: Sex" will be used.

Specify a frequency value set (valueset): By default, when the frequency writer creates the imputation frequencies, it shows each value imputed and it looks at the item's primary value set, if one exists, to find a label that matches the value. If you would prefer to use a different value set when creating the imputation frequencies, you can specify a valueset_name that belongs to the item.

Create a specific frequency table (specific): Typically, if you have multiple impute statements for one item (with the same valueset setting), only one frequency table will be written, with the frequencies for all imputations combined. Even if differing titles are specified, one table will be written, with the title coming from the last executed imputation. If you would like a frequencies table for a particular imputation statement, you can use the specific command to indicate that a frequency table should be created for that imputation.
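
As a minimal sketch of how these optional arguments combine, the statement below imputes SEX with a custom title, a non-primary value set, and its own frequency table. The value set name SEX_VS2 and the title text are hypothetical; SEX_VS2 stands in for any value set that belongs to SEX. The clause order follows the Format section above.

    impute(SEX, 2)
        title("Sex imputed from relationship")   // hypothetical title
        valueset(SEX_VS2)                        // hypothetical value set belonging to SEX
        specific;                                // request a separate table for this statement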

Impute Stat Data

Create a data file with frequency details (stat): If you want more details about imputations beyond the frequencies showing the imputed values, you can use the stat command to generate a data file that will contain information about each imputation. For each case in the input data that results in any imputations, the stat data file will contain an entry with the case IDs, the original value of the imputed item, the new value used in the imputation, and the line number of the impute statement that resulted in the imputation. For example:

impute(SEX, 2)
    stat();

This would result in a data file with a record, IMPUTE_SEX_REC, with three items: IMPUTE_SEX_INITIAL (the initial value of SEX), IMPUTE_SEX_IMPUTED (the imputed value; in this case 2), and IMPUTE_SEX_LINE_NUMBER (the line number of the imputation).

If you would like to see the value of other items that might be useful during analysis, you can specify item_name1, item_name2, and so on. The values of these items will be included in the stat data file. For example:

impute(EDUCATION, getdeck(educationHotdeckBySexAge))
    stat(SEX, AGE);

By default, the only entries written to the stat data file are imputations where stat is included as part of the impute statement. Alternatively, you can specify an override:

set impute(stat, on ‖ off ‖ default);

If an override is coded, it applies to every impute statement that follows:

on: each statement is automatically included in the stat data file as if stat() were coded.
off: any stat commands are ignored.
default: the default behavior applies, where the stat data file only includes entries for imputations with stat commands.
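
A minimal sketch of the override follows. It assumes that the set impute statement can be coded in a procedure ahead of the impute statements it should affect. The check and the imputed value are placeholders, not a recommended imputation.

    PROC AGE

        // assumption: coded before the impute statements that should be logged
        set impute(stat, on);

        if not invalueset(AGE) then
            impute(AGE, 30);        // included in the stat data file as if stat() were coded
        endif;

        // restore the default so that later statements need an explicit stat command
        set impute(stat, default);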

Imputation Files

Applications using the impute function can generate up to three files:

Imputation frequencies (with the default extension .impute_freq.lst)
Imputation stat dictionary (if using stat, with the default extension .impute_stat.dcf)
Imputation stat data (if using stat, with the default extension .impute_stat.csdb)

You can specify the names of these files in the File Associations dialog or in your application's PFF file.

The frequencies report contains five columns:

                                  Imputed Item SEX: Sex
                                _____________________________ _____________
  Categories                         Frequency        CumFreq      %  Cum %
_______________________________ _____________________________ _____________
  1 Male                                   271            271   52.9   52.9
  2 Female                                 241            512   47.1  100.0
_______________________________ _____________________________ _____________
  Total                                    512            512  100.0  100.0

Categories: Lists the values that were assigned during the imputations, along with the value set label for each value (if applicable). For example: "2 Female".
Frequency: Shows the frequency (that is, the total number of times) each value was assigned. For example: 241 (code 2 assigned 241 times).
CumFreq: Displays the cumulative totals of the Frequency column.
%: Indicates each value's share of the total number of imputations made. For example: 47.1 (code 2 was assigned in 241 of the 512 imputations of SEX, or 47.1%).
Cum %: Displays the cumulative totals of the % column.

Return Value

When imputing a numeric item, the function returns the numeric expression new_value. When imputing an alphanumeric item, the function returns 1 (true).
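
Because the numeric form returns new_value, the result can be captured and reused. The following is a sketch only; the variable name imputedSex and the message text are illustrative.

    PROC SEX

        numeric imputedSex;    // illustrative local variable

        if not invalueset(SEX) then
            // for a numeric item, impute returns the value it assigned
            imputedSex = impute(SEX, 1);
            errmsg("SEX was imputed to %d", imputedSex);
        endif;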

Example

PROC SEX

    if not invalueset(SEX) then

        // set all heads to men and everyone else to the opposite of the head's sex
        // (note that this is not a good imputation but is just a simple example)
        if curocc() = 1 then
            impute(SEX, 1)
                title("Head's Sex")
                specific;

        else
            impute(SEX, 3 - SEX(1));

        endif;

    endif;

See also: Imputation