CSPro 7.4 introduces a new and improved recode statement that lets you get more done in a single recode. To see the differences between the old and new recode, we will compare implementations of a typical education edit. Note that the old recode has been deprecated.

Specification

The goal of the two recode implementations is to verify that students currently attending school are within the allowable age range for their grade. The specification below defines the valid combinations of grades and ages.

[Image: specification table of valid grade and age combinations]

Old Recode Implementation

With the old recode syntax, you need two recode statements: one for the minimum age and another for the maximum age.

PROC P10_GRADE_NOW_ATTENDING

preproc

    ask if P09_ATTEND = 1 and P04_AGE >= 3; // Currently attending school

postproc

    numeric minAgeForGrade, maxAgeForGrade;

    recode P10_GRADE_NOW_ATTENDING => minAgeForGrade;
                                 0 => 3;  // Preschool and kindergarten
                                 1 => 5;
                                 2 => 6;
                                 3 => 7;
                                 4 => 8;
                                 5 => 9;
                                 6 => 10;
                                 7 => 11;
                                 8 => 12;
                                 9 => 13;
                                10 => 14;
                                11 => 15;
                                12 => 16;
                                13 => 16; // University but not graduate school
                                14 => 18; // Graduate school
                                15 => 15; // Trade or technical school
    endrecode;

    recode P10_GRADE_NOW_ATTENDING => maxAgeForGrade;
                                 0 => 6;  // Preschool and kindergarten
                                 1 => 8;
                                 2 => 9;
                                 3 => 10;
                                 4 => 11;
                                 5 => 12;
                                 6 => 13;
                                 7 => 14;
                                 8 => 15;
                                 9 => 18;
                                10 => 20;
                                11 => 21;
                                12 => 22;
                                13 => 95; // University but not graduate school
                                14 => 95; // Graduate school
                                15 => 95; // Trade or technical school
    endrecode;

    if P04_AGE in minAgeForGrade:maxAgeForGrade then
        errmsg("P10_GRADE_NOW_ATTENDING(=%d) is valid for age(=%d)", $, P04_AGE);
    else
        errmsg("P10_GRADE_NOW_ATTENDING(=%d) NOT valid for age(=%d)", $, P04_AGE);
        reenter;
    endif;

Syntax Change Between Old and New Recode

To use the new recode statement, you will have to write your recodes with a slightly modified syntax. The table below documents these changes.

Operator      Old Recode Syntax    New Recode Syntax
Assignment    =>                   ->
And           :                    ::
Range         -                    :

New Recode Implementation

With the new recode statement, you can determine the minimum and maximum ages within a single recode, making your logic easier to understand and maintain.

PROC P10_GRADE_NOW_ATTENDING

preproc

    ask if P09_ATTEND = 1 and P04_AGE >= 3; // Currently attending school

postproc

    numeric minAgeForGrade, maxAgeForGrade;

    recode P10_GRADE_NOW_ATTENDING -> minAgeForGrade :: maxAgeForGrade;
                                 0 ->              3 :: 6;  // Preschool and kindergarten
                                 1 ->              5 :: 8;
                                 2 ->              6 :: 9;
                                 3 ->              7 :: 10;
                                 4 ->              8 :: 11;
                                 5 ->              9 :: 12;
                                 6 ->             10 :: 13;
                                 7 ->             11 :: 14;
                                 8 ->             12 :: 15;
                                 9 ->             13 :: 18;
                                10 ->             14 :: 20;
                                11 ->             15 :: 21;
                                12 ->             16 :: 22;
                                13 ->             16 :: 95; // University but not graduate school
                                14 ->             18 :: 95; // Graduate school
                                15 ->             15 :: 95; // Trade or technical school
    endrecode;

    if P04_AGE in minAgeForGrade:maxAgeForGrade then
        errmsg("P10_GRADE_NOW_ATTENDING(=%d) is valid for age(=%d)", $, P04_AGE);
    else
        errmsg("P10_GRADE_NOW_ATTENDING(=%d) NOT valid for age(=%d)", $, P04_AGE);
        reenter;
    endif;
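
For readers who think in general-purpose languages, the single recode can be read as one table lookup that yields both bounds at once. The Python sketch below only illustrates that idea and is not CSPro logic. The names AGE_RANGE_BY_GRADE and is_age_valid_for_grade are hypothetical, the ranges are copied from the recode, and treating an unlisted grade as invalid is an assumption of the sketch.

```python
# Illustrative Python sketch (not CSPro logic): one lookup returns both bounds.
# AGE_RANGE_BY_GRADE and is_age_valid_for_grade are hypothetical names.
AGE_RANGE_BY_GRADE = {
    0: (3, 6),     # Preschool and kindergarten
    1: (5, 8),
    2: (6, 9),
    3: (7, 10),
    4: (8, 11),
    5: (9, 12),
    6: (10, 13),
    7: (11, 14),
    8: (12, 15),
    9: (13, 18),
    10: (14, 20),
    11: (15, 21),
    12: (16, 22),
    13: (16, 95),  # University but not graduate school
    14: (18, 95),  # Graduate school
    15: (15, 95),  # Trade or technical school
}


def is_age_valid_for_grade(grade, age):
    """Return True when age falls inside the inclusive range for grade."""
    bounds = AGE_RANGE_BY_GRADE.get(grade)
    if bounds is None:
        # Assumption of this sketch: a grade with no entry is never valid.
        return False
    min_age, max_age = bounds
    return min_age <= age <= max_age
```

Keeping both bounds on one row means a change to a grade's range is made in a single place, which is the maintenance benefit the new recode provides.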

Flag Implementation

Another approach is to create a test flag. In the logic below, grade_is_valid indicates whether the combination of grade and age is valid. This can further increase the readability of your logic.

PROC P10_GRADE_NOW_ATTENDING

preproc

    ask if P09_ATTEND = 1 and P04_AGE >= 3; // Currently attending school

postproc

    numeric grade_is_valid = false;

    recode P10_GRADE_NOW_ATTENDING :: P04_AGE -> grade_is_valid;
                                 0 ::  3:6    -> true; // Preschool and kindergarten
                                 1 ::  5:8    -> true;
                                 2 ::  6:9    -> true;
                                 3 ::  7:10   -> true;
                                 4 ::  8:11   -> true;
                                 5 ::  9:12   -> true;
                                 6 :: 10:13   -> true;
                                 7 :: 11:14   -> true;
                                 8 :: 12:15   -> true;
                                 9 :: 13:18   -> true;
                                10 :: 14:20   -> true;
                                11 :: 15:21   -> true;
                                12 :: 16:22   -> true;
                                13 :: 16:95   -> true; // University but not graduate school
                                14 :: 18:95   -> true; // Graduate school
                                15 :: 15:95   -> true; // Trade or technical school
    endrecode;

    if grade_is_valid then
        errmsg("P10_GRADE_NOW_ATTENDING(=%d) is valid for age(=%d)", $, P04_AGE);
    else
        errmsg("P10_GRADE_NOW_ATTENDING(=%d) NOT valid for age(=%d)", $, P04_AGE);
        reenter;
    endif;

#Logic   


CSPro 7.4 has a new function ischecked. This function returns whether a code is part of a check box field’s selections. Prior to CSPro 7.4 we would use the pos function. So why use the ischecked function rather than the pos function?

Issue with Pos

Let’s look at how the LANGUAGE_SPOKEN variable would be set up. Since a person could speak multiple languages we will use a check box and the language question might look like:

[Image: the LANGUAGE_SPOKEN check box question]

Here English, French, Russian, and Spanish are checked. In previous versions of CSPro we would use the pos function to see if a specific language was checked. For example, if we wanted to know if French was checked we would use:

if pos("21", LANGUAGE_SPOKEN) then
    errmsg("French is checked");
else
    errmsg("French is not checked");
endif;

The error message “French is checked” would be issued. Continuing with our example we could ask if Russian is checked:

if pos("23", LANGUAGE_SPOKEN) then
    errmsg("Russian is checked");
else
    errmsg("Russian is not checked");
endif;

In this case pos would return 5 (true), the position where “23” starts in the stored data, since Russian is checked, and the error message “Russian is checked” would be issued.

If we asked if Hindi is checked:

if pos("33", LANGUAGE_SPOKEN) then
    errmsg("Hindi is checked");
else
    errmsg("Hindi is not checked");
endif;

The pos function would return 0 (false) and the message “Hindi is not checked” would be issued.

Now let’s test if Bengali is checked:

if pos("32", LANGUAGE_SPOKEN) then
    errmsg("Bengali is checked");
else
    errmsg("Bengali is not checked");
endif;

The pos function would return 6 (true) and the message “Bengali is checked” would be issued.

But Bengali is not checked. What happened? The call pos(“32”, LANGUAGE_SPOKEN) searched the string “11212324”, found “32” in positions 6-7, and returned 6.

Explanation

The check box codes are placed at uniformly spaced offsets based on the size of the code. For example, if the check box field has a length of 20 and each code has a length of 2, then each selected code is placed in the respective 2-digit offset. That is, positions 1-2 hold the first code, positions 3-4 hold the next code, and so on.

[Image: layout of the check box codes in 2-digit offsets]

The data are stored in the file as shown here:

[Image: the LANGUAGE_SPOKEN data as stored in the file]

The pos function does not look by offset, but instead looks for a substring match. Unfortunately, the substring “32” does exist in the data and a false match is found. In previous versions of CSPro we would need to loop through the string in steps of 2, comparing each offset with the language code:

numeric languageFound = 0;

do varying numeric idx = 1 while idx <= length(LANGUAGE_SPOKEN) by 2
    if LANGUAGE_SPOKEN[idx:2] = "32" then
        languageFound = 1;
        break;
    endif;
enddo;

if languageFound then
    errmsg("Bengali is checked");
else
    errmsg("Bengali is not checked");
endif;

Moving Forward with IsChecked

CSPro 7.4 greatly simplifies this check with the ischecked function. The ischecked function checks for the codes at the appropriate offsets. In this case, the function checks positions 1-2, 3-4, 5-6, 7-8, …, 19-20 for the code “32”. To check for Bengali we simply use:

if ischecked("32", LANGUAGE_SPOKEN) then
    errmsg("Bengali is checked");
else
    errmsg("Bengali is not checked");
endif;

The ischecked function returns a 0 (false) since “32” is not found within one of the uniformly spaced offsets. To do this CSPro requires the codes to be a uniform length. Notice all codes in this example were length 2.
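
The difference between a substring search and an offset-based check can be reproduced outside CSPro. The Python sketch below is only an illustration and makes no claim about how CSPro implements pos or ischecked. The function names substring_found and is_checked_by_offset are hypothetical, and the sketch assumes the 20-character field holds “11212324” followed by blanks.

```python
# Illustrative Python sketch (not CSPro logic) of the two search strategies.
# substring_found and is_checked_by_offset are hypothetical names.

def substring_found(code, field):
    """Plain substring search: 1-based position of code, or 0 if absent."""
    return field.find(code) + 1


def is_checked_by_offset(code, field):
    """Compare code only against slots that start at multiples of its width."""
    width = len(code)
    return any(field[start:start + width] == code
               for start in range(0, len(field), width))


# Assumed stored value: codes 11, 21, 23, 24 in a field of length 20.
LANGUAGE_SPOKEN = "11212324".ljust(20)

print(substring_found("32", LANGUAGE_SPOKEN))       # false match across two slots
print(is_checked_by_offset("32", LANGUAGE_SPOKEN))  # no slot holds "32"
print(is_checked_by_offset("23", LANGUAGE_SPOKEN))  # Russian occupies a slot
```

The offset check only works because every code has the same width, which matches the uniform-length requirement stated above.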


#Logic   


The selcase (“select case”) function is used to display a list of cases in an external dictionary, letting an interviewer select a case to load. One feature not mentioned in the function’s help documentation is the ability for the user to select multiple cases. By using the multiple keyword, the interviewer can select more than one case and then iterate over each of those cases using a for loop.

An undocumented feature allows for all qualified cases to be automatically marked. When the automark keyword is used, the selcase dialog is not shown to the interviewer. The following two code samples are equivalent:

// create a dynamic value set showing the EAs in Region 1 / District 1
valueset ea_vs;

// code demo 1 --- using a forcase loop
forcase EA_DICT where REGION = 1 and DISTRICT = 1 do
    ea_vs.add(EA_NAME, EA);
endfor;

// code demo 2 --- automatically marking cases and then using a for loop
selcase(EA_DICT, "") where REGION = 1 and DISTRICT = 1 multiple(automark);

for EA_DICT do
    ea_vs.add(EA_NAME, EA);
endfor;

Because selcase allows you to pass a key to match (“” in the example above, which means all case keys), you can use this as a trick to efficiently create value sets if you know part of the key. For example, if you have a data file with 50,000 cases, forcase will always loop through all 50,000 cases, whereas providing a key match may limit your loop to substantially fewer cases.

To show a possible use for this trick, we will look at two ways of creating hierarchical value sets for geocodes. Supposing we have three levels of geography—Region, District, and EA—one way to structure a geocode lookup file is as follows:

Region    District    EA    Geocode Name
1                           Region 1
1         2                 District 2 in Region 1
1         2           5     EA 5 in Region 1 / District 2

That is, when defining regions, the district and EA codes are left blank, and when defining districts, the EA code is left blank.

Using a forcase loop to populate the districts based on a selected region, we would loop over the entire data file, filtering on cases where the geocode region matches the selected region, where the geocode district is defined, but where the geocode EA is blank:

PROC DISTRICT

preproc

    valueset geography_vs;

    forcase GEOCODES_DICT where GEOCODE_REGION = REGION and
                                GEOCODE_DISTRICT <> notappl and
                                GEOCODE_EA = notappl do

        geography_vs.add(GEOCODE_NAME, GEOCODE_DISTRICT);

    endfor;

    setvalueset(DISTRICT, geography_vs);

Prior to this, to generate the region value set, we would look for cases where the geocode district is blank, and then following this, to generate the EA value set, we would look for cases where the region and district match the selected codes and where the EA is defined. Generating the hierarchical value sets for the three levels of geography would therefore require a fairly different loop at each level.

With the selcase automark trick, we can create a single function that can be used to generate the value set for each level of geography:

PROC GLOBAL

valueset geography_vs;

function CreateGeographyVS(string already_selected_key, numeric geocode_length)

    // clear the dynamic value set
    geography_vs.clear();

    // automatically select all cases that match the key passed in as a parameter
    selcase(GEOCODES_DICT, already_selected_key) multiple(automark);

    // this geocode starts at the position after the already selected key
    numeric new_key_offset = length(already_selected_key) + 1;

    // loop over the cases that match the already selected key
    for GEOCODES_DICT do

        // extract the remaining geocodes
        string new_key_portion = strip(key(GEOCODES_DICT)[new_key_offset]);

        // when the remaining geocodes match the geocode length we are expecting,
        // this is a match so add it to the value set
        if length(new_key_portion) = geocode_length then
            geography_vs.add(GEOCODE_NAME, tonumber(new_key_portion));
        endif;

    endfor;

    // make sure that there was at least one geocode for this hierarchical level
    if geography_vs.length() = 0 then
        errmsg("Geocode lookup error ... the geocode database is not complete.");
        stop(1);
    endif;

end;

We call this function from each procedure, specifying the currently selected geocode and the geocode length at each level. In this example, we assume that the region is length 1 and that the other two geocodes are length 2:

PROC REGION

preproc

    CreateGeographyVS("", 1);
    setvalueset(REGION, geography_vs);

PROC DISTRICT

preproc

    CreateGeographyVS(maketext("%v", REGION), 2);
    setvalueset(DISTRICT, geography_vs);

PROC EA

preproc

    CreateGeographyVS(maketext("%v%v", REGION, DISTRICT), 2);
    setvalueset(EA, geography_vs);

Now we have a generalizable function that we can use in our censuses or surveys, a function that will work with any number of levels of geography.
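
The key-matching idea behind CreateGeographyVS can be traced with a small amount of data. The Python sketch below is only an illustration, not CSPro logic: create_geography_vs and GEOCODES are hypothetical names, and the sketch assumes fixed-width, blank-padded keys (region of length 1, district and EA of length 2) like the lookup file shown earlier.

```python
# Illustrative Python sketch (not CSPro logic) of prefix-based key matching.
# create_geography_vs and GEOCODES are hypothetical names.

def create_geography_vs(geocodes, already_selected_key, geocode_length):
    """Return {code: name} for the level just below already_selected_key."""
    value_set = {}
    offset = len(already_selected_key)
    for key, name in geocodes.items():
        # Keep only the cases whose key starts with the selected geocodes.
        if not key.startswith(already_selected_key):
            continue
        # Drop trailing blanks from the remainder of the key.
        new_key_portion = key[offset:].rstrip()
        # A remainder of exactly the expected width belongs to this level.
        if len(new_key_portion) == geocode_length:
            value_set[int(new_key_portion)] = name
    if not value_set:
        raise LookupError("the geocode database is not complete")
    return value_set


# Assumed keys: region (1 char) + district (2 chars) + EA (2 chars).
GEOCODES = {
    "1    ": "Region 1",
    "1 2  ": "District 2 in Region 1",
    "1 2 5": "EA 5 in Region 1 / District 2",
}

print(create_geography_vs(GEOCODES, "", 1))     # regions
print(create_geography_vs(GEOCODES, "1", 2))    # districts in region 1
print(create_geography_vs(GEOCODES, "1 2", 2))  # EAs in region 1 / district 2
```

The length comparison is what separates levels: a district row leaves a longer remainder than a region row when matched against the empty key, so it is skipped at the region level.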


#Logic   


CSPro 7.3 introduces new ways to work with dynamic value sets. Dynamic value sets define the acceptable options for a field and they vary based on responses previously given. Typical value sets, defined in the data dictionary, define a fixed set of responses for a field, but with a dynamic value set, you can customize these responses based on specific conditions.

Prior to CSPro 7.3, you could create dynamic value sets using arrays, but working with these was cumbersome and not intuitive. Now there is a valueset object that allows for simpler, and more sophisticated, value set creation. Four scenarios are presented below that show how to use the new valueset object.

Easily Create a Dynamic Value Set in a Loop

A typical task is to create a value set based on some attributes entered previously. For example, you might want to present a list of people in a household who are aged 15+ as eligible heads of household. Using the valueset object with a for loop with a where condition makes this task trivial:

PROC HOUSEHOLD_HEAD

preproc

    valueset household_head_vs;

    for numeric line_number in PERSON_ROSTER where AGE >= 15 do
        household_head_vs.add(NAME, line_number);
    endfor;

    setvalueset(HOUSEHOLD_HEAD, household_head_vs);

Combining Value Sets

Suppose you have a question that asks how someone died. In the dictionary there is one set of responses that applies to all people and an additional set of responses that applies to females aged 12+. Now you can easily create a dynamic value set, conditionally adding the responses for females aged 12+:

PROC MORTALITY_REASON

onfocus

    valueset mortality_reason_vs = MORTALITY_REASON_ALL_PEOPLE_VS;

    if SEX = 2 and AGE >= 12 then
        mortality_reason_vs.add(MORTALITY_REASON_FERTILE_WOMEN_VS);
    endif;

    setvalueset(MORTALITY_REASON, mortality_reason_vs);

Removing a Value Based on a Previous Selection

Sometimes a questionnaire has a series of questions that asks about preferences, such as, “What is your favorite color?,” and then, “What is your second favorite color?” The list of options for the second question can exclude the selected answer to the first question. The valueset object makes this task very easy:

PROC SECOND_FAVORITE_COLOR

preproc

    valueset second_favorite_color_vs = FAVORITE_COLOR_VS;

    second_favorite_color_vs.remove(FAVORITE_COLOR);

    setvalueset(SECOND_FAVORITE_COLOR, second_favorite_color_vs);

Iterate Through Value Set Codes and Labels

Finally, there are two lists that are part of a value set, accessed using the codes and labels attributes. Just as valueset is a new object in CSPro 7.3, lists, though around in some form for years, are now fully useable objects. This simplifies iterating through the codes and labels of a value set. For example, if the first two digits of the county code are equal to the state code, a dynamic value set for counties could be created as follows:

PROC COUNTY

preproc

    valueset filtered_county_vs;

    numeric first_county_code = STATE * 100;
    numeric last_county_code = first_county_code + 99;

    do numeric counter = 1 while counter <= COUNTY_VS.codes.length()

        if COUNTY_VS.codes(counter) in first_county_code:last_county_code then
            filtered_county_vs.add(COUNTY_VS.labels(counter), COUNTY_VS.codes(counter));
        endif;

    enddo;

    setvalueset(COUNTY, filtered_county_vs);
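
The range arithmetic in this filter is easy to check by hand. The Python sketch below is only an illustration, not CSPro logic: filter_counties and the sample county codes and labels are hypothetical, and the sketch assumes county codes whose leading digits are the state code followed by a two-digit county number.

```python
# Illustrative Python sketch (not CSPro logic) of filtering codes by range.
# filter_counties and the sample codes and labels are hypothetical.

def filter_counties(county_vs, state):
    """Keep the counties whose code lies in state*100 .. state*100 + 99."""
    first_county_code = state * 100
    last_county_code = first_county_code + 99
    return {code: label
            for code, label in county_vs.items()
            if first_county_code <= code <= last_county_code}


COUNTY_VS = {
    101: "County A",  # state 1
    199: "County B",  # state 1
    201: "County C",  # state 2
    1001: "County D", # state 10
}

print(filter_counties(COUNTY_VS, 1))
print(filter_counties(COUNTY_VS, 10))
```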

#Logic   


There are many ways of formatting the text data you collect in a CSPro application. For example, in the United States it is common to write a telephone number as xxx-xxx-xxxx or (xxx) xxx-xxxx. If only a text field is used, the interviewer could enter either format. However, not knowing the format creates extra work post-data collection, so as the application developer you will want to accept a single format.

This is done using the regexmatch function, which was introduced in CSPro 7.2. The function takes two strings, the target and a regular expression, and returns whether there is a match. In this example, the target string is the telephone number and the regular expression string describes the valid variations of the telephone number.

Regular expressions have their own syntax separate from CSPro logic. To help write your regular expression you can use any regular expression editor that supports the ECMAScript (JavaScript) engine (or flavor).

Writing a Regular Expression

Let us write a regular expression that describes a telephone number in the following format: xxx-xxx-xxxx. We will use the online regular expression editor regex101. Make sure to select ECMAScript as the flavor. Start by typing the phone number 123-456-7890 into the test string field. As you write the regular expression, you will notice that the test string is highlighted as it is described by the regular expression.

Step 1

Image: the regular expression so far, ^

Begin your regular expression by asserting its position at the start of a line. This will keep your phone number from matching something like otherData123-456-7890.

Step 2

Image: the regular expression so far, ^[0-9]

The first character is any number from 0 to 9.

Step 3

Image: the regular expression so far, ^[0-9]{3}

The following two characters are also any numbers from 0 to 9. Signal that the pattern will repeat three times in total.

Step 4

Image: the regular expression so far, ^[0-9]{3}-

The next character is a hyphen, and will match nothing else, so enter the literal hyphen character.

Step 5

Image: the regular expression so far, ^([0-9]{3}-){2}

Notice that the pattern of the next four characters is the same as that of the previous four. Wrap everything but the caret in parentheses to create a capture group and signal that the pattern will repeat two times.

Step 6

Image: the regular expression so far, ^([0-9]{3}-){2}[0-9]{4}

The last four characters are any numbers from 0 to 9. Signal that the pattern will repeat four times.

Step 7

Image: the complete regular expression, ^([0-9]{3}-){2}[0-9]{4}$

Finally, end your regular expression by asserting its position at the end of a line. This will keep your phone number from matching something like 123-456-7890otherData.

Validating a Text Field

With your regular expression in hand, you are ready to validate the telephone number in CSPro. Call regexmatch passing in the telephone number and the regular expression. If 0 is returned then display an error message and re-enter. This allows the interviewer to correct the telephone number. Otherwise, if 1 is returned, do nothing and let the interview continue.

PROC TELEPHONE_NUMBER

postproc

    if regexmatch(TELEPHONE_NUMBER, "^([0-9]{3}-){2}[0-9]{4}$") = 0 then
        errmsg("Invalid format! Use the following format: xxx-xxx-xxxx.");
        reenter;
    endif;
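
The same pattern can be tried quickly outside CSPro before it goes into the application. The Python sketch below is only an illustration and uses Python's re module rather than CSPro's regexmatch or an ECMAScript engine, though this particular pattern reads the same in both flavors. The function name is_valid_phone is hypothetical, and fullmatch is used so that a trailing newline is not accepted.

```python
# Illustrative Python sketch (not CSPro logic) for trying out the pattern.
# is_valid_phone is a hypothetical name.
import re

PHONE_PATTERN = re.compile(r"^([0-9]{3}-){2}[0-9]{4}$")


def is_valid_phone(text):
    """Return True only when the whole text has the form xxx-xxx-xxxx."""
    return PHONE_PATTERN.fullmatch(text) is not None


for candidate in ("123-456-7890",
                  "(123) 456-7890",
                  "otherData123-456-7890",
                  "123-456-7890otherData"):
    print(candidate, is_valid_phone(candidate))
```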

To see a working example, download the regexmatch application.


#Logic