Hierarchical Value Sets Using a Selcase Trick


The selcase (“select case”) function displays a list of cases in an external dictionary and lets an interviewer select a case to load. One feature not mentioned in the function's help documentation is the ability to select multiple cases. With the multiple keyword, the interviewer can select more than one case, and the program can then iterate over the selected cases using a for loop.
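
As a minimal sketch of multiple selection, the snippet below shows the dialog, lets the interviewer mark several cases, and then reports each marked case. It assumes an external dictionary named EA_DICT with an alphanumeric item EA_NAME, and that the multiple keyword can be used on its own, without the automark option:

    // show the selection dialog and allow more than one case to be marked
    selcase(EA_DICT, "") multiple;

    // loop over only the cases that the interviewer marked
    for EA_DICT do
        errmsg("Selected EA: %s", strip(EA_NAME));
    endfor;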

An undocumented feature allows all qualifying cases to be marked automatically: with the automark keyword, the selcase dialog is not shown to the interviewer. The following two code snippets are equivalent:

// create a dynamic value set showing the EAs in Region 1 / District 1
valueset ea_vs;

// code demo 1 --- using a forcase loop
forcase EA_DICT where REGION = 1 and DISTRICT = 1 do
    ea_vs.add(EA_NAME, EA);
endfor;

// code demo 2 --- automatically marking cases and then using a for loop
selcase(EA_DICT, "") where REGION = 1 and DISTRICT = 1 multiple(automark);

for EA_DICT do
    ea_vs.add(EA_NAME, EA);
endfor;

Because selcase lets you pass a key to match (“” in the example above, which matches all case keys), you can use it as a trick to create value sets efficiently when you know the leading part of the key. For example, if you have a data file with 50,000 cases, forcase will always loop through all 50,000 cases, whereas providing a key match may limit your loop to substantially fewer cases.
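
As a sketch of the key-match version of the earlier example, suppose (this is an assumption, not something stated above) that the EA_DICT key begins with a one-digit REGION followed by a zero-filled two-digit DISTRICT. Region 1 / District 1 could then be selected by key rather than with a where clause:

    // assumption: case keys start with REGION (length 1) + DISTRICT (length 2, zero-filled)
    valueset ea_vs;

    // mark only the cases whose key begins with "101"
    selcase(EA_DICT, "101") multiple(automark);

    for EA_DICT do
        ea_vs.add(EA_NAME, EA);
    endfor;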

To show a possible use for this trick, we will look at two ways of creating hierarchical value sets for geocodes. Suppose we have three levels of geography: Region, District, and EA. One way to structure a geocode lookup file is as follows:

Region  District  EA  Geocode Name
1                     Region 1
1       2             District 2 in Region 1
1       2         5   EA 5 in Region 1 / District 2

That is, when defining regions, the district and EA codes are left blank, and when defining districts, the EA code is left blank.

Using a forcase loop to populate the districts based on a selected region, we would loop over the entire data file, filtering on cases where the geocode region matches the selected region, where the geocode district is defined, but where the geocode EA is blank:

PROC DISTRICT

preproc

    valueset geography_vs;

    forcase GEOCODES_DICT where GEOCODE_REGION = REGION and
                                GEOCODE_DISTRICT <> notappl and
                                GEOCODE_EA = notappl do

        geography_vs.add(GEOCODE_NAME, GEOCODE_DISTRICT);

    endfor;

    setvalueset(DISTRICT, geography_vs);

Prior to this, to generate the region value set, we would look for cases where the geocode district is blank, and then following this, to generate the EA value set, we would look for cases where the region and district match the selected codes and where the EA is defined. To generate the hierarchical value sets for the three levels of geography would require fairly different loops.
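
A sketch of those other two loops, written from the description above and reusing the item names from the district example, might look like this:

    PROC REGION

    preproc

        valueset geography_vs;

        // regions are the cases where the district code is blank
        forcase GEOCODES_DICT where GEOCODE_DISTRICT = notappl do
            geography_vs.add(GEOCODE_NAME, GEOCODE_REGION);
        endfor;

        setvalueset(REGION, geography_vs);

    PROC EA

    preproc

        valueset geography_vs;

        // EAs match the selected region and district and have an EA code defined
        forcase GEOCODES_DICT where GEOCODE_REGION = REGION and
                                    GEOCODE_DISTRICT = DISTRICT and
                                    GEOCODE_EA <> notappl do
            geography_vs.add(GEOCODE_NAME, GEOCODE_EA);
        endfor;

        setvalueset(EA, geography_vs);

Each loop filters on a different combination of items and adds a different item as the code, which is why the three cannot easily share one piece of logic.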

With the selcase automark trick, we can create a single function that can be used to generate the value set for each level of geography:

PROC GLOBAL

valueset geography_vs;

function CreateGeographyVS(string already_selected_key, numeric geocode_length)

    // clear the dynamic value set
    geography_vs.clear();

    // automatically select all cases that match the key passed in as a parameter
    selcase(GEOCODES_DICT, already_selected_key) multiple(automark);

    // this geocode starts at the position after the already selected key
    numeric new_key_offset = length(already_selected_key) + 1;

    // loop over the cases that match the already selected key
    for GEOCODES_DICT do

        // extract the remaining geocodes
        string new_key_portion = strip(key(GEOCODES_DICT)[new_key_offset]);

        // when the remaining geocodes match the geocode length we are expecting,
        // this is a match so add it to the value set
        if length(new_key_portion) = geocode_length then
            geography_vs.add(GEOCODE_NAME, tonumber(new_key_portion));
        endif;

    endfor;

    // make sure that there was at least one geocode for this hierarchical level
    if geography_vs.length() = 0 then
        errmsg("Geocode lookup error ... the geocode database is not complete.");
        stop(1);
    endif;

end;

We call this function from each procedure, specifying the currently selected geocode and the geocode length at each level. In this example, we assume that the region is length 1 and that the other two geocodes are length 2:

PROC REGION

preproc

    CreateGeographyVS("", 1);
    setvalueset(REGION, geography_vs);

PROC DISTRICT

preproc

    CreateGeographyVS(maketext("%v", REGION), 2);
    setvalueset(DISTRICT, geography_vs);

PROC EA

preproc

    CreateGeographyVS(maketext("%v%v", REGION, DISTRICT), 2);
    setvalueset(EA, geography_vs);
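
To see how the length check separates the levels, here is a worked trace of the district call. The keys shown are hypothetical: they assume the three lookup cases listed earlier, stored with zero-filled codes and with blank codes written as spaces:

    // hypothetical keys in the lookup file (lengths 1 + 2 + 2):
    //   "1    "   Region 1
    //   "102  "   District 2 in Region 1
    //   "10205"   EA 5 in Region 1 / District 2
    //
    // CreateGeographyVS("1", 2)  ->  new_key_offset = 2
    //
    //   key        key[2]    after strip   length   result
    //   "1    "    "    "    ""            0        skipped (this is the region itself)
    //   "102  "    "02  "    "02"          2        added with code 2
    //   "10205"    "0205"    "0205"        4        skipped (this is an EA)

Only the case whose remaining key portion is exactly one level long is added, so the same function picks out regions, districts, or EAs depending on the arguments.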

Now we have a generalizable function that we can use in our censuses or surveys, a function that will work with any number of levels of geography.
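
For instance, a fourth level below EA would presumably need only one more call in the same pattern. In the sketch below, the item name VILLAGE and its code length of 3 are hypothetical, and the lookup file is assumed to carry a matching fourth ID item:

    PROC VILLAGE

    preproc

        // hypothetical fourth level: the key so far is region + district + EA
        CreateGeographyVS(maketext("%v%v%v", REGION, DISTRICT, EA), 3);
        setvalueset(VILLAGE, geography_vs);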

