Gregory Martin

 

Recently I put up two new tools that may be useful to people using CSPro to edit data.

The first tool, Listing File Comparer, provides a way of quickly looking at the error percentages across a group of listing files. This is useful for people who process data, typically census data, on files split by geography.

The second tool, Save Array Viewer, is a program that visually displays the contents of save array files. This program is especially useful if you use DeckArrays for hotdeck imputation.

Also newly posted on the site is a tutorial about creating CAPI applications written by Anne Abelsæth of Statistics Norway: “Development of Data Entry and CAPI Applications in CSPro.”

 

Unfortunately, CSPro does not have a way, within a data entry application, to get a listing of the IDs of the other cases that have been entered into the primary data file. If your application is fairly simple, you can write a two-dictionary application to facilitate a basic version of case management.

In this example, the main dictionary of the application is a junk dictionary. Any data entered to this dictionary will be ignored. We only use this dictionary to provide the framework for the main data entry application and as a way to enter a menu selection.

The external dictionary and form are actually where data is entered for this application. By using loadcase and writecase statements, it is possible to add and modify cases.

Whenever the data entry application is started, an array is populated with information about all of the cases in the data file. In this example, the program is hardcoded to expect that information about households 1-8 will eventually be added to the file. The program reports on what has been entered and what cases are remaining.

This list is created by using the find function to check for each of the expected IDs in the external file (which really is the main data file for this application). Then the setvalueset and setcapturetype functions display the results on the screen.

Download the example here, or view the code below.

PROC GLOBAL

array numeric casesIDs(100);
array alpha (30) casesLabels(100);

numeric numberCasesExpected = 8; // eight cases expected for the cluster

PROC MENU_FF

PROC MENU_ID

onfocus

    MENU_ID = notappl; // reset any value that might be here

    numeric cnt,numLabels,someCasesNotEntered;

    do cnt = 1 while cnt <= numberCasesExpected
        
        HHID = cnt;
        casesIDs(numLabels) = HHID;

        if find(QUESTIONNAIRE_DICT,=,itemlist(HHID)) then
            casesLabels(numLabels) = maketext("Modify Household %d",cnt);

        else
            casesLabels(numLabels) = maketext("Add Household %d",cnt);
            someCasesNotEntered = 1;

        endif;

        inc(numLabels);

    enddo;
    
    casesIDs(numLabels) = 99;
    casesLabels(numLabels) = "Quit";

    inc(numLabels);
    casesIDs(numLabels) = notappl; // end the dynamic value set
    
    setvalueset(MENU_ID,casesIDs,casesLabels);
    setcapturetype(MENU_ID,1);

postproc

    if MENU_ID = 99 then

        if someCasesNotEntered then
        
            if ( errmsg("You are not finished entering cases. Are you sure you want to quit?") select("Yes",continue,"No",continue) ) = 2 then
                reenter;
            endif;
        
        endif;
        
        stop(1);

    endif;

    HHID = MENU_ID;

    if not loadcase(QUESTIONNAIRE_DICT,HHID) then
        clear(QUESTIONNAIRE_DICT); // we are adding a new case so we must make sure the fields are blank
        HHID = MENU_ID;
    endif;

    enter QUESTIONNAIRE_FF;

    writecase(QUESTIONNAIRE_DICT); // write the case to the data file

    reenter;
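
For readers who want to experiment with the menu-building idea outside of CSPro, here is a minimal sketch in Python (not CSPro logic). It assumes the external data file can be represented as a set of household IDs that have already been entered; the function name build_menu and its parameters are hypothetical and only mirror the onfocus logic above.

```python
def build_menu(entered_ids, number_cases_expected=8, quit_code=99):
    """Return (menu, some_cases_not_entered).

    menu is a list of (code, label) pairs, one per expected household,
    followed by a final "Quit" entry.
    """
    menu = []
    some_cases_not_entered = False

    for hhid in range(1, number_cases_expected + 1):
        if hhid in entered_ids:  # plays the role of find(...) on the external file
            menu.append((hhid, f"Modify Household {hhid}"))
        else:
            menu.append((hhid, f"Add Household {hhid}"))
            some_cases_not_entered = True

    menu.append((quit_code, "Quit"))
    return menu, some_cases_not_entered


# Example: households 1 and 3 have been entered so far
menu, unfinished = build_menu({1, 3})
for code, label in menu:
    print(code, label)
```

The unfinished flag corresponds to someCasesNotEntered, which the postproc uses to decide whether to warn the operator before quitting.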

 

As mentioned in my previous post, Unicode support (and thus internationalization) will be a great addition to CSPro, coming out in the next half year. After that, the development team plans to focus on the CAPI (computer-assisted personal interviewing) world. CSPro currently supports a very basic version of CAPI, but only for Windows platforms. With the proliferation of Android devices, as well as the upcoming Windows 8 tablets, CSPro must adapt to this new world of enumeration.

The world of small-scale surveys may not change dramatically, but the impact of technology on censuses is huge. This is a photo from an East African country that recently conducted a census. The warehouse in this photo stores all of the census forms and requires many workers to operate:

What if an EA (enumeration area) were not in the final set … how easy would it be to find in a mountain of forms? Imagine a world in which all data collection is conducted on a phone or a tablet and immediately sent to the operation headquarters. Data editing would be minimized and the time from collection to publication could effectively be cut to almost zero. Such a world will be an exciting one.

 

To all CSPro users around the world, the CSPro development team wishes you a happy holiday season. I have been fortunate this year to have had the chance to work on CSPro data processing with users in Armenia, Bangladesh, Cambodia, Kenya, and Paraguay. Add in the hundreds of users who have emailed with questions, and CSPro users add up to a nice community. I hope that this website, and increased content next year, will help you fulfill your needs.

2012 should be a good year for CSPro, first with the addition of Unicode support, and then with steps towards the future of data entry, with focus on Android and other handheld development.

 

When CSPro has an interface for a tool, I suggest that you use the interface as much as possible, but on occasion a user might want to write code for more advanced functionality. One such example is when writing code for advanced export operations.

The CSPro help documents include some information on the export statement, but I often forget the syntax or what exactly the parameters should be. Fortunately, now with CSPro 4.1.002, a feature of the Export Data tool allows you to view the code that powers the export. By selecting Options -> Copy Logic to Clipboard, you can then take the export logic and insert it in the batch application of your choosing. This is a nice way to quickly get the basic export code needed and then you can build off of that.

For an example of a batch export that uses a lookup file, imagine that I have a data file describing stores and their customers. One of the attributes of a customer record is a country code that describes where the customer lives. When exporting the names of the customers, I do not want the two-letter country code, but the full name of the country. Using this list of country codes as a lookup file, I would go into the export tool, select the fields that I want to export, and then after copying the logic to the clipboard I would see code like this:

PROC GLOBAL
SET EXPLICIT;

NUMERIC rec_occ;

FILE cspro_export_file_var_f;

PROC COUNTRYCODEEXPORT_QUEST
PreProc
    set behavior() export ( CommaDelim , ItemOnly );

    For rec_occ in RECORD CUSTOMER_REC do
        EXPORT TO cspro_export_file_var_f
        CASE_ID(STORE_ID)
        STORE_REC, CUSTOMER_REC;
    Enddo;

I would insert this code into a new batch application, add my lookup file code, and I would end up with something like this:

PROC GLOBAL

NUMERIC rec_occ;

FILE cspro_export_file_var_f;

PROC COUNTRYCODEEXPORT_FF

PROC COUNTRYCODEEXPORT_QUEST

PreProc

    set behavior() export ( CommaDelim , ItemOnly );

    For rec_occ in RECORD CUSTOMER_REC do
    
        // look up the country name
        if not loadcase(COUNTRYCODES_DICT,CUSTOMER_COUNTRY) then
            errmsg("Could not find country code: %s",CUSTOMER_COUNTRY);
            FULL_NAME = "<invalid>";
        endif;
    
        EXPORT TO cspro_export_file_var_f
        CASE_ID(STORE_ID)
        STORE_REC, CUSTOMER_REC, FULL_NAME; // FULL_NAME is added here (it comes from the lookup file)
    Enddo;

This feature in the Export Data tool vastly simplified my task, allowing me to focus on the lookup file programming, rather than the syntax of the export statement. My exported file now contains data from two files:

500  Pastry Pantry     Barack Obama          US  United States
500  Pastry Pantry     Angela Merkel         DE  Germany
500  Pastry Pantry     Hu Jintao             CN  China
800  Chocolate Heaven  Jacob Zuma            ZA  South Africa
800  Chocolate Heaven  Alexander Lukashenko  BY  Belarus

This is the input file:

10500Pastry Pantry
20500Barack Obama                                      US
20500Angela Merkel                                     DE
20500Hu Jintao                                         CN
10800Chocolate Heaven
20800Jacob Zuma                                        ZA
20800Alexander Lukashenko                              BY

Download this example.
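
The lookup step itself is independent of the export syntax. As a rough illustration in Python (not CSPro logic), the sketch below joins the country names onto the customer rows using the sample data shown above. The data structures and the function name export_rows are hypothetical; the "<invalid>" fallback mirrors what the batch code assigns when loadcase fails.

```python
# Lookup "file": country code -> full country name
COUNTRY_NAMES = {
    "US": "United States",
    "DE": "Germany",
    "CN": "China",
    "ZA": "South Africa",
    "BY": "Belarus",
}

# Input "file": (store ID, store name, [(customer name, country code), ...])
STORES = [
    (500, "Pastry Pantry",
     [("Barack Obama", "US"), ("Angela Merkel", "DE"), ("Hu Jintao", "CN")]),
    (800, "Chocolate Heaven",
     [("Jacob Zuma", "ZA"), ("Alexander Lukashenko", "BY")]),
]


def export_rows(stores, country_names):
    """Build one output row per customer, adding the looked-up country name."""
    rows = []
    for store_id, store_name, customers in stores:
        for customer_name, country_code in customers:
            # Use a placeholder when the code is missing from the lookup
            full_name = country_names.get(country_code, "<invalid>")
            rows.append((store_id, store_name, customer_name, country_code, full_name))
    return rows


rows = export_rows(STORES, COUNTRY_NAMES)
for row in rows:
    print(",".join(str(field) for field in row))
```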

 

In CSPro 4.1.002, you can customize the menus of the data entry application, CSEntry, to change the menu options to be in the language of your choice, as long as the language can be represented in single-byte (extended ASCII) characters, which covers most European languages. In the future, when CSPro supports Unicode, all language scripts will be supported.

To override the default text options, you must create a file called csentry.menu and place it either in the Program Files\CSPro 4.1\ folder, or in the folder where the PFF file for your application is located. This file has a format that is easy to follow. For example, to override the File menu text, you would place this text in the file:

File=Fichier
File_Open=Ouvrir une application
File_OpenDat=Ouvrir un fichier de données
File_Save=Sauvegarde partielle du questionnaire
File_Exit=Fermer

To add shortcut keys, place an ampersand before the shortcut letter. For example:

File=&Fichier
File_Open=&Ouvrir une application
File_OpenDat=Ouvrir un fichier de &données
File_Save=&Sauvegarde partielle du questionnaire
File_Exit=&Fermer

Placing the file in the Program Files\CSPro 4.1\ folder means that it will affect every single data entry application run on that machine. Alternatively, placing it in a folder with the PFF allows you to have different menus for different users.
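
If you maintain several menu files, it can help to check them before deploying. The following is a small Python sketch (not part of CSPro) that reads text in the key=value format shown above and reports the shortcut letter marked by the ampersand. The function names are hypothetical, and the parsing rules are an assumption based only on the examples in this post.

```python
def parse_menu_overrides(text):
    """Parse 'Key=Value' lines into a dictionary, skipping anything else."""
    overrides = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        overrides[key.strip()] = value.strip()
    return overrides


def shortcut_letter(menu_text):
    """Return the letter following the first ampersand, or None if there is none."""
    pos = menu_text.find("&")
    if pos == -1 or pos + 1 >= len(menu_text):
        return None
    return menu_text[pos + 1].upper()


sample = """File=&Fichier
File_Open=&Ouvrir une application
File_OpenDat=Ouvrir un fichier de &données
File_Save=&Sauvegarde partielle du questionnaire
File_Exit=&Fermer"""

overrides = parse_menu_overrides(sample)
for key, value in overrides.items():
    print(key, "->", value, "| shortcut:", shortcut_letter(value))
```

Listing the shortcuts this way makes it easy to spot duplicates, such as File and File_Exit both using F in the example above.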

Soon on this site I will post French and Spanish language menus on the Tools page. For now, if you would like to create your own menus, use this template file.

 

CSPro 4.1.002 comes with a few new functions that may come in handy for your data processing needs. The function getusername returns the name under which the user logged onto Windows. This can be useful if you want to restrict access to a data entry program to only a few users. Imagine a control system for a data entry operation in which only supervisors have access to parts of the system. By giving the supervisors logins such as super1, super2, etc., you could restrict access as follows:

OPERATOR_NAME = getusername();

if pos("super",tolower(OPERATOR_NAME)) <> 1 then
    errmsg("You must be logged in as a supervisor on this machine to access this program.");
    stop(1);
endif;

OPERATOR_NUMBER = tonumber(OPERATOR_NAME[6]); // starting at position 6 will skip past "super"
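
The same naming rule can be tested outside of CSPro. Here is a minimal Python sketch, assuming logins of the form super1, super2, and so on; the function name supervisor_number is hypothetical, and it takes the user name as a parameter rather than reading it from Windows.

```python
def supervisor_number(user_name):
    """Return the supervisor number for names like 'super1', or None otherwise."""
    name = user_name.lower()

    # Equivalent to requiring pos("super", name) = 1 in the CSPro logic
    if not name.startswith("super"):
        return None

    # CSPro string positions are 1-based, so [6] there is index 5 here
    suffix = name[5:]
    return int(suffix) if suffix.isdecimal() else None


print(supervisor_number("Super2"))
print(supervisor_number("jsmith"))
```

Unlike the CSPro snippet, this sketch also rejects a login such as "super" with no number after it, which is a design choice rather than something the original code does.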

The function randomin works by accepting as an argument either an in list or a value set. The function then returns a random value that falls somewhere in the listed values. It is now easy to get a random value within a non-continuous range:

numeric randomElectionYear = randomin(2000,2004,2008,2012);

You can weight certain values by repeating the value in the in list. For example, the following call will return three times as many 1 values as 2 values:

randomin(1,1,1,2);
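
To see why repeating a value acts as a weight, here is a quick simulation in Python (not CSPro logic). It assumes that each entry in the list is equally likely to be picked, which is how the weighting is described above; random_in is a hypothetical stand-in for randomin.

```python
import random
from collections import Counter


def random_in(values, rng):
    """Pick one entry from the list, each entry being equally likely."""
    return rng.choice(values)


rng = random.Random(12345)  # fixed seed so that the run is repeatable
counts = Counter(random_in([1, 1, 1, 2], rng) for _ in range(10000))

print(counts)
print(counts[1] / counts[2])  # roughly 3; the exact ratio depends on the seed
```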

The function is probably most useful when used with a value set. For example, if you have a data file for a survey and you are planning on adding a question to the survey about religion, you may want to test out your edits for this variable. Your old data does not have this variable, so you would write a batch edit program and then add random religion data to a test data file, which you could pass through your edit programs:

PROC RELIGION

    RELIGION = randomin(RELIGION_VS1);

The randomizevs function is useful for data entry applications that use the extended controls introduced in CSPro 4.1. Sometimes you may want to ask questions and randomize the order of the possible responses to avoid bias due to the item positioning. In the preproc of the relevant question, you could execute the randomizevs function to present a different order each time:

PROC POLITICAL_PARTY

preproc

    randomizevs(PARTY_LIST_VSET,exclude(98,99));

In the above example, 98 might be a “None of the above” code and 99 might be a “Do not know” code. You do not want these codes randomized with the rest. You always want them at the bottom of the value set list, so you exclude them from the randomization.
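
The idea of shuffling everything except a few fixed codes can be sketched in Python (not CSPro logic) as follows. This is only an illustration: the party names are made up, the function randomize_value_set is hypothetical, and it assumes that the excluded codes already sit at the end of the value set, which is where this sketch leaves them.

```python
import random


def randomize_value_set(value_set, exclude=(), rng=random):
    """Shuffle the (code, label) pairs, keeping excluded codes last in their original order."""
    shuffled = [entry for entry in value_set if entry[0] not in exclude]
    kept = [entry for entry in value_set if entry[0] in exclude]
    rng.shuffle(shuffled)
    return shuffled + kept


party_list = [
    (1, "Party A"),
    (2, "Party B"),
    (3, "Party C"),
    (98, "None of the above"),
    (99, "Do not know"),
]

result = randomize_value_set(party_list, exclude=(98, 99), rng=random.Random(7))
for code, label in result:
    print(code, label)
```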

 

This code looks like it will not compile correctly:

PROC GLOBAL

PROC APPLICATION_FF

preproc

    errmsg("Welcome to CSPro, this is version %s",versionNumber());

The compiler may complain that versionNumber is not a function that exists. However, starting in CSPro 4.1.002, you can attach multiple logic files to an application. In this case, if versionNumber is defined in a different file, the code will compile successfully.

When compiling your code, CSPro will load any additional logic files as if the text of these files were inserted in PROC GLOBAL. This means that only things that can go in PROC GLOBAL can be declared in an external logic file, but that includes variables, arrays, and user-defined functions that you use across many applications. In the above example, I might create a file, myfunctions.app, that contains code that I use frequently. It would look like this:

function alpha versionNumber()

    versionNumber =  "4.1.002";

end;

There is no need to write PROC GLOBAL in this external logic file. To add the file to the application, I select File -> Add Files, and then select my External Logic File.

These files are listed in the tree in the Files window under the main logic file for your application. External logic files can be removed by selecting File -> Drop Files.

Download the above example as a batch application.

 

In CSPro 4.1.002 there are three small language changes and additions that may make writing code easier. First, the change:

numeric numChildren = count(POP_REC where RELATIONSHIP = 3); // old
numeric numChildren = count(RELATIONSHIP = 3); // new

In the past, when using the count function it was necessary to specify what record or group you wanted to search through (in the above case, POP_REC). Now you can write the code without specifying this clause.

The seek function, new to CSPro 4.1, searches a record for the first instance of something being true. Now, in CSPro 4.1.002, you can search for the nth occurrence of the conditional statement. For example, this loop will continue until there are no longer two spouses in a household:

do while seek(RELATIONSHIP = 2,@2)

    // change the second spouse’s relationship

enddo;
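
To make the nth-occurrence idea concrete, here is a small Python sketch (not CSPro logic). It assumes that the search returns the 1-based occurrence number of the nth match and 0 when there is no such match; the replacement code 9 ("other relative") is hypothetical, while 2 is the spouse code used above.

```python
SPOUSE = 2
OTHER_RELATIVE = 9  # hypothetical code used only for this illustration


def seek(values, condition, nth=1):
    """Return the 1-based position of the nth value meeting the condition, or 0."""
    found = 0
    for index, value in enumerate(values, start=1):
        if condition(value):
            found += 1
            if found == nth:
                return index
    return 0


relationships = [1, 2, 3, 2, 2]  # a household with three spouses

# Keep going until a second spouse can no longer be found
while seek(relationships, lambda r: r == SPOUSE, 2):
    second = seek(relationships, lambda r: r == SPOUSE, 2)
    relationships[second - 1] = OTHER_RELATIVE

print(relationships)
```

Because 0 is treated as false, the loop ends as soon as only one spouse remains, which matches the intent of the do while loop above.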

Finally, the sort function has long been a useful feature of the CSPro language, but now you can use it with a where clause. For example, a common request is to have a roster sorted based on relationships. This example sorts a roster by relationship, and then sorts the children in order by descending age:

sort(POP_REC_EDT using RELATIONSHIP);
sort(POP_REC_EDT using -AGE where RELATIONSHIP = 3);
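
The two-step sort can be sketched in Python (not CSPro logic) as shown below. This is a sketch under a stated assumption: that a sort with a where condition rearranges only the occurrences that meet the condition and leaves the other occurrences in their positions. The function sort_where and the sample roster are hypothetical.

```python
def sort_where(records, key, where=lambda record: True, reverse=False):
    """Sort only the records meeting the condition, within the slots they occupy."""
    positions = [i for i, record in enumerate(records) if where(record)]
    selected = sorted((records[i] for i in positions), key=key, reverse=reverse)

    result = list(records)
    for position, record in zip(positions, selected):
        result[position] = record
    return result


roster = [
    {"name": "Child A", "relationship": 3, "age": 5},
    {"name": "Head", "relationship": 1, "age": 40},
    {"name": "Child B", "relationship": 3, "age": 12},
    {"name": "Spouse", "relationship": 2, "age": 38},
]

# Step 1: sort the whole roster by relationship
by_relationship = sort_where(roster, key=lambda r: r["relationship"])

# Step 2: sort only the children (relationship 3) by descending age
final = sort_where(by_relationship, key=lambda r: r["age"],
                   where=lambda r: r["relationship"] == 3, reverse=True)

print([person["name"] for person in final])
```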
 

CSPro does not have a constant modifier that can be used to indicate that a variable cannot be modified, but the principle of using constant values is still one that is useful for CSPro users. For example, if your task is to write an edit program that replaces invalid sex values with a random selection of the previous 10 valid sex values, here are two ways you could code it.

Version 1

PROC GLOBAL

array sexHD(10);
numeric sexCounter;

PROC APPLICATION_FF

preproc

    sexHD(1) = 1;
    sexHD(2) = 2;
    sexHD(3) = 1;
    sexHD(4) = 2;
    sexHD(5) = 1;
    sexHD(6) = 2;
    sexHD(7) = 1;
    sexHD(8) = 2;
    sexHD(9) = 1;
    sexHD(10) = 2;
    
    sexCounter = 1;
    seed(systime());

PROC SEX

    if SEX in 1:2 then { valid sex }
        sexHD(sexCounter) = SEX;
        
        sexCounter = sexCounter + 1;
        
        if sexCounter > 10 then
            sexCounter = 1;
        endif;
    
    else { invalid sex, use the hotdeck }
        impute(SEX,sexHD(random(1,10)));
    
    endif;

Version 2

PROC GLOBAL

numeric sizeSexHD = 10;
array sexHD(sizeSexHD) = 1 2 ...;
numeric sexCounter = 1;

PROC APPLICATION_FF

preproc

    seed(systime());

PROC SEX

    if SEX in 1:2 then // valid sex
        sexHD(sexCounter) = SEX;
        
        inc(sexCounter);
        
        if sexCounter > sizeSexHD then
            sexCounter = 1;
        endif;
    
    else // invalid sex, use the hotdeck
        impute(SEX,sexHD(random(1,sizeSexHD)));
    
    endif;

The first version is the way you might have coded this application using older versions of CSPro. The second version uses features in CSPro 4.1. The second version shows how you can initialize the values for temporary variables and arrays while declaring them in PROC GLOBAL. It also shows the use of the inc (increment) function, as well as the new CSPro 4.1.002 feature which allows you to specify the size of your array by using a declared variable.

The advantages of the second version are twofold. First, the amount of code is reduced. Second, the code is much more dynamic. If, for instance, you are told that the edit should now choose between the previous 20 valid sex values, you only need to change the value of sizeSexHD. Changing this one value affects the size of the array, the number of times that the array is initialized with the alternating values of 1 and 2, the reset of the sexCounter increment variable, and the selection of values from the array.
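
The benefit of driving everything from one size constant is easy to demonstrate outside of CSPro. Below is a minimal Python sketch of the same hotdeck, where the class SexHotdeck and the constant SIZE_SEX_HD are hypothetical names. Changing the size changes the initialization, the wrap-around of the counter, and the random selection all at once.

```python
import random

SIZE_SEX_HD = 10  # the one value to change if the hotdeck should hold more donors


class SexHotdeck:
    def __init__(self, size=SIZE_SEX_HD, rng=None):
        self.size = size
        # Initialize with alternating values 1 2 1 2 ...
        self.values = [1 if i % 2 == 0 else 2 for i in range(size)]
        self.counter = 0
        self.rng = rng or random.Random()

    def edit(self, sex):
        """Return a valid sex value, storing valid values as future donors."""
        if sex in (1, 2):
            self.values[self.counter] = sex
            self.counter = (self.counter + 1) % self.size  # wrap around at the end
            return sex
        # Invalid value: impute from a randomly chosen donor
        return self.rng.choice(self.values)


deck = SexHotdeck(size=4, rng=random.Random(1))
for _ in range(4):
    deck.edit(2)        # four valid values overwrite the initial donors
print(deck.values)
print(deck.edit(9))     # 9 is invalid, so a donor value is returned
```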

Using constants when possible is a great idea for edits as well. Using arrays to store constant values makes code much clearer and allows for easy editing of the parameters under which your program runs. Note in the following example the use of a value set to define the size of the educationAges array. These constant values can now be accessed using the getdeck function.

//                                      Male    Female
array minAgeForChildbearing(P16_VS1) =  15      12;
array maxAgeForChildbearing(P16_VS1) =  99      49;

//                                      Min     Max
array educationAges(P21_VS1,2) =        0       99      // no schooling
                                        5       12      // class 1
                                        6       13      // class 2
                                        7       14      // class 3
                                        8       15      // class 4
                                        9       16      // class 5
                                        10      17      // class 6
                                        11      18      // class 7
                                        12      19      // class 8
                                        13      20      // class 9
                                        14      99      // class 10
                                        16      99      // bachelor’s
                                        19      99      // master’s
                                        20      99      // PhD
                                        21      99      // postdoc
                                        ;
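
To show how such a table of constants might drive an edit, here is a short Python sketch (not CSPro logic). It copies a few of the minimum and maximum ages from the array above into a dictionary keyed by education level; the keys and the function name age_consistent_with_education are hypothetical, since the actual codes in the P21_VS1 value set are not shown here.

```python
# (minimum age, maximum age) for a few of the education levels listed above
EDUCATION_AGES = {
    "no schooling": (0, 99),
    "class 1": (5, 12),
    "class 10": (14, 99),
    "PhD": (20, 99),
}


def age_consistent_with_education(age, education):
    """Check whether the age falls within the allowed range for the education level."""
    min_age, max_age = EDUCATION_AGES[education]
    return min_age <= age <= max_age


print(age_consistent_with_education(8, "class 1"))
print(age_consistent_with_education(30, "class 1"))
```

Keeping the limits in one table means that a change in the edit specifications only requires changing a number, not the logic that uses it.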
