CSPro 5.0.1 to be Released Soon

By Gregory Martin / On July 17, 2012

CSPro 5.0.1 will be officially released soon, hopefully before the end of July.

The big feature that makes this a major release is the inclusion of Unicode support. Dictionaries, forms, tables, etc. can now support Arabic, Chinese, and other languages whose scripts do not use Latin characters. Data files can also include Unicode characters, so they are now saved in UTF-8 format. When CSPro is released, the help documentation will include a Unicode Primer with more information about the implications of the change.

Another change is that PDA (PocketPC) support has been dropped, so anyone using CSPro for Windows Mobile devices must still use version 4.1.

Over the next few days I will blog about some new features that exist in this upgrade.

October 2012 Training Opportunity

By Gregory Martin / On June 6, 2012

The Statistical Services Centre at the University of Reading will conduct a six-day workshop from October 24 – 31, 2012. The workshop is entitled "Data Management Using CSPro: A Hands-On Approach." The workshop focuses on data entry and management of data, including using CSPro data with other statistical software packages. For more information, visit the SSC training page.

Calculating Population Densities

By Thomas Ondra / On June 6, 2012

(This example makes use of area name processing. Make sure that you understand area processing before you proceed with the example.)

Censuses often contain tables of population densities, e.g., population per square kilometer. While the census data file contains population totals, it generally does not contain information about the area (e.g., square kilometers, square miles, hectares, etc.) of a geographic level. This data is usually maintained in a separate file. This example illustrates how you can bring square area data into your application so that you can calculate population densities. Our standard for square area will be square kilometers. If you are using square miles, hectares, acres, etc., substitute accordingly. In the example below I am using the Popstan example in CSPro's example folder. Download the example.

1. Add a record to your census data dictionary to contain the square kilometer information. You will need to give this record a unique record type identifier. In this example I use "8" for the record type identifier for the square kilometers record.

2. Change all required records to "not required." (Make sure that your data has been properly edited!)

3. Obtain a file of square kilometers for the geographic levels. In this example, this data is contained in an Excel spreadsheet. You will need the lowest level of geography for which you are calculating population densities. In the attached example, I use Province and District, with District being my lowest level of geography.

4. Export the square kilometer data into a fixed-format text file (*.prn). The format of this file must match the format specified in the record type created in step 1. Space fill or zero fill ID fields that are not used for the levels of geography of the population densities. You can use Text Viewer to view the square kilometers file (PopStan_Sq_Km.prn). The following shows a portion of the square kilometers data used in this example.

a. The first column contains "8." This is the record type of the square kilometers record.
b. Columns 2 and 3 contain the province code; 4 and 5 the district code. These are the geographic levels for which we will calculate population densities.
c. Columns 6 to 19 are the remaining ID fields. These are space filled.
d. Columns 20 to 31 contain the name of the district. This is for information purposes only and is not used in the application. Note that we only have square kilometer data at the district level. This is because CSPro will calculate the province level data by summing the districts, and the Popstan total by summing the provinces.
e. Columns 32 to 36 contain the square kilometers for the district.
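
If the spreadsheet export does not line up with the record layout, the file can also be generated with a short script outside CSPro. The following Python sketch builds one 36-character line per district using the column positions listed above. The function name, the sample rows, the zero filling of the geographic codes, and the right-aligned whole-number area are all assumptions for illustration; match them to your own dictionary.

```python
def format_sq_km_record(province, district, name, sq_km):
    """Build one 36-character fixed-width line matching the layout above."""
    line = (
        "8"                      # column 1: record type
        + f"{province:02d}"      # columns 2-3: province code (assumed zero filled)
        + f"{district:02d}"      # columns 4-5: district code (assumed zero filled)
        + " " * 14               # columns 6-19: unused ID fields, space filled
        + f"{name:<12.12}"       # columns 20-31: district name, padded or cut to 12
        + f"{sq_km:>5d}"         # columns 32-36: square kilometers (assumed whole number)
    )
    if len(line) != 36:
        raise ValueError(f"record is {len(line)} characters, expected 36")
    return line


# Hypothetical rows: (province code, district code, district name, square kilometers)
rows = [(1, 1, "District A", 1250), (1, 2, "District B", 830)]
lines = [format_sq_km_record(*row) for row in rows]
for line in lines:
    print(line)
```

The length check catches values that overflow their field, such as an area with more than five digits, before the file reaches CSPro.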

5. Now prepare your tables. In this example, the first table shows the distribution of square kilometers and the total square kilometers of a given geographic area. The second table shows the population for a given geographic area, the total square kilometers for that area, and the population density. Both tables use the same methods. We will focus on Table 2 because that table contains the population density.

You will need a column for square kilometers in your table. This will actually be a subtable in your table. Do this by dragging the square kilometers value set to the table. Since you only need one column, remove unneeded attributes for the square kilometers column by right-clicking on the column header, clicking on Tally Attributes for the variable, and removing "Total" from the selected attributes.

Since you are tallying the total square kilometers, you need to tally the value. Enter the variable name (SQUARE_KILOMETERS) for the "Value Tallied."

6. Run your tables. When you run the tables you will need to select both the data file and the square kilometers file.

The following is a portion of the resulting table:

How does this work?

The basic concept is that we are adding new cases that contain only ID and square kilometer information for the geographic level. When CSPro processes these records, it tallies them to the appropriate level of geography. There is one and only one square kilometers record for each area at that level of geography.

After CSPro tallies the tables, it consolidates them; i.e., it puts together the levels of geography to sum up to higher levels and then sorts the tallied tables.

Try running against only the square kilometers file. Note that Table 1 looks the same but Table 2 contains no population data because the census data file was not included. Now run against only the census data file. Notice now that Table 1 has no data. This is because the square kilometers file was not included. Table 2 contains only population data but no square kilometers data. Put the Census Population Data file and the Square Kilometers file together in a single run and you have all you need to calculate population densities.
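
The consolidation idea can be illustrated outside CSPro. This Python sketch is not how CSPro implements tabulation; it only mimics the arithmetic: district tallies are summed up to the province and to the total, and the density at each level is the summed population divided by the summed area. All figures and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical district-level tallies keyed by (province, district).
population = {(1, 1): 50000, (1, 2): 30000, (2, 1): 120000}
square_km = {(1, 1): 1250, (1, 2): 750, (2, 1): 2000}


def consolidate(district_values):
    """Sum district values up to province level and to the overall total."""
    provinces = defaultdict(int)
    for (province, _district), value in district_values.items():
        provinces[province] += value
    return dict(provinces), sum(district_values.values())


def density(pop, area):
    """Population per square kilometer, or None when no area was tallied."""
    return pop / area if area else None


pop_by_province, pop_total = consolidate(population)
area_by_province, area_total = consolidate(square_km)

for province in sorted(pop_by_province):
    area = area_by_province.get(province, 0)
    print(province, density(pop_by_province[province], area))
print("total", density(pop_total, area_total))
```

Note that a province's density comes from its summed population and summed area, not from averaging the district densities. Leaving out one of the two inputs, as in the single-file runs described above, leaves the density undefined.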

Splitting a Data File Using Batch Logic

By Gregory Martin / On June 6, 2012

Using only CSPro there is no simple way to split a data file into several parts. Someone asked me: "How would I split a file with 300 cases into six files, each with 50 cases?" It is possible to do this by writing a recursive batch program. This is not a particularly efficient way to split a file into parts, but it works fine for data files that are not so large. This code is probably not worth using if your data file contains more than a million cases.

What I do here is use the skip case statement to selectively write out cases. On the first run of the program, I do nothing but create a PFF that calls the program again with the starting position. Then that program runs, writing out certain cases and skipping others, and then calls the program again with a new starting position. This continues until the whole file has been processed. In the above example, the program would be run seven times: once to initialize the PFF, and then six more times, once for each block of 50 cases. See the code below:

PROC GLOBAL

numeric numCasesPerFile = 50; // number of cases written to each output file

numeric currentCase,currentIteration,desiredStartCase,desiredEndCase;

file pffFile;

// writes a PFF that reruns this program for the given iteration, launches it, and stops the current run
function writeOutPffAndStop(nextStartIteration)

    setfile(pffFile,maketext("%s%d%d_%d.pff",pathname(temp),sysdate("YYYYMMDD"),systime(),nextStartIteration));

    filewrite(pffFile,"[Run Information]");
    filewrite(pffFile,"Version=CSPro 4.1");
    filewrite(pffFile,"AppType=Batch");

    filewrite(pffFile,"[Files]");
    filewrite(pffFile,"Application=%ssplitFile.bch",pathname(application));
    filewrite(pffFile,"InputData=%s",filename(CEN2000));
    filewrite(pffFile,"OutputData=%s_%d",filename(CEN2000),nextStartIteration);
    filewrite(pffFile,"Listing=%s.lst",filename(pffFile));

    filewrite(pffFile,"[Parameters]");
    filewrite(pffFile,"ViewListing=Never");
    filewrite(pffFile,"ViewResults=Yes");
    filewrite(pffFile,"Parameter=%d",nextStartIteration);

    close(pffFile);

    execpff(filename(pffFile));
    stop();

end;

PROC DICTIONARY_FF

preproc

    if sysparm() = "" then // we're on the first run
        writeOutPffAndStop(1);

    else
        currentIteration = tonumber(sysparm());
        desiredStartCase = 1 + ( currentIteration - 1 ) * numCasesPerFile;
        desiredEndCase = desiredStartCase + numCasesPerFile - 1;

    endif;

PROC QUEST

preproc

    inc(currentCase);

    if currentCase > desiredEndCase then // this block is complete, so hand off to the next iteration
        writeOutPffAndStop(currentIteration + 1);

    elseif currentCase < desiredStartCase then // this case belongs to an earlier block
        skip case;

    endif;

You can use this code almost exactly as is, with the following modifications:

Modify the numeric numCasesPerFile from 50 to your liking.
Replace "CEN2000" with the name of your dictionary. (There are two places where this appears.)
Replace "DICTIONARY_FF" with the name of your top-level batch PROC. (It will end with _FF.)
Replace "QUEST" with the name of your dictionary's first level.

See here for an example of this application using the Popstan dictionary.
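
To check the block arithmetic before changing numCasesPerFile, the start and end case calculation from the batch program can be reproduced outside CSPro. This Python sketch mirrors the desiredStartCase and desiredEndCase formulas; the function names are hypothetical, and the run count assumes the input file contains at least one case.

```python
def case_range(iteration, cases_per_file=50):
    """Return the (first, last) case numbers written on a given iteration."""
    start = 1 + (iteration - 1) * cases_per_file
    end = start + cases_per_file - 1
    return start, end


def count_runs(total_cases, cases_per_file=50):
    """Count program runs: one to create the first PFF, then one per output file."""
    output_files = -(-total_cases // cases_per_file)  # ceiling division
    return 1 + output_files


for iteration in range(1, 7):
    print(iteration, case_range(iteration))
print("runs for 300 cases:", count_runs(300))
```

For the 300-case example, the six iterations cover cases 1 to 50 through 251 to 300, and the total of seven runs matches the count described in the post. A file with 301 cases would need one extra run to write the final case on its own.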

Tools for Edit Processing

By Gregory Martin / On April 18, 2012

Recently I put up two new tools that may be useful to people using CSPro to edit data.

The first tool, Listing File Comparer, provides a way of quickly looking at the error percentages across a group of listing files. This is useful for people who process data, typically census data, on files split by geography.

The second tool, Save Array Viewer, is a program that visually displays the contents of save array files. This program is especially useful if you use DeckArrays for hotdeck imputation.

Also newly posted on the site is a tutorial about creating CAPI applications written by Anne Abelsæth of Statistics Norway: "Development of Data Entry and CAPI Applications in CSPro"
