Jun 06, 2012
 

Using only CSPro, there is no simple way to split a data file into several parts. Someone asked me: “How would I split a file with 300 cases into six files, each with 50 cases?” It is possible to do this by writing a recursive batch program, that is, one that launches itself again. This is not a particularly efficient way to split a file, because each run reads the input file from the beginning and skips every case before its block, but it works fine for data files that are not very large. This code is probably not worth using if your data file contains more than a million cases.
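To put a number on that inefficiency: every run reads from the first case until it reaches the first case past its own block, so for n cases split into files of k cases the total number of case reads grows roughly as n²/(2k). The Python sketch below is an illustration of the arithmetic only, not CSPro code, and the function name is hypothetical. It assumes each run stops as soon as it reads the first case beyond its block, and that the last run reads to the end of the file.

```python
def total_cases_read(total_cases, cases_per_file):
    """Count case reads summed over all runs of the recursive split.

    Each run reads from case 1 through the first case past its block
    (the case that triggers the next run), except the last run, which
    reads to the end of the file.
    """
    reads = 0
    start = 1
    while True:
        end = start + cases_per_file - 1
        if end < total_cases:
            reads += end + 1       # read through the first case past the block
            start = end + 1        # next run's block starts here
        else:
            reads += total_cases   # last block: read to the end of the file
            return reads


if __name__ == "__main__":
    print(total_cases_read(300, 50))
    print(total_cases_read(1_000_000, 50))
```

For the 300-case example this gives 1,055 case reads instead of the 300 a single pass would need; for a million cases in blocks of 50 it is about ten billion.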

What I do here is use the skip case statement to write out cases selectively. On the first run of the program, I do nothing but create a PFF (program information file) that calls the program again, passing the iteration number 1 as a parameter. Each later run uses its iteration number to compute the first and last case of its block, skips the cases before the block, writes out the cases in it, and, on reaching the first case past the block, writes a new PFF that calls the program again with the next iteration number. This continues until the whole file has been processed. In the example above, the program would run seven times: once to create the first PFF, and then once for each of the six blocks of 50 cases. See the code below:

PROC GLOBAL

numeric numCasesPerFile = 50;

numeric currentCase,currentIteration,desiredStartCase,desiredEndCase;

file pffFile;

function writeOutPffAndStop(nextStartIteration)

    setfile(pffFile,maketext("%s%d%d_%d.pff",pathname(temp),sysdate("YYYYMMDD"),systime(),nextStartIteration));

    filewrite(pffFile,"[Run Information]");
    filewrite(pffFile,"Version=CSPro 4.1");
    filewrite(pffFile,"AppType=Batch");
    
    filewrite(pffFile,"[Files]");
    filewrite(pffFile,"Application=%ssplitFile.bch",pathname(application));
    filewrite(pffFile,"InputData=%s",filename(CEN2000));
    filewrite(pffFile,"OutputData=%s_%d",filename(CEN2000),nextStartIteration);
    filewrite(pffFile,"Listing=%s.lst",filename(pffFile));
    
    filewrite(pffFile,"[Parameters]");
    filewrite(pffFile,"ViewListing=Never");
    filewrite(pffFile,"ViewResults=Yes");
    filewrite(pffFile,"Parameter=%d",nextStartIteration);
    
    close(pffFile);
    
    execpff(filename(pffFile));
    stop();

end;

PROC DICTIONARY_FF

preproc

    if sysparm() = "" then // we’re on the first run
        writeOutPffAndStop(1);

    else
        currentIteration = tonumber(sysparm());
        desiredStartCase = 1 + ( currentIteration - 1 ) * numCasesPerFile;
        desiredEndCase = desiredStartCase + numCasesPerFile - 1;

    endif;

PROC QUEST

preproc

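    inc(currentCase); // position of this case in the input file

    // cases from desiredStartCase to desiredEndCase fall through and are written out
    if currentCase > desiredEndCase then
        // past this run's block: launch the run for the next block and stop
        writeOutPffAndStop(currentIteration + 1);

    elseif currentCase < desiredStartCase then
        // before this run's block: do not write this case
        skip case;

    endif;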
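The control flow above can be traced without CSPro. The following Python sketch is an illustration, not CSPro code, and `split_plan` is a hypothetical name. It replays the same logic: the first run only creates a PFF, each later run computes desiredStartCase and desiredEndCase from its iteration number, skips cases before the block, keeps the cases in it, and hands off to the next iteration at the first case past it. It assumes, as the CSPro code does, that every case that is not skipped is written to the output file.

```python
def split_plan(total_cases, cases_per_file):
    """Simulate the recursive runs.

    Returns (runs, blocks), where blocks maps each iteration number to
    the list of case numbers that run writes to its output file.
    """
    runs = 1          # first run: sysparm() is empty, so it only writes a PFF
    blocks = {}
    iteration = 1
    while True:
        runs += 1
        start = 1 + (iteration - 1) * cases_per_file   # desiredStartCase
        end = start + cases_per_file - 1               # desiredEndCase
        written = []
        next_iteration = None
        for case in range(1, total_cases + 1):  # each run reads from case 1
            if case > end:
                next_iteration = iteration + 1  # writeOutPffAndStop(...)
                break
            if case < start:
                continue                        # skip case
            written.append(case)                # case is written out
        blocks[iteration] = written
        if next_iteration is None:  # end of file reached inside the block
            break
        iteration = next_iteration
    return runs, blocks


if __name__ == "__main__":
    runs, blocks = split_plan(300, 50)
    print(runs, {i: (b[0], b[-1]) for i, b in blocks.items()})
```

For 300 cases in blocks of 50, this reports 7 runs and six blocks covering cases 1 to 50, 51 to 100, and so on up to 251 to 300, matching the count given earlier. With 301 cases, a seventh block holds only case 301.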

You can use this code almost exactly as is, with the following modifications:

1) Change the value of numCasesPerFile from 50 to the number of cases you want in each output file.

2) Replace “CEN2000” with the name of your dictionary. (There are two places where this appears.)

3) Replace “DICTIONARY_FF” with the name of your top-level batch PROC. (It will end with _FF.)

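4) Replace “QUEST” with the name of your dictionary’s first level.

5) Replace “splitFile.bch” with the filename of your batch application, since the PFF refers to the application by that name.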
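If you want to confirm what each run will hand to the next one after making these changes, the PFF that writeOutPffAndStop writes can be previewed outside CSPro. The Python sketch below is only an illustration: it assembles the same lines in the same order as the filewrite calls. The directory, data file, and PFF paths are hypothetical examples supplied by the caller; in the CSPro code they come from pathname(application), filename(CEN2000), and the generated PFF name.

```python
def render_pff(iteration, app_dir, input_data, pff_path, app_name="splitFile.bch"):
    """Build the PFF text that writeOutPffAndStop writes for one iteration.

    app_dir is assumed to end with a path separator, as the CSPro code's
    use of pathname(application) implies. All paths are caller-supplied.
    """
    lines = [
        "[Run Information]",
        "Version=CSPro 4.1",
        "AppType=Batch",
        "[Files]",
        f"Application={app_dir}{app_name}",
        f"InputData={input_data}",               # every run reads the whole input
        f"OutputData={input_data}_{iteration}",  # one output file per block
        f"Listing={pff_path}.lst",
        "[Parameters]",
        "ViewListing=Never",
        "ViewResults=Yes",
        f"Parameter={iteration}",                # read back with sysparm()
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical Windows paths, for illustration only.
    print(render_pff(2, "C:\\apps\\", "C:\\data\\popstan.dat",
                     "C:\\temp\\20120606143005_2.pff"))
```

Only the OutputData suffix and the Parameter value change from one iteration to the next; everything else in the file stays the same.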

See here for an example of this application using the Popstan dictionary.
