Deconcatenation of CSDBE Files

Discussions about tools to complement CSPro data processing
YFT_CBSD
Posts: 47
Joined: January 3rd, 2023, 12:36 am

Deconcatenation of CSDBE Files

Post by YFT_CBSD »

Hello,

Is there a way to deconcatenate a CSDBE file into one CSDBE file per case?

Example

I have a CSDBE file containing 100 cases, and I want to split it into one CSDBE file per case ID.
I'm currently using skip case, but it is time-consuming. Is there a way to automate this or make it faster?

Thanks
Gregory Martin
Posts: 1777
Joined: December 5th, 2011, 11:27 pm
Location: Washington, DC

Re: Deconcatenation of CSDBE Files

Post by Gregory Martin »

Take a look at the setoutput function: https://www.csprousers.org/help/CSPro/s ... ction.html

In a batch application, you could change the output filename per case. Something like:
setoutput(maketext("%v%v.csdbe|password=1234", ID1, ID2));
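A minimal sketch of where that line could go, assuming a batch application whose input is the combined CSDBE file. POP_QUEST is a hypothetical name for the case-level PROC (use your own dictionary's level name), and ID1, ID2, and the password are the placeholders from the line above. Because the level preproc runs once per case, a single run redirects every case to a file named from its own IDs.

```
PROC GLOBAL

PROC POP_QUEST   // hypothetical: the case-level PROC of your dictionary
preproc

    // Runs once per case: send this case to its own encrypted file,
    // named from the case's ID items (ID1 and ID2 are placeholders).
    setoutput(maketext("%v%v.csdbe|password=1234", ID1, ID2));
```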
YFT_CBSD
Posts: 47
Joined: January 3rd, 2023, 12:36 am

Re: Deconcatenation of CSDBE Files

Post by YFT_CBSD »

Thanks for this.

Additional question:

How can I automate splitting it into 10 cases per CSDB file? And, if possible, with random selection?

Thanks again.
Gregory Martin
Posts: 1777
Joined: December 5th, 2011, 11:27 pm
Location: Washington, DC

Re: Deconcatenation of CSDBE Files

Post by Gregory Martin »

There's no way to determine the size of the output file while you're writing it. (Actually, you could close the file, check the size, and then reopen it.) One thing you could do is approximate the number of cases that fit in ten megabytes. Something like this:
// in the application preproc
numeric total_file_size = filesize(filename(ASSIGNMENTS_DICT));
numeric total_cases = countcases(ASSIGNMENTS_DICT);
numeric number_cases_per_10_mb = ( 10 * 1024 * 1024 ) / ( total_file_size / total_cases );

// in the level preproc or postproc
if inc(case_counter) >= number_cases_per_10_mb then

    // setoutput to new file

    case_counter = 0;

endif;
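To see what the preproc arithmetic produces, here is a worked example with hypothetical numbers: an input file of 50 MB (52,428,800 bytes) holding 5,000 cases. The result is only an estimate, since it assumes all cases are about the same size.

```latex
\text{average case size} = \frac{52\,428\,800}{5\,000} = 10\,485.76 \text{ bytes}
\qquad
\text{number\_cases\_per\_10\_mb} = \frac{10 \times 1024 \times 1024}{10\,485.76} = \frac{10\,485\,760}{10\,485.76} = 1\,000
```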
You could also work with random selection. For example, this would roughly divide all your cases into 20 files:
setoutput(maketext("output-data-%02d.csdb", random(1, 20)));
YFT_CBSD
Posts: 47
Joined: January 3rd, 2023, 12:36 am

Re: Deconcatenation of CSDBE Files

Post by YFT_CBSD »

I get an error saying case_counter is not a declared variable.

Or did I miss something?

Thanks again.
sherrell
Posts: 397
Joined: April 2nd, 2014, 9:16 pm
Location: Washington, DC

Re: Deconcatenation of CSDBE Files

Post by sherrell »

You would have to declare the variable. It should be a global variable, since you only want it reset to zero where indicated in the code block Greg posted. You wouldn't want it to be local to a dictionary variable's PROC, where it would be reset to zero each time that PROC executed.
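For example, a minimal sketch of Greg's code with the shared variables declared in PROC GLOBAL. MY_BATCH_APP is a hypothetical name for the application-level PROC, and ASSIGNMENTS_DICT is the dictionary name from Greg's example. As far as I understand CSPro's scoping, a variable declared with numeric inside a PROC is local to that PROC, so number_cases_per_10_mb is declared globally and only assigned in the application preproc; otherwise the level PROC could not use it.

```
PROC GLOBAL

// Global variables keep their values from one case to the next.
numeric case_counter;
numeric number_cases_per_10_mb;

PROC MY_BATCH_APP   // hypothetical: the application-level PROC
preproc

    // These two are only needed here, so they can stay local.
    numeric total_file_size = filesize(filename(ASSIGNMENTS_DICT));
    numeric total_cases = countcases(ASSIGNMENTS_DICT);

    // Assign (do not re-declare) so the level PROC sees this value.
    number_cases_per_10_mb = ( 10 * 1024 * 1024 ) / ( total_file_size / total_cases );
```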
YFT_CBSD
Posts: 47
Joined: January 3rd, 2023, 12:36 am

Re: Deconcatenation of CSDBE Files

Post by YFT_CBSD »

Yup, I tried a global variable, but I still can't figure it out.

// in the level preproc or postproc
if inc(case_counter) >= number_cases_per_10_mb then
    // setoutput to new file
    case_counter = 0;
endif;

In an if statement I normally use a variable (for example, if CITY = 1). Here I don't understand how this code counts the cases so that it can compare them against number_cases_per_10_mb.

Please enlighten me. Thanks
sherrell
Posts: 397
Joined: April 2nd, 2014, 9:16 pm
Location: Washington, DC

Re: Deconcatenation of CSDBE Files

Post by sherrell »

First, apologies for taking so long to respond; I missed your post.

Second, you wrote above: "How can I automate splitting it into 10 cases per CSDB file? And, if possible, with random selection?"

So my understanding is you want to write 10 cases to each output file. Randomizing is a bit harder. See attached for what I've done to support these requests. If you're still having problems, please zip up your application so we can see what you're doing.
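The attached program is not reproduced here, so the following is only one possible sketch of writing 10 cases per output file, not necessarily what the attachment does. It also shows how the counting works: the level preproc runs once per case, so adding 1 to a global counter there counts cases. POP_QUEST is a hypothetical level name, the output filenames are made up, and it assumes (as in Greg's per-case example) that calling setoutput in the level preproc redirects the case currently being processed.

```
PROC GLOBAL

// Numeric variables start at 0.
numeric case_counter;   // cases written to the current output file
numeric file_number;    // number of the current output file

PROC POP_QUEST   // hypothetical: the case-level PROC of POP_DICT
preproc

    // Counter at 0 means this is the first case, or the previous file
    // already has 10 cases: open output-data-001.csdb, -002, and so on.
    if case_counter = 0 then
        file_number = file_number + 1;
        setoutput(maketext("output-data-%03d.csdb", file_number));
    endif;

    // This PROC runs once per case, so this line counts cases.
    case_counter = case_counter + 1;

    // After the 10th case, reset so the next case starts a new file.
    if case_counter = 10 then
        case_counter = 0;
    endif;
```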

Sherrell
Attachments
test deconcatenate.zip
(7.99 KiB)
YFT_CBSD
Posts: 47
Joined: January 3rd, 2023, 12:36 am

Re: Deconcatenation of CSDBE Files

Post by YFT_CBSD »

Hi, it's working. Thank you so much.

Just a follow-up question:

total_cases = countcases(POP_DICT);
how_many_files = total_cases / 10;

I want to set the number of total cases myself. Is it possible? Thanks
Example:
I only want 200 cases out of 1000,
so it will be 200 / 10.

I tried a few things, but it always counts the original countcases total.

Thanks
sherrell
Posts: 397
Joined: April 2nd, 2014, 9:16 pm
Location: Washington, DC

Re: Deconcatenation of CSDBE Files

Post by sherrell »

>Hi, it's working. Thank you so much.

great!

>I want to set the number of total cases myself. Is it possible? Thanks
>I only want 200 cases out of 1000.

I'm not quite sure what you want to do. Do you mean you only want 200 of the 1,000 cases to be exported, i.e., a 20% sample of your total cases?

If so, then you only need to make a few adjustments. See attached for the revised program.
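If the goal is an exact random sample (for example, 200 of 1,000 cases written 10 per file, giving 200 / 10 = 20 files), one way to do it is selection sampling: each case is kept with probability (cases still needed) / (cases not yet examined), which yields exactly the requested number of cases, each equally likely to be chosen. This is a sketch, not the attached v2 program. MY_BATCH_APP, POP_QUEST, and the output filenames are hypothetical, and it assumes that skip case in a batch application stops processing the case without writing it to the output.

```
PROC GLOBAL

numeric total_cases;      // number of cases in the input file
numeric cases_wanted;     // size of the random sample
numeric cases_seen;       // input cases examined so far
numeric cases_selected;   // cases kept so far
numeric case_counter;     // cases written to the current output file
numeric file_number;      // number of the current output file

PROC MY_BATCH_APP   // hypothetical: the application-level PROC
preproc

    total_cases = countcases(POP_DICT);
    cases_wanted = 200;   // set the sample size here, e.g. 200 of 1,000

PROC POP_QUEST   // hypothetical: the case-level PROC of POP_DICT
preproc

    // Cases not yet examined, including this one.
    numeric cases_left = total_cases - cases_seen;
    cases_seen = cases_seen + 1;

    // random(1, cases_left) <= (still needed) happens with probability
    // (still needed) / cases_left, so skip the case otherwise.
    if random(1, cases_left) > cases_wanted - cases_selected then
        skip case;   // not selected: the case is not written
    endif;

    cases_selected = cases_selected + 1;

    // Write the selected cases 10 per file: sample-data-001.csdb, -002, ...
    if case_counter = 0 then
        file_number = file_number + 1;
        setoutput(maketext("sample-data-%03d.csdb", file_number));
    endif;

    case_counter = case_counter + 1;

    if case_counter = 10 then
        case_counter = 0;
    endif;
```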

Sherrell
Attachments
test deconcatenate v2.zip
(8.08 KiB)