Deconcatenation of CSDBE Files

Posted: January 17th, 2023, 8:47 pm
by YFT_CBSD
Hello,

Is there a way to deconcatenate a csdbe file into one csdbe file per case?

Example:

I have a csdbe containing 100 cases, and I want one csdbe file per case ID.
I'm currently using skip case, but it is time-consuming. Is there a way to automate this or make it faster?

Thanks

Re: Deconcatenation of CSDBE Files

Posted: January 18th, 2023, 7:48 am
by Gregory Martin
Take a look at the setoutput function: https://www.csprousers.org/help/CSPro/s ... ction.html

In a batch application, you could change the output filename per case. Something like:
setoutput(maketext("%v%v.csdbe|password=1234", ID1, ID2));
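For context, a minimal sketch of where that call might sit in a batch application. The level name POP_LEVEL is hypothetical, and ID1 and ID2 are the placeholder ID item names from the line above; substitute the names from your own dictionary.

```
PROC POP_LEVEL   // hypothetical level name
preproc

    // switch the output file before the case is written, so that each
    // combination of ID values ends up in its own encrypted file
    setoutput(maketext("%v%v.csdbe|password=1234", ID1, ID2));
```

Because the filename is built from the case IDs, every case with a distinct ID combination should be routed to a different file.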

Re: Deconcatenation of CSDBE Files

Posted: January 26th, 2023, 1:22 am
by YFT_CBSD
Thanks for this.

Additional question:

How can I automate it into 10 cases per csdb? And, if possible, with random selection.

Thanks again.

Re: Deconcatenation of CSDBE Files

Posted: January 26th, 2023, 6:15 am
by Gregory Martin
There's no way to determine the size of the output file while you're writing it. (Actually, you could close the file, check the size, and then reopen it.) One thing you could do is try to approximate the number of cases in ten megabytes. Something like this:
// in the application preproc
numeric total_file_size = filesize(filename(ASSIGNMENTS_DICT));
numeric total_cases = countcases(ASSIGNMENTS_DICT);
numeric number_cases_per_10_mb = ( 10 * 1024 * 1024 ) / ( total_file_size / total_cases );

// in the level preproc or postproc
if inc(case_counter) >= number_cases_per_10_mb then
    // setoutput to new file
    case_counter = 0;
endif;
You could also work with random selection. For example, this would roughly divide all your cases into 20 files:
setoutput(maketext("output-data-%02d.csdb", random(1, 20)));

Re: Deconcatenation of CSDBE Files

Posted: January 26th, 2023, 10:10 pm
by YFT_CBSD
I get an error on case_counter saying it is not a declared variable.

Did I miss something?

Thanks again.

Re: Deconcatenation of CSDBE Files

Posted: January 26th, 2023, 10:53 pm
by sherrell
You would have to declare the variable. It should be a global variable, as you only want to reset it to zero as indicated in the code block Greg posted; i.e., you wouldn't want it local to a dictionary variable PROC where it would be getting reset to zero each time it was executed.
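A minimal sketch of what that declaration could look like, assuming the application-level PROC is named ASSIGNMENTS_FF (a hypothetical name). number_cases_per_10_mb is also made global here, on the assumption that a variable declared inside one PROC is not visible from the level PROC.

```
PROC GLOBAL

// declared here, these variables keep their values from one case to the next
numeric case_counter = 0;
numeric number_cases_per_10_mb;

PROC ASSIGNMENTS_FF   // hypothetical name of the application-level PROC
preproc

    numeric total_file_size = filesize(filename(ASSIGNMENTS_DICT));
    numeric total_cases = countcases(ASSIGNMENTS_DICT);

    // average bytes per case, converted into "cases per ten megabytes"
    number_cases_per_10_mb = ( 10 * 1024 * 1024 ) / ( total_file_size / total_cases );
```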

Re: Deconcatenation of CSDBE Files

Posted: January 27th, 2023, 3:51 am
by YFT_CBSD
Yup, I tried a global variable, but I still can't figure it out.


// in the level preproc or postproc
if inc(case_counter) >= number_cases_per_10_mb then
    // setoutput to new file
    case_counter = 0;
endif;

In this if statement, if I use a variable instead (for example CITY, as in if CITY = 1), I don't understand how this code counts the cases that are >= number_cases_per_10_mb.

Please enlighten me. Thanks.

Re: Deconcatenation of CSDBE Files

Posted: February 2nd, 2023, 10:19 pm
by sherrell
First, apologies for taking so long to respond; I missed your post.

Second, you wrote above: How can I automate it into 10 cases per csdb? And, if possible, with random selection.

So my understanding is you want to write 10 cases to each output file. Randomizing is a bit harder. See attached for what I've done to support these requests. If you're still having problems, please zip up your application so we can see what you're doing.
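The attachment is not reproduced in this thread. As a rough sketch of the counting idea only (not the attached program), something along these lines would start a new output file every 10 cases. The level name POP_LEVEL, the variable file_counter, and the output filename pattern are assumptions made for illustration.

```
PROC GLOBAL

numeric case_counter = 0;   // cases processed so far
numeric file_counter = 0;   // output files started so far

PROC POP_LEVEL   // hypothetical level name
preproc

    // before the 1st, 11th, 21st, ... case, switch to a new output file
    if case_counter % 10 = 0 then
        inc(file_counter);
        setoutput(maketext("output-%03d.csdb", file_counter));
    endif;

    inc(case_counter);
```

The counter does not depend on any dictionary item such as CITY; it simply increases by one each time the level PROC runs, that is, once per case.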

Sherrell

Re: Deconcatenation of CSDBE Files

Posted: February 3rd, 2023, 5:06 am
by YFT_CBSD
Hi, it's working. Thank you so much.

Just a follow-up question,

total_cases = countcases(POP_DICT);
how_many_files = total_cases / 10;

I want to set the number of total cases myself. Is that possible? Thanks.
Example:
I only want 200 cases out of 1000,
so it will be 200 / 10.

I tried a few things, but it always uses the original countcases value.

Thanks

Re: Deconcatenation of CSDBE Files

Posted: February 3rd, 2023, 5:38 pm
by sherrell
>Hi, it's working. Thank you so much.

great!

>I want to set the number of total cases myself. Is that possible? Thanks.
>I only want 200 cases out of 1000.

I'm not quite sure what you want to do. Meaning, you only want 200 out of the 1,000 cases to be exported, i.e., a 20% sample of your total cases?

If so, then you only need to make a few adjustments. See attached for revised pgm.
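The revised program is not reproduced in this thread. One way such a sample could be drawn, shown purely as a sketch and not as the attached program, is selection sampling: each case is kept with probability (cases still needed) / (cases still to be read), which yields exactly 200 kept cases if countcases reports the true total. The PROC names, variable names, and filename pattern below are hypothetical.

```
PROC GLOBAL

numeric cases_needed = 200;   // sample size from the example above
numeric cases_remaining;      // input cases not yet examined
numeric kept_counter = 0;     // cases written so far

PROC POP_FF   // hypothetical name of the application-level PROC
preproc

    cases_remaining = countcases(POP_DICT);

PROC POP_LEVEL   // hypothetical level name
preproc

    numeric pick = random(1, cases_remaining);
    cases_remaining = cases_remaining - 1;

    if pick <= cases_needed then
        // keep this case; start a new file for every 10 kept cases
        if kept_counter % 10 = 0 then
            setoutput(maketext("sample-%03d.csdb", kept_counter / 10 + 1));
        endif;
        inc(kept_counter);
        cases_needed = cases_needed - 1;
    else
        skip case;   // not selected, so it is not written to any output file
    endif;
```

With 1,000 input cases this would produce 20 files of 10 cases each. The selection is random, but the kept cases still appear in their original order.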

Sherrell