duplicate keys

AriSilva · Post by **AriSilva** » June 5th, 2020, 8:51 am

The csdb file accepts duplicate records, although some of the tools, such as Concatenate and Sort, do not. Those tools still seem to work in ".dat" mode. Besides suggesting that they be upgraded to handle the new data format, I have a specific problem with handling duplicate keys in a batch program.
I can easily detect duplicate records by comparing the key with the previous one, and if some field makes them different, I can skip the case, provided the one I want to keep is the previous one.
But this is beside the point.
What I really want to know is which internal field differentiates one record from another.
Can I access this field (or property) and write something like

if fieldxx = xxxxxxxx then skip case; endif;
Would I need to do some kind of SQL operation for that?

aaronw · Post by **aaronw** » June 5th, 2020, 3:10 pm

To clarify, you want to be able to identify identical records by looking at some sort of uniquely identifying record field?

No such field is going to exist. However, I don't see why you couldn't compare the results of two subqueries. Let me look into this a bit more.

AriSilva · Post by **AriSilva** » June 6th, 2020, 6:35 am

OK, I got it. Let's change the subject a little.
If it is not possible to look at the csdb file and use any field there to distinguish one case from another with the same key, is it possible to implement an option in the Sort and Concatenate programs to keep the duplicates?

Gregory Martin · Post by **Gregory Martin** » June 8th, 2020, 10:24 am

You can use the UUID to differentiate one case from another. Every case is going to have a unique UUID. You'll see in CSPro 7.4 that you can load cases by UUID ... look at Data Viewer's Logic Helper for how to do that.

So if you wanted to remove certain duplicates, you could do something like:

if uuid(MY_DICT) = "..." then
    skip case;
endif;

You can get around your sort/concatenate issues by skipping those tools and just using batch programs. To combine multiple files while keeping duplicates, just run an empty batch program with multiple input files ... your output file will then have all the cases, including duplicates. If you want to sort that file by case ID, add this to your PFF:

InputOrder=Indexed

The output file will then be sorted.

AriSilva · Post by **AriSilva** » June 9th, 2020, 4:41 pm

I was using the uuid to catch it and save it into one of my questionnaire variables, but it did not solve my problem because I was still getting records with the same "faked" uuid, the one stored in the record.
I thought that information was stored in the csdb at creation time, when the record was generated. But from what you are explaining, this might not be the case, so I'm wondering when this information goes into the csdb. When the record is uploaded to the server?
Thinking in this direction, instead of using the one stored in my file, I did an exercise: I detected the duplicates and displayed uuid(dict) as you said. I'm not sure, but checking them by eye I did not find any duplicated uuids.
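
A check like this does not have to be done by eye. As a rough sketch in Python (not CSPro logic), assuming the key and the uuid(dict) value of every case have been written out as pairs, something like the following could confirm it. The data, the UUID strings, and the function names are all hypothetical:

```python
from collections import defaultdict

# Hypothetical (case key, UUID) pairs, e.g. listed by a batch program
# that writes out the key and uuid(dict) for every case.
cases = [
    ("0101", "uuid-a"),
    ("0102", "uuid-b"),
    ("0101", "uuid-c"),
    ("0103", "uuid-d"),
    ("0101", "uuid-a"),  # same key and same UUID: a true repeat
]

def uuids_by_duplicate_key(pairs):
    """Return {key: [uuids]} for keys that occur more than once."""
    grouped = defaultdict(list)
    for key, case_uuid in pairs:
        grouped[key].append(case_uuid)
    return {k: v for k, v in grouped.items() if len(v) > 1}

def repeated_uuids(pairs):
    """Return the set of UUIDs that occur more than once."""
    seen, repeated = set(), set()
    for _, case_uuid in pairs:
        if case_uuid in seen:
            repeated.add(case_uuid)
        seen.add(case_uuid)
    return repeated

duplicates = uuids_by_duplicate_key(cases)
repeats = repeated_uuids(cases)
```

If `repeats` comes back empty, every case with a duplicate key still has its own UUID.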

Regarding the concatenation and the Indexed parameter, I was already doing that, and the records come out sorted by the record key. My problem is that I need to sort them by this key AND another field in the dictionary, and there is no solution for that.
I have a variable that keeps the status of the questionnaire: 0 for complete, 1 for interrupted, 3 for not initiated (just for the sake of an example).
So, if I could sort my file by the key AND this status variable, I could easily discard the ones that come after a completed one with the same key.
Is it that difficult to implement a sort that has an option to keep the duplicates? Sorry, am I asking too much?
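
To make the request concrete, here is a minimal sketch in Python (not CSPro logic, and not how the Sort tool works) of the behaviour I have in mind: sort by key and then by status, keep the duplicates in the sorted output, and then keep only the first case per key. The records, labels, and function names are hypothetical; the status codes are the ones from my example.

```python
# Hypothetical cases: (key, status, label). Status codes follow the example
# above: 0 = complete, 1 = interrupted, 3 = not initiated.
cases = [
    ("0102", 1, "b1"),
    ("0101", 3, "a1"),
    ("0101", 0, "a2"),
    ("0102", 1, "b2"),
    ("0101", 1, "a3"),
]

def sort_keeping_duplicates(records):
    """Sort by key, then status; sorted() is stable, so ties keep input order."""
    return sorted(records, key=lambda r: (r[0], r[1]))

def keep_best_per_key(records):
    """After sorting, keep only the first (lowest-status) case for each key."""
    kept, previous_key = [], None
    for record in sort_keeping_duplicates(records):
        if record[0] != previous_key:
            kept.append(record)
            previous_key = record[0]
    return kept

sorted_cases = sort_keeping_duplicates(cases)
best_cases = keep_best_per_key(cases)
```

Note that when no case for a key is complete, this sketch keeps the one with the lowest status code instead of dropping the key.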

CSPro Users Forum
