duplicate keys

AriSilva
Posts: 470
Joined: July 22nd, 2016, 3:55 pm

duplicate keys

Post by AriSilva » June 5th, 2020, 8:51 am

The CSDB file accepts duplicate records, although some of the tools, like Concatenate and Sort, do not. Those tools still seem to be working in the ".dat" mode. Besides suggesting that they be upgraded to deal with the new data format, I have a specific problem handling duplicate keys in a batch program.
I can easily detect the duplicate records by comparing each key with the previous one, and if they have some field that makes them different, I can skip the case, provided the one I want to keep is the previous one.
But this is irrelevant.
What I really want to know is how to differentiate one record from the other, and based on which internal field.
Can I access this field (or property) and write something like:

if fieldxx = xxxxxxxx then skip case; endif;
Would I need to do some kind of SQL operation for that?
Best
Ari
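The previous-key comparison described above can be sketched outside CSPro. This is a minimal Python illustration, not CSPro logic: it assumes the cases have been exported as dictionaries and arrive already ordered by case key, and the field name `case_key` is hypothetical. It keeps the first case in each run of equal keys and collects the rest as duplicates.

```python
from typing import Dict, Iterable, List, Tuple

Case = Dict[str, str]


def split_consecutive_duplicates(
    cases: Iterable[Case], key_field: str = "case_key"
) -> Tuple[List[Case], List[Case]]:
    """Keep the first case of each run of equal keys; collect the later ones.

    Assumes `cases` is already ordered by `key_field` (a hypothetical name),
    so duplicates are adjacent, as in the previous-key comparison.
    """
    kept: List[Case] = []
    skipped: List[Case] = []
    previous_key = None
    for case in cases:
        current_key = case[key_field]
        if current_key == previous_key:
            # Same key as the case just before it: treat it as a duplicate.
            skipped.append(case)
        else:
            kept.append(case)
            previous_key = current_key
    return kept, skipped


cases = [
    {"case_key": "0101", "note": "first"},
    {"case_key": "0101", "note": "second"},
    {"case_key": "0102", "note": "only"},
]
kept, skipped = split_consecutive_duplicates(cases)
print([c["note"] for c in kept])     # ['first', 'only']
print([c["note"] for c in skipped])  # ['second']
```

This only works because the input is ordered, and it always keeps whichever duplicate happens to come first; it cannot choose between them, which is exactly the limitation raised in the post.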

aaronw
Posts: 399
Joined: June 9th, 2016, 9:38 am
Location: Washington, DC

Re: duplicate keys

Post by aaronw » June 5th, 2020, 3:10 pm

To clarify, you want to be able to tell identical records apart by looking at some sort of uniquely identifying record field?

No such field is going to exist. However, I don't see why you couldn't compare the results of two subqueries. Let me look into this a bit more.

AriSilva
Posts: 470
Joined: July 22nd, 2016, 3:55 pm

Re: duplicate keys

Post by AriSilva » June 6th, 2020, 6:35 am

OK, I got it. Let's change the subject a little.
If it is not possible to look at the CSDB file and use some field there to distinguish one case from another with the same key, is it possible to add an option to the Sort and Concatenate programs to keep the duplicates?
Best
Ari

Gregory Martin
Posts: 1418
Joined: December 5th, 2011, 11:27 pm
Location: Washington, DC

Re: duplicate keys

Post by Gregory Martin » June 8th, 2020, 10:24 am

You can use the UUID to differentiate one case from another. Every case is going to have a unique UUID. You'll see in CSPro 7.4 that you can load cases by UUID ... look at Data Viewer's Logic Helper for how to do that.

So if you wanted to remove certain duplicates, you could do something like:
if uuid(MY_DICT) = "..." then
    skip case;
endif;
You can get around your sort/concatenate issues by skipping those tools and just using batch programs. To combine multiple files while keeping duplicates, just run an empty batch program with multiple input files ... your output file will then have all the cases, including duplicates. If you want to sort that file by case ID, add this to your PFF:

InputOrder=Indexed

The output file will then be sorted.
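The combine-then-sort approach above (an empty batch program with several input files, plus an indexed input order) can be modeled in a few lines to show why no cases are lost. This is a hedged Python sketch of the idea, not of CSPro's implementation: the `case_key` and `src` fields are hypothetical, and the thread does not say how CSPro orders cases that share a key. Python's `sorted()` is stable, so here equal keys keep their input order.

```python
from typing import Dict, List

Case = Dict[str, str]


def combine_keeping_duplicates(*files: List[Case], key_field: str = "case_key") -> List[Case]:
    """Concatenate several lists of cases, then sort them by case key.

    Nothing is de-duplicated: every input case appears in the output.
    sorted() is stable, so cases with equal keys keep the order in which
    they appeared across the input files.
    """
    combined: List[Case] = []
    for cases in files:
        combined.extend(cases)
    return sorted(combined, key=lambda case: case[key_field])


file_a = [{"case_key": "0102", "src": "a"}, {"case_key": "0101", "src": "a"}]
file_b = [{"case_key": "0101", "src": "b"}]
merged = combine_keeping_duplicates(file_a, file_b)
print([(c["case_key"], c["src"]) for c in merged])
# [('0101', 'a'), ('0101', 'b'), ('0102', 'a')]
```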

AriSilva
Posts: 470
Joined: July 22nd, 2016, 3:55 pm

Re: duplicate keys

Post by AriSilva » June 9th, 2020, 4:41 pm

I was using uuid to capture the value and save it into one of my questionnaire variables, but that did not solve my problem, because I was still getting records with the same "fake" UUID, the one stored in the record.
I thought that information was stored in the CSDB at creation time, when the record was generated. But from your explanation, this might not be the case, so I'm wondering: when does this information go into the CSDB? When the record is uploaded to the server?
Following this line of thought, instead of using the value stored in my file, I detected the duplicates and displayed uuid(dict) as you suggested. I'm not certain, but looking at them with the naked eye, I did not find any duplicate UUIDs.
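Instead of comparing UUIDs by eye, one could export the case key and the value returned by uuid(dict) for each case and count repeats. A minimal Python sketch, assuming a hypothetical export in which each case is a dictionary with `case_key` and `uuid` entries (the UUID strings below are placeholders, not real values): repeated keys show up in the first result, and an empty second result means every case has a distinct UUID.

```python
from collections import Counter
from typing import Dict, List

Case = Dict[str, str]


def repeated_values(cases: List[Case], field: str) -> Dict[str, int]:
    """Return each value of `field` that occurs more than once, with its count."""
    counts = Counter(case[field] for case in cases)
    return {value: n for value, n in counts.items() if n > 1}


cases = [
    {"case_key": "0101", "uuid": "u-1"},
    {"case_key": "0101", "uuid": "u-2"},
    {"case_key": "0102", "uuid": "u-3"},
]
print(repeated_values(cases, "case_key"))  # {'0101': 2}
print(repeated_values(cases, "uuid"))      # {}
```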

Regarding the concatenation and the indexed parameter, I was already doing that, and the records come out sorted by the record key. My problem is that I need to sort them by this key AND another field in the dictionary, and there is no way to do that.
I have a variable that keeps the status of the questionnaire: 0 for complete, 1 for interrupted, 3 for not started (just as an example).
So, if I could sort my file by the key AND this status variable, I could easily discard the remaining cases with a given key once I had one that was completed.
Is it that difficult to implement a sort that has an option to keep the duplicates? Sorry, am I asking too much?
Best
Ari
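The sort requested above (by key, then by status, keeping only the first case per key) can be sketched as a language-neutral illustration. This is not an existing CSPro feature: the Python below assumes exported cases with a hypothetical `case_key` field and a hypothetical integer `status` field using the example codes from the post (0 complete, 1 interrupted, 3 not started), so a lower code means a more complete questionnaire.

```python
from itertools import groupby
from typing import Any, Dict, List

Case = Dict[str, Any]


def best_case_per_key(cases: List[Case]) -> List[Case]:
    """Sort by (case_key, status) and keep the first case for each key.

    With the example codes 0 = complete, 1 = interrupted, 3 = not started,
    the first case in each key group is the most complete one.
    """
    ordered = sorted(cases, key=lambda c: (c["case_key"], c["status"]))
    # groupby yields runs of adjacent cases sharing a key; next() takes the first.
    return [next(group) for _, group in groupby(ordered, key=lambda c: c["case_key"])]


cases = [
    {"case_key": "0101", "status": 1},
    {"case_key": "0101", "status": 0},
    {"case_key": "0102", "status": 3},
    {"case_key": "0102", "status": 1},
]
for case in best_case_per_key(cases):
    print(case)
# {'case_key': '0101', 'status': 0}
# {'case_key': '0102', 'status': 1}
```

Because the secondary sort is on the status code itself, this relies on the codes being ordered by preference; a status variable coded differently would need a mapping to a preference rank inside the sort key.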
