adding unique ID post-survey

brettkeller
Posts: 7
Joined: April 12th, 2018, 6:09 am

adding unique ID post-survey

Post by brettkeller »

As mentioned in my other question on recovering deleted cases (to which I received fast, helpful responses), I'm working with data that has already been collected in CSPro. The output of the survey is four different .csdb databases (household characteristics and files on three subsets of household members). The analysis requires matching up the household characteristics to the other databases, so I need a unique ID across the datasets. The household data has about 7 geographic variables (region, county, etc.), date of interview, GPS, and so on; plenty of variables to construct a unique ID after the fact. The three other databases just have admin unit 1, admin unit 5, and household number. In theory those should be unique, but in reality some enumerators selected the wrong administrative units and/or household numbers, so there are many duplicates before data cleaning.

In order to clean the data and do these merges, I need a unique ID. When I open the data in DB Browser for SQLite, I can see additional variables that aren't displayed in CSPro such as "id" and "key", which appear to be what I need. I imagine I could export the variables I need from this SQL browser directly. However, the data will be analyzed in Stata and exporting the data directly from CSPro to Stata automatically labels all the data, and importing it from a SQL export would create a ton of extra work.
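For reference, reading those two columns outside DB Browser could look roughly like the sketch below, which uses Python's built-in sqlite3 module. The table name `cases` and the sample rows are assumptions made for illustration, and the database is built in memory rather than opened from a real .csdb file, so check the actual schema before relying on it.

```python
import sqlite3


def read_ids(conn, table="cases"):
    """Return (id, key) pairs from the given table, ordered by key."""
    # The table name is interpolated into the SQL, so only pass trusted names.
    cur = conn.execute(f'SELECT id, key FROM "{table}" ORDER BY key')
    return cur.fetchall()


# Hypothetical stand-in for a .csdb file: an in-memory database
# with an assumed "cases" table holding the id and key columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id TEXT PRIMARY KEY, key TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [("uuid-b", "0102003"), ("uuid-a", "0101001")],
)

rows = read_ids(conn)
for case_id, case_key in rows:
    print(case_id, case_key)
```

For a real file, `sqlite3.connect` would be given the path to the .csdb file instead of `":memory:"`.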

I believe what I need to do is run a batch edit in CSPro (similar to what is described here: http://www.csprousers.org/forum/viewtop ... f=8&t=1559) that creates a new variable equal to these underlying id and key variables. However, I'm new enough to CSPro that I can't figure out how.

Can someone help with how I would execute a batch edit to make these variables visible in the final CSPro export? Or is there another way to add a unique ID across these different databases? Thanks!
aaronw
Posts: 561
Joined: June 9th, 2016, 9:38 am
Location: Washington, DC

Re: adding unique ID post-survey

Post by aaronw »

I'm not certain I understand what the SQLite identifier gives you. Why not use the geographical ids from the household dictionary?

For example, let's say your household dictionary is constructed like this:
region + country + province + district + city + ea + household number

Then we could simply append a number from 1 - 3 to uniquely identify the household members:
region + country + province + district + city + ea + household number + 1/2/3
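As a rough illustration of this kind of concatenation, here is a minimal Python sketch. The field widths and sample codes are hypothetical, not taken from any real dictionary. Each code is zero-padded to a fixed width so that, for example, region 1 + county 12 cannot collide with region 11 + county 2.

```python
def make_household_id(parts, widths):
    """Concatenate zero-padded numeric codes into one fixed-width ID string."""
    if len(parts) != len(widths):
        raise ValueError("parts and widths must have the same length")
    return "".join(str(p).zfill(w) for p, w in zip(parts, widths))


# Hypothetical codes: region=3, district=12, ea=7, household number=45
hh_id = make_household_id([3, 12, 7, 45], [2, 2, 3, 4])

# Append 1, 2, or 3 to mark which member data set a record comes from
member_id = hh_id + "2"

print(hh_id)      # 03120070045
print(member_id)  # 031200700452
```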

The part that seems challenging or tedious to me is linking the household members to the household. From what you've said, I assume this will be done manually. Is that correct?

Let me know if this makes sense, or if I am missing something. Then we can continue with the next steps.
brettkeller
Posts: 7
Joined: April 12th, 2018, 6:09 am

Re: adding unique ID post-survey

Post by brettkeller »

Sure, let me try to explain a bit further. Thanks for your help!

The household data set has, when viewed in CSPro: Admin1 + Admin2 + Admin3 + Admin4 + Admin5 + HouseholdNumber + other data. These can be concatenated to make a unique ID. When viewed in DB Browser for SQLite it also has key and id, long text strings that appear to be unique for each household.

The other data sets, for household members, have Admin2 + Admin5 + HouseholdNumber and other data, but none of the other data from the household data set. They do not have date of interview, GPS, or any other variables that overlap with the household data set. Admin2 + Admin5 + HouseholdNumber, when concatenated, are not unique: there are several hundred duplicates. Without other variables that appear in both these data sets and the household data set, I don't think it would be possible to link them manually at all. These data sets also have the key and id text strings when browsed in DB Browser for SQLite.

I.e., if I can get the key and id variables to show up in CSPro by running a batch edit, then I could simply use those variables to merge the data sets together. Otherwise I don't see how it can be done manually.
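The duplicate check described above (counting how often each Admin2 + Admin5 + HouseholdNumber combination occurs) can be sketched in Python as follows. The field names and sample records are hypothetical. In practice the same check would more likely be run in Stata after export.

```python
from collections import Counter


def find_duplicate_keys(records, fields):
    """Return {key_tuple: count} for every key combination seen more than once."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical member records keyed by admin units and household number
records = [
    {"admin2": 1, "admin5": 4, "hh_number": 10},
    {"admin2": 1, "admin5": 4, "hh_number": 10},  # same key as the row above
    {"admin2": 1, "admin5": 4, "hh_number": 11},
]

dups = find_duplicate_keys(records, ["admin2", "admin5", "hh_number"])
print(dups)  # {(1, 4, 10): 2}
```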
brettkeller
Posts: 7
Joined: April 12th, 2018, 6:09 am

Re: adding unique ID post-survey

Post by brettkeller »

Also, I did already construct a unique ID within the household data set from just the sort of concatenation you described. But then I realized I couldn't make the same variable with the available data in CSPro.
aaronw
Posts: 561
Joined: June 9th, 2016, 9:38 am
Location: Washington, DC

Re: adding unique ID post-survey

Post by aaronw »

Here's how you can get the underlying uuid and key:
UNIQUEID = uuid(SIMPLECAPI_DICT);
KEYID = key(SIMPLECAPI_DICT);
To demonstrate, I took Simple CAPI from the examples folder and entered three cases. I then opened the Export Data tool and selected my export options. The tool can generate the logic that runs the export: I copied it using Options > Copy Logic to Clipboard. Here's what it looks like:
PROC GLOBAL
SET EXPLICIT;

NUMERIC rec_occ;

FILE cspro_export_file_var_f;

PROC SIMPLECAPI_LEVEL
PreProc
    set behavior() export (TabDelim, ItemOnly, ANSI);

    For rec_occ in RECORD PERSON_REC do
        EXPORT TO cspro_export_file_var_f
        CASE_ID(HOUSEHOLD_ID)
        PERSON_REC;
    Enddo;
Next, create a batch application and copy this logic into it. The batch application will behave the same as the Export Data tool, but now we can add logic of our own. Add a working storage dictionary to the application and give it a temporary item, UNIQUEID. Then store the uuid of each case in UNIQUEID and export it as the case id.
PROC GLOBAL
SET EXPLICIT;

NUMERIC rec_occ;
FILE cspro_export_file_var_f;

PROC SIMPLE_CAPI_FF

PROC SIMPLECAPI_LEVEL

PreProc

    set behavior() export (TabDelim, ItemOnly, ANSI);
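    // UNIQUEID is an item in the working storage dictionary, not in SIMPLECAPI_DICT
    UNIQUEID = uuid(SIMPLECAPI_DICT);
    // Alternative: store the case key instead of the uuid
    //UNIQUEID = key(SIMPLECAPI_DICT);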

    For rec_occ in RECORD PERSON_REC do
        EXPORT TO cspro_export_file_var_f
        CASE_ID(UNIQUEID)
        PERSON_REC;
    Enddo;
The exported data will be written to "NEW SIMPLE CAPI DATA.TAB."
Attachment: Simple CAPI.zip (15.26 KiB)
brettkeller
Posts: 7
Joined: April 12th, 2018, 6:09 am

Re: adding unique ID post-survey

Post by brettkeller »

Thanks so much. I was able to run an export in the Simple CAPI dataset, copy the code to Notepad, and then create a batch edit program that runs that export from Simple CAPI. I then did the same with both a tab and a Stata export using my own dataset. I got that to work, but only after editing the dictionary file to add the UNIQUEID / uuid. Did you have to do that? If not, do you know why I would get an error there?

As is, when I run this code I get a Compile error:

PROC GLOBAL
SET EXPLICIT;

NUMERIC rec_occ;
FILE cspro_export_file_var_f;

PROC Survey2017_HOUSEHOLD_FF
PROC Survey2017_QUEST

PreProc

	set behavior() export (Stata, ItemOnly, Unicode);
	UNIQUEID = uuid(Survey2017_DICT);
	//UNIQUEID = key(Survey2017_DICT);

	For rec_occ in RECORD INTERVIEW_RECORDS_EDT do
		EXPORT TO cspro_export_file_var_f
		CASE_ID(UNIQUEID)
		INTERVIEW_RECORDS_EDT;
	Enddo;
The error reads:

 ERROR:  Record name expected near line 8 in Survey2017_QUEST procedure
WARNING: File handler 'CSPRO_EXPORT_FILE_VAR_F' is not used in the application
The cspro_export_file_var_f declaration was in the code that the export tool generated for this household dataset (i.e., the original I copied and pasted from), so I'm not sure why it's being flagged, unless I missed a step somewhere else?
aaronw
Posts: 561
Joined: June 9th, 2016, 9:38 am
Location: Washington, DC

Re: adding unique ID post-survey

Post by aaronw »

I did not edit SIMPLECAPI_DICT.

Instead, I added a working storage dictionary.
[Screenshot: working-storage-dict.GIF]
In the working storage dictionary I added the UNIQUEID.
[Screenshot: unique-id.GIF]
Let me know if this fixes your issues.
brettkeller
Posts: 7
Joined: April 12th, 2018, 6:09 am

Re: adding unique ID post-survey

Post by brettkeller »

This did work! Thanks so much for your help.