CSPro DB files are
SQLite database files. SQLite is a file-based relational database which means that it is possible to use SQL commands to query the CSPro data. SQL queries can be run from a CSPro program using the
sqlquery function. In addition, there are also many free viewers for SQLite that allow you to access the data outside of CSPro.
Note that this information applies only to CSPro DB files created in version 7.4 or later. Data files created in earlier versions store all the data from a case in a single database column so it is not possible to access individual items using SQL.
The structure of the database is based on the data dictionary. Each record in the data dictionary corresponds to a table in the database. The database tables have the same name as the record in the dictionary. Each item in the record corresponds to a column in the table with the same name as the dictionary item. The ID items from the dictionary are stored in a table named level-1 (two level dictionaries will have a second table named level-2 for the second level id-items). The tables for each record are linked to the table for the level by the primary key of the level table which is named level 1-id.
In the example above, the dictionary has three records: PERSON_REC, HOUSING_REC and DEATHS_REC. The database therefore has the corresponding records person_rec, housing_rec and deaths_rec. The dictionary ID items are in the table level-1 which is linked to the record tables by the column level-1-id.
Each record table has an auto-increment integer primary key column named <record name>-id. This column does not correspond to any dictionary item and its value is set automatically by SQLite.
Records with multiple occurrences have an additional column named occ which represents the occurrence number. If a case in the data file has, for example, three occurrences of a particular record then the values of the occ column for the rows in the record table for that case will be 1, 2 and 3.
All CSPro DB files also have a table named
cases that contains additional information about each case in the data file such as the case label, whether or not the case was deleted, and whether or not it was partially saved. The ID column of the
cases table is a
UUID stored as a string. The database table for the first level in the dictionary is linked to the cases table by this ID column.
CSPro DB files also contain additional tables that store field notes as well as information required for data synchronization.
To find the rows in the record tables that correspond to a particular case in the data file, join the record table with the level table using the linking column and filter on the ID items. For example, the following query on the example database above lists all the individuals in province 1, district 1, enumeration area 3, household 1:
SELECT * FROM person_rec
JOIN `level-1` ON `level-1`.`level-1-id` = person_rec.`level-1-id`
WHERE province = 1 AND district = 1 AND enumeration_area = 3 AND household_number = 1
Note the use of ` around level-1-id. This is required so that SQLite does not treat the "-" character in the name as a minus sign which would cause a syntax error.
The query above could potentially include data from deleted cases. To exclude deleted cases, join with the cases table and filter on the deleted column:
SELECT * FROM person_rec
JOIN `level-1` ON `level-1`.`level-1-id` = person_rec.`level-1-id`
JOIN cases ON cases.id = `level-1`.`case-id`
WHERE deleted = 0 AND province = 1 AND district = 1 AND enumeration_area = 3 AND household_number = 1
Using GROUP BY, the following query gets the number of households by enumeration area:
SELECT province, district, enumeration_area, COUNT(*) AS `number of households`
FROM `level-1`
JOIN cases ON cases.id = `level-1`.`case-id`
WHERE deleted = 0
GROUP BY province, district, enumeration_area
Modifying a CSPro DB file using SQL commands is strongly discouraged. Editing the schema or using update and insert queries to modify the data in a CSPro DB file can leave the file in a state where it can no longer be read by CSPro. In addition, any modifications to cases in the data file that do not correctly update the file_revisions and vector_clock tables will leave the file in a state where data synchronization will no longer work correctly. All modifications to the CSPro DB file should be done using CSPro data entry or batch edit applications or CSPro tools.