Data Sources

Overview

In CSPro, a data source is a file used to store case data. Traditionally, CSPro data files were stored as text files, but starting with CSPro 7.0, the suggested data file format is CSPro DB, a proprietary format that stores all data about cases and supports synchronization.

The format of the cases in a data source is described by a dictionary containing levels, records, and items. The way these entities are serialized depends on the data source and is described on the help page for each data source.

CSPro Data Sources

The following are full-fledged data sources that work with CSPro data to facilitate reading and writing cases:

Data Source	Default Extension	Description
CSPro DB	.csdb	Cases are stored in a SQLite database in a relational format. This data source has the most functionality and you are encouraged to use it when possible.
Encrypted CSPro DB	.csdbe	A version of the CSPro DB data source that supports AES-256 encryption.
Text	.dat	Cases are represented as text lines, with one record per line, one case following another in the file.
JSON	.json	Cases are represented in JSON as an array of case objects.
None	—	A data source, not associated with any file, that does not contain any data.
In-Memory	—	A data source, not associated with any file, that stores cases in memory for the duration of the running application.

Export Data Sources

The following export data sources only support writing cases:

Data Source	Default Extension	Description
Comma Delimited (CSV)	.csv	Cases from a single record are written to a comma-separated values file.
Semicolon Delimited	.skv	Cases from a single record are written to a semicolon-separated values file.
Tab Delimited	.tsv	Cases from a single record are written to a tab-separated values file.
Excel	.xlsx	Cases are written to a Microsoft Excel file.
R	.RData / .rda	Cases are written to an R Data file that can be read in R.
SAS	.xpt	Cases are written to a SAS Transport file that can be read in SAS.
SPSS	.sav	Cases from a single record are written to an SPSS Statistics Data file that can be read in SPSS.
Stata	.dta	Cases from a single record are written to a Stata Data file that can be read in Stata.
CSPro Export	—	A data source that wraps another data source, allowing you to restrict what records are written.

Functionality

The functionality of each data source is summarized in the following table:

Feature	CSPro DB	Encrypted CSPro DB	Text	JSON	None	In-Memory	Comma Delimited	Semicolon Delimited	Tab Delimited	Excel	R	SAS	SPSS	Stata
Reading cases	✔	✔	✔	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘
Writing cases	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔
Notes, case labels, and case statuses	✔	✔	✔	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘
Storage of more than one kind of record	✔	✔	✔	✔	✔	✔	✘	✘	✘	✔	✔	✔	✘	✘
Binary data items	✔	✔	✘	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘
Deleting cases	✔	✔	✔	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘
Undeleting cases	✔	✔	✘	✔	✘	✔	✘	✘	✘	✘	✘	✘	✘	✘
Syncing data	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘	✘	✘	✘	✘
Cases with duplicate keys	✔	✔	✘	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘
Case identification via UUID	✔	✔	✘	✔	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘
Contains an embedded dictionary	✔	✔	✘	✘	✘	✘	✘	✘	✘	✘	✘	✘	✘	✘
Allows record sorts	✘	✘	✔	✘	✔	✘	✘	✘	✘	✘	✘	✘	✘	✘

Data sources that support notes, case labels, and case statuses store notes entered by the operator or set in logic, case labels set in logic, and case status information such as whether a case has been partially saved or verified.

Data sources that contain an embedded dictionary can be opened in Data Manager and some tools without the need to specify a dictionary.
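When choosing a data source, for example in a script that generates connection strings, it can help to treat the functionality table as a lookup structure. The following Python sketch copies the check marks from the table above into sets and returns the data sources that support every requested feature. It is illustrative only: the names FEATURES and sources_supporting, and the short feature labels, are hypothetical and not part of CSPro. CSPro Export is omitted because it does not appear in the table.

```python
# Feature support transcribed from the functionality table above.
# NOTE: FEATURES and sources_supporting are hypothetical helpers, not CSPro APIs.

NATIVE = {"CSPro DB", "Encrypted CSPro DB", "Text", "JSON", "None", "In-Memory"}
EXPORT = {"Comma Delimited", "Semicolon Delimited", "Tab Delimited",
          "Excel", "R", "SAS", "SPSS", "Stata"}
UUID_CAPABLE = {"CSPro DB", "Encrypted CSPro DB", "JSON", "None", "In-Memory"}

# Each feature maps to the set of data sources marked with a check.
FEATURES = {
    "reading cases": NATIVE,
    "writing cases": NATIVE | EXPORT,
    "notes, case labels, and case statuses": NATIVE,
    "more than one kind of record": NATIVE | {"Excel", "R", "SAS"},
    "binary data items": UUID_CAPABLE,
    "deleting cases": NATIVE,
    "undeleting cases": {"CSPro DB", "Encrypted CSPro DB", "JSON", "In-Memory"},
    "syncing data": {"CSPro DB", "Encrypted CSPro DB"},
    "duplicate keys": UUID_CAPABLE,
    "UUID identification": UUID_CAPABLE,
    "embedded dictionary": {"CSPro DB", "Encrypted CSPro DB"},
    "record sorts": {"Text", "None"},
}


def sources_supporting(*features: str) -> set[str]:
    """Return the data sources that support every requested feature."""
    result = NATIVE | EXPORT  # a fresh set, so the shared sets above are never mutated
    for feature in features:
        result &= FEATURES[feature]
    return result
```

For example, sources_supporting("syncing data") returns only the two CSPro DB variants, and sources_supporting("reading cases", "binary data items", "undeleting cases") narrows the choice to CSPro DB, Encrypted CSPro DB, JSON, and In-Memory.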

Determining What Data Source Is Used

All data sources have behavior that can be customized by specifying properties in the connection string. When CSPro analyzes a connection string to determine what data source to use, the default behavior is to match the data source using the file extension. If the extension matches any of the extensions listed in the data source tables above, that data source is used. If it matches none of them, the Text data source is used, so any unlisted extension (for example, .txt) maps to the Text data source.

The "type" property can be used to override this behavior for data sources that do not use a proprietary file format. For example, the .tsv extension is associated with the Tab Delimited data source, but if you instead wanted to use that extension for a Text data source, you could use the connection string:

filename.tsv|type=Text

These are the "type" values that can be used to override the default behavior associated with the file extension: "Text", "JSON", "CSV", "Semicolon", and "Tab".

Data sources that are not associated with a file must always be selected with the "type" property, using the value "None" or "Memory".

Finally, because CSPro Export wraps another data source, you must specify the "type" value "CSProExport" to use it.
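The resolution rules described in this section can be modeled in a few lines of Python. This is a minimal sketch of the documented behavior, not CSPro's implementation. The function names, the use of "&" to separate multiple properties, case-insensitive extension matching, and raising ValueError for "type" values not listed on this page are all assumptions made for illustration.

```python
from pathlib import PurePath

# Default extension -> data source, taken from the tables above.
# ASSUMPTION: extensions are compared case-insensitively.
EXTENSION_MAP = {
    ".csdb": "CSPro DB", ".csdbe": "Encrypted CSPro DB", ".dat": "Text",
    ".json": "JSON", ".csv": "Comma Delimited", ".skv": "Semicolon Delimited",
    ".tsv": "Tab Delimited", ".xlsx": "Excel", ".rdata": "R", ".rda": "R",
    ".xpt": "SAS", ".sav": "SPSS", ".dta": "Stata",
}

# "type" values that override the extension-based choice.
OVERRIDE_TYPES = {"Text": "Text", "JSON": "JSON", "CSV": "Comma Delimited",
                  "Semicolon": "Semicolon Delimited", "Tab": "Tab Delimited"}

# "type" values for data sources that can only be selected explicitly.
EXPLICIT_TYPES = {"None": "None", "Memory": "In-Memory",
                  "CSProExport": "CSPro Export"}


def parse_connection_string(conn: str) -> tuple[str, dict[str, str]]:
    """Split 'filename|name=value' into a filename and a property dict.

    ASSUMPTION: multiple properties are separated by '&'.
    """
    filename, _, props_text = conn.partition("|")
    props: dict[str, str] = {}
    for pair in filter(None, props_text.split("&")):
        name, _, value = pair.partition("=")
        props[name] = value
    return filename, props


def resolve_data_source(conn: str) -> str:
    """Return the name of the data source a connection string selects."""
    filename, props = parse_connection_string(conn)
    type_value = props.get("type")
    if type_value is not None:
        if type_value in OVERRIDE_TYPES:
            return OVERRIDE_TYPES[type_value]
        if type_value in EXPLICIT_TYPES:
            return EXPLICIT_TYPES[type_value]
        # ASSUMPTION: rejecting other values is this sketch's choice.
        raise ValueError(f"unsupported type: {type_value!r}")
    extension = PurePath(filename).suffix.lower()
    # Any extension not in the tables (or no extension) falls back to Text.
    return EXTENSION_MAP.get(extension, "Text")
```

For example, resolve_data_source("filename.tsv") returns "Tab Delimited", while resolve_data_source("filename.tsv|type=Text") returns "Text", matching the override described above; an unlisted extension such as "survey.txt" resolves to "Text".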

See also: Case Read Optimization