Text Attributes

When CSPro reads or writes text files, two attributes determine how a file's contents are interpreted on input and whether other applications can successfully process the output:

Text encoding: Whether the file is encoded as UTF-8 or ANSI.
Newline characters: Whether line endings are written in Windows or Unix format.

Text Encoding

An encoding defines how the bytes of a text file map to characters. Most modern programs use UTF-8, a variable-length Unicode encoding that uses between one and four bytes to represent a character. CSPro fully supports ANSI and UTF-8, and partially supports UTF-16 LE.
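As a small illustration of the variable-length property, the following Python sketch (not CSPro logic; the helper name `utf8_length` is made up for this example) reports how many bytes UTF-8 needs for characters from different Unicode ranges:

```python
# Characters chosen from ranges that need 1, 2, 3, and 4 bytes in UTF-8.
samples = [
    "A",           # U+0041, basic Latin letter
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print(f"U+{ord(ch):04X} uses {utf8_length(ch)} byte(s)")
```

An ANSI code page, by contrast, typically stores each character in a single byte, which is why the same bytes can decode to different text depending on which encoding is assumed.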

Prior to version 5.0, CSPro interpreted files as ANSI. When CSPro became Unicode-compliant, it began writing files as UTF-8 with a byte order mark (BOM). This three-byte BOM lets CSPro differentiate between ANSI files created prior to version 5.0 and UTF-8 files created with newer versions.

In this release, CSPro 8.1, a file without a BOM is generally considered to be ANSI; the table below lists the file types that are instead read as UTF-8. Modern programs generally use UTF-8 without a BOM, and CSPro will adopt this approach in a future release.
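The BOM-based rule can be sketched in Python as follows. This is an illustration of the idea rather than CSPro's implementation: the function name `decode_text` is hypothetical, and Windows-1252 (`cp1252`) is only an assumed stand-in for "ANSI", which in practice depends on the system code page.

```python
import codecs

# The three-byte UTF-8 byte order mark: EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def decode_text(raw: bytes, ansi_codec: str = "cp1252") -> str:
    """Decode file bytes as UTF-8 if they start with a BOM, else as ANSI.

    ansi_codec is an assumption for this sketch; the real ANSI code page
    varies by system.
    """
    if raw.startswith(UTF8_BOM):
        # Skip the BOM itself so that it does not appear in the text.
        return raw[len(UTF8_BOM):].decode("utf-8")
    return raw.decode(ansi_codec)


# The same word stored three ways.
with_bom = UTF8_BOM + "caf\u00e9".encode("utf-8")
ansi = "caf\u00e9".encode("cp1252")
utf8_no_bom = "caf\u00e9".encode("utf-8")

print(decode_text(with_bom))     # decoded as UTF-8
print(decode_text(ansi))         # decoded as ANSI
print(decode_text(utf8_no_bom))  # misread as ANSI, producing garbled text
```

The last call shows why the distinction matters: UTF-8 bytes without a BOM are treated as ANSI under this rule, so any non-ASCII character comes out garbled.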

The following table outlines how CSPro currently interprets and writes text encodings:

Text File Type	Input Encoding If No BOM	Output Encoding	Override

Specification files:
JSON-based files	UTF-8	UTF-8	—
Non-JSON specification files	ANSI	UTF-8 (BOM)	—

Text files created in logic:
Files written using the Action Invoker	—	UTF-8	encoding argument
Files accessed using the File object	ANSI	UTF-8 (BOM)	🔜

Data sources:
Text data files	ANSI	UTF-8 (BOM)	🔜
Text notes files	ANSI	UTF-8 (BOM)	"encoding" connection string property
Text status files	ANSI	UTF-8 (BOM)	"encoding" connection string property
Comma Delimited	—	UTF-8 (BOM)	"encoding" connection string property
Semicolon Delimited	—	UTF-8 (BOM)	"encoding" connection string property
SAS syntax file	—	UTF-8 (BOM)	"encoding" connection string property
Tab Delimited	—	UTF-8 (BOM)	"encoding" connection string property

Other files:
Logic code	UTF-8	UTF-8 (BOM)	—
Question text	UTF-8	UTF-8 (BOM)	—
User-defined messages	UTF-8	UTF-8 (BOM)	—
Listing files	—	UTF-8 (BOM)	🔜
Operator statistics	ANSI	UTF-8 (BOM)	🔜

Other functionality:
Concatenated text files	ANSI	UTF-8 (BOM)	"encoding" connection string property

Newline Characters

For historical reasons, operating systems use different control characters to mark the end of a line. With its roots as a Windows application, CSPro traditionally used two characters to signify a newline: a carriage return followed by a line feed ("\r\n"). Unix-based systems use only the line feed character ("\n").

When reading text files, CSPro can process files specified with either Windows- or Unix-style newlines, "\r\n" or "\n".
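One way a reader can accept both styles is to split on the line feed and then drop a trailing carriage return from each line. The Python sketch below illustrates that approach; it is an assumption about how such tolerant reading might work, not a description of CSPro's internals, and `split_lines` is a name made up for the example.

```python
def split_lines(text: str) -> list[str]:
    """Split text into lines, accepting both "\r\n" and "\n" line endings."""
    lines = text.split("\n")
    # A file that ends with a newline yields one empty trailing element.
    if lines and lines[-1] == "":
        lines.pop()
    # Remove the carriage return left over from Windows-style endings.
    return [line[:-1] if line.endswith("\r") else line for line in lines]


# Mixed line endings in a single input are handled the same way.
print(split_lines("first\r\nsecond\nthird\r\n"))
```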

When writing text files, CSPro does not currently write newlines consistently. This will be standardized in a future release so that "\n" is the default for all files, and for some files it will be possible to override this setting. The following table outlines the current state of writing newlines:

Text File Type	Default Newline	Override

Specification files:
JSON-based files	"\n"	—
Non-JSON specification files	"\r\n"	—

Text files created in logic:
Files written using the Action Invoker	"\n"	newline argument
Files written using the File object	"\n"	🔜

Data sources:
Text data files	"\r\n"	🔜
Text notes files	"\r\n"	"newline" connection string property
Text status files	"\r\n"	"newline" connection string property
Comma Delimited	"\r\n"	"newline" connection string property
Semicolon Delimited	"\r\n"	"newline" connection string property
SAS syntax file	"\r\n"	"newline" connection string property
Tab Delimited	"\r\n"	"newline" connection string property

Other files:
Logic code	"\r\n"	—
Question text	"\r\n"	—
User-defined messages	"\r\n"	—
Listing files	"\r\n"	🔜
Operator statistics	"\r\n"	🔜

Other functionality:
Concatenated text files	"\r\n"	"newline" connection string property
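To make the two attributes concrete, the following Python sketch produces the bytes of a text file for a given combination of encoding and newline. It is only an illustration: the function `encode_lines` and its `newline` and `bom` parameters are hypothetical and are not the CSPro "encoding" and "newline" properties themselves.

```python
def encode_lines(lines, newline: str = "\r\n", bom: bool = True) -> bytes:
    """Join lines with the chosen newline and encode them as UTF-8.

    The defaults mirror the "UTF-8 (BOM)" and "\r\n" combination that the
    tables list for several file types.
    """
    text = "".join(line + newline for line in lines)
    # Python's "utf-8-sig" codec writes the three-byte BOM before the text.
    return text.encode("utf-8-sig" if bom else "utf-8")


# UTF-8 with a BOM and Windows-style newlines.
print(encode_lines(["id,name", "1,Ana"]))

# UTF-8 without a BOM and Unix-style newlines.
print(encode_lines(["id,name", "1,Ana"], newline="\n", bom=False))
```

Comparing the two results byte by byte shows exactly what another application has to cope with: three extra bytes at the start of the file, and one extra byte at the end of every line.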

See also: Newline Handling, Unicode Primer