When CSPro reads or writes text files, two primary attributes determine how the contents are read and how they are written so that other applications can process them successfully:
- Text encoding: Whether the file is encoded as UTF-8 or ANSI.
- Newline characters: Whether line endings are written in Windows or Unix format.
An encoding defines how a text file's bytes are represented as characters. Most modern programs use UTF-8, a Unicode-based variable-length encoding scheme that uses between one and four bytes to represent a character. CSPro fully supports the encodings ANSI and UTF-8, and partially supports UTF-16 LE.
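The variable-length nature of UTF-8 can be illustrated with a short Python sketch. This is not CSPro code; the helper name `utf8_length` and the sample characters are chosen only for illustration.

```python
# Illustrative sketch (not CSPro logic): count the bytes UTF-8 needs
# for characters taken from different Unicode ranges.

def utf8_length(ch: str) -> int:
    """Return the number of bytes used to encode one character as UTF-8."""
    return len(ch.encode("utf-8"))

samples = [
    "A",           # basic Latin letter
    "\u00e9",      # e with acute accent
    "\u20ac",      # euro sign
    "\U0001F600",  # emoji outside the Basic Multilingual Plane
]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+20AC -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```

Plain ASCII text therefore occupies the same bytes in UTF-8 and ANSI; the two encodings only diverge for characters outside the ASCII range.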
Prior to version 5.0, CSPro interpreted files as ANSI, but when CSPro became Unicode compliant, it started writing out files using UTF-8 with a byte order mark (BOM). This three-byte BOM helped CSPro differentiate between ANSI files, created prior to version 5.0, and UTF-8 files created with newer versions.
In this release, CSPro 8.1, a file without a BOM is considered to be ANSI. However, modern programs generally use UTF-8 without a BOM, and CSPro will adopt this approach in a future release.
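The BOM-based rule described above can be sketched in Python for anyone preparing or inspecting files outside CSPro. This is only an approximation of the described behavior, not CSPro's implementation: the function names are hypothetical, and "ANSI" is assumed here to mean Windows-1252, whereas the actual code page depends on the system.

```python
import codecs

# Assumption: "ANSI" is treated as Windows-1252 for this sketch; on a real
# system the ANSI code page depends on the Windows locale settings.
ANSI_CODEC = "cp1252"

def detect_encoding(data: bytes) -> str:
    """Return a Python codec name based on the presence of a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return "utf-8-sig"                # decodes UTF-8 and drops the BOM
    return ANSI_CODEC                     # no BOM: treated as ANSI

def decode_text(data: bytes) -> str:
    """Decode file contents using the BOM rule above."""
    return data.decode(detect_encoding(data))

def encode_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM."""
    return text.encode("utf-8-sig")
```

For example, `decode_text(encode_with_bom("caf\u00e9"))` round-trips the text, while the same UTF-8 bytes without the BOM would be decoded as ANSI and the accented character would come out garbled.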
The following table outlines how CSPro currently interprets and writes text encodings:
Due to a variety of historical reasons, operating systems use different control characters when writing newlines. As a program with its roots as a Windows application, CSPro traditionally used two characters to signify a newline: a carriage return and a line feed character ("\r\n"). Unix-based systems use only the line feed character ("\n").
When reading text files, CSPro can process files specified with either Windows- or Unix-style newlines, "\r\n" or "\n".
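Accepting either newline style when reading can be sketched in Python as follows. This is not how CSPro is implemented; the function names `split_lines` and `join_lines` are hypothetical and only show the general idea of normalizing "\r\n" to "\n" before splitting.

```python
from typing import List

def split_lines(text: str) -> List[str]:
    """Split text into lines, accepting both "\\r\\n" and "\\n" newlines."""
    # Normalize Windows-style newlines first so a single split handles both.
    return text.replace("\r\n", "\n").split("\n")

def join_lines(lines: List[str], newline: str = "\n") -> str:
    """Join lines using the requested newline sequence."""
    return newline.join(lines)

mixed = "first\r\nsecond\nthird"
print(split_lines(mixed))                         # ['first', 'second', 'third']
print(repr(join_lines(split_lines(mixed), "\r\n")))  # 'first\r\nsecond\r\nthird'
```

Normalizing on read means the rest of a program never has to care which platform produced the file; the newline style becomes a decision made only at write time.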
When writing text files, CSPro is inconsistent in the way that newlines are written. This will be standardized in a future release so that "\n" is the default for all files. For some files, it will be possible to override this setting. The following table outlines the current state of writing newlines: