Text Attributes

When CSPro reads or writes text files, two attributes determine how the contents are interpreted on input and whether the output can be processed successfully by other applications:
  • Text encoding: Whether the file is encoded as UTF-8 or ANSI.
  • Newline characters: Whether line endings are written in Windows or Unix format.
Text Encoding
An encoding defines how a text file's bytes are represented as characters. Most modern programs use UTF-8, a Unicode-based variable-length encoding scheme that uses between one and four bytes to represent a character. CSPro fully supports the ANSI and UTF-8 encodings and partially supports UTF-16 LE.
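As a small illustration of the variable-length property, the following Python sketch (not CSPro logic; the sample characters are chosen only for demonstration) encodes four characters and records how many bytes UTF-8 needs for each:

```python
# Sample characters, written as escapes so the source file's own
# encoding does not affect the result.
samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE": "\U0001F600",
}

# Number of bytes that UTF-8 uses for each character.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, count in byte_lengths.items():
    print(f"{name}: {count} byte(s)")
```

Characters in the ASCII range take one byte, which is why an ASCII-only file has the same bytes whether it is treated as ANSI or as UTF-8 without a BOM.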
Prior to version 5.0, CSPro interpreted files as ANSI, but when CSPro became Unicode compliant, it started writing out files using UTF-8 with a byte order mark (BOM). This three-byte BOM helped CSPro differentiate between ANSI files, created prior to version 5.0, and UTF-8 files created with newer versions.
In this release, CSPro 8.1, a file without a BOM is in most cases considered to be ANSI; the table below lists the file types that are read as UTF-8 instead. However, modern programs generally use UTF-8 without a BOM, and CSPro will adopt this approach in a future release.
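The BOM rule described above can be sketched in Python. This is an illustration of the idea, not CSPro's implementation: the function name `decode_text` is hypothetical, and treating "ANSI" as Windows-1252 (`cp1252`) is an assumption, since the actual ANSI code page depends on the system.

```python
import codecs

def decode_text(raw, ansi_codec="cp1252"):
    """Decode bytes as UTF-8 if they start with the UTF-8 BOM, else as ANSI.

    ansi_codec is an assumption: the real ANSI code page varies by system.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the three-byte BOM (EF BB BF) and decode the rest as UTF-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: fall back to the ANSI code page.
    return raw.decode(ansi_codec)

with_bom = decode_text(b"\xef\xbb\xbfcaf\xc3\xa9")   # UTF-8 with BOM
ansi = decode_text(b"caf\xe9")                        # ANSI (Windows-1252)
utf8_no_bom = decode_text(b"caf\xc3\xa9")             # UTF-8 without BOM
```

The third call shows why a BOM-less UTF-8 file can be misread under this rule: the two bytes that encode "é" in UTF-8 are interpreted as two separate ANSI characters.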
The following table outlines how CSPro currently interprets and writes text encodings:
Text File Type                           | Input Encoding If No BOM | Output Encoding | Override
-----------------------------------------+--------------------------+-----------------+--------------------------------------
Specification files:
  JSON-based files                       | UTF-8                    | UTF-8           | —
  Non-JSON specification files           | ANSI                     | UTF-8 (BOM)     | —

Text files created in logic:
  Files written using the Action Invoker | —                        | UTF-8           | encoding argument
  Files accessed using the File object   | ANSI                     | UTF-8 (BOM)     | 🔜

Data sources:
  Text data files                        | ANSI                     | UTF-8 (BOM)     | 🔜
  Text notes files                       | ANSI                     | UTF-8 (BOM)     | "encoding" connection string property
  Text status files                      | ANSI                     | UTF-8 (BOM)     | "encoding" connection string property
  Comma Delimited                        | —                        | UTF-8 (BOM)     | "encoding" connection string property
  Semicolon Delimited                    | —                        | UTF-8 (BOM)     | "encoding" connection string property
  SAS syntax file                        | —                        | UTF-8 (BOM)     | "encoding" connection string property
  Tab Delimited                          | —                        | UTF-8 (BOM)     | "encoding" connection string property

Other files:
  Logic code                             | UTF-8                    | UTF-8 (BOM)     | —
  Question text                          | UTF-8                    | UTF-8 (BOM)     | —
  User-defined messages                  | UTF-8                    | UTF-8 (BOM)     | —
  Listing files                          | —                        | UTF-8 (BOM)     | 🔜
  Operator statistics                    | ANSI                     | UTF-8 (BOM)     | 🔜

Other functionality:
  Concatenated text files                | ANSI                     | UTF-8 (BOM)     | "encoding" connection string property

Newline Characters
For historical reasons, operating systems use different control characters to mark the end of a line. Because CSPro originated as a Windows application, it traditionally used two characters to signify a newline: a carriage return followed by a line feed ("\r\n"). Unix-based systems use only the line feed character ("\n").
When reading text files, CSPro can process files specified with either Windows- or Unix-style newlines, "\r\n" or "\n".
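One common way to accept both newline styles when reading is to normalize "\r\n" to "\n" before splitting into lines. The Python sketch below illustrates that approach; it is not a description of how CSPro does it internally, and the function name `split_lines` is hypothetical.

```python
def split_lines(text):
    """Split text into lines, accepting both "\\r\\n" and "\\n" endings."""
    # Convert Windows newlines to Unix newlines first so that a single
    # split handles files that use either style (or a mix of both).
    return text.replace("\r\n", "\n").split("\n")

windows_lines = split_lines("id,name\r\n1,Ana\r\n2,Ben")
unix_lines = split_lines("id,name\n1,Ana\n2,Ben")
```

Both calls produce the same list of three lines, so later processing does not need to know which style the file used.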
When writing text files, CSPro does not currently use the same newline for every file type. This will be standardized in a future release so that "\n" is the default for all files. For some files, it will be possible to override this setting. The following table outlines the current state of writing newlines:
Text File Type                           | Default Newline | Override
-----------------------------------------+-----------------+-------------------------------------
Specification files:
  JSON-based files                       | "\n"            | —
  Non-JSON specification files           | "\r\n"          | —

Text files created in logic:
  Files written using the Action Invoker | "\n"            | newline argument
  Files written using the File object    | "\n"            | 🔜

Data sources:
  Text data files                        | "\r\n"          | 🔜
  Text notes files                       | "\r\n"          | "newline" connection string property
  Text status files                      | "\r\n"          | "newline" connection string property
  Comma Delimited                        | "\r\n"          | "newline" connection string property
  Semicolon Delimited                    | "\r\n"          | "newline" connection string property
  SAS syntax file                        | "\r\n"          | "newline" connection string property
  Tab Delimited                          | "\r\n"          | "newline" connection string property

Other files:
  Logic code                             | "\r\n"          | —
  Question text                          | "\r\n"          | —
  User-defined messages                  | "\r\n"          | —
  Listing files                          | "\r\n"          | 🔜
  Operator statistics                    | "\r\n"          | 🔜

Other functionality:
  Concatenated text files                | "\r\n"          | "newline" connection string property

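When an external program prepares or post-processes text files alongside CSPro, it can help to set both attributes explicitly rather than rely on platform defaults. The following Python sketch assumes a hypothetical helper, `write_text`, that is not part of CSPro; it writes UTF-8 with or without a BOM and with a chosen newline sequence, then reads the raw bytes back to show the difference.

```python
import os
import tempfile

def write_text(path, lines, bom=True, newline="\r\n"):
    """Write lines as UTF-8, optionally with a BOM, using an explicit newline."""
    # Python's "utf-8-sig" codec writes the three-byte BOM at the start.
    encoding = "utf-8-sig" if bom else "utf-8"
    # newline="" disables Python's own newline translation, so the
    # sequence passed in `newline` is written byte for byte.
    with open(path, "w", encoding=encoding, newline="") as f:
        for line in lines:
            f.write(line + newline)

def read_bytes(path):
    """Return the raw bytes of a file."""
    with open(path, "rb") as f:
        return f.read()

demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# UTF-8 with BOM and Windows newlines.
write_text(demo_path, ["a", "b"])
windows_bytes = read_bytes(demo_path)

# UTF-8 without BOM and Unix newlines.
write_text(demo_path, ["a", "b"], bom=False, newline="\n")
unix_bytes = read_bytes(demo_path)
```

The first file begins with the bytes EF BB BF and ends each line with "\r\n"; the second has no BOM and ends each line with "\n".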
See also: Newline Handling, Unicode Primer