Option in CsPro 5 to keep ANSI data file as is

GVLORICA
Posts: 4
Joined: April 13th, 2013, 5:58 am

Option in CsPro 5 to keep ANSI data file as is

Post by GVLORICA »

Hi,

I just want to give you some unsolicited advice on how to help users of older versions of CSPro adopt version 5 and above: leave the original data file format as is. That is, if the data file is in ANSI, the output data file should not be converted to UTF-8. This could be implemented either by adding an option in CSPro's development mode that lets the user select the data file format (ANSI or UTF-8), or by redesigning CSPro 5 so that, when it opens a data file, it processes the file as ANSI if no BOM (byte order mark) is found and as UTF-8 otherwise. Processing UTF-8 certainly has some overhead compared to processing an ANSI data file, because, unlike UTF-8, an ANSI file does not have to be parsed for multi-byte characters.
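The detection rule proposed above (UTF-8 if the file starts with a BOM, ANSI otherwise) can be sketched in a few lines. This is a minimal Python illustration of the proposal, not how CSPro actually reads files; it assumes "ANSI" means Windows code page 1252, and the function name `decode_data_file` is hypothetical.

```python
import codecs
from typing import Tuple

# "ANSI" on Windows means the system code page; cp1252 is an assumption here.
ANSI_ENCODING = "cp1252"

def decode_data_file(raw: bytes) -> Tuple[str, str]:
    """Return (encoding_used, text) for the raw bytes of a data file.

    Hypothetical rule from the proposal: a leading UTF-8 BOM means UTF-8,
    anything else is treated as ANSI.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the 3-byte BOM (EF BB BF), then decode the rest as UTF-8.
        return "utf-8", raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: decode every byte with the single-byte ANSI code page.
    return ANSI_ENCODING, raw.decode(ANSI_ENCODING)

print(decode_data_file(b"\xef\xbb\xbf0101Jos\xc3\xa9"))  # ('utf-8', '0101José')
print(decode_data_file(b"0101Jos\xe9"))                  # ('cp1252', '0101José')
```

The check only looks at the first three bytes, which is what makes it cheap compared with guessing the encoding from the file's contents.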

Another reason is that many users have already developed tools and systems in other programming languages and find it time-consuming to integrate CSPro 5 into their existing systems.

Best regards,
Gregory Martin
Posts: 1796
Joined: December 5th, 2011, 11:27 pm
Location: Washington, DC

Re: Option in CsPro 5 to keep ANSI data file as is

Post by Gregory Martin »

We thought very hard about this before designing CSPro 5, and we ultimately decided that it would be easier to output all data files in UTF-8 format than to burden users with choosing between ANSI and UTF-8, a choice that would have meant nothing to most of our users and would have confused many people. We also liked the idea of a "design once, deploy anywhere" strategy, in which the programmer doesn't have to think about all the places where the data entry application will be used. That meant, for example, that Chinese could be entered into a data file whether or not the application designer had planned for it.

Now, you are right that some advanced users have written tools to parse CSPro data files, and those tools may have to be modified to handle the BOM and UTF-8 encoding. However, if your file contains only ASCII characters, then apart from the BOM the data file is exactly the same as before, so you can simply skip the first three bytes of the file (the BOM) and use your tools as before.
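For a tool that reads data files byte by byte, skipping the BOM amounts to checking whether the first three bytes are EF BB BF and moving past them if so. A minimal Python sketch, assuming the tool reads from a seekable binary stream; the function name `skip_utf8_bom` and the sample records are hypothetical.

```python
import codecs
import io
from typing import BinaryIO

def skip_utf8_bom(stream: BinaryIO) -> bool:
    """Advance a seekable binary stream past a leading UTF-8 BOM.

    Returns True if a BOM was found and skipped; otherwise rewinds to where
    the stream started, so BOM-less (pre-5.0) files are read unchanged.
    """
    start = stream.tell()
    if stream.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
        return True
    stream.seek(start)
    return False

# Hypothetical two-record file: after the skip, existing byte-oriented logic runs as before.
data = io.BytesIO(codecs.BOM_UTF8 + b"1001John \r\n1002Mary \r\n")
skip_utf8_bom(data)
records = data.read().splitlines()
print(records)  # [b'1001John ', b'1002Mary ']
```

For text-mode readers, Python's `utf-8-sig` codec has the same effect: opening the file with `encoding="utf-8-sig"` strips a leading BOM if there is one. Either way, byte-for-byte compatibility only holds for pure-ASCII data, since non-ASCII characters take more than one byte in UTF-8.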

How have you designed your tools? If you used .NET, making your programs read UTF-8 files is a rather trivial change.
iip
Posts: 32
Joined: January 19th, 2012, 11:30 pm

Re: Option in CsPro 5 to keep ANSI data file as is

Post by iip »

And why not just use UTF-8 without a BOM? Then all frameworks that handle both ANSI and Unicode would work the same way, and we wouldn't have to change our code.
Gregory Martin
Posts: 1796
Joined: December 5th, 2011, 11:27 pm
Location: Washington, DC

Re: Option in CsPro 5 to keep ANSI data file as is

Post by Gregory Martin »

Some people distribute UTF-8 files without the BOM and then hope that the software package can tell whether the file is UTF-8 or ANSI, but that guess is not accurate 100% of the time. Also, suppose you have a 1.3-gigabyte data file, which is possible in CSPro. To determine whether the file is BOM-less UTF-8 or ANSI, we would have to parse the whole file looking for multi-byte UTF-8 character sequences. Only then would we know which format to read the file in. This is not very practical, as it would nearly double the time it takes to do anything with a CSPro data file. That's why we settled on UTF-8 with a BOM.
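Both points above, that BOM-less detection is only a guess and that it needs a pass over the whole file, can be illustrated with a small Python sketch that validates a stream as UTF-8 chunk by chunk. The function name and chunk size are illustrative assumptions, not how CSPro works internally.

```python
import codecs
import io
from typing import BinaryIO

def looks_like_utf8(stream: BinaryIO, chunk_size: int = 1 << 20) -> bool:
    """Guess whether a BOM-less file is UTF-8 by validating every byte.

    True is only a guess: pure-ASCII files and some ANSI byte sequences are
    also valid UTF-8. One invalid sequence anywhere, even in the last chunk,
    rules UTF-8 out, so the whole file has to be read.
    """
    # The incremental decoder handles multi-byte sequences split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while chunk := stream.read(chunk_size):
        try:
            decoder.decode(chunk)
        except UnicodeDecodeError:
            return False
    try:
        decoder.decode(b"", final=True)  # flush: catches a truncated sequence at EOF
    except UnicodeDecodeError:
        return False
    return True

ansi_bytes = "JosÃ©".encode("cp1252")            # an ANSI file containing "JosÃ©"
print(looks_like_utf8(io.BytesIO(ansi_bytes)))   # True: would be misread as UTF-8 "José"
print(looks_like_utf8(io.BytesIO(b"Jos\xe9")))   # False: "José" in cp1252 is invalid UTF-8
```

The first example shows the accuracy problem: an ANSI file whose bytes happen to form valid UTF-8 is indistinguishable from a real UTF-8 file. A BOM avoids both the guess and the full scan.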