
Option in CsPro 5 to keep ANSI data file as is

Posted: April 13th, 2013, 5:17 pm
by GVLORICA
Hi,

I just want to offer some unsolicited advice on how to help users of older versions of CSPro embrace version 5 and above: leave the original data file format as is. That is, if a data file is in ANSI, the output data file should not be converted to UTF-8. This could be implemented either by adding an option in CSPro's development mode that lets the user select the data file format (ANSI or UTF-8), or by redesigning CSPro 5 so that a data file with no BOM is processed in ANSI mode and a file with a BOM is processed as UTF-8. Processing UTF-8 also has some overhead compared with processing an ANSI data file, because UTF-8 text must be decoded while ANSI text needs no such parsing.
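The BOM-based rule proposed above could be sketched as follows. This is only an illustration of the suggestion, not how CSPro behaves; it assumes "ANSI" means the Windows-1252 code page (the real ANSI code page depends on the system locale), and `choose_encoding` and `decode_data_file` are hypothetical names.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

# Assumption: "ANSI" is taken to mean Windows-1252 here; the actual ANSI
# code page depends on the Windows system locale.
ANSI_ENCODING = "cp1252"


def choose_encoding(raw: bytes) -> str:
    """Hypothetical rule from the proposal: UTF-8 if the data starts
    with a BOM, otherwise ANSI."""
    if raw.startswith(UTF8_BOM):
        return "utf-8-sig"  # decodes UTF-8 and drops the leading BOM
    return ANSI_ENCODING


def decode_data_file(raw: bytes) -> str:
    """Decode the raw bytes of a data file using choose_encoding()."""
    return raw.decode(choose_encoding(raw))
```

For example, `decode_data_file(b"Jos\xe9")` and `decode_data_file(b"\xef\xbb\xbfJos\xc3\xa9")` both yield "José": the first is read as ANSI, the second as UTF-8. Only the first three bytes decide the encoding, so the check costs nothing extra on large files.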

Another reason is that a lot of users have already developed tools and systems written in other programming languages, and they find it time-consuming to integrate CSPro 5 into their existing systems.

Best regards,

Re: Option in CsPro 5 to keep ANSI data file as is

Posted: April 16th, 2013, 7:19 pm
by Gregory Martin
We thought very hard about this before designing CSPro 5, and we ultimately decided that it would be easier to output all data files in UTF-8 format rather than burden the user with choosing between ANSI and UTF-8, a choice that would have meant nothing to most of our users and would have confused many people. Also, we liked the idea of a "design once, deploy anywhere" strategy, where the programmer doesn't have to think about all the places where the data entry application will be used. That means, for example, that Chinese can be entered into a data file whether or not the application designer intended it.

Now, you are right that some advanced users have written tools to parse CSPro data files, and those tools may have to be modified to deal with the BOM and UTF-8 encoding. However, if you are only using ASCII characters in your file, then apart from the BOM the data file is exactly the same as before, so you can simply skip past the first three bytes of the file and use your tools in the same way.
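As a rough sketch of "skip the first three bytes", here is how an existing tool might read an ASCII-only data file whether or not it starts with a BOM. It assumes fixed-width text records; the helper names and the column positions in the example are hypothetical, not a real CSPro dictionary layout.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom(raw: bytes) -> bytes:
    """Drop the 3-byte UTF-8 BOM if present; otherwise return the bytes unchanged."""
    return raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw


def read_records(raw: bytes) -> list[str]:
    """Return the lines of an ASCII-only data file, with or without a BOM.

    Every ASCII character is a single byte in UTF-8, so after the BOM the
    bytes match the old ANSI file exactly and column positions are unchanged.
    """
    return strip_bom(raw).decode("ascii").splitlines()


def field(record: str, start: int, length: int) -> str:
    """Extract a fixed-width field using 1-based column positions (hypothetical layout)."""
    return record[start - 1 : start - 1 + length]
```

With this, `read_records(b"1000123\r\n")` and `read_records(b"\xef\xbb\xbf1000123\r\n")` both return `["1000123"]`, so any column-based parsing downstream works unchanged. Decoding as `"ascii"` makes the tool fail loudly if non-ASCII data ever appears, which is when a real UTF-8 decoder becomes necessary.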

How have you designed your tools? If you used .NET, making your programs read UTF-8 files is a rather trivial change.

Re: Option in CsPro 5 to keep ANSI data file as is

Posted: June 16th, 2013, 3:10 am
by iip
And why not just use UTF-8 without a BOM? Then all frameworks that handle both ANSI and Unicode would work the same way, and we wouldn't have to change our code.

Re: Option in CsPro 5 to keep ANSI data file as is

Posted: June 16th, 2013, 5:34 pm
by Gregory Martin
Some people distribute UTF-8 files without the BOM and then hope that the software package can identify whether the file is UTF-8 or ANSI, but that guess is not accurate 100% of the time. Also, say you have a 1.3 gigabyte data file, which is a possibility in CSPro. To check whether such a file is BOM-less UTF-8 or ANSI, we would have to parse the whole file looking for UTF-8 multibyte character sequences. Only then could we know which format to read the file in. This is not very practical, as it would nearly double the time it takes to do anything with a CSPro data file. That's why we settled on using UTF-8 with a BOM.
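To illustrate the cost difference described above, here is a minimal sketch contrasting a BOM check with a BOM-less heuristic. It assumes "ANSI" means Windows-1252, and the function names are hypothetical; it is not CSPro's actual code. The BOM check reads three bytes, while the heuristic has to stream through the entire file before it can answer.

```python
import codecs
import io
from typing import BinaryIO

UTF8_BOM = codecs.BOM_UTF8  # EF BB BF


def has_bom(stream: BinaryIO) -> bool:
    """Constant-cost check: only the first three bytes are read."""
    return stream.read(len(UTF8_BOM)) == UTF8_BOM


def looks_like_utf8(stream: BinaryIO, chunk_size: int = 1 << 20) -> bool:
    """BOM-less heuristic: decode every byte of the file as UTF-8.

    The cost grows with file size, and a True result is only a guess,
    because some ANSI byte sequences are also valid UTF-8.
    """
    # The incremental decoder handles multibyte sequences split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        while chunk := stream.read(chunk_size):
            decoder.decode(chunk)
        decoder.decode(b"", final=True)  # reject a truncated trailing sequence
    except UnicodeDecodeError:
        return False
    return True
```

The guess can be wrong: the two Windows-1252 characters "Ã©" are stored as the bytes `C3 A9`, which are also valid UTF-8 for "é", so `looks_like_utf8(io.BytesIO("\u00c3\u00a9".encode("cp1252")))` returns `True` for a file that was written as ANSI.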