concatenation behaviour

Post by **iip** » July 15th, 2013, 1:45 am

Hi,

We have a problem with concatenation in CSPro 5. Some .dat files are generated but empty (this happens when the data entry operator cancels data entry), and all of the files contain a BOM. For example:

f1 (3 bytes/only BOM)
f2 (3 bytes/only BOM)
f3 (3 bytes/only BOM)

The concatenated file, fx, is 9 bytes and contains EF BB BF 0D 0A 0D 0A 0D 0A. In my opinion it should contain only the BOM (EF BB BF), right? Those are empty files.

Regards,

-iip-

Post by **Gregory Martin** » July 15th, 2013, 9:03 am

You're right ... this is something we didn't think about when doing the Unicode conversion. I've fixed the problem, and the fix will be in the 5.0.3 release (probably next month).

Thanks for reporting this bug!
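
The behaviour iip expected (exactly one BOM in the output, and nothing added for BOM-only files) can be sketched in Python. This is not CSPro's actual implementation, which the thread does not show: the function names are hypothetical, and terminating each non-empty body with CRLF is an assumption based on the 0D 0A bytes in the example above.

```python
import codecs
from pathlib import Path

BOM = codecs.BOM_UTF8  # the UTF-8 byte order mark: EF BB BF


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, if there is one."""
    return data[len(BOM):] if data.startswith(BOM) else data


def concatenate_bytes(contents: list[bytes]) -> bytes:
    """Join file contents so the result carries exactly one leading BOM.

    BOM-only (empty) inputs contribute nothing. A CRLF is appended to a
    non-empty body only if it does not already end with a newline
    (an assumption about how records are terminated).
    """
    parts = [BOM]
    for data in contents:
        body = strip_bom(data)
        if not body:
            continue  # skip files that contain nothing but the BOM
        parts.append(body)
        if not body.endswith(b"\n"):
            parts.append(b"\r\n")
    return b"".join(parts)


def concatenate_files(inputs: list[Path], output: Path) -> None:
    """File-level wrapper (hypothetical): read each input, write the result."""
    output.write_bytes(concatenate_bytes([p.read_bytes() for p in inputs]))
```

With three BOM-only inputs such as f1, f2 and f3, `concatenate_bytes` returns just EF BB BF rather than the 9-byte result described above.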

Post by **Anne** » December 4th, 2013, 4:20 am

About the BOM character: according to Wikipedia, it is not needed in UTF-8 files (only in UTF-16). Why does CSPro use it?

I'm making an application where I don't know how many interviewers I will have, and hence how many data files. So I wrote a .bat file to concatenate the data rather than using the concatenate tool in CSPro, but my batch file doesn't work because of the BOM character, and so far I haven't found an easy way to remove it.

I guess I'll write a CSPro batch application to do it instead, but I really don't like that.
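
For a case like Anne's (an unknown number of data files), a minimal Python sketch of a BOM-tolerant merge, used in place of a .bat file, might look like the following. It assumes all inputs sit in one folder and match a glob pattern such as `*.dat`, and that blank lines carry no data; `merge_data_files` is a hypothetical name, not part of CSPro. Python's built-in `utf-8-sig` codec drops a leading BOM when decoding and prepends one when encoding, so the output keeps a single BOM and is still marked as UTF-8.

```python
from pathlib import Path


def merge_data_files(folder: Path, pattern: str, output: Path) -> int:
    """Merge every file in folder matching pattern into output.

    Hypothetical helper. Returns the number of input files merged.
    """
    target = output.resolve()
    # Never read the output file back in as an input.
    inputs = sorted(p for p in folder.glob(pattern) if p.resolve() != target)
    records: list[str] = []
    for path in inputs:
        # "utf-8-sig" drops a leading BOM if present and decodes the rest as UTF-8.
        text = path.read_bytes().decode("utf-8-sig")
        # Normalise CRLF to LF, then split into records.
        records.extend(text.replace("\r\n", "\n").split("\n"))
    # Assumption: blank lines carry no data, so BOM-only files add nothing.
    body = "".join(record + "\r\n" for record in records if record)
    # Encoding with "utf-8-sig" prepends exactly one BOM to the output.
    output.write_bytes(body.encode("utf-8-sig"))
    return len(inputs)
```

For example, `merge_data_files(Path("data"), "*.dat", Path("data/all.dat"))` merges however many `.dat` files the folder holds. An input containing bytes that are not valid UTF-8 raises `UnicodeDecodeError` instead of being silently mangled.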

Post by **Gregory Martin** » December 17th, 2013, 10:58 am

The problem we had when designing the Unicode version is that CSPro uses simple text files as its data files, so we have no useful way of storing metadata about a file. For example, if we used a binary format, we could have a flag indicating whether the data was ANSI or UTF-8, but we don't have that.

So we had to find a way to distinguish data files created in older versions of CSPro from data files created in the newer version. The answer was to add the BOM to the data files. For example, without it, if we encountered a character like 'ü' in a data file, we wouldn't necessarily know whether it was a German accented letter or the beginning of a UTF-8 character sequence. The BOM lets us interpret all characters correctly.
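
The detection rule described above (BOM present means a newer UTF-8 file, no BOM means an older ANSI file) can be sketched in Python. This is an illustration of the idea, not CSPro's code; `decode_data_file` is a hypothetical name, and using Windows code page 1252 for "ANSI" is an assumption, since the actual ANSI code page depends on the machine's locale.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # EF BB BF


def decode_data_file(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Pick a decoding based on whether the file starts with a UTF-8 BOM.

    - BOM present: a newer (Unicode) file, decoded as UTF-8.
    - No BOM: an older file, decoded with an ANSI code page. Which code
      page applies depends on the machine; cp1252 is only an assumption.
    """
    if raw.startswith(UTF8_BOM):
        return raw[len(UTF8_BOM):].decode("utf-8")
    return raw.decode(ansi_codepage)
```

For example, the two bytes C3 BC decode to 'ü' when the file starts with the BOM, but to 'Ã¼' when the same bytes are read as cp1252 text; conversely, 'ü' in a cp1252 file is the single byte FC. Without the BOM, a reader has no reliable way to tell these cases apart.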

Post by **sofiajoe** » October 4th, 2014, 1:09 am

What I need is to make an application that exports the data, for use in the productionRunner tool (this tool basically just runs all the .pff and .bat files you specify). I have already made the application (the .exf file) and the .pff file. In the .exf file I have already specified which fields and which universe I want, and the fields I chose include all of the ID elements.

CSPro Users Forum