Mixed encodings in dump file

Dirk vss2svn at nogga.de
Thu Jan 4 10:17:51 CST 2007


> Kenneth, I must admit that I am very inexperienced with issues 
> regarding code pages and converting among encodings... according to 
> your description, is ticket 26 
> <http://www.pumacode.org/projects/vss2svn/ticket/26> still valid? It 
> sounds like the XML parser will "do the right thing" as long as the 
> correct encoding is written to the XML files?
>

Oh, this old devil comes back to haunt me ...

First: VSS itself does not store any codepage information. Since every 
client writes in its own codepage, you can end up with a mixed 
archive containing different encodings, and you cannot tell from the 
outside which one is the correct one. E.g. consider two developers with 
different codepages working on the same archive. Both will write log 
messages in their own codepage, and therefore neither can correctly 
decode the log messages of the other. And there is no way to prevent this.
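
The ambiguity can be illustrated with a small sketch. The two codepages (windows-1252 and windows-1251) and the sample messages are assumptions picked for illustration; they are not taken from a real VSS archive.

```python
# Two hypothetical developers write log messages on clients that use
# different codepages.
msg_western = "Größe geändert"        # typed on a windows-1252 client
msg_cyrillic = "Исправлена ошибка"    # typed on a windows-1251 client

raw_western = msg_western.encode("windows-1252")
raw_cyrillic = msg_cyrillic.encode("windows-1251")

# The archive keeps only the bytes; nothing records which codepage was used.
# Each client decodes every message with its own codepage:
seen_by_cyrillic_client = raw_western.decode("windows-1251")
seen_by_western_client = raw_cyrillic.decode("windows-1252")

print(seen_by_cyrillic_client)   # not the text the first developer typed
print(seen_by_western_client)    # not the text the second developer typed
```

Both decodes succeed without an error, which is why the wrong codepage cannot be detected from the bytes alone.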

Second: You have to distinguish between the version-controlled files and 
the associated information like author and comment. No source control 
system will deal with the "encoding" of the stored file itself. You can 
consider this as black box data. Only comment and author information 
needs to be encoded in the correct codepage.
This is why you will see UTF-8 encoded comments but codepage-encoded 
data in the dumpfile.
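
A minimal sketch of that split, assuming the codepage of the metadata is known. The function `to_dump_record` and the dictionary layout are hypothetical and do not reflect the real dumpfile format or the vss2svn code.

```python
def to_dump_record(author_raw: bytes, comment_raw: bytes,
                   content: bytes, codepage: str) -> dict:
    """Build a hypothetical record: metadata re-encoded, content untouched."""
    return {
        # Author and comment are text: decode from the assumed codepage
        # and store them as UTF-8.
        "author": author_raw.decode(codepage).encode("utf-8"),
        "comment": comment_raw.decode(codepage).encode("utf-8"),
        # File content is black box data: copied byte for byte.
        "content": content,
    }


record = to_dump_record(
    author_raw=b"dirk",
    comment_raw=b"Gr\xf6\xdfe ge\xe4ndert",   # windows-1252 bytes
    content=b"// Gr\xf6\xdfe\r\n",            # file bytes, same codepage
    codepage="windows-1252",
)
print(record["comment"])   # UTF-8 bytes
print(record["content"])   # unchanged codepage bytes
```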

Third: There is a long-standing problem with preserving all this 
information during the conversion in vss2svn. Initially I thought that 
putting the right codepage into the header of the ssphys-generated XML 
file would solve all problems, but this wasn't the case. So we are still 
looking for a good solution, and there are a few workarounds.

1.) I'm not sure about the state of the Ken/Unicode [1] and 
Ken/ssphys-trusted-encoding [2] branches.
2.) What happens if you patch this line [3] to use your correct 
codepage?
 >  TiXmlDeclaration decl ("1.0", "windows-1252", "");
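
To see what the encoding named in the XML declaration changes for a reader of the file, here is a rough sketch using Python's standard XML parser rather than the vss2svn code. The element name `Comment` and the helper `parse_comment` are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_comment(declared_encoding: str, raw_comment: bytes) -> str:
    """Wrap raw bytes in a minimal XML document and let the parser decode them."""
    doc = (b'<?xml version="1.0" encoding="'
           + declared_encoding.encode("ascii")
           + b'"?><Comment>' + raw_comment + b"</Comment>")
    return ET.fromstring(doc).text


# Bytes as a windows-1251 client would have written them.
raw = "Исправлена".encode("windows-1251")

print(parse_comment("windows-1252", raw))   # declaration does not match the bytes
print(parse_comment("windows-1251", raw))   # declaration matches the bytes
```

The parser trusts the declaration, so a single hard-coded value can only be right for archives written entirely in that one codepage.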

Best regards
Dirk

[1] http://www.pumacode.org/projects/vss2svn/browser/branches/Ken/Unicode
[2] http://www.pumacode.org/projects/vss2svn/browser/branches/Ken/ssphys-trusted-encoding
[3] http://www.pumacode.org/projects/vss2svn/browser/trunk/ssphys/SSPhys/Formatter.cpp#L57


