Mixed encodings in dump file
Dirk
vss2svn at nogga.de
Thu Jan 4 10:17:51 CST 2007
> Kenneth, I must admit that I am very inexperienced with issues
> regarding code pages and converting among encodings... according to
> your description, is ticket 26
> <http://www.pumacode.org/projects/vss2svn/ticket/26> still valid? It
> sounds like the XML parser will "do the right thing" as long as the
> correct encoding is written to the XML files?
>
Oh, this old devil jumps back into my neck ...
First: VSS itself does not store any codepage information. Since every
client writes its text using its own codepage setting, you can end up
with a mixed archive containing different encodings, and you cannot tell
from the outside which one is correct. E.g. consider two developers with
different codepages working on the same archive. Both will write log
messages in their own codepage, and therefore neither can correctly
decode the log messages of the other. And there is no way to prevent this.
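To make the ambiguity concrete, here is a minimal C++ sketch (not taken from vss2svn or ssphys) that decodes the same byte under two assumed client codepages, windows-1252 and windows-1251. The function names are hypothetical, and only the byte ranges noted in the comments are handled.

```cpp
#include <cassert>

// Decode one byte as windows-1252. Only ASCII and 0xA0-0xFF are handled,
// where windows-1252 matches Latin-1. Bytes 0x80-0x9F are not handled in
// this sketch and yield U+FFFD (replacement character).
char32_t decode_cp1252(unsigned char b) {
    if (b < 0x80 || b >= 0xA0) return b;
    return 0xFFFD;
}

// Decode one byte as windows-1251. Only ASCII and the contiguous Cyrillic
// block 0xC0-0xFF (U+0410..U+044F) are handled; the rest yields U+FFFD.
char32_t decode_cp1251(unsigned char b) {
    if (b < 0x80) return b;
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);
    return 0xFFFD;
}

// The same stored byte gives two different characters, depending only on
// which codepage the reader assumes.
void check_same_byte_two_meanings() {
    assert(decode_cp1252(0xE4) == 0x00E4);  // Latin small a with diaeresis
    assert(decode_cp1251(0xE4) == 0x0434);  // Cyrillic small de
}
```

Nothing in the byte 0xE4 itself says which of the two readings the author of the log message intended, which is exactly the situation in a mixed archive.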
Second: You have to distinguish between the version-controlled files and
the associated information like author and comment. No source control
system will deal with the "encoding" of the stored file itself; you can
consider it black-box data. Only the comment and author information
needs to be encoded in the correct codepage.
This is why you will see UTF-8 encoded comments but codepage-encoded
data in the dumpfile.
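A rough illustration of that split, assuming the comment text is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. windows-1252 agrees with Latin-1 except for bytes 0x80-0x9F, which this sketch does not treat specially. All names are hypothetical and this is not the code vss2svn uses.

```cpp
#include <cassert>
#include <string>

// Transcode comment/author text from Latin-1 to UTF-8.
// Every Latin-1 byte >= 0x80 becomes a two-byte UTF-8 sequence.
std::string latin1_comment_to_utf8(const std::string& raw) {
    std::string out;
    for (unsigned char b : raw) {
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}

// File contents are opaque bytes and are passed through untouched.
std::string file_payload(const std::string& raw) { return raw; }

void check_comment_transcoding() {
    const std::string raw = "Gr\xFC\xDF" "e";  // "Gruesse" with u-umlaut and sharp s, Latin-1
    assert(latin1_comment_to_utf8(raw) == "Gr\xC3\xBC\xC3\x9F" "e");
    assert(file_payload(raw) == raw);  // same bytes in, same bytes out
}
```

The same input bytes therefore show up in two forms in the output: re-encoded when they are metadata, unchanged when they are file content.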
Third: There is a long-standing problem with preserving all this
information during the conversion in vss2svn. Initially I thought that
putting the right codepage into the header of the ssphys-generated XML
file would solve all problems, but this wasn't the case. So we are still
looking for a good solution, and there are a few workarounds.
1.) I'm not sure about the state of the Ken/Unicode [1] and
Ken/ssphys-trusted-encoding [2] branches.
2.) What happens if you patch this line [3] to use your correct
codepage?
> TiXmlDeclaration decl ("1.0", "windows-1252", "");
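As a sketch of what such a patch amounts to, the following C++ fragment builds the XML declaration from a caller-supplied encoding name instead of a hard-coded one. The helper name is hypothetical and it does not use TinyXML; it only shows the idea, and it assumes the whole archive was written under a single codepage.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build an XML declaration for the given encoding
// name instead of hard-coding "windows-1252".
std::string xml_declaration(const std::string& encoding) {
    return "<?xml version=\"1.0\" encoding=\"" + encoding + "\" ?>";
}

void check_xml_declaration() {
    assert(xml_declaration("windows-1251") ==
           "<?xml version=\"1.0\" encoding=\"windows-1251\" ?>");
}
```

The encoding name has to be one that the consuming XML parser recognizes, otherwise the declaration does not help.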
Best regards
Dirk
[1] http://www.pumacode.org/projects/vss2svn/browser/branches/Ken/Unicode
[2] http://www.pumacode.org/projects/vss2svn/browser/branches/Ken/ssphys-trusted-encoding
[3] http://www.pumacode.org/projects/vss2svn/browser/trunk/ssphys/SSPhys/Formatter.cpp#L57