Mixed encodings in dump file

Kenneth Porter shiva at sewingwitch.com
Thu Jan 4 10:54:37 CST 2007


--On Thursday, January 04, 2007 11:42 AM -0500 Toby Johnson 
<toby at etjohnson.us> wrote:

> OK, thanks... does the XML parser then take care of it automatically,
> without any need to actually read the PI from the XML file in the vss2svn
> script? The ticket mentioned something about from_to() calls but those
> seem to no longer be there...

Correct. The parser does the conversion to native Perl Unicode.

Take a look at r262 for what happened to from_to().

The author and comment strings already hold UTF-8 bytes, but they lack the 
Perl internal flag that marks them as such. Encode::decode_utf8 does for 
those bytes roughly what a C++ placement new does for object allocation: it 
says "this is already UTF-8, so set the internal bit to recognize that". 
This prevents double-encoding when the comment and author are written to 
the dump file.
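
To make the double-encoding concrete, here is a minimal standalone sketch. 
It is not code from vss2svn; the sample name "André" and the variable names 
are made up for illustration, and only the core Encode module is assumed.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Hypothetical author name "André" as raw UTF-8 bytes. The internal UTF-8
# flag is off, so Perl sees six unrelated one-byte characters.
my $raw = "Andr\xC3\xA9";

# Encoding to UTF-8 without decoding first re-encodes every byte above
# 0x7F a second time. This is the double-encoding problem.
my $double = encode_utf8($raw);

# Decoding first gives a five-character string that Perl knows is Unicode,
# so encoding it on output reproduces the original bytes.
my $chars = decode_utf8($raw);
my $clean = encode_utf8($chars);

printf "raw: %d bytes, flagged: %s\n", length($raw),
    utf8::is_utf8($raw) ? "yes" : "no";
printf "decoded: %d chars, flagged: %s\n", length($chars),
    utf8::is_utf8($chars) ? "yes" : "no";
printf "double-encoded: %d bytes\n", length($double);
printf "round trip matches input: %s\n", $clean eq $raw ? "yes" : "no";
```

The double-encoded string comes out two bytes longer than the input because 
each of the two bytes of the "é" sequence is expanded again.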


