Mixed encodings in dump file
Kenneth Porter
shiva at sewingwitch.com
Thu Jan 4 10:54:37 CST 2007
--On Thursday, January 04, 2007 11:42 AM -0500 Toby Johnson
<toby at etjohnson.us> wrote:
> OK, thanks... does the XML parser then take care of it automatically,
> without any need to actually read the PI from the XML file in the vss2svn
> script? The ticket mentioned something about from_to() calls but those
> seem to no longer be there...
Correct. The parser does the conversion to native Perl Unicode.
Take a look at r262 for what happened to from_to().
The string objects for author and comment already hold UTF-8 bytes, but
without Perl's internal UTF-8 flag that marks them as such. So
Encode::decode_utf8 sort of does for UTF-8 encoding what a C++ placement
new does for object allocation: it says "this is already UTF-8, so set the
internal bit to recognize that". This prevents double-encoding when the
comment and author are written to the dump file.
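
To make the double-encoding concrete, here is a minimal standalone sketch. It is not taken from the vss2svn script: the sample string and variable names are made up for illustration, and encode_utf8 simply stands in for whatever encoding step happens when the dump file is written.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Hypothetical comment held as raw UTF-8 bytes: "café", where the
# e-acute is the two bytes 0xC3 0xA9. Perl has no UTF-8 flag on this.
my $raw = "caf\xC3\xA9";

# Without decoding, Perl treats each byte as one character, so
# encoding on output expands both high bytes (double-encoding).
my $double = encode_utf8($raw);

# Decoding first yields a 4-character string with the UTF-8 flag set,
# and encoding on output reproduces the original bytes.
my $text   = decode_utf8($raw);
my $single = encode_utf8($text);

printf "raw bytes:               %d\n", length $raw;
printf "encoded without decode:  %d\n", length $double;
printf "characters after decode: %d\n", length $text;
printf "encoded after decode:    %d\n", length $single;
printf "round trip intact:       %s\n", ($single eq $raw ? "yes" : "no");
```

The undecoded path grows from 5 bytes to 7, while the decoded path round-trips to the same 5 bytes.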
More information about the vss2svn-users mailing list