Mixed encodings in dump file
Toby Johnson
toby at etjohnson.us
Sat Jan 6 18:20:01 CST 2007
Ori Avtalion wrote:
> Toby Johnson wrote:
>
>> Gotcha. I'll close ticket 26 and open another to add an encoding switch
>> to ssphys.
>>
>>
>
> I have written a patch for
> <http://www.pumacode.org/projects/vss2svn/ticket/44>
>
> Since I have technical problems at work with running vss2svn from the
> source, I cannot test it with my VSS repository.
>
> I know this isn't a healthy way to submit patches, but I think it will
> at least help speed the ticket along, and since I made changes to ssphys
> it can basically be tested with a regular latin-codepage VSS repository.
>
> There is a chance it may not run at all, though :)
>
> I've added an encoding flag to the xml formatter of ssphys with a
> default value of windows-1252, and a matching flag for vss2svn.pl (also
> with a default).
>
Thanks for the patch! It looks good to me at first glance. I currently
don't have a build environment set up for ssphys, so hopefully Dirk or
Kenneth can test it out before we apply it. Otherwise I'll add it to my
growing to-do list...

> Regarding DoSsCMD() in vss2svn.pl, the character range removed from the
> output is correct for *most* windows codepages, since all of them
> re-define just the lower part of windows-1255.
> Arabic, for example, has rarely used characters in 0x8D and 0x90
> <http://en.wikipedia.org/wiki/Windows-1256>
>
Here is the line you are referring to:

    $gSysOut =~ s/[\x00-\x09\x11\x12\x14-\x1F\x81\x8D\x8F\x90\x9D]/_/g;

Maybe we should remove the 0x81 and everything after if it's not
windows-1252? Or maybe we should be more strict when it comes to
filenames and less so for everything else? Are those upper codepage
characters considered valid filename characters in those codepages?
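
To make the first idea concrete, here is a rough, untested sketch. The
helper name sanitize_output and the $encoding parameter are made up for
illustration (they are not in vss2svn.pl); $encoding just stands in for
whatever value the new encoding flag ends up carrying. It always strips
the control characters, and strips the five upper bytes only when the
encoding is windows-1252, where those bytes are unassigned:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of vss2svn.pl. $encoding is assumed to be
# the value of the proposed encoding flag (e.g. 'windows-1252').
sub sanitize_output {
    my ($text, $encoding) = @_;

    # Control characters are replaced regardless of codepage
    # (same range as the existing substitution in DoSsCMD()).
    $text =~ s/[\x00-\x09\x11\x12\x14-\x1F]/_/g;

    # These five bytes are unassigned in windows-1252. Other codepages
    # (e.g. windows-1256) assign real characters to some of them, so
    # they are left alone unless the encoding is windows-1252.
    if (lc($encoding) eq 'windows-1252') {
        $text =~ s/[\x81\x8D\x8F\x90\x9D]/_/g;
    }

    return $text;
}
```

This does not answer the filename question; it only keeps the current
behaviour for windows-1252 and stops mangling the upper range for
everything else.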
toby