Mixed encodings in dump file

Toby Johnson toby at etjohnson.us
Sat Jan 6 18:20:01 CST 2007


Ori Avtalion wrote:
> Toby Johnson wrote:
>   
>> Gotcha. I'll close ticket 26 and open another to add an encoding switch
>> to ssphys.
>>
>>     
>
> I have written a patch for
> <http://www.pumacode.org/projects/vss2svn/ticket/44>
>
> Since I have technical problems at work with running vss2svn from the
> source, I cannot test it with my VSS repository.
>
> I know this isn't a healthy way to submit patches, but I think it will
> at least help speed the ticket along, and since I made changes to ssphys
> it can basically be tested with a regular latin-codepage VSS repository.
>
> There is a chance it may not run at all, though :)
>
> I've added an encoding flag to the xml formatter of ssphys with a
> default value of windows-1252, and a matching flag for vss2svn.pl (also
> with a default).
>   

Thanks for the patch! It looks good to me at first glance. I currently 
don't have a build environment set up for ssphys, so hopefully Dirk or 
Kenneth can test it out before we apply it. Otherwise I'll add it to my 
growing to-do list...

> Regarding DoSsCMD() in vss2svn.pl, the character range removed from the
> output is correct for *most* windows codepages, since all of them
> re-define just the lower part of windows-1255.
> Arabic, for example, has rarely used characters in 0x8D and 0x90
> <http://en.wikipedia.org/wiki/Windows-1256>
>   

Here is the line you are referring to:
$gSysOut =~ s/[\x00-\x09\x11\x12\x14-\x1F\x81\x8D\x8F\x90\x9D]/_/g;

Maybe we should drop 0x81 and the bytes that follow it in that character
class (0x8D, 0x8F, 0x90, 0x9D) when the encoding is not windows-1252? Or
maybe we should be more strict when it comes to filenames and less so
for everything else? Are those upper-range characters considered valid
filename characters in those codepages?
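
To make the first idea concrete, here is a rough, untested-against-VSS
sketch. The sub name sanitize_output and the way the encoding name is
passed in are hypothetical and not taken from vss2svn.pl; the only
assumption carried over is that the five high bytes in the current
character class are the ones windows-1252 leaves unassigned.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of vss2svn.pl: replace control characters
# with '_' in all cases, and replace the five bytes that windows-1252
# leaves unassigned only when that is the encoding in use.
sub sanitize_output {
    my ($text, $encoding) = @_;

    # Control characters: same ranges as the existing substitution.
    $text =~ s/[\x00-\x09\x11\x12\x14-\x1F]/_/g;

    # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in windows-1252,
    # but other codepages may map them to real characters.
    if (lc($encoding) eq 'windows-1252') {
        $text =~ s/[\x81\x8D\x8F\x90\x9D]/_/g;
    }

    return $text;
}

# Usage (hypothetical): $gSysOut = sanitize_output($gSysOut, $encoding);
```

This keeps today's behavior for windows-1252 and leaves the high bytes
alone for anything else. It does not address the filename question.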

toby