Nightly 20070202 and new encoding switch

Dirk vss2svn at nogga.de
Mon Feb 5 11:45:10 EST 2007


Hi Jonathan,

> Please note that XML discourages or forbids some Unicode codepoints,
> not bytes in specific codepages. Specifically, windows-1252 does not
> map any byte to a codepoint in the range [0x80-0x9F].
> (see http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx)
> For example, 0x80 in windows-1252 maps to Unicode 0x20AC (Euro sign).
>   

This is interesting to read. I was somewhat new to XML (and I'm still 
not an expert) when I researched this. I expected the behavior that you 
state above. I made an XML file with encoding="windows-1252" and entered 
a few of the problematic bytes/characters (in the windows-1252 
codepage). I expected the file to be valid XML, since all bytes are 
allowed and defined in the encoding I used. To verify this, I opened the 
file in XMLSpy, and the tool complained about invalid characters, 
regardless of whether I entered the character directly or in its XML 
escaped form. I therefore concluded that the restriction applied to 
"problematic bytes" and not to "problematic codepoints".
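
As an illustration of the bytes-versus-codepoints distinction, here is a
minimal Python sketch. Python and its built-in cp1252 codec are
assumptions of this example, and the helper names are hypothetical; the
validity check mirrors the Char production of XML 1.0. It shows that byte
0x80 decodes to U+20AC, and that none of the windows-1252 bytes in
0x80-0x9F that the codec defines decode to a codepoint in that same range.
A character reference such as &#x80;, by contrast, names codepoint U+0080
itself, whatever encoding the file declares.

```python
# Sketch: keep "bytes in a codepage" apart from "Unicode codepoints".
# All helper names are hypothetical and exist only for this illustration.

def decode_cp1252_byte(value):
    """Return the codepoint one windows-1252 byte decodes to, or None
    if Python's cp1252 codec leaves that byte undefined."""
    try:
        return ord(bytes([value]).decode("cp1252"))
    except UnicodeDecodeError:
        return None

def is_xml10_char(cp):
    """True if cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def is_c1_control(cp):
    """True if cp lies in the C1 control range U+0080..U+009F."""
    return 0x80 <= cp <= 0x9F

# Byte 0x80 in windows-1252 is the Euro sign, codepoint U+20AC.
euro = decode_cp1252_byte(0x80)
print(hex(euro))  # 0x20ac

# Decode every byte in 0x80..0x9F that the codec defines.
mapped = [decode_cp1252_byte(b) for b in range(0x80, 0xA0)]
mapped = [cp for cp in mapped if cp is not None]

# None of the resulting codepoints falls back into 0x80..0x9F.
print(any(is_c1_control(cp) for cp in mapped))  # False
```

Whether a particular tool such as XMLSpy follows exactly these rules is
not something this sketch can show.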

Perhaps I did something else completely wrong at the time.

Thanks for the info and the clarification.

Dirk


