Fwd: Another migration tool

Toby Johnson toby at etjohnson.us
Fri Dec 1 02:31:15 EST 2006


Kirit Sælensminde wrote:
> Is there more information available in the actual SourceSafe database 
> than is exposed through SS.EXE? If so then it may be that replacing 
> SS.EXE with a database reader could simplify some of the heuristics 
> used for ordering etc. in the tool. It would only be worthwhile though 
> if my tool has better modeling of how the SourceSafe repository 
> changed over time than the existing vss2svn tool does. How is this 
> handled?

Reading the database directly, as you can probably imagine, solves many 
of the problems of reading ss.exe (not the least of which is how to 
parse the output, which as you said can be tricky) but also raises new 
ones. Of course we -- and by "we", I mean Dirk :) -- had to 
reverse-engineer the database format through trial-and-error, and there 
are likely still some places where we're doing it wrong.

However, this approach also exposes some data that is simply impossible 
to retrieve using ss.exe or the OLE API, such as recovering child items 
from a deleted project, or correctly recovering the history of a renamed 
item, especially if different items of the same name existed in the 
repository at multiple points in time.

Unfortunately, the bottom line is that the VSS database structure is 
rather cumbersome, incomplete, and fragile, and regardless of how the 
data is retrieved there's a good chance some information is lost. For 
example, there is no sort of auto-incremented counter in any of the 
database files to give even the correct order of actions (although the 
ss.exe output gives the illusion of ordered version numbers, these are 
derived at runtime and aren't actually stored anywhere). So this means 
we must rely entirely on timestamps, and since VSS is a file-based 
system that has only the system clocks of the various client machines 
that connect (sometimes even in different time zones!), this information 
is very unreliable -- especially, as Dirk mentioned, when an 
archive/restore cycle is performed, because then the timestamps are 
overwritten with the time of the restore, and not the time of the 
original commit!!
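
To illustrate why timestamp-only ordering is fragile, here is a minimal 
Python sketch. It is not the vss2svn code; the Action record and its 
field names are hypothetical, invented only for this example. It sorts 
actions by recorded time and shows how one client with a slow clock puts 
an edit ahead of the add that really preceded it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical record layout, for illustration only.
    item: str
    kind: str
    timestamp: int  # seconds, as reported by the client's own clock


def order_by_timestamp(actions):
    # sorted() is stable, so actions with equal timestamps keep the
    # order in which they were read.
    return sorted(actions, key=lambda a: a.timestamp)


actions = [
    Action("foo.txt", "add", 1000),
    # Really happened after the add, but this client's clock ran a
    # minute slow, so the recorded time is earlier.
    Action("foo.txt", "edit", 940),
    Action("bar.txt", "add", 1000),
]

ordered = order_by_timestamp(actions)
for a in ordered:
    print(a.timestamp, a.kind, a.item)
```

With no stored sequence number to fall back on, nothing in the data 
itself reveals that the first line of that output is out of order.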

>  
>
>     Since we worked also very hard on getting things "right" during the
>     conversion, there are a few concepts that are not easily mapped
>     between the two tools. Esp. the archive and restore cycles are the
>     most problematic ones. Have you solved this problem domain and how
>     did you solve it?
>
>
> I'm not 100% sure what you mean here. SourceSafe has no concept of 
> transactions - each file submission is handled separately, so the 
> migration doesn't attempt to guess where transactions might be valid. 
> In practice each file version that is sent to Subversion is a separate 
> transaction (revision number).

Dirk was referring here to the act of using the VSS "Archive" command 
followed by a later "Restore"; as I mentioned above, this really screws 
with the timestamps. However, since you mention transactions, I should 
point out that we try to deduce atomic transactions in VSS by assuming 
that if consecutive VSS commits have the same author and comment, they 
are part of the same logical transaction, and are recreated in 
Subversion that way. We keep track of any files that are modified in a 
given transaction, and "commit" that transaction whenever the same file 
is about to be modified twice (there are also other cases where we 
always immediately commit, such as after a rename).
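
The grouping heuristic described above can be sketched roughly as 
follows. This is an illustrative Python sketch, not the actual vss2svn 
implementation; the Commit record, its field names, and the "rename" 
action label are assumptions made for the example, and the input is 
assumed to be already ordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    # Hypothetical record layout, for illustration only.
    author: str
    comment: str
    path: str
    action: str = "edit"


def group_into_transactions(commits):
    """Group consecutive commits into inferred transactions."""
    transactions = []
    current = []
    for c in commits:
        if current:
            head = current[0]
            same_meta = (c.author, c.comment) == (head.author, head.comment)
            already_touched = any(p.path == c.path for p in current)
            # Close the open transaction when the author or comment
            # changes, or when a file is about to be modified twice.
            if not same_meta or already_touched:
                transactions.append(current)
                current = []
        current.append(c)
        # Some actions, such as a rename, are committed immediately.
        if c.action == "rename":
            transactions.append(current)
            current = []
    if current:
        transactions.append(current)
    return transactions


history = [
    Commit("alice", "fix bug", "a.txt"),
    Commit("alice", "fix bug", "b.txt"),
    Commit("alice", "fix bug", "a.txt"),
    Commit("bob", "tidy up", "c.txt", action="rename"),
    Commit("bob", "tidy up", "d.txt"),
]
groups = group_into_transactions(history)
for g in groups:
    print([c.path for c in g])
```

Here the second change to a.txt forces the first transaction to close, 
and the rename is committed on its own even though the next commit has 
the same author and comment.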

>
> Better handling of shared files is the main thing that the tool is 
> able to handle. If you have a simple situation where a file is 
> developed and then shared to each location it is used then this tool 
> will handle that much better than other tools I've seen, i.e. it will 
> not put multiple versions of that file into Subversion until after the 
> share occurs.
>
> What does vss2svn do in this situation? I've been thinking of putting 
> together a single page with all of the tools I can find with a short 
> description of what they actually import in terms of the SourceSafe 
> history into Subversion.

I believe we are doing the same thing here; specifically, when an item 
was shared in VSS we treat that as a Subversion "cheap copy". We keep 
track of all shares during the migration, and after a share occurs, any 
commit made to one of the logical locations that point to the same 
physical file is propagated to every such location in Subversion. So 
when foo.txt is shared to bar.txt, that is treated as an 
"svn copy" action. Then if a commit is made to foo.txt, that change will 
be made to both foo.txt and bar.txt in the same transaction.
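
As a rough illustration of that bookkeeping, here is a small Python 
sketch. It is not the vss2svn code: the ShareTracker class, its method 
names, the "AAAA" physical identifier, and the tuple-based operations it 
returns are all hypothetical. It only shows the idea of mapping several 
logical paths to one physical file and fanning a commit out to all of 
them.

```python
class ShareTracker:
    """Tracks which logical paths point at the same physical file."""

    def __init__(self):
        self._physical_of = {}  # logical path -> physical file id
        self._paths_of = {}     # physical file id -> list of logical paths

    def add(self, path, physical):
        # A brand-new file with a single logical location.
        self._physical_of[path] = physical
        self._paths_of.setdefault(physical, []).append(path)
        return [("add", path)]

    def share(self, src, dst):
        # A share becomes a copy, and dst now aliases src's physical file.
        physical = self._physical_of[src]
        self._physical_of[dst] = physical
        self._paths_of[physical].append(dst)
        return [("copy", src, dst)]

    def commit(self, path):
        # A commit to any one location is applied to every location that
        # points to the same physical file, all in one transaction.
        physical = self._physical_of[path]
        return [("modify", p) for p in self._paths_of[physical]]


tracker = ShareTracker()
tracker.add("foo.txt", "AAAA")
copy_ops = tracker.share("foo.txt", "bar.txt")
commit_ops = tracker.commit("foo.txt")
print(copy_ops)
print(commit_ops)
```

The returned lists stand in for the operations that would make up a 
single Subversion revision.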

Unfortunately, as you can imagine, all of this is rather complex, and 
the learning curve for just getting familiar with the code is very 
steep. Couple that with the fact that most people will only use such a 
tool once, and you can see that it is very difficult to continue innovation 
of such a project! I doubt I will ever need to perform another VSS 
migration (I hope to live the rest of my life without ever actually 
using the tool for real source control again :) so the "scratch your 
itch" motivation of most open source projects quickly diminishes.

toby


