[Svnmerge] request for review: #2863 - analyze_source_revs fix
Giovanni Bajo
rasky at develer.com
Sun Aug 5 16:13:35 PDT 2007
On Sat, 2007-08-04 at 12:43 -0500, Dustin J. Mitchell wrote:
> > > This is a tiny patch, but fixes incorrect behavior when merging between
> > > two repositories -- analyze_source_revs previously looked for the head
> > > revision of the *target* when trying to determine the highest possible
> > > revision, but it is responsible for analyzing revisions in the *source*.
> > >
> > > Again, any comments and/or review will be appreciated.
> > >
> > > Patch is attached and at
> > > http://subversion.tigris.org/nonav/issues/showattachment.cgi/696/get_latest_rev_of_source.patch
> >
> > I think the idea was to reuse the previously-built cache of svn info
> > invocations. I understand that this is a correctness issue for people
> > doing cross-repository merges (which weren't supported when svnmerge
> > began), but I would love it if we could avoid slowing down people
> > merging within the same repository.
> >
> > Does this patch cause an additional remote svn info to be issued in
> > the case where there is a single repository?
>
> I think it does -- but the existing code is clearly incorrect.
It is perfectly correct if you run merges within a single repository,
which is the only "documented" way to do merges. I appreciate your
efforts to fully support merges across multiple repositories, but I am a
little concerned if that slows down operations within a single
repository (which I am sure is by far the most common kind of merge
that svnmerge.py is used for).
> Is there
> another way to do this? I think using get_svninfo() on them to compare
> UUIDs or repos-roots would cause an extra fetch anyway.
Not necessarily: svn info on a local path does not fetch anything.
Pardon me, I haven't looked at the code in a long time, but *on paper*
I can't see why a remote fetch should be necessary here.
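For instance (a rough sketch with illustrative names, not svnmerge.py's
actual helpers): running svn info against a working-copy path is enough
to learn the repository root and UUID without opening a connection:

    import subprocess

    def local_svn_info(wc_path):
        # "svn info" on a working-copy path reads local metadata only;
        # no network connection is opened.
        out = subprocess.check_output(["svn", "info", wc_path]).decode()
        info = {}
        for line in out.splitlines():
            if ": " in line:
                key, value = line.split(": ", 1)
                info[key] = value
        return info

    def same_repository(wc_path, source_repo_root):
        # If the source's repository root is already known (e.g. cached
        # from an earlier svn info), this comparison is entirely local.
        return local_svn_info(wc_path).get("Repository Root") == source_repo_root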
In the original design, it made no difference whether get_latest_rev
was run against source_url or branch_url, since they both referred to
the same repository. I explicitly decided to use branch_url because I
was sure a cached svn info was available at that point.
get_latest_rev could use an internal cache, separate from the svn info
cache, and indexed by the repo root. The UUID would be a more correct
index, but you need a remote connection to extract the UUID from an
arbitrary URL... so using the repo root as the index is pragmatically
better, because it will allow get_latest_rev to reuse cached
information even after your patch.
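Something along these lines (a minimal sketch under assumed names;
get_repo_root and fetch_head_revision are placeholders, and the real
get_latest_rev in svnmerge.py may be shaped differently):

    # Hypothetical module-level cache, keyed by repository root.
    _latest_rev_cache = {}

    def get_latest_rev(url):
        root = get_repo_root(url)  # placeholder; cheap for local paths
        if root not in _latest_rev_cache:
            # Only the first query per repository goes over the wire.
            _latest_rev_cache[root] = fetch_head_revision(url)  # placeholder
        return _latest_rev_cache[root]

With this, a single-repository merge still performs at most one remote
lookup, while cross-repository merges get a correct answer per source.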
Does it make sense?
> analyze_revs has some unused logic to handle the case of an unknown
> end_rev. Could we make that an option for speed-conscious users?
I don't understand what you are suggesting here... could you elaborate,
please?
> I think, rather than block fixes because of potentially slower operation
> (slow and right beats fast and wrong in my book..), we should look at
> other ways of increasing speed.
Again, I don't really see this as a fix; rather, it's a good step in
the direction of supporting multiple repositories, which is not
something that was taken into account from the beginning, and thus it
doesn't really qualify as a fix.
> One I've been thinking about is caching
> immutable information such as RevisionLog information -- I find that the
> biggest time sink for my merges (which are over svn+ssh://, ugh) comes
> from downloading the same logs and revision diffs on every merge.
Yes, but I wouldn't put that within svnmerge.py. Caching part of the
remote repository into some local space is probably a project of its
own; I have played with such a toy before. I designed a tool called
"lsvn" which would basically have the same UI as "svn" (forwarding
every command), but cache many things locally (not file contents/diffs,
but logs and revprops). After that, you can simply tell svnmerge.py to
run "lsvn" instead of "svn" and be done with it. In fact, I guess many
users of "svn" would be happy with "lsvn" independently of svnmerge.py.
--
Giovanni Bajo