Copyright © 1998 by Brad Appleton, Stephen Berczuk,
Ralph Cabrera, and Robert Orenstein.
Permission is granted to copy for the PLoP '98 conference.
Abstract:
Most software version control systems provide mechanisms for branching
into multiple lines of development and merging source code from one
development line into another. However, the techniques, policies and
guidelines for using these mechanisms are often misapplied or not fully
understood. This is unfortunate, since the use or misuse of branching
and merging can make or break a parallel software development project.
Streamed Lines is a pattern language for organizing related
lines of development into appropriately diverging and converging
streams of source code changes.
Keywords:
Branching,
Parallel Development,
Patterns,
Software Configuration Management,
Version Control
Read this section by following the above hyperlink if you want an
introduction to SCM patterns. You can read about
our motivation and progress in developing an SCM pattern language,
and view a
diagram showing the relationships between SCM patterns.
Skip ahead to the next section if you want to stay focused on parallel
development and branching.
Any software project beyond a certain team and system size will invariably
require at least some efforts to be conducted in parallel. Large
projects require many roles to be filled: developers, architects, build
managers, quality assurance personnel, and other participants all make
contributions. Multiple releases must be maintained, and many platforms
may be supported. It is often claimed that parallel development
will boost team productivity and coordination, but these are not the
only reasons for developing in parallel.
As [Perry98] points out, parallel
development is inevitable in projects with more than one developer. The
question is not "should we conduct a parallel development effort", but
"how should a parallel development effort best be conducted?"
[Perry98] suggests that many of the
basic parallel development problems which arise can be traced back to
the essential problems of: system evolution, scale, multiple
dimensionality, and knowledge distribution.
-
Evolution compounds the problem of parallel development because
we not only have parallel development within each release, but
among releases as well.
-
Scale compounds the problem by increasing the degree of parallel
development and hence increasing both the interactions and
interdependencies among developers.
-
Multiple dimensions of system organization compound the problem
by preventing tidy separations of development into independent
work units.
-
Distribution of knowledge compounds the problem by decreasing the
degree of awareness in that dimension of knowledge that is
distributed.
Thus, a fundamental and important problem in building and evolving
complex large scale software systems is how to manage the phenomena
of parallel changes. How do we support the people doing these parallel
changes by organizational structures, by project management, by process,
and by technology?
If parallel development is a fact of life for any large software
project, then how can developers making changes to the system in
parallel be supported by project management, organizational structures,
and technology? Streamed Lines is a pattern language that
attempts to provide at least a partial answer to this question by
presenting branching and merging patterns for decomposing a project's
workflow into separate lines of development, and then later recomposing
these lines back into the main workstream. The patterns describe
recurring solutions for deciding how and when development
paths should diverge (branch) and converge (merge).
Streamed Lines does not describe a complete solution to all the
problems encountered during parallel development; it merely attempts to
reveal the ways in which branches can be used to help create an effective
parallel development solution. What do we even mean by "effective
parallel development"? [Atria95]
defines effective parallel development as:
... the ability for a software team to undertake multiple, related
development activities -- designing, coding, building, merging,
releasing, porting, testing, bug-fixing, documenting, etc. -- at the
same time, often for multiple releases that use a common software
base, with accuracy and control.
Note that this definition extends to include teams that span multiple
locations, an increasingly common situation for many organizations.
It encompasses all elements of a software system and all phases of the
development lifecycle. Inherent within the definition is the concept
of integration, in which parallel development activities and projects
merge back into the common software base. Also, the definition of
effective parallel development includes process control -- the
policies and "rules of the road" that help assure a controlled,
accurate development environment.
So how can branching help us achieve effective parallel development?
Branches may be used to isolate changes, and to insulate developers from
others' changes that have yet to be integrated, built, tested,
and baselined. Branches may also be used to organize the decomposition
of work into change-tasks and work-streams and to control the integration
of changes from tasks and streams into other streams. When used
appropriately in this manner, branching helps address problems of
communication, visibility, project planning and tracking, and ultimately
risk management.
The following is a
brief introduction to the concepts
of file checkin/checkout, and to branching and merging. If you are
already familiar with these concepts you may safely skip this section.
Most VC tools supporting branches do so at the granularity of a lone
file or element. The revisions and branches for each file form a
version tree which depicts the evolution of a single file.
This is called file-oriented branching. Branches are used
and organized and viewed in the context of a single file. While there
may be a loose or coincidental similarity between the version trees
of different files, file-oriented branching focuses primarily on
physical modifications to individual files as the unit of change
and change-flow.
But branching is most conceptually powerful when viewed from a
project-wide or system-wide perspective; the resultant version tree
reflects the evolution of an entire project or system. We call this
project-oriented branching. With project-oriented branching,
branches are used and organized and viewed in the context of an entire
project, product, or system. Project-oriented branching imposes a more
or less uniform structure on the version trees for all the files in the
system. Instead of emphasizing modifications to individual files,
project-oriented branching focuses primarily on the flow of logical
changes across the entire system. Logical changes flow through and
between streams of work in which product and component versions are
integrated, built, baselined, and released.
There are essentially five different forms of branching, each of which
may be represented using the file-based branching of most VC tools:
- Physical:
-
Branching of the system's physical configuration - branches are created
for files, components, and subsystems
- Functional:
-
Branching of the system's functional configuration - branches are created
for features, logical changes (bug-fixes and enhancements), and other
significant units of deliverable functionality (e.g., patches,
releases, and products)
- Environmental:
-
Branching of the system's operating environment - branches are created for
various aspects of the build and run-time platforms (e.g. compilers,
windowing systems, libraries, hardware, operating systems, etc.) and/or
for the entire platform
- Organizational:
-
Branching of the team's work efforts - branches are created for
activities/tasks, subprojects, roles, and groups
- Procedural:
-
Branching of the team's work behaviors - branches are created to
support various policies, processes, and states
Specific instances of each type of branching will be discussed in many
of the patterns which follow. It should be mentioned that there is
frequent overlap between the above types of branching. For example,
a branch created for a particular bug-fix may be regarded as both a
bug-fix branch, and as an activity-branch. In this case, the set of
changes which constitute the fix are performed as a single task. But
a branch created for an integration effort won't always correspond
to a single fix or feature. It is quite common, however, for a branch
to correspond to more than one type of branching. The important thing to
remember is which type is perceived as the primary intent of the branch.
It should also be mentioned that using branches for more than 2-3 of
these dimensions at the same time is discouraged because it
can necessitate a combinatorial explosion of branches spawned from the
same origination point (which is quite unwieldy).
[Conradi96] discusses
this inherent weakness of hierarchical branching and version-trees:
a hierarchical organization is often convenient, but it quickly breaks
down when variance occurs simultaneously along multiple dimensions.
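To make the combinatorial explosion concrete, the following Python sketch
counts the branches needed when every combination of values along several
dimensions gets its own branch. The dimension values and the function name
branches_needed are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical values along three branching dimensions (illustrative only).
dimensions = {
    "functional": ["rel1", "rel2", "rel3"],
    "environmental": ["unix", "windows"],
    "organizational": ["team-a", "team-b", "team-c", "team-d"],
}


def branches_needed(dims: dict) -> int:
    """Count branches if every combination of values gets its own branch."""
    return len(list(product(*dims.values())))


print(branches_needed(dimensions))  # 3 * 2 * 4 = 24
```

Each additional dimension multiplies the count rather than adding to it,
which is why even modest variation along a fourth dimension quickly becomes
unwieldy.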
We use various terms and notation throughout this paper. Where possible,
we have tried to use names and concepts that frequently recur in practice.
In general, when a branch corresponds to a line of development
containing (or intended for) multiple sets of logical changes, we refer
to the branch as a codeline, even though it need not be limited
to source-code artifacts. Often a branch is used only for
a single logical change (also called a change-task). If a
branch is used for a single change-task and is then immediately merged
back to its parent, we call it an activity-branch, or simply a
branch or subbranch. In theory, the terms "branch"
and "codeline" may be used as synonyms. When describing branching
patterns, however, we try to be consistent in using the term "codeline"
to refer to a longer-lived workstream, and using the term "branch" to
mean a single activity-branch or a subbranch of a codeline.
A version may refer to a revision of a single file, or to a
set of file revisions that make up the entire project (or one of its
components/subsystems).
A change-package is the group of revisions that
were modified or created as part of a change-task.
A baselevel is a named configuration of the project that is
self-consistent enough to serve as a stable base for
subsequent development efforts.
A baseline is a baselevel that is suitable for a formal
internal or external release.
Merging is the process of integrating the revisions in a
change-package into the contents of a codeline. Sometimes, a
change in one codeline needs to be incorporated into another codeline.
For example, a bug-fix in a maintenance codeline may also be needed in
the corresponding development codeline for the next major release.
We refer to this as change propagation, or simply
propagation. When the entire contents of a codeline are merged
into another codeline, or into a developer's workspace, we call this
particular kind of merging syncing with the codeline, or just
syncing.
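The relationships among these terms can be sketched as a small data model.
This is only one possible reading of the definitions above: the class names
ChangePackage and Codeline, the method names, and the sample revision
identifier are our own assumptions and do not correspond to any particular
VC tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangePackage:
    task: str              # the change-task that produced these revisions
    revisions: frozenset   # hypothetical identifiers, e.g. {"parser.c@5"}


@dataclass
class Codeline:
    name: str
    packages: list = field(default_factory=list)

    def merge(self, package: ChangePackage) -> None:
        """Integrate one change-package into this codeline."""
        if package not in self.packages:
            self.packages.append(package)

    def sync_from(self, other: "Codeline") -> None:
        """Merge the entire contents of another codeline (syncing)."""
        for package in other.packages:
            self.merge(package)


maint = Codeline("/main/rel1-maint")
dev = Codeline("/main")
fix = ChangePackage("fix232", frozenset({"parser.c@5"}))

maint.merge(fix)   # the fix is made on the maintenance codeline
dev.merge(fix)     # propagation: the same fix is also needed in development

workspace = Codeline("workspace")
workspace.sync_from(maint)   # syncing: take everything from the codeline
```

In this sketch, propagation is simply a merge of the same change-package
into a second codeline, while syncing merges every change-package from the
source codeline.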
Since revision names like "1.4.1.2", used by VC tools like RCS
(and many others), aren't particularly mnemonic, we use more symbolic
branch names consisting of letters and numbers (and some other characters).
We also use the '/' character to indicate the beginning
of a branch name, so that versions can be uniquely determined with an
identifier such as "/main/rel1-maint/fix232/4". Hence a fully
specified version name resembles a directory path in Unix or DOS. A
few VC tools (most notably ClearCase and Perforce) use the same or
similar conventions for version naming.
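As a rough illustration of this naming convention, the following Python
sketch splits a fully specified version name into its branch path and its
revision number. The function name parse_version_name and the assumption
that the final component is always a numeric revision are ours; real VC
tools have their own syntax rules.

```python
from typing import NamedTuple


class VersionName(NamedTuple):
    branches: tuple  # branch names from the root, e.g. ("main", "rel1-maint")
    revision: int    # revision number on the last branch


def parse_version_name(name: str) -> VersionName:
    """Split "/main/rel1-maint/fix232/4" into branch path and revision.

    Assumes (for illustration) that the last component is a number.
    """
    if not name.startswith("/"):
        raise ValueError("a fully specified version name begins with '/'")
    parts = name.strip("/").split("/")
    if len(parts) < 2 or not parts[-1].isdigit():
        raise ValueError("expected /branch[/subbranch...]/revision")
    return VersionName(tuple(parts[:-1]), int(parts[-1]))


v = parse_version_name("/main/rel1-maint/fix232/4")
print(v.branches, v.revision)
```

Read this way, "/main/rel1-maint/fix232/4" names revision 4 on the branch
fix232, which is a subbranch of rel1-maint, which in turn branches from main.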
When drawing codelines, branches, change-tasks, and their relationships,
we use a tree structure with branch-names inside boxes and version-names
inside circles (a "box" or "circle" with no name inside is considered
"anonymous"). Branches and codelines are indicated with solid lines,
whereas merges and propagations are indicated with dashed lines.
These version-tree diagrams are reminiscent of interaction
sequence diagrams in the UML; but we draw the timeline from left to
right instead of from top to bottom (to conserve space).
Branch names always appear at the beginning of the timeline for the
branch, and are preceded by a '/'. A "box" appearing in
the middle of a timeline for a branch corresponds to a change-task
that was performed "on-line" (directly on the codeline, instead of on
its own branch), and there is no leading slash in front of the
name for such a change-task. The length of a change-task "box" may be
used to indicate its duration relative to other change-tasks.
Parallel development raises several important issues and concerns for the
success of the development projects. These risk-factors are briefly
identified here, and are
described in detail in a separate section.
The patterns in Streamed Lines are divided into categories of
branching policy, branch creation, and branching structures. These
categories loosely correspond to the
[GoF] pattern categories
of: behavioral, creational, and structural (respectively). In addition,
many of the patterns refer to some basic types of branches and
codelines. We define all of these categories below:
-
Basic Branch/Line Elements
-
Some basic varieties of branches and codelines that serve as lower-level
building blocks for various patterns; these are not necessarily patterns
per se, but they nevertheless participate in one or more patterns in the
language
-
Branching Policy Patterns
-
Patterns describing behavioral policies to establish or preserve the
conceptual or physical characteristics of a codeline
-
Branch Creation Patterns
-
Patterns describing when to create a new kind of branch or codeline
-
Branch Structuring Patterns
-
Patterns describing the collaborations between two or more related branches
in a branching structure
The participants in Streamed Lines are distributed among these
four categories as follows:
Basic Branch/Line Elements | Branching Policy Patterns
Branch Creation Patterns | Branch Structuring Patterns
The full pattern descriptions appear in
Appendix A.
We have presented a series of patterns for managing branching in
parallel development projects. Certain subsets of these patterns
represent conflicting styles and may not mesh well together for the
same project; the patterns selected for a particular project are
dependent on the needs of the organization and the project itself. In
this section, we provide some guidelines on which patterns to select
for your project. Which patterns you use will largely depend upon
selected tradeoffs between safety and productivity (or "liveness").
More conservative strategies tend to trade off productivity for safety,
while more optimistic strategies may do the opposite.
Generally speaking, using more branches for greater isolation reduces
safety risks, but at the expense of more merging and integration
effort. More merging and integration also requires more communication and
greater visibility of changes and baselines. Using fewer branches reduces
merging and integration efforts, but at the expense of less isolation and
less safety. Merging sooner rather than later flushes out risks early on
while there is more time to address them, but requires continual efforts
to regularly monitor and address such risks.
In short, you will have to confront and manage risks concerning safety,
productivity, and communication no matter what you do. Time and effort
must be invested to manage these risks. The three basic ways to do this
are to pay now, to pay later, or to pay-as-you-go.
The most productive overall strategies attempt to invest a reasonably
small amount up front, and then pay the rest as they go. The larger and
more critical and risk-averse your project is, the more you will need
to invest in "up front" planning and policies, while still employing
a pay-as-you-go strategy throughout the lifetime of the project (which
includes regular monitoring and feedback to make incremental corrections).
Such an approach essentially tries to offload back-end costs (of deferred
or unmanaged risks) by handling the most critical risks "up front" as a
minimal initial investment, and to amortize the remaining costs using a
"just-in-time" approach.
Here then are the important strategic decisions to make while planning
the branching and merging road-map for your parallel development efforts.
Be aware that performing less up-front planning requires more attentive
and visible monitoring and feedback; while more up-front planning
often results in more things that need to be corrected later on. These
differences should decrease, and eventually converge, as the project
evolves and its parallel development policies and procedures become more
stable and mature.
Before making any important strategic decisions, probably the first
and most important thing to do is determine the amount and kind of
risk your project can tolerate within its environment. Look at all of
the forces of branching and parallel development
described earlier and try to get a good picture of how and where each
of them applies to your project and its development environment. Which
risks apply to you? Which ones seem important and which ones seem
secondary?
Typically, the most fundamentally important tradeoff to consider will be
that of safety versus liveness.
To get an idea of how much safety risk you can tolerate, ask yourself how
much time and effort is required to back-out an unwanted or detrimental
change from one of your codelines and builds. How many people does it
impact and how soon (and how critically) are they impacted? How much
rework and rebuilding is required and how much time and staff are required
to perform that rework? How much additional communication overhead does
the rework impose?
If the answer to these questions leads you to believe it would be
a very significant, or even monumental undertaking to back-out an
unwanted change, then your project probably has a very low threshold
for safety risks. If on the other hand it seems that only a select few
people would be affected and it wouldn't take very much time to correct
the problem, then you may have a very high threshold for safety risks.
Don't forget to consider how your risk-threshold will change and evolve
as the project evolves and matures! It is exceedingly common for a
project to tolerate more risk (and sometimes have greater time-to-market
pressures) before it has been deployed to a broad base of
customers than after it has been deployed and several releases
are being supported and maintained. Also, if the size of the team or of
the system is expected to grow considerably, it may make more sense to
take some preventive measures early on, before it becomes too difficult
to impose non-trivial changes in the team's process and behavior. At
the very least, you will need to plan to migrate from a process that
tolerates more risk to a process that eventually tolerates less risk.
The first strategic decision to make is whether to adopt the strategy of
Early Branching or
Deferred Branching.
These are the
two different "branching styles" underlying the majority of the branching
patterns in Streamed Lines.
Early Branching
is better suited to larger or more formal efforts
that require a high degree of fine-grained isolation and control; you
assume fewer safety risks but pay the price of additional merging and
propagation.
Deferred Branching
is good for projects that can afford to risk losing a bit of safety in
order to gain more productivity; less branching and integration means
less overhead, but also less isolation and verification.
The choice of early or deferred branching also affects the visibility
with which teamwork and workflow can be communicated from a file's
version tree. Deferred branching may hide the intent of a change or
set of changes to go into specific releases. Early branching makes this
intent clear early on, but requires more effort to follow through with
that intent and propagate the change to more codelines than would be
required if you had waited longer before branching.
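One simplified way to see this cost: any change needed on both lines that
lands after the branch already exists must be propagated to it, whereas a
change that lands before the branch is created is simply inherited. The
Python sketch below counts such propagations. The function name
propagations_needed and the day numbers are hypothetical, and the model
ignores everything except timing.

```python
def propagations_needed(change_days, branch_day):
    """Count changes made once the branch exists; each must be propagated."""
    return sum(1 for day in change_days if day >= branch_day)


# Hypothetical days on which changes needed by both lines are completed.
changes = [2, 5, 9, 14, 20]

early = propagations_needed(changes, branch_day=3)      # branch created early
deferred = propagations_needed(changes, branch_day=15)  # branch created late

print(early)     # 4
print(deferred)  # 1
```

The early branch makes the destination of each change explicit from day 3
onward, but pays for that visibility with more propagation work.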
The branching style that you decide is best suited for your environment
will dictate a complementary set of patterns and pattern variants:
Early Branching Style | Deferred Branching Style
Regardless of the branching style selected,
Codeline Policy and
Codeline Ownership
should be used for every branch and codeline created. These two practices
need to be employed in a way that is readily visible to the team, and
which can be easily and quickly communicated in as short a time-span
as possible.
Patterns like
Parallel Maintenance/Development and
Overlapping Releases
are typically the first branching structures many shops encounter. They
can be applied using either branching-style. It depends primarily upon
when you branch (early or late) and upon which effort goes on the branch
and which stays on the parent codeline.
Early branching tends to keep the release or major release as the
invariant for each codeline. So instead of splitting development and
maintenance across codelines, it keeps the same release on the same
codeline, regardless of whether or not it is development effort or
maintenance effort for the given release.
For deferred branching, the releasing/maintenance effort will always be
the one that branches off, allowing the latest and greatest development
to continue on the same line as before. This way of thinking may seem peculiar
to those accustomed to an early branching style that uses separate codelines
for each release; they may have difficulty understanding why it is coherent.
With deferred branching, it's not the release that remains invariant on
the branch, it's the recency of the effort on the branch: the latest
development efforts, or else the latest maintenance efforts.
Along with selecting a branching style, you will need to select
appropriate merging styles to match your branching preferences.
A higher tolerance for safety risks and minimal effort implies a
relaxed policy toward codelines, and requires fewer integration lines;
a lower tolerance for safety risks implies stricter codeline policies,
more codelines, and more integration effort.
Although the choice of merging style often follows from the chosen
branching style, a higher risk branching style does not necessarily
imply a higher risk merging style. In fact, you may wish to offset high
risk in one with low risk in the other. If you take more risks when
splitting things apart, you may want to take less risk when putting
things back together.
Remember that every time you add another line of integration, you are,
in effect, adding another level of indirection: you gain more
isolation and nicer conceptual organization but you spend more time
merging. It should be noted that a
Virtual Codeline
is somewhat merge-evasive and may be used to simulate just about any
kind of codeline. The merging patterns that are more suited to each
merging style are as follows:
Relaxed Merging Style | Restricted Merging Style
In either case, frequent incremental integration is always a good idea
(using
Merge Early and Often or one of its variants) but the
merging frequency and ownerships will differ between the two styles.
The relaxed style favors liveness and assumes higher risk by having
people merge and propagate their own changes across codelines. The more
restricted style favors safety and has more codelines, each with more
restricted access, and with codeline-owners performing most of the
merges.
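The difference in who performs merges under each style can be sketched as
a simple access check. The class name CodelinePolicy, its fields, and the
user names are hypothetical; an actual codeline policy would carry far
more information than a single flag.

```python
from dataclasses import dataclass


@dataclass
class CodelinePolicy:
    codeline: str
    owner: str
    restricted: bool  # True: only the codeline owner merges into it

    def may_merge(self, user: str) -> bool:
        """Relaxed codelines accept merges from anyone; restricted ones
        accept merges only from the codeline owner."""
        return (not self.restricted) or user == self.owner


relaxed = CodelinePolicy("/main/dev", owner="alice", restricted=False)
strict = CodelinePolicy("/main", owner="alice", restricted=True)

print(relaxed.may_merge("bob"))   # True
print(strict.may_merge("bob"))    # False
print(strict.may_merge("alice"))  # True
```

Pairing a relaxed high-activity codeline with a restricted parent, as in
this example, is one way to mix the two styles.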
Unlike the branching styles, the merging styles may be mixed and
matched to achieve a gradual progression from high-activity codelines
with relaxed policies to lower-activity codelines with restricted
policies. This can be accomplished with patterns such as
Docking Line,
Subproject Line,
Component Line and
Remote Line.
But with a more relaxed style, each of these kinds of codelines
will typically merge back to the development line while a more
restricted style is more likely to use it as one in a set of
Staged Integration Lines.
By choosing appropriate branching and merging styles, you have effectively
chosen risk management strategies for organizing and integrating work
activities (and even for visibly communicating the status of codelines
and baselines to a large extent). Now you are ready to create some
specific codelines. It is extremely rare for a single project to use
all of the branching patterns presented here. The majority of
parallel development projects will typically use the following "core
set" of branching patterns (or one of their variants):
Many parallel development efforts will require little more than the
above patterns, along with one of
MYOC,
Docking Line,
or Staged Integration Lines.
Other projects will have more
sophisticated needs. They may start out with the above, and be okay for
a while, but they will eventually need to progress to the next tier of
branching patterns, or their variants (often in the following order):
Once again, one or more of the following merging patterns will be used
with the above:
MYOC,
Docking Line, or
Staged Integration Lines.
Often, the project will take on more risk during early development and
then gradually tolerate less and less risk as it grows in team-size,
project size/complexity, or moves more and more into maintenance mode.
In addition to requiring more of the second-tier branching patterns
above, merging styles may need to become less forgiving and more
cautiously controlled:
The following patterns are usually for "special needs" only:
You may need them very rarely, or only for certain kinds of projects
and project teams. But when the project does require them, they often
have a very profound impact on the overall shape of the project-wide
version tree, and on the overall organization of parallel development
efforts. These patterns (along with
Change Propagation Queues)
should be used sparingly, and only as the need arises. This is
especially true of platform-lines since it is often better to handle
multi-platform issues with separate files and/or directories than with
separate branches.
As the project evolves, there will always be the need to periodically
revisit, refactor, and realign the branching/merging structures adopted
and their corresponding policies. You will also want to look at the overall
picture of the project-wide version tree and check to see if the tree
looks too wide, too unwieldy, or too disjointed. Prudent use of codeline
propagation and retirement into the
Mainline
will help guard against the tree becoming too wide.
The patterns
Subproject Line, and
Policy Branch
can help to correct a version tree that has become too complex
and unwieldy.
MYOC and
Docking Lines
can help remedy development that has become too isolated or disjoint.
The branching patterns in Streamed Lines don't cover every
possible contingency. Situations will arise where the correct pattern
or variant to use is not at all obvious, or may not even exist.
However, even in these cases, some of the recurring themes which
underlie many of the branching patterns presented here may still be
broadly applicable for your particular problem. These are as follows.
Just like variable names in a program, each branch should have a
meaningful name which communicates its purpose or its policy. Meaningful
names help to more clearly and visibly communicate intent and status,
particularly when the names appear in tool generated reports, queries,
and diagrams (especially version trees).
If your VC tool doesn't directly support named branches, then
floating labels (sometimes called sticky labels)
can be used to the same effect. See the pattern
Virtual Codeline.
Don't suspend all activities on a particular codeline when many of
those activities could continue unaffected on a separate branch,
without impacting the efforts on the original codeline. Productivity
need not be hindered this way.
See
Parallel Releasing/Development Lines for an example.
This in fact increases productivity while imposing very little additional
safety risk and only modest additional integration effort.
Frequent, incremental integration is one of the signposts of success,
and its absence is often a characteristic of failure. Current project
management methods tend to avoid strict waterfall models and embrace
the spiral-like models of iterative/incremental development and
evolutionary delivery. Incremental integration strategies, like
Merge Early and Often
and its variants, are a form of risk management that tries to flush
out risk earlier in the lifecycle when there is more time to respond to
it. The regularity of the rhythm between integrations is seen by
[Booch],
[McCarthy],
and [McConnell]
as a leading indicator of project health (like a "pulse" or a "heartbeat").
Not only does early and frequent integration flush out risk sooner and
in smaller "chunks," it also communicates changes between teammates.
Every time a developer integrates a new baseline into their workspace,
or a new change into the baseline, they learn something about what
has happened to the system and where it has changed. In this sense,
integration turns out to be a very real form of communication, albeit
an indirect one. For this reason, it is crucial that the presence of
new baselines and baselevels is clearly and visibly communicated to all
concerned, and that the completion of important changes that are ready
to be built/baselined is also clearly and visibly communicated.
So perhaps a corollary to "integrate early and often" would be "commit
changes visibly and clearly." This includes changes that have been
committed to be included into a particular baseline/codeline, as well as
baselines that are now ready to be sync-ed into developers' workspaces.
Often, the best way to resolve risks that arise from opposing forces
(or competing concerns) is to create a new branch for the competition.
Such incompatibilities may result from: access policies, dueling
ownerships, integration frequency, activity-load, activity-type,
and platform. Examples of this include:
Policy Branch,
Inside/Outside Lines,
Component Line,
Parallel Maintenance/Development,
and Platform Line.
Sometimes branching on incompatibility isn't enough. Divergence will
often require frequent convergence, or continuous mediation.
In this case, it is often necessary to add another level of indirection,
by adding another line of integration between the two opposing
forces or competing codelines. Examples are:
Subproject Line,
Docking Line,
Remote Development Line,
Staged Integration Lines,
and Mainline.
This will help reduce risk by isolating variation along the appropriate
dimension of work. While this does help to control and contain the amount
of variation to a locally manageable region, it does impose an additional
integration burden later on. (So does branching on incompatibility.)
The theory here is that the integration overhead at the end will be
minimized by the continual control that is more easily afforded by
isolating the change.
Avoid branching hierarchies that are extremely wide or dense!
(Think of "branch and bound.") Try for minimal reconciliation by
creating new branches only when the added benefit is worth the added
synchronization overhead. Use additional branches to provide greater
isolation between tasks and changes; and use integration-lines to add
additional verification and validation of merged changes.
But don't use branches to solve all your problems!
Many problems are best addressed by different means. For example,
numerous multi-platform issues are better solved by using extra files
and directories rather than platform-branches. Don't use branches as a
"hammer" to make every problem look like a nail, and don't "sow" a new
branch unless you can reap the benefits.
Preserve the conceptual integrity of the branch! When delegating
volatile aspects of high-impact variation to separate branches, keep each
aspect logically consistent within its own branch: keep codeline usage
consistent with its policy, and keep codeline policy consistent with its
purpose. Occasional "fine-tuning" and remedial actions are to be expected,
but avoid changes that violate the spirit of the codeline's intent.
Preserve the physical integrity of the branch! Don't merge
incomplete or inconsistent changes into the codeline; and don't leave
codelines in inconsistent states. When the configuration of a codeline
is inconsistent or incorrect it can adversely impact all users of the
codeline. Try to keep codelines reliably consistent, and consistently
reliable.
Choose optimistic or pessimistic branching policies and stick with
them! For a given project, strike a sensible balance of trade-offs
between safety (isolation, access control, code integrity, and risk
mitigation) and liveness (productivity, integration overhead, working
"on-line") and then apply them in a consistent manner. The balance may
need to be dynamically adjusted over time; but at any given time, the
policies should be consistent with one another.
You may recall that one of the recurring themes in the
[GoF] Design Patterns book
is: "Encapsulate the thing that varies." Branching doesn't
achieve encapsulation of information so much as it achieves isolation
of changes. So a recurring theme in most of these branching patterns is:
Isolate the thing that varies!
Each branch and codeline isolates one or more of the following dimensions
over a given time-period:
- Physical Structure - organization and distribution of:
  - System knowledge
  - Components and subsystems
  - Configuration elements (files and directories)
- Functional Evolution - organization and distribution of:
  - Change and change-flow
  - Delivery (releases and patches)
  - Functionality (requirements, features, fixes, and enhancements)
- Teamwork - organization and distribution of:
  - Interaction and communication (coordination and collaboration)
  - Policy and procedure (cooperation and control)
  - Workflow and activity-flow
  - Roles and responsibilities
- Environment and infrastructure (platform and resource variations)
- Reproducibility and traceability (identification and tracking)
Perhaps most importantly, the branching policies and patterns described
here do not remove the need for communication between project
team members. These patterns should facilitate communication, not
eliminate it! The goal of these patterns is to help isolate work,
not people. People working together on a project need to remain
socially connected and coordinated, and to maintain awareness of the
impact of their efforts downstream and throughout the entire lifecycle.
Jeopardize this and you jeopardize team synergy, and ultimately, team
success.
If you isolate people from their work, systemic disconnection may
result: developers lose touch with the effects of their own efforts on
the overall project. If you segregate people from each other according
to their work tasks, social isolation may occur: people lose touch with
one another and with the overall project team. The purpose of
parallelization is not to isolate people from people, or people from
their work, but to isolate work from other work. Conway's Law
(see [Cope95]) applies
just as much to the architecture of the project's version tree as it
does to the architecture of the system. Use this wisdom to your advantage
(and ignore it at your peril).
There are some
common traps and pitfalls to watch out for when using branching
for parallel development. Some of them are the result of naive
approaches which seem "right" at first glance, but which deeper
understanding reveals to be a "dead end." Others are the result of
inappropriately (or overzealously) using the various branching patterns
in the wrong context. The descriptions of many of these branching
"pitfalls" include some analysis of root cause and cure/prevention. But ultimately
it seems like all of them can be traced back to some combination of
poor planning, poor communication, or poor management.
[McKenney95] writes of the forces for
and against parallelizing a software program, breaking them down into:
Speedup, Contention, Overhead, Economics, Complexity, and a few others.
Most of these forces are equally applicable to the case of
concurrent/parallel software development. In fact, designing parallel
development strategies for concurrent software development bears more
than a striking resemblance to parallel programming strategies for
concurrent object systems. The latter deals with multiple collaborating
objects running in multiple threads of execution across multiple address
spaces in a parallel software program; the former deals with multiple
collaborating individuals working in multiple threads of development
across multiple workspaces in a parallel software development project.
As [Lea96] describes, some of the most
basic tradeoffs to be made when designing concurrent object systems are
those of safety ("The property that nothing bad ever happens")
and liveness ("The property that anything ever happens at all").
These tradeoffs are essentially the same for software development:
From either direction, the goal is to assure liveness across the
broadest possible set of contexts without sacrificing safety.
The need to apply such strategies across the broadest possible set of
contexts ties into their reusability across the project, and between
projects. Hence all the same issues and concerns mentioned by
[Lea96] regarding safety, liveness, and
reusability also arise during parallel development.
Branching is an optimistic concurrency control strategy for parallel
development. It tries to mitigate the risk associated with such
optimism by separating concurrent/parallel efforts into isolated paths of
development. Branching off into separate workstreams is fairly easy to do
with minimal interference, and gets rid of the need for development tasks
to "block" waiting for checkout-locks to be released. Rejoining the two
paths after they've been separated is done via integration (merging). The
inherent risk in resynchronization is mitigated by allowing it to happen
in a well insulated context at a more convenient time.
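The optimistic strategy described above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not the behavior of
any particular VC tool: the Codeline and Branch classes, their method names,
and the file names are all hypothetical, revisions are plain dictionaries of
file contents, and deletions are ignored. Branching never blocks; any
contention is discovered later, at integration time.

```python
class Codeline:
    """Toy codeline (hypothetical): a list of revisions, each a dict of path -> content."""

    def __init__(self, files):
        self.revisions = [dict(files)]

    @property
    def head(self):
        return len(self.revisions) - 1

    def branch(self):
        # Optimistic: no lock is taken; we only remember the base revision.
        return Branch(self, self.head, dict(self.revisions[self.head]))


class Branch:
    """Isolated path of development split off from a parent codeline."""

    def __init__(self, parent, base, files):
        self.parent, self.base, self.files = parent, base, files

    def integrate(self):
        """Merge back into the parent; return paths changed differently on both sides."""
        base = self.parent.revisions[self.base]
        theirs = self.parent.revisions[self.parent.head]
        merged, conflicts = dict(theirs), []
        for path, mine in self.files.items():
            if mine == base.get(path):
                continue  # unchanged on this branch, keep the parent's content
            if theirs.get(path) not in (base.get(path), mine):
                conflicts.append(path)  # both sides changed it: needs a real merge
            else:
                merged[path] = mine
        if not conflicts:
            self.parent.revisions.append(merged)  # integration succeeds
        return conflicts


main = Codeline({"a.c": "v1", "b.c": "v1"})
task1, task2 = main.branch(), main.branch()  # neither task waits for the other
task1.files["a.c"] = "task1"
task2.files["a.c"] = "task2"
task2.files["b.c"] = "task2"
first = task1.integrate()
second = task2.integrate()
print(first)   # []
print(second)  # ['a.c']
```

In this sketch both tasks proceed without blocking, and the cost of that
optimism surfaces only when the second task integrates and finds that
a.c was changed on both sides.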
In effect, every codeline and branch represents a form of risk
management by isolating how functionality, environment, knowledge,
teamwork, responsibility, and reliability, are distributed and
disseminated across time and space.
Branching and merging hierarchically decompose and recompose parallel
development into more manageable chunks! By isolating things along
various dimensions in a hierarchical fashion, we are attempting to
manage dynamically evolving complexity and dependencies. First we
decompose the parallel development problem into codelines and branches
and subbranches, then we recompose the subparts back into the larger
whole by progressively merging subbranches back to branches, branches
back to codelines, and codelines back into the mainstream.
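The recomposition order described above amounts to a post-order walk of the
branching tree. The following is a minimal sketch under that assumption; the
Line class, the merge_order function, and the branch names are hypothetical
and are used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """A codeline, branch, or subbranch (hypothetical model)."""
    name: str
    children: list = field(default_factory=list)


def merge_order(line, parent=None):
    """Return (source, target) merges, deepest lines first (post-order)."""
    merges = []
    for child in line.children:
        merges.extend(merge_order(child, line))
    if parent is not None:
        merges.append((line.name, parent.name))
    return merges


mainline = Line("mainline", [
    Line("release-1", [Line("fix-a"), Line("fix-b")]),
    Line("feature-x"),
])

for source, target in merge_order(mainline):
    print(f"merge {source} -> {target}")
```

Here the subbranches fix-a and fix-b are merged into release-1 before
release-1 itself is merged into the mainline, so each merge happens in
the smallest context that contains both sides.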
Regardless of whether changes are reconciled and synchronized
immediately, or deferred to a more convenient time and place, there
is always a risk of compromising the integrity of the codeline during
a merge. This is the price for such an optimistic concurrency
mechanism. The usual laws of thermodynamics (regarding entropy and
enthalpy) apply here as well: it is usually harder to put things back
together than it was to take them apart. For every branch created,
there is almost always an opposing merge to be reckoned with!
By separating development into isolated development paths and
change-tasks, branching eases the burden of tracing changes (both
physical and functional) and their dependencies. This makes
configurations, features and faults easier to track, verify and reproduce.
Although each merge carries with it some additional risk to codeline
safety, intelligent use of branching and merging really can help to
preserve codeline integrity (physical integrity, as well as conceptual
integrity).
If your VC tool supports symbolic branch names (rather than numeric
ones) then mnemonic branch names can serve as an effective and highly
visible form of communication that describes the intent of the branch
and the work taking place upon it. If you aren't using such a VC tool
you may need to find a way to work around this, either using a technical
solution
(like Virtual Codeline)
or a social convention among the project team.
Branching also helps communication and collaboration be effectively
organized, synchronized, and parallelized. If used properly so that it
isolates work instead of people, branching promotes effective teamwork
and really can reduce time-to-release. If you thoughtfully apply
risk-aware strategies for the selection of branching and merging
styles, and periodically take a step back to review and revise the
overall branching-tree, you should be able to reap the benefits of
parallel development (shorter cycle-time) and keep the amount of
synchronization overhead (and risk) to a manageable level.
Despite the fact that many VC tools consider branching to be one of
their nicer and more advanced features, branching is in fact a somewhat
low-level construct used for concurrency control. Most VC tools implement
file-oriented branching but not
project-oriented branching.
File-oriented branching is not ideally suited for parallelization of
work and workflow at coarser-grained levels beyond a single file or
directory.
Using file-oriented branching to represent project-oriented branching
results in a fair amount of trivial merging where revision contents
need to be propagated from branch to branch with little or no
difference between them (often causing unnecessary rebuilds when in
fact file-contents have not changed between revisions). Good merging
tools can minimize the pain and overhead associated with this, but the
overhead can still be significant.
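One way to see how much of this merging is trivial is to classify each file
by comparing its content on the two branches against their common ancestor.
The sketch below assumes a simplified model in which each configuration is a
dictionary of path to content; the classify_merges function, its labels, and
the file names are hypothetical and do not correspond to any specific tool.

```python
def classify_merges(base, source, target):
    """Classify each file for a source -> target merge.

    Each argument maps path -> content (a missing path is treated as None).
    """
    result = {}
    for path in sorted(set(base) | set(source) | set(target)):
        b, s, t = base.get(path), source.get(path), target.get(path)
        if s == t:
            result[path] = "identical"  # nothing to propagate
        elif t == b:
            result[path] = "trivial"    # only source changed: copy it across
        elif s == b:
            result[path] = "none"       # only target changed: keep target as-is
        else:
            result[path] = "content"    # both changed: a real merge is needed
    return result


base = {"a.c": "1", "b.c": "1", "c.c": "1"}
source = {"a.c": "2", "b.c": "1", "c.c": "2"}
target = {"a.c": "1", "b.c": "1", "c.c": "3"}
print(classify_merges(base, source, target))
```

In this example only c.c requires a genuine content merge; a.c is a
trivial propagation and b.c needs no work at all, yet a purely
file-oriented tool may still have to visit every one of them.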
Unfortunately, the majority of readily available VC tools don't
provide the user with anything better. It would be far more suitable
if one's VC or SCM tool provided predefined constructs which directly
map to the conceptual notions of: change-sets, activities, and
activity-streams, without being dependent upon branches. Then we could
use the SCM tool to directly model parallel effort and workflow and let
the tool itself worry about how to handle the low-level concurrency
control (branching) with the help of some user-supplied policy
preferences. There are a select few tools which actually do provide
this capability but they are presently in the minority. So unless you
are using such a tool, branching tends to be the next best mechanism
for supporting parallelism.
The result of using all these branching patterns is a version branch
tree structure that, for the most part, represents the intended
structure of activity workflow for the project. One might
regard this as a simple byproduct of Conway's Law, namely
that "Architecture follows Organization"
(see [Cope95]).
In the case of branching for parallel software development, we
might rename this as a corollary to Conway's Law and call it
"Branching Topology Comprises Workflow."
What this means is that tool-generated diagrams and queries/reports
can show version trees which closely conform to the intended work
breakdown structure (WBS) for the project team. This helps visibly
track and communicate status and progress in "real-time" to all
users of the VC tool and repository.
The branching tree of a project represents the structure of its evolution
in terms of change-flows. The flow of work activities is also an important
project structure. Streamed Lines attempts to coordinate these
two sets of structures so that activity and workflow conveniently map
to change-flows (using branches as the grouping mechanism). This helps
make the project's development and evolution easier to conceptualize
and manage. In this manner, Streamed Lines assists in bringing
some of the architectural and management structures of a software project
into alignment.
The authors would like to give special thanks to the following people
for their significant contributions to Streamed Lines:
- Chris Seiwald and Laura Wingerd of Perforce Software, for sharing their
  drafts of, and considerable expertise with, high-level SCM best
  practices
- Doug Lea, our shepherd for PLoP'98, and concurrent programming "guru"
  extraordinaire
- Steve Vance, for sharing his drafts of advanced SCM branching
  strategies
- DeWayne Perry and Beki Grinter, of Bell Labs Research, for sharing their
  vast knowledge of merging, workflow, and concurrent/collaborative
  development
- Participants in the Network of Learning workshop group at
  the PLoP'98 conference: Mike Beedle, Mark Bottomley, Phil Eskelin,
  Teri Hudson, Nick Jacobs, Tom Mowbray, Robert Switzer, and Paul Taylor
- Linda Rising, David Delano, Neil Harrison, and all the other folks
  responsible for putting together the ChiliPLoP'98 conference, which
  gave the four of us the opportunity to meet face-to-face and collaborate
  in the same room at the same time