This video explores how we can discover or create a “well-structured containment model” for an existing code-base.
Previously we asserted that “well-structured containment”
is a hierarchical organization of all of the files in a code-base into containers, such that
no container contains too many files or child-containers, and
the dependency relationships between containers are acyclic
Such a model will give us the foundation we need for defining a software architecture, with the associated benefits that we are looking for.
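The two properties can be checked mechanically. Here is a minimal sketch on a toy containment model; the container names, file names, and the threshold of 10 children are all invented for illustration:

```python
# Hedged sketch: checking the two "well-structured" properties on a toy
# containment model. MAX_CHILDREN and all names here are illustrative.
MAX_CHILDREN = 10  # assumed size threshold; pick what suits your code-base

def children_ok(containment, limit=MAX_CHILDREN):
    """containment maps each container to its list of files/child-containers."""
    return all(len(kids) <= limit for kids in containment.values())

def is_acyclic(deps):
    """deps maps each container to the set of containers it depends on."""
    visiting, done = set(), set()
    def dfs(node):
        if node in done:
            return True
        if node in visiting:
            return False          # back-edge: a dependency cycle
        visiting.add(node)
        for dep in deps.get(node, ()):
            if not dfs(dep):
                return False
        visiting.remove(node)
        done.add(node)
        return True
    return all(dfs(n) for n in deps)

containment = {"app": ["ui", "core"], "ui": ["views.py"], "core": ["model.py"]}
deps = {"ui": {"core"}, "core": set()}
print(children_ok(containment), is_acyclic(deps))  # True True
```

A real tool would of course work over the parsed code-base rather than hand-built dictionaries, but the checks themselves are this simple.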
To construct our containment model we will draw on what is available
The implemented sea of source files, or classes, and their countless interdependencies, and/or
The current physical organization of those source files into folders, packages, assemblies, jar files, Maven projects, or whatever. All things being equal,
we will aim to create a containment model from these artefacts that assumes as little code-level refactoring as possible.
Building containment using just the source files
means identifying cohesive clusters of files and recursively grouping them
Since this approach ignores the current physical structures, it is likely to diverge considerably from them (which we may or may not consider to be desirable)
But it will always be possible to align the physical organization with the model, once we have figured out what that model is
Although this requires the analysis of a vast amount of dependency data, this analysis and the identification of cohesive clusters can be helped a lot by automatic processing, so
This approach can often be quicker than “fixing” highly tangled existing structures
Using physical structures means initializing the containment model to reflect the current physical grouping of files
And then moving files and containers around, and refactoring the code (or at least simulating this) to change or remove inter-container dependencies
Primarily the focus tends to be on disentangling the containers
This approach will naturally result in a model that is more closely aligned with current physical structures
And again it will be possible to make the corresponding changes to the codebase so that the physical structure is fully realigned to the model if this is desired
Model construction in this way is largely a manual process, but again automated analysis and guidance can make it quite feasible,
even if it is sometimes harder than starting with a clean slate.
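Initializing the model from the physical structure amounts to lifting file-to-file dependencies up to the folders (or packages) that contain the files. A minimal sketch, with invented file paths and dependencies, showing how a container-level cycle surfaces for disentangling:

```python
# Hedged sketch: derive a container-level dependency graph from the physical
# layout by lifting file-to-file dependencies to their containing folders.
# The paths and dependency pairs below are made up for illustration.
from collections import defaultdict
from os.path import dirname

file_deps = [
    ("billing/invoice.py", "core/model.py"),
    ("core/model.py", "core/db.py"),
    ("core/db.py", "billing/ledger.py"),   # suspicious back-dependency
]

container_deps = defaultdict(set)
for src, dst in file_deps:
    a, b = dirname(src), dirname(dst)
    if a != b:                             # intra-container edges don't matter
        container_deps[a].add(b)

print(dict(container_deps))
# billing depends on core AND core depends on billing:
# a container-level cycle that top-down work would aim to remove
```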
As you might expect, it is most common to use elements of both approaches
Use the physical containers where they are reasonably well structured, or not too difficult to disentangle
And build new structures based on the underlying file interdependencies where this isn’t the case.
For example, we may preserve the top-level physical breakout of the code-base, as this may have been thought out more carefully at the start of the project, and may have infrastructural or even organizational implications that make it costly to change.
And underneath this top level breakout, where there has been less attention to emerging structures, we lean more on bottom-up reconstruction
Taking a closer look at the bottom-up modelling approach…
The first step is to identify groups of tightly coupled files that have relatively low coupling with other groups
(Automatic analysis can help a lot)
And then wrap those groups into containers. The result will be a flat ocean of containers, considerably smaller than the flat ocean of files, but probably still far too fat to be considered “well-structured”.
Cohesive clusters of containers are then identified
and wrapped into higher level containers
recursively until we have a well-structured hierarchy
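One pass of this grouping can be sketched with a simple greedy merge: repeatedly merge the most tightly coupled pair of nodes until every remaining pair falls below a coupling threshold. Real analysis tools use more sophisticated clustering; the graph and threshold here are invented:

```python
# Hedged sketch of one bottom-up clustering pass using a union-find merge.
# Coupling weights, file names, and the threshold are all illustrative.
def cluster_once(coupling, threshold=2):
    """coupling: {frozenset({a, b}): weight}. Returns node -> cluster root."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x
    # Merge the most tightly coupled pairs first
    for pair, weight in sorted(coupling.items(), key=lambda kv: -kv[1]):
        if weight < threshold:
            break                       # remaining pairs are loosely coupled
        a, b = pair
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra             # merge the two clusters
    return {n: find(n) for n in {x for p in coupling for x in p}}

coupling = {
    frozenset({"parser.py", "lexer.py"}): 9,   # tightly coupled
    frozenset({"parser.py", "ast.py"}): 7,
    frozenset({"ast.py", "report.py"}): 1,     # loose: stays separate
}
groups = cluster_once(coupling)
# parser.py, lexer.py and ast.py end up in one cluster; report.py stands alone
```

Applying the same pass to the coupling between the resulting containers, and so on, gives the recursive hierarchy the text describes.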
The pure top-down approach is quite different
The model is first initialized to reflect the physical containment structure, and then manipulated until it is well-structured, with as few changes as possible.
Moving files between containers is generally preferable to code refactoring: it requires no code change unless the physical structure is to be aligned with the model, and even then the change does not affect the code logic, so it is simpler and lower risk.
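Because a move only reassigns a file to a different container, it can be simulated in the model before any code is touched. A minimal sketch, with invented files, showing a simulated move dissolving an inter-container cycle:

```python
# Hedged sketch: simulate moving a file between containers and re-derive the
# container dependency graph. All file names and containers are made up.
def lift(file_deps, location):
    """location: file -> container. Returns container -> set of containers."""
    deps = {}
    for src, dst in file_deps:
        a, b = location[src], location[dst]
        if a != b:
            deps.setdefault(a, set()).add(b)
    return deps

file_deps = [("invoice.py", "model.py"), ("db.py", "ledger.py")]
location = {"invoice.py": "billing", "model.py": "core",
            "db.py": "core", "ledger.py": "billing"}

before = lift(file_deps, location)   # billing <-> core: a cycle
location["ledger.py"] = "core"       # simulate the move, no code touched
after = lift(file_deps, location)    # only billing -> core remains
```

No source file changed hands in reality; only when we are happy with the resulting model need the physical move (and the corresponding import/reference updates) be carried out.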