The Anatomy of Code Review
To review or not to review? It is remarkable there is still no general consensus on this critical part of software engineering process among teams and practitioners. Opinions are highly polarised which is often a sign of a deeper process mechanics than it might appear on surface. With this piece, I intend to highlight all significant aspects of a code review.
At first, think about the way we went with the code versioning: before code versioning systems became widespread, we had manually backed up the source folder as we progressed. As codebases continued to grow and, as another consequence, started to require multiple contributors, managing multiple versions became inevitable. A commit — a change — became a unit of software engineering, instead of code itself. We do not question why we need versioning because we understand the essential need of it.
Code review process, mostly associated with having a PR (Pull Request) also has an essential purpose: it represents a delayed activation. Let me explain.
We all know software is easily broken. The number of artefacts and interdependencies contained in even small piece of software, makes it inherently fragile to change. Each time we want to make a change, any change, we can break things which were already working — and could not even notice that.
Consider that we have a consistent and largely proven state of the system in the code base. Introducing a change, even if it contains no regressions, immediately affects the whole product development process.
IT TAKES EFFORT TO RETURN TO CONSISTENT STATE — If case of bug / regression found, even a mere rollback requires efforts, unnecessarily obfuscates the change history and still requires further actions (as the changes we wanted to introduce are still open now). Broken tests should be analysed, root causes found and fixes introduced (another change).
IT IMPOSES MORE EFFORTS ON OTHERS Making a change public implicitly forces other developers working on same code base in parallel to perform actions: synchronise new changes with theirs, which might be a non trivial process resolving conflicts. If the public state becomes broken it might completely block others from being able to proceed with their work.
DELAYED COSTS OF HIDDEN FAILURES Identifying a regression shortly after a change is introduced is still the best case scenario. As even large test coverage will not guarantee to catch all issues, failures in production will take much more efforts to recover from.
SECOND-ORDER COSTS All efforts spend of fixing failures steal the time from development team to develop new features or improve existing.
Considering all these multiplying effects we must be highly interested that we spent all economically justified efforts to ensure the quality of a particular change. This is the main factor to be measured upon. If qualification takes time — which it usually does — perfect. Any additional qualification effort is expected to be aligned by avoidance of later failures.
HOW EXACTLY SHOULD WE QUALIFY?
First, by RUNNING TESTS. Having a test suite with large enough coverage is a minimal measure which wins us at least /some/ confidence. Good, if tests can be run in seconds (see Philosopher’s Stone of SW Engineering), but if not, running tests for hours before making a change public still worth time invested. There are projects, usually in industrial domain where testing cannot be, at least easily, automated and still requires time-consuming manual interactions: even in this case keeping the change open until testing is done justifies the delay.
And then by PEER REVIEW. If we talk about the direct purpose of the review — finding possible issues with the current change, the effect is usually described as “having second pair of eyes” which is actually a semantic wrapper around factual cognitive processes used:
- In the process of “translating requirements into executing instructions” (which is what the original change author did) our mind operates differently comparing to the mode of “trying to understand how this code leads to execution of this requirement” (the reviewer), which makes it follow the different path and eventually spot issues which escaped
- We all have different set of knowledge references, thus if we do not understand something (insufficient knowledge) our brain starts looking for the answer, again following a different path
- We all have different experience / skills which leads us to explore problem space differently, finding own solution and evaluate current one by comparison
While automated tests are perfect in testing defined functionality, we, humans, are still vastly better at non-functional activities: thinking at a global level, evaluating domain intersections, designing for scale. Combining both types of activities builds a most efficient option to verify changes.
THERE’S MORE
The qualification is a primary goal of the code review but there’s more. Thinking in context of overall project and product development there are second order effects which could possibly have more significant impacts.
REACHING AN AGREEMENT The nature of change review process is that you have different options but the shared goal, which implies to reach the goal the participants are expected to come to an agreement somehow. An agreement is a powerful mechanism in different kinds of communication (see also: Contract Mindset). Team members learn how to solve problems coming to a solution together and this increases chances for all upcoming problems in the project will be solved in more optimal way.
GETTING MORE FAMILIAR WITH CODEBASE the reviewer inevitably peeks into areas he is usually not working with.. In the long term this makes the team more robust as in critical situations, members would be able to take over another’s area in a short time.
ALIGN ARCHITECTURE AND CODE STYLE across different components makes overall system more coherent.
AUTOMATIC DECISION DOCUMENTATION If an accompanying description contains even short mention of motivation, main effect and major changes it will serve as a reference for possible future problems. Such a notice also helps the reviewer quickly grasp the context.
WHAT TO LOOK AT
It is not expected that the reviewer is obliged to understand everything that is contained in a change. This would take time (recall, we need to stay economically feasible) and will still bring no guarantee all issues will be discovered. Instead he can look at:
- CUSTOMER or INFRASTRUCTURE IMPACT which be the result of this change should actually be understandable by anyone, even not a technical team member
- SUSPICIOUS CHANGES — the ones that do not belong to the change context
- OCCASIONAL SLIPS like copy-paste patterns or reversed orders
- WORK REMNANTS like debugging traces, temporary code changes which were not reverted or commented-out code
Naturally, changes of smaller size are preferable from all perspectives: more chances the reviewer would be able to “wrap his head around” the whole changes, less future conflicts with other code changes existing in parallel, still lesser risk of regression if less components are affected.
At the same time, there are benefits in introducing only complete and consistent changes. If the nature of the change requires modification at many places, sometimes because of non optimal architecture, it should be introduced as a whole.
RESOLVING CONFLICTS
Given that team members have different skills and experiences, and even different goals an initial agreement cannot always be achievable. Yet this situation should not block the process. This is optimally handled by having agreed to the role distribution, even if it is implicit. The main ones are:
- MASTER / APPRENTICE Either the reviewer or the change author has clearly more experienced role. Thus he has a word for final decision.
- PEER / PEER With no clear seniority it is expected that parties try to agree on each issue separately.
In each case we assume both reviewer and author have shared goal as members of one team. Despite this, there can still be cases where opinions are not aligned. This is solved by having agreed on a dedicated authority who “has the last word” in advance. This can be either another senior member of the team or the team lead, as he takes more responsibility.
There is also no imperative to fix all the findings within this review or at all. When done right, the review has already brought a lot of value. Nitpicking (something like convention on excess semicolons) quickly becomes non justified as every effort modifying pending change takes time and actually need to be fully qualified again.
Finally, the initial question — “to review or not review” — is deliberately wrongly stated. If a team is acting consciously, understanding why and agreeing to how, it will inevitably manage to install its own version of review process, which will be unconditionally better than just following some guidelines.