What Software Quality Actually Costs And Why Most Teams Are Measuring It Wrong

The conversation about software quality almost always starts with tools. It should start with consequences.

There is a version of the software quality conversation that most engineering teams have had many times. It goes roughly like this: testing coverage is too low, the QA process is too slow, the tools aren’t integrated well enough, and something needs to change before the next release.

That conversation is useful as far as it goes. But it tends to stop at the symptom level, at the visible friction and the immediate bottlenecks, without reaching the underlying question that determines whether a team improves: what does poor quality actually cost, and are we measuring that cost correctly? Building Qualigentic pushed us to think about this more carefully than we had before. Here is what we found.

60–70%: the share of QA effort spent maintaining existing tests, not writing new ones
Invisible > Visible: the costs that don't generate a ticket are almost always larger than the ones that do
Design > Verification: quality found during design costs a fraction of quality found in production

The visible costs are not the expensive ones

When organisations try to calculate the cost of software quality problems, they typically start with what they can measure: the number of bugs reported, the time spent in rework, the support tickets that trace back to a defect in production. These numbers are real and they matter. But they systematically undercount the actual cost of poor quality, because they miss the costs that don’t generate a ticket.

Consider the engineer who spends two hours debugging something that a better test suite would have caught in two minutes. That time doesn’t show up as a quality cost. It shows up as normal development time. Consider the product decision that gets deferred because the team doesn’t have enough confidence in the existing codebase to add a new feature safely. That doesn’t show up as a quality cost either. It shows up as slower velocity. Or consider the customer who doesn’t renew, or the procurement decision that goes to a competitor, because the software had three incidents in six months. That might show up in revenue, but it rarely gets traced back to the engineering process that allowed those incidents to happen.

When you add up the invisible costs alongside the visible ones, the picture changes substantially. Quality problems are almost always more expensive than they appear, and the gap between the apparent cost and the actual cost tends to grow with the complexity and criticality of the software.

Why the standard approach to quality doesn't fix the problem

The standard response to software quality problems follows a predictable pattern: invest in better testing tools, increase test coverage targets, add a QA stage to the development process, and measure the number of bugs caught before release. These are reasonable interventions. They do improve outcomes. But they address quality as a downstream activity, something that happens after the code is written, to verify that what was built is correct.

The fundamental problem with this model is that it treats quality as a function of verification rather than a function of design. And when quality is primarily a verification activity, it will always be in tension with delivery speed, because verification takes time, time is scarce, and scarcity leads to shortcuts. The teams that consistently produce reliable software have made a different choice. They treat quality as a design activity: something that happens while the system is being conceived and built, not after.

The questions are different: not “does this work?” but “what are the conditions under which this breaks, and have we designed against those conditions?” This shift changes the economics of quality. 

When failure modes are considered during design, the cost of addressing them is low. When they are discovered in production, the cost is high. The difference isn't the complexity of the fix; it's the stage at which it happens.

The maintenance problem that nobody budgets for

There is a specific cost that deserves more attention than it typically gets: the cost of keeping an existing test suite working.

Most teams, when they think about the investment required for quality, focus on the cost of writing tests. That is the visible, budgetable part of the work. What is harder to account for is the ongoing cost of maintaining those tests as the codebase evolves: updating scripts after code changes, removing redundant tests that now test nothing, and diagnosing whether a failure represents a real defect or a stale test that no longer reflects how the system works.

Industry data consistently puts the proportion of QA effort spent on maintenance at somewhere between 60 and 70 percent. That is a significant allocation of engineering time devoted not to finding new problems but to keeping the existing detection infrastructure functional.
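To make that proportion concrete, here is a minimal arithmetic sketch. The 400 QA hours per sprint is a hypothetical figure chosen for illustration, not data from any team, and `maintenance_split` is a name introduced here.

```python
def maintenance_split(qa_hours_per_sprint, maintenance_share):
    """Split QA hours into (maintenance hours, new-test hours) for a given share."""
    if not 0.0 <= maintenance_share <= 1.0:
        raise ValueError("maintenance_share must be between 0 and 1")
    maintenance = qa_hours_per_sprint * maintenance_share
    return maintenance, qa_hours_per_sprint - maintenance


# Hypothetical team: 400 QA hours per sprint (illustrative only).
for share in (0.60, 0.70):
    upkeep, new_work = maintenance_split(400, share)
    print(f"{share:.0%} maintenance: {upkeep:.0f}h upkeep, {new_work:.0f}h for new tests")
```

Under these assumed numbers, well over half of the sprint's QA capacity goes to upkeep before a single new test is written, which is the part of the budget that rarely gets planned.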

This maintenance burden compounds in regulated industries. In environments subject to DORA, Solvency II or PSD2, quality isn’t just an internal engineering concern; it is an audit topic. Regulators want evidence chains, not screenshots. They want to see a traceable line from requirement through test execution to archived result, and they want to be able to retrieve that evidence on demand. Building and maintaining that evidence chain manually, on top of an already stretched QA function, is a significant and often underestimated cost.

What a regulator-facing evidence chain actually requires

1. Requirement: Jira / ALM ID, version, owner
2. Generated test: script + hash, model + prompt
3. Execution: timestamp, env, operator
4. Result: pass / fail, logs, traces
5. Archive: signed, retention, on-demand

Designed against DORA Articles 6 & 9, Solvency II Pillar 2 and PSD2 Article 95. Your regulator-facing evidence is a query away.
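One way to picture the chain is as a single immutable record per test execution. The sketch below is an illustration only, not Qualigentic's data model: every class, field and function name is hypothetical, the sample values are invented, and the SHA-256 digest merely stands in for the signing step, which in practice would need a real key and signature scheme.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    # Requirement
    requirement_id: str
    requirement_version: str
    owner: str
    # Generated test
    script_sha256: str
    model: str
    prompt: str
    # Execution
    executed_at: str  # ISO 8601 timestamp
    environment: str
    operator: str
    # Result
    passed: bool
    log_uri: str

    def digest(self):
        """Stable SHA-256 over all fields; changes if any field is altered."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def hash_script(source):
    """Hash the test script so the archived record pins its exact content."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def find_by_requirement(records, requirement_id):
    """The on-demand query: every archived execution for one requirement."""
    return [r for r in records if r.requirement_id == requirement_id]


# Hypothetical sample record (all values invented for illustration).
record = EvidenceRecord(
    requirement_id="REQ-101",
    requirement_version="3",
    owner="payments-team",
    script_sha256=hash_script("def test_transfer_limit(): ..."),
    model="example-model",
    prompt="Generate a test for the transfer limit rule",
    executed_at="2025-01-15T09:30:00+00:00",
    environment="staging",
    operator="ci-runner-7",
    passed=True,
    log_uri="archive://runs/0001/logs",
)
archive = [record]
print(find_by_requirement(archive, "REQ-101")[0].digest())
```

Keeping the record frozen and deriving the digest from a canonical JSON form means any later change to a field produces a different digest, which is the property an auditor relies on when retrieving evidence.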

This is one of the problems Qualigentic was designed to address directly. The autonomous maintenance capability, which detects stale, broken and redundant tests and refactors them automatically, exists precisely because we saw how much engineering capacity was being consumed by that work. And the full audit traceability layer, from requirement to archived execution result, exists because we recognised that in regulated industries, the evidence chain is not optional infrastructure. It is a requirement with a cost, and that cost should be planned for rather than absorbed.

What Qualigentic required us to confront

Qualigentic is a system that has to be reliable. The nature of what it does means that quality problems don’t stay contained: they propagate, they compound and they have consequences that go beyond the immediate technical failure. Building it required us to be more explicit than we are accustomed to being about the relationship between quality and architecture. The decisions made early in the design process (how components interact, where validation happens, how failure is surfaced and handled) have consequences that play out over months and years, not just in the current sprint.

Three things stood out from that process that we think have broader applicability. 

1. Test design has to happen at the same time as system design
When tests are written after the architecture is established, gaps appear, not because the system is poorly built, but because the failure modes weren't fully considered when the design decisions were made. Writing tests alongside design forces those considerations earlier, when they are cheaper to address.

2. Quality processes have to be fast enough to be used under real pressure
A quality process that adds significant time to the development workflow will get compressed when sprints get tight. A quality process fast enough to run continuously becomes part of the normal rhythm of development. The difference isn't convenience; it's whether the process actually changes what gets shipped.

3. Quality ownership has to be distributed, not delegated
When quality is primarily the responsibility of a dedicated QA team, it becomes a handoff. That model creates a bottleneck and a cultural dynamic where quality is someone else's job until it becomes an emergency. When every engineer feels genuine ownership over reliability, problems get raised earlier and trade-offs get made more consciously.

The handoff model works when the feedback loop is fast and the QA team has enough context to catch what matters. In practice, it usually creates a bottleneck and a cultural dynamic where quality is seen as someone else’s job until it becomes an emergency. When quality is distributed, when every engineer feels genuine ownership over the reliability of what they ship, the conversation changes. Problems get raised earlier, trade-offs get made more consciously, and the accumulated cost of small quality compromises doesn’t quietly grow until it becomes a crisis.

The measurement question

If poor quality is more expensive than it appears, the implication is that the investment required to address it is also higher than organisations typically budget for. This creates a practical problem: how do you make the case for that investment to people who are working from the visible cost numbers?

The answer is to change what you measure. The metrics that most engineering teams track (test coverage, bug counts, mean time to resolution) are lagging indicators. They tell you what happened, not what it cost or what will happen next. The more useful measurements are the ones that capture the invisible costs: the proportion of development time that is reactive rather than productive, the frequency with which new feature development is slowed by concerns about existing system stability, and the correlation between quality metrics and delivery velocity over time. These numbers are harder to produce. But they tell a more accurate story about the real economics of quality, and they make the conversation about investment much more tractable.

What you are measuring | What most teams track   | What actually matters
Test coverage          | % of lines covered      | Failure modes covered
Bug count              | Tickets logged          | Bugs that never reached a ticket
QA investment          | Tools and headcount     | Maintenance burden + invisible cost
Delivery velocity      | Story points per sprint | Velocity lost to quality anxiety
Incident cost          | Time to resolve         | Revenue and trust impact upstream
Quality ownership      | QA team responsibility  | Every engineer, every sprint
Lagging indicators tell you what happened. Leading indicators tell you what it cost and what comes next.
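Three of these measurements can be sketched in a few lines: the reactive share of development time, failure-mode coverage, and the correlation between reactive work and velocity. This is a rough illustration under stated assumptions. The `SprintLog` fields, the sprint data and the failure-mode names are all hypothetical, and how a team classifies hours as "reactive" is a judgment call this sketch does not settle.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SprintLog:
    name: str
    feature_hours: float   # planned, productive work
    reactive_hours: float  # debugging, rework, hotfixes, stale-test triage
    story_points: int


def reactive_share(log):
    """Fraction of logged development time that was reactive."""
    total = log.feature_hours + log.reactive_hours
    return log.reactive_hours / total if total else 0.0


def failure_mode_coverage(identified, tested):
    """Fraction of identified failure modes that have at least one test."""
    if not identified:
        return 0.0
    return len(set(identified) & set(tested)) / len(set(identified))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Hypothetical sprint history (invented numbers, for illustration only).
logs = [
    SprintLog("S1", 180, 20, 40),
    SprintLog("S2", 165, 35, 36),
    SprintLog("S3", 150, 50, 30),
    SprintLog("S4", 140, 60, 27),
]
shares = [reactive_share(log) for log in logs]
velocity = [log.story_points for log in logs]

print([f"{s:.0%}" for s in shares])
print(f"correlation(reactive share, velocity) = {pearson(shares, velocity):.2f}")
print(failure_mode_coverage(
    {"timeout", "bad input", "retry storm", "partial write"},
    {"timeout", "bad input"},
))
```

A strongly negative correlation in a team's own data would support the argument made above, but with only a handful of sprints it is suggestive at best, and it says nothing about causation.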

What this means in practice

The practical implication of thinking about quality this way is that the interventions worth making are different from the ones that tend to get prioritised. The tools matter, but they matter less than the process design. A team with average tools and a well-designed quality process will consistently outperform a team with excellent tools and a poorly designed one.

The process design matters, but it matters less than when quality thinking enters the development cycle. Teams that consider quality during design will consistently outperform teams that consider quality during verification, regardless of how good the verification process is. And ultimately, both of these are downstream of culture: whether the people building the software feel genuine ownership over its reliability, and whether the organisation around them rewards that ownership or quietly penalises it in favour of delivery speed.

Quality is a technical problem in the sense that it requires technical skill to solve. But the decisions that determine whether a team produces reliable software are mostly not technical decisions. They are organisational decisions, process decisions and culture decisions made upstream of the code, with consequences that show up long after the release.

At Caixa Mágica, this is how we think about quality: not as a phase in the development process, but as a way of working. Qualigentic pushed us to be more rigorous about it. The lessons from that process inform how we approach every project we build.

Qualigentic · Caixa Mágica Software
See what agentic quality looks like inside a real development workflow
Time-boxed pilot. One application. One framework.
Regulator-facing evidence in 6–8 weeks.