Introduction
On 17 July, thousands of subpostmasters opened their doors as they do most mornings, only to find the core software that keeps their branches running had vanished from reach. Horizon, the system that powers daily transactions across the network, was unavailable for hours after a datacentre incident. Lights were on, counters were staffed, customers queued as usual, but for a long and nerve-stretching window the machine at the heart of the operation simply did not respond.
This analysis explains how a national IT failure turns into real losses at the counter, why those losses too often land on the shoulders of subpostmasters, and what a fair, practical path forward should look like. It marries operational detail with financial reality and sets out concrete steps for branches, the Post Office, and technology suppliers to reduce the risk that a software failure ever again becomes a personal financial crisis.
A Network Of Small Businesses Tied To One Spine
Each branch may be small, but together the network is vast, and it relies on one shared backbone. Horizon handles more than simple retail sales. It records postage, bill payments, top-ups, government services, banking transactions, foreign currency, lottery, and a long tail of niche tasks that only a post office performs. The shared design brings consistency and auditability. It also creates a single point of failure. When Horizon stops, every branch in the country feels the shock at precisely the same time.
The July outage made this structure painfully clear. Around 11,500 branches could not run normal processes for several hours. Terminals timed out. Screens froze at log-in. Workflows that depend on identity checks or system-generated labels simply stalled. Faced with this, subpostmasters had three options. Close the counter. Offer a very narrow cash-only service for low-risk items. Or keep customers waiting in the hope that systems would return before tempers and patience ran out. None of these choices is good. All three create costs that ripple far beyond a single morning’s takings.
Where The Money Disappears During An Outage
A technology blackout produces three distinct kinds of revenue loss. Understanding them separately helps set fair compensation rules and better contingency plans.
1. Transactions That Cannot Happen Without The System
Many core products have a tightly controlled workflow. Special Delivery requires a system label and tracking entry. Banking deposits need account validation and an audit trail. Benefit payments and government services must verify identity and entitlement on screen. If those screens do not load, there is no safe manual route. The customer leaves without completing the task. If they travelled far or are time-constrained, the sale does not return later. It is gone.
2. Transactions That Customers Abandon
Even for services that could theoretically wait, queues have a tipping point. People leave when uncertainty stretches beyond a few minutes. Footfall is wasted, and a portion of those customers will not come back that day. In busy branches this abandonment curve is steep. The visible effect is a thinning queue. The hidden effect is a dent in the day’s conversion rate that no one can recover once the clock has moved on.
3. Leakage From Manual Workarounds
Some branches accept cash for low-risk items during an outage, then post the transactions into the system later. Staff keep notebooks, tally sheets, or simple receipt pads. Even with care, some entries slip. Handwritten amounts can be misread. A customer using exact change might decline a receipt and later dispute the amount. Small errors add up. In a normal week these are rounding differences. In a blackout, leakage can become a visible slice of the day’s takings.
Costs That Keep Ticking Even When Nothing Moves
Revenue is only half of the picture. Operating costs continue while systems stall. Staff are on shift. Wages do not pause because a server has stopped. In many communities, subpostmasters bring in an extra pair of hands on high-traffic days. Those pre-booked hours are non-refundable.
There is also reputational cost. Branches serve as anchors in local life. If people walk away embarrassed or annoyed, that trust needs to be re-earned, and it often takes more than a single good day to do it.
Finally, there is stress. When essential tools fail, the person at the counter carries the face-to-face fallout. That emotional toll is hard to quantify, but anyone who has managed a branch knows it is real.
A Simple Way To Estimate Branch-Level Losses
You do not need complex models to calculate the damage. A transparent, branch-friendly method can be built from familiar numbers.
- Start with an average hourly transaction count for the affected day and time. Most branches know their pattern by hour.
- Split those transactions into buckets and apply a realistic completion rate to each during the outage. System-required items fall to zero. Discretionary items might drop to twenty percent. Cash-safe items may hold at fifty to seventy percent, depending on confidence with manual tallies.
- Convert the missing transactions into revenue using product mix and fees. Include any commission or reimbursement that would have applied.
- Add a standard allowance for leakage on any manual items that must be re-keyed after the fact. A small percentage is defensible and easy to audit.
- Add fixed costs for the outage window. Staff hours and overhead should be included pro-rata.
This logic is simple to implement. It avoids guesswork. It gives subpostmasters a figure they can recognise from their own logs, and it gives the centre a consistent basis for compensation.
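To make the arithmetic concrete, here is a minimal sketch of the bucket method in Python. The bucket names, volumes, completion rates, fees, and staff costs are illustrative assumptions, not official figures; a real calculation would draw on each branch's own hourly logs and product mix.

```python
# Illustrative sketch of the bucket method for estimating branch-level
# outage losses. All figures below are assumptions for demonstration only.

OUTAGE_HOURS = 4

# Typical transactions per hour for the affected day and time, split into buckets.
hourly_volume = {
    "system_required": 18,   # e.g. Special Delivery, banking, identity checks
    "discretionary": 10,     # items customers could defer but often abandon
    "cash_safe": 12,         # low-risk items that can be taken manually
}

# Assumed completion rates during the outage (mirroring the list above).
completion_rate = {
    "system_required": 0.0,
    "discretionary": 0.2,
    "cash_safe": 0.6,
}

# Assumed average fee or commission per transaction, in pounds.
avg_fee = {
    "system_required": 4.50,
    "discretionary": 2.00,
    "cash_safe": 1.20,
}

LEAKAGE_RATE = 0.03         # allowance for re-keying errors on manual items
STAFF_COST_PER_HOUR = 22.0  # pro-rata wages and overhead for the window

def estimate_outage_loss(hours: float) -> float:
    """Return an estimated loss in pounds for an outage of the given length."""
    lost_revenue = 0.0
    manual_revenue = 0.0
    for bucket, volume in hourly_volume.items():
        expected = volume * hours
        completed = expected * completion_rate[bucket]
        lost_revenue += (expected - completed) * avg_fee[bucket]
        if bucket == "cash_safe":
            manual_revenue += completed * avg_fee[bucket]
    leakage = manual_revenue * LEAKAGE_RATE
    fixed_costs = STAFF_COST_PER_HOUR * hours
    return lost_revenue + leakage + fixed_costs

print(f"Estimated loss for a {OUTAGE_HOURS}-hour outage: £{estimate_outage_loss(OUTAGE_HOURS):.2f}")
```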
Who Should Carry The Risk
The branch network cannot be expected to insure the central system. The rule should be clear. If a branch followed the official outage playbook, the financial risk of a system failure sits with the Post Office and its technology providers, not with the individual subpostmaster. That principle matters for fairness. It also matters for safety. If liability is unclear, staff may feel pressure to invent unsanctioned workarounds that put customers and data at risk.
What Fair Compensation Looks Like In Practice
Compensation needs to be automatic, fast, and based on auditable inputs rather than paperwork battles. A workable model can combine three elements.
- An automatic base payment per affected hour once the central incident is confirmed. This compensates for standing costs and immediate disruption without delay.
- A variable top-up based on each branch’s typical hourly throughput for that day and time. Use recent comparable weeks as the baseline. Apply the bucket method described earlier to calculate lost revenue more precisely.
- A leakage allowance to recognise the friction of re-keying manual sales. A small, standard percentage keeps arguments out of the process.
Set service-level targets for the payout timeline. A week for base payments. Two weeks for the variable top-up. Publish performance against those targets monthly so the system is trusted.
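As a rough illustration of how the three elements could combine, the sketch below assumes a flat base rate per affected hour, a baseline drawn from recent comparable weeks, and a fixed leakage percentage. Every figure and field name is a placeholder rather than a proposed tariff, and the choice to net the base payment out of the top-up is one possible design, not a settled rule.

```python
from dataclasses import dataclass

# Placeholder parameters; real values would be set centrally and published.
BASE_RATE_PER_HOUR = 35.0     # automatic payment per affected hour
LEAKAGE_ALLOWANCE = 0.03      # standard percentage applied to manual takings

@dataclass
class BranchIncident:
    outage_hours: float
    baseline_hourly_revenue: float  # from recent comparable weeks
    estimated_lost_revenue: float   # from the bucket method above
    manual_takings: float           # cash-safe sales re-keyed after the outage

def compensation(incident: BranchIncident) -> dict:
    """Combine the three compensation elements into a single payout."""
    base = BASE_RATE_PER_HOUR * incident.outage_hours
    # One possible treatment: the variable top-up covers lost revenue beyond
    # what the base already pays, capped at normal throughput for the window.
    cap = incident.baseline_hourly_revenue * incident.outage_hours
    top_up = max(0.0, min(incident.estimated_lost_revenue, cap) - base)
    leakage = incident.manual_takings * LEAKAGE_ALLOWANCE
    return {"base": base, "top_up": top_up, "leakage": leakage,
            "total": base + top_up + leakage}

print(compensation(BranchIncident(outage_hours=4, baseline_hourly_revenue=90,
                                  estimated_lost_revenue=260, manual_takings=120)))
```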
The Branch Outage Playbook: What To Do In The First Hour
Branches need clarity on the exact steps to follow. A good playbook is short, visible next to the counter, and rehearsed.
- Confirm the incident using the official channel. If the issue is national, stop troubleshooting locally and move to outage mode.
- Inform customers in plain language. Put a sign at the door and at the counter that explains the situation and lists any cash-safe services still available.
- Begin a manual transaction log for approved low-risk items. One line per sale with time, product, amount, and initials.
- Protect the queue. Offer realistic wait times. Encourage customers with complex needs to return once systems are up.
- Mark the start and end of the outage window in the log. Photograph the signage and the first page of the log for your records.
A short rehearsal every quarter builds confidence. New staff must be walked through the process on day one. Confidence reduces errors, and fewer errors mean faster reconciliation when systems return.
What The Centre Must Commit To Immediately
Head office and suppliers have responsibilities that go beyond apology emails. These commitments are practical and measurable.
- Clear recovery targets. Publish a recovery time objective for the central platform and a recovery point objective for transaction data. Keep them conservative and realistic.
- Outage communication. Within fifteen minutes of a confirmed national incident, send a single, structured update that branches can show customers. Update it at defined intervals, even if the message is simply that engineers are still working.
- Automatic compensation. Trigger the base payment as soon as the incident is closed. Do not wait for claim forms. Invite branch-level data only for the variable top-up.
- Dedicated reconciliation tools. Provide a simple screen for posting manual logs back into the system with fields that mirror the paper log. Include prompts that reduce common errors, along the lines of the sketch after this list.
- Transparent post-mortems. After any major incident, publish a plain-English summary that explains what failed, what was fixed, and what will change to prevent recurrence.
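As an illustration only, the sketch below shows how such a reconciliation screen might validate one line of the paper log before posting it. The field names, approved product list, and amount ceiling are assumptions, not a description of any existing Horizon screen.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed list of products approved for manual sale during an outage.
CASH_SAFE_PRODUCTS = {"first_class_stamp", "second_class_stamp", "greeting_card"}
MAX_MANUAL_AMOUNT = 50.00  # assumed ceiling for a single manual sale, in pounds

@dataclass
class ManualLogLine:
    """Mirrors the paper log: time, product, amount, and staff initials."""
    sold_at: datetime
    product: str
    amount: float
    initials: str

def validate_line(line: ManualLogLine, outage_start: datetime,
                  outage_end: datetime) -> list[str]:
    """Return prompts for the operator; an empty list means the line can post."""
    prompts = []
    if line.product not in CASH_SAFE_PRODUCTS:
        prompts.append(f"'{line.product}' is not on the approved cash-safe list.")
    if not outage_start <= line.sold_at <= outage_end:
        prompts.append("Sale time falls outside the recorded outage window.")
    if line.amount <= 0 or line.amount > MAX_MANUAL_AMOUNT:
        prompts.append(f"Amount £{line.amount:.2f} is outside the expected range.")
    if len(line.initials.strip()) < 2:
        prompts.append("Staff initials are missing or too short.")
    return prompts

# Example: a clean line produces no prompts and can be posted straight away.
line = ManualLogLine(datetime(2025, 7, 17, 10, 30), "first_class_stamp", 1.65, "JB")
print(validate_line(line, datetime(2025, 7, 17, 9, 0), datetime(2025, 7, 17, 13, 0)))
```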
Technical Fixes That Reduce The Chance Of A Repeat
Some outages are inevitable. Many are preventable or can be softened with the right design choices. The following improvements are realistic for a nationwide retail system.
Offline-First For Low-Risk Workflows
Give terminals a limited offline mode for defined products. Capture the sale offline with checks that can be validated locally. Hold the transaction securely and push it to the core system once connectivity returns. Store and forward is not exotic. It is a mature pattern when designed with clear boundaries and safeguards.
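A minimal sketch of the store-and-forward pattern, assuming a local durable queue on the terminal and an upload step that runs when connectivity returns. The queue file, approved product list, and upload hook are placeholders rather than real Horizon components.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

QUEUE_FILE = Path("offline_queue.jsonl")  # assumed local durable store
OFFLINE_ALLOWED = {"first_class_stamp", "second_class_stamp"}  # defined products only

def capture_offline_sale(product: str, amount: float) -> dict:
    """Validate locally, then append the sale to the durable queue."""
    if product not in OFFLINE_ALLOWED:
        raise ValueError(f"{product} cannot be sold in offline mode")
    record = {
        "id": str(uuid.uuid4()),               # idempotency key for later upload
        "product": product,
        "amount": amount,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    with QUEUE_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

def forward_queue(post_to_core) -> int:
    """Push queued sales to the core system once connectivity returns."""
    if not QUEUE_FILE.exists():
        return 0
    sent = 0
    remaining = []
    for line in QUEUE_FILE.read_text().splitlines():
        record = json.loads(line)
        try:
            post_to_core(record)    # caller supplies the real upload function
            sent += 1
        except Exception:
            remaining.append(line)  # keep anything that failed for the next attempt
    QUEUE_FILE.write_text("\n".join(remaining) + ("\n" if remaining else ""))
    return sent
```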
Graceful Degradation Instead Of Hard Failure
Design the application to degrade services by dependency. If identity services are down, warn the operator and block only the flows that require identity. Keep simple postage and retail open. The goal is to preserve as much of the counter’s usefulness as possible.
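One way to express degradation by dependency is a plain mapping from counter flows to the backend services they require, so the application blocks only what it must. The service and flow names below are assumptions for illustration.

```python
# Assumed mapping of counter flows to the backend services they depend on.
FLOW_DEPENDENCIES = {
    "postage_basic":    {"pricing"},
    "special_delivery": {"pricing", "tracking"},
    "banking_deposit":  {"banking", "identity"},
    "benefit_payment":  {"identity", "entitlement"},
    "retail_sale":      set(),   # no backend dependency
}

def available_flows(services_down: set[str]) -> dict[str, bool]:
    """Return which flows stay open given the set of failed services."""
    return {flow: deps.isdisjoint(services_down)
            for flow, deps in FLOW_DEPENDENCIES.items()}

# Example: identity services are down, so only identity-dependent flows close.
print(available_flows({"identity"}))
# {'postage_basic': True, 'special_delivery': True, 'banking_deposit': False,
#  'benefit_payment': False, 'retail_sale': True}
```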
Redundant Paths To The Datacentre
Two independent connectivity providers per branch is a practical baseline. So is active-active capacity across datacentres with independent power and network upstreams. Regular failover tests should be a calendar event, not an aspiration.
Local Caching Of Critical Reference Data
Keep small, signed caches of product tables, pricing, and service rules at the branch. These caches allow the system to validate basic transactions offline without opening security holes. The cache can expire gracefully and refresh automatically.
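A minimal sketch of a signed, expiring reference-data cache, using an HMAC over the payload as a stand-in for whatever signing scheme the platform actually uses. Key handling, prices, and the refresh policy are simplified assumptions.

```python
import hashlib
import hmac
import json
import time

# Assumed shared signing key; in practice this would come from secure provisioning.
SIGNING_KEY = b"branch-cache-demo-key"
CACHE_TTL_SECONDS = 24 * 3600  # assumed one-day validity

def sign_cache(reference_data: dict) -> dict:
    """Package reference data (prices, product rules) with a timestamp and signature."""
    payload = json.dumps(reference_data, sort_keys=True).encode()
    issued_at = int(time.time())
    mac = hmac.new(SIGNING_KEY, payload + str(issued_at).encode(), hashlib.sha256)
    return {"data": reference_data, "issued_at": issued_at, "sig": mac.hexdigest()}

def load_cache(cache: dict) -> dict | None:
    """Return the cached data if the signature checks out and it has not expired."""
    payload = json.dumps(cache["data"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload + str(cache["issued_at"]).encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cache["sig"]):
        return None  # tampered or corrupted cache: refuse to use it
    if time.time() - cache["issued_at"] > CACHE_TTL_SECONDS:
        return None  # expired gracefully: force a refresh before offline use
    return cache["data"]

# Illustrative prices only.
cache = sign_cache({"first_class_stamp": 1.65, "second_class_stamp": 0.85})
print(load_cache(cache))
```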
Focused Security That Does Not Block Recovery
Security controls must prevent fraud and protect data. They should not stop an entire network from handling cash-safe items during a partial outage. Risk-based controls, device attestation, and tiered permissions can keep the bar high without producing an all-or-nothing failure pattern.
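As a sketch of how tiered, risk-based controls might look in a degraded mode: each product carries a risk tier, and a transaction is allowed only when the terminal is attested and the amount sits under that tier's limit. The tiers, limits, and attestation flag are illustrative assumptions.

```python
# Assumed risk tiers and degraded-mode limits per transaction, in pounds.
DEGRADED_LIMITS = {
    "low":    50.0,   # stamps, basic postage
    "medium": 0.0,    # bill payments: blocked while validation is unavailable
    "high":   0.0,    # banking and identity-backed services: always blocked offline
}
PRODUCT_TIER = {
    "first_class_stamp": "low",
    "bill_payment": "medium",
    "banking_deposit": "high",
}

def allowed_in_degraded_mode(product: str, amount: float,
                             device_attested: bool) -> bool:
    """Apply tiered limits rather than an all-or-nothing block."""
    if not device_attested:
        return False  # untrusted terminals get no offline capability at all
    limit = DEGRADED_LIMITS[PRODUCT_TIER[product]]
    return limit > 0 and amount <= limit

print(allowed_in_degraded_mode("first_class_stamp", 3.30, device_attested=True))  # True
print(allowed_in_degraded_mode("banking_deposit", 20.0, device_attested=True))    # False
```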
Communication That Calms The Counter
What branches need in the heat of an incident is clarity. Vague language increases stress. A good update answers the questions customers ask.
What is down. What is still available. Rough timing for the next update. Whether customers should wait or come back later. If the central message includes those points in one short paragraph, counter staff can focus on people rather than trying to interpret technical jargon.
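The structured update could be as small as a fixed set of fields that render into one customer-facing paragraph. The fields and wording below are an assumption about what such a message might contain, not an existing template.

```python
from dataclasses import dataclass

@dataclass
class OutageUpdate:
    services_down: list[str]
    services_available: list[str]
    next_update_minutes: int
    advice: str   # e.g. "come back after 2pm" or "cash-only items can proceed"

def render_for_counter(update: OutageUpdate) -> str:
    """Turn the structured update into a single plain-English paragraph."""
    return (
        f"Currently unavailable: {', '.join(update.services_down)}. "
        f"Still available: {', '.join(update.services_available)}. "
        f"Next update in about {update.next_update_minutes} minutes. "
        f"{update.advice}"
    )

print(render_for_counter(OutageUpdate(
    services_down=["banking", "identity checks"],
    services_available=["stamps", "basic postage"],
    next_update_minutes=30,
    advice="If your visit needs banking or ID checks, please come back later today.",
)))
```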
Metrics That Build Trust
Trust follows data. The organisation should report three things after each incident.
- Time from first alert to first branch-facing message.
- Total outage duration for each major function. For example, postage, banking, identity checks.
- Compensation timing performance against the service-level targets.
When these numbers improve over time, confidence improves with them.
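A minimal sketch of how those three numbers could be derived from incident timestamps and payout records. The record shape, timestamps, and service-level threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Assumed incident record with illustrative timestamps only.
incident = {
    "first_alert":          datetime(2025, 7, 17, 9, 2),
    "first_branch_message": datetime(2025, 7, 17, 9, 14),
    "function_windows": {   # outage start and end per major function
        "postage":  (datetime(2025, 7, 17, 9, 0), datetime(2025, 7, 17, 12, 30)),
        "banking":  (datetime(2025, 7, 17, 9, 0), datetime(2025, 7, 17, 13, 45)),
        "identity": (datetime(2025, 7, 17, 9, 0), datetime(2025, 7, 17, 13, 45)),
    },
    "incident_closed":    datetime(2025, 7, 17, 14, 0),
    "base_payments_sent": datetime(2025, 7, 22, 10, 0),
}

BASE_PAYMENT_SLA = timedelta(days=7)  # assumed service-level target

time_to_first_message = incident["first_branch_message"] - incident["first_alert"]
durations = {name: end - start
             for name, (start, end) in incident["function_windows"].items()}
base_within_sla = (incident["base_payments_sent"] - incident["incident_closed"]
                   <= BASE_PAYMENT_SLA)

print(f"First branch-facing message after {time_to_first_message}")
for name, duration in durations.items():
    print(f"{name} unavailable for {duration}")
print(f"Base payment within SLA: {base_within_sla}")
```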
A Practical Roadmap For The Next Ninety Days
Change does not need to wait for a full platform overhaul. The network can move on three fronts immediately.
- Publish and rehearse the branch outage playbook across all regions. Make sure every counter has the same laminated sheet and the same manual log template.
- Turn on automatic base compensation for any central incident longer than a set threshold. Keep the threshold short enough that branches feel protected.
- Begin a pilot for offline-first low-risk workflows in a representative mix of branches. Measure error rates, reconciliation speed, and customer satisfaction, then expand.
These steps are simple, visible, and morale-boosting. They buy goodwill while deeper engineering work proceeds.
Conclusion
A national system outage is not an abstract headline for subpostmasters. It is a morning of awkward conversations, lost sales, and worry about how to make the numbers add up. The structure of the network creates shared benefits in normal times and shared pain in bad ones. That is exactly why risk must be shared fairly, with automatic compensation and clear rules that reward good practice at the counter.
The path forward is not mysterious. Keep branches safe with a rehearsed outage playbook and a narrow set of cash-safe products. Keep people informed with timely, plain-English updates. Pay automatically and quickly on a transparent formula. Invest in technical design that degrades gracefully, stores and forwards when acceptable, and fails over reliably.
Do these things consistently and the next time a server fails, a branch will still feel like a place where work gets done. Customers will leave with confidence intact. Subpostmasters will finish their shift knowing the system stands behind them when the system itself is the problem. That is what a trustworthy network looks like in practice.
