
Guest author

November 25, 2022

Perfection, excellence, and the cost of bridging them in the telecoms industry

Telecoms.com periodically invites expert third parties to share their views on the industry’s most pressing issues. In this piece telecoms consultant William Webb examines the collective challenge of ensuring we are always connected.

Across the western world, failures in communications networks are infrequent, but when they occur they come at the expense of personal safety, data security and business continuity. The recent Rogers outage in Canada lasted nearly a day, whilst the UK experienced a similar outage on O2’s network in December 2018. Both outages felt as though they lasted a very long time because our economy is so reliant on the Internet. They become even more problematic in an emergency, when someone needs to call the emergency services.

Rogers has since acknowledged that 2.92 million wireline and 10.242 million wireless customers were affected by the blackout. Subsequent reports determined that it did not breach its service level agreements (SLAs) with retail customers, and Rogers is assessing whether it breached SLAs with its vendors.

The Rogers outage was caused by an update to the distribution routers in its network, which caused Rogers’ internet gateway, core gateway and distribution routers to cease communication with one another, as well as with Rogers’ cellular, enterprise and cable networks.

The network of mobile operator O2 experienced an outage in December 2018 affecting all of its 25 million customers. A recent Ofcom inquiry concluded that O2’s outage was significant, and that the disruption was caused by an issue with software provided by Ericsson. A fault in this critical software, linked to the expiry of a ‘security certificate’, caused the software to fail and disrupted O2’s network.

Both of these outages were caused by software bugs – unintentional errors rather than malicious activity – and both made headline news for good reason: millions of people were inconvenienced or put at risk. We have recently seen that Internet of Things (IoT) networks also fail, disrupting or idling systems such as information signs, in-store payments, mobility networks and more.

So the question is whether this very public failure, of both policy and of systems, is fixable.

The path to network reliability

Outages similar to the ones described at the beginning of this article are thankfully rare – and that’s partly why they make front-page news. And while there has been no complete failure of a mobile network in the UK since 2018, there have been many less notorious cases of local unavailability of services and applications.

Mobile networks comprise millions of lines of code and some of the most advanced technology in existence. That they fail as infrequently as they do is remarkable (by way of comparison, think how often your PC or laptop needs a reboot). While we should make them as reliable as possible, complete reliability is not feasible. Going from, for example, one failure every five years to one every twenty years carries a very high cost, which will be passed on to consumers who may value the additional reliability less than it costs them. So all stakeholders must accept that perfection is a journey, not a destination, and that the risks are always with us.

Potential solutions

Governments often believe that they should get involved, not least when loud calls for “something to be done” echo in national parliaments. There can be a role for intervention, but, as with most things done in haste, hurried interventions tend to be ill-judged. Governments focus on security threats and worry loudly about Chinese equipment; while these are potential risks, they should worry at least as much about insufficiently tested software and unintended errors.

Governments further believe that having more suppliers lowers risk. This is partly true, but any one supplier is as likely to have bugs in its code as any other, and the more suppliers there are, the harder it is to ensure that their equipment integrates and that their code is error-free. Finally, government intervention in a competitive market (arguably not the case in Canada today, to revisit that example) is difficult and risks distorting the market.

The best form of resilience is technological redundancy: having a second option available when the first, inevitably, fails. In the G7 we generally have one: when the mobile network fails, devices switch to Wi-Fi, often without us even noticing. Of course, Wi-Fi only works in or near buildings, so it is not a perfect substitute, and ever more people work, live or travel away from Wi-Fi. The same applies in reverse: if Wi-Fi or broadband fails, we can switch to cellular data, using a mobile hotspot to connect Wi-Fi-only devices.

Satellite connectivity can also play a role in some cases, not least in less connected jurisdictions, although only the most up-to-date satellite systems have the capacity to serve as a complete substitute.

There is one last solution for cases where Wi-Fi cannot be used: national mobile roaming during network failure. Here, when one mobile network fails, its subscribers are distributed across the other mobile networks in the country until their home network comes back to life. This is effectively the model that Canada’s practically minded minister seeks to enshrine in commercial agreements between the country’s operators this month.

Technically, this solution is relatively easy to implement by giving subscribers a pre-programmed network ID in their SIM cards that they can roam to. The ID is only activated by an operator with a working network once a national network failure has been declared, and then deactivated once it is over.
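The activation rule described above can be pictured as a simple admission check. What follows is a minimal, hypothetical sketch and not any operator's actual implementation. It makes three assumptions: each SIM carries one dormant emergency roaming ID tied to its home network, the `EMERG-<network>` naming scheme is invented for illustration, and a host network admits foreign subscribers only while a failure of their home network is declared. Real deployments would handle this in operators' core networks and SIM provisioning, which are not modelled here.

```python
from dataclasses import dataclass, field


def emergency_id_for(network: str) -> str:
    # Hypothetical naming scheme for the pre-programmed emergency roaming ID.
    return f"EMERG-{network}"


@dataclass(frozen=True)
class Sim:
    home_network: str
    emergency_roaming_id: str  # pre-programmed at issue, dormant in normal operation


@dataclass
class HostNetwork:
    """An operator whose network is still working."""
    name: str
    # Emergency IDs currently activated on this network; empty in normal operation.
    active_ids: set[str] = field(default_factory=set)

    def declare_failure(self, failed_network: str) -> None:
        # Activate the failed network's emergency ID once a national failure is declared.
        if failed_network != self.name:
            self.active_ids.add(emergency_id_for(failed_network))

    def clear_failure(self, failed_network: str) -> None:
        # Deactivate the ID once the failed network is back in service.
        self.active_ids.discard(emergency_id_for(failed_network))

    def admits(self, sim: Sim) -> bool:
        # Own subscribers always attach; others only through an activated emergency ID.
        return sim.home_network == self.name or sim.emergency_roaming_id in self.active_ids


# Usage: a NetA subscriber tries NetB before, during and after a NetA outage.
sim = Sim("NetA", emergency_id_for("NetA"))
host = HostNetwork("NetB")
print(host.admits(sim))  # False: no failure declared
host.declare_failure("NetA")
print(host.admits(sim))  # True: emergency roaming active
host.clear_failure("NetA")
print(host.admits(sim))  # False: NetA recovered
```

Keeping the ID dormant until a failure is declared is what separates this scheme from general national roaming. Outside an outage, a foreign SIM is never admitted.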

There are challenges, such as ensuring the other networks are not overwhelmed by traffic, but these are soluble using throttling, reduced data rates or similar measures. The mechanism should also come with a stick: substantial penalties for any operator whose network fails, to discourage over-reliance on it.
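To illustrate how throttling might keep host networks from being overwhelmed, here is a minimal sketch under stated assumptions. It is not a method described by the author or used by any operator. It does two things. First, it splits the failed network's subscribers across working networks in proportion to each host's spare capacity, using largest-remainder rounding. Second, it sets a per-roamer rate cap so that, if a hypothetical `active_fraction` of roamers transfer data at once, their combined peak stays within that spare capacity. The network names, capacities and the 10% activity figure are illustrative and do not come from any real network.

```python
def allocate_roamers(roamers: int, spare_mbps: dict[str, float]) -> dict[str, int]:
    """Split a failed network's subscribers across working hosts in proportion
    to each host's spare capacity, using largest-remainder rounding."""
    total = sum(spare_mbps.values())
    if total <= 0:
        raise ValueError("no spare capacity on any host network")
    shares = {host: roamers * cap / total for host, cap in spare_mbps.items()}
    alloc = {host: int(share) for host, share in shares.items()}
    leftover = roamers - sum(alloc.values())
    # Give the subscribers lost to rounding to the hosts with the largest remainders.
    by_remainder = sorted(shares, key=lambda h: shares[h] - alloc[h], reverse=True)
    for host in by_remainder[:leftover]:
        alloc[host] += 1
    return alloc


def rate_cap_mbps(spare: float, assigned: int, active_fraction: float = 0.1) -> float:
    """Per-roamer throttle such that, if `active_fraction` of the assigned roamers
    are transferring data at once, their combined peak stays within `spare`."""
    if assigned == 0:
        return float("inf")
    return spare / (assigned * active_fraction)


# Usage with illustrative (hypothetical) capacities in Mbps.
alloc = allocate_roamers(1_000, {"NetB": 600.0, "NetC": 300.0, "NetD": 100.0})
print(alloc)  # {'NetB': 600, 'NetC': 300, 'NetD': 100}
print(rate_cap_mbps(600.0, alloc["NetB"]))  # 10.0
```

A proportional split means hosts with more headroom absorb more roamers, so no single network takes the full load. A lower rate cap trades roamers' speed for protecting the host's own subscribers.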

This solution is not costly to implement and, short of a simultaneous failure of multiple mobile networks, should mean that very few users are affected by, or even notice, the network failure.

The opposition to this option generally comes from operators who worry it will set a precedent leading to national roaming at all times, where one network has no coverage but others do. There are good reasons to avoid general national roaming and hence any emergency roaming would need to come with clear and ideally legally enforceable guarantees that it would not be the thin end of a wedge that inexorably led to wider roaming.

Conclusion

The modern world is built on high-speed, high-reliability networks. Beyond telecommunications and manufacturing, everything from trains to home thermostats and wristwatches relies on networks. Recent outages have shown how even a few hours of disruption can grind economies to a halt and, in some cases, endanger people’s safety. For this reason, we believe that resilience can only be achieved by investing in technological redundancy.

With a little care, and with suitable intermediaries who can bring together all stakeholders and help them reach a position that works for them, we can deliver extreme reliability at marginal cost. In doing so, we get ever closer to perfection without undermining the workability of the very good, and pave the way for safer, more reliable technology that works for every user.


William has over thirty years’ experience in technological communications. A former CTO of Ofcom, where he served for over seven years, he was also Director of Corporate Strategy at Motorola, based in Chicago, USA. He went on to become one of the founding directors of Neul, serving as CTO with responsibility for the overall technical design of an innovative new wireless technology, before the company was sold to Huawei in 2014. Latterly, he was CEO at Weightless SIG, which harmonised the technology as a global standard.

William is the author of 17 books, including “The 5G Myth”, “Spectrum Management” and “Our Digital Future”. He holds 18 patents and has published over 100 papers, in outlets ranging from learned journals to the Wall Street Journal. His biography is included in multiple “Who’s Who” publications around the world, and he has been honoured with lifetime achievement awards.

