Rogers Communications Inc. says many of its own employees were knocked offline and unable to immediately deal with a massive network outage on July 8 that affected millions of its wireless and wireline customers.
In a letter of explanation demanded by regulators at the Canadian Radio-television and Telecommunications Commission, Rogers described the outage as “unprecedented” and said its engineers and technical experts “are continuing to work alongside … global equipment vendors to fully explore the root cause and its effects.”
Rogers executives including chief executive Tony Staffieri and newly appointed chief technology officer Ron McKenzie are expected to face further grilling Monday at a meeting of the Standing Committee on Industry and Technology about the day-long national outage that hampered services from 911 calls to the Interac payments system.
In the early stage of the outage, many of Rogers’ network employees “could not connect to … IT and network systems,” which “impeded initial triage and restoration efforts,” Rogers said in the letter to the CRTC.
“To complicate matters further, the loss of access to our VPN system to our core network nodes affected our timely ability to begin identifying the trouble and, hence, delayed the restoral efforts,” the company said.
Some employees did have access through “emergency SIMs” on alternate telecommunications carriers Telus and Bell, a practice established through reciprocal agreements in 2015, while others travelled to centralized locations to establish network access.
“Together, these groups were able to establish the necessary team to identify the cause of the outage and recover the network,” Rogers said in the partially redacted letter. However, it took most of the day to re-establish service to customers, and some sporadic problems continued through the weekend.
In the letter, Rogers expanded on earlier statements blaming the outage on a network system failure following an update in its core IP network.
During the sixth phase of a seven-phase process that had begun weeks earlier (the first five phases of which Rogers says proceeded without incident), coding was introduced in the telco’s distribution routers that triggered the failure of the IP core network, starting at 4:45 am on July 8.
A routing filter was deleted, which allowed for all possible routes to the Internet to pass through the routers, resulting in “abnormally high volumes of routes throughout the core network,” Rogers said.
“Certain network routing equipment became flooded, exceeded their capacity levels and were then unable to route traffic, causing the common core network to stop processing traffic.”
As a result, the Rogers network lost connectivity to the Internet for all incoming and outgoing traffic for both the wireless and wireline networks for consumers and business customers.
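The failure mode Rogers describes can be illustrated with a toy model. The sketch below is not Rogers' configuration or any vendor's software; the `Router` class, the `distribute` function, the route counts and the capacity figure are all hypothetical, chosen only to show how removing a filter can push more routes into a core device than it can hold.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy core router with a bounded routing table (hypothetical model)."""
    capacity: int
    table: set = field(default_factory=set)
    overloaded: bool = False

    def receive(self, routes):
        # Install every advertised route, then check against capacity.
        self.table.update(routes)
        if len(self.table) > self.capacity:
            self.overloaded = True


def distribute(routes, allowed=None):
    """Routes a distribution router passes to the core.

    `allowed` is the routing filter; None models the filter being deleted,
    in which case every route passes through.
    """
    if allowed is None:
        return list(routes)
    return [r for r in routes if r in allowed]


# Illustrative numbers only: 5,000 external routes, a filter admitting 100,
# and a core router that can hold 1,000.
internet_routes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(5000)]
route_filter = set(internet_routes[:100])

with_filter = Router(capacity=1000)
with_filter.receive(distribute(internet_routes, route_filter))

without_filter = Router(capacity=1000)
without_filter.receive(distribute(internet_routes, None))

print(with_filter.overloaded, len(with_filter.table))
print(without_filter.overloaded, len(without_filter.table))
```

In this model the filtered router stays within its limit, while the unfiltered one receives every route and exceeds it, which mirrors in miniature the flooding Rogers describes.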
For reasons that weren’t fully explained in the letter, at 6 am, Rogers’ chief technology officer reached out to counterparts at Bell and Telus advising them of the issue Rogers was having “and also to watch-out for possible cyber-attacks.”
The widespread outage affected Rogers customers classified as “critical infrastructure,” such as hospitals, and gas and energy providers, Rogers said in the letter, adding that it is not known whether these customers were “fully impaired or if they had some degree of dual-carriers diversity that protected them from full disablement.”
Both Bell and Telus offered assistance during the outage, according to the letter to regulators.
“However, given the nature of the issue, Rogers rapidly assessed and concluded that it was not possible to make the necessary network changes to enable our wireless customers to move to their wireless networks,” the telco said, adding that it was unable to access its user database and home subscriber server during the outage.
“Furthermore, given the national nature of this event, no competitor’s network would have been able to handle the extra and sudden volume of (more than 10 million) wireless customers … and the related voice/data traffic surge.”
However, Rogers said it would explore, along with other Canadian carriers, how they could work together to avoid future widespread outages. A Memorandum of Understanding on such cooperation is to be delivered in September to the Minister of Innovation, Science and Economic Development.
Rogers also pledged to segregate its wireless and wireline core networks to avoid a repeat of the widespread July 8 outage. In addition, the telco said it has hired an external review team to do a complete evaluation of all processes, including the performance of network upgrades, disaster recovery procedures, and communication with the public.
A previously announced credit equivalent to five days of service fees for all residential and small business customers will be applied automatically to customer accounts as of Aug. 1, Rogers said. The end-users of affected resellers will also be credited.
“While the outage for most customers was approximately a day, Rogers wished to demonstrate our commitment to our customers and recognize how we let them down that day,” the letter said.
The telco apologized for failing to live up to its promise as Canada’s most reliable network, and a signed introduction by chief regulatory officer Ted Woodhead said the company was “particularly troubled that some customers could not reach emergency services or receive alerts” during the outage.