An update on our September 30 BGP issue
Earlier this week, a technical error meant that some internet traffic was incorrectly routed through our network. While only minimal traffic was actually received, thanks to protection mechanisms in place around the internet, we’re investigating and will share our findings.
How the issue occurred
For just under three hours on September 30, between 03:46 and 06:32 AEST, an incorrect network configuration was deployed to one of our Telstra Edge routers, which form part of our link to the global internet and handle traffic routing. That configuration incorrectly advertised 500 IPv4 prefixes as belonging to Telstra, unintentionally routing some internet traffic through our network. You may have seen some information on this incident already from the email service provider ProtonMail.
The incident was triggered by post-verification testing that was being run to address an unrelated software bug in our Telstra Internet Direct provisioning tools. A prefix set from a previous verification test was incorrectly loaded against a production service, which meant those prefixes were then announced to the global internet through Border Gateway Protocol (BGP).
How BGP works and why this happened
In simple terms, BGP supplies all the different networks that make up the internet with a quick guide to the fastest routes to each corner of that combined network. BGP peers regularly announce updated routes to each other, and those updates propagate from peer to peer. It’s that system that allows the autonomous systems making up the internet to find the fastest and most efficient routes for data to travel. In this case, the incorrect configuration was announced to peers that then adopted it and announced it to others, amplifying the technical error.
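For readers who want to see the mechanics, here is a deliberately simplified sketch (not real router software, using made-up AS numbers and a documentation prefix) of how one incorrect announcement can be adopted by a peer and re-announced onwards:

```python
# Deliberately simplified model of BGP announcement propagation. Real BGP runs
# over TCP sessions between routers and applies far richer policy and
# path-selection logic; this only illustrates how an update spreads.

class AutonomousSystem:
    def __init__(self, asn):
        self.asn = asn
        self.peers = []     # directly connected neighbours
        self.routes = {}    # prefix -> AS path currently in use

    def peer_with(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def announce(self, prefix, as_path=None):
        """Originate or re-announce a route to every peer."""
        as_path = [self.asn] if as_path is None else [self.asn] + as_path
        for peer in self.peers:
            peer.receive(prefix, as_path)

    def receive(self, prefix, as_path):
        if self.asn in as_path:
            return          # loop prevention: ignore paths containing our own ASN
        current = self.routes.get(prefix)
        # Simplified best-path rule: prefer the shortest AS path.
        if current is None or len(as_path) < len(current):
            self.routes[prefix] = as_path
            self.announce(prefix, as_path)   # pass the update on to our peers


a, b, c = AutonomousSystem(64500), AutonomousSystem(64501), AutonomousSystem(64502)
a.peer_with(b)
b.peer_with(c)
a.announce("192.0.2.0/24")   # AS 64500 originates a prefix it does not own
print(c.routes)              # {'192.0.2.0/24': [64501, 64500]} - the leak has spread
```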
BGP is open and relies on trust by design, but various protection mechanisms are implemented in networks around the world to protect against unsolicited or invalid changes. Our investigation indicates minimal traffic was observed in our network during the period, and it’s our understanding that these protection measures internationally helped to reduce the overall routing impact. We’ve been supportive of industry changes such as RPKI Route Origin Validation to address some of the known vulnerabilities of BGP, and we have implemented RPKI on our own Australian network and are in the process of rolling it out to our global networks.
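As a rough illustration of what Route Origin Validation checks (the prefixes, AS numbers and the single example ROA below are made up; in practice a dedicated validator fetches signed ROAs and feeds the results to routers), an announcement is classified by comparing its origin AS against the ROAs that cover the prefix:

```python
# Minimal sketch of an RPKI Route Origin Validation (ROV) decision.
# A ROA says: "this prefix (up to max_length) may only be originated by this ASN".
from ipaddress import ip_network

ROAS = [
    {"prefix": ip_network("203.0.113.0/24"), "max_length": 24, "asn": 64496},
]

def rov_state(announced_prefix: str, origin_asn: int) -> str:
    prefix = ip_network(announced_prefix)
    covered = False
    for roa in ROAS:
        if prefix.subnet_of(roa["prefix"]):
            covered = True
            if origin_asn == roa["asn"] and prefix.prefixlen <= roa["max_length"]:
                return "valid"
    # Covered by a ROA but wrong origin (or too specific) -> invalid, drop it.
    # Not covered by any ROA -> unknown, typically still accepted.
    return "invalid" if covered else "unknown"

print(rov_state("203.0.113.0/24", 64496))   # valid   - legitimate origin
print(rov_state("203.0.113.0/24", 64511))   # invalid - misoriginated/leaked route
print(rov_state("198.51.100.0/24", 64511))  # unknown - no ROA covers this prefix
```

A network that enforces ROV drops the “invalid” announcement, which is how RPKI limits the spread of leaks like this one.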
It’s important to understand that the root cause of this interruption was not malicious in nature, the routes were not intentionally hijacked, and no emails or data were breached or lost. As soon as our investigation revealed the cause of the issue, we engaged our Level 3 network staff, who rolled back the network configuration to resolve the impact.
What we’re doing about it
To stop this from happening again, we’ve temporarily disabled the provisioning testing tools involved until we can be confident the issue can’t recur. We’re also modifying our route validation to prohibit the kind of bulk upload of static routes that was the initial cause of this issue.
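As a purely hypothetical illustration of that kind of pre-deployment check (the function, threshold and allowlist below are illustrative assumptions, not our actual tooling), a change would be rejected if it contained a bulk upload or any prefix outside an approved list:

```python
# Hypothetical pre-deployment validation for static route changes.
from ipaddress import ip_network

APPROVED_PREFIXES = {ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")}
BULK_LIMIT = 10  # example threshold for a single change

def validate_static_route_change(prefixes):
    """Return the reasons to reject the change; an empty list means it may proceed."""
    errors = []
    if len(prefixes) > BULK_LIMIT:
        errors.append(f"bulk upload of {len(prefixes)} routes exceeds limit of {BULK_LIMIT}")
    for p in prefixes:
        if ip_network(p) not in APPROVED_PREFIXES:
            errors.append(f"{p} is not in the approved prefix list")
    return errors

# A test prefix set accidentally targeted at a production service is rejected:
print(validate_static_route_change(["203.0.113.0/24"]))
```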
We’d like to apologise for any service issues experienced by other parties. We’re undertaking a post-incident review to examine our testing, validation and network escalation paths, and we will share our findings with the global internet network operators’ technical community when that review is complete.