Internet access for organizations today is no longer just about connectivity for email and web browsing. A stable Internet connection is a vital component in the chain of IT systems required to conduct business. In the past, the focus around Internet connectivity was typically on cost, with vendors providing solutions that allowed organizations to spread their traffic across consumer and enterprise products. This approach is all well and good and can provide significant cost savings, especially when employee traffic is directed over low-cost consumer products such as ADSL. However, when you conduct B2B business through front-end servers hosted in your DMZ, resilience becomes a major concern. In this scenario, a dead Internet link can mean lost revenue and, potentially more serious, brand damage. In this paper, we discuss several methods that can improve the resilience of an Internet link. While this sounds like it should be a simple case of connecting to multiple Internet Service Providers, the devil, as they say, is in the detail.
Business networks have been mission-critical for some time now, and resilience and business continuity have long been top of mind for any CIO. However, this focus was generally restricted to internal networks and systems. With more and more business being conducted either directly via the web or B2B over Internet links to systems hosted in DMZs, it is simply no longer acceptable for an Internet link to be down. Loss of Internet access can directly impact revenue generation, especially as business operating models shift towards off-site cloud computing and software as a service.
A solution to the problem
Multihoming is essentially a method whereby a company connects to more than one ISP at the same time. The concept was born out of the need to protect Internet access against either an ISP link failure or an internal ISP failure. In the earlier days of Internet access, most traffic was outbound, with the exception of email. An Internet link failure left internal users unable to browse and left email backing up on the ISP's inbound mail gateways. Once the link was restored, browsing and email delivery resumed. The direct impact on the business was relatively small and seldom affected revenue. Early solutions to this problem connected multiple links to the same ISP, but while this offered some level of link resilience, it provided no safeguard against an internal ISP failure.
Today, however, most organizations deploy a myriad of on-site Internet-accessible services such as VPNs, voice services, webmail, and secure internal system access, while also making use of business-critical off-site services such as software as a service (SaaS) and other cloud-based solutions. Furthermore, while corporate front-end websites are traditionally hosted off-site with web hosting firms, the real-time information on corporate websites and B2B sites is provided by back-end systems based in the corporate data center or DMZ. Without a reliable, good-quality Internet connection, these vital links would be severed.
Varied requirements and complexity
Multihoming requirements vary and can range from the simple need for geographic link diversity (single ISP) to full link and ISP resilience, where separate links are run from separate data centers to different ISPs. The latter is the most complex deployment option but affords the highest availability. The former provides some degree of protection but does require a higher grade of ISP.
A major component of the complexity lies in IP addressing. The Internet IP addressing system works because each ISP applies for a range of addresses from the Regional Internet Registry (RIR) that serves its region. The ISP then allocates a range of IP addresses, called an address space, to each customer from this pool. It goes without saying that no two ISPs can issue the same address space to a customer.
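The allocation model above can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch only: the two ISP pools are hypothetical and drawn from the RFC 5737 documentation ranges, and the /28 customer block size is an arbitrary assumption. It shows each ISP carving customer address spaces out of its own pool, and why spaces from two distinct pools can never collide.

```python
import ipaddress
from itertools import islice

# Hypothetical ISP pools, taken from the RFC 5737 documentation ranges.
ISP_A_POOL = ipaddress.ip_network("198.51.100.0/24")
ISP_B_POOL = ipaddress.ip_network("203.0.113.0/24")


def allocate(pool, prefixlen, count):
    """Carve the first `count` customer address spaces of size /prefixlen out of an ISP pool."""
    return list(islice(pool.subnets(new_prefix=prefixlen), count))


def any_overlap(spaces_a, spaces_b):
    """True if any address space in one list overlaps any address space in the other."""
    return any(a.overlaps(b) for a in spaces_a for b in spaces_b)


isp_a_customers = allocate(ISP_A_POOL, 28, 3)
isp_b_customers = allocate(ISP_B_POOL, 28, 3)

for space in isp_a_customers + isp_b_customers:
    print(space)

# Because each ISP received a distinct pool, their customer spaces never overlap.
print("overlap:", any_overlap(isp_a_customers, isp_b_customers))
```

Real allocations are governed by registry policy and vary in size; the point here is simply that an address space belongs to the pool it was carved from.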
Why would this be a problem? Simply put, it is all about routing. Routing is the process whereby the Internet finds out how to get traffic to your particular server; it is a bit like Google Maps for the Internet. For somebody to find your server, a “route”, or path, needs to exist to the IP address of your server. Since you get your Internet service, and hence your IP address space, from your ISP, the ISP is responsible for publishing the route to your server across the entire Internet. It is effectively the source of your route, and nobody else can do that for your particular address space. You can see how things can go wrong if the ISP suffers some form of internal failure: if your route disappeared, your server would vanish from the Internet, even if your Internet link was up and running. This is precisely the kind of issue multihoming tries to solve, but for completeness we will start with the simpler options and work our way up.
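To make the idea concrete, here is a toy routing table in Python. It is a simplified sketch, not how production routers store routes: the `RoutingTable` class, the ISP names, and the prefix (an RFC 5737 documentation range) are all hypothetical. Lookups use longest-prefix match, the rule routers use to pick the most specific route. The example shows that reachability depends entirely on a route being published: once the prefix is withdrawn, lookups fail even though nothing about the server or its link has changed.

```python
import ipaddress


class RoutingTable:
    """Toy routing table: prefix -> next hop, resolved by longest-prefix match."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        """Install (or replace) a route, as when an ISP publishes a customer's prefix."""
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        """Remove a route, as when an ISP stops publishing a prefix."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop of the most specific matching prefix, or None if no route exists."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


internet = RoutingTable()
internet.announce("198.51.100.0/24", "ISP-A")  # hypothetical customer prefix published by ISP A
print(internet.lookup("198.51.100.10"))        # ISP-A: the server is reachable

internet.withdraw("198.51.100.0/24")           # ISP A suffers an internal failure
print(internet.lookup("198.51.100.10"))        # None: the server has vanished, link or no link
```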
Single Link, Single ISP, Multiple Address Spaces
While not a multihoming solution in the strictest sense of the term, the single-link, multiple-address option can be useful for small sites. In this scenario, the publicly accessible host is assigned two IP addresses from two different address spaces, which of course means obtaining two address spaces from your ISP. In theory, if a routing issue impacts one of the address spaces, the other may still be available. However, the single physical ISP link remains a single point of failure, so this option offers little in the way of real resilience.
Multiple Links, Single ISP, Single Address Space per Link
This scenario, generally called multi-attached, is a variation on the above. The site now connects through multiple links, each with a different IP address space, but still via a single ISP. If one of the links fails, the addresses in its address space become unreachable. However, the server's address from the remaining link's address space will still be available, and the server remains reachable. Internet Service Providers manage their IP routes with a routing protocol called the Border Gateway Protocol (BGP), and BGP is used here to re-route traffic over the live link. BGP can be complex and demands a lot from the equipment it runs on, and with complexity comes cost. However, the BGP deployment for this scenario is not as onerous as for a fully multihomed site and should not attract too much attention from the CFO. While this deployment is a simpler version of full multihoming, it does restrict the organization to a single ISP, which may not fit the business's strategic intent.
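The multi-attached behaviour described above can be sketched as a simple reachability check. This models only the basic case in the text: the link names, the two /28 address spaces (RFC 5737 documentation range), and the server addresses are hypothetical. Clients still need some way to find the surviving address, for example through DNS, which this sketch does not model.

```python
import ipaddress

# Hypothetical multi-attached site: two links to the same ISP, each with its own address space.
LINKS = {
    "link-1": ipaddress.ip_network("198.51.100.0/28"),
    "link-2": ipaddress.ip_network("198.51.100.16/28"),
}

# The public server holds one address from each link's address space.
SERVER_ADDRESSES = [
    ipaddress.ip_address("198.51.100.5"),
    ipaddress.ip_address("198.51.100.20"),
]


def reachable_addresses(addresses, links, links_up):
    """Return the server addresses whose link is still up (and therefore still routed)."""
    live_spaces = [space for name, space in links.items() if name in links_up]
    return [addr for addr in addresses if any(addr in space for space in live_spaces)]


print(reachable_addresses(SERVER_ADDRESSES, LINKS, {"link-1", "link-2"}))
print(reachable_addresses(SERVER_ADDRESSES, LINKS, {"link-2"}))  # link-1 has failed
```

With both links up, both addresses are reachable; after link-1 fails, only 198.51.100.20 remains.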
Multiple Links, Multiple ISPs, Single Address Space
This scenario is what is generally meant by multihoming. BGP is used to maintain the visibility of the single address space, and thus its routes, across the multiple links and ISPs. BGP sessions also run between the corporate routers and those of both ISPs, so the protocol can detect a link failure and divert traffic to the functioning link, even if this means using a different ISP's network.
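The failover behaviour can be illustrated with a heavily simplified model of a BGP speaker. This is a sketch, not a BGP implementation: it keeps routes per neighbor, treats a dropped session as withdrawing everything learned over it, and applies only two steps of the real decision process (highest LOCAL_PREF, then shortest AS_PATH). The `Route` and `BgpSpeaker` names, the ISP labels, and the ASNs (from the RFC 5398 documentation range) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    neighbor: str          # the ISP session this route was learned over
    as_path: tuple         # ASNs the announcement has traversed
    local_pref: int = 100  # operator-set preference; higher wins


class BgpSpeaker:
    """Simplified BGP speaker: per-neighbor routes plus a two-step best-path choice."""

    def __init__(self):
        self.adj_rib_in = {}  # neighbor -> routes currently learned from that neighbor

    def session_up(self, neighbor, routes):
        self.adj_rib_in[neighbor] = list(routes)

    def session_down(self, neighbor):
        # When a BGP session drops, every route learned over it is treated as withdrawn.
        self.adj_rib_in.pop(neighbor, None)

    def best_path(self, prefix):
        candidates = [r for routes in self.adj_rib_in.values() for r in routes if r.prefix == prefix]
        if not candidates:
            return None
        # Highest LOCAL_PREF first, then shortest AS_PATH (a small subset of the real decision process).
        return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


router = BgpSpeaker()
# ISP A is configured as the preferred exit through a higher LOCAL_PREF.
router.session_up("ISP-A", [Route("192.0.2.0/24", "ISP-A", (64500, 64510), local_pref=200)])
router.session_up("ISP-B", [Route("192.0.2.0/24", "ISP-B", (64501, 64502, 64510))])

print(router.best_path("192.0.2.0/24").neighbor)  # ISP-A
router.session_down("ISP-A")                      # the link to ISP A fails
print(router.best_path("192.0.2.0/24").neighbor)  # ISP-B
```

In a real router, the session loss would be detected by the BGP hold timer or a link-down event, and the same logic applies in reverse for inbound traffic: each ISP stops or continues propagating the organization's prefix depending on whether its session is alive.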
What’s the catch?
There is always a catch, and in this case there are several. To run true dual-ISP multihoming with BGP, an organization needs its own Provider Independent (PI) IP address space and must apply for a unique BGP Autonomous System Number (ASN). The ASN identifies your network to BGP as a distinct routing domain, or autonomous system, on the Internet. While applying for an ASN is not an onerous undertaking, it places significant responsibility squarely with you instead of the ISP. Deploying BGP effectively brings your organization one step closer to the Internet by making you responsible for advertising your own public IP address spaces and, thus, your routes. It also means that any operational mistakes you make can ripple, sometimes spectacularly, through the entire Internet.
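One common safeguard against such mistakes is a strict outbound filter, so the organization only ever advertises its own address space. The sketch below shows the idea under stated assumptions: the PI block (RFC 5737 documentation range), the `export_allowed` helper, and the exact-match policy are hypothetical choices, and real routers express this with prefix lists and AS-path filters in their own configuration languages. It permits only locally originated announcements (an empty AS_PATH before the organization's own ASN is prepended) of the organization's own prefix, so routes learned from one ISP are never leaked to the other.

```python
import ipaddress

# Hypothetical Provider Independent block owned by the organization.
OWN_PREFIXES = {ipaddress.ip_network("203.0.113.0/24")}


def export_allowed(prefix, as_path, own_prefixes=OWN_PREFIXES):
    """Decide whether an announcement may be sent to an upstream ISP.

    Only our own prefixes (exact match), originated locally (empty AS_PATH before
    we prepend our ASN), may leave the network. Anything learned from one ISP is
    kept away from the other, so we never become a transit path between them.
    """
    return ipaddress.ip_network(prefix) in own_prefixes and len(as_path) == 0


print(export_allowed("203.0.113.0/24", ()))         # True: our own PI block
print(export_allowed("198.51.100.0/24", (64500,)))  # False: learned from ISP A, must not leak to ISP B
print(export_allowed("203.0.113.128/25", ()))       # False: more-specifics are not on the list
```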
Address space considerations
Most large organizations that operate true multihoming already have their own Provider Independent address space. This is an address space they requested directly from the Internet registry for their region some time ago, before IP version 4 (IPv4) addresses started running out. Today it is virtually impossible to be allocated PI address space from the IPv4 pool. It is possible to run a multihomed scenario using ISP-provided IP address spaces; however, the network configuration becomes considerably more complex and, at some point, starts defeating the end goal of increasing resilience. In the real world, increased complexity seldom equates to improved resilience.
A true BGP-enabled multihoming deployment (often described as running default-free) requires hardware capable of storing Internet-scale routing tables. This is desirable because holding full routes from each ISP protects the organization from an internal ISP failure: if one provider loses a route, the router still knows the path via the other. However, it requires the on-site routers to be “carrier-grade”, in other words, big and beefy. The Internet routing tables are massive, and running default-free demands considerable processing power and memory. It is possible to run in a reduced-route mode, where only local prefixes are stored on the routers, typically alongside a default route. Still, given the effort and expense of deploying a full multihomed solution, compromise should not really be part of the conversation.
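The difference between carrying full tables and relying on a default route can be shown with a small exit-selection function. This is a simplified sketch: the `choose_exit` helper, the ISP names, and the prefixes (RFC 5737 documentation ranges) are hypothetical, and ties between equally specific routes simply go to the first ISP listed. In the scenario, ISP A has internally lost its route to 192.0.2.0/24 and has therefore stopped sending it; with full tables the router can see that only ISP B still reaches that destination, whereas a router holding only a default towards ISP A keeps sending traffic into ISP A's failure.

```python
import ipaddress


def choose_exit(destination, tables, default_exit=None):
    """Pick the ISP holding the longest matching prefix for `destination`.

    `tables` maps ISP name -> prefixes that ISP is currently sending us. With full
    (default-free) tables the choice follows real reachability; with no specific
    routes the router can only fall back to `default_exit`.
    """
    addr = ipaddress.ip_address(destination)
    best_isp, best_len = None, -1
    for isp, prefixes in tables.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if addr in net and net.prefixlen > best_len:
                best_isp, best_len = isp, net.prefixlen
    return best_isp if best_isp is not None else default_exit


# Hypothetical failure: ISP A no longer carries a route to 192.0.2.0/24.
full_tables = {
    "ISP-A": ["198.51.100.0/24"],
    "ISP-B": ["198.51.100.0/24", "192.0.2.0/24"],
}
print(choose_exit("192.0.2.10", full_tables))               # ISP-B: only B can still reach it
print(choose_exit("192.0.2.10", {}, default_exit="ISP-A"))  # ISP-A: default-only, traffic is likely dropped
```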
While there are definite advantages to full multihoming, there are also some significant caveats. Complexity and scaling aside, the business reasons for multihoming and its true costs should be weighed carefully.
That said, there is no better way to ensure high availability and performance for a highly Internet-dependent organization than true multihoming. It requires a Provider Independent IP address space and carrier-grade routers deployed in a geographically diverse manner, supported by suitably qualified staff. When revenue is generated directly through the Internet, the benefits can outweigh the costs, and multihoming can be seen as a strategic business initiative and something that helps CIOs sleep well at night.