Resiliency
Building Highly Resilient Networks in Network Edge - Part 1 - Architecture
This is the first part of a series of posts highlighting the best practices for customers who desire highly resilient networks in Network Edge. This entry focuses on the foundational architecture of Plane A and Plane B in Network Edge and how these building blocks should be utilized for resiliency. For more information on Network Edge and device plane resiliency, please refer to the Equinix Docs page here.

Dual Plane Architecture

Network Edge is built upon a standard data center redundancy architecture with multiple pods that have dedicated power supplies and a dedicated Top of Rack (ToR) switch. It consists of multiple compute planes, commonly referred to as Plane A and Plane B for design simplicity. Plane A connects to the Primary Fabric network and Plane B connects to the Secondary Fabric network. This is the most important concept for understanding Network Edge resiliency: the device plane determines which Fabric switch is used for device connections. Future posts will dive much deeper into the various ways that Network Edge devices connect to other devices, clouds, and colocation.

While referred to as Plane A and Plane B, in reality there are multiple compute planes in each NE pod; the actual number varies based on the size of the metro. This allows devices to be deployed in a manner where they are not co-mingled on the same compute plane, eliminating compute as a single point of failure.

Network Edge offers multiple resiliency options that can be summarized as device options and connection options:

- Device options are local and provide resiliency against failures at the compute and device level in the local metro. This is analogous to what the industry refers to as "High Availability (HA)".
- Connection resiliency is a separate option for customers that require additional resiliency with device connections (DLGs, VCs, and EVP-LAN networks). This will be discussed in depth in separate sections.
- It is common to combine both local resiliency and connection resiliency, but it is not required; ultimately it depends on the customer's requirements.
- Geo-redundancy is an architecture that expands on local resiliency by utilizing multiple metros to eliminate issues that may affect NE at the metro level (not covered in this post).

Single Devices

- Single devices have no resiliency for compute and device failures.
- The first single device is always connected to the Primary Compute Plane.
- Single devices always make connections over the Primary Fabric network.
- Single devices can convert to Redundant Devices via the anti-affinity feature.

A device's plane assignment can also be checked programmatically, as in the sketch below.
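Below is a minimal Python sketch of that programmatic check against the Equinix API. The OAuth token flow follows the published developer docs, but the device-list response shape and the `plane` field name are assumptions to verify against the current Network Edge API reference.

```python
import requests

API = "https://api.equinix.com"


def get_token(client_id: str, client_secret: str) -> str:
    """Obtain an OAuth2 bearer token via the client-credentials grant."""
    resp = requests.post(
        f"{API}/oauth2/v1/token",
        json={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]


def devices_by_plane(token: str) -> dict:
    """Group the account's Network Edge devices by compute plane."""
    resp = requests.get(
        f"{API}/ne/v1/devices",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    planes: dict = {}
    # The "data" wrapper and "plane" field name are assumptions; the portal
    # exposes the same value in the "Plane" column of the NE Admin Dashboard.
    for dev in resp.json().get("data", []):
        planes.setdefault(dev.get("plane", "unknown"), []).append(dev.get("name"))
    return planes


if __name__ == "__main__":
    token = get_token("<client-id>", "<client-secret>")
    for plane, names in devices_by_plane(token).items():
        print(plane, names)
```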
Anti-Affinity Deployment Option

By default, single devices have no resiliency. However, single devices can be placed in divergent compute planes. This is commonly called anti-affinity and is part of the device creation workflow:

- Selecting the "Diverse From" box allows customers to add new devices that are resilient to each other.
- Customers can verify this by viewing their NE Admin Dashboard and sorting their devices by "Plane".
- This feature allows customers to convert a single-device install to Redundant Devices.
- By default, the first single device was deployed on the Primary Fabric. The actual compute plane is irrelevant until the 2nd device is provisioned.
- The 2nd device is deployed on the Secondary Fabric AND in a different compute plane than the first device.

Resilient Device Options

These options provide local (intra-metro) resiliency to protect against hardware or power failures in the local Network Edge pod. By default, the two virtual devices are deployed in separate compute planes (A and B). In reality there are more than two compute planes, but they are distinct from each other. The primary device is connected to the Primary Fabric network and the secondary/passive device is connected to the Secondary Fabric network.

| | Redundant Devices | Clustered Devices |
| --- | --- | --- |
| Deployment | Two devices, both Active, appearing as two devices in the Network Edge portal. Both devices have all interfaces forwarding. | Two devices, only one is ever Active. The Passive (non-Active) device data plane is not forwarding. |
| WAN Management | Both devices get a unique L3 address that is active for WAN management. | Each node gets a unique L3 address for WAN management, as well as a Cluster address that connects to the active node (either 0 or 1). |
| Device Linking Groups | None are created at device inception. | Two are created by default to share config synchronization and failover communication. |
| Fabric Virtual Connections | Connections can be built to one or both devices. | Single connections are built to a special VNI that connects to the Active Cluster node only. Customers can create optional, additional secondary connection(s). |
| Supports Geo-Redundancy? | Yes, Redundant Devices can be deployed in different metros. | No, Clustered Devices can only be deployed in the same metro. |
| Vendor Support | All vendors. | Fortinet, Juniper, NGINX and Palo Alto. |

The next post will cover the best practices for creating resilient device connections with Device Link Groups and can be found here. A scripted sketch of deploying a Redundant Device pair follows below.
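To make the Redundant Device option concrete, here is a hedged Python sketch of requesting a primary/secondary pair in a single API call. The `/ne/v1/devices` endpoint appears in the Network Edge API, but the specific payload fields shown (`deviceTypeCode`, `secondary`, and so on) are illustrative assumptions; consult the current API reference before use.

```python
import requests

API = "https://api.equinix.com"
TOKEN = "<oauth2-bearer-token>"  # see the token sketch in Part 1 above

# Hypothetical payload for creating a Redundant Device pair in one request.
# Field names and values here are illustrative, not authoritative.
payload = {
    "deviceTypeCode": "CSR1000V",      # example vendor/type code
    "metroCode": "DC",                 # both devices land in this metro
    "packageCode": "SEC",
    "termLength": 12,
    "virtualDeviceName": "edge-primary",
    "notifications": ["netops@example.com"],
    "secondary": {
        # Network Edge places the secondary on a different compute plane and
        # the Secondary Fabric automatically; the plane is not user-selected.
        "virtualDeviceName": "edge-secondary",
        "notifications": ["netops@example.com"],
    },
}

resp = requests.post(
    f"{API}/ne/v1/devices",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # expected to include UUIDs for both devices
```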
Building Highly Resilient Networks in Network Edge - Part 3 - EVP-LANs

This is the third part of a series of posts highlighting the best practices for customers who require highly resilient networks in Network Edge. This entry highlights how to build resilient EVP-LAN connections in Network Edge.

- EVP-LAN is another method to connect Network Edge devices.
- EVP-LANs differ from DLGs in that they connect to Network Edge devices and Fabric ports.
- All EVP-LAN connections go to the Fabric, even for devices in the same metro; this is in contrast to DLGs, which can be local or remote.
- As of November 2023, multiple NE devices in the same metro can be part of the same EVP-LAN network, removing the previous restriction of a single NE device per metro.
- Customers that require maximum resiliency should deploy additional EVP-LANs that span both the Primary and Secondary Fabric networks. The same logic applies for EVP-LANs: they should be spread across the Primary and Secondary Fabric planes.
- The current maximum bandwidth for all EVP-LAN connections in the same metro is 10 Gbps. This may change in the future.

Redundant Devices

- For proper resiliency, each device in the Redundant Device pair will require a single connection to two different EVP-LAN networks.
- The Primary device on the Primary plane will use the Primary Fabric switch.
- The Secondary device on the Secondary plane will use the Secondary Fabric switch.

Clustered Devices

- Clustered devices are different in that the workflow allows connections to be built to either the Primary or Secondary Fabric.
- For proper resiliency, each node in the Cluster will require a single connection to two different EVP-LAN networks.

The next post will cover the best practices for creating resilient Fabric Virtual Connections. You can read the first part of this series here. A sketch of scripting the two-EVP-LAN pattern follows below.
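As a rough illustration of the "two EVP-LANs, one per plane" pattern above, the Python sketch below creates a pair of EVP-LAN networks via the Fabric v4 networks API. The endpoint and payload shape are assumptions modeled on the public Fabric documentation; verify field names and allowed values before use.

```python
import requests

API = "https://api.equinix.com"
TOKEN = "<oauth2-bearer-token>"


def create_evplan(name: str) -> str:
    """Create one EVP-LAN (Fabric network); returns its UUID."""
    resp = requests.post(
        f"{API}/fabric/v4/networks",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "type": "EVPLAN",        # network type; value assumed
            "name": name,
            "scope": "LOCAL",        # single-metro network; value assumed
            "notifications": [
                {"type": "ALL", "emails": ["netops@example.com"]}
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["uuid"]


# One EVP-LAN per device leg. Which Fabric plane each connection actually
# rides on is determined by the device's plane, as described above.
lan_one = create_evplan("evplan-leg-1")
lan_two = create_evplan("evplan-leg-2")
print(lan_one, lan_two)
```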
Building Highly Resilient Networks in Network Edge - Part 2 - Device Link Groups

This is the second part of a series of posts highlighting the best practices for customers who require highly resilient networks in Network Edge. This entry highlights how to build resilient Device Link Group (DLG) connections. For more information on Device Link resiliency, please refer to the Equinix Docs page here.

- Device Link Groups are used to connect Network Edge device interfaces to other Network Edge device interfaces.
- They can be local (same metro) or remote, and are a shared broadcast domain.
- A single DLG, like a single Ethernet cable, provides no resiliency.
- Local DLGs do not traverse the Equinix Fabric and aren't susceptible to Fabric switch outages.
- Hybrid DLGs have some devices connected to the Primary Fabric plane and others to the Secondary.
- Customers that require maximum resiliency should deploy additional DLGs that connect to both the Primary and Secondary Fabric networks.
- Redundant Devices can have one or both devices connect to the DLG.
- Clustered devices act as one logical device; therefore the cluster is represented as a single entity.

Each device will build connections to the plane to which it is connected, as shown here:

- Redundant Devices show as individual entities in the DLG workflow.
- Clustered Devices are represented as a single logical entity, connecting the DLG to the active node (node 0 is active by default).

In October 2023 a new feature was released that allows customers to choose which Fabric plane the DLG is connected to: Primary or Secondary. Previously, all connections were made over the Primary Fabric plane by default. Customers now have control over the level of resiliency for their DLGs. Single DLGs will have no resiliency, but customers may opt to deploy a 2nd, parallel DLG for connections that are business-critical. The 2nd DLG will incur additional Fabric charges and will consume an additional interface on every connected Network Edge device.

We do not see value in creating multiple, local (same metro) DLGs for resiliency, but we do not restrict customers who want to deploy in this manner. Local DLGs do not traverse the Fabric switches and incur no charges today. Customers now have the control to deploy resiliency with DLGs based on their requirements and budget, as shown in the sketch below.

The next post will cover the best practices for creating resilient EVP-LAN connections. You can read the first part of this series here.
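The parallel-DLG pattern described above might be scripted roughly as follows. This is a sketch only: the `/ne/v1/links` path is taken from the Device Link API, but the payload fields, and in particular the plane-selection knob (`redundancyType` here), are assumptions to confirm against the current reference.

```python
import requests

API = "https://api.equinix.com"
TOKEN = "<oauth2-bearer-token>"


def create_dlg(name: str, plane: str, device_uuids: list) -> str:
    """Create one Device Link Group pinned to a Fabric plane; returns its UUID."""
    resp = requests.post(
        f"{API}/ne/v1/links",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "groupName": name,
            # Hypothetical knob for the October 2023 plane-selection feature.
            "redundancyType": plane,  # "PRIMARY" or "SECONDARY"
            "linkDevices": [{"deviceUuid": uuid} for uuid in device_uuids],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["uuid"]


devices = ["<device-uuid-1>", "<device-uuid-2>"]
# Two parallel DLGs for a business-critical link: one per Fabric plane.
# The second DLG incurs Fabric charges and consumes an extra interface
# on every connected device, as noted above.
dlg_primary = create_dlg("dlg-primary", "PRIMARY", devices)
dlg_secondary = create_dlg("dlg-secondary", "SECONDARY", devices)
```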
Building Highly Resilient Networks in Network Edge - Part 4 - Virtual Connections

This is the fourth and final part of a series of posts highlighting the best practices for customers who require highly resilient networks in Network Edge. This entry highlights how to build resilient Fabric Virtual Connections.

- Virtual connections allow NE devices to connect to CSPs, NSPs, and ISPs, as well as colocation.
- The same logic still applies: the NE device plane determines which Fabric switch is used.
- The VC connection workflow is very similar to the EVP-LAN workflow, with some differences.
- Customers that require VC link resiliency should create two or more VCs to their NE devices, with the connections traversing the Primary and Secondary Fabric switches.
- Some CSPs, like Azure, require resiliency, but most do not.
- The Fabric portal allows connections to be created as redundant or single-legged.
- The workflow is similar for both Redundant and Clustered devices; a scripted sketch follows at the end of this post.

This concludes a four-part series on building Highly Resilient Networks in Equinix Network Edge. You can read the first part of this series here. For more helpful resources, please visit:

- Architecting for Resiliency: https://docs.equinix.com/en-us/Content/Interconnection/NE/deploy-guide/NE-architecting-resiliency.htm
- NE Geo-Redundancy Options: https://docs.equinix.com/en-us/Content/Interconnection/NE/deploy-guide/NE-geo-redund-options.htm
- NE Device Link Resiliency: https://docs.equinix.com/en-us/Content/Interconnection/NE/deploy-guide/NE-device-link-resiliency.htm
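As a closing illustration, here is a hedged Python sketch of the redundant-VC pattern for a Redundant Device pair, using the Fabric v4 connections API. Access-point types and field names are assumptions based on public Fabric examples; adjust to the current API reference and your CSP's requirements.

```python
import requests

API = "https://api.equinix.com"
TOKEN = "<oauth2-bearer-token>"


def create_vc(name: str, device_uuid: str, profile_uuid: str) -> str:
    """Create one VC from an NE device to a CSP service profile."""
    resp = requests.post(
        f"{API}/fabric/v4/connections",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "type": "EVPL_VC",
            "name": name,
            "bandwidth": 1000,  # Mbps
            "aSide": {  # A side: the Network Edge virtual device
                "accessPoint": {
                    "type": "VD",
                    "virtualDevice": {"type": "EDGE", "uuid": device_uuid},
                }
            },
            "zSide": {  # Z side: the CSP's service profile in the metro
                "accessPoint": {
                    "type": "SP",
                    "profile": {"type": "L2_PROFILE", "uuid": profile_uuid},
                    "location": {"metroCode": "DC"},
                }
            },
            "notifications": [
                {"type": "ALL", "emails": ["netops@example.com"]}
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["uuid"]


# One VC per device in the Redundant pair, so the two legs traverse
# different Fabric switches (Primary and Secondary).
vc_a = create_vc("csp-vc-primary", "<primary-device-uuid>", "<csp-profile-uuid>")
vc_b = create_vc("csp-vc-secondary", "<secondary-device-uuid>", "<csp-profile-uuid>")
```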