
The Complexity Conundrum

The person I mentioned in my last two blog posts was back, and I didn’t even bother asking them why. Honestly, I wasn’t sure whether they were here for the technical descriptions or the creative analogies, but I gave the next one my best shot.

The thing about the Borg is that, when you deal with one of them, you are dealing with all of them. But at the same time, you are only dealing with the one in front of you at that moment (and God help you if there's more than one). On top of that, they're polymorphic: they can adapt to the circumstances and environment around them. And if you ARE unlucky enough to be facing more than one, they are additive in quality, meaning that each individual Borg complements and supplements the others, which is why they are unbeatable when faced in large numbers. Finally, despite the collective aspect of their essential nature, the Borg have robust segmentation, such that an action which has a detrimental effect on one will not spread to the rest.

Data center switching fabric devices (such as the Cisco Nexus®) are essentially a Borg cube.

First, they are massive pieces of equipment, big enough to take up anywhere from one-third to almost an entire rack depending on the model, modules, and number of ports you invest in. (Coincidentally, you also buy them in cube-like forms. But I might be reading too much into that. Then again, they also seem to cost as much as a Borg cube might.) Second, the more you combine them, the more powerful they become. Third, they can be segmented, such that even while they are physically a single unit, they can have separate "contexts" that make them act as discrete devices that still share the common strengths of the whole (things like power, administrative services, or supervisor modules). And finally, these devices are technically advanced enough to assimilate other functions, such as the firewall, IDS, and IPS features found in the Cisco® ASA, without compromising their main role.
Like I said, it’s a Borg cube, and your larger organizations may well be assimilated by a fleet of them.

Setting the Star Trek® fanboy references aside, there are two key features that are important to understand if you want to grasp how a Nexus differs from other network gear.

Virtual Device Contexts

Much like "container" technology in the server space, a VDC uses the essential foundation of the switch itself (the base operating system, the hardware, and the administrative and supervisor services) and spins off a virtual switch that mimics the host in all ways, but operates within a completely separate, secure space: a "context." Ports on the switch can be assigned to these contexts for security, redundancy, performance, etc.

Virtual Port Channel

Network engineers have long been familiar with the idea of channel bonding: combining multiple ports so that they operate as a single unit to increase bandwidth and stability. But channel bonding occurs on a single device. A vPC can take two (or more) ports spread across a pair of Nexus devices and present them as a single circuit for even greater performance, stability, and security.

Ironically, this glorious techno-god of the data center appears to most monitoring tools as a humble router. A router with a metric butt-ton of ports hooked up to it, but still a router. You even get routing information, if your monitoring tool collects and displays that.

The problem, as I’ve explained in past posts, rests in what you are NOT seeing.
  • There are typically too many interfaces to manage effectively, and you actually care about most of them (unlike on a regular switch, where the majority of ports don’t need to be monitored). Also, collecting statistics on this many interfaces tends to overwhelm your middle-of-the-road monitoring solutions.
  • Each device context appears as its own separate box, so there’s no on-screen indication that it's actually part of one super-machine.
  • There's no information about how ports are bonded into vPCs.
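
To make the "separate box" problem concrete, here is a minimal sketch in Python of the grouping a monitoring tool would have to do before it could show contexts as parts of one machine. It assumes, hypothetically, that every polled node reports the serial number of the physical chassis it runs on. The `MonitoredNode` type, its fields, and all sample names are invented for illustration and do not come from any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoredNode:
    """One 'box' as the monitoring tool sees it (hypothetical model)."""
    name: str
    chassis_serial: str  # assumption: each context reports its physical chassis


def group_by_chassis(nodes):
    """Return {chassis_serial: [node names]} so contexts sharing hardware sit together."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node.chassis_serial].append(node.name)
    return {serial: sorted(names) for serial, names in groups.items()}


# Made-up inventory: three contexts on one chassis, one context on another.
inventory = [
    MonitoredNode("core-vdc-prod", "CHASSIS-A"),
    MonitoredNode("core-vdc-dmz", "CHASSIS-A"),
    MonitoredNode("core-vdc-admin", "CHASSIS-A"),
    MonitoredNode("edge-vdc-prod", "CHASSIS-B"),
]

chassis_groups = group_by_chassis(inventory)
for serial, names in sorted(chassis_groups.items()):
    print(f"{serial}: {', '.join(names)}")
```

The design choice here is simply to key everything on a shared hardware identifier. Whether such an identifier is actually available depends on what your tool can collect.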
I want to take a moment and focus on WHY this is a problem, just to be clear about how important this is. Imagine that you have an environment with two Nexus systems. On each device you've grouped ports 1, 2, 5, 8, and 10 into a channel-bonded set, and then combined them across both devices to create vPC 100. This vPC serves the third floor of your building.

One day, you begin getting calls from the help desk, saying that folks on the third floor are complaining about errors, slowness, and "problems with the network."

First, you have to remember that you have a vPC serving the third floor. Otherwise you may be troubleshooting the completely wrong issue. Second, you have to know which ports participate in that vPC. While I've made it easy for our example, the truth is that you could use any mix of ports on either Nexus: 1, 3, 5, 7 on NexusA and 2, 4, 6, 8 on NexusB.

Knowing just those two basic things is time-consuming even if you have the world's greatest documentation or memory, and that is before you dig into each of those interfaces to see if one (or more) is down, throwing errors, or having some other problem.

What you need is A LOT of visualization: a tool that will automatically detect those relationships and display them along with their status and performance, so that you can quickly identify the source and nature of the problem and, better yet, be notified proactively when an issue occurs.

And that just scratches the surface of what you’re not seeing. Remember that each port is “significant”: it matters to your network health as an individual port as well as (possibly) part of an aggregated channel. Each port can have its own VLAN or access control rules. This presents three problems:
  • Having that information at all. Just collecting and maintaining it on a port-by-port basis is a challenge.
  • Having that information visible when the port in question presents a problem.
  • Keeping that information consistent. Because ports can be combined, if the VLANs or ACLs of one port don’t match those of the others, you get bizarre behaviors that are maddeningly difficult to track down, understand, and resolve.
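
As a rough illustration of what "having that information" could look like, the sketch below models vPC 100 with the mixed port layout from the example (1, 3, 5, 7 on NexusA and 2, 4, 6, 8 on NexusB) and checks two things: which member ports are down or throwing errors, and which ports carry a VLAN set that differs from the rest of the group. The `MemberPort` type, the VLAN IDs, and every status value are assumptions made up for this example. A real tool would have to collect them from the devices.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberPort:
    """One physical port inside a vPC (hypothetical model, not a vendor API)."""
    device: str
    port: int
    is_up: bool
    error_count: int
    vlans: frozenset


def unhealthy_members(members):
    """Return the member ports that are down or reporting errors."""
    return [m for m in members if not m.is_up or m.error_count > 0]


def vlan_mismatches(members):
    """Return ports whose VLAN set differs from the most common set in the group."""
    if not members:
        return []
    expected, _count = Counter(m.vlans for m in members).most_common(1)[0]
    return [m for m in members if m.vlans != expected]


FLOOR_3 = frozenset({30})  # made-up VLAN ID for the third floor

# vPC 100: odd ports on NexusA, even ports on NexusB. All values are sample data.
vpc_100 = [
    MemberPort("NexusA", 1, True, 0, FLOOR_3),
    MemberPort("NexusA", 3, True, 0, FLOOR_3),
    MemberPort("NexusA", 5, False, 0, FLOOR_3),             # link down
    MemberPort("NexusA", 7, True, 0, FLOOR_3),
    MemberPort("NexusB", 2, True, 0, FLOOR_3),
    MemberPort("NexusB", 4, True, 1742, FLOOR_3),           # throwing errors
    MemberPort("NexusB", 6, True, 0, FLOOR_3),
    MemberPort("NexusB", 8, True, 0, frozenset({30, 99})),  # extra VLAN
]

problem_ports = [(m.device, m.port) for m in unhealthy_members(vpc_100)]
mismatched_ports = [(m.device, m.port) for m in vlan_mismatches(vpc_100)]

print("Down or erroring:", problem_ports)
print("VLAN mismatch:", mismatched_ports)
```

Treating the most common VLAN set as the "expected" one is only a heuristic chosen for this sketch. A documented intended configuration would be a better baseline if you have one, and the same comparison could be applied to ACLs.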
  As IT professionals, we intuitively learn to embrace complexity and even revel in it. But there comes a time when that complexity, and the power it presents us, can end up being the source of our undoing.
Leon Adato
Leon Adato is a Head Geek™ and technical evangelist at SolarWinds, and is a Cisco® Certified Network Associate (CCNA), MCSE and SolarWinds Certified Professional (he…