Growing Beyond Your First Storage System
June 20, 2018
Monitoring and Observability
At some point, your first storage system will be “full.” I’m putting “full” in quotes because the system might not actually be 100% occupied with data at that exact moment. The system could be full for another technical reason. For example, shared components (e.g. CPUs) may be overloaded before you ever install the maximum number of drives, and upgrading those would be too expensive. Or an administrative decision might stop you from handing out new capacity from an existing system; for example, you’re expecting rapid organic growth of several thin-provisioned volumes, which would soon consume the capacity headroom of the current system.
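To make that last scenario a bit more concrete, here’s a minimal sketch (plain Python, with made-up capacity figures and variable names) of how you might estimate when thin-provisioned growth will exhaust a pool’s physical headroom:

```python
# Hypothetical capacity figures for a single pool; substitute your own numbers.
pool_physical_tb = 100.0        # usable physical capacity of the pool
pool_used_tb = 70.0             # physically written data today
subscribed_tb = 180.0           # sum of all thin-provisioned volume sizes
growth_tb_per_month = 5.0       # observed organic growth rate

overcommit_ratio = subscribed_tb / pool_physical_tb
headroom_tb = pool_physical_tb - pool_used_tb
months_until_full = headroom_tb / growth_tb_per_month

print(f"Overcommit ratio: {overcommit_ratio:.1f}:1")
print(f"Physical headroom: {headroom_tb:.0f} TB")
print(f"Estimated months until the pool is physically full: {months_until_full:.1f}")
```

If that estimate lands well inside the lead time for buying and installing a new system, “full” for administrative purposes is today, not the day the pool actually runs out.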
The fact that a single system has reached its maximum capacity, for technical or administrative reasons, does not mean you need to turn away customers. IT should be a facilitator to the business. If the business needs to store additional data, there’s often a good reason for it. In health care it could be storing medical images; for a service/cloud provider, hosting more (paying) customers. So instead of communicating “Sorry, we’re full, go somewhere else!” we should say something along the lines of “Yes, we can store your data, but it’s going to land on a different system.” In fact, just store the data and leave out the system part!
MORE OF THE SAME?
When your first system is full and you’re buying another one, you could buy a similar system and install it next to the original one. It might be a bit faster, or a bit more tuned for capacity. Or completely identical, if you were happy with the previous one.
On the other hand, this might also be a good moment to differentiate between the types of data in your company. For example, if you’ve started out with block storage, this may be the time to buy a NAS and offload some of the file data to it.
Regardless of type, introducing a second system creates a couple of challenges for the IT department. First, you now have to decide which system new data should land on. With identical systems, it might be a fill-and-spill approach, where you fill up the first system and then move over to the second box.
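As a rough illustration, a fill-and-spill placement decision can be as simple as picking the first array that still has room below a fill threshold. This is a hypothetical sketch; the array names, capacities, and threshold are invented:

```python
# Hypothetical inventory of identical arrays, in order of preference.
arrays = [
    {"name": "array-01", "capacity_tb": 200, "used_tb": 188},
    {"name": "array-02", "capacity_tb": 200, "used_tb": 40},
]

FILL_THRESHOLD = 0.95  # stop placing new volumes above 95% used

def place_volume(size_tb, arrays):
    """Return the first array that can hold the volume without crossing the threshold."""
    for a in arrays:
        if (a["used_tb"] + size_tb) / a["capacity_tb"] <= FILL_THRESHOLD:
            return a["name"]
    return None  # every array is full: time to buy the next one

print(place_volume(10, arrays))  # spills to array-02, since array-01 is nearly full
```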
Once you introduce different types and speeds of systems, though, you need to differentiate between types of storage and the capabilities of the systems. Some data might be better suited to a NAS, other data to a spinning-disk SAN, and another flavor of data to an all-flash SAN array. You also need to keep track of which clients/devices are attached to which systems, so documentation and a clear naming convention are paramount.
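Once tiers come into play, the placement decision also depends on the workload, and you want a record of which client landed where. A hedged sketch of both ideas; the tier names, workload categories, and naming convention are all assumptions, not a standard:

```python
# Hypothetical mapping of workload type to preferred storage tier.
TIER_BY_WORKLOAD = {
    "database": "allflash-san",
    "file-share": "nas",
    "archive": "nearline-san",
}

# Simple inventory: which volume lives on which system.
inventory = {}

def provision(client, workload, system_index=1):
    """Pick a tier for the workload and record the allocation under a predictable name."""
    tier = TIER_BY_WORKLOAD.get(workload, "nearline-san")   # default to the cheap tier
    system = f"{tier}-{system_index:02d}"                    # naming convention: <tier>-<nn>
    volume = f"{client}-{workload}-{len(inventory) + 1:03d}" # naming convention: <client>-<workload>-<nnn>
    inventory[volume] = system
    return volume, system

print(provision("sql01", "database"))  # ('sql01-database-001', 'allflash-san-01')
```

Even a simple inventory like this, kept up to date, answers the two questions you’ll ask most often: “where did this volume go?” and “who is attached to this box?”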
KEEPING IT RUNNING
Then there’s the challenge of keeping all the storage systems running. You can probably monitor a handful of systems with the built-in GUI, but that doesn’t scale well. At some point you need to add at least central monitoring software, to group all alerts and activities in a single user interface. Even better would be central management, so you don’t have to go back to the individual boxes to allocate LUNs and shares.
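A central monitoring layer doesn’t have to be complicated to be useful: at its core it polls every system and merges the alerts into one view. A purely illustrative sketch follows; the per-array fetch functions are placeholders you would replace with each vendor’s real API or SNMP interface:

```python
from datetime import datetime, timezone

def fetch_alerts_array_a():
    # Placeholder: call vendor A's management API here.
    return [{"severity": "warning", "message": "Pool 1 at 92% capacity"}]

def fetch_alerts_array_b():
    # Placeholder: call vendor B's API or parse its SNMP traps here.
    return [{"severity": "critical", "message": "Drive 2.14 failed"}]

COLLECTORS = {"array-a": fetch_alerts_array_a, "array-b": fetch_alerts_array_b}

def collect_all_alerts():
    """Merge alerts from every system into one timestamped list, critical alerts first."""
    merged = []
    for system, fetch in COLLECTORS.items():
        for alert in fetch():
            merged.append({"system": system,
                           "seen": datetime.now(timezone.utc).isoformat(),
                           **alert})
    return sorted(merged, key=lambda a: a["severity"] != "critical")

for alert in collect_all_alerts():
    print(alert["system"], alert["severity"], alert["message"])
```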
With an increasing number of storage systems comes an increasing number of attached servers and clients. Ensuring that all clients, interconnects, and systems are on the right patch levels is a vertical task across all these layers. You should look at the full stack to ensure you don’t break anything by patching one component to a newer level.
If you glue too many systems together, you’ll end up with a spaghetti of shared systems that makes patch management difficult, if not impossible. Some clients will be running old software that prevents other layers (like the SAN or the storage array) from being patched to the newest levels. Other attached clients might rely on those newer code levels because they run a newer hypervisor. You’ll quickly end up with a very long string of upgrades that need to be performed before you’re fully up to date and compliant. So it’s probably best to create building blocks of some sort.
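One way to keep that chain manageable is to check a support matrix before every upgrade: for a given array firmware level, which fabric and client driver versions are actually supported? A hypothetical sketch of that check; the matrix contents are invented, and in practice they come from your vendors’ interoperability guides:

```python
# Hypothetical interoperability matrix: array firmware -> supported versions per layer.
SUPPORT_MATRIX = {
    "array-fw-8.1": {"san_fw": {"7.4", "8.0"}, "hba_driver": {"4.2", "4.3"}},
    "array-fw-9.0": {"san_fw": {"8.0", "8.1"}, "hba_driver": {"4.3", "5.0"}},
}

def blockers_for_upgrade(target_fw, san_fw, client_hba_drivers):
    """Return the components that must be upgraded before the array can move to target_fw."""
    supported = SUPPORT_MATRIX[target_fw]
    blockers = []
    if san_fw not in supported["san_fw"]:
        blockers.append(f"SAN switch firmware {san_fw}")
    for host, driver in client_hba_drivers.items():
        if driver not in supported["hba_driver"]:
            blockers.append(f"{host} HBA driver {driver}")
    return blockers

print(blockers_for_upgrade("array-fw-9.0", "7.4", {"esx01": "4.2", "esx02": "4.3"}))
# ['SAN switch firmware 7.4', 'esx01 HBA driver 4.2'] -> upgrade these first
```

Keeping the blast radius of such a check small is exactly what the building-block approach buys you: each block shares a fabric and a patch cadence, and you never have to reconcile the whole estate at once.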