Strategic Storage Budgeting
With the price per gigabyte of storage coming down rapidly, that line item is no longer the overriding consideration for most storage budgets. While that is some relief for storage users, in other ways it creates a new problem: how long should you wait for storage to get faster and cheaper before you buy? Before running the numbers, make sure your requirements are being met.
Add to that the complexity of upgrading to new technologies (2 Gbps vs. 4 Gbps Fibre Channel, for example, or SAS vs. SATA, SCSI or Fibre Channel), and you're confronted with an array of planning and budgeting issues when it comes time to upgrade or replace storage architecture.
Budgeting for storage is not just about buying more density or the latest cool stuff; it is about determining your needs based on available technology, and making sure those requirements are met. This article explores three of the most important questions to ask when budgeting for storage:
- How will a new technology integrate into the current environment?
- Will this technology meet user requirements for performance and reliability?
- How does this new technology affect O&M (operation and maintenance) costs?
Integration of technology into the current environment is a large problem for several reasons. Let's take a real-world example. A company has servers from one vendor and storage from another. The storage vendor can provide a new storage infrastructure that will support 4 Gbps Fibre Channel RAID controllers, 4 Gbps Fibre Channel switches, and other storage components. That all sounds great, but can the server side support the 4 Gbps architecture?
This is a big question that should be asked of every hardware vendor. A standard PCI bus (64-bit, 66 MHz) running at full rate supports roughly 533 MB/sec, but many PCI buses do not sustain this full rate. The situation is better with PCI-X, which runs at approximately 1.1 GB/sec (twice the PCI rate), but the same caveat applies. A two-port 2 Gbps host bus adapter (HBA) can require up to 800 MB/sec (200 MB/sec for each port reading and 200 MB/sec for each port writing). Therefore, a standard PCI bus cannot support a 2-port HBA running at 2 Gbps, which places the same demand on the bus as one port at 4 Gbps.
From a failover point of view, two 2 Gbps ports provide greater redundancy, because a single HBA port failing is more common than both ports failing. This assumes that you have an HBA failure and not a PCI bus failure. In the case of PCI-X, a 2-port 4 Gbps HBA far exceeds the bus bandwidth (1.1 GB/sec for PCI-X, while two ports of a 4 Gbps HBA require 1.6 GB/sec at full rate), so performance is far closer to that of two ports of a 2 Gbps HBA.
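The bus-versus-HBA arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the per-direction rate of 100 MB/sec per Gbps is taken from the 200 MB/sec figure quoted for a 2 Gbps port, and the bus peaks are computed from nominal width and clock, so they may differ by a few MB/sec from the rounded figures in the text. Real buses often sustain less than these theoretical peaks.

```python
# Assumption taken from the article's figures: a 2 Gbps Fibre Channel port
# moves about 200 MB/sec in each direction, i.e. 100 MB/sec per Gbps.
MB_PER_SEC_PER_GBPS = 100


def bus_peak_mb_per_sec(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus: bytes per transfer times clock rate."""
    return width_bits / 8 * clock_mhz


def hba_demand_mb_per_sec(ports: int, link_gbps: int) -> int:
    """Streaming demand with every port reading and writing at line rate."""
    per_direction = link_gbps * MB_PER_SEC_PER_GBPS
    return ports * 2 * per_direction


def bus_can_feed(ports: int, link_gbps: int, bus_mb_per_sec: float) -> bool:
    """True if the bus peak covers the HBA's full-rate demand."""
    return hba_demand_mb_per_sec(ports, link_gbps) <= bus_mb_per_sec


if __name__ == "__main__":
    buses = {
        "PCI (64-bit, 66 MHz)": bus_peak_mb_per_sec(64, 66.67),
        "PCI-X (64-bit, 133 MHz)": bus_peak_mb_per_sec(64, 133.33),
    }
    for bus_name, peak in buses.items():
        for ports, gbps in [(2, 2), (1, 4), (2, 4)]:
            demand = hba_demand_mb_per_sec(ports, gbps)
            verdict = "fits" if bus_can_feed(ports, gbps, peak) else "bus-limited"
            print(f"{bus_name}: {ports} x {gbps} Gbps needs {demand} MB/sec "
                  f"vs {peak:.0f} MB/sec peak -> {verdict}")
```

Running the same check against the bus figures your server vendor actually commits to, rather than the theoretical peaks, gives a more realistic answer.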
All of these performance numbers assume that the I/O being done is streaming I/O. If it isn't, then why even consider 4 Gbps HBAs and infrastructure in the first place? Yes, you can get improved IOPS performance with 4 Gbps HBAs from a larger command queue, but the improvement is modest (rarely exceeding 20 percent) and is often workload-dependent. That gain alone does not justify running out to buy a 4 Gbps infrastructure.
The bottom line is that any site considering 4 Gbps technology must make sure its servers can support this new performance level. More often than not, large servers lag in bus technology, given the long lead time it takes to design the complex memory interconnects to the bus and the availability of new bus technology. You can buy PCI-Express bus technology from Dell on one-, two-, and four-CPU systems, but it is hard to find such technology on large multi-CPU servers (more than 16 CPUs) today.
User requirements should be a major driver of technology upgrades. Many organizations do not have a good handle on what the user application profiles look like, what the growth requirements are, and worst of all, whether the system is configured and tuned for those application profiles. This lack of understanding of the environment can lead to poor decisions on what hardware and software is needed.
One system I recently reviewed did not have an emulation or characterization of its workload. This is especially important for large sites. Without this information, how could this large site test patches for performance degradation (yes, it happens all too often), test new technology to measure performance improvements, or test increases in workloads to see if the system can handle them?
User applications and requirements should be a large component in any decision to upgrade technology. If you do not know what users are doing with the system, how do you know what they need today, let alone plan for the future? This situation often turns into a fire drill when the system is overloaded, and management starts throwing money at the problem instead of executing a master plan for technology infrastructure upgrades.
From what I have seen in my 25 years in the business, technology maintenance costs almost always follow the same pattern:
- The cost of O&M for new technology is high for early adopters.
- In the subsequent 6 to 18 months, the cost drops as the technology is more widely adopted.
- The cost continues to drop, and drops sharply when a technology replacement is released, until ...
- The cost skyrockets as the vendor tries to phase out the technology. Because the vendor no longer wants to support it, maintenance can cost far more than it did originally, sometimes as much as five times the original cost.
This is the general life cycle for O&M costs. It makes sense given vendor costs, and unless technology trends change, the pattern is likely to continue.
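For budgeting purposes, the life cycle above can be turned into a rough step function. The sketch below is hypothetical: the function names, the month boundaries, and the 0.8 and 0.5 multipliers are illustrative assumptions, not figures from any vendor. Only the five-times ceiling comes from the pattern described above, and real contracts will differ.

```python
def om_cost_multiplier(
    month: int,
    wide_adoption_month: int = 6,   # hypothetical: wider adoption begins
    replacement_month: int = 36,    # hypothetical: successor technology ships
    phase_out_month: int = 60,      # hypothetical: vendor starts the phase-out
) -> float:
    """Maintenance cost at `month` as a multiple of the early-adopter cost."""
    if month < wide_adoption_month:
        return 1.0   # early adopters pay the full rate
    if month < replacement_month:
        return 0.8   # hypothetical discount once the technology is widely adopted
    if month < phase_out_month:
        return 0.5   # hypothetical sharp drop after a replacement is released
    return 5.0       # worst case described above: up to five times the original


def annual_om_budget(base_annual_cost: float, start_month: int, **phases: int) -> float:
    """Average the multiplier over the 12 months starting at `start_month`."""
    months = range(start_month, start_month + 12)
    total = sum(om_cost_multiplier(m, **phases) for m in months)
    return base_annual_cost * total / 12


if __name__ == "__main__":
    # A contract year that straddles the phase-out boundary (months 54 to 65).
    print(annual_om_budget(120_000.0, start_month=54))
```

The point of modeling it at all is to see the renewal year that straddles the phase-out boundary before the invoice arrives, not to predict the exact amount.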
One other area that should be considered is the personnel cost to the organization of supporting old hardware and software. You're not likely to find a new hire who knows how to work on Fibre Channel arbitrated loop HBAs, RAIDs and switches, and finding a training course for that hardware isn't an easy task either. Just recall the frantic search for mainframe COBOL programmers for Y2K, a clear example of personnel operations costs becoming unreasonable.
We have not talked much about budgeting for storage, but the issues addressed here are the ones that drive the high cost of storage changes. Most sites know what their physical storage growth will be, or at least what the budget will allow them for physical storage growth. The major cost items are not adding a few trays of disks with 146 GB drives or swapping out 36 GB drives for 300 GB drives; the major cost driver is the infrastructure. The real questions are how to determine what you need, how much it is going to cost, and how it fits with the current environment.
One pitfall is that organizations think they can just jump into new technology without fully understanding the whole data path (the path from the application to the operating system to the HBA/NIC to the storage devices). Plugging 4 Gbps HBAs into current servers attached to a 2 Gbps storage infrastructure does not generally improve performance unless you are aggregating the performance of multiple RAID controllers and multiple hosts. The science (some call this an art, but it is really based on scientific analysis and study of the data path) of determining what users need and when they will need it is the process of budgeting for storage.
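The data path argument comes down to finding the slowest stage, since a streaming transfer can run no faster than that stage allows. A minimal sketch, assuming a single host and a single RAID controller: the stage names and the application and operating system figures are hypothetical placeholders, while the 400 and 200 MB/sec values are per-direction rates for 4 Gbps and 2 Gbps Fibre Channel links.

```python
from typing import Dict, Tuple


def path_bottleneck(stage_mb_per_sec: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage; the whole data path is limited to its rate."""
    stage = min(stage_mb_per_sec, key=stage_mb_per_sec.get)
    return stage, stage_mb_per_sec[stage]


if __name__ == "__main__":
    data_path = {
        "application": 600.0,                      # hypothetical
        "operating system / file system": 500.0,   # hypothetical
        "HBA port (4 Gbps)": 400.0,                # per direction
        "storage infrastructure (2 Gbps)": 200.0,  # per direction
    }
    stage, rate = path_bottleneck(data_path)
    print(f"Limited to {rate:.0f} MB/sec by: {stage}")
```

With these numbers the new HBA buys nothing, because the 2 Gbps storage side still caps the path. Aggregating several controllers or hosts changes the picture and would need a model that sums parallel stages.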
You need a full understanding of:
- Your current environment, including the performance level that environment can support today and the performance level that environment can support given technology trends
- User requirements for performance and growth, including the current workload and the trend line for growth (performance mapped to expected new technology)
- Your current and future O&M costs. Don't wait until the maintenance contract ends to find out that the cost has skyrocketed; technology maintenance costs follow a predictable pattern
This article was originally published on Enterprise Storage Forum.