The Spectre and Meltdown vulnerabilities that affect microprocessors made by Intel, AMD and ARM are a huge deal for cloud service providers and their customers.
They matter for two key reasons. One is that the vulnerabilities could be exploited to steal sensitive data; the other is that fixing them reduces the computing performance of the virtualized infrastructure that customers are paying for.
To address the first of these, the cloud computing giants have been working on the infrastructure that powers Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure to make it secure.
For Amazon’s part, it says that all instances across the Amazon EC2 fleet have now been fixed, although new microcode supplied by Intel for its processors is causing instance and application crashes on occasion. To prevent this, Amazon is disabling some of the microcode and waiting for more Intel updates.
Google has also given an update on its Google Cloud Platform, stating that it has already been updated to prevent all known vulnerabilities. By using its VM Live Migration technology, Google was able to perform the updates with no forced maintenance windows or restarts.
What about Microsoft? Earlier in January, the company announced that “the majority of Azure infrastructure has already been updated to address this vulnerability. Some aspects of Azure are still being updated and require a reboot of customer VMs for the security update to take effect.”
That seems to cover the cloud infrastructure, but it does nothing to protect the virtual machines that customers themselves run in the cloud. “Customers who use their own operating systems with GCP services may need to apply additional updates to their images,” Google warns. The same is doubtless true for Azure and AWS customers.
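For customers wondering whether their own Linux guest images have picked up the operating system patches, one rough check is to read the status files that patched kernels expose under /sys/devices/system/cpu/vulnerabilities. The Python sketch below is only an illustration: it assumes a Linux guest whose kernel is recent enough to provide that directory (older or unpatched kernels will not have it), and the helper names `read_mitigation_status` and `unmitigated` are invented for this example. An empty result means the kernel does not report its status, not that the machine is safe.

```python
from pathlib import Path

# Directory where patched Linux kernels report per-vulnerability status.
# Assumption: the guest runs Linux and the kernel is new enough to expose it.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_text}, or {} if the directory is absent."""
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            # Skip anything that cannot be read rather than failing the whole check.
            continue
    return status


def unmitigated(status):
    """Return the names whose status text begins with 'Vulnerable'."""
    return [name for name, text in status.items() if text.startswith("Vulnerable")]


if __name__ == "__main__":
    report = read_mitigation_status()
    if not report:
        print("Kernel does not report mitigation status; it may predate the patches.")
    for name, text in report.items():
        print(f"{name}: {text}")
    for name in unmitigated(report):
        print(f"WARNING: {name} is reported as vulnerable")
```

Run inside the virtual machine itself, this reports what the guest kernel believes about its own mitigations; it says nothing about the provider's hypervisor, which the customer cannot inspect this way.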
So Spectre and Meltdown have been extremely inconvenient for service providers, and they will continue to be the cause of a great deal of operating system patching for some time yet. But there’s no evidence that any hackers have managed to exploit Spectre or Meltdown so far, so from a security standpoint the two vulnerabilities may not have been as catastrophic as some people initially feared.
What About the Performance Impact from Spectre and Meltdown Mitigations?
But what about the second point — performance slowdowns? It’s too early to provide definitive answers about the extent of the performance degradation of virtual machines running in the public cloud thanks to the Spectre and Meltdown mitigations that have been put in place.
But the early evidence is that the degradation can be very significant. Red Hat has measured the patch performance impact as ranging from 1% to 20%, while discussions on the Lustre distributed file system mailing list show degradation of between 10% and 45% for certain IO-intensive applications.
And Epic Games, a company that runs its services on AWS public cloud servers, published a graph of CPU utilization that appears to show one of its back-end services demanding about 15% CPU utilization before the Meltdown and Spectre security patches were applied, and about 45% after. That is roughly three times the original level, or an increase in cloud resource utilization of some 200%.
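The arithmetic behind the Epic Games figures is easy to get tangled, because "three times as much" and "a rise of so many percent" are different measures. The short Python sketch below works through it, taking the approximate values of 15% and 45% read off the graph as given; the function name `relative_increase` is just a label for this example.

```python
def relative_increase(before, after):
    """Fractional increase from `before` to `after` (1.0 means +100%)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before


# Approximate CPU utilization figures from the Epic Games graph.
before_pct = 15.0
after_pct = 45.0

ratio = after_pct / before_pct                        # how many times the old level
increase = relative_increase(before_pct, after_pct)   # fractional rise over the old level

print(f"After patching: {ratio:.0f}x the original utilization")
print(f"Increase: {increase:.0%}")
```

In other words, utilization ended up at about 300% of its former level, which is an increase of about 200%.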
For public cloud customers, then, the impact of patching these two security flaws is that the cloud resources they pay for are suddenly significantly less effective. Any arguments that the security patches are important and therefore the performance degradation is inevitable will likely butter no parsnips with customers — why should they pay the same amount and receive less cloud computing grunt?
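To put a rough number on "paying the same and receiving less", the sketch below converts a performance degradation into the extra capacity, and hence extra spend, needed to do the same amount of work. It rests on a simplifying assumption that is not taken from any vendor's figures: that an x% degradation means an instance completes x% less work per hour at the same price. Real workloads will not behave so neatly, and `cost_multiplier` is a name invented for this example.

```python
def cost_multiplier(degradation):
    """Factor by which spend must rise to do the same work.

    `degradation` is the fraction of throughput lost (0.20 means 20% slower).
    Assumption: price per instance-hour is unchanged and work scales linearly.
    """
    if not 0 <= degradation < 1:
        raise ValueError("degradation must be in the range [0, 1)")
    return 1 / (1 - degradation)


# The low and high ends of the range Red Hat reported, as fractions.
for degradation in (0.01, 0.20):
    extra = cost_multiplier(degradation) - 1
    print(f"{degradation:.0%} slower -> about {extra:.1%} more capacity for the same work")
```

Under that assumption, a 20% slowdown means buying about 25% more capacity, not 20% more, because the lost fifth has to be made up by instances that are themselves running slower.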
What will likely happen is that public cloud providers will have to offer some sort of deal — perhaps discounted pricing or service credits — to compensate customers for the performance degradation they may experience.
Of course, future updates to the mitigations may lessen the performance impact. And as public cloud service providers replace their hardware with newer equipment that is not affected by the two bugs, the whole performance problem will likely go away.
But in the meantime, the whole affair underlines the fact that when you take advantage of virtualization technology to move your computing resources to the public cloud, the whole shebang becomes someone else’s problem. And that means if CPU performance suddenly takes a hit due to some unexpected bug, it’s the cloud provider, not you, that has to bear the costs.
Paul Rubens is a technology journalist and contributor to ServerWatch, EnterpriseNetworkingPlanet and EnterpriseMobileToday. He has also covered technology for international newspapers and magazines including The Economist and The Financial Times since 1991.