Linux File Systems, Revisited

Jul 20, 2010

My article three weeks ago on Linux file systems set off a firestorm unlike anything I’ve seen in the decade I’ve been writing on storage and technology issues.

Contrary to popular opinion, Linux file systems will require changes to handle the 100TB environments that will become commonplace in the not-too-distant future.

My intention was to draw on my experience as a high-performance computing (HPC) storage consultant and my knowledge of file systems and operating systems to advise readers on the best course of action. This is no different from the approach I take in all my articles. I spend most of my time reviewing storage technology issues for my customers. The installations I work with generally start with 500 terabytes of storage and go up from there. One site I work with currently has more than 12 petabytes, and many are planning for 60PB by 2010.

There is a big difference in my world between the computation environments and the large storage environments. In the HPC computational environments I work with, I often see large clusters (yes, Linux clusters). Across the many hundreds of thousands of nodes that I am aware of, however, no one is using a large single instantiation of a Linux file system, where by large I mean 100TB or greater. I have not even seen a 50TB Linux file system. That does not mean that they don’t exist, but I have not seen them, nor have I heard of any.

Other Issues

Aside from the emotional responses and personal attacks (sorry folks, I’m an independent consultant and not paid by Microsoft or any other vendor, and my opinion is my own), a number of readers raised some good points.

One wondered about Google’s use of large file systems. My response is that each of Google’s file systems on each of the blade nodes is pretty small and the aggregation of the file systems is done by an application. Also, Google’s file system is not part of the standard Linux release.

A number of readers noted that I didn’t delve into the details of extents in ext3/ext4 and XFS. For more on the issue, see Choosing a File System or File Manager.

One reader wondered whether users with petabyte storage requirements would use a block device file system rather than a networked, hot-add file system (I would think resizing would become quite a nightmare, not to mention a forced fsck), and whether they would run a stock Linux file system without investing time and money in heavy tweaking. Additionally, most NAS file systems do not scale to petabytes.

My response is that I know people who need the performance of a single SMP server for file systems of this size today. Breaking the file system up across blades and a network increases the management overhead and therefore the cost. NAS performance doesn’t cut it for these people, who are doing streaming I/O for large archives, almost always with HSM-based file systems. The reader basically agrees with my point that Linux file systems must dramatically improve fsck performance, and as for the last point, yes, these people are investing heavily in performance resources.

A few readers pointed out that I failed to mention factors other than the file system, such as device drivers, the hardware platform, application access patterns and whatever other applications were running. This is a fair comment, but my response is that I was just trying to address the file system issues in Linux, not critique the whole data path.

A Call to Action

These are the opinions and analysis of one storage consultant, based on what I have seen in real-world environments at very large sites. My advice is that Linux file systems are probably okay in the tens of terabytes, but don’t try to do hundreds of terabytes or more. And given the rapid growth of data everywhere, scaling issues with Linux file systems will likely move further and further down market over time.

If you disagree, try it yourself. Go mkfs a 500TB ext3, ext4 or other Linux file system, fill it up with multiple streams of data, and add and remove files for a few months with, say, 20 GB/sec of bandwidth from a single large SMP server. Then crash the system, fsck it and tell me how long it takes. Does the I/O performance stay consistent during those few months of adding and removing files? Does the file system perform well with 1 million files in a single directory and 100 million files in the file system?
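For readers who want a feel for the shape of that exercise before committing real hardware, here is a minimal, scaled-down sketch in Python. It is not the test described above: it runs in a temporary directory, uses a few thousand small files and says nothing about fsck time or 20 GB/sec streaming. The function names (`fill_time_hours`, `churn_round`, `run_churn`) and the default sizes are illustrative assumptions, not part of any standard benchmark. It does two things: it works out how long a fill would take at a given sustained bandwidth, and it times repeated delete-and-recreate rounds in a single directory so you can see whether the timings drift.

```python
import os
import random
import tempfile
import time


def fill_time_hours(capacity_tb, bandwidth_gb_per_s):
    """Hours needed to write capacity_tb at a sustained rate (decimal units)."""
    seconds = (capacity_tb * 1000.0) / bandwidth_gb_per_s
    return seconds / 3600.0


def churn_round(directory, names, payload, rng):
    """Delete a random half of the files, recreate them, return elapsed seconds."""
    victims = rng.sample(names, len(names) // 2)
    start = time.perf_counter()
    for name in victims:
        os.remove(os.path.join(directory, name))
    for name in victims:
        with open(os.path.join(directory, name), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


def run_churn(num_files=2000, rounds=5, file_size=4096, seed=0):
    """Populate one directory, then time several add/remove rounds.

    All parameters are hypothetical defaults chosen so the sketch finishes
    quickly; a real test would be many orders of magnitude larger.
    """
    rng = random.Random(seed)
    payload = b"\0" * file_size
    timings = []
    with tempfile.TemporaryDirectory() as directory:
        names = [f"f{i:08d}" for i in range(num_files)]
        for name in names:
            with open(os.path.join(directory, name), "wb") as f:
                f.write(payload)
        for _ in range(rounds):
            timings.append(churn_round(directory, names, payload, rng))
        # Every round recreates what it deleted, so the count is unchanged.
        assert len(os.listdir(directory)) == num_files
    return timings


if __name__ == "__main__":
    hours = fill_time_hours(500, 20)
    print(f"Filling 500 TB at 20 GB/s takes about {hours:.1f} hours")
    for i, elapsed in enumerate(run_churn(), 1):
        print(f"round {i}: {elapsed:.3f} s")
```

To get closer to the real question, you would point the directory at the file system under test instead of a temporary directory, raise `num_files` toward the 1 million mark and run the rounds for far longer than a few seconds. Even then, this only exercises metadata churn in one directory; the crash-and-fsck step has to be done by hand.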

My guess is the exercise would prove my point: Linux file systems have scaling issues that must be addressed before 100TB environments become commonplace. Addressing them now without rancor just might make Linux everything its proponents have hoped for.

Henry Newman is a regular contributor to Enterprise Storage Forum, where this story originally appeared. Newman is an industry consultant with 27 years of experience in high-performance computing and storage.

Henry Newman

Henry Newman is a ServerWatch contributor.
