I read the article “Big Blue boffins scan 10 billion files in Flash in a flash” on The Register today and asked myself: how can this be? Has someone finally written a file system that is optimized for the underlying storage topology (RAID volumes, LUN layouts, JBODs and file system metadata), that addresses end-to-end data integrity, and that gets around the POSIX limitations on metadata and on locking for metadata and files that are open and being written from multiple nodes? Well, we all know that this is impossible, and it is not really fair of me to pick on IBM, or on any other file system vendor for that matter. There are so many problems in the I/O stack that need addressing, and they are basically impossible to address today.

The file system cannot determine the layout of the underlying storage, given the limited data it can obtain from the SATA or SCSI stack. This problem needs to be addressed by the ANSI T10 and ANSI T13 groups, which own the SCSI and ATA standards respectively. Until more details are passed up to the host, you cannot blame the file systems. POSIX is controlled by The Open Group, and if you want changes to the POSIX specification, expect years of fighting only to lose. A group of U.S. government people and interested parties tried to add a few extensions for large shared file systems, and we got nowhere with The Open Group after years of trying. I feel sorry for file system vendors. Everyone I talk with says they require POSIX functionality, but POSIX is limiting scaling. One other thing: ANSI T10 has a specification that would support a significant portion of end-to-end integrity, but it is not supported for SATA drives. As I said, I feel sorry for file system vendors, because what they can do today is limited.