Still No Discussion of Parallel I/O

By Henry Newman

More and more applications running across multiple nodes are doing I/O to the same files. Of course, reading is not the problem. Many readers can open a file at the same time without any issues. The problem arises when multiple threads are writing, or when one thread is writing while others are reading. Until now, with the exception of MPI-IO, no effort has been put into defining a method for doing parallel I/O.
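To make the writer problem concrete, here is a minimal sketch of the coordination that plain POSIX leaves to the application. It assumes fixed-size records and uses threads as stand-ins for nodes. The names RECORD_SIZE, NUM_WRITERS, write_record, parallel_write, and demo are hypothetical and chosen only for illustration. Each writer uses a positional write into a byte range it alone owns, so the writers never share a file offset. It illustrates the bookkeeping involved and is not a general solution.

```python
import os
import tempfile
import threading

RECORD_SIZE = 16   # assumed fixed record size in bytes (illustrative)
NUM_WRITERS = 4    # threads standing in for nodes (illustrative)


def write_record(fd: int, rank: int) -> None:
    """Write one record into the byte range owned by this writer."""
    payload = f"writer-{rank}".encode().ljust(RECORD_SIZE, b".")
    # pwrite takes an explicit offset, so writers do not race on a
    # shared file position.
    os.pwrite(fd, payload, rank * RECORD_SIZE)


def parallel_write(path: str) -> bytes:
    """Have several writers fill disjoint regions of one file, then read it back."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        threads = [
            threading.Thread(target=write_record, args=(fd, rank))
            for rank in range(NUM_WRITERS)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return os.pread(fd, RECORD_SIZE * NUM_WRITERS, 0)
    finally:
        os.close(fd)


def demo() -> bytes:
    """Run the sketch against a temporary file and return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        return parallel_write(os.path.join(tmp, "shared.dat"))
```

The scheme works only because every writer knows its own offset in advance. Once records vary in size, regions overlap, or readers need a consistent view while writes are in flight, the application has to supply its own locking or ordering.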

I like MPI-IO, although it is tied to MPI and the types of operations that are done with MPI. Other work in this area includes the I/O forwarding layer, IOFSL. The problem is that the more I look around, the more I see being done, whether in databases, search engines, decision support, or any of a myriad of applications that require multiple nodes to do more than read global files.

The leading parallel file systems have attempted to address the problem, but the file system is abstracted from much of the knowledge about how the file is to be used. Potential hints associated with the file access can really be known only by the application. If the POSIX interface is not going to change, and I really doubt it will, and everyone still wants applications to support a POSIX interface, which is what I hear, then the only answer to the problem is a common library that sits between the system call and the operating system and file system. Although IOFSL is really designed for HPC problems, I am not sure it would take much to allow it to efficiently support other applications, such as databases and search engines. I have given up on getting POSIX updated, and HDFS has given up on POSIX, but that will not work for everything. It is time to rethink how the industry solves this problem.

This article was originally published on January 31, 2012