Distributed Directories
Traditionally in PVFS, all entries for a given directory are stored on the server holding the metadata for the directory itself. The distributed directories project will allow the entries of a single directory to be spread across multiple servers, so that a directory containing a very large number of files can be handled efficiently: multiple tasks can access different parts of the directory in parallel.
With distributed directories, OrangeFS allocates multiple dspace handles spread among the various servers for storing directory entries in much the same way data is spread among servers for regular files. Which server holds a particular directory entry is determined by applying a hash function to the name of the entry, similar to the approach used in GPFS and GIGA+ to distribute directories. The list of dspace handles is returned by getattr so that a client may use these handles to access directory entry information.
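As a rough illustration of hash-based placement, the Python sketch below maps an entry name to one of a directory's dspace handles. It assumes the client already holds the handle list returned by getattr. The function name select_dirdata_handle, the example handle values, and the use of CRC-32 as the hash are stand-ins chosen for illustration, not the function or hash function OrangeFS actually uses.

```python
import zlib
from collections import Counter
from typing import List


def select_dirdata_handle(entry_name: str, dirdata_handles: List[int]) -> int:
    """Return the dspace handle that stores entry_name.

    Assumption: zlib.crc32 stands in for the real OrangeFS hash function.
    dirdata_handles is the ordered handle list a client obtains from getattr.
    """
    if not dirdata_handles:
        raise ValueError("directory has no dspace handles for entries")
    # Hash the entry name, then reduce it modulo the number of handles so
    # each name lands deterministically on exactly one handle (and server).
    digest = zlib.crc32(entry_name.encode("utf-8"))
    return dirdata_handles[digest % len(dirdata_handles)]


if __name__ == "__main__":
    # Hypothetical handle values, one per metadata server.
    handles = [1048576, 2097152, 3145728, 4194304]
    names = [f"file{i:05d}.dat" for i in range(10000)]
    counts = Counter(select_dirdata_handle(n, handles) for n in names)
    for handle in handles:
        print(handle, counts[handle])
```

Because the mapping depends only on the entry name and the handle list, any client can work out which server to contact for a lookup or create without first asking another server. A directory whose entries live in a single handle is simply the case of a one-element list, where every name maps to that handle.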
In the initial implementation, directory entries are spread across all available metadata servers. Future plans include mechanisms to dynamically expand and contract the number of servers holding a directory based on the number of entries it contains. A system administrator will be able to set a system-wide value for the number of servers across which entries are distributed, and users will be able to set this value for their own directories.
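The planned controls are not specified further here, but one possible way they could combine is sketched below in Python. Everything in it is hypothetical: the function names, the rule that a user's per-directory setting overrides the administrator's system-wide default, the cap at the number of metadata servers, and the doubling policy (loosely inspired by GIGA+) used to grow or shrink a directory with its entry count.

```python
from typing import Optional


def configured_server_limit(
    num_metadata_servers: int,
    system_default: Optional[int] = None,
    user_setting: Optional[int] = None,
) -> int:
    """Upper bound on servers for one directory (hypothetical policy).

    A user's per-directory setting overrides the administrator's system-wide
    default. With neither set, fall back to the initial implementation's
    behavior of using every metadata server.
    """
    if num_metadata_servers < 1:
        raise ValueError("need at least one metadata server")
    requested = user_setting if user_setting is not None else system_default
    if requested is None:
        return num_metadata_servers
    # Never ask for more servers than exist, and never fewer than one.
    return max(1, min(requested, num_metadata_servers))


def servers_for_entry_count(entry_count: int, entries_per_server: int, limit: int) -> int:
    """Double the server count until the average load fits (hypothetical).

    The result depends only on the current entry count, so a directory that
    shrinks gets a smaller count and the same rule also covers contraction.
    """
    servers = 1
    while servers < limit and entry_count > servers * entries_per_server:
        servers *= 2
    return min(servers, limit)


if __name__ == "__main__":
    limit = configured_server_limit(16, system_default=8, user_setting=4)
    for count in (10, 5000, 50000):
        print(count, servers_for_entry_count(count, entries_per_server=2000, limit=limit))
```

One trade-off to weigh: if entries are placed by hashing the name modulo the number of servers, changing that number remaps most existing entries and forces them to move. GIGA+ instead splits one overflowing partition at a time, so only part of that partition's entries move.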
Another limitation of the initial implementation is that the root directory is not distributed. As with earlier releases of PVFS, all root directory entries are stored on the server holding the metadata for the root directory.
Future research will evaluate more advanced strategies for retrieving a directory's entries from the multiple servers that hold them.
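These strategies are left open here. As one simple baseline, a client listing a distributed directory could contact every server holding part of it concurrently and merge the replies. The Python sketch below does that with a thread pool; the function name list_distributed_directory and the fetch_entries callback, which stands in for the actual request sent to each server, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def list_distributed_directory(
    dirdata_handles: List[int],
    fetch_entries: Callable[[int], List[str]],
    max_workers: int = 8,
) -> List[str]:
    """Fetch entry names from every handle in parallel and merge them.

    fetch_entries(handle) is a hypothetical stand-in for the request that
    reads the entries stored under one dspace handle on its server.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One request per handle; pool.map returns results in handle order.
        per_handle = list(pool.map(fetch_entries, dirdata_handles))
    # Each name hashes to exactly one handle, so the partial lists are
    # disjoint; sorting only gives callers a stable, predictable order.
    return sorted(name for entries in per_handle for name in entries)


if __name__ == "__main__":
    # Simulated per-server contents keyed by hypothetical handle values.
    fake_servers: Dict[int, List[str]] = {
        101: ["beta", "epsilon"],
        202: ["alpha", "delta"],
        303: ["gamma"],
    }
    print(list_distributed_directory(list(fake_servers), fake_servers.__getitem__))
```

Sorting is a convenience in this sketch; for a very large directory a client might instead stream entries in batches rather than hold the whole listing in memory at once.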
