With the popularity of SAS Grid Manager, one question comes up often: which clustered or shared file system should we use across the multiple nodes of the SAS Grid? The question needs to be thought through very carefully, because with today’s large data implementations of SAS, fixing an incorrect or poorly performing clustered file system takes a very significant amount of time and effort.
The SAS Global Forum 2013 paper “A Survey of Shared File Systems” describes which characteristics determine whether a clustered file system is a good fit for your environment and how to choose one that meets your needs. According to the authors, the important characteristics of clustered or shared file systems with respect to SAS performance include:
- whether the file system data is retained in memory in a local file cache
- its handling of file system metadata
- implications for the physical resources
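To make the metadata characteristic concrete, here is a minimal sketch of a probe that times a few metadata-heavy operations (create, stat, delete) on many small files. It is not taken from the paper and is not a SAS tool. The function name `time_metadata_ops`, the file count, and the target directory are assumptions chosen for illustration. On a real grid you would point it at a directory on the shared file system under evaluation and run it from more than one node.

```python
import os
import tempfile
import time


def time_metadata_ops(directory, n_files=200):
    """Time create, stat, and delete of n_files empty files under directory.

    Hypothetical helper for illustration; returns elapsed seconds per phase.
    """
    timings = {}
    # Work in a scratch subdirectory so the probe cleans up after itself.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        paths = [os.path.join(workdir, f"probe_{i}.tmp") for i in range(n_files)]

        start = time.perf_counter()
        for path in paths:
            with open(path, "wb"):
                pass  # creating an empty file is almost pure metadata work
        timings["create"] = time.perf_counter() - start

        start = time.perf_counter()
        for path in paths:
            os.stat(path)
        timings["stat"] = time.perf_counter() - start

        start = time.perf_counter()
        for path in paths:
            os.remove(path)
        timings["delete"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Assumption: replace this with a directory on the shared file system.
    target = tempfile.gettempdir()
    for op, seconds in time_metadata_ops(target).items():
        print(f"{op}: {seconds:.4f} s")
```

The numbers from a probe like this are only a rough indicator. A local file cache can hide the real cost, so results from a single node may look much better than what concurrent SAS jobs on several nodes would see.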
This paper also covers the pros and cons of several clustered file systems that we have tested here at SAS. Please note that although the Veritas clustered file system (VxFS) is not discussed in the paper, this file system works very nicely with SAS Grid.
Pay particular attention to the information on the common clustered file systems for Windows, as well as the recent recommendations for the GFS2 file system from Red Hat. We have encountered some performance issues with these file systems.
Other white papers, listed in the SAS Usage Note “A list of papers useful for troubleshooting system performance problems,” go into more detail on how to configure clustered file systems such as IBM’s GPFS. These papers are based on work we have done with IBM and several very large SAS customers. You may want to bookmark that page and check it occasionally, since additional papers will be added as they become available.