Looking for a clustered file system for SAS Grid?


With the popularity of SAS Grid Manager, this question often comes up: which clustered or shared file system should we use with the multiple nodes of the SAS Grid? The choice needs to be thought through very carefully, because fixing an incorrect or poorly performing clustered file system takes significant time and effort with today’s large data implementations of SAS.

The SAS Global Forum 2013 paper A Survey of Shared File Systems describes which characteristics determine whether the clustered file system is a good fit for your environment and how to choose the one that meets your needs. According to the authors, the important characteristics of clustered or shared file systems with respect to SAS performance include:

  • whether the file system data is retained in memory in a local file cache
  • its handling of file system metadata
  • implications for the physical resources

This paper also covers the pros and cons of several clustered file systems that we have tested here at SAS. Please note that although the Veritas clustered file system (VxFS) is not discussed in the paper, this file system works very nicely with SAS Grid.

Pay particular attention to the paper's information on the common clustered file systems for Windows, as well as the recent recommendations for the GFS2 file system from Red Hat. We have encountered some performance issues with these file systems.

Other white papers are listed in the SAS Usage Note: A list of papers useful for troubleshooting system performance problems. They go into more detail on how to configure clustered file systems such as IBM’s GPFS, and they are based on the work we have done with IBM and several very large SAS customers. You may want to bookmark this page and check it occasionally for new content. Additional papers will be added as they become available.


About Author

Margaret Crevar

Manager, SAS R&D Performance Lab

Margaret Crevar has worked at SAS since May 1982. She has held a variety of positions since then, working in sales, marketing and now research and development. In her current role, Crevar manages the SAS Performance Lab in R&D. This lab has two roles: testing future SAS releases while they're still in development to make sure they're performing as expected; and helping SAS customers who are experiencing performance issues overcome their challenges.

18 Comments

  1. We are considering replacing our server. Our current IT consultant is not familiar with SAS installations and would appreciate a contact to discuss system and application configuration. Can you recommend a SAS support expert who he could work with? (We currently maintain a Windows 64 bit server environment.)

    Thanks,

    Jim

    • Christina Harvey

      Jim . . .

      We recommend that you work with your SAS sales team. They will connect you with a SAS technical architect who can work with your IT consultant.

  2. Is GPFS the only clustered file system for SAS? I have some Veritas Clustered File System licenses that I would like to reuse, and I was wondering whether Veritas Clustered File System is supported and whether any best practice guides are available.

    Thanks

    Regards

    Guido

  3. Margaret Crevar

    Guido . . .

    Yes, SAS Grid works wonderfully with the Veritas Clustered File System. The following are best practices when running with Veritas File System and Veritas Volume Manager:

    1. Use 8KB block size when creating a VxFS file system
    2. Use multipathing.
    3. When creating a Veritas volume, use a RAID-0 stripe across multiple LUNs (assuming the LUNs are RAID protected).
    4. Depending on the underlying LUN characteristics, make the VxVM strip size equal to a full stripe width on the underlying LUN. For example, if a LUN is a RAID-5 volume created as 4+1 with a 64KB strip size, the strip size for the host should be 256KB (4 data disks × 64KB).
    5. In the above case, the SAS BUFSIZE parameter should be set to 256KB in the SAS configuration file.
    6. If SAS BUFSIZE is 256KB or larger, use vxtunefs to increase the value of discovered_direct_iosz.
    7. If SAS BUFSIZE is 256KB or less, change the vxtunefs parameters
    • read_pref_io
    • write_pref_io
    to match the SAS BUFSIZE; otherwise, make the values an integer divisor of the SAS BUFSIZE.
    8. Check read_nstream and write_nstream values are set appropriately from VxVM volume configuration.
    9. Mount the SASWORK temporary file system(s) with tmplog and mincache=tmpcache.
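
The arithmetic in items 4 and 5 and the sizing rule in item 7 can be sketched in Python. This is only an illustration, not a Veritas or SAS tool: the function names are hypothetical, and choosing the largest divisor at or below 256KB is an assumption, since the list above only says the value should be an integer divisor of the SAS BUFSIZE.

```python
def full_stripe_width_kb(data_disks: int, strip_size_kb: int) -> int:
    """Full stripe width of a RAID LUN: data disks times per-disk strip size.

    Parity disks are excluded, so a RAID-5 4+1 LUN counts 4 data disks.
    """
    return data_disks * strip_size_kb


def preferred_io_kb(bufsize_kb: int, limit_kb: int = 256) -> int:
    """Suggest a preferred read/write I/O size following item 7.

    At or below the limit, match BUFSIZE. Above it, return the largest
    integer divisor of BUFSIZE that does not exceed the limit
    (the "largest" choice is an assumption made for this sketch).
    """
    if bufsize_kb <= limit_kb:
        return bufsize_kb
    for candidate in range(limit_kb, 0, -1):
        if bufsize_kb % candidate == 0:
            return candidate
    return 1  # unreachable for positive inputs, kept for completeness


if __name__ == "__main__":
    # RAID-5 4+1 LUN with a 64KB strip size, as in item 4
    width = full_stripe_width_kb(data_disks=4, strip_size_kb=64)
    print(width)                   # 256
    print(preferred_io_kb(width))  # 256
    print(preferred_io_kb(512))    # 256
```

The resulting values would still need to be applied with vxtunefs and checked against the actual volume layout.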

    In SAS 9.1 and 9.2, have users modify the LIBNAME statement to add offset=value, where value is the SAS BUFSIZE. Using the example in item 5 above, add offset=256K to the LIBNAME statement. In SAS 9.3, put -alignsasiofiles in the V9 config file.

    By default, a SAS data set contains 512 bytes of header data. The actual SAS data begins at an 8K offset in the file, and the file is written with BUFSIZE-sized I/O requests. When offset= is set, the first data page is placed at that offset in the file.
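
To see why the offset matters, the following Python sketch compares where data pages start with the default 8K offset and with an offset equal to BUFSIZE. It is a simplified model under stated assumptions: pages are taken to be laid out back to back from the first data page, each BUFSIZE long, and the helper names are hypothetical.

```python
KB = 1024


def page_offsets(first_page_offset: int, bufsize: int, pages: int) -> list[int]:
    """Byte offsets at which the first few data pages start (simplified model)."""
    return [first_page_offset + i * bufsize for i in range(pages)]


def all_aligned(offsets: list[int], stripe_width: int) -> bool:
    """True when every page starts exactly on a stripe boundary."""
    return all(offset % stripe_width == 0 for offset in offsets)


if __name__ == "__main__":
    bufsize = stripe_width = 256 * KB  # values from the 4+1, 64KB example

    default_layout = page_offsets(8 * KB, bufsize, pages=4)
    aligned_layout = page_offsets(256 * KB, bufsize, pages=4)

    # With the 8K default, each 256KB request straddles two stripes.
    print(all_aligned(default_layout, stripe_width))  # False
    # With offset equal to BUFSIZE, each request maps onto one full stripe.
    print(all_aligned(aligned_layout, stripe_width))  # True
```

In this model, aligned pages mean each BUFSIZE request covers exactly one full stripe rather than parts of two.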

    Margaret

  4. Hi Margaret,

    Do you have any experience with the Gluster filesystem from a SAS Grid perspective?
    We are currently investigating it and it is providing good performance.

    Are there any technical limitations for not being a supported SAS grid filesystem?

    Regards,
    Wouter

    • Margaret Crevar

      Greetings Wouter. This is a great question.

      As you may know, we work very closely with Red Hat on testing SAS on their operating system, including SAS Grid. Following recent testing of SAS Grid on the Gluster file system, Red Hat has advised us that Gluster is not ready for SAS and should not be used for SAS Grid.

      If this status ever changes, we will announce that SAS customers can use SAS Grid with GlusterFS as an entry to this blog.

        • Margaret Crevar

          It is still not supported per work we have done with Red Hat. No plans to do more work with Gluster in the near future.

  5. Hi Margaret,

    The article "Best Practices for Data Sharing in a Grid Distributed SAS® Environment" (https://support.sas.com/rnd/scalability/grid/Shared_FileSystem_GRID.pdf) mentions that Solaris QFS was one of the shared file systems that SAS has reviewed. However, the article does not give any evaluation of QFS. In SAS's experience, is QFS a recommended shared file system for SAS Grid compute servers installed in a Solaris environment?

    I appreciate your prompt reply,

    • Margaret Crevar

      Lei,

      The document you are referencing is very old. SAS has had no experience with Oracle's QFS in over 5 years. Back then it worked okay with no modifications to it.

      Margaret

  6. Hi Margaret,

    I have a question about LUN size versus number of LUNs when creating a file system for SAS datasets, especially from a performance perspective.

    Assume we need a file system with 4TB capacity to hold SAS datasets. We could:

    Create 1 LUN with 4TB size;
    Create 2 LUNs with 2TB size each;
    Create 4 LUNs with 1TB size each;
    Create 8 LUNs with 500GB size each;
    etc.

    It's understandable that different storage types (and/or different file system types) might call for a different approach or formula.
    But as a general rule, from a performance perspective, when creating a big file system for SAS datasets, do you prefer fewer, bigger LUNs or more, smaller LUNs?

    Thanks in advance,
