HPC File Server Monitoring and Tuning


Bibliographic Details
Main Author: Andresen, Rune Johan
Format: Others
Language: English
Published: Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap 2005
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9271
Description
Summary: As HPC systems grow, the distributed file systems serving them need to handle an increasing load of data. To maintain performance, the underlying file servers need to distribute the load of data volumes efficiently over the available disks. This is particularly true at CERN, the European Organization for Nuclear Research, which expects to be handling petabytes of data in the near future. In this thesis, new utilities are developed that analyze file server data, which is then used to semiautomatically tune the file system. This is achieved by using a commercial database to store the data and then integrating it with the file server, which requires a database and a system design that can handle a large amount of data. File server data collections associated with a process, known as "volumes", can vary in size and be accessed at any time. To increase overall system performance, volume history data is analyzed to locate volumes that may be gathered together for load balancing. For instance, using the volume history data, it is possible to detect volumes that are most accessed during the day and gather them on one file server with volumes that are most accessed during the night. The file server capacity is hence optimized. As part of this work, a user interface that can visualize the history data for volumes and partitions is designed and implemented on top of the AFS file system at CERN. The initial results presented in this thesis reveal that it is possible to locate volumes that have a repeating access period and thus gather them on the same partition. Other analyses and suggestions for future work are also discussed.