SLURM Administrator Guide for Sun Constellation systems

Overview

This document describes the unique features of SLURM on Sun Constellation computers. You should be familiar with SLURM's mode of operation on Linux clusters before studying the relatively few differences in Sun Constellation system operation described in this document.

SLURM's primary mode of operation is designed for use on clusters with nodes configured in a one-dimensional space. Minor changes were required for the smap and sview tools to map nodes in a three-dimensional space. Some changes are also desirable to optimize job placement in three-dimensional space.

Configuration

Two variables must be defined in the config.h file: HAVE_SUN_CONST and HAVE_3D. This can be accomplished in several different ways depending upon how SLURM is being built.

  1. Execute the configure command with the option --enable-sun-const OR
  2. Execute the rpmbuild command with the option --with sun_const OR
  3. Add %with_sun_const 1 to your ~/.rpmmacros file.

Node names must have a three-digit suffix describing their zero-origin position in the X-, Y- and Z-dimensions respectively (e.g. "tux000" for X=0, Y=0, Z=0; "tux123" for X=1, Y=2, Z=3). Rectangular prisms of nodes can be specified in SLURM commands and configuration files using the system name prefix with the end-points enclosed in square brackets and separated by an "x". For example, "tux[620x731]" represents the eight nodes in a block with endpoints at "tux620" and "tux731" (tux620, tux621, tux630, tux631, tux720, tux721, tux730, tux731). While node names of this form are required for SLURM's internal use, they need not be the names returned by the hostname -s command. See man slurm.conf for details on how to use the NodeName, NodeAddr and NodeHostName configuration parameters for flexibility in this matter.
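The prism notation above can be illustrated with a short Python sketch. This is not SLURM's own hostlist parser; the function name expand_prism is hypothetical, and the sketch assumes a single decimal digit per dimension, as in the examples in this document.

```python
import re
from itertools import product

def expand_prism(expr):
    """Expand 'prefix[XYZxXYZ]' into the node names inside that rectangular prism."""
    m = re.fullmatch(r"([^\[\]]+)\[(\d{3})x(\d{3})\]", expr)
    if m is None:
        raise ValueError(f"not a prism expression: {expr!r}")
    prefix, start, end = m.groups()
    # One inclusive range per dimension, from the start digit to the end digit.
    ranges = [range(int(lo), int(hi) + 1) for lo, hi in zip(start, end)]
    # product() varies the last dimension (Z) fastest, giving numeric order.
    return [f"{prefix}{x}{y}{z}" for x, y, z in product(*ranges)]

print(expand_prism("tux[620x731]"))
```

For "tux[620x731]" this yields the same eight names listed in the text, and "tux[000x333]" expands to the 64 nodes of a 4x4x4 system.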

Next you need to select from two options for the resource selection plugin (the SelectType option in SLURM's slurm.conf configuration file):

  1. select/cons_res - Performs a best-fit algorithm based upon a one-dimensional space to allocate whole nodes, sockets, or cores to jobs based upon other configuration parameters.
  2. select/linear - Performs a best-fit algorithm based upon a one-dimensional space to allocate whole nodes to jobs.

In order for select/cons_res or select/linear to allocate resources that are physically nearby in three-dimensional space, the nodes must be specified in SLURM's slurm.conf configuration file in such a fashion that those nearby in slurm.conf (managed internally by SLURM as a one-dimensional space) are also nearby in the physical three-dimensional space. If the nodes are defined on one line of slurm.conf (e.g. NodeName=tux[000x333]), SLURM will automatically perform that conversion using a Hilbert curve. Otherwise you may construct your own node ordering sequence and list the nodes one per line in slurm.conf. Note that each node must be listed exactly once and consecutive nodes should be nearby in three-dimensional space. Also note that each node must be defined individually rather than using a hostlist expression in order to preserve the ordering (there is no problem using a hostlist expression in the partition specification after the nodes have already been defined). The open source code used by SLURM to generate the Hilbert curve is included in the distribution at contribs/skilling.c in the event that you wish to experiment with it to generate your own node ordering. Two examples of SLURM configuration files are shown below:

# slurm.conf for Sun Constellation system of size 4x4x4

# Configuration parameters removed here

# Automatically orders nodes following a Hilbert curve
NodeName=DEFAULT Procs=8 RealMemory=2048 State=Unknown
NodeName=tux[000x333]
PartitionName=debug Nodes=tux[000x333] Default=Yes State=UP

# slurm.conf for Sun Constellation system of size 2x2x2

# Configuration parameters removed here

# Manual ordering of nodes following a space-filling curve
NodeName=DEFAULT Procs=8 RealMemory=2048 State=Unknown
NodeName=tux000
NodeName=tux100
NodeName=tux110
NodeName=tux010
NodeName=tux011
NodeName=tux111
NodeName=tux101
NodeName=tux001
PartitionName=debug Nodes=tux[000x111] Default=Yes State=UP
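
Before committing a manual ordering such as the 2x2x2 example to slurm.conf, it can be worth checking it mechanically. The Python sketch below is a hypothetical helper, not part of SLURM. It assumes single-digit coordinates and interprets "nearby" strictly as "one step apart in exactly one dimension", which is stronger than what SLURM requires.

```python
def coords(name):
    """Return the (X, Y, Z) position encoded in the last three characters."""
    return tuple(int(ch) for ch in name[-3:])

def check_ordering(names):
    """Return a list of problems found in a manual node ordering (empty if none)."""
    problems = []
    if len(set(names)) != len(names):
        problems.append("a node is listed more than once")
    for prev, cur in zip(names, names[1:]):
        # Manhattan distance 1: the nodes differ by one step in one dimension.
        dist = sum(abs(a - b) for a, b in zip(coords(prev), coords(cur)))
        if dist != 1:
            problems.append(f"{prev} -> {cur} are {dist} steps apart")
    return problems

# The manual ordering from the 2x2x2 example.
order = ["tux000", "tux100", "tux110", "tux010",
         "tux011", "tux111", "tux101", "tux001"]
print(check_ordering(order) or "ordering looks fine")
```

The ordering from the example passes this check: every consecutive pair of nodes differs in exactly one coordinate.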

In both of the examples above, the node names output by the scontrol show nodes command will be ordered as defined (sequentially along the Hilbert curve or per the ordering in the slurm.conf file) rather than in numeric order (e.g. "tux001" follows "tux101" rather than "tux000"). The output of other SLURM commands (e.g. sinfo and squeue) will use a SLURM hostlist expression with the node names numerically ordered. For optimal performance, SLURM partitions should contain nodes that are defined sequentially by that ordering.
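
The recommendation that a partition hold sequentially defined nodes can be sketched as a simple check. The helper below is hypothetical and is not something SLURM provides. It takes the node order as defined in slurm.conf and a candidate partition, and reports whether the partition's nodes form one unbroken run of that order.

```python
def is_contiguous(defined_order, partition_nodes):
    """True if partition_nodes occupy one unbroken run of defined_order."""
    wanted = set(partition_nodes)
    positions = sorted(i for i, name in enumerate(defined_order) if name in wanted)
    if len(positions) != len(wanted):
        raise ValueError("partition names a node that is not defined")
    # A run of n consecutive indices spans exactly n slots.
    return not positions or positions[-1] - positions[0] + 1 == len(positions)

# Node order as defined in the 2x2x2 manual example.
order = ["tux000", "tux100", "tux110", "tux010",
         "tux011", "tux111", "tux101", "tux001"]

print(is_contiguous(order, ["tux110", "tux010", "tux011"]))  # consecutive in the ordering
print(is_contiguous(order, ["tux000", "tux001"]))            # first and last in the ordering
```

Note that "tux000" and "tux001" are adjacent numerically but sit at opposite ends of the defined ordering, so a partition containing only those two nodes would not be sequential in the sense used above.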

Last modified 8 January 2009

Lawrence Livermore National Laboratory
7000 East Avenue • Livermore, CA 94550
Operated by Lawrence Livermore National Security, LLC, for the Department of Energy's
National Nuclear Security Administration