ARC: A Root Cluster for Research into Scalable Computer Systems

Official Announcement of the ARC Cluster (local copy)
NCSU write-up on the ARC Cluster (local copy)
TechNewsDaily Story (local copy)


Main system funded in part by NSF through CRI grant #0958311

Cooling door equipment and installation funded by NCSU CSC; GPUs funded in part by a grant from NCSU ETF funds, and by NVIDIA and HP donations


Old ARC Cluster

Looking for the documentation of the old ARC Cluster?

Hardware Status


READ THIS BEFORE YOU RUN ANYTHING: Running Jobs (via Slurm)

Once logged in to ARC, immediately obtain access to a compute node (interactively) or schedule batch jobs as shown below. Do not execute any other commands on the login node!
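As a minimal sketch of these two options, assuming a standard Slurm installation: `srun --pty` opens an interactive shell on a compute node, and `sbatch` queues a batch script. The file name `job.sh`, the job name, the node count, and the time limit are illustrative placeholders, not ARC-specific values.

```shell
#!/bin/bash
# Interactive use (typed on the login node; opens a shell on one compute node):
#   srun -N 1 --pty /bin/bash

# Batch use: write a small job script, then submit it with sbatch.
# job.sh, the job name, and the limits below are placeholders.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --time=00:05:00
#SBATCH --output=hello.%j.out
# Runs on the allocated compute node, not on the login node.
srun hostname
EOF

# Submit only where Slurm is available; otherwise just report the script.
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh
else
    echo "sbatch not found; wrote job.sh only"
fi
```

In the `--output` pattern, `%j` expands to the job ID, so each submission writes its own output file.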


Hardware

1728 cores on 108 compute nodes integrated by Advanced HPC. All machines are 2-way SMPs with either AMD or Intel processors (see below).
Nodes:







Networking, Power and Cooling:

Pictures


System Status


Software

All software is 64 bit unless marked otherwise.

Obtaining an Account


Accessing the Cluster


Using OpenMP (via Gcc)


Running CUDA Programs (Version 8.0)


Running MPI Programs with MVAPICH2 and Gcc (Default)


Running MPI Programs with Open MPI and Gcc (Alternative)


Using the PGI compilers V16.7

(includes OpenMP and CUDA support via pragmas, even for Fortran)


Dynamic Voltage and Frequency Scaling (DVFS)


Power monitoring

Sets of three compute nodes share a power meter; in such a set, the lowest-numbered node has the meter attached (either on the serial port or via USB). In addition, two individual compute nodes (with different GPUs) have their own power meters. See this power wiring diagram to identify which nodes belong to a set. The diagram also indicates whether the meter for a given node uses serial or USB. We recommend explicitly requesting a reservation for all nodes in a monitored set (see srun commands with host name option). Monitoring at 1 Hz is accomplished with the following software tools (on the respective nodes where meters are attached):
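To illustrate the reservation recommendation (not the monitoring tools themselves), here is a minimal sketch. It takes the node names of one monitored set, reports the lowest-numbered node (where the meter is attached), and prints an `srun` command that requests exactly those hosts via `-w`. The helper name `monitored_set_cmd` and the node names `c20 c21 c22` are hypothetical; take the real set membership from the wiring diagram.

```shell
#!/bin/bash
# Given the node names of one monitored set, print the node that carries
# the meter and an srun command requesting the whole set.
# Assumes node names have the form c<number>; the names used are examples.
monitored_set_cmd() {
    local sorted meter nodelist
    # Sort numerically on the digits after the leading 'c'.
    sorted=$(printf '%s\n' "$@" | sort -t c -k 2 -n)
    # The lowest-numbered node has the meter attached.
    meter=$(printf '%s\n' "$sorted" | head -n 1)
    # Join the sorted names with commas for srun's host list option.
    nodelist=$(printf '%s\n' "$sorted" | paste -sd, -)
    echo "meter on: $meter"
    echo "srun -N $# -w $nodelist --pty /bin/bash"
}

monitored_set_cmd c22 c20 c21
```

The function only prints the command, so it can be inspected before it is run on the login node.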

Virtualization with LXD (optionally with X11, VirtualBox, Docker inside)

Container virtualization is provided via LXD. Prefer CentOS images: they take much less space than other images because only the differences from the host image need to be stored in the container.

Notice: Images are installed locally on the node you are running on. If you need identical images on multiple nodes, then write a script to create an image from scratch. You cannot simply copy images as they are in a protected directory.
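A minimal sketch of such a script, assuming the standard `lxc` client: it launches a container from a base image and installs packages, so running it on each node yields identical containers. The image alias `images:centos/7`, the container name `mybox`, and the package list are placeholders, not values confirmed for ARC. By default the script only prints the commands (`DRY_RUN=1`).

```shell
#!/bin/bash
# Rebuild the same container from scratch on whichever node this runs on.
# Image alias, container name, and packages are illustrative assumptions.
IMAGE="${IMAGE:-images:centos/7}"
NAME="${NAME:-mybox}"
PACKAGES="${PACKAGES:-gcc make}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_container() {
    run lxc launch "$IMAGE" "$NAME"
    # $PACKAGES is intentionally unquoted so each package is its own argument.
    run lxc exec "$NAME" -- yum install -y $PACKAGES
}

build_container
```

Set `DRY_RUN=0` on a compute node to actually create the container.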

X11 inside LXD:

VirtualBox inside LXD (requires X11, see above):

Docker inside of LXD:


PVFS2


PAPI


likwid V4.2.1


Hadoop Map-Reduce and Spark

Simple setup of multi-node Hadoop map-reduce with HDFS; see also the free AWS setup as an alternative, and the original single-node and cluster setup guides. On ARC, however, follow the instructions below. Other components, e.g., YARN, can be added to this setup as well (not covered). We'll set up a Hadoop instance with nodes cXXX and cYYY (optionally more), so you should have obtained at least 2 nodes with srun. To get rid of ssh errors, you can add a secondary node server and other optional services; this is optional, not required.

You can also run Spark on top of Hadoop as follows, which will also default to the HDFS file system:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export SPARK_HOME=/usr/local/spark
export PATH="$PATH:$SPARK_HOME/bin"
run-example SparkPi 10

TensorFlow (1.0.1)


Other Packages

A number of packages have been installed; if you need them, check their location (via: rpm -ql pkg-name) and documentation (see URLs) in this PDF. (Note: only the mvapich2/openmpi/gnu variants are installed.) Typically, you can get access to them via:
  module avail # show which modules are available
  module load X
  export |grep X #shows what has been defined
  gcc/mpicc -I${X_INC} -L${X_LIB} -lx #for a library
  ./X #for a tool/program, may be some variant of 'X' depending on toolkit
  module switch X Y #for mutually exclusive modules if X is already loaded
  module unload X
  module info #learn how to use modules
Current list of available modules:
  -------------------------- /opt/ohpc/pub/moduledeps/gnu-mvapich2 --------
   adios/1.10.0    mpiP/3.4.1           petsc/3.7.0        scorep/3.0
   boost/1.61.0    mumps/5.0.2          phdf5/1.8.17       sionlib/1.7.0
   fftw/3.3.4      netcdf/4.4.1         scalapack/2.0.2    superlu_dist/4.2
   hypre/2.10.1    netcdf-cxx/4.2.1     scalasca/2.3.1     tau/2.26
   imb/4.1         netcdf-fortran/4.4.4 scipy/0.18.0       trilinos/12.6.4

  ------------------------------ /opt/ohpc/pub/moduledeps/gnu ---------------
   R_base/3.3.1    metis/5.1.0          ocr/1.0.1          pdtoolkit/3.22
   gsl/2.2.1       mvapich2/2.2         openblas/0.2.19    superlu/5.2.1
   hdf5/1.8.17     numpy/1.11.1         openmpi/1.10.4

  -------------------------------- /opt/ohpc/pub/modulefiles ----------------
   EasyBuild/2.9.0 gnu/5.4.0            papi/5.4.3         pgi64/2016
   PrgEnv-pgi/16.7 java                 pgi/16.7           prun/1.1
   autotools       ohpc                 pgi/2016           valgrind/3.11.0
   cuda            openmpi/1.10.2/2016  pgi64/16.7

Advanced topics (pending)

For all other topics, access is restricted. Request a root password. Also, read this documentation, which is only accessible from selected NCSU labs.

This applies to:


Known Problems

Consult the FAQ. If this does not help, then please report your problem.
