Updated 7 July 2020

Introduction

This article provides users with information about work currently underway to:

  • migrate the JASMIN infrastructure from RedHat Enterprise 6 (RHEL6) to CentOS7, and
  • prepare for the replacement of the LSF batch scheduler by SLURM

As stated in previous announcements, we are in the process of implementing a gradual transition of all "sci" servers and the LOTUS cluster to CentOS7.

Additionally, we will replace the existing batch scheduler (Platform LSF) with SLURM. We have chosen SLURM because it is a cost-effective alternative that is widely used by other academic and scientific institutions (such as the UK Met Office).

Replacing the JASMIN Analysis Platform with JASPY

Work is underway to provide a new delivery method for the “JASMIN Analysis Platform” (JAP), the software stack currently maintained and deployed across the scientific analysis servers, the LOTUS batch processing cluster and a number of bespoke VMs.

The new system, known as “JASPY”, will use the "conda" system to create the core Python environments for Python 2.7 and Python 3.7. JASPY is documented here.

The JAP replacement takes the form of two new software components:

  • a new Conda-based set of Python environments, known as "Jaspy"
  • a Software Collections Library (SCL) which encapsulates the non-Python software that was also part of the JAP, known as "jasmin-sci"

We have drafted documentation about the changes to software on JASMIN. The documentation is currently provided as a single web-based document (PDF). Please consult the documentation here:

Changes to software

Please let us know about your experience of using these CentOS7 machines by e-mailing CEDA Support (support@jasmin.ac.uk), including "centos7 support" in the subject line. Please send us both positive and negative feedback so that we can gauge the response and react to problems. If you have any specific feedback regarding the documentation, please let us know that too.

Migration of virtual machines used for hosting JASMIN and CEDA services, and project VMs

The CEDA and JASMIN team is working on this activity, which should be largely transparent to users. Where individual project VMs are affected, projects have been contacted and plans are being prepared to redeploy those hosts with CentOS7 equivalents.

Replacement of current scheduler system with SLURM

The CentOS7 queues managed by SLURM and the CentOS7 SLURM submission node will be available for users to use by mid-May 2020. Documentation for this is now available but further details are still being added.

SLURM Documentation

Users are encouraged to consult this documentation now and to consider what changes they need to make to their workflows: the previous LOTUS batch scheduler system is being drained of nodes, which are moving under the control of the SLURM scheduler (by end of June 2020).
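
As a starting point for adapting a workflow, the sketch below writes a minimal SLURM job script and notes the LSF directive each line would typically replace. It is an illustration under stated assumptions, not the documented JASMIN procedure: the file name, job name, time limit and payload command are all made up for the example, and only the short-serial partition name is taken from this article. Consult the SLURM documentation above for the settings that apply on LOTUS.

```shell
#!/bin/bash
# Sketch only: generate a minimal SLURM job script.
# example_job.sbatch, the job name "example" and the 30-minute limit
# are illustrative assumptions, not JASMIN defaults.
job_script="example_job.sbatch"

cat > "$job_script" <<'EOF'
#!/bin/bash
# LSF equivalent: #BSUB -q short-serial
#SBATCH --partition=short-serial
# LSF equivalent: #BSUB -J example
#SBATCH --job-name=example
# LSF equivalent: #BSUB -o %J.out
#SBATCH --output=%j.out
# LSF equivalent: #BSUB -e %J.err
#SBATCH --error=%j.err
# LSF equivalent: #BSUB -W 00:30
#SBATCH --time=00:30:00

# Placeholder payload: replace with the real command.
echo "Running on $(hostname)"
EOF

# The script is only written, not submitted.
echo "On a SLURM submission host, submit with: sbatch $job_script"
```

Note that LSF reads the job script from standard input (bsub < script), whereas SLURM takes the script as an argument (sbatch script).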

Further details

  • A sub-cluster of LOTUS running CentOS7 and managed by the new batch scheduler SLURM is now available for initial testing using the following queues (known as partitions in SLURM): short-serial, test, par-single and par-multi.
  • The OpenMPI library is the only supported MPI library on the CentOS7 cluster managed by SLURM. OpenMPI v3.1.1 and v4.0.0 are provided, both of which are fully MPI3-compliant. Further details here.
  • The new CentOS7 scientific analysis servers for SLURM job submission are: sci2-test.jasmin.ac.uk, sci4-test.jasmin.ac.uk, sci5-test.jasmin.ac.uk. Further information about the “sci” servers can be found here.
  • A webinar on transitioning from LSF to SLURM was held on Thursday 18th June. Presentation slides and a recording of the event are available here.
  • LSF-managed LOTUS resources will be reduced from mid-June. All LOTUS hosts are planned to be moved to CentOS7 and managed by SLURM by the end of June.
  • A new CentOS7 "copy service" is available to test from the new CentOS7 sci[4,5].jasmin.ac.uk. Further details on the copy service can be found here.
  • Information about the new CentOS7 login, sci and xfer servers is available here. The new sci servers (sci1.jasmin.ac.uk and sci2.jasmin.ac.uk) have now been configured to enable SLURM job submission, so users should use these where possible from now on (in preference to the previous set of sci servers jasmin-sci*.ceda.ac.uk which will be retired in due course).
  • Plans for updates to the mass-cli1 and jasmin-cylc servers are currently taking shape. The new CentOS7 cylc server cylc.jasmin.ac.uk, with SLURM enabled, is available for users to test. Please note that the PATH environment variable on the new cylc server has to include /apps/jasmin/metomi/bin, e.g.
    if [[ $(hostname) =~ ^cylc ]] ; then
       export PATH=/apps/jasmin/metomi/bin:$PATH
    fi
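
If the PATH setting in the last item above is placed in a shell start-up file that may be sourced more than once, PATH can accumulate duplicate entries. The following is a minimal sketch of one way to avoid that. The helper name path_prepend is hypothetical and introduced only for illustration, and the case pattern cylc* is used as a portable stand-in for the ^cylc regular-expression match.

```shell
#!/bin/bash
# Sketch only: prepend a directory to PATH unless it is already there.
# path_prepend is a hypothetical helper, not part of any JASMIN tooling.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise put it at the front
    esac
}

# Only act on hosts whose name starts with "cylc".
case "$(hostname)" in
    cylc*)
        path_prepend /apps/jasmin/metomi/bin
        export PATH
        ;;
esac
```

Calling path_prepend repeatedly with the same directory leaves a single entry on PATH.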
    

Thank you for your attention and please look out for further updates as this work progresses.

CEDA and JASMIN Team