Installing WRF on ARCHER
These instructions describe how to compile and run WRF on the new national supercomputing facility, ARCHER. ARCHER officially replaced the HECToR supercomputer on 21st March 2014. The following instructions have been tested against WRF versions 3.3.1 and 3.4.1.
If you had an account on HECToR, then you can request an account on ARCHER by following the instructions at http://www.archer.ac.uk/support/getting-started/
Requirements
* WRF model source code (including WPS); see http://www.mmm.ucar.edu/wrf/users/download/get_source.html.
* netcdf v4.3.0 (pre-installed on ARCHER and available at /opt/cray/netcdf/4.3.0/cray/81);
* Cray compiler suite (NB - this is currently the default compiler suite on ARCHER, and so you don't need to do anything here)
Compilation Instructions
Copy the WRF tar file over to your work directory on ARCHER, and extract using the command:
tar -xvf WRFV3.4.1.TAR.gz
Within the newly extracted WRF directory, create a bash script called configure_wrf.sh, containing the following:
#!/bin/bash
module load netcdf
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export NETCDF=/opt/cray/netcdf/default/cray/81
./configure
Give your file executable permissions, i.e.
chmod +x configure_wrf.sh
Then run the script by typing
./configure_wrf.sh
When prompted, select the option:
- Cray XT CLE/Linux x86_64, Cray CCE compiler with gcc (dmpar)
This will select the cray fortran compiler, with 'dmpar' option (distributed memory) which uses MPI, thus allowing you to run WRF as a parallel job across multiple cores.
Upon completion, a file called configure.wrf will be generated, containing the settings needed for the compilation to proceed.
It is preferable to submit the compilation via a serial job submission script. It is possible to run the compile command interactively direct from the command line, although the /tmp directories on the login nodes are periodically emptied so with a long compile process such as WRF, temporary files required by the compiler sometimes go missing sporadically, leading to apparent random failures. To get around this, you can set the $TMPDIR environment variable to point to a location in your own workspace, e.g. $HOME/tmp.
An example serial job submission script to compile WRF is given below:
#!/bin/bash --login
#
#PBS -l select=serial=true:ncpus=1
#PBS -l walltime=04:00:00
#PBS -A [budget_code]
module load netcdf
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export NETCDF=/opt/cray/netcdf/default/cray/81
# Make sure any symbolic links are resolved to absolute path
export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)
# Change to the directory that the job was submitted from
cd $PBS_O_WORKDIR compile em_real >&compile_log_out.txt
In the above script, be sure to replace [budget_code] with the string specifying your budget. For most CAS people, this will be n02-weat. Save your script with the name compile_wrf.sh, and then submit it using qsub:
qsub ./compile_wrf.sh
Output from the compilation will be saved in a text file called compile_log_out.txt. To check on the status of your serial job, use qstat:
qstat -u <username>
To check for any errors in the compile log file, use grep:
grep Error compile_log_out.txt
Upon successful completion, you should see the wrf.exe and real.exe executables in the 'main' directory.
Running WRF
Below is an example of a parallel job submission script called wrf_exe.sh for running WRF on ARCHER. Note that pbs scripts from HECToR will not work on ARCHER. Also ARCHER has 24 cores per compute node, whereas HECToR phase 3 had 32 cores per node. So running a job across 12 nodes on ARCHER equates to 288 cores in total.
#!/bin/bash --login
# uncomment the following two lines if you need to debug this script
# set -v # Print script lines as they are read.
# set -x # Print commands and their arguments as they are executed.
# PBS job options (name, compute nodes, job time)
#PBS -N wrf_exe
#PBS -l select=12 # number of nodes required for your job (remember 24 cores per nodes)
#PBS -l walltime=08:00:00
# Replace [budget_code] below with your project code (e.g. n02-weat)
#PBS -A [budget_code]
# Make sure any symbolic links are resolved to absolute path
export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)
# Change to the directory that the job was submitted from
# (remember this should be on the /work filesystem)
cd $PBS_O_WORKDIR
# Set the number of threads to 1
# This prevents any system libraries from automatically using threading.
export OMP_NUM_THREADS=1
# Launch the parallel job using 288 MPI processes/cores
aprun -n 288 ./wrf.exe arg1 arg2
Submit using qsub:
qsub ./wrf_exe.sh
As before, you can check on the status of your job using the 'qstat' command. When the job starts to run, you can check on progress using the command:
more rsl.error.0000
Scroll to the end of this file to see how many timesteps have been completed.