Ao Xu (NPU) | Research

Open Accelerators (OpenACC)

Open Accelerators (OpenACC) is a directive-based programming standard for parallel computing on heterogeneous CPU/GPU systems.
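To give a feel for what "directive-based" means, here is a minimal sketch in C (the source files named on this page are Fortran; this is not the calPi_OpenACC.F90 code itself). It approximates pi by midpoint-rule integration of 4/(1+x^2) over [0,1]. The function name compute_pi is an assumption made for illustration. A compiler without OpenACC support ignores the pragma and runs the loop serially.

```c
/* Approximate pi by midpoint-rule integration of 4/(1+x^2) on [0,1].
   compute_pi is a hypothetical name used only for this sketch. */
double compute_pi(long n)
{
    const double h = 1.0 / (double)n;   /* width of one sub-interval */
    double sum = 0.0;

    /* Ask the compiler to offload the loop. Iterations are independent,
       and the reduction clause combines the per-thread partial sums. */
    #pragma acc parallel loop reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        double x = ((double)i + 0.5) * h;   /* midpoint of sub-interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}
```

Calling compute_pi(1000000) from a main function should return a value very close to pi. With the NVIDIA HPC SDK, a C file like this would presumably be built with nvc using the same -acc=gpu -Minfo=all flags as the nvfortran example on this page.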

Compile

Examples of compiling OpenACC code:

Single GPU: $ nvfortran -acc=gpu -Minfo=all calPi_OpenACC.F90 -o calPi
Multiple GPUs: $ mpif90 -acc Jacobi_MPI_OpenACC_block.F90 -o Jacobi
Obsolete (PGI): $ pgf90 -acc -ta=tesla:cc35 -Minfo=all calPi_OpenACC.F90 -o calPi

Note: As of August 5, 2020, the "PGI Compilers and Tools" technology is part of the NVIDIA HPC SDK, which is available as a free download from NVIDIA (see the news at https://news.developer.nvidia.com/hpc-sdk-ga-2020/).

The OpenACC MPI tutorial can be found at https://docs.nvidia.com/hpc-sdk/compilers/openacc-mpi-tutorial/index.html

With the compiler flag -Minfo=intensity, the compiler reports the compute intensity of loops, that is, the ratio of computation to data movement.
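As a rough illustration of that ratio, the sketch below hand-counts arithmetic and memory operations for two common loop bodies and divides one by the other. The struct loop_cost, the function compute_intensity, and the counting convention (one count per arithmetic operation, per load, and per store) are assumptions made here. The compiler's own accounting may differ, so the numbers it prints need not match these.

```c
/* Hand-counted cost of one loop iteration (hypothetical helper type). */
typedef struct {
    int compute_ops;   /* arithmetic operations per iteration */
    int memory_ops;    /* loads plus stores per iteration */
} loop_cost;

/* Compute intensity = compute operations / memory operations. */
double compute_intensity(loop_cost c)
{
    return (double)c.compute_ops / (double)c.memory_ops;
}

/* saxpy body: y[i] = a * x[i] + y[i]
   1 multiply + 1 add = 2 compute ops; load x, load y, store y = 3 memory ops. */
static const loop_cost SAXPY_COST = { 2, 3 };

/* 5-point Jacobi body: unew = 0.25 * (uW + uE + uS + uN)
   3 adds + 1 multiply = 4 compute ops; 4 loads + 1 store = 5 memory ops. */
static const loop_cost JACOBI_COST = { 4, 5 };
```

Under this counting, the saxpy body has an intensity of 2/3 and the Jacobi body 4/5. Loops that do more arithmetic per memory access score higher.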

Running

 
nohup ./calPi &

On Tianhe-2:
Job script (job.sh):
#!/bin/bash
module load PGIcompiler/17.1
yhrun -n 1 -N 1 -p gpu ./lidACC
Submit: $ yhbatch -N 1 -p gpu ./job.sh

Sample OpenACC code

Example I: Calculate Pi

Example II: Poisson Solver on a single GPU

Example III: Poisson Solver on multi-GPUs
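For orientation, the following is a minimal sketch of the kind of solver named in Example II: Jacobi iteration for the Poisson equation on a single GPU. It is written in C and is not the code behind the example above. The function name jacobi_poisson, the zero boundary condition, the unit-square domain, and the stopping rule are all assumptions chosen for illustration. The data region keeps the arrays on the device across sweeps, so only the scalar error moves between host and device in each iteration.

```c
#include <stdlib.h>

/* Solve -laplace(u) = f on the unit square with u = 0 on the boundary,
   using Jacobi sweeps on an n x n grid (boundary nodes included).
   u holds the initial guess on entry and the solution on exit.
   Returns the number of sweeps, or -1 if allocation fails.
   jacobi_poisson is a hypothetical name used only for this sketch. */
int jacobi_poisson(int n, const double *f, double *u, double tol, int max_iter)
{
    const double h2 = 1.0 / ((double)(n - 1) * (double)(n - 1)); /* h^2 */
    double *unew = malloc((size_t)n * (size_t)n * sizeof *unew);
    if (unew == NULL) return -1;

    int iter = 0;
    double err = tol + 1.0;

    /* Keep u, f and the scratch array on the device for the whole solve. */
    #pragma acc data copy(u[0:n*n]) copyin(f[0:n*n]) create(unew[0:n*n])
    {
        while (err > tol && iter < max_iter) {
            err = 0.0;

            /* One Jacobi sweep over interior nodes; track the largest update. */
            #pragma acc parallel loop collapse(2) reduction(max:err)
            for (int j = 1; j < n - 1; ++j) {
                for (int i = 1; i < n - 1; ++i) {
                    int k = j * n + i;
                    unew[k] = 0.25 * (u[k - 1] + u[k + 1] +
                                      u[k - n] + u[k + n] + h2 * f[k]);
                    double d = unew[k] - u[k];
                    if (d < 0.0) d = -d;
                    if (d > err) err = d;
                }
            }

            /* Copy the new interior values back for the next sweep. */
            #pragma acc parallel loop collapse(2)
            for (int j = 1; j < n - 1; ++j) {
                for (int i = 1; i < n - 1; ++i) {
                    u[j * n + i] = unew[j * n + i];
                }
            }
            ++iter;
        }
    }

    free(unew);
    return iter;
}
```

A convenient check is the source term f = 2(x(1-x) + y(1-y)), whose exact solution u = x(1-x)y(1-y) is reproduced by the 5-point stencil at the grid nodes, so any remaining error comes from stopping the iteration early.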

Example: Accelerated lattice Boltzmann simulation using OpenACC

In these tests, the mesh size is 2048 × 2048 and the number of iteration steps is 20,000. The collision models are multiple-relaxation-time (MRT) models. All simulations use double-precision floating-point arithmetic. The following table compares the performance of CPU-based and GPU-based implementations of the LB algorithm. Here, MLUPS is short for million lattice updates per second. (Ref: Xu, 2017, Int. J. Heat Mass Transf.)
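The MLUPS figure follows directly from the mesh size, the number of steps, and the wall-clock time: MLUPS = (nx × ny × steps) / (time in seconds × 10^6). A minimal sketch in C is given below. The function name mlups and its parameter names are assumptions made for illustration.

```c
/* Million lattice updates per second:
   MLUPS = (nx * ny * steps) / (seconds * 1e6).
   mlups is a hypothetical helper name used only for this sketch. */
double mlups(long nx, long ny, long steps, double seconds)
{
    /* Convert to double before multiplying to avoid integer overflow. */
    return (double)nx * (double)ny * (double)steps / (seconds * 1.0e6);
}
```

For the test setup described here, the call would be mlups(2048, 2048, 20000, t), where t is the measured wall-clock time in seconds. As a purely hypothetical illustration (not a measured result), t = 100 s would correspond to about 838.86 MLUPS.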