Parallelism
This page gives hints on how to set parameters for a parallel calculation with the ABINIT package.
Introduction
Running ABINIT in parallel (MPI 10 processors) can be as simple as:
mpirun -n 10 abinit run.abi > log 2> err
or (MPI 10 processors + OpenMP 4 threads):
export OMP_NUM_THREADS=4
mpirun -n 10 abinit run.abi > log 2> err
In both cases, the standard output of the application is redirected to log, while err collects the standard error.
The command mpirun may have to be replaced by mpiexec, depending on your system.

For ground-state calculations, the code has been parallelized (MPI-based parallelism) on the k-points, the spins, the spinor components, the bands, and the FFT grid and plane-wave coefficients. For the k-point and spin parallelisations (using MPI), the communication load is generally very small, and the parallel efficiency is very good provided the number of MPI processes divides the number of k-points in the IBZ. However, the number of nodes that can be used with this kind of k-point/spin distribution might be small, and depends strongly on the physics of the problem. A combined FFT / band parallelisation (LOBPCG with paral_kgb 1) is available [Bottin2008], and has shown very large speed-ups (>1000) on powerful computers with a large number of processors and a high-speed interconnect. The combination of FFT / band / k-point and spin parallelism is also available, and is quite efficient on such computers. These parallelisation levels are available for norm-conserving as well as PAW calculations. Automatic determination of the best combination of parallelism levels is available. Use of MPI-IO is mandatory for the largest speed-ups to be observed.
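
The divisibility condition on the k-point and spin distribution can be illustrated with a small sketch. It is not part of ABINIT: the function name is hypothetical, and the model assumes that every (k-point, spin) pair costs the same and that communication is free, so the wall time is set by the busiest process.

```python
import math


def kpoint_parallel_efficiency(n_kpt: int, n_spin: int, n_proc: int) -> float:
    """Idealized load-balance efficiency of a k-point/spin distribution.

    Assumption: all (k-point, spin) tasks have equal cost and communication
    is neglected, so only the task count per process matters.
    """
    n_tasks = n_kpt * n_spin
    # The busiest process handles ceil(n_tasks / n_proc) tasks.
    rounds = math.ceil(n_tasks / n_proc)
    return n_tasks / (n_proc * rounds)


# Hypothetical case: 12 k-points in the IBZ, no spin polarization.
for n_proc in (4, 6, 8, 12, 16):
    eff = kpoint_parallel_efficiency(n_kpt=12, n_spin=1, n_proc=n_proc)
    print(f"{n_proc:3d} processes: efficiency {eff:.2f}")
```

In this model the efficiency is 1 whenever the number of processes divides the number of tasks, and drops as soon as some processes sit idle while others finish an extra k-point.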

Chebyshev filtering (Chebfi) is a method to solve the linear eigenvalue problem, and can be used as an SCF solver in ABINIT [Levitt2015]. The design goal is for Chebfi to replace LOBPCG as the solver of choice for large-scale computations in ABINIT. Because it performs fewer orthogonalizations and diagonalizations than LOBPCG, it can scale to higher processor counts. A manual on using Chebfi is available.

For ground-state calculations with a set of images (e.g. the nudged elastic band method, the string method, path-integral molecular dynamics, the genetic algorithm), MPI-based parallelism is used: the workload for the different images is distributed over the processes. This parallelization level can be combined with the parallelism described above, leading to speed-ups beyond 5000.
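
Combining the image level with the other levels multiplies the process counts. The sketch below only does this bookkeeping. It assumes that the total number of MPI processes is the product of the per-level counts (npimage, np_spkpt, npspinor, npband, npfft); the helper names are hypothetical, and the requirement that npimage divide the number of images is a simplifying choice for balance, not a documented ABINIT constraint.

```python
from math import prod


def total_processes(npimage=1, np_spkpt=1, npspinor=1, npband=1, npfft=1):
    """Number of MPI processes implied by one choice of parallelism levels
    (assumed here to be the plain product of the per-level counts)."""
    return prod((npimage, np_spkpt, npspinor, npband, npfft))


def image_splits(n_proc, n_image):
    """List (npimage, processes per image group) pairs that use every
    process and give each image group the same number of images."""
    return [
        (npimage, n_proc // npimage)
        for npimage in range(1, n_image + 1)
        if n_image % npimage == 0 and n_proc % npimage == 0
    ]


print(total_processes(npimage=10, np_spkpt=4, npband=8, npfft=2))  # 640
print(image_splits(n_proc=64, n_image=8))
```

Each pair returned by image_splits leaves the second number of processes to be shared among the k-point, band and FFT levels inside one image group.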

For ground-state calculations, GPUs can be used. The implementation is based on CUDA+MAGMA.

For ground-state calculations, the wavelet part of ABINIT (BigDFT) is also very well parallelized: MPI band parallelism, combined with GPUs.

For response calculations, the code has been MPI-parallelized on k-points, spins, bands, as well as on perturbations. For the k-point, spin and band parallelisations, the communication load is also rather small and, unlike for the ground-state calculations, the number of nodes that can be used in parallel will be large, nearly independently of the physics of the problem. Parallelism on perturbations is very similar to the parallelism on images in the ground-state case (so, very efficient), although the load-balancing problem for perturbations with different numbers of k-points is not addressed at present. Use of MPI-IO is mandatory for the largest speed-ups to be observed.
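
The load-balancing issue for perturbations can be made concrete with a toy model. The sketch below is not what ABINIT does: it takes the cost of a perturbation to be its number of k-points (an assumption), uses hypothetical k-point counts, and assigns perturbations to process groups with a simple largest-first greedy rule to show how much idle time remains even after a reasonable assignment.

```python
def perturbation_balance(nkpt_per_pert, n_groups):
    """Greedy (largest-first) assignment of perturbations to process groups.

    Assumption: the cost of a perturbation equals its number of k-points.
    Returns (load per group, efficiency = mean load / maximum load).
    """
    loads = [0] * n_groups
    for cost in sorted(nkpt_per_pert, reverse=True):
        # Give the next-largest perturbation to the least-loaded group.
        loads[loads.index(min(loads))] += cost
    efficiency = sum(loads) / (n_groups * max(loads))
    return loads, efficiency


# Hypothetical k-point counts for six perturbations, three process groups.
loads, eff = perturbation_balance([64, 32, 32, 16, 16, 8], n_groups=3)
print("k-points per group:", loads)
print(f"efficiency: {eff:.3f}")
```

When a single perturbation dominates the cost, as the 64-k-point one does here, the other groups finish early regardless of how the remaining perturbations are assigned.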

GW calculations are MPI-parallelized over k-points. They are also parallelized over transitions (valence to conduction band pairs), but the two parallelisations cannot be used concurrently at present. The transition parallelism has been shown to allow speed-ups as large as 300.

Ground-state, response-function, and GW calculations can also use OpenMP parallelism, optionally combined with MPI parallelism.
Related Input Variables
basic:
 autoparal AUTOmatisation of the PARALlelism
 paral_atom activate PARALlelization over (paw) ATOMic sites
 paral_kgb activate PARALlelization over K-point, G-vectors and Bands
 paral_rf Activate PARALlelization over Response Function perturbations
useful:
 bandpp BAND Per Processor
 gwpara GW PARAllelization level
 max_ncpus MAXimum Number of CPUS
 np_spkpt Number of Processors at the SPin and KPoint Level
 npband Number of Processors at the BAND level
 npfft Number of Processors at the FFT level
 nphf Number of Processors for (Hartree)-Fock exact exchange
 npimage Number of Processors at the IMAGE level
 nppert Number of Processors at the PERTurbation level
 npspinor Number of Processors at the SPINOR level
expert:
 diago_apply_block_sliced Inverse Overlap block matrix applied in a sliced fashion
 gpu_devices GPU: choice of DEVICES on one node
 gpu_linalg_limit GPU (Cuda): LINear ALGebra LIMIT
 iomode Input-Output MODE
 localrdwf LOCAL ReaD WaveFunctions
 np_slk Number of mpi Processors used for ScaLapacK calls
 npkpt Number of Processors at the SPin and KPoint Level
 pw_unbal_thresh Plane Wave UNBALancing: THRESHold for balancing procedure
 slk_rankpp ScaLapacK matrix RANK Per Process
 use_gemm_nonlop USE the GEMM routine for the application of the NONLocal OPerator
 use_gpu_cuda activate USE of GPU accelerators with CUDA (nvidia)
 use_nvtx activate USE of NVTX tracing/profiling
 use_slk USE ScaLapacK
Selected Input Files
paral:
 tests/paral/Input/t08.abi
 tests/paral/Input/t21.abi
 tests/paral/Input/t22.abi
 tests/paral/Input/t24.abi
 tests/paral/Input/t25.abi
 tests/paral/Input/t26.abi
 tests/paral/Input/t29.abi
 tests/paral/Input/t30.abi
 tests/paral/Input/t51.abi
 tests/paral/Input/t86.abi
Tutorials
 An introduction on ABINIT in Parallel should be read before going to the next tutorials about parallelism. One simple example of parallelism in ABINIT will be shown.
 Parallelism over bands and plane waves presents the combined k-point (K), plane-wave (G), band (B), spin/spinor parallelism of ABINIT (so, the “KGB” parallelism), for the computation of total energy, density, and ground-state properties.
 Parallelism for molecular dynamics calculations
 Parallelism based on “images”, e.g. for the determination of transition paths (NEB, string method) or PIMD, which can be activated on top of the “KGB” parallelism for force calculations.
 Parallelism for ground-state calculations, with wavelets presents the parallelism of ABINIT when wavelets are used as basis functions instead of plane waves, for the computation of total energy, density, and ground-state properties.
 Parallelism of response-function calculations: you need to be familiar with the calculation of linear-response properties within ABINIT; see the tutorial Response-Function 1 (RF1).
 Parallelism of Many-Body Perturbation calculations (GW) shows how to speed up the calculation of accurate electronic structures (quasi-particle band structure, including many-body effects).