paral input variables¶

This document lists and provides the description of the name (keywords) of the paral input variables to be used in the input file for the abinit executable.

autoparal¶

Mnemonics: AUTOmatisation of the PARALlelism
Characteristics: DEVELOP
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 0
Added in version: before_v9

Test list (click to open). Moderately used, [25/1314] in all abinit tests, [7/178] in abinit tutorials

hpc_gpu_omp: t03.abi, t05.abi, t07.abi …
mpiio: t62.abi, t62.abi, t69.abi …
paral: t08.abi, t08.abi, t08.abi …
tutoparal: tdfpt_03.abi, tparal_bandpw_01.abi, tpsic_01.abi …
tutoplugs: tz2_2.abi …
tutorial: tgwr_1.abi …
v67mbpt: t36.abi …
wannier90: t04.abi, t05.abi …

This input variable controls the automatic determination of input parameters related to parallel work distribution if not specified in input file.

Note

Note that this variable is only used when running ground-state calculations in parallel with MPI (optdriver=1). Other optdriver runlevels implement different MPI algorithms that rely on other input variables that are not automatically set by autoparal . For example, consult the tutorial on parallelism for Many-Body Perturbation Theory to learn how to run beyond-GS calculations with MPI. Other tutorials on parallelism are also available.

Given a total number of processors, ABINIT can find a suitable distribution that fill (when possible) all the different levels of parallelization. ABINIT can also determine optimal parameters for the use of parallel Linear Algebra routines (using Scalapack or Cuda, at present). The different values are:

0 → The autoparal feature is deactivated. For ground-state and response function calculations, ABINIT can only activate automatically the parallelism over spins and k-points.
1 → The number of processors per parallelization level is determined by mean of a simple (but relatively efficient) heuristic method. A scaling factor is attributed to each level and an simple speedup factor is computed. The selected parameters are those giving the best speedup factor. Possibly concerned parameters: npimage, np_spkpt, npspinor, npfft, npband, bandpp.
2 → The number of processors per parallelization level is first determined by mean of a simple (but relatively efficient) heuristic method (see 1 above). Then the code performs a series of small benchmarks using the scheme applied for the LOBPCG algorithm (see wfoptalg = 4 or 14). The parallel distribution is then changed according to the benchmarks. Possibly concerned parameters: npimage, np_spkpt, npspinor, npfft, npband, bandpp.
3 → Same as autoparal = 1, plus automatic determination of Linear Algebra routines parameters. In addition, the code performs a series of small benchmarks using the Linear Algebra routines (ScaLapack or Cuda-GPU). The parameters used to optimize Linear Algebra work distribution are then changed according to the benchmarks. Possibly concerned parameters (in addition to those modified for autoparal = 1): use_slk, np_slk, gpu_linalg_limit
4 → combination of autoparal 2 and 3.

Note that autoparal = 1 can be used on every set of processors; autoparal > 1 should be used on a sufficiently large number of MPI process. Also note that autoparal can be used simultaneously with max_ncpus. In this case, ABINIT performs an optimization of process distribution for each total number of processors from 2 to max_ncpus. A weight is associated to each distribution and the higher this weight is the better the distribution is. After having printed out the weights, the code stops.

bandpp¶

Mnemonics: BAND Per Processor
Characteristics: DEVELOP
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: paral_kgb == 1
Added in version: before_v9

Test list (click to open). Moderately used, [65/1314] in all abinit tests, [10/178] in abinit tutorials

gpu: t04.abi …
gpu_omp: t01.abi, t02.abi, t02.abi …
hpc_gpu_omp: t01.abi, t02.abi, t03.abi …
mpiio: t21.abi, t22.abi, t24.abi …
paral: t21.abi, t22.abi, t24.abi …
tutoparal: tparal_bandpw_03.abi, tparal_bandpw_04.abi, timages_01.abi …
v4: t93.abi, t94.abi …
v5: t75.abi …
v9: t201.abi, t202.abi, t205.abi …

Affect the size of a block in the LOBPCG algorithm. One should use nblock_lobpcg instead, which sets bandpp depending on nband and npband.

Important

This keyword works only with paral_kgb = 1 and has to be either 1 or a multiple of 2. Moreover nband / (npband $\times$ bandpp ) has to be integer.

With npband = 1:

bandpp =1 → band-per-band algorithm
bandpp /=1 → The minimization is performed using nband/ bandpp blocks of bandpp bands.

With npband > 1:

bandpp =1 → The minimization is performed using nband / npband blocks of npband bands.
bandpp /=1 → The minimization is performed using nband / (npband $\times$ bandpp ) blocks of npband $\times$ bandpp bands.

By minimizing a larger number of bands together in LOBPCG, we increase the convergence of the residuals. The better minimization procedure (as concerns the convergence, but not as concerns the speed) is generally performed by using bandpp $\times$ npband = nband.

When performing Gamma-only calculations (istwfk = 2), it is recommended to set bandpp = 2 (or a multiple of 2) as the time spent in FFTs is divided by two. Also, the time required to apply the non-local part of the KS Hamiltonian can be significantly reduced if bandpp > 1 is used in conjunction with use_gemm_nonlop = 1.

Note that increasing the value of bandpp can have a significant impact (reduction) on the computing time (especially if use_gemm_nonlop is used) but keep in mind that the size of the workspace arrays will also increase so the calculation may go out-of-memory if a too large bandpp is used in systems if many atoms.

gpu_devices¶

Mnemonics: GPU: choice of DEVICES on one node
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: (12)
Default value: -12
Only relevant if: gpu_option > 0 and CUDA (Nvidia GPU)
Added in version: before_v9

Test list (click to open). Rarely used, [3/1314] in all abinit tests, [0/178] in abinit tutorials

gpu: t05.abi, t05.abi, t05.abi

@legacy This variable is intended to be used when several GPU devices are present on each node, assuming the same number of devices on all nodes. Allows to choose in which order the GPU devices are chosen and distributed among MPI processes (see examples below). When the default value (-1) is set, the GPU devices are chosen by order of performance (FLOPS, memory).

Examples:

2 GPU devices per node, 4 MPI processes per node, gpu_device = *-1 (default): MPI processes 0 and 2 use the best GPU card, MPI processes 1 and 3 use the slowest GPU card.
3 GPU devices per node, 5 MPI processes per node, gpu_device =[1,0,2,-1,-1, . . .]: MPI processes 0 and 3 use GPU card 1, MPI processes 1 and 4 use GPU card 0, MPI process 2 uses GPU card 2.
3 GPU devices per node, 5 MPI processes per node, gpu_device =[0,1,-1,-1, . . .]: MPI processes 0, 2 and 4 use GPU card 0, MPI processes 1 and 3 use GPU card 1; the 3^rd GPU card is not used.

GPU card are numbered starting from 0; to get the GPU devices list, type f.i. (Nvidia): “nvidia-smi” or “lspci | grep -i nvidia”.

gpu_kokkos_nthrd¶

Mnemonics: GPU KOKKOS implementation: Number of THReaDs
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: number of OPENMP threads
Only relevant if: gpu_option == 3 (KOKKOS GPU implementation)
Added in version: v9.12

Test list (click to open). Rarely used, [0/1314] in all abinit tests, [0/178] in abinit tutorials

When GPU acceleration is enabled (via KOKKOS implementation), OpenMP parallelism on CPU cores is not fully supported. ABINIT will ignore OMP_NUM_THREADS value, using 1 instead. But we may locally increase the number of OpenMP threads to speed-up some specific parts of the code. In that case we use gpu_kokkos_nthrd to specify the number of OpenMP threads.

gpu_linalg_limit¶

Mnemonics: GPU: LINear ALGebra LIMIT
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 2000000
Only relevant if: gpu_option == 1 and CUDA (Nvidia GPU, old CUDA implementation)
Added in version: before_v9

Test list (click to open). Rarely used, [4/1314] in all abinit tests, [0/178] in abinit tutorials

gpu: t04.abi, t05.abi, t05.abi, t05.abi

@legacy This variable is obsolete and is only intended to be used with the old 2013 cuda implementation of ABINIT on GPU (gpu_option=1). In that case, the use of linear/matrix algebra on GPU is only efficient if the size of the involved matrices is large enough. The gpu_linalg_limit parameter defines the threshold above which linear (and matrix) algebra operations are done on the Graphics Processing Unit. The matrix size is evaluated as to: SIZE=(%mpw $\times$ nspinor / npspinor) $\times$ (npband $\times$ bandpp) $^2$ When SIZE>= gpu_linalg_limit , wfoptalg parameter is automatically set to 14 which corresponds to the use of the legacy LOBPCG algorithm for the calculation of the eigenstates.

gpu_nl_distrib¶

Mnemonics: GPU: Non-Local operator, DISTRIBute projections
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 0
Only relevant if: gpu_option == 2 (OPENMP_OFFLOAD)
Added in version: 9.12

Test list (click to open). Rarely used, [2/1314] in all abinit tests, [0/178] in abinit tutorials

gpu_omp: t42.abi, t43.abi

When using GPU acceleration, the wave-function projections ($<\tilde{p}i|\Psi> (used in the non-local operator) are all stored on all GPU devices. gpu_nl_distrib enable the distribution of these projections in slices on several GPU devices. This uses less memory per GPU but requires communications beween GPU devices. These communications may penalize the execution time, especially if splitting size is higher than the amount of GPU per node or if GPU-aware MPI wasn’t enabled at compile time. By default, the projections splitting size is automatically set by ABINIT after assessing GPU memory consumption. One may force a specific splitting size by setting gpu_nl_splitsize. In standard executions, the distribution is only needed on large use cases to address the high memory needs. It is recommended to use it with a GPU-aware MPI and enable its use when compiling ABINIT (--enable-mpi-gpu-aware).

gpu_nl_splitsize¶

Mnemonics: GPU: Non-Local operator SPLITing SIZE
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: gpu_option == 2 (OPENMP_OFFLOAD)
Added in version: 9.12

Test list (click to open). Rarely used, [4/1314] in all abinit tests, [0/178] in abinit tutorials

gpu_omp: t42.abi, t43.abi, t44.abi, t45.abi

When using GPU acceleration, the wave-function projections ($<\tilde{p}i|\Psi> (used in the non-local operator) are all stored on all GPU devices. gpu_nl_splitsize defines the number of slices used to split these projections to save GPU memory usage. By default, each task recomputes all projection slices when computing non-local operator. When gpu_nl_distrib is set to 1, projections slices are distributed among MPI tasks and computed only once per tasks, to be shared between MPI tasks when computing non-local operator. In standard executions, the distribution is only needed on large use cases to address the high memory needs. ABINIT automatically assess memory consumption and enable splitting to fit in memory if needed. As its heuristic to assess memory may be error-prone, user may to use this parameter to force a specific splitting size.

gpu_option¶

Mnemonics: GPU: OPTION to choose the implementation
Mentioned in topic(s): topic_parallelism
Variable type: integer or string
Dimensions: scalar
Default value: 2 if OPENMP_OFFLOAD, 3 if KOKKOS, 1 if CUDA, 0 otherwise.

Added in version: v9.12

Test list (click to open). Moderately used, [74/1314] in all abinit tests, [0/178] in abinit tutorials

gpu: t01.abi, t02.abi, t03.abi …
gpu_kokkos: t01.abi, t02.abi …
gpu_omp: t01.abi, t02.abi, t02.abi …
hpc_gpu_omp: t01.abi, t02.abi, t03.abi …
v5: t75.abi …

Only relevant for Ground-State calculations (optdriver == 0). This option is only available if ABINIT executable has been compiled for the purpose of being used with GPU accelerators. It allows to choose between the different GPU programming models available in ABINIT:

gpu_option = “GPU_DISABLED” or gpu_option = 0: no use of GPU (even if compiled for GPU).
gpu_option = “GPU_LEGACY” or gpu_option = 1: use the “legacy” 2013 implementation of GPU. This is a partial CUDA implementation, using the nvcc CUDA compiler. The old LOBPCG algorithm is automatically used to compute the eigenstates (wfoptalg=14). The external linear algebra library `MAGMA can also be linked to ABINIT to improve performances on large systems (see gpu_linalg_limit).
gpu_option = “GPU_OPENMP” or gpu_option = 2: use of the OPENMP_OFFLOAD programming model to execute time consuming parts of the code on GPU. This implementation works on NVidia accelerators, if ABINIT has been compiled with a CUDA compatible compiler and linked with NVidia FFT/linear algebra libraries (cuFFT, cuBLAS and cuSOLVER). It also works on `AMD accelerators (EXPERIMENTAL), if ABINIT has been compiled with a AMD compatible compiler and linked with NVidia FFT/linear algebra libraries (ROCm or HIP).
gpu_option = “GPU_KOKKOS” or gpu_option = 3: use of the KOKKOS+CUDA programming model to execute time consuming parts of the code on GPU. This implementation – at present – is only compatible with NVidia accelerators. It required that ABINIT has been linked to the Kokkos and YAKL performance libraries. It also uses NVidia FFT/linear algebra libraries (cuFFT, cuBLAS). The KOKKOS GPU implementation can be used in conjuction with openMP threads on CPU (see gpu_kokkos_nthrd).

For an expert use of ABINIT on GPU, some additional keywords can be used. See gpu_nl_distrib, gpu_nl_splitsize.

gpu_thread_limit¶

Mnemonics: GPU: Thread Limit
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: Minimum between 4 and number of OPENMP threads, if GPU is enabled, 0 otherwise.
Only relevant if: gpu_option /= 0
Added in version: 10.4

Test list (click to open). Rarely used, [1/1314] in all abinit tests, [0/178] in abinit tutorials

gpu_omp: t51.abi

When using GPU acceleration, some parts still runs on CPU and are parallelised using MPI distribution or OpenMP threads. Usually, GPU runs with one MPI task per GPU, which limits MPI-based parallelism, but with more OpenMP threads per tasks, which favors OpenMP parallelism. However, some OpenMP regions tend to be significantly slower with a high number of threads (OMP_NUM_THREADS env variable), hence ABINIT limit the number of threads in those select regions, to 4 threads by default, if GPU is enabled. This parameter allows to change this limit, for tuning purposes.

gwpara¶

Mnemonics: GW PARAllelization level
Mentioned in topic(s): topic_parallelism, topic_GW, topic_Susceptibility, topic_SelfEnergy
Variable type: integer
Dimensions: scalar
Default value: 2
Comment: The default value has been changed in v8, from 1 to 2.
Only relevant if: optdriver in [3,4]
Added in version: before_v9

Test list (click to open). Moderately used, [59/1314] in all abinit tests, [4/178] in abinit tutorials

libxc: t41.abi, t42.abi, t43.abi …
paral: t71.abi, t71.abi, t71.abi …
tutoparal: tmbt_2.abi, tmbt_3.abi, tucalc_crpa_2.abi …
tutorial: tbs_1.abi …
v3: t30.abi, t31.abi …
v4: t84.abi, t85.abi, t88.abi …
v5: t71.abi …
v67mbpt: t01.abi, t02.abi, t03.abi …
v7: t68.abi, t78.abi, t79.abi …
v8: t90.abi, t91.abi, t92.abi …
v9: t32.abi, t40.abi …
wannier90: t03.abi …

gwpara is used to choose between the two different parallelization levels available in the GW code. The available options are:

1 → parallelisation on k points.
2 → parallelisation on bands.

In the present status of the code, only the parallelization over bands ( gwpara = 2) allows one to reduce the memory allocated by each processor. Using gwpara = 1, indeed, requires the same amount of memory as a sequential run, irrespectively of the number of CPUs used.

In the screening calculation optdriver=3, with gwpara =2, the code distributes the wavefunctions such that each processing unit owns the FULL set of occupied bands while the empty states are DISTRIBUTED among the nodes. Thus the parallelisation is over the unoccupied states.

The parallelism of the self-energy calculation optdriver=4, with gwpara =2, is somehow different. It is over the entire set of bands, and has different characteristics for the correlation calculation and for the exchange calculation.. The MPI computation of the correlation part is efficient when the number of processors divides nband. Optimal scaling in the exchange part is obtained only when each node possesses the full set of occupied states.

localrdwf¶

Mnemonics: LOCAL ReaD WaveFunctions
Characteristics: DEVELOP
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Added in version: before_v9

Test list (click to open). Moderately used, [41/1314] in all abinit tests, [0/178] in abinit tutorials

mpiio: t01.abi, t01.abi, t01.abi …
paral: t01.abi, t01.abi, t01.abi …

This input variable is used only when running abinit in parallel. If localrdwf = 1, the input wavefunction disk file or the KSS/SCR file in case of GW calculations, is read locally by each processor, while if localrdwf = 0, only one processor reads it, and broadcast the data to the other processors.

The option localrdwf = 0 is NOT allowed when parallel I/O are activated (MPI-IO access), i.e. when iomode == 1.

In the case of a parallel computer with a unique file system, both options are as convenient for the user. However, if the I/O are slow compared to communications between processors,, localrdwf = 0 should be much more efficient; if you really need temporary disk storage, switch to localrdwf=1).

In the case of a cluster of nodes, with a different file system for each machine, the input wavefunction file must be available on all nodes if localrdwf = 1, while it is needed only for the master node if localrdwf = 0.

max_ncpus¶

Mnemonics: MAXimum Number of CPUS
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 0
Added in version: before_v9

Test list (click to open). Rarely used, [1/1314] in all abinit tests, [1/178] in abinit tutorials

tutoparal: tparal_bandpw_01.abi

If autoparal > 1 and max_ncpus is greater than 0, ABINIT analyzes the efficiency of the process distribution for each possible number of processors from 2 to max_ncpus . After having printed out the efficiency, the code stops.

np_slk¶

Mnemonics: Number of mpi Processors used for ScaLapacK calls
Characteristics: DEVELOP
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1000000
Only relevant if: optdriver == 1 and paral_kgb == 1 (Ground-state calculations with LOBPCG algorithm)
Added in version: before_v9

Test list (click to open). Rarely used, [2/1314] in all abinit tests, [0/178] in abinit tutorials

paral: t22.abi, t30.abi

When using Scalapack (or any similar Matrix Algebra library such as ELPA), the efficiency of the eigenproblem solver saturates as the number of CPU cores increases. In this case, it is more efficient to use a smaller number of CPUs for the LINALG calls. The maximum number of cores can be set with np_slk . A large number for np_slk (i.e. 1000000) means that all cores are used for the Linear Algebra calls. np_slk must divide the number of processors involved in diagonalizations (npband $\times$ npfft $\times$ npspinor). Note (bef v8.8): an optimal value for this parameter can be automatically found by using the autoparal input keyword. Note (since v8.8 and only for LOBPCG (wfoptalg == 114) with paral_kgb = 1): * If set to 0 then scalapack is disabled. * If set to its default value, then abinit uses between 2 and npbandnpfftnpspinor cpus according to the system size (default auto behaviour). See slk_rankpp for more customization. * If set to a number >1 then forces abinit to use exactly this number of cpus. Due to legacy behaviour, although it is not mandatory in theory, this value *must divide npbandnpfftnpspinor. See also slk_rankpp for a better tuning instead of this variable.

np_spkpt¶

Mnemonics: Number of Processors at the SPin and K-Point Level
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: paral_kgb == 1
Added in version: 9.4.0

Test list (click to open). Moderately used, [69/1314] in all abinit tests, [4/178] in abinit tutorials

gpu_kokkos: t01.abi …
gpu_omp: t03.abi, t04.abi, t05.abi …
libxc: t53.abi …
mpiio: t21.abi, t22.abi, t24.abi …
paral: t08.abi, t08.abi, t08.abi …
tutoparal: tmoldyn_01.abi, tpsic_01.abi, tpsic_02.abi …
v7: t65.abi, t66.abi, t67.abi …

Relevant only for the band/FFT/k-point parallelisation (see the paral_kgb input variable). np_spkpt gives the number of processors among which the work load over the k-point/spin-component level is shared. np_spkpt , npfft, npband and npspinor are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation. See npband, npfft, npspinor and paral_kgb for the additional information on the use of band/FFT/k-point parallelisation.

Previously, this input variable was called npkpt.

np_spkpt should be a divisor of or equal to the number of k-point/spin- components (nkpt $\times$ nsppol) in order to have a good load-balancing and efficiency. Note: an optimal value for this parameter can be automatically found by using the autoparal input keyword.

npband¶

Mnemonics: Number of Processors at the BAND level
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: paral_kgb == 1
Added in version: before_v9

Test list (click to open). Moderately used, [117/1314] in all abinit tests, [7/178] in abinit tutorials

gpu_kokkos: t01.abi, t02.abi …
gpu_omp: t03.abi, t04.abi, t05.abi …
hpc_gpu_omp: t01.abi, t02.abi, t03.abi …
mpiio: t21.abi, t22.abi, t24.abi …
paral: t18.abi, t19.abi, t20.abi …
tutoparal: tparal_bandpw_02.abi, tparal_bandpw_03.abi, tparal_bandpw_04.abi …
v10: t03.abi, t04.abi, t05.abi …
v67mbpt: t15.abi …
v9: t206.abi, t207.abi …

Relevant only for the band/FFT parallelisation (see the paral_kgb input variable). npband gives the number of processors among which the work load over the band level is shared. npband , npfft, np_spkpt and npspinor are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation. See npfft, np_spkpt, npspinor and paral_kgb for the additional information on the use of band/FFT/k-point parallelisation. npband has to be a divisor or equal to nband Note: an optimal value for this parameter can be automatically found by using the autoparal input keyword.

npfft¶

Mnemonics: Number of Processors at the FFT level
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: paral_kgb == 1
Added in version: before_v9

Test list (click to open). Moderately used, [75/1314] in all abinit tests, [5/178] in abinit tutorials

gpu_kokkos: t01.abi, t02.abi …
gpu_omp: t03.abi, t04.abi, t05.abi …
hpc_gpu_omp: t01.abi, t02.abi, t03.abi …
mpiio: t21.abi, t22.abi, t24.abi …
paral: t21.abi, t22.abi, t24.abi …
tutoparal: tparal_bandpw_02.abi, tparal_bandpw_03.abi, tparal_bandpw_04.abi …
tutorial: tgwr_1.abi …
v67mbpt: t15.abi …

Relevant only for the band/FFT/k-point parallelisation (see the paral_kgb input variable). npfft gives the number of processors among which the work load over the FFT level is shared. npfft , np_spkpt, npband and npspinor are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation. See npband, np_spkpt, npspinor, and paral_kgb for the additional information on the use of band/FFT/k-point parallelisation.

Note: ngfft is automatically adjusted to npfft . If the number of processor is changed from a calculation to another one, npfft may change, and then ngfft also. Note: an optimal value for this parameter can be automatically found by using the autoparal input keyword.

nphf¶

Mnemonics: Number of Processors for (Hartree)-Fock exact exchange
Mentioned in topic(s): topic_Hybrids, topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Added in version: before_v9

Test list (click to open). Rarely used, [10/1314] in all abinit tests, [0/178] in abinit tutorials

libxc: t53.abi
paral: t93.abi, t94.abi
v7: t65.abi, t66.abi, t67.abi, t69.abi, t70.abi, t71.abi, t72.abi

Relevant only for the k-point/fock parallelisation (option paral_kgb input variable). nphf gives the number of processors among which the work load over the occupied states level is shared. nphf and np_spkpt are combined to give the total number of processors (nproc) working on the parallelisation.

Note: nphf should be a divisor or equal to the number of k-point times the number of bands for exact exchange (nkpthf $\times$ nbandhf) in order to have the better load-balancing and efficiency.

npimage¶

Mnemonics: Number of Processors at the IMAGE level
Mentioned in topic(s): topic_parallelism, topic_PIMD, topic_TransPath
Variable type: integer
Dimensions: scalar
Default value: 1
Added in version: before_v9

Test list (click to open). Rarely used, [8/1314] in all abinit tests, [4/178] in abinit tutorials

paral: t08.abi, t08.abi, t08.abi, t08.abi
tutoparal: timages_04.abi, tpsic_01.abi, tpsic_02.abi, tpsic_03.abi

Relevant only when sets of images are activated (see imgmov and nimage ). npimage gives the number of processors among which the work load over the image level is shared. It is compatible with all other parallelization levels available for ground-state calculations. Note: an optimal value for this parameter can be automatically found by using the autoparal input keyword.

See paral_kgb, np_spkpt, npband, npfft and npspinor for the additional information on the use of k-point/band/FFT parallelisation.

npkpt¶

Mnemonics: Number of Processors at the SPin and K-Point Level
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: paral_kgb == 1
Added in version: before_v9

Test list (click to open). Moderately used, [15/1314] in all abinit tests, [0/178] in abinit tutorials

gpu_kokkos: t02.abi
hpc_gpu_omp: t01.abi, t02.abi, t03.abi, t04.abi, t05.abi, t06.abi, t07.abi, t08.abi, t09.abi, t10.abi, t11.abi, t12.abi, t13.abi
paral: t39.abi

This input variable has been superceded by np_spkpt. For the time being, for backward compatibility with AbiPy, npkpt is still recognized, with the same meaning than np_spkpt, despite the incorrect lack of mention of the spin parallelism in the name npkpt . Please, stop using npkpt as soon as possible.

nppert¶

Mnemonics: Number of Processors at the PERTurbation level
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: paral_rf == 1
Added in version: before_v9

Test list (click to open). Rarely used, [4/1314] in all abinit tests, [0/178] in abinit tutorials

mpiio: t62.abi, t62.abi, t69.abi, t69.abi

This parameter is used in connection to the parallelization over perturbations (see paral_rf ), for a linear response calculation. nppert gives the number of processors among which the work load over the perturbation level is shared. It can even be specified separately for each dataset.

npspinor¶

Mnemonics: Number of Processors at the SPINOR level
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Only relevant if: paral_kgb == 1
Added in version: before_v9

Test list (click to open). Rarely used, [3/1314] in all abinit tests, [0/178] in abinit tutorials

mpiio: t28.abi
paral: t28.abi, t32.abi

Can be 1 or 2 (if nspinor = 2). Relevant only for the band/FFT/k-point parallelisation (see the paral_kgb input variable). npspinor gives the number of processors among which the work load over the spinorial components of wave-functions is shared. npspinor , npfft, npband and np_spkpt are combined to give the total number of processors (nproc) working on the band/FFT/k-point parallelisation. Note: an optimal value for this parameter can be automatically found by using the autoparal input keyword.

See np_spkpt, npband, npfft, and paral_kgb for the additional information on the use of band/FFT/k-point parallelisation.

paral_atom¶

Mnemonics: activate PARALelization over (paw) ATOMic sites
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 1
Added in version: before_v9

Test list (click to open). Moderately used, [20/1314] in all abinit tests, [1/178] in abinit tutorials

gpu_omp: t31.abi
libxc: t53.abi
mpiio: t62.abi, t62.abi
paral: t07.abi, t07.abi, t07.abi
tutorial: tnuc_4.abi
v10: t40.abi, t41.abi, t42.abi
v5: t06.abi
v7: t32.abi, t70.abi
v9: t44.abi, t140.abi, t141.abi, t142.abi, t143.abi, t144.abi

Relevant only for PAW calculations. This keyword controls the parallel distribution of memory over atomic sites. Calculations are also distributed using the “kpt-band” communicator. Compatible with ground-state calculations and response function calculations.

This parallelization concerns only a small part of the whole calculation in the sequential case. When using parallelism, it might be that this small part becomes predominant if paral_atom is not activated.

paral_kgb¶

Mnemonics: activate PARALelization over K-point, G-vectors and Bands
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 0
Added in version: before_v9

Test list (click to open). Moderately used, [198/1314] in all abinit tests, [16/178] in abinit tutorials

gpu: t02.abi …
gpu_kokkos: t01.abi, t02.abi …
gpu_omp: t03.abi, t04.abi, t05.abi …
hpc_gpu_omp: t01.abi, t02.abi, t03.abi …
mpiio: t01.abi, t01.abi, t01.abi …
paral: t01.abi, t01.abi, t01.abi …
tutoparal: tdfpt_03.abi, tparal_bandpw_01.abi, tparal_bandpw_02.abi …
tutorial: tgwr_1.abi, tnuc_4.abi …
v10: t40.abi, t41.abi, t42.abi …
v5: t76.abi …
v67mbpt: t15.abi …
v7: t76.abi, t77.abi, t78.abi …
v9: t29.abi, t30.abi, t44.abi …
wannier90: t04.abi, t05.abi …

Note

Note that this variable is only used when running ground-state calculations in parallel with MPI (optdriver=1) (or GW Lanczos-Sternheimer optdriver=66, but this is very rare). Other optdriver runlevels implement different MPI algorithms that rely on other input variables that are not automatically set by autoparal. For example, consult the tutorial on parallelism for Many-Body Perturbation Theory to learn how to run beyond-GS calculations with MPI. Other tutorials on parallelism are also available.

If paral_kgb is not explicitely put in the input file, ABINIT automatically detects if the job has been sent in sequential or in parallel. In this last case, it detects the number of processors on which the job has been sent and calculates values of np_spkpt, npfft, npband, bandpp, npimage and npspinor that are compatible with the number of processors. It then set paral_kgb to 0 or 1 (see hereunder) and launches the job.

If paral_kgb = 0, the parallelization over k-points only is activated. In this case, np_spkpt, npspinor, npfft and npband are ignored. Require compilation option –enable-mpi=”yes”.

If paral_kgb = 1, the parallelization over bands, FFTs, and k-point/spin- components is activated (see np_spkpt, npfft npband and possibly npspinor). With this parallelization, the work load is split over four levels of parallelization (three level of parallelisation (kpt-band-fft )+ spin). The different communications almost occur along one dimension only. Require compilation option –enable-mpi=”yes”.

HOWTO fix the number of processors along one level of parallelisation: At first, try to parallelise over the k point and spin (see np_spkpt,npspinor). Otherwise, for unpolarized calculation at the gamma point, parallelise over the two other levels: the band and FFT ones. For nproc $\leq$ 50, the best speed-up is achieved for npband = nproc and npfft = 1 (which is not yet the default). For nproc $\geq$ 50, the best speed-up is achieved for npband $\geq$ 4 $\times$ npfft.

For additional information, download F. Bottin presentation at the ABINIT workshop 2007

Suggested acknowledgments: [Bottin2008], also available on arXiv, http://arxiv.org/abs/0707.3405.

If the total number of processors used is compatible with the four levels of parallelization, the values for np_spkpt, npspinor, npfft, npband and bandpp will be filled automatically, although the repartition may not be optimal. To optimize the repartition use:

If paral_kgb = 1 and max_ncpus = n $\ne$ 0 ABINIT will test automatically if all the processor numbers between 2 and n are convenient for a parallel calculation and print the possible values in the log file. A weight is attributed to each possible processors repartition. It is adviced to select a processor repartition for which the weight is high (as closed to the number of processors as possible). The code will then stop after the printing. This test can be done as well with a sequential as with a parallel version of the code. The user can then choose the adequate number of processor on which he can run his job. He must put again paral_kgb = 1 in the input file and put the corresponding values for np_spkpt, npfft, npband,bandpp and possibly npspinor in the input file.

paral_rf¶

Mnemonics: Activate PARALlelization over Response Function perturbations
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 0
Added in version: before_v9

Test list (click to open). Rarely used, [4/1314] in all abinit tests, [0/178] in abinit tutorials

mpiio: t62.abi, t62.abi, t69.abi, t69.abi

This parameter activates the parallelization over perturbations which can be used during RF-Calculation. It is possible to use this type of parallelization in combination to the parallelization over k-points.

Currently total energies calculated by groups, where the master process is not in, are saved in.status_LOGxxxx files.

If paral_rf is set to -1, the code reports the list of irreducible perturbations for the specified q-point in the log file (YAML format) and then stops.

paral_rf can be specified separately for each dataset.

pw_unbal_thresh¶

Mnemonics: Plane Wave UNBALancing: THRESHold for balancing procedure
Mentioned in topic(s): topic_parallelism
Variable type: real
Dimensions: scalar
Default value: 40%
Only relevant if: paral_kgb == 1
Added in version: before_v9

Test list (click to open). Rarely used, [1/1314] in all abinit tests, [0/178] in abinit tutorials

mpiio: t26.abi

This parameter (in %) activates a load balancing procedure when the distribution of plane wave components over MPI processes is not optimal. The balancing procedure is activated when the ratio between the number of plane waves treated by a processor and the ideal one is higher than pw_unbal_thresh %.

use_slk¶

Mnemonics: USE ScaLapacK
Characteristics: DEVELOP
Mentioned in topic(s): topic_parallelism
Variable type: integer
Dimensions: scalar
Default value: 0
Added in version: before_v9

Test list (click to open). Rarely used, [4/1314] in all abinit tests, [0/178] in abinit tutorials

mpiio: t27.abi
paral: t24.abi, t25.abi, t29.abi

If set to 1, enable the use of ScaLapack within LOBPCG.