Tutorial on BetheSalpeter calculations¶
Absorption spectra including excitonic effects.¶
This tutorial discusses how to calculate the macroscopic dielectric function including excitonic effects within the BetheSalpeter (BS) approach. Crystalline silicon is used as test case. A brief description of the formalism can be found in the BetherSalpeter notes.
The user should be familiarized with the four basic tutorials of ABINIT and the first GW tutorial.
Important
All the necessary input files to run the examples can be found in the ~abinit/tests/ directory where ~abinit is the absolute path of the abinit toplevel directory.
To execute the tutorials, you are supposed to create a working directory (Work*
) and
copy there the input files and the files file of the lesson.
The files file ending with _x (e.g. tbase1_x.files) must be edited every time you start to use a new input file. You will discover more about the files file in section 1.1 of the help file.
To make things easier, we suggest to define some handy environment variables by executing the following lines in the terminal:
export ABI_HOME=Replace_with_the_absolute_path_to_the_abinit_top_level_dir export ABI_TESTS=$ABI_HOME/tests/ export ABI_TUTORIAL=$ABI_TESTS/tutorial/ # Files for base1234, GW ... export ABI_TUTORESPFN=$ABI_TESTS/tutorespfn/ # Files specific to DFPT tutorials. export ABI_TUTOPARAL=$ABI_TESTS/tutoparal/ # Tutorials about parallel version export ABI_TUTOPLUGS=$ABI_TESTS/tutoplugs/ # Examples using external libraries. export ABI_PSPDIR=$ABI_TESTS/Psps_for_tests/ # Pseudos used in examples. export PATH=$ABI_HOME/src/98_main/:$PATH
The examples in this tutorial will use these shell variables so that one can easily copy and paste the code snippets into the terminal (remember to set ABI_HOME first!)
The last line adds the directory containing the executables to your PATH so that one can invoke the code by simply typing abinit in the terminal instead of providing the absolute path.
Finally, to run the examples in parallel with e.g. 2 MPI processes, use mpirun (mpiexec) and the syntax:
mpirun n 2 abinit < files_file > log 2> err
The standard output of the application is redirected to log
while err
collects the standard error
(runtime error messages, if any, are written here).
This tutorial should take about one hour to be completed.
Visualisation tools are NOT covered in this tutorial. Powerful visualisation procedures have been developed in the Abipy context, relying on matplotlib. See the README of Abipy and the Abipy tutorials.
Preparatory steps (WFK and the SCR file)¶
Before starting, you might consider to work in a different subdirectory as for the other tutorials. Why not Work_bs?
Copy the files file $ABI_TUTORIAL/Input/tbs_1.files in the working directory Work_bs. Now run immediately the calculation with the command:
abinit < tbs_1.files > tbs_1.log 2> err &
so that we can analyze the input file while the code is running.
The input file is located in $ABI_TUTORIAL/Input/tbs_1.in. The header reports a brief description of the calculation so read it carefully. Don’t worry if some parts are not clear to you as we are going to discuss the calculation in step by step fashion.
../tbs_1.in tbs_1.out tbs_1i tbs_1o tbs_1t ../../../Psps_for_tests/14si.pspnc
# Crystalline silicon # Preparatory run for BS calculations # # There are four datasets specified in this input: # 1) Groundstate calculation to get the density. # 2) NSCF run to generate the WFK file on a symmetric kmesh (4x4x4, gammacentered) # 3) NSCF run to generate another WFK file on a shifted 4x4x4 kmesh that breaks the symmetry of the BZ sampling # 4) SCR calculation with the WFK file generated in the second dataset # ndtset 4 # Definition of the kpoint grid kptopt 1 # Option for the automatic generation of k points, ngkpt 4 4 4 # This mesh is too coarse for optical properties. nshiftk 1 shiftk 0.0 0.0 0.0 # Gammacentered kmesh # Dataset1: selfconsistent calculation # tolvrs1 1.0d8 prtden1 1 # Dataset2: definition of parameters for the calculation of the WFK file on the symmetric kmesh. # iscf2 2 # NSCF run getden2 1 # Read previous density file tolwfr2 1.0d8 nband2 105 # bands treated in the CG algorithm nbdbuf2 5 # The last five states are excluded from the converge check # to facilitate the convergence # Dataset3: calculation of the WFK file on the shifted kmesh to break the symmetry. # iscf3 2 getden3 1 tolwfr3 1.0d8 nband3 15 # Here we can reduce the number of bands since this WFK file # will be used to construct the transition space nbdbuf3 5 chksymbreak3 0 # To skip the check on the kmesh. shiftk3 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh. # Dataset3: creation of the screening (eps^1) matrix # optdriver4 3 gwpara4 2 inclvkb4 2 awtr4 1 symchi4 1 getwfk4 2 ecuteps4 6 ecutwfn4 8 nband4 100 # This value leads to well converged QP energies, see the first GW tutorial nfreqre4 1 # Only the static limit is needed for standard BSE calculations. nfreqim4 0 # VARIABLES COMMON TO THE DIFFERENT DATASETS # Definition of the unit cell: fcc acell 3*10.217 # This is equivalent to 10.217 10.217 10.217 rprim 0.0 0.5 0.5 # FCC primitive vectors (to be scaled by acell) 0.5 0.0 0.5 0.5 0.5 0.0 # Definition of the atom types ntypat 1 # There is only one type of atom znucl 14 # The keyword "zatnum" refers to the atomic number of the # possible type(s) of atom. The pseudopotential(s) # mentioned in the "files" file must correspond # to the type(s) of atom. Here, the only type is Silicon. # Definition of the atoms natom 2 # There are two atoms typat 1 1 # They both are of type 1, that is, Silicon. xred # Reduced coordinate of atoms 0.0 0.0 0.0 0.25 0.25 0.25 # Definition of the planewave basis set ecut 8 # Maximal kinetic energy cutoff, in Hartree istwfk *1 nstep 50 # Maximal number of SCF cycles diemac 12.0 ## After modifying the following section, one might need to regenerate the pickle database with runtests.py r #%%<BEGIN TEST_INFO> #%% [setup] #%% executable = abinit #%% test_chain = tbs_1.in, tbs_2.in, tbs_3.in, tbs_4.in #%% [files] #%% files_to_test = #%% tbs_1.out, tolnlines= 20, tolabs= 1.1e2, tolrel= 4.0e2 #%% psp_files = 14si.pspnc #%% [shell] #%% post_commands = #%% ww_cp tbs_1o_DS3_WFK tbs_2i_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_2i_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_3i_DS1_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_3i_DS1_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_3i_DS2_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_3i_DS2_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_3i_DS3_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_3i_DS3_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_3i_DS4_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_3i_DS4_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_3i_DS5_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_3i_DS5_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_4i_DS1_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_4i_DS1_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_4i_DS2_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_4i_DS2_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_4i_DS3_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_4i_DS3_SCR; #%% ww_cp tbs_1o_DS3_WFK tbs_4i_DS4_WFK; #%% ww_cp tbs_1o_DS4_SCR tbs_4i_DS4_SCR; #%% [paral_info] #%% max_nprocs = 4 #%% [extra_info] #%% authors = M. Giantomassi #%% keywords = GW, BSE #%% description = #%% Crystalline silicon #%% Preparatory run for BS calculations #%% #%% There are four datasets specified in this input: #%% 1) Groundstate calculation to get the density. #%% 2) NSCF run to generate the WFK file on a symmetric kmesh (4x4x4, gammacentered) #%% 3) NSCF run to generate another WFK file on a shifted 4x4x4 kmesh that breaks the symmetry of the BZ sampling #%% 4) SCR calculation with the WFK file generated in the second dataset #%%<END TEST_INFO>
This input file generates the two WFK files and the SCR file needed for the subsequent BetheSalpeter computations. The first dataset performs a rather standard groundstate calculation on an \Gammacentered 4x4x4 grid (64 k points in the full Brillouin Zone, folding to 8 k points in the irreducible wedge). Then the groundstate density is used in dataset 2 and 3 to generate two WFK files with a standard NSCF run and the conjugategradient method.
Note that the WFK file computed in dataset 2 contains 100 bands on the 4x4x4 gammacentered kmesh whereas the WFK file produced in dataset 3 has only 10 bands on a 4x4x4 kmesh shifted along the direction
shiftk3 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh.
The \Gammacentered kmesh contains 8 points in the IBZ while the shifted kmesh breaks the symmetry of the crystal leading to 64 points in the IBZ (actually the IBZ now coincides with the full Brillouin zone). The second mesh is clearly inefficient, so you might wonder why we are using such a bizarre sampling and, besides, why we need to generate two different WFK files!
Indeed this approach strongly differs from the one we followed in the GW tutorials, but there is a good reason for doing so. It is anticipated that optical spectra converge slowly with the BZ sampling, and that symmetrybreaking kmeshes lead to faster convergence in nkpt than the standard symmetric kmeshes commonly used for groundstate or GW calculations.
This explains the bizarre shift but still why two WFK files? Why don’t we simply use the WFK file on the shifted kmesh to compute the screening?
The reason is that a screening calculation done with many empty bands on the shifted kmesh would be very memory demanding as the code should allocate a huge portion of memory whose size scales with (nband * nkpt), and no symmetry can be used to reduce the number of kpoints.
To summarize: the WFK with the symmetric kpoint sampling and 100 bands will be used to compute the screening, while the WFK file with the shifted kmesh and 10 bands will be used to construct the transition space employed for solving the BetheSalpeter equation. The two kmeshes differ just for the shift thus they produce the same set of qpoints (the list of qpoints in the screening is defined as all the possible differences between the kpoints of the WFK file). This means that, in the BetheSalpeter run, we can use the SCR file generated with the symmetric mesh even though the transition space is constructed with the shifted kmesh.
After this lengthy discussion needed to clarify this rather technical point, we can finally proceed to analyze the screening computation performed in the last dataset of tbs_1.in.
The SCR file is calculated in dataset 4 using nband = 100 and ecuteps = 6.0 Ha. In the first GW tutorial, these values were found to give QP energies converged within 0.01 eV, so we are confident that our SCR file is well converged and it can be safely used for performing convergence tests in the BetheSalpeter part.
Note that, for efficiency reasons, only the static limit of W is computed:
nfreqre4 1 # Only the static limit of W is needed for standard BSE calculations. nfreqim4 0
Indeed, in the standard formulation of the BetheSalpeter equation, only the static limit of the screened interaction is needed to construct the Coulomb term of the BS Hamiltonian. Using a single frequency allows us to save some CPU time in the screening part, but keep in mind that this SCR file can only be used either for BetheSalpeter computations or for GW calculations employing the plasmonpole models corresponding to ppmodel = 3, 4.
At this point the calculation should have completed, but there’s still one thing that we have to do before moving to the next paragraph.
As we said, we will need the WFK file on the shifted kmesh and the SCR file for our BS calculations so do not delete them! It is also a good idea to rename these precious files using more meaningful names e.g.:
mv tbs_1o_DS2_WFK 444_gamma_WFK mv tbs_1o_DS3_WFK 444_shifted_WFK mv tbs_1o_DS4_SCR 444_SCR
Important
The list of kpoints specified in the BS input files must equal the one used to generate the WFK file. Two new WFK files and a new SCR file must be generated from scratch if we want to change the kpoint sampling used to construct the transition space.
Computing the absorption spectrum within the TammDancoff approximation¶
This section is intended to show how to perform a standard excitonic calculation within the TammDancoff approximation (TDA) using the Haydock iterative technique. The input file is $ABI_TUTORIAL/tutorial/Input/tbs_2.in.
Before running the job, we have to connect this calculation with the output results produced in tbs_1.in.
Use the Unix commands:
ln s 444_shifted_WFK tbs_2i_WFK ln s 444_SCR tbs_2i_SCR
to create two symbolic links for the shifted WFK and the SCR file. The reason for doing so will be clear afterwards once we discuss the input file.
This job lasts 12 minutes on a modern machine so it is worth running it before inspecting the input file.
Copy the files file $ABI_TUTORIAL/Input/tbs_2.files in the working directory and issue:
abinit < tbs_2.files > tbs_2.log 2> err &
to put the job in background so that we can examine tbs_2.in.
Now open $ABI_TUTORIAL/Input/tbs_2.in in your preferred editor and go to the next section where we discuss the most important variables governing a typical BS computation.
../tbs_2.in tbs_2.out tbs_2i tbs_2o tbs_2t ../../../Psps_for_tests/14si.pspnc
# Crystalline silicon # BS run: TammDancoff approximation solved with the Haydock algorithm. # BSE run with Haydock iterative method (only resonant + W + v) optdriver 99 # BS calculation irdwfk 1 # Read the WFK file produced in tbs_1 irdscr 1 # Read the SCR file produced in tbs_1 bs_calctype 1 # L0 is contstructed with KS orbitals and energies. mbpt_sciss 0.8 eV # Scissors operator used to correct the KS band structure. bs_exchange_term 1 # Exchange term included. bs_coulomb_term 11 # Coulomb term included using the full matrix W_GG' bs_coupling 0 # TammDancoff approximation. bs_loband 2 nband 8 bs_freq_mesh 0 6 0.02 eV # Frequency mesh. bs_algorithm 2 # Haydock method. bs_haydock_niter 200 # Max number of iterations for the Haydock method. bs_haydock_tol 0.05 0 # Tolerance for the iterative method. zcut 0.15 eV # complex shift to avoid divergences in the continued fraction. # Definition of the kpoint grid # MUST be equal to the grid used for generating the WFK file. kptopt 1 # Option for the automatic generation of k points, ngkpt 4 4 4 # This mesh is too coarse for optical properties. nshiftk 1 shiftk 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh. chksymbreak 0 # Mandatory for using symmetrybreaking kmeshes in the BS code. ecutwfn 8.0 # Cutoff for the wavefunction. ecuteps 2.0 # Cutoff for W and /bare v used to calculate the BS matrix elements. inclvkb 2 # The commutator for the optical limit is correctly evaluated. icutcoul 3 # Legacy value # VARIABLES COMMON TO THE DIFFERENT DATASETS # Definition of the unit cell: fcc acell 3*10.217 # This is equivalent to 10.217 10.217 10.217 rprim 0.0 0.5 0.5 # FCC primitive vectors (to be scaled by acell) 0.5 0.0 0.5 0.5 0.5 0.0 # Definition of the atom types ntypat 1 # There is only one type of atom znucl 14 # The keyword "zatnum" refers to the atomic number of the # possible type(s) of atom. The pseudopotential(s) # mentioned in the "files" file must correspond # to the type(s) of atom. Here, the only type is Silicon. # Definition of the atoms natom 2 # There are two atoms typat 1 1 # They both are of type 1, that is, Silicon. xred # Reduced coordinate of atoms 0.0 0.0 0.0 0.25 0.25 0.25 # Definition of the planewave basis set ecut 8 # Maximal kinetic energy cutoff, in Hartree istwfk *1 nstep 50 # Maximal number of SCF cycles diemac 12.0 ## After modifying the following section, one might need to regenerate the pickle database with runtests.py r #%%<BEGIN TEST_INFO> #%% [setup] #%% executable = abinit #%% test_chain = tbs_1.in, tbs_2.in, tbs_3.in, tbs_4.in #%% [files] #%% files_to_test = #%% tbs_2.out, tolnlines= 10, tolabs= 1.1e2, tolrel= 4.0e2, fld_options =ridiculous #%% psp_files = 14si.pspnc #%% [paral_info] #%% max_nprocs = 4 #%% [extra_info] #%% authors = M. Giantomassi #%% keywords = GW, BSE #%% description = #%% Crystalline silicon #%% BS run: TammDancoff approximation solved with the Haydock algorithm. #%%<END TEST_INFO>
The structure of the input file¶
First we need to set optdriver = 99 to call the BSE routines
optdriver 99 # BS calculation
The variables irdwfk and irdscr are similar to other “ird” variables of ABINIT and are used to read the files produced in the previous paragraph
irdwfk 1 # Read the WFK file produced in tbs_1 irdscr 1 # Read the SCR file produced in tbs_1
The code expects to find an input WFK file and an input SCR file whose name is constructed according to prefix specified in the files file tbs_2.files (see this section of the abinit_help file). This is the reason why we had to create the two symbolic links before running the code.
Then we have a list of five variables specifying how to construct the excitonic Hamiltonian.
bs_calctype 1 # L0 constructed with KS orbitals and energies. mbpt_sciss 0.8 eV # Scissors operator used to correct the KS band structure. bs_exchange_term 1 # Exchange term included. bs_coulomb_term 11 # Coulomb term included using the full matrix W_GG' bs_coupling 0 # TammDancoff approximation.
The value bs_calctype = 1 specifies that the independentparticle polarizability should be constructed with the KohnSham orbitals and energies read from the WFK file. To simulate the selfenergy correction, the KS energies are corrected with a scissors operator of energy mbpt_sciss = 0.8 eV. This permits us to avoid a cumbersome GW calculation for each state included in our transition space. The use of the scissors operator is a reasonable approximation for silicon but it might fail in more complicated systems in which the GW corrections cannot be simulated in terms of a simple rigid shift of the initial KS bands structure.
The remaining three variables specify how to construct the excitonic Hamiltonian. bs_exchange_term = 1 tells the code to calculate the exchange part of the kernel, hence this calculation includes localfield effects. The variable bs_coulomb_term is used to select among different options that are available for the Coulomb term (please take some time to read the description of the variable and the relevant equations in the BetheSalpeter notes. Finally bs_coupling = 0 specifies that the offdiagonal coupling blocks should be neglected (TammDancoff approximation). This particular combination of parameters thus corresponds to a BetheSalpeter calculation within the TammDancoff approximation with local field effects included.
Then we have the specification of the bands used to construct the transition space:
bs_loband 2 nband 8
In this case all the bands around the gap whose index is between 2 and 8 are included in the basis set.
The frequency mesh for the macroscopic dielectric function is specified by bs_freq_mesh
bs_freq_mesh 0 6 0.02 eV # Frequency mesh.
This triplet of real values defines a linear mesh that covers the range [0, 6] eV with a step of 0.02 eV. The number of frequency points in the mesh does not have any significant effect on the CPU time, but it is important to stress that the number of bands included in the transition space defines, in conjunction with the number of kpoints, the frequency range that can be described. As a consequence bs_loband and nband should be subject to an accurate converge study.
Then we have the parameters that define and control the algorithm employed to calculate the macroscopic dielectric function
bs_algorithm 2 # Haydock method (this is the default value). bs_haydock_niter 100 # Max number of iterations for the Haydock method. bs_haydock_tol 0.05 # Tolerance for the iterative method. zcut 0.15 eV # Complex shift to avoid divergences in the continued fraction.
bs_algorithm specifies the algorithm used to calculate the macroscopic dielectric function. In this case we use the iterative Haydock technique whose maximum number of iterations is given by bs_haydock_niter. The iterative algorithm stops when the difference between two consecutive evaluations of the optical spectra is less than bs_haydock_tol. The input variable zcut gives the complex shift to avoid divergences in the continued fraction. From a physical point of view, this parameters mimics the experimental broadening of the absorption peaks. In this test, due to the coarseness of the kmesh, we have to use a value slightly larger than the default one (0.1 eV) in order to facilitate the convergence of the Haydock algorithm. Ideally, one should perform a convergence study by decreasing the value of zcut for increasing number of kpoints.
The kpoint sampling is specified by the set of variables.
kptopt 1 # Option for the automatic generation of k points, ngkpt 4 4 4 # This mesh is too coarse for optical properties. nshiftk 1 shiftk 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh. chksymbreak 0 # Mandatory for using symmetrybreaking kmeshes in the BS code.
Important
The values of kptopt, ngkpt, nshiftk, and shiftk must equal the ones used to specify the grid for the WFK file. chksymbreak = 0 is used to bypass the check on symmetry breaking that, otherwise, would make the code stop.
The last section of the input file
ecutwfn 8.0 # Cutoff for the wavefunction. ecuteps 2.0 # Cutoff for W and /bare v used to calculate the BS matrix elements. inclvkb 2 # The Commutator for the optical limit is correctly evaluated.
specifies the parameters used to calculate the kernel matrix elements and the matrix elements of the dipole operator. We have already encountered these variables in the first GW tutorial of the GW tutorial so their meaning is (hopefully) familiar to you. A more detailed discussion of the role played by these variables in the BS code can be found in the BetherSalpeter notes.
Output files¶
The output file, tbs_2.out, reports the basic parameters of the calculation and eventual WARNINGs that are issued if the iterative method does not converge. Please take some time to understand its structure.
Could you answer the following questions?
 How many transitions are included in the basis set?
 How many directions are used to evaluate the optical limit?
 What is the value of the Lorentzian broadening used in the continued fraction?
After this digression on the main output file, we can finally proceed to analyse the output data of the computation.
The most important results are stored in five different files:
 tbs_2o_BSR
 tbs_2o_HAYDR_SAVE
 tbs_2o_RPA_NLF_MDF
 tbs_2o_GW_NLF_MDF
 tbs_2o_EXC_MDF
In what follows, we provide a brief description of the format and of the content of each output file.
 tbs_2o_BSR:

This binary file stores the upper triangle of the resonant block (the matrix is Hermitian hence only the nonredundant part is computed and saved on file). The BSR file can be used to restart the run from a previous computation using the variables getbsreso or irdbsreso. This restart capability is useful for restarting the Haydock method if convergence was not achieved or to execute Haydock computations with different values of zcut. getbsreso and irdbsreso are also handy if one wants to include the coupling on top of a preexisting TDA calculation since the code uses two different files to store the resonant and the coupling block (BSC is the prefix used for the files storing the coupling term).
 tbs_2o_HAYDR_SAVE:

It is a binary file containing the results of the Haydock method: the coefficient of the tridiagonal matrix and the three vectors employed in the iterative algorithm. It is usually used to restart the algorithm if convergence has not been achieved (see the related input variables gethaydock and irdhaydock).
 tbs_2o_RPA_NLF_MDF and tbs_2o_GW_NLF_MDF

The RPA spectrum without local field effects obtained with KS energies and the GW energies, respectively (mnemonics: NLF stands for No Local Field, while MDF stands for Macroscopic Dielectric Function).
 tbs_2o_EXC_MDF

Formatted file reporting the macroscopic dielectric function with excitonic effects.
The EXC_MDF file contains the most important results of our calculation so it is worth spending some time to discuss its format.
First we have a header reporting the basic parameters of the calculation:
# Macroscopic dielectric function obtained with the BS equation. # RPA L0 with KS energies and KS wavefunctions LOCAL FIELD EFFECTS INCLUDED # RESONANTONLY calculation # Coulomb term constructed with full W(G1,G2) # Scissor operator energy = 0.8000 [eV] # Tolerance = 0.0500 # npweps = 27 # npwwfn = 283 # nbands = 8 # loband = 2 # nkibz = 64 # nkbz = 64 # Lorentzian broadening = 0.1500 [eV]
then the list of qpoints giving the direction of the incident photon:
# List of qpoints for the optical limit: # q = 0.938821, 0.000000, 0.000000, [Reduced coords] # q = 0.000000, 0.938821, 0.000000, [Reduced coords] # q = 0.000000, 0.000000, 0.938821, [Reduced coords] # q = 0.000000, 0.813043, 0.813043, [Reduced coords] # q = 0.813043, 0.000000, 0.813043, [Reduced coords] # q = 0.813043, 0.813043, 0.000000, [Reduced coords]
By default the code calculates the macroscopic dielectric function considering six different directions in qspace (the three basis vectors of the reciprocal lattice and the three Cartesian axis). It is possible to specify custom directions using the input variables gw_nqlwl and gw_qlwl.
Then comes the section with the real and the imaginary part of the macroscopic dielectric as a function of frequency for the different directions:
# omega [eV] RE(eps(q=1)) IM(eps(q=1) RE(eps(q=2) ) ... 0.000 1.8026E+01 0.0000E+00 1.7992E+01 0.0000E+00 1.4292E+01 0.0000E+00 1.3993E+01 0.0000E+00 1.7117E+01 0.0000E+00 1.7080E+01 0.0000E+00 .... .... ...
You can visualize the data using your preferred software. For instance, with gnuplot
p "tbs_2o_EXC_MDF" u 1:3 w l
will plot the imaginary part of the macroscopic dielectric function (the absorption spectrum) for the first qpoint. You should obtain a graphic similar to the one reported below
Note
These results are not converged, we postpone the discussion about convergence tests to the next paragraphs of this tutorial.
The most important feature of the spectrum is the presence of two peaks located at around 3.4 and 4.3 eV. To understand the nature of these peaks and the role played by the BS kernel, it is useful to compare the excitonic spectra with the RPA results obtained without local field effects.
Use the sequence of gnuplot command
p "tbs_2o_EXC_MDF" u 1:3 w l rep "tbs_2o_RPA_NLF_MDF" u 1:3 w l rep "tbs_2o_GW_NLF_MDF" u 1:3 w l
to plot the absorption spectrum obtained with the three different approaches. The final result is reported in the figure below.
The RPAKS spectrum underestimates the experimental optical threshold due to the well know band gap problem of DFT. Most importantly, the amplitude of the first peak is underestimated, a problem than is not solved when localfield effects are correctly included in the calculation.
The RPAGW results with QP corrections simulated with mbpt_sciss does not show any significant improvement over RPAKS: the RPAGW spectrum is just shifted towards higher frequencies due to opening of the gap, but the shape of the two spectra is very similar, in particular the amplitude of the first peak is still underestimated.
On the contrary, the inclusion of the BS kernel leads to important changes both in the optical threshold as well as in the amplitude of the first peak. This simple analysis tells us that the first peak in the absorption spectrum of silicon has a strong excitonic character that is not correctly described within the RPA. Our first BS spectrum is not converged at all and it barely resembles the experimental result, nevertheless this unconverged calculation is already able to capture the most important physics.
Tip
If AbiPy is installed on your machine, you can use the abiopen script
with the expose
option to visualize the dielectric functions stored in the MDF.nc file:
abiopen.py tbs_2o_MDF.nc expose seaborn
For further information about the MDF file and the postprocessing tools provided by AbiPy, please consult the MdfFile notebook .
Optional Exercises¶

Change the value of the Lorentzian broadening zcut used to avoid divergences in the continued fraction. Then restart the Haydock algorithm from the _BSR and _HAYDR_SAVE files using the appropriate variables. What is the main effect of the broadening on the final spectrum. Does the number of iterations needed to converge depend on the broadening?

Use the appropriate values for bs_exchange_term and bs_coulomb_term to calculate the BS spectrum without local field effects. Compare the results obtained with and without local field effects.

Modify the input file tbs_2.in so that the code reads in the resonant block produced in the previous run and calculates the spectrum employing the method based on the direct diagonalization (use irdbsreso to restart the run but remember to rename the file with the resonant block). Compare the CPU time needed by the two algorithms as a function of the number of transitions in the transition space. Which one has the best scaling?
Preliminary discussion about convergence studies¶
Converging the excitonic spectrum requires a careful analysis of many different parameters:
Since the memory requirements scale quadratically with the number of kpoints in the full Brillouin zone times the number of valence bands times the number of conduction bands included in the transition space, it is very important to find a good compromise between accuracy and computational efficiency.
First of all one should select the frequency range of interest since this choice has an important effect on the number of valence and conduction states that have to be included in the transition space. The optical spectrum is expected to converge faster in the number of bands than the GW corrections since only those transitions whose energy is “close” to the frequency range under investigation are expected to contribute.
ecutwfn usually plays a secondary role since it only affects the accuracy of the oscillator matrix elements. We suggest avoiding any truncation of the initial basis set by setting ecutwfn to a value slightly larger than the value of ecut used to generate the WFK file. One should truncate the initial planewave basis set only when experiencing memory problems although this kind of problems can be usually solved by just increasing the number of processors or, alternatively, with an appropriate choice of gwmem.
The value of ecuteps affects the accuracy of the matrix elements of the Coulomb term, the fundamental term that drives the creation of the excitons. As a consequence ecuteps should be subject to an accurate convergence test. As a rule of thumb, ecuteps can be chosen equal or, sometimes, even smaller than the value needed to converge the GW corrections.
As already stated: optical spectra converge slowly with the Brillouin zone sampling. The convergence in the number of kpoints thus represents the most important and tedious part of our convergence study. For this reason, this study should be done once converged values for the other parameters have been already found.
Convergence with respect to the number of bands in the transition space¶
In this section we take advantage of the multidataset capabilities of ABINIT to perform calculations with different values for bs_loband and nband
Before running the test take some time to read the input file $ABI_TUTORIAL/Input/tbs_3.in.
# Crystalline silicon # Convergence of the number of bands in the transition space. ndtset 2 #bs_loband1 4 nband1 5 bs_loband1 3 nband1 6 bs_loband2 2 nband2 7 bs_loband3 2 nband3 8 #bs_loband5 1 nband5 8 bs_coulomb_term 10 # Coulomb term evaluated within the diagonal approximation. irdwfk 1 irdscr 1 #getwfk 20 # Trick to read the same file tbs_o3_DS20_WFK in each dataset #getscr 20 # Same trick for the SCR file # Definition of the kpoint grid # MUST equal the grid used for generating the WFK file. kptopt 1 # Option for the automatic generation of k points, ngkpt 4 4 4 nshiftk 1 chksymbreak 0 # Mandatory for using symmetrybreaking kmeshes in the BS code. shiftk 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh. # BSE run with Haydock iterative method (only resonant + W + v) optdriver 99 # BS calculation bs_calctype 1 mbpt_sciss 0.8 eV # Scissors operator used to correct the KS band structure. bs_exchange_term 1 # Exchange term included. bs_coupling 0 # TammDancoff approximation. bs_freq_mesh 0 6 0.02 eV # Frequency mesh. bs_algorithm 2 # Haydock method. bs_haydock_niter 100 # Max number of iterations for the Haydock method. bs_haydock_tol 0.05 0 # Tolerance for the iterative method. zcut 0.15 eV # complex shift to avoid divergences in the continued fraction. ecutwfn 8.0 # Cutoff for the wavefunction. ecuteps 2.0 # Cutoff for W and /bare v used to calculate the BS matrix elements. inclvkb 2 icutcoul 3 # Legacy value # VARIABLES COMMON TO THE DIFFERENT DATASETS # Definition of the unit cell: fcc acell 3*10.217 # This is equivalent to 10.217 10.217 10.217 rprim 0.0 0.5 0.5 # FCC primitive vectors (to be scaled by acell) 0.5 0.0 0.5 0.5 0.5 0.0 # Definition of the atom types ntypat 1 # There is only one type of atom znucl 14 # The keyword "zatnum" refers to the atomic number of the # possible type(s) of atom. The pseudopotential(s) # mentioned in the "files" file must correspond # to the type(s) of atom. Here, the only type is Silicon. # Definition of the atoms natom 2 # There are two atoms typat 1 1 # They both are of type 1, that is, Silicon. xred # Reduced coordinate of atoms 0.0 0.0 0.0 0.25 0.25 0.25 # Definition of the planewave basis set ecut 8 # Maximal kinetic energy cutoff, in Hartree istwfk *1 nstep 50 # Maximal number of SCF cycles diemac 12.0 ## After modifying the following section, one might need to regenerate the pickle database with runtests.py r #%%<BEGIN TEST_INFO> #%% [setup] #%% executable = abinit #%% test_chain = tbs_1.in, tbs_2.in, tbs_3.in, tbs_4.in #%% [files] #%% files_to_test = #%% tbs_3.out, tolnlines= 10, tolabs= 1.1e2, tolrel= 4.0e2, fld_options = ridiculous #%% psp_files = 14si.pspnc #%% [paral_info] #%% max_nprocs = 4 #%% [extra_info] #%% authors = M. Giantomassi #%% keywords = GW, BSE #%% description = #%% Crystalline silicon #%% Convergence of the number of bands in the transition space. #%%<END TEST_INFO>
The convergence in the number of transitions is performed by defining two datasets with different values for nband and bs_loband
ndtset 2 bs_loband1 3 nband1 6 bs_loband2 2 nband2 7
The parameters defining how to build the excitonic Hamiltonian are similar to the ones used in tbs_2.in. The only difference is in the value used for bs_coulomb_term, i.e.
bs_coulomb_term 10 # Coulomb term evaluated within the diagonal approximation.
that allows us to save some CPU time during the computation of the Coulomb term.
Also in this case, before running the test, we have to connect tbs_3.in to the WFK and the SCR file produced in the first step. Note that tbs_3.in uses irdwfk and irdscr to read the external files, hence we have to create symbolic links for each dataset:
ln s 444_SCR tbs_3i_DS1_SCR ln s 444_SCR tbs_3i_DS2_SCR ln s 444_shifted_WFK tbs_3i_DS1_WFK ln s 444_shifted_WFK tbs_3i_DS2_WFK
Now we can finally run the test with
abinit < tbs_3.files > tbs3.log 2> err &
This job should last 34 minutes so be patient!
Let us hope that your calculation has been completed, and that we can examine the output results.
Use the following sequence of gnuplot commands:
p "tbs_3o_DS1_EXC_MDF" u 1:3 w l rep "tbs_3o_DS2_EXC_MDF" u 1:3 w l
to plot on the same graphic the absorption spectrum obtained with different transition spaces. You should obtain a graphic similar (but not equal) to this one
Note indeed that the above figure has been produced with more datasets in order to highlight the convergence behaviour.
The results obtained with (bs_loband = 4, nband = 5) are clearly unconverged as the basis set contains too few transitions that are not able to describe the frequencydependence of the polarizability in the energy range under investigation. For a converged spectrum, we have to include the three higher occupied states and the first four conduction bands (the blue curve corresponding to bs_loband = 2, and nband = 7).
Note that adding the first occupied band, curve (18), gives results that are almost on top of (2,8). This is due to the fact that, in silicon, the bottom of the first band is located at around 12 eV from the top of the conduction band therefore its inclusion does not lead to any significant improvement of the transition space in the frequency range [0, 8] eV. For completeness, we also report the results obtained in a separate calculation done with bs_loband = 2 nband = 9 to show that four empty states are enough to converge the spectrum.
We therefore fix the number of bands for the transition space using bs_loband = 2, nband = 7 and we proceed to analyse the convergence of the spectrum with respect to the number of planewaves in the screening.
Tip
If AbiPy is installed on your machine, you can use the abicomp script
with the mdf
command and the expose
option to compare
multiple dielectric functions:
abicomp.py mdf tbs_3o_DS*_MDF.nc expose seaborn
Tips for expert users¶
The use of irdwfk and irdscr is not handy when we have several datasets that are reading the same external file as we are forced to use different names for the input of each dataset. To work around this annoyance, one can introduce a fictitious dataset (say dataset 99), and let the code use the output of this nonexistent dataset as the input of the real datasets. An example will help clarify: Instead of using the lengthy list of links as done before, we might use the much simpler sequence of commands
ln s 444_shifted_WFK tbs_3o_DS99_WFK ln s 444_SCR tbs_3o_DS99_SCR
provided that, in the input file, we replace irdwfk and irdscr with
getwfk 99 # Trick to read the same file tbs_o3_DS99_WFK in each dataset getscr 99 # Same trick for the SCR file
Convergence with respect to the number of planewaves in the screening¶
First of all, before running the calculation, take some time to understand what is done in $ABI_TUTORIAL/Input/tbs_4.in.
The structure of the input file is very similar to the one of tbs_3.in, the main difference is in the first section:
ndtset 2 ecuteps: 2 ecuteps+ 1 bs_coulomb_term 11
that instructs the code to execute two calculations where the direct term is constructed using different value of ecuteps. We also relax the diagonalonly approximation for the screening by setting bs_coulomb_term = 11 so that the nonlocality of W(\rr, \rr') is correctly taken into account.
It is important to stress that it is not necessary to recalculate the SCR file from scratch just to modify the value of ecuteps used in the BS run. The SCR file calculated in the preparatory step contains Gvectors whose energy extends up to ecuteps = 6.0 Ha. This is the maximum cutoff energy that can be used in our convergence tests. If the value of ecuteps specified in the input file is smaller than the one stored on disk, the code will read a sub block of the initial matrix. A WARNING message is issued if the value specified in the input file is larger than the one available in the SCR file.
Now we can finally run the calculation. As usual, we have to copy $ABI_TUTORIAL/Input/tbs_4.files in the working directory Work_bs, then we have to create a bunch of symbolic links for the input WFK and SCR files:
ln s 444_SCR tbs_4i_DS1_SCR ln s 444_SCR tbs_4i_DS2_SCR ln s 444_shifted_WFK tbs_4i_DS1_WFK ln s 444_shifted_WFK tbs_4i_DS2_WFK
Now issue
abinit < tbs_4.files > tbs4.log 2> err &
to execute the test (it should take around 2 minutes).
# Crystalline silicon # BS run: convergence in ecuteps ndtset 2 ecuteps: 2 ecuteps+ 1 bs_loband 2 nband 7 bs_coulomb_term 11 # Coulomb term evaluated within the diagonal approximation. irdwfk 1 irdscr 1 #getwfk 20 # Trick to read the same file tbs_o3_DS20_WFK in each dataset #getscr 20 # Same trick for the SCR file # Definition of the kpoint grid # MUST be equal to the grid used for generating the WFK file. kptopt 1 # Option for the automatic generation of k points, ngkpt 4 4 4 # This mesh is too coarse for optical properties. nshiftk 1 chksymbreak 0 # Mandatory for using symmetrybreaking kmeshes in the BS code. shiftk 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh. # BSE run with Haydock iterative method (only resonant + W + v) optdriver 99 # BS calculation bs_calctype 1 mbpt_sciss 0.8 eV # Scissors operator used to correct the KS band structure. bs_exchange_term 1 # Exchange term included. bs_coupling 0 # TammDancoff approximation. bs_freq_mesh 0 6 0.02 eV # Frequency mesh. bs_algorithm 2 # Haydock method. bs_haydock_niter 100 # Max number of iterations for the Haydock method. bs_haydock_tol 0.05 0 # Tolerance for the iterative method. zcut 0.15 eV # complex shift to avoid divergences in the continued fraction. ecutwfn 8.0 # Cutoff for the wavefunction. inclvkb 2 icutcoul 3 # Legacy value # VARIABLES COMMON TO THE DIFFERENT DATASETS # Definition of the unit cell: fcc acell 3*10.217 # This is equivalent to 10.217 10.217 10.217 rprim 0.0 0.5 0.5 # FCC primitive vectors (to be scaled by acell) 0.5 0.0 0.5 0.5 0.5 0.0 # Definition of the atom types ntypat 1 # There is only one type of atom znucl 14 # The keyword "zatnum" refers to the atomic number of the # possible type(s) of atom. The pseudopotential(s) # mentioned in the "files" file must correspond # to the type(s) of atom. Here, the only type is Silicon. # Definition of the atoms natom 2 # There are two atoms typat 1 1 # They both are of type 1, that is, Silicon. xred # Reduced coordinate of atoms 0.0 0.0 0.0 0.25 0.25 0.25 # Definition of the planewave basis set ecut 8 # Maximal kinetic energy cutoff, in Hartree istwfk *1 nstep 50 # Maximal number of SCF cycles diemac 12.0 ## After modifying the following section, one might need to regenerate the pickle database with runtests.py r #%%<BEGIN TEST_INFO> #%% [setup] #%% executable = abinit #%% test_chain = tbs_1.in, tbs_2.in, tbs_3.in, tbs_4.in #%% [files] #%% files_to_test = #%% tbs_4.out, tolnlines= 10, tolabs= 1.1e2, tolrel= 4.0e2, fld_options = ridiculous #%% psp_files = 14si.pspnc #%% [paral_info] #%% max_nprocs = 4 #%% [extra_info] #%% authors = M. Giantomassi #%% keywords = GW, BSE #%% description = #%% Crystalline silicon #%% BS run: convergence in ecuteps #%%<END TEST_INFO>
Once the calculation is completed, plot the spectra obtained with different ecuteps using gnuplot :
p "tbs_4o_DS1_EXC_MDF" u 1:3 w l rep "tbs_4o_DS2_EXC_MDF" u 1:3 w l
Here we show the results of a convergence study done with four different values or ecuteps
to highlight the convergence behavior:
The spectrum is found to converge quickly in ecuteps. The curves obtained with ecuteps = 3 and 4 Ha are almost indistinguishable from each other. Our final estimate for ecuteps is therefore 3 Ha.
Note that this value is smaller than the one required to converge the QP corrections within 0.01 eV (in the first GW tutorial of the GW tutorial we obtained 6.0 Ha). This is a general behavior, in the sense that BetheSalpeter spectra, unlike GW corrections, are not very sensitive to truncations in the planewave expansion of W. Reasonable BS spectra are obtained even when W is treated within the diagonal approximation or, alternatively, with model dielectric functions.
Note also how the two peaks are affected in a different way by the change of ecuteps, with the first peak affected the most. This behavior is consistent with our affirmation that the first peak of silicon has a strong excitonic character.
Convergence with respect to the number of kpoints¶
The last parameter that should be checked for convergence is the number of kpoints. This convergence study represents the most tedious and difficult part since it requires the generation of new WFK files and of the new SCR file for each kmesh (the list of kpoints for the wavefunctions and the set of qpoints in the screening must be consistent with each other).
The file $ABI_TUTORIAL/Input/tbs_5.in gathers the different steps of a standard BS calculation (generation of two WFK file, screening calculation, BS run) into a single input. The calculation is done with the converged parameters found in the previous studies, only ngkpt has been intentionally left undefined.
# Crystalline silicon # Template for converge study on ngkpt # 1) GS # 2) generation of the WFK file on a symmetric kmesh # 3) generation of the WFK file on a shifted kmesh that breaks the symmetry of the BZ sampling # 4) SCR calculation using the WFK generated in the second dataset # 5) BS run with Haydock method (no coupling) # ndtset 5 # Definition of the kpoint grid kptopt 1 # Option for the automatic generation of k points, ngkpt ?? nshiftk 1 shiftk 0.0 0.0 0.0 # Gammacentered kmesh is # Dataset1: selfconsistent calculation tolvrs1 1.0d8 prtden1 1 # Dataset2: definition of parameters for the calculation of the wfk file on the symmetric kmesh. iscf2 2 # non selfconsistency, read previous density file getden2 1 tolwfr2 1.0d8 nband2 105 nbdbuf2 5 # The last five states are excluded from the converge check # Dataset3: calculation of the WFK file on the shifted kmesh to break the symmetry. iscf3 2 # non selfconsistency, read previous density file getden3 1 tolwfr3 1.0d8 nband3 15 # Here we can reduce the number of bands since this WFK file is only used in the BS run. nbdbuf3 5 # The last five states are excluded from the converge check chksymbreak3 0 # To skip the check on the kmesh. shiftk3 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh. # Dataset3: creation of the screening (eps^1) matrix optdriver4 3 gwpara4 2 inclvkb4 2 awtr4 1 symchi4 1 getwfk4 2 ecuteps4 6 ecutwfn4 8 nband4 100 nfreqre4 1 # Only the static limit is needed for standard BSE calculations. nfreqim4 0 # BSE run with Haydock iterative method (only resonant + W + v) optdriver5 99 # BS calculation getscr5 4 getwfk5 3 # Read the WFK generated on the shifted kmesh. chksymbreak5 0 # To skip the check on the kmesh. shiftk5 0.11 0.21 0.31 # This shift breaks the symmetry of the kmesh. bs_calctype5 1 mbpt_sciss5 0.8 eV # Scissors operator used to correct the KS band structure. bs_exchange_term5 1 # Exchange term included. bs_coulomb_term5 11 bs_coupling5 0 # TammDancoff approximation. bs_loband5 2 nband5 8 ecuteps5 3 bs_freq_mesh5 0 6 0.02 eV # Frequency mesh. bs_algorithm5 2 # Haydock method. bs_haydock_niter5 100 # Max number of iterations for the Haydock method. bs_haydock_tol5 0.05 0 # Tolerance for the iterative method. zcut5 0.1 eV # complex shift to avoid divergences in the continued fraction. ecutwfn5 8.0 # Cutoff for the wavefunction. inclvkb5 2 # VARIABLES COMMON TO THE DIFFERENT DATASETS # Definition of the unit cell: fcc acell 3*10.217 # This is equivalent to 10.217 10.217 10.217 rprim 0.0 0.5 0.5 # FCC primitive vectors (to be scaled by acell) 0.5 0.0 0.5 0.5 0.5 0.0 # Definition of the atom types ntypat 1 # There is only one type of atom znucl 14 # The keyword "zatnum" refers to the atomic number of the # possible type(s) of atom. The pseudopotential(s) # mentioned in the "files" file must correspond # to the type(s) of atom. Here, the only type is Silicon. # Definition of the atoms natom 2 # There are two atoms typat 1 1 # They both are of type 1, that is, Silicon. xred # Reduced coordinate of atoms 0.0 0.0 0.0 0.25 0.25 0.25 # Definition of the planewave basis set ecut 8 # Maximal kinetic energy cutoff, in Hartree istwfk *1 nstep 50 # Maximal number of SCF cycles diemac 12.0
Use tbs_5.in as a template for performing BS calculations with different kmeshes. For example, you might try to compare the three meshes 4x4x4, 5x5x5, and 6x6x6. To facilitate the analysis of the results, we suggest to run the calculations in different directories so that we can keep the output results separated.
Be aware that both the CPU time as well as the memory requirements increase quickly with the number of divisions in the mesh. These are, for example, the CPU times required by different kmeshes on Intel Xeon X5570:
4x4x4: +Overall time at end (sec) : cpu= 112.4 wall= 112.4 5x5x5: +Overall time at end (sec) : cpu= 362.8 wall= 362.8 6x6x6: +Overall time at end (sec) : cpu= 914.8 wall= 914.8 8x8x8: +Overall time at end (sec) : cpu= 5813.3 wall= 5813.3 10x10x10: +Overall time at end (sec) : cpu= 20907.1 wall= 20907.1 12x12x12: +Overall time at end (sec) : cpu= 62738.2 wall= 62738.2
6x6x6 is likely the most dense sampling you can afford on a singleCPU machine. For you convenience, we have collected the results of the convergence test in the figure below.
As anticipated, the spectrum converges slowly with the number of kpoints and our first calculation done with the 4x4x4 grid is severely unconverged. The most accurate results are obtained with the 12x12x12 kmesh, but even this sampling leads to converged results only for frequencies below 4.5 eV. This is a problem common to all BS computations, in the sense that it is extremely difficult to achieve global converge in the spectra. This analysis shows that we can trust the 12x12x12 results in the [0:4,5] eV range while the correct description of the spectrum at higher energies would require the inclusion of more kpoint and, possibly, more bands so that the band dispersion is correctly taken into account (even the RPA spectrum does not converge at high frequencies when 12x12x12 is used).
It should be stressed that zcut plays a very important role in these converge tests. For example, the results obtained with the 8x8x8 or the 10x10x10 kmesh can be brought closer to the 12x12x12 by just increasing the Lorentzian broadening. When comparing theory with experiments, it is common to treat zcut as an a posteriori parameter chosen to produce the best agreement with the experiment.
Additional exercises¶

Use bs_coupling = 1 to perform an excitonic calculation for silicon including the coupling term. Compare the imaginary part of the macroscopic dielectric function obtained with and without coupling. Do you find significant differences? (Caveat: calculations with coupling cannot use the Haydock method and are much more CPU demanding. You might have to decrease some input parameters to have results in reasonable time.)

Calculate the oneshot GW corrections for silicon following the first GW tutorial. Then use the
_GW
file produced by the code to calculate the absorption spectrum. 
Learn how to use AbiPy to automate BS calculations as described in this jupyter notebook.
Notes on the MPI implementation¶
In this section, we discuss the approach used to parallelize the two steps of the BS run, i.e. the construction of the H matrix and the evaluation of the macroscopic dielectric function.
First of all, it is important to stress that, unlike the GW code, the BS routines do not employ any kind of memory distribution for the wavefunctions. The entire set of orbitals used to construct the transition space is stored on each node This choice has been dictated by the fact that the size of H is usually much larger than the array used to store the wavefunction, hence it is much more important to distribute the matrix than the wavefunctions. Besides, having all the states on each node simplifies the calculation of several intermediate quantities needed at runtime.
The memory allocated for the wavefunctions and the screening thus will not scale with the number of processors. However, for very memory demanding calculations, the real space orbitals can be calculated on the fly with an increase in computational time instead. This option is controlled by the second digit of the input variable gwmem.
When discussing the MPI parallelization of the BetheSalpeter routines, we have to consider the two steps separately.
In the first step, the upper triangle of the resonant (coupling) block is distributed among the nodes. Each CPU computes its own portion and stores the results in a temporary array. At the end of the computation, the portions of the upper triangle are communicated to the master node which writes the binary file BSR (BSC).
In the second step, each node reads the data stored in the external files in order to build the excitonic Hamiltonian. The matrix is distributed using a columnblock partitioning, so that the matrixvector multiplications required in the Haydock iterative scheme can be easily performed in parallel (see the schematic representation reported below). A similar distribution scheme is also employed for the conjugategradient minimization. For a balanced distribution of computational work, the number of processors should divide the total number of resonant transitions.