Skip to content

YAML-based test suite: Fortran and Python API

This page gives an overview of the internal implementation in order to facilitate future developments and extensions. In the first part we discuss how to write new Yaml documents using the low-level Fortran API. In the second part, we describe the python implementation and the steps required to register a new Yaml document in the framework. Finally, we provide recommendations for the name of the variables used in Yaml documents in order to improve readability and facilitate the integration between Yaml documents and the python code.

Fortran API

The low-level Fortran API consists of the following modules:

m_pair_list.F90:

a Fortran module for constructing dictionaries mapping strings to values. Internally, the dictionary is implemented in C in terms of a list of key-value pairs. The pair list can dynamically hold either real, integer or string values. It implements the basic getter and setter methods and an iterator system to allow looping over the key-value pairs easily in Fortran.

m_stream_string.F90:

a Fortran module providing tools to manipulate variable length strings with a file like interface (stream object)

m_yaml_out.F90:

a Fortran module to easily output YAML documents from Fortran. This module represents the main entry point for client code and may be used to implement methods to output Fortran objects. It provides basic routines to produce YAML documents based on a mapping at the root and containing well formatted (as human readable as possible and valid YAML) 1D and 2D arrays of numbers (integer or real), key-value mapping (using pair lists), list of key-value mapping and scalar fields. Routines support tags for specifying special data structures.

m_neat.F90 (NEw Abinit Test system):

a higher level module providing Fortran procedures to create specific documents associated to important physical properties. Currently routines for total energy components and ground state results are implemented.

From the perspective of the developer, adding a new YAML document requires three steps:

  1. Implement the output of the YAML document in Fortran using the pre-existent API. Associate a unique tag to the new document. The tag identify the document and its structure.

  2. register the tag and the associated class in the python code. Further details about this procedure can be found below.

  3. Create a YAML configuration file defining the quantities that should be compared with the corresponding tolerances.

An example will help clarify. Let’s assume we want to implement the output of the !Etot document with the different components of the total free energy.

The Fortran code that builds the list of (key, values) entries will look like: MG: The Fortran examples should be refactored and merged with the text in Creating a new document in Fortran

! Import modules
use m_neat, only: neat_etot
use m_pair_list, only: pair_list

real(dp) :: etot
type(pair_list) :: e_components  ! List of (key, value) entries

call e_components%set("comment", s="Total energie and its components")
call e_components%set("Total Energy", r=etot)

! footer of the routine, once all data have been stored
call neat_etot(e_components, unit)

In 47_neat/m_neat.F90 we will implement neat_etot:

subroutine neat_etot(components, unit)

type(pair_list), intent(in) :: components
integer, intent(in) :: unit

call yaml_single_dict("Etot", "", components, 35, 500, width=20, &
                      file=unit, real_fmt='(ES20.13)')
 !
 ! 35 -> max size of the labels, needed to extract from the pairlist,
 ! 500 -> max size of the strings, needed to extract the comment
 !
 ! width -> width of the field name side, permit a nice alignment of the values

end subroutine neat_etot

Creating a new document in Fortran

Developers are invited to browse the sources of m_neat and m_yaml_out to have a comprehensive overview of the available tools. Indeed the routines are documented and commented and creating a hand-written reference is likely to go out of sync quicker than on-site documentation. Here we show a little example to give the feeling of the process.

The simplest way to create a YAML document have been introduced above with the use of yaml_single_dict. However this method is limited to scalars only. It is possible to put 1D and 2D arrays, dictionaries and even tabular data. The best way to create a new document is to have a routine neat_my_new_document that will take all required data in argument and will do all the formatting work at once. This routine will declare a stream_string object to build the YAML document inside, open the document with yaml_open_doc, fill it, close it with yaml_close_doc and finally print it with wrtout_stream. Here come a basic skeleton of neat routine:

subroutine neat_my_new_document(data_1, data_2,... , iout)
  ! declare your pieces of data... (arrays, numbers, pair_list...)
  ...
  integer,intent(in) :: iout  ! this is the output file descriptor
!Local variables-------------------------------
  type(stream_string) :: stream

  ! open the document
  call yaml_open_doc('my label', 'some comments on the document', stream=stream)

  ! fill the document
  ...

  ! close and output the document
  call yaml_close_doc(stream=stream)
  call wrtout_stream(stream, iout)

end subroutine neat_my_new_document

Suppose we want a 2D matrix of real number with the tag !NiceMatrix in our document for the field name ‘that matrix’ we will add the following to the middle section:

call yaml_add_real2d('that matrix', dimension_1, dimension_2, mat_data, &
                     tag='NiceMatrix', stream=stream)

Other m_yaml_out routines provide a similar interface: first the label, then the data and its structural metadata, then a bunch of optional arguments for formatting, adding tags, tweaking spacing etc.

Python API

  • The fldiff algorithm has been slightly modified to extract YAML documents from the output file and store them for later treatment.

  • Newt tools have been written to facilitate the creation of new python classes corresponding to YAML tags for implementing new logic operating on the extracted data. These tools are very easy to use via class decorators.

  • These tools have been used to create basic classes for futures tags, among other classes that directly convert YAML list of numbers into NumPy arrays. These classes may be used as examples for the creation of further tags.

  • A parser for test configuration have been added and all facilities to do tests are in place.

  • A command line tool testtools.py to allow doing different manual actions (see Test CLI)

fldiff algorithm

The machinery used to extract data from the output file is defined in ~abinit/tests/pymods/data_extractor.py. This module takes cares of the identification of the YAML documents inside the output file including the treatment of the meta-characters found in the first column following the original fldiff.pl implementation.

~abinit/tests/pymods/fldiff.py is the main driver called by testsuite.py. This part implements the legacy fldiff algorithm and uses the objects defined in yaml_tools to perform the YAML-based tests and produce the final report.

Interface with the PyYAML library

The interface with the PyYAML library is implemented in ~abinit/tests/pymods/yaml_tools/. The __init__.py module exposes the public API that consists in:

  • the yaml_parse function that parses documents from the output files
  • the Document class that gives an interface to an extracted document with its metadata.
  • register_tag.py defines tools to register tags. See below for further details.

YAML-based testing logic

The logic used to parse configuration files is defined in meta_conf_parser.py. It contains the creation of config trees, the logic to register constraints and parameters and the logic to apply constraints. The Constraint class hosts the code to build a constraint, to identify candidates to its application and to apply it. The identification is based on the type of the candidate which is compared to a reference type or as set of types.

How to extend the test suite

Main entry points

There are three main entry points to the system that are discussed in order of increasing complexity. The first one is the YAML configuration file. It does not require any Python knowledge, only a basic comprehension of the conventions used for writing tests. Being fully declarative (no logic) it should be quite easy to learn its usage from the available examples. The second one is the ~abinit/tests/pymods/yaml_tests/conf_parser.py file. It contains the declarations of the available constraints and parameters. A basic python understanding is required in order to modify this file. Comments and doc strings should help users to grasp the meaning of this file. More details available in this section.

The third one is the file ~abinit/tests/pymods/yaml_tests/structures/. It defines the structures used by the YAML parser when encountering a tag (starting with !), or in some cases when reaching a given pattern (undef for example). The structures directory is a package organised by features (ex: there is a file structures/ground_state.py). Each file define structures for a given feature. All files are imported by the main script structures/__init__.py. Even if the abstraction layer on top of the yaml module should help, it is better to have a good understanding of more “advanced” python concepts like inheritance, decorators, classmethod etc.

Tag registration and classes implicit methods and attributes

Basic tag registration tools

The register_tag.py module defines python decorators that are used to build python classes associated to Yaml documents.

yaml_map
Registers a structure based on a YAML mapping. The class must provide a method from_map that receives a dict object as argument and returns an instance of the class. from_map should be a class method, but might be a normal method if the constructor accepts zero arguments.
yaml_seq
Register a structure based on a YAML sequence/list. The class must provide a from_seq method that takes a list as input and returns an instance of the class.
yaml_scalar
Register a structure based on a YAML scalar/anything that does not fit in the two other types but can be a string. The class must provide a from_scalar method that takes a string in argument and returns an instance of the class. It is useful to have complex parsing. A practical case is the CSV table parsed by pandas.
yaml_implicit_scalar
Provide the possibility to have special parsing without explicit tags. The class it is applied to should meet the same requirements than for yaml_scalar but should also provide a class attribute named yaml_pattern. It can be either a string or a compile regex and it should match the expected structure of the scalar.
auto_map
Provide a basic general purpose interface for map objects: - dict-like interface inherited from BaseDictWrapper (get, __contains__, __getitem__, __setitem__, __delitem__, __iter__, keys and items) - a from_map method that accept any key from the dictionary and add them as attributes. - a __repr__ method that show all attributes
yaml_auto_map
equivalent to auto_map followed by yaml_map
yaml_not_available_tag
This is a normal function (not a decorator) that take a tag name and a message as arguments and will register the tag but will raise a warning each time the tag is used. An optional third argument fatal replace the warning by an error if set to True.

Implicit methods and attributes

Some class attributes and methods are used here and there in the tester logic when available. They are never required but can change some behaviour when provided.

_is_base_array Boolean class attribute
Assert that the object derived from BaseArray when set to True (isinstance is not reliable because of the manipulation of the sys.path)
_not_available Boolean class attribute
set to True in object returned when a tag is registered with yaml_not_available_tag to make all constraints fail with a more useful message.
is_dict_like Boolean class attribute
Assert that the object provide a full dict like interface when set to True. Used in BaseDictWrapper.
__iter__ method
Standard python implicit method for iterables. If an object has it the test driver will crawl its elements.
get_children method
Easy way to allow the test driver to crawl the children of the object. It does not take any argument and return a dictionary {child_name: child_object ...}.
has_no_child Boolean class attribute
Prevent the test driver to crawl the children even if it has an __iter__ method (not needed for strings).
short_str method
Used (if available) when the string representation of the object is too long for error messages. It should return a string representing the object and not too long. There is no constraints on the output length but one should keep things readable.

Constraints and parameters registration

Add a new parameter

To have the parser recognise a new token as a parameter, one should edit the ~abinit/tests/pymods/conf_parser.py.

The conf_parser variable in this file have a method parameter to register a new parameter. Arguments are the following:

  • token: mandatory, the name used in configuration for this parameter
  • default: optional (None), the default value used when the parameter is not available in the configuration.
  • value_type: optional (float), the expected type of the value found in the configuration.
  • inherited: optional (True), whether or not an explicit value in the configuration should be propagated to deeper levels.

Example:

conf_parser.parameter('tol_eq', default=1e-8, inherited=True)

Adding a constraint

To have the parser recognise a new token as a constraint, one should also edit ~abinit/tests/pymods/conf_parser.py.

conf_parser has a method constraint to register a new constraint. It is supposed to be used as a decorator (on a function) that takes keywords arguments. The arguments are all optional and are the following:

  • name: (str) the name to be used in config files. If not specified, the name of the function is used.
  • value_type: (float) the expected type of the value found in the configuration.
  • inherited: (True), whether or not the constraint should be propagated to deeper levels.
  • apply_to: ('number'), the type of data this constraint can be applied to. This can be a type or one of the special strings ('number', 'real', 'integer', 'complex', 'Array' (refer to numpy arrays), 'this') 'this' is used when the constraint should be applied to the structure where it is defined and not be inherited.
  • use_params: ([]) a list of names of parameters to be passed as argument to the test function.
  • exclude: (set()) a set of names of constraints that should not be applied when this one is.
  • handle_undef: (True) whether or not the special value undef should be handled before calling the test function. If True and a undef value is present in the data the test will fail or succeed depending on the value of the special parameter allow_undef. If False, undef values won’t be checked. They are equivalent to NaN.

The decorated function contains the actual test code. It should return True if the test succeed and either False or an instance of FailDetail (from common.py) if the test failed.

If the test is simple enough one should use False. However if the test is compound of several non-trivial checks FailDetail come in handy to tell the user which part failed. When you want to signal a failed test and explaining what happened return FailDetail('some explanations'). The message passed to FailDetail will be transmitted to the final report for the user.

Example with FailDetail:

@conf_parser.constraint(exclude={'ceil', 'tol_abs', 'tol_rel', 'ignore'})
def tol(tolv, ref, tested):
    '''
        Valid if both relative and absolute differences between the values
        are below the given tolerance.
    '''
    if abs(ref) + abs(tested) == 0.0:
        return True
    elif abs(ref - tested) / (abs(ref) + abs(tested)) >= tolv:
        return FailDetail('Relative error above tolerance.')
    elif abs(ref - tested) >= tolv:
        return FailDetail('Absolute error above tolerance.')
    else:
        return True

How to add a new tag

Pyyaml offer the possibility to directly convert YAML documents to a Python class using tags. To register a new tag, edit the file ~abinit/tests/pymods/yaml_tools/structures.py. In this file there are several classes that are decorated with one of @yaml_map, @yaml_scalar, @yaml_seq or @yaml_auto_map. These decorators are the functions that actually register the class as a known tag. Whether you should use one or another depend on the organization of the data in the YAML document. Is it a dictionary, a scalar or a list/sequence ?

In the majority of the cases, one wants the tester to browse the children of the structure and apply the relevant constraints on it and to access attributes through their original name from the data. In this case, the simpler way to register a new tag is to use yaml_auto_map. For example to register a tag Etot that simply register all fields from the data tree and let the tester check them with tol_abs or tol_rel we would put the following in ~abinit/tests/pymods/yaml_tools/structures/ground_state.py:

@yaml_auto_map
class Etot(object):
    pass

Now the name of the class “Etot” is associated with a tag recognized by the YAML parser.

yaml_auto_map does several things for us:

  • it gives the class a dict like interface by defining relevant methods, which allows the tester to browse the children
  • it registers the tag in the YAML parser
  • it automatically registers all attributes found in the data tree as attributes of the class instance. These attributes are accessible through the attribute syntax (ex: my_object.my_attribute_unit) with a normalized name (basically remove characters that cannot be in a python identifier like spaces and punctuation characters) or through the dictionary syntax with their original name (ex: my_object['my attribute (unit)'])

Sometimes one wants more control over the building of the class instance. This is what yaml_map is for. Let suppose we still want to register tag but we want to select only a subset of the components for example. We will use yaml_map to gives us control over the building of the instance. This is done by implementing the from_map class method (a class method is a method that is called from the class instead of being called from an instance). This method take in argument a dictionary built from the data tree by the YAML parser and should return an instance of our class.

@yaml_map
class EnergyTerms(object):
    def __init__(self, kin, hart, xc, ew):
        self.kinetic = kin
        self.hartree = hart
        self.xc = xc
        self.ewald = ew

    @classmethod
    def from_map(cls, d):
        # cls is the class (Etot here but it can be
        # something else if we subclass Etot)
        # d is a dictionary built from the data tree

        kin = d['kinetic']
        hart = d['hartree']
        xc = d['xc']
        ew = d['Ewald energy']
        return cls(kin, hart, xc, ew)

Now we fully control the building of the structure, however we lost the ability for the tester to browse the components to check them. If we want to only make our custom check it is fine. For example we can define a method check_components and use the callback constraint like this:

In ~abinit/tests/pymods/yaml_tools/structure/ground_state.py

@yaml_map
class EnergyTerms(object):
    # same code as the previous example

    def check_components(self, tested):
        # self will always be the reference object
        return (
            abs(self.kinetic - other.kinetic) < 1.0e-10
            and abs(self.hartree - other.hartree) < 1.0e-10
            and abs(self.xc - other.xc) < 1.0e-10
            and abs(self.ewald - other.ewald) < 1.0e-10
        )

In the configuration file

EnergyTerms:
    callback:
        method: check_components

However it can be better to give back the control to tester. For that purpose we can implement the method get_children that should return a dictionary of the data to be checked automatically. tester will detect it and use it to check what you gave him.

@yaml_map
class EnergyTerms(object):
    # same code as the previous example

    def get_children(self):
        return {
            'kin': self.kinetic,
            'hart': self.hartree,
            'xc': self.xc,
            'ew': self.ewald
        }

Now tester will be able to apply tol_abs and friends to the components we gave him.

If the class has a complete dict-like read interface (__iter__ yielding keys, __contains__, __getitem__, keys and items) then it can have a class attribute is_dict_like set to True and it will be treated as any other node (it not longer need get_children). yaml_auto_map registered classes automatically address these requirements.

yaml_seq is analogous to yaml_map however to_map became to_seq and the YAML source data have to match the YAML sequence structure that is either [a, b, c] or

- a
- b
- c

The argument passed to to_seq is a list. If one wants tester to browse the elements of the resulting object one can either implement a get_children method or implement the __iter__ python special method.

If for some reason a class have the __iter__ method implemented but one does not want tester to browse its children (BaseArray and its subclasses are such a case) one can defined the has_no_child attribute and set it to True. Then tester won’t try to browse it. Strings are a particular case. They have the __iter__ method but will never been browsed.

To associate a tag to anything else than a YAML mapping or sequence one can use yaml_scalar. This decorator expect the class to have a method from_scalar that takes the raw source as a string in argument and return an instance of the class. It can be used to create custom parsers of new number representation. For example to create a 3D vector with unit tag:

@yaml_scalar
class Vec3Unit(object):

    def __init__(self, x, y, z, unit):
        self.x, self.y, self.z = x, y, z
        self.unit = unit

    @classmethod
    def from_scalar(cls, raw):
        sx, sy, sz, unit, *_ = raw.split()  # split on blanks
        return cls(float(sx), float(sy), float(sz), unit)

With that new tag registered the YAML parser will happily parse something like

kpt: !Vec3Unit 0.5 0.5 0.5 Bohr^-1

Finally when the scalar have a easily detectable form one can create an implicit scalar. An implicit scalar have the advantage to be detected by the YAML parser without the need of writing the tag before as soon as it match a given regular expression. For example to parse directly complex numbers we could (naively) do the following:

@yaml_implicit_scalar
class YAMLComplex(complex):

    # A (quite rigid) regular expression matching complex numbers
    yaml_pattern = r'[+-]?\d+\.\d+) [+-] (\d+\.\d+)i'

    # this have nothing to do with yam_implicit_scalar, it is just the way to
    # create a subclass of a python *native* object
    @staticmethod
    def __new__(*args, **kwargs):
        return complex.__new__(*args, **kwargs)

    @classmethod
    def from_scalar(cls, scal):
        # few adjustment to match the python restrictions on complex number
        # writing
        return cls(scal.replace('i', 'j').replace(' ', ''))

Recommend conventions for Yaml documents

This section discusses the basic conventions that should be followed when writing YAML documents in Fortran. Note that these conventions are motivated by technical aspects that will facilitate the integration with the python language as well as the implementation of post-processing tools.

  • The tag name should use CamelCase so that one can directly map tag names to python classes that are usually given following this convention (see also PEP8).

  • The tag name should be self-explanatory.

  • Whenever possible, the keywords should be a valid python identifier. This means one should avoid white spaces as much possible and avoid names starting with non-alphabetic characters. White spaces should be replaced by underscores. Prefer lower-case names whenever possible.

  • By default, quantities are supposed to be given in atomic units. If the document contains quantities given in other units, we recommended to encode this information in the name using the syntax: foo_eV.

  • Avoid names starting with an underscore because these names are reserved for future additions in the python infrastructure.

  • tol_abs, tol_rel, tol_vec, tol_eq, ceil, ignore, equation, equations, callback and callbacks are reserved keywords that shall not be used in Yaml documents.

  • The comment field is optional but it is recommended especially when the purpose of the document is not obvious.

An example of well-formed document

--- !EnergyTerms
comment             : Components of total free energy (in Hartree)
kinetic_energy      :  5.279019930263079807E+00
hartree_energy      :  8.846303409910728499E-01
xc_energy           : -4.035286123400158687E+00
total_energy_eV     : -2.756386620520307815E+02
...

An example of Yaml document that does not follow our guidelines:

--- !ETOT
Kinetic energy      :  5.279019930263079807E+00
Hartree energy      :  8.846303409910728499E-01
_XC energy          : -4.035286123400158687E+00
Total energy(eV)    : -2.756386620520307815E+02
equation            : 1 + 1 = 3
...