Configuring the checks

You can pass a Config object to check() to customise the checks done on your Data Package’s properties. The following configuration options are available:

Important

The Data Package standard uses language from RFC 2119 to define its specifications. It uses “MUST” for required properties and “SHOULD” for properties that should be included but are not strictly required. We try to match this language in check-datapackage by using the terms “MUST” and “SHOULD”, though we also use “required” for “MUST” in our documentation.

Excluding checks

You can exclude checks based on their type and the fields they apply to.

The Data Package standard defines a range of check types (e.g., required or pattern) and it is also possible to create your own. For example, to exclude checks flagging missing fields, you would exclude the required check by defining an Exclusion object with this type:

from textwrap import dedent
import check_datapackage as cdp

exclusion_required = cdp.Exclusion(type="required")

To exclude checks of a specific field or fields, you can use a JSON path in the jsonpath attribute of an Exclusion object. For example, you can exclude all checks on the name field of the Data Package properties by writing:

exclusion_name = cdp.Exclusion(jsonpath="$.name")

Or you can use the wildcard JSON path selector to exclude checks on the path field of all Data Resource properties:

exclusion_path = cdp.Exclusion(jsonpath="$.resources[*].path")
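
To get a feel for which values these two JSON paths address, the selection can be mimicked with plain Python. This is only an illustration: the two-resource properties below are hypothetical, and the sketch uses neither check-datapackage nor a JSON path library.

```python
# Hypothetical properties with two resources, for illustration only.
properties = {
    "name": "woolly-dormice",
    "resources": [
        {"name": "woolly-dormice-2015", "path": "data/2015.parquet"},
        {"name": "woolly-dormice-2016", "path": "data/2016.parquet"},
    ],
}

# `$.name` addresses the single `name` field at the top level.
selected_name = properties["name"]

# `$.resources[*].path` addresses the `path` field of every resource;
# the `[*]` wildcard stands for "each item in the list".
selected_paths = [resource["path"] for resource in properties["resources"]]

print(selected_name)
print(selected_paths)
```

An exclusion on the wildcard path therefore covers every resource in the package, not only the first.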

The type and jsonpath arguments can also be combined:

exclusion_desc_required = cdp.Exclusion(type="required", jsonpath="$.resources[*].description")

This will exclude required checks on the description field of Data Resource properties.

To apply your exclusions, add them to the Config object passed to the check() function:

package_properties = {
    "name": 123,
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    "id": "123-abc-123",
    "created": "2014-05-14T05:00:01+00:00",
    "version": "1.0.0",
    "licenses": [{"name": "odc-pddl"}],
    "resources": [
        {
            "name": "woolly-dormice-2015",
            "title": "Body fat percentage in the hibernating woolly dormouse",
            "path": "https://en.wikipedia.org/wiki/Woolly_dormouse",
        }
    ],
}

config = cdp.Config(exclusions=[exclusion_required, exclusion_name, exclusion_path])
cdp.check(properties=package_properties, config=config)
[]

Without exclusions, we would expect four Issue items for the example above: the package name is a number, the required description field is missing in both the package and resource properties, and the resource path doesn’t point to a data file. However, as we have defined exclusions for all of these, the function flags no issues and returns an empty list.

Adding custom checks

It is possible to create custom checks in addition to the ones defined in the Data Package standard.

Let’s say your organisation only accepts Data Packages licensed under MIT. You can express this requirement in a CustomCheck as follows:

license_check = cdp.CustomCheck(
    type="only-mit",
    jsonpath="$.licenses[*].name",
    message=dedent("""
        Data Packages may only be licensed under MIT. Please review
        the licenses listed in the Data Package.
        """),
    check=lambda license_name: license_name == "mit",
)

Here’s a breakdown of what each argument does:

  • type: An identifier for your custom check. This is what will show up in error messages and what you will use if you want to exclude your check. Each CustomCheck should have a unique type.
  • jsonpath: The location of the field or fields the custom check applies to, expressed in JSON path notation. This check applies to the name field of all package licenses.
  • message: The message that is shown when the check is violated.
  • check: A function that expresses the custom check. It takes the value at the jsonpath location as input and returns True if the check is met and False if it isn’t.
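
Because the check argument is an ordinary function, its logic can be tried out on its own before it is wrapped in a CustomCheck. The sketch below is plain Python and does not call check-datapackage; the function name is_mit is hypothetical, and the input list stands in for the values that `$.licenses[*].name` would select from a package licensed under both odc-pddl and mit.

```python
def is_mit(license_name: str) -> bool:
    """Return True when the license name is exactly "mit"."""
    return license_name == "mit"


# Stand-in for the values selected by `$.licenses[*].name`.
license_names = ["odc-pddl", "mit"]

# Apply the check to each selected value, one at a time.
results = [is_mit(name) for name in license_names]
print(results)  # [False, True]
```

A named function like this does the same job as the lambda above and is easier to test and document.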

To register your custom checks with the check() function, you add them to the Config object passed to the function:

package_properties = {
    "name": "woolly-dormice",
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    "description": dedent("""
        This scoping review explores the hibernation physiology of the
        woolly dormouse, drawing on data collected over a 10-year period
        along the Taurus Mountain range in Turkey.
        """),
    "id": "123-abc-123",
    "created": "2014-05-14T05:00:01+00:00",
    "version": "1.0.0",
    "licenses": [{"name": "odc-pddl"}, {"name": "mit"}],
    "resources": [
        {
            "name": "woolly-dormice-2015",
            "title": "Body fat percentage in the hibernating woolly dormouse",
            "path": "resources/woolly-dormice-2015/data.parquet",
        }
    ],
}

config = cdp.Config(custom_checks=[license_check])
cdp.check(properties=package_properties, config=config)

The custom check is applied: check() returns one issue, flagging the first license attached to the Data Package (odc-pddl), which is not MIT.

Strict mode

The Data Package standard includes properties that “MUST” and “SHOULD” be included and/or have a specific format in a compliant Data Package. By default, the check() function only includes “MUST” checks. To include “SHOULD” checks, set the strict argument to True. For example, the name field of a Data Package “SHOULD” not contain special characters, so running check() in strict mode (strict=True) on the following properties would output an issue:

package_properties = {
    "name": "Woolly Dormice (Toros Dağları)",
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    "description": dedent("""
        This scoping review explores the hibernation physiology of the
        woolly dormouse, drawing on data collected over a 10-year period
        along the Taurus Mountain range in Turkey.
        """),
    "id": "123-abc-123",
    "created": "2014-05-14T05:00:01+00:00",
    "version": "1.0.0",
    "licenses": [{"name": "odc-pddl"}],
    "resources": [
        {
            "name": "woolly-dormice-2015",
            "title": "Body fat percentage in the hibernating woolly dormouse",
            "path": "resources/woolly-dormice-2015/data.parquet",
        }
    ],
}

cdp.check(properties=package_properties, strict=True)
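
As a rough illustration of why this name would be flagged, the recommendation can be approximated with a regular expression. The pattern below is an assumption based on the Data Package standard's recommendation that names consist of lowercase alphanumeric characters plus ".", "-" and "_". It is not taken from check-datapackage, whose actual rule may differ, and the helper name is hypothetical.

```python
import re

# Assumed pattern: lowercase letters, digits, ".", "-" and "_" only.
NAME_PATTERN = re.compile(r"^[a-z0-9._-]+$")


def follows_name_recommendation(name: str) -> bool:
    """Return True when the whole name matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


print(follows_name_recommendation("woolly-dormice"))  # True
print(follows_name_recommendation("Woolly Dormice (Toros Dağları)"))  # False
```

The second name fails on several counts: it contains uppercase letters, spaces, parentheses, and non-ASCII characters.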