Configuring the checks
```python
from textwrap import dedent
import check_datapackage as cdp
```
You can pass a Config object to check() to customise the checks done on your Data Package’s properties. The following configuration options are available:

- version: The version of the Data Package standard to check against. Defaults to v2.
- exclusions: A list of checks to exclude.
- custom_checks: A list of custom checks to run in addition to the checks defined in the standard.
- strict: Whether to include “SHOULD” checks in addition to “MUST” checks. Defaults to False.

The Data Package standard uses language from RFC 2119 to define its specifications. It uses “MUST” for required properties and “SHOULD” for properties that should be included but are not strictly required. We try to match this language in check-datapackage by using the terms “MUST” and “SHOULD”, though we also use “required” for “MUST” in our documentation.
Excluding checks
You can exclude checks based on their type and the fields they apply to.
The Data Package standard defines a range of check types (e.g., required or pattern), and it is also possible to create your own. For example, to exclude checks flagging missing fields, you would exclude the required check by defining an Exclusion object with this type:
```python
exclusion_required = cdp.Exclusion(type="required")
```
To exclude checks of a specific field or fields, you can use a JSON path in the jsonpath attribute of an Exclusion object. For example, you can exclude all checks on the name field of the Data Package properties by writing:
```python
exclusion_name = cdp.Exclusion(jsonpath="$.name")
```
Or you can use the wildcard JSON path selector to exclude checks on the path field of all Data Resource properties:
```python
exclusion_path = cdp.Exclusion(jsonpath="$.resources[*].path")
```
The type and jsonpath arguments can also be combined:
```python
exclusion_desc_required = cdp.Exclusion(type="required", jsonpath="$.resources[*].description")
```
This will exclude required checks on the description field of Data Resource properties.
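To make the path notation concrete, the sketch below shows which values these paths point to in a small properties dictionary. The select() helper is hypothetical: it supports only the subset of JSON path used on this page ($, .field, [*] and [n]) and is not how check-datapackage resolves paths internally. The second resource in the example data is made up so that the [*] wildcard has more than one item to match.

```python
import re
from typing import Any

# One path step: ".field", "[*]" or "[n]".
_STEP = re.compile(r"\.([A-Za-z_][\w-]*)|\[(\*|\d+)\]")


def select(data: Any, path: str) -> list[Any]:
    """Hypothetical helper: return every value matched by a simple JSON path.

    Supports only `$`, `.field`, `[*]` (all array items) and `[n]` (one item).
    """
    if not path.startswith("$"):
        raise ValueError(f"JSON path must start with '$': {path!r}")
    matches = [data]
    for field_name, index in _STEP.findall(path[1:]):
        next_matches = []
        for node in matches:
            if field_name:  # ".field": look the key up in an object
                if isinstance(node, dict) and field_name in node:
                    next_matches.append(node[field_name])
            elif index == "*":  # "[*]": every item of an array
                if isinstance(node, list):
                    next_matches.extend(node)
            else:  # "[n]": a single item of an array
                i = int(index)
                if isinstance(node, list) and i < len(node):
                    next_matches.append(node[i])
        matches = next_matches
    return matches


properties = {
    "name": "woolly-dormice",
    "resources": [
        {"name": "woolly-dormice-2015", "path": "resources/2015/data.parquet"},
        # Made-up second resource, only here to show the [*] wildcard.
        {"name": "woolly-dormice-2016", "path": "resources/2016/data.parquet"},
    ],
}
print(select(properties, "$.name"))  # ['woolly-dormice']
print(select(properties, "$.resources[*].path"))
# ['resources/2015/data.parquet', 'resources/2016/data.parquet']
```

Because the wildcard expands to every item in the list, an exclusion written with [*] covers all resources, however many the package has.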
To apply your exclusions when running check(), add them to the Config object passed to the check() function:
```python
package_properties = {
    "name": 123,
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    "id": "123-abc-123",
    "created": "2014-05-14T05:00:01+00:00",
    "version": "1.0.0",
    "licenses": [{"name": "odc-pddl"}],
    "resources": [
        {
            "name": "woolly-dormice-2015",
            "title": "Body fat percentage in the hibernating woolly dormouse",
            "path": "https://en.wikipedia.org/wiki/Woolly_dormouse",
        }
    ],
}
config = cdp.Config(exclusions=[exclusion_required, exclusion_name, exclusion_path])
cdp.check(properties=package_properties, config=config)  # returns []
```
Without any exclusions, we would expect four Issue items for these properties: the package name is a number, the required description field is missing from both the package and the resource properties, and the resource path doesn’t point to a data file. Because the exclusions above cover all of these, check() returns an empty list and flags no issues.
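The sketch below illustrates how exclusions like these could be matched against issues: an exclusion covers an issue when its type (if set) equals the issue’s type and its jsonpath (if set) matches the issue’s location, with [*] standing for any list index. ExclusionSketch, IssueSketch and the helper functions are hypothetical stand-ins, not check-datapackage classes, and the sketch assumes that an exclusion’s jsonpath must match the issue’s location exactly. The type names given to the name and path issues are illustrative guesses.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExclusionSketch:
    """Hypothetical stand-in for cdp.Exclusion; None means "match anything"."""
    type: Optional[str] = None
    jsonpath: Optional[str] = None


@dataclass
class IssueSketch:
    """Hypothetical stand-in for an issue: check type plus concrete location."""
    type: str
    jsonpath: str


def path_matches(pattern: str, concrete: str) -> bool:
    """True if `concrete` matches `pattern`, where "[*]" means any list index."""
    regex = re.escape(pattern).replace(r"\[\*\]", r"\[\d+\]")
    return re.fullmatch(regex, concrete) is not None


def is_excluded(issue: IssueSketch, exclusion: ExclusionSketch) -> bool:
    """Every attribute that is set on the exclusion must match the issue."""
    if exclusion.type is not None and exclusion.type != issue.type:
        return False
    if exclusion.jsonpath is not None and not path_matches(exclusion.jsonpath, issue.jsonpath):
        return False
    return True


def apply_exclusions(issues: list[IssueSketch], exclusions: list[ExclusionSketch]) -> list[IssueSketch]:
    """Keep only the issues that no exclusion covers."""
    return [issue for issue in issues if not any(is_excluded(issue, e) for e in exclusions)]


# The four issues described above (types "type" and "pattern" are guesses).
issues = [
    IssueSketch(type="type", jsonpath="$.name"),
    IssueSketch(type="required", jsonpath="$.description"),
    IssueSketch(type="required", jsonpath="$.resources[0].description"),
    IssueSketch(type="pattern", jsonpath="$.resources[0].path"),
]
exclusions = [
    ExclusionSketch(type="required"),
    ExclusionSketch(jsonpath="$.name"),
    ExclusionSketch(jsonpath="$.resources[*].path"),
]
print(apply_exclusions(issues, exclusions))  # []
```

With only the combined exclusion (type="required" and jsonpath="$.resources[*].description"), the sketch removes just the missing resource description and keeps the other three issues.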
Adding custom checks
It is possible to create custom checks in addition to the ones defined in the Data Package standard.
Let’s say your organisation only accepts Data Packages licensed under MIT. You can express this requirement in a CustomCheck as follows:
```python
license_check = cdp.CustomCheck(
    type="only-mit",
    jsonpath="$.licenses[*].name",
    message=dedent("""
        Data Packages may only be licensed under MIT. Please review
        the licenses listed in the Data Package.
        """),
    check=lambda license_name: license_name == "mit",
)
```
Here’s a breakdown of what each argument does:

- type: An identifier for your custom check. This is what will show up in error messages and what you will use in an Exclusion if you want to exclude your check. Each CustomCheck should have a unique type.
- jsonpath: The location of the field or fields the custom check applies to, expressed in JSON path notation. This check applies to the name field of all package licenses.
- message: The message that is shown when the check is violated.
- check: A function that expresses the custom check. It takes the value at the jsonpath location as input and returns True if the check is met and False if it isn’t.

To register your custom checks with the check() function, you add them to the Config object passed to the function:
```python
package_properties = {
    "name": "woolly-dormice",
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    "description": dedent("""
        This scoping review explores the hibernation physiology of the
        woolly dormouse, drawing on data collected over a 10-year period
        along the Taurus Mountain range in Turkey.
        """),
    "id": "123-abc-123",
    "created": "2014-05-14T05:00:01+00:00",
    "version": "1.0.0",
    "licenses": [{"name": "odc-pddl"}, {"name": "mit"}],
    "resources": [
        {
            "name": "woolly-dormice-2015",
            "title": "Body fat percentage in the hibernating woolly dormouse",
            "path": "resources/woolly-dormice-2015/data.parquet",
        }
    ],
}
config = cdp.Config(custom_checks=[license_check])
cdp.check(properties=package_properties, config=config)
```
We can see that the custom check was applied: check() returned one issue flagging the first license (odc-pddl) attached to the Data Package.
Strict mode
The Data Package standard specifies properties that a compliant Data Package “MUST” or “SHOULD” include, as well as formats that those properties “MUST” or “SHOULD” follow. By default, the check() function only includes “MUST” checks. To include “SHOULD” checks, set strict to True. For example, the name field of a Data Package “SHOULD NOT” contain special characters, so running check() in strict mode (strict=True) on the following properties would return an issue.
```python
package_properties = {
    "name": "Woolly Dormice (Toros Dağları)",
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    "description": dedent("""
        This scoping review explores the hibernation physiology of the
        woolly dormouse, drawing on data collected over a 10-year period
        along the Taurus Mountain range in Turkey.
        """),
    "id": "123-abc-123",
    "created": "2014-05-14T05:00:01+00:00",
    "version": "1.0.0",
    "licenses": [{"name": "odc-pddl"}],
    "resources": [
        {
            "name": "woolly-dormice-2015",
            "title": "Body fat percentage in the hibernating woolly dormouse",
            "path": "resources/woolly-dormice-2015/data.parquet",
        }
    ],
}
# strict is a Config option, so it is passed through the Config object.
config = cdp.Config(strict=True)
cdp.check(properties=package_properties, config=config)
```
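To illustrate the difference between the two levels, here is a hypothetical check of the name field in which the “MUST” part always runs and the “SHOULD” part only runs when strict is True. The check_name() function is not part of check-datapackage, and NAME_PATTERN only approximates the standard’s naming recommendation (lower-case letters, digits, ".", "-" and "_"); the exact rule check-datapackage applies may differ.

```python
import re
from typing import Any

# Approximation of the recommended package name format.
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def check_name(properties: dict[str, Any], strict: bool = False) -> list[str]:
    """Hypothetical two-level check of the `name` field."""
    issues: list[str] = []
    name = properties.get("name")
    if name is None:
        return issues
    # "MUST" check: always runs.
    if not isinstance(name, str):
        issues.append("MUST: name must be a string")
        return issues
    # "SHOULD" check: only runs in strict mode.
    if strict and NAME_PATTERN.fullmatch(name) is None:
        issues.append("SHOULD: name should only contain lower-case letters, digits, '.', '-' and '_'")
    return issues


props = {"name": "Woolly Dormice (Toros Dağları)"}
print(check_name(props))               # no issues: the SHOULD check is skipped
print(check_name(props, strict=True))  # one SHOULD issue
```

Keeping the two levels separate like this means a package can pass the default check while still being flagged in strict mode, which is the behaviour described for the properties above.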