Interface

This section of the documentation focuses on the design of the interface and how it is implemented as Python code.

Note

We use symbols to indicate whether a part of the design is done, actively being worked on, or planned (see the table below). For work that is planned or in progress, we include the function signatures and docstrings to clarify the design. Once a part of the interface is implemented (done), we remove its signature from this documentation and point to the reference documentation instead.

A table showing the symbols used to indicate the status of interface components, along with their descriptions.

Status         Description
Done           Interface that has been implemented.
In progress    Interface that is currently being worked on.
Planned        Interface that is planned, but isn’t being worked on currently.

Inputs

This section describes the inputs accepted by check-datapackage.

  • Properties: This is a Python dictionary containing the properties of a Data Package. Allowed properties are defined in the Data Package standard. A common use case is loading the properties from the datapackage.json file.
  • Config: A Config object can optionally be passed to check-datapackage with settings to modify the behaviour and output of the check mechanism.
  • Error: A boolean that controls whether the function errors out or returns a value if any issues are found.

Outputs

For each failed check, check-datapackage flags the corresponding issue in the properties. If all checks pass, an empty list is returned.

  • Default mode: The issues are returned as a list without stopping program execution.
  • Error mode: The issues result in errors that stop program execution.
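The difference between the two modes can be sketched as follows. This is a minimal, self-contained illustration rather than the package's implementation: the `Issue` fields, the `DataPackageError` exception, and the single toy check (a missing `resources` property) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for the package's `Issue` class."""

    jsonpath: str
    type: str
    message: str


class DataPackageError(Exception):
    """Hypothetical exception raised in error mode."""


def check(properties: dict, error: bool = False) -> list[Issue]:
    """Toy stand-in for `check()` that runs one illustrative check."""
    issues = []
    if "resources" not in properties:
        issues.append(
            Issue(
                jsonpath="$.resources",
                type="required",
                message="'resources' is a required property.",
            )
        )
    # Error mode: stop execution as soon as any issue has been found.
    if error and issues:
        raise DataPackageError(f"{len(issues)} issue(s) found.")
    # Default mode: hand the issues back to the caller.
    return issues


properties = {"name": "woolly-dormice"}

# Default mode returns the issues as a list.
issues = check(properties)

# Error mode raises instead of returning.
try:
    check(properties, error=True)
except DataPackageError as exc:
    error_message = str(exc)
```

In default mode the caller decides what to do with `issues`, while in error mode the exception has to be caught or it ends the program.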

Functions

There are two main functions and two helper functions that we expose in this package. Together, these functions could be combined to create a command-line interface.

check()

This is the main function of this package. It checks a Data Package's properties against the Data Package standard, applying any exclusions or additional checks listed in the configuration.

See the help documentation with help(check) for more details.

The Config class is described below. By default, check() does not result in an error when issues are found, but errors can be triggered by setting error=True. We don’t want check-datapackage to trigger errors by default because we want to allow users the flexibility to decide when and how failed checks should be enforced or handled. The output of this function is a list of Issue objects, which are described below.

explain()

The output of check() is a list of Issue objects, which are structured and machine-readable, but not very human-readable or user-friendly. It’s important to have this structured information about the issues, but we also want to provide a way to explain them in a more pleasant and ergonomic way.

def explain(issues: list[Issue]) -> list[str]:
    """Explain a list of issues in a user-friendly way.

    Args:
        issues: A list of `Issue` objects representing issues found while
            checking a Data Package's properties.

    Returns:
        A list of user-friendly, human-readable messages
            explaining each issue.
    """
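
One way the signature above could be filled in is sketched below. The `Issue` fields (`jsonpath`, `type`, `message`) and the message format are assumptions for illustration; the final design may format explanations quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in; the real `Issue` fields may differ."""

    jsonpath: str
    type: str
    message: str


def explain(issues: list[Issue]) -> list[str]:
    """Turn each issue into a one-line, human-readable message."""
    return [
        f"Issue at `{issue.jsonpath}` ({issue.type}): {issue.message}"
        for issue in issues
    ]


messages = explain(
    [Issue("$.resources", "required", "'resources' is a required property.")]
)
```

Because the function returns one message per issue, an empty list of issues gives an empty list of messages.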

read_json()

This is a simple helper function to read the datapackage.json file into a Python dictionary. See help(read_json) for more details.
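
As a rough idea of what such a helper involves, here is a sketch built on the standard library. The signature and the top-level type check are assumptions, not the package's actual code.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def read_json(path: Path) -> dict[str, Any]:
    """Read a JSON file, such as `datapackage.json`, into a dictionary."""
    properties = json.loads(path.read_text(encoding="utf-8"))
    # A Data Package descriptor is a JSON object at the top level.
    if not isinstance(properties, dict):
        raise TypeError(f"Expected a JSON object in {path}.")
    return properties


# Usage: write a small descriptor to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as folder:
    descriptor_path = Path(folder) / "datapackage.json"
    descriptor_path.write_text(
        json.dumps({"name": "woolly-dormice", "resources": []}),
        encoding="utf-8",
    )
    properties = read_json(descriptor_path)
```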

read_config()

This is a simple helper function to read a configuration file into a Config object. It will be needed once we support storing the configuration in a file.

def read_config(path: Path) -> Config:
    """Reads a configuration file into a `Config` object.

    Args:
        path: The path to the configuration file.

    Returns:
        A `Config` object populated with the contents of the configuration
            file.
    """

Classes

With all the configurations kept in one class, we can potentially store the configuration in a file that can be read into the class, which would be useful for a CLI.

Config

Config is a class that holds all the configurations for the check() function.

See the help documentation with help(Config) for more details.

Exclusion

A sub-item of Config that expresses checks to exclude. This can be useful if you want to exclude (or skip) certain checks from the Data Package standard that are not relevant to your use case.

See the help documentation with help(Exclusion) for more details.

Extensions

This sub-item of Config defines extensions, i.e., additional checks that supplement those specified by the Data Package standard.

See the help documentation with help(Extensions) for more details.

RequiredCheck

A sub-item of Extensions that allows users to set specific properties as required that are not required by the Data Package standard. See the help documentation with help(RequiredCheck) for more details.

CustomCheck

A sub-item of Extensions that allows users to add an additional, custom check that check-datapackage will run alongside the standard checks. See the help documentation with help(CustomCheck) for more details.

Issue

This class represents an issue that is found when checking a Data Package.

See the help documentation with help(Issue) for more details.

Configuration file

When we develop the CLI, we’ll use a config file to store the settings contained within the Config class. This file will be named .cdp.toml and will be located in the same directory as the datapackage.json file. This is an example of what that file could look like:

# The Data Package standard version to check against.
version = "v2"

# Whether to check properties that must *and should* be included.
strict = true

# Exclude all issues related to the "resources" property.
[[exclusions]]
jsonpath = "$.resources"

# Exclude all issues related to the "format" type in the schema.
[[exclusions]]
type = "format"

# Exclude issues that are both a "pattern" type and found in
# the "path" property of the "contributors" field.
[[exclusions]]
jsonpath = "$.contributors[*].path"
type = "pattern"

# Require that the "description" property is included in the Data Package.
[[extensions.required_checks]]
jsonpath = "$.description"
message = "This Data Package needs to include a 'description' property."

# A custom check to ensure that all resource names are lowercase.
[[extensions.custom_checks]]
jsonpath = "$.resources[*].name"
type = "name-lowercase"
message = "The value in the 'name' property of the 'resources' must be lowercase."
check = "lambda name: name.islower()"

Flow

This is the potential flow of using check-datapackage:

flowchart TD
    descriptor_file[(datapackage.json)]
    read_json["read_json()"]
    properties[/"Properties<br>(dict)"/]

    config_file[(.cdp.toml)]
    read_config["read_config()"]

    config[/Config/]
    custom_check[/CustomCheck/]
    exclusion[/Exclusion/]
    check["check()"]
    issues[/"list[Issue]"/]

    explain["explain()"]
    messages[/messages/]

    descriptor_file --> read_json --> properties
    config_file --> read_config --> config
    custom_check & exclusion --> config
    custom_check & exclusion -.-> config_file

    properties & config --> check --> issues --> explain --> messages
Figure 1: Flow of functions and classes when using check-datapackage.