Customizing the SOLARNET Schema

Overview

The SOLARNETSchema class provides an interface for configuring how metadata attributes are formatted in solar data products. It represents a schema that defines which metadata attributes are required, how they are validated, and how they are formatted.

Understanding the configuration options of SOLARNETSchema objects lets you control how metadata attributes behave in your data products.

The SOLARNETSchema class has two main properties:

  • attribute_schema: configures the metadata attributes.

  • default_attributes: provides default values for attributes when none are specified.
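To make the relationship between the two properties concrete, the sketch below models a loaded attribute_schema as a plain nested dictionary and collects every non-null default into a flat mapping. It is an illustration only: the helper name collect_defaults, the attribute names, and the assumption that default_attributes is derived from each entry's default field are hypothetical, not taken from the SOLARNETSchema source.

```python
from typing import Any, Dict


def collect_defaults(attribute_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Map each attribute name to its default, skipping null (None) defaults.

    Hypothetical helper illustrating the idea behind default_attributes;
    it is not the SOLARNETSchema implementation.
    """
    defaults: Dict[str, Any] = {}
    for name, info in attribute_schema.get("attribute_key", {}).items():
        # A YAML `null` default loads as None and is therefore skipped.
        if info.get("default") is not None:
            defaults[name] = info["default"]
    return defaults


# Hypothetical schema contents, shaped like a loaded schema YAML file.
example_schema = {
    "attribute_key": {
        "ATTR_WITH_DEFAULT": {"data_type": "str", "default": "value1", "required": "optional"},
        "ATTR_NO_DEFAULT": {"data_type": "int", "default": None, "required": "all"},
    },
    "conditional_requirements": [],
}

print(collect_defaults(example_schema))  # {'ATTR_WITH_DEFAULT': 'value1'}
```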

This guide details the format of the schema, how it’s used, and how you can extend or modify it to meet your specific needs.

The schema is loaded from YAML files (which parse into Python dictionaries), and several files can be combined to layer multiple schema elements into a single unified schema. Layering lets you extend or override the default schema, and create new schema configurations for specific file types and metadata requirements.

Creating a SOLARNET Schema

When creating a SOLARNETSchema object, you can optionally pass one or more paths to schema files to layer on top of one another, and choose whether to include the default base layer schema file.

Specify the schema layers with the schema_layers parameter as a list of file paths (List[pathlib.Path | str]), and control whether the default schema layer is included by setting the use_defaults parameter to True or False.

The following example instantiates a SOLARNETSchema object:

from solarnet_metadata.schema import SOLARNETSchema  # assumed import path

schema_layers = ["my_schema_layer_1.yaml", "my_schema_layer_2.yaml"]
my_schema = SOLARNETSchema(
    schema_layers=schema_layers,
    use_defaults=True,
)

This creates a schema object that starts from the default SOLARNET schema, then applies my_schema_layer_1.yaml, and finally my_schema_layer_2.yaml. Where the layers do not conflict, their attributes are merged into a superset of all layers. Where they do conflict, the later layer takes priority: if a key in my_schema_layer_1.yaml also appears in my_schema_layer_2.yaml, the value from my_schema_layer_2.yaml overwrites it in the resulting schema.
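The layering rule can be sketched with plain dictionaries. This is a minimal illustration, not the class's own code: layer_order, merge_layers, and DEFAULT_LAYER are hypothetical names, and the recursive merge is an assumption (the real class may replace whole attribute entries instead of merging them field by field). Lists, such as valid_values, are simply replaced by the later layer here.

```python
from pathlib import Path
from typing import Any, Dict, List, Union

# Hypothetical location of the default base layer, based on the path named in this guide.
DEFAULT_LAYER = Path("solarnet_metadata/data/SOLARNET_attr_schema.yaml")


def layer_order(schema_layers: List[Union[Path, str]], use_defaults: bool) -> List[Path]:
    """Return the layers in application order: default layer first (if used), then user layers."""
    layers = [Path(p) for p in schema_layers]
    return [DEFAULT_LAYER] + layers if use_defaults else layers


def merge_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge dict layers; on a conflicting key the later layer wins.

    Inputs are never modified, though unmerged nested values are shared rather than copied.
    """
    merged: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = value  # new key, or non-dict value: later layer wins
    return merged


layer_1 = {"attribute_key": {"CUSTOM_ATTR": {"data_type": "str", "required": "optional"}}}
layer_2 = {"attribute_key": {"CUSTOM_ATTR": {"required": "all"}, "OTHER_ATTR": {"data_type": "int"}}}

merged = merge_layers(layer_1, layer_2)
print(merged["attribute_key"]["CUSTOM_ATTR"])  # {'data_type': 'str', 'required': 'all'}
print(layer_order(["my_schema_layer_1.yaml", "my_schema_layer_2.yaml"], use_defaults=True))
```

CUSTOM_ATTR keeps data_type from the first layer while required is overwritten by the second, and OTHER_ATTR is added, giving the superset behavior described above.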

Attribute Schema Format

The SOLARNET attribute schema defines metadata requirements for solar data files. The schema is configured through YAML files, and the default configuration is in solarnet_metadata/data/SOLARNET_attr_schema.yaml.

The YAML file has two main sections:

  1. attribute_key: a dictionary of attribute information, keyed by the metadata attribute name.

  2. conditional_requirements: a list of rules defining attributes that are required only when specific conditions are met.

The general structure of a schema file is shown below, with placeholders in place of actual values:

attribute_key:
  attribute_name:
    data_type: <string> # one of ['int', 'float', 'str', 'date', 'bool']
    default: <Any> | null
    description: >
      Include a meaningful description of the attribute and context needed to understand its values.
    human_readable: <string> # Provides a default value for the keyword comment
    required: <string> # one of ['all', 'primary', 'obs', 'optional']
    valid_values: optional[list]
    pattern: optional[string]  # For attributes with variable indices (e.g., NAXISn, CTYPEia)
    origin: <string> # one of ['N', 'S', 'P', 'O'] (for more information, see Section 19)
conditional_requirements:
  - condition_type: <string>
    condition_key: <string>
    condition_value: optional[string]
    required_attributes: <list>
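The pattern key lets a single schema entry stand for a whole family of indexed keywords. The sketch below shows the idea with Python's re module. The two regular expressions and the match_schema_entry helper are hypothetical stand-ins written from the FITS conventions for NAXISn (n from 1 to 999) and CTYPEia (i from 1 to 99, optional alternate-axis letter A to Z); they are not the patterns stored in the default schema file.

```python
import re
from typing import Dict, Optional

# Hypothetical schema entries that use `pattern` for indexed keyword families.
PATTERNS: Dict[str, str] = {
    "NAXISn": r"NAXIS[1-9][0-9]{0,2}",     # n = 1..999, no leading zeros
    "CTYPEia": r"CTYPE[1-9][0-9]?[A-Z]?",  # i = 1..99, optional alternate letter a
}


def match_schema_entry(keyword: str) -> Optional[str]:
    """Return the schema entry whose pattern matches the whole keyword, or None."""
    for entry, pattern in PATTERNS.items():
        if re.fullmatch(pattern, keyword):
            return entry
    return None


for kw in ["NAXIS1", "NAXIS1000", "CTYPE2", "CTYPE3A", "BUNIT"]:
    print(kw, "->", match_schema_entry(kw))
```

re.fullmatch is used instead of re.match so that a keyword such as NAXIS1000 is rejected rather than matched on its NAXIS100 prefix.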

The keys of each attribute_key entry are defined in the table below:

Attribute Schema Keys

| Schema Key     | Description                                                                                   | Data Type     | Is Required? |
|----------------|-----------------------------------------------------------------------------------------------|---------------|--------------|
| attribute_name | the name of the metadata attribute as it should appear in your data products                  | str           | True         |
| data_type      | the expected data type of the attribute (int, float, str, date, bool)                         | str           | True         |
| default        | a default value for the attribute, if needed or desired                                       | varies or null | True        |
| description    | a description of the metadata attribute and the context needed to understand its values       | str           | True         |
| human_readable | a human-readable version of the attribute name; also the default keyword comment              | str           | True         |
| required       | whether the attribute is required in your data products (all, primary, obs, optional)         | str           | True         |
| valid_values   | values that the attribute should be checked against                                           | list or null  | False        |
| pattern        | regular expression pattern for attributes with variable indices (e.g., NAXISn, CTYPEia)       | str           | False        |
| origin         | the origin of the keyword attribute (N, S, P, or O); for more information, see Section 19, "Alphabetical listing of all keywords with section references" | str | True |
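The allowed values in the schema format and the required keys in the table above are enough to sanity-check a single attribute entry before adding it to a layer. The sketch below is a hypothetical helper (check_entry is not part of the package) that reports missing keys and out-of-range values; attribute_name is not checked because it is the mapping key itself.

```python
from typing import Any, Dict, List

# Allowed values transcribed from the schema format above.
ALLOWED_VALUES = {
    "data_type": {"int", "float", "str", "date", "bool"},
    "required": {"all", "primary", "obs", "optional"},
    "origin": {"N", "S", "P", "O"},
}
# Keys marked "Is Required? True" in the table (attribute_name is the mapping key itself).
REQUIRED_KEYS = {"data_type", "default", "description", "human_readable", "required", "origin"}


def check_entry(name: str, info: Dict[str, Any]) -> List[str]:
    """Return the problems found in one attribute_key entry; an empty list means none."""
    problems = [f"{name}: missing key '{key}'" for key in sorted(REQUIRED_KEYS - set(info))]
    for key, allowed in ALLOWED_VALUES.items():
        if key in info and info[key] not in allowed:
            problems.append(f"{name}: {key}={info[key]!r} is not one of {sorted(allowed)}")
    return problems


entry = {
    "data_type": "str",
    "default": None,
    "description": "A demonstration attribute.",
    "human_readable": "Demo Attribute",
    "required": "sometimes",  # deliberately invalid
}
for problem in check_entry("DEMO_ATTR", entry):
    print(problem)
```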

The conditional_requirements section defines attributes that are required only when specific conditions are met, such as another attribute having a particular value. This functionality is still under development and is planned to be expanded in future releases.

Conditional Requirements Schema

| Schema Key          | Description                                                           | Data Type   | Is Required? |
|---------------------|-----------------------------------------------------------------------|-------------|--------------|
| condition_type      | the type of condition that must be met (e.g., 'attribute_value')      | str         | True         |
| condition_key       | the attribute name that the condition requirement is based on         | str         | True         |
| condition_value     | the value that the condition requirement is based on                  | str or null | True         |
| required_attributes | a list of attribute names that are required if the condition is met   | list[str]   | True         |

For example, the schema includes conditional requirements based on observatory type:

- condition_type: 'attribute_value'
  condition_key: 'OBS_TYPE'
  condition_value: 'ground-based'
  required_attributes:
    - OBSGEO-X
    - OBSGEO-Y
    - OBSGEO-Z

This specifies that when OBS_TYPE is 'ground-based', the OBSGEO-X, OBSGEO-Y, and OBSGEO-Z attributes are required.
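To show how such a rule could be evaluated against a header, here is a minimal sketch. The function missing_conditional_attributes is hypothetical, and it handles only the 'attribute_value' condition type with a non-null condition_value; how the package treats other condition types or a null condition_value is not covered.

```python
from typing import Any, Dict, List


def missing_conditional_attributes(
    header: Dict[str, Any], conditional_requirements: List[Dict[str, Any]]
) -> List[str]:
    """List attributes required by a triggered condition that are absent from `header`."""
    missing: List[str] = []
    for rule in conditional_requirements:
        # Only 'attribute_value' rules with a non-null condition_value are handled;
        # other condition types and null values are skipped in this sketch.
        if rule.get("condition_type") != "attribute_value" or rule.get("condition_value") is None:
            continue
        if header.get(rule["condition_key"]) == rule["condition_value"]:
            missing.extend(attr for attr in rule["required_attributes"] if attr not in header)
    return missing


rules = [
    {
        "condition_type": "attribute_value",
        "condition_key": "OBS_TYPE",
        "condition_value": "ground-based",
        "required_attributes": ["OBSGEO-X", "OBSGEO-Y", "OBSGEO-Z"],
    }
]
print(missing_conditional_attributes({"OBS_TYPE": "ground-based", "OBSGEO-X": 1.0}, rules))
# ['OBSGEO-Y', 'OBSGEO-Z']
```

A header whose OBS_TYPE has any other value does not trigger the rule, so nothing is reported for it.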

Creating and Using Attribute Files

You can create your own schema files to extend or override the default schema. YAML anchors (&name) and aliases (*name) also let you define a block once and reuse it elsewhere in the same file.

# Example of custom schema extension
attribute_key:
  CUSTOM_ATTR:
    data_type: str
    default: value1
    description: A custom attribute for my specific application
    human_readable: Custom Attribute
    required: optional
    valid_values:
    - value1
    - value2
    - value3

conditional_requirements:
  - condition_type: "attribute_value"
    condition_key: "INST_TYP"
    condition_value: "Custom_Instrument"
    required_attributes: ["CUSTOM_ATTR"]

You can then load this custom schema along with the defaults:

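from pathlib import Path

from solarnet_metadata.schema import SOLARNETSchema  # assumed import path

custom_schema = Path("custom_schema.yaml")
schema = SOLARNETSchema(
    schema_layers=[custom_schema],
    use_defaults=True,
)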
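Once CUSTOM_ATTR is part of the schema, its valid_values list only has an effect if values are checked against it. The sketch below shows one way such a check could work; value_is_valid and the mapping from data_type names to Python types are assumptions made for illustration, not the package's own validator, and 'date' values are assumed to be ISO 8601 strings.

```python
from datetime import datetime
from typing import Any, Dict

# Assumed mapping from schema data_type names to Python types ('date' handled separately).
PY_TYPES = {"int": int, "float": float, "str": str, "bool": bool}


def value_is_valid(info: Dict[str, Any], value: Any) -> bool:
    """Check a value against an attribute entry's data_type and optional valid_values."""
    data_type = info["data_type"]
    if data_type == "date":
        # Assumption: dates are ISO 8601 strings such as "2024-05-01T12:00:00".
        try:
            datetime.fromisoformat(str(value))
        except ValueError:
            return False
    elif data_type == "int" and isinstance(value, bool):
        return False  # bool is a subclass of int in Python, so reject it explicitly
    elif not isinstance(value, PY_TYPES[data_type]):
        return False
    valid_values = info.get("valid_values")
    # A missing or null valid_values list accepts any value of the right type.
    return valid_values is None or value in valid_values


custom_attr = {"data_type": "str", "valid_values": ["value1", "value2", "value3"]}
print(value_is_valid(custom_attr, "value2"))        # True
print(value_is_valid(custom_attr, "custom value"))  # False
```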

For more information on YAML syntax, see the YAML specification at yaml.org.