WARNING! THIS PACKAGE IS IN ACTIVE DEVELOPMENT AND IS NOT YET STABLE!

PAL-flavoured Datatree#

The xarray Datatree is used as the core data structure for SwarmPAL. You can think of it as a file directory (a tree) containing an arbitrary number of related xarray datasets. Data can be fetched from different sources (including VirES) and stored in a Datatree.

PalDataItem provides tools to construct an xarray.Dataset from different sources (VirES, HAPI, etc.). create_paldata helps construct a Datatree from a set of those datasets.

import datetime as dt

Fetching data#

from swarmpal.io import create_paldata, PalDataItem

from VirES API#

# Set of options which are passed to viresclient
data_params = dict(
    collection="SW_OPER_MAGA_LR_1B",
    measurements=["B_NEC"],
    models=["IGRF"],
    start_time="2016-01-01T00:00:00",
    end_time="2016-01-01T03:00:00",
    # start_time=dt.datetime(2016, 1, 1),  # Can use ISO string or datetime
    # end_time=dt.datetime(2016, 1, 1, 3),
    server_url="https://vires.services/ows",
    options=dict(asynchronous=False, show_progress=False),
)
# create_paldata takes an arbitrary number of args & kwargs
# If using args, dataset names will be used as tree names
# If using kwargs, user specifies the tree name/path
data = create_paldata(PalDataItem.from_vires(**data_params))
print(data)
DataTree('paldata', parent=None)
└── DataTree('SW_OPER_MAGA_LR_1B')
        Dimensions:     (Timestamp: 10800, NEC: 3)
        Coordinates:
          * Timestamp   (Timestamp) datetime64[ns] 2016-01-01 ... 2016-01-01T02:59:59
          * NEC         (NEC) <U1 'N' 'E' 'C'
        Data variables:
            Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
            B_NEC_IGRF  (Timestamp, NEC) float64 -1.578e+03 -1.031e+04 ... -2.564e+04
            Radius      (Timestamp) float64 6.834e+06 6.834e+06 ... 6.833e+06 6.833e+06
            Latitude    (Timestamp) float64 -72.5 -72.56 -72.63 ... -44.9 -44.97 -45.03
            B_NEC       (Timestamp, NEC) float64 -1.581e+03 -1.049e+04 ... -2.564e+04
            Longitude   (Timestamp) float64 92.79 92.82 92.85 ... 41.83 41.83 41.83
        Attributes:
            Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
            MagneticModels:  ['IGRF = IGRF(max_degree=13,min_degree=1)']
            AppliedFilters:  []
            PAL_meta:        {"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T...
# Interactive view of the datatree
data
<xarray.DatasetView>
Dimensions:  ()
Data variables:
    *empty*
# Refer to a branch of the tree like:
data["SW_OPER_MAGA_LR_1B"]
<xarray.DatasetView>
Dimensions:     (Timestamp: 10800, NEC: 3)
Coordinates:
  * Timestamp   (Timestamp) datetime64[ns] 2016-01-01 ... 2016-01-01T02:59:59
  * NEC         (NEC) <U1 'N' 'E' 'C'
Data variables:
    Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    B_NEC_IGRF  (Timestamp, NEC) float64 -1.578e+03 -1.031e+04 ... -2.564e+04
    Radius      (Timestamp) float64 6.834e+06 6.834e+06 ... 6.833e+06 6.833e+06
    Latitude    (Timestamp) float64 -72.5 -72.56 -72.63 ... -44.9 -44.97 -45.03
    B_NEC       (Timestamp, NEC) float64 -1.581e+03 -1.049e+04 ... -2.564e+04
    Longitude   (Timestamp) float64 92.79 92.82 92.85 ... 41.83 41.83 41.83
Attributes:
    Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
    MagneticModels:  ['IGRF = IGRF(max_degree=13,min_degree=1)']
    AppliedFilters:  []
    PAL_meta:        {"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T...
# Note that the above is actually a Datatree object
# To get a view of the Dataset:
data["SW_OPER_MAGA_LR_1B"].ds
<xarray.DatasetView>
Dimensions:     (Timestamp: 10800, NEC: 3)
Coordinates:
  * Timestamp   (Timestamp) datetime64[ns] 2016-01-01 ... 2016-01-01T02:59:59
  * NEC         (NEC) <U1 'N' 'E' 'C'
Data variables:
    Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    B_NEC_IGRF  (Timestamp, NEC) float64 -1.578e+03 -1.031e+04 ... -2.564e+04
    Radius      (Timestamp) float64 6.834e+06 6.834e+06 ... 6.833e+06 6.833e+06
    Latitude    (Timestamp) float64 -72.5 -72.56 -72.63 ... -44.9 -44.97 -45.03
    B_NEC       (Timestamp, NEC) float64 -1.581e+03 -1.049e+04 ... -2.564e+04
    Longitude   (Timestamp) float64 92.79 92.82 92.85 ... 41.83 41.83 41.83
Attributes:
    Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
    MagneticModels:  ['IGRF = IGRF(max_degree=13,min_degree=1)']
    AppliedFilters:  []
    PAL_meta:        {"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T...

swarmpal accessor#

The behaviour of the datatree is extended by an “accessor” that exposes SwarmPAL functionality under the .swarmpal namespace, e.g.:

# Metadata related to the SwarmPAL framework
data.swarmpal.pal_meta
{'.': {},
 'SW_OPER_MAGA_LR_1B': {'analysis_window': ['2016-01-01T00:00:00',
   '2016-01-01T03:00:00'],
  'magnetic_models': {'IGRF': 'IGRF(max_degree=13,min_degree=1)'}}}
data.swarmpal.magnetic_model_name
'IGRF'

The above properties are constructed from metadata stored within the datatree itself (as a JSON string):

data["SW_OPER_MAGA_LR_1B"].attrs["PAL_meta"]
'{"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T03:00:00"], "magnetic_models": {"IGRF": "IGRF(max_degree=13,min_degree=1)"}}'
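Since "PAL_meta" is stored as a plain JSON string in the attributes, it can be decoded with the standard library. A minimal sketch, using the string shown above:

```python
import json

# The "PAL_meta" JSON string as it appears in the datatree attrs above
pal_meta_raw = (
    '{"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T03:00:00"], '
    '"magnetic_models": {"IGRF": "IGRF(max_degree=13,min_degree=1)"}}'
)
pal_meta = json.loads(pal_meta_raw)
print(pal_meta["analysis_window"])  # ['2016-01-01T00:00:00', '2016-01-01T03:00:00']
print(pal_meta["magnetic_models"])  # {'IGRF': 'IGRF(max_degree=13,min_degree=1)'}
```

This is, in effect, what the .swarmpal.pal_meta property shown above provides across the whole tree.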

It is possible to add more complex methods that work on the datasets:

data["SW_OPER_MAGA_LR_1B"].swarmpal.magnetic_residual()
<xarray.DataArray (Timestamp: 10800, NEC: 3)>
array([[  -3.50616551, -184.14602906,  -75.42492058],
       [  -4.01971892, -185.35397795,  -74.77706063],
       [  -3.76127931, -185.80363109,  -74.31929219],
       ...,
       [ -12.14993446,    6.86291637,    0.72439786],
       [ -12.1937803 ,    6.89781389,    1.02280304],
       [ -12.21376055,    6.82121016,    1.26699041]])
Coordinates:
  * Timestamp  (Timestamp) datetime64[ns] 2016-01-01 ... 2016-01-01T02:59:59
  * NEC        (NEC) <U1 'N' 'E' 'C'
Attributes:
    units:        nT
    description:  Magnetic field vector data-model residual, NEC frame

Defining and running a PalProcess#

Processes that act on datatrees obtained as above are defined by subclassing the abstract PalProcess class.

from swarmpal.io import PalProcess
help(PalProcess)
Help on class PalProcess in module swarmpal.io._paldata:

class PalProcess(abc.ABC)
 |  PalProcess(config: 'dict | None' = None, active_tree: 'str' = '/', inplace: 'bool' = True)
 |  
 |  Abstract class to define processes to act on datatrees
 |  
 |  Method resolution order:
 |      PalProcess
 |      abc.ABC
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __call__(self, datatree) -> 'DataTree'
 |      Run the process, defined in _call, to update the datatree
 |  
 |  __init__(self, config: 'dict | None' = None, active_tree: 'str' = '/', inplace: 'bool' = True)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  set_config(self, **kwargs) -> 'None'
 |  
 |  ----------------------------------------------------------------------
 |  Readonly properties defined here:
 |  
 |  active_tree
 |      Defines which branch of the datatree will be used
 |  
 |  process_name
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  config
 |      Dictionary that configures the process behaviour
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __abstractmethods__ = frozenset({'_call', 'process_name', 'set_config'...

Here is an example of defining a process. Still subject to change!

Three abstract methods must be implemented:

  • process_name identifies the process, and is used to update the "PAL_meta" attribute in the datatree when the process is applied.

  • set_config takes keyword arguments and stores them as a dict in the config property.

  • _call defines the behaviour of the process itself; it should accept the input datatree and return a modified datatree.

When a process object is instantiated, the user optionally provides two arguments which are set as properties of the process:

  • active_tree (str) selects which branch of the tree is to be used

  • config (dict) provides parameters to control the behaviour of the process

The config can also be provided using .set_config() after the process object is created. This lets the process provide and document default configurations, as well as allowing the IDE to hint at the available configuration options.
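The two configuration routes can be sketched with a stand-alone toy class (MyDemoProcess is hypothetical and does no real work; actual processes subclass PalProcess):

```python
class MyDemoProcess:
    """Toy stand-in illustrating the two configuration routes"""

    def __init__(self, config=None):
        self.config = config or {}

    def set_config(self, dataset="SW_OPER_MAGA_LR_1B", parameter="B_NEC"):
        # Defaults in the signature document the configuration
        # and give the IDE something to hint with
        self.config = dict(dataset=dataset, parameter=parameter)


# Route 1: pass the config dict at instantiation
p1 = MyDemoProcess(config={"dataset": "SW_OPER_MAGA_LR_1B", "parameter": "B_NEC"})
# Route 2: call set_config afterwards, relying on the documented defaults
p2 = MyDemoProcess()
p2.set_config()
assert p1.config == p2.config
```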

from datatree import DataTree
from xarray import Dataset


class MyProcess(PalProcess):
    """Compute the first differences on a given variable"""

    @property
    def process_name(self):
        return "MyProcess"

    def set_config(self, dataset="SW_OPER_MAGA_LR_1B", parameter="B_NEC"):
        self.config = dict(dataset=dataset, parameter=parameter)

    def _call(self, datatree):
        # Identify inputs for algorithm
        subtree = datatree[f"{self.config.get('dataset')}"]
        dataset = subtree.ds
        parameter = self.config.get("parameter")
        # Apply the algorithm
        output_data = dataset[parameter].diff(dim="Timestamp")
        # Create an output dataset
        data_out = Dataset(
            data_vars={
                f"d/dt ({parameter})": output_data,
            }
        )
        # Write the output into a new path in the datatree and return it
        subtree["output"] = DataTree(data=data_out)
        return datatree

The process can now be created with some configuration:

process = MyProcess(
    config={"dataset": "SW_OPER_MAGA_LR_1B", "parameter": "B_NEC"},
)

…and there is a tool to apply this process to the datatree:

data = data.swarmpal.apply(process)
print(data)
DataTree('paldata', parent=None)
│   Dimensions:  ()
│   Data variables:
│       *empty*
│   Attributes:
│       PAL_meta:  {"MyProcess": {"dataset": "SW_OPER_MAGA_LR_1B", "parameter": "...
└── DataTree('SW_OPER_MAGA_LR_1B')
    │   Dimensions:     (Timestamp: 10800, NEC: 3)
    │   Coordinates:
    │     * Timestamp   (Timestamp) datetime64[ns] 2016-01-01 ... 2016-01-01T02:59:59
    │     * NEC         (NEC) <U1 'N' 'E' 'C'
    │   Data variables:
    │       Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    │       B_NEC_IGRF  (Timestamp, NEC) float64 -1.578e+03 -1.031e+04 ... -2.564e+04
    │       Radius      (Timestamp) float64 6.834e+06 6.834e+06 ... 6.833e+06 6.833e+06
    │       Latitude    (Timestamp) float64 -72.5 -72.56 -72.63 ... -44.9 -44.97 -45.03
    │       B_NEC       (Timestamp, NEC) float64 -1.581e+03 -1.049e+04 ... -2.564e+04
    │       Longitude   (Timestamp) float64 92.79 92.82 92.85 ... 41.83 41.83 41.83
    │   Attributes:
    │       Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
    │       MagneticModels:  ['IGRF = IGRF(max_degree=13,min_degree=1)']
    │       AppliedFilters:  []
    │       PAL_meta:        {"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T...
    └── DataTree('output')
            Dimensions:       (Timestamp: 10799, NEC: 3)
            Coordinates:
              * Timestamp     (Timestamp) datetime64[ns] 2016-01-01T00:00:01 ... 2016-01-...
              * NEC           (NEC) <U1 'N' 'E' 'C'
            Data variables:
                d/dt (B_NEC)  (Timestamp, NEC) float64 -26.66 0.796 3.476 ... -10.19 -12.09

The resulting data can be interrogated with the usual tools (in this case we added a new dataset to the tree under "SW_OPER_MAGA_LR_1B/output"):

data["SW_OPER_MAGA_LR_1B/output"].ds["d/dt (B_NEC)"].plot.line(x="Timestamp");
[Figure: line plot of the three d/dt (B_NEC) components against Timestamp]

… and the datatree carries with it the metadata about the process which has been applied:

data.swarmpal.pal_meta
{'.': {'MyProcess': {'dataset': 'SW_OPER_MAGA_LR_1B', 'parameter': 'B_NEC'}},
 'SW_OPER_MAGA_LR_1B': {'analysis_window': ['2016-01-01T00:00:00',
   '2016-01-01T03:00:00'],
  'magnetic_models': {'IGRF': 'IGRF(max_degree=13,min_degree=1)'}},
 'SW_OPER_MAGA_LR_1B/output': {}}

More tricks with create_paldata#

Fetching data from HAPI#

Two differences from using VirES:

  • Parameters follow the scheme in hapiclient
    Example: http://hapi-server.org/servers/#server=VirES-for-Swarm&dataset=SW_OPER_MAGA_LR_1B&parameters=B_NEC&start=2016-01-01T00:00:00&stop=2016-01-01T03:00:00&return=script&format=python

  • The output dataset is not identical to that retrieved from VirES (the variables and their contents are the same, but there is less metadata, etc.)

data_params = dict(
    server="https://vires.services/hapi",
    dataset="SW_OPER_MAGA_LR_1B",
    parameters="B_NEC",
    start="2016-01-01T00:00:00",
    stop="2016-01-01T03:00:00",
)
data_hapi = create_paldata(alpha_hapi=PalDataItem.from_hapi(**data_params))
print(data_hapi)
DataTree('paldata', parent=None)
└── DataTree('alpha_hapi')
        Dimensions:    (Timestamp: 10800, B_NEC_dim1: 3)
        Coordinates:
          * Timestamp  (Timestamp) datetime64[ns] 2016-01-01 ... 2016-01-01T02:59:59
        Dimensions without coordinates: B_NEC_dim1
        Data variables:
            B_NEC      (Timestamp, B_NEC_dim1) float64 -1.581e+03 ... -2.564e+04
        Attributes:
            PAL_meta:  {"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T03:00:...

Time padding#

A tuple of timedelta objects can be given as the extra pad_times parameter. This extends the retrieved time interval, while the original time interval is stored as "analysis_window" within the "PAL_meta" attribute.
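The effect of the padding on the fetch window can be sketched with plain datetime arithmetic (assuming pad_times simply widens the interval on each side, consistent with the output below):

```python
import datetime as dt

start = dt.datetime(2016, 1, 1, 0, 0, 0)
stop = dt.datetime(2016, 1, 1, 3, 0, 0)
pad_before, pad_after = dt.timedelta(hours=1), dt.timedelta(hours=1)

# The analysis window (start, stop) is preserved in "PAL_meta",
# while the data are fetched over the wider interval:
fetch_start = start - pad_before
fetch_stop = stop + pad_after
print(fetch_start, fetch_stop)  # 2015-12-31 23:00:00 2016-01-01 04:00:00
# 5 hours of 1 Hz data -> 18000 samples
```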

data_params = dict(
    server="https://vires.services/hapi",
    dataset="SW_OPER_MAGA_LR_1B",
    parameters="B_NEC",
    start="2016-01-01T00:00:00",
    stop="2016-01-01T03:00:00",
    pad_times=(dt.timedelta(hours=1), dt.timedelta(hours=1)),
)
data_hapi = create_paldata(alpha_hapi=PalDataItem.from_hapi(**data_params))
print(data_hapi)
DataTree('paldata', parent=None)
└── DataTree('alpha_hapi')
        Dimensions:    (Timestamp: 18000, B_NEC_dim1: 3)
        Coordinates:
          * Timestamp  (Timestamp) datetime64[ns] 2015-12-31T23:00:00 ... 2016-01-01T...
        Dimensions without coordinates: B_NEC_dim1
        Data variables:
            B_NEC      (Timestamp, B_NEC_dim1) float64 2.08e+04 -2.121e+03 ... 4.618e+04
        Attributes:
            PAL_meta:  {"analysis_window": ["2016-01-01T00:00:00", "2016-01-01T03:00:...