WARNING! THIS PACKAGE IS IN ACTIVE DEVELOPMENT AND IS NOT YET STABLE!

Quickstart#

There are two main things to understand in SwarmPAL: data and processes. Data live within an xarray DataTree (of type DataTree), and processes are callable objects (of type PalProcess), meaning they behave like functions. Processes act on data to transform them, adding derived parameters into the data object.
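
This data/process pattern can be sketched in plain Python. The MyProcess class below is a hypothetical illustration of the idea, not the actual PalProcess implementation:

```python
# Sketch of the "callable process" pattern: a process object holds
# configuration and is applied to data by calling it like a function.
class MyProcess:
    def set_config(self, **config):
        self.config = config

    def __call__(self, data):
        # Transform the data: add a derived parameter alongside the input.
        data["derived"] = data["raw"] * self.config.get("scale", 1)
        return data

process = MyProcess()
process.set_config(scale=2)
data = process({"raw": 10})
print(data["derived"])  # 20
```

Real SwarmPAL processes follow the same shape: configure with .set_config(), then apply the process to a datatree by calling it.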

Fetching data#

Data are pulled in over the web and organised into a DataTree, using create_paldata and PalDataItem:

from swarmpal.io import create_paldata, PalDataItem

data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    )
)
print(data)
DataTree('paldata', parent=None)
└── DataTree('SW_OPER_MAGA_LR_1B')
        Dimensions:     (Timestamp: 10800, NEC: 3)
        Coordinates:
          * Timestamp   (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
          * NEC         (NEC) <U1 'N' 'E' 'C'
        Data variables:
            Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
            B_NEC       (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
            Longitude   (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
            Latitude    (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
            Radius      (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
        Attributes:
            Sources:         ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
            MagneticModels:  []
            AppliedFilters:  []
            PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...

Now you can skip ahead to Applying Processes, or read on to learn more about data…

Swarm data are fetched from the VirES service; SwarmPAL uses the Python package viresclient underneath to transfer and load the data. Similarly, data can be fetched from any HAPI server, with hapiclient used underneath.

create_paldata and PalDataItem have a few features for flexible use:

  • Pass multiple items to create_paldata to assemble a complex datatree. Pass them as keyword arguments (e.g. HAPI_SW_OPER_MAGA_LR_1B=... below) if you want to manually change the name in the datatree, otherwise they will default to the collection/dataset name.

  • Use .from_vires() and .from_hapi() to fetch data from different services. Note that the argument names and usage are a bit different (though equivalent) in each case. These follow the nomenclature used in viresclient and hapiclient respectively.

data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    ),
    HAPI_SW_OPER_MAGA_LR_1B=PalDataItem.from_hapi(
        server="https://vires.services/hapi",
        dataset="SW_OPER_MAGA_LR_1B",
        parameters="Latitude,Longitude,Radius,B_NEC",
        start="2020-01-01T00:00:00",
        stop="2020-01-01T03:00:00",
    ),
)
print(data)
DataTree('paldata', parent=None)
├── DataTree('SW_OPER_MAGA_LR_1B')
│       Dimensions:     (Timestamp: 10800, NEC: 3)
│       Coordinates:
│         * Timestamp   (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
│         * NEC         (NEC) <U1 'N' 'E' 'C'
│       Data variables:
│           Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
│           B_NEC       (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
│           Longitude   (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
│           Latitude    (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
│           Radius      (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
│       Attributes:
│           Sources:         ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
│           MagneticModels:  []
│           AppliedFilters:  []
│           PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
└── DataTree('HAPI_SW_OPER_MAGA_LR_1B')
        Dimensions:    (Timestamp: 10800, B_NEC_dim1: 3)
        Coordinates:
          * Timestamp  (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
        Dimensions without coordinates: B_NEC_dim1
        Data variables:
            Latitude   (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
            Longitude  (Timestamp) float64 103.0 103.1 103.1 103.2 ... 49.91 49.91 49.91
            Radius     (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
            B_NEC      (Timestamp, B_NEC_dim1) float64 3.137e+03 388.2 ... 4.077e+04
        Attributes:
            PAL_meta:  {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T03:00:...

While you can learn more about using datatrees in the xarray documentation, this should not be necessary for basic usage of SwarmPAL. If you are familiar with xarray, you can access a dataset by browsing the datatree like a dictionary, then use either the .ds accessor to get an immutable view of the dataset, or .to_dataset() to extract a mutable copy.

data["SW_OPER_MAGA_LR_1B"].ds
<xarray.DatasetView>
Dimensions:     (Timestamp: 10800, NEC: 3)
Coordinates:
  * Timestamp   (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
  * NEC         (NEC) <U1 'N' 'E' 'C'
Data variables:
    Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    B_NEC       (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
    Longitude   (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
    Latitude    (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
    Radius      (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
Attributes:
    Sources:         ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
    MagneticModels:  []
    AppliedFilters:  []
    PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
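
The distinction between the two accessors can be sketched with a plain-Python analogy (illustrative only; this is not how xarray implements them): .ds behaves like a read-only view onto the node's variables, while .to_dataset() hands you an independent copy you may modify.

```python
from types import MappingProxyType

# A node's variables, and two ways of handing them out:
variables = {"B_NEC": [3137, 388]}

view = MappingProxyType(variables)  # immutable view, like .ds
copy = dict(variables)              # independent mutable copy, like .to_dataset()

copy["B_NEC_scaled"] = [x * 2 for x in copy["B_NEC"]]  # fine on the copy
try:
    view["B_NEC_scaled"] = []  # rejected on the view
except TypeError as exc:
    print("view is read-only:", exc)
```

Use .ds for inspection, and .to_dataset() when you want to modify the data without affecting the datatree.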

Using the VirES API, additional things can be requested beyond the original dataset (models and auxiliaries). See the viresclient documentation for details, or Swarm Notebooks for more examples. The options argument specifies an extendable dictionary of special options which are passed to viresclient. In this case we specify asynchronous=False to process the request synchronously (faster, but will fail for longer requests), and disable the progress bars with show_progress=False.

data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        models=["IGRF"],
        auxiliaries=["QDLat", "MLT"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    )
)
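
With a model requested, the model prediction appears in the dataset as a variable such as B_NEC_IGRF (as in the output shown later), so a residual field can be formed by subtraction. With the real datatree this would be a one-line xarray operation; the sketch below uses synthetic stand-in numbers so it is self-contained:

```python
# Sketch: forming a residual field B_NEC - B_NEC_IGRF per sample.
# With the fetched datatree this would be, e.g.:
#   ds = data["SW_OPER_MAGA_LR_1B"].ds
#   residual = ds["B_NEC"] - ds["B_NEC_IGRF"]
b_nec = [[3137.0, 388.2, 45000.0], [3116.4, 390.0, 44990.0]]   # measured
b_igrf = [[3139.0, 404.4, 44980.0], [3120.0, 400.0, 44970.0]]  # model
residual = [
    [m - p for m, p in zip(row_m, row_p)]
    for row_m, row_p in zip(b_nec, b_igrf)
]
print([round(x, 1) for x in residual[0]])  # [-2.0, -16.2, 20.0]
```

Note that the TFA toolbox can do this for you: setting remove_model=True together with active_variable="B_NEC_res_Model" in Preprocess (described below) removes the model prediction as part of the process.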

Applying Processes#

A process is a special object type you can import from different toolboxes in SwarmPAL.

First we import the relevant toolbox and create a process from the .processes submodule:

from swarmpal.toolboxes import tfa

process = tfa.processes.Preprocess()

Each process has a .set_config() method which configures the behaviour of the process:

help(process.set_config)
Help on method set_config in module swarmpal.toolboxes.tfa.processes:

set_config(dataset: 'str' = '', timevar: 'str' = 'Timestamp', active_variable: 'str' = '', active_component: 'int | None' = None, sampling_rate: 'float' = 1, remove_model: 'bool' = False, model: 'str' = '', convert_to_mfa: 'bool' = False, use_magnitude: 'bool' = False, clean_by_flags: 'bool' = False, flagclean_varname: 'str' = '', flagclean_flagname: 'str' = '', flagclean_maxval: 'int | None' = None) -> 'None' method of swarmpal.toolboxes.tfa.processes.Preprocess instance
    Set the process configuration
    
    Parameters
    ----------
    dataset : str
        Selects this dataset from the datatree
    timevar : str
        Identifies the name of the time variable, usually "Timestamp" or "Time"
    active_variable : str
        Selects the variable to use from within the dataset
    active_component : int, optional
        Selects the component to use (if active_variable is a vector)
    sampling_rate : float, optional
        Identify the sampling rate of the data input (in Hz), by default 1
    remove_model : bool, optional
        Remove a magnetic model prediction or not, by default False
    model : str, optional
        The name of the model
    convert_to_mfa : bool, optional
        Rotate B to mean-field aligned (MFA) coordinates, by default False
    use_magnitude : bool, optional
        Use the magnitude of a vector instead, by default False
    clean_by_flags : bool, optional
        Whether to apply additional flag cleaning or not, by default False
    flagclean_varname : str, optional
        Name of the variable to clean
    flagclean_flagname : str, optional
        Name of the flag to use to clean by
    flagclean_maxval : int, optional
        Maximum allowable flag value
    
    Notes
    -----
    Some special ``active_variable`` names exist which are added to the dataset on-the-fly:
    
    * "B_NEC_res_Model"
        where a model prediction must be available in the data, like ``"B_NEC_<Model>"``, and ``remove_model`` has been set. The name of the model can be set with, for example, ``model="CHAOS"``.
    * "B_MFA"
        when ``convert_to_mfa`` has been set.
    * "Eh_XYZ" and "Ev_XYZ"
        when using the TCT datasets, with vectors defined in ``("Ehx", "Ehy", "Ehz")`` and ``("Evx", "Evy", "Evz")`` respectively.

process.set_config(
    dataset="SW_OPER_MAGA_LR_1B",
    timevar="Timestamp",
    active_variable="B_NEC",
    active_component=0,
)

Processes are callable, which means they can be used like functions. They act on datatrees to alter them. We can use this process on the data we built above.

data = process(data)
print(data)
DataTree('paldata', parent=None)
│   Dimensions:  ()
│   Data variables:
│       *empty*
│   Attributes:
│       PAL_meta:  {"TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "timevar"...
└── DataTree('SW_OPER_MAGA_LR_1B')
        Dimensions:       (Timestamp: 10800, NEC: 3, TFA_Time: 10800)
        Coordinates:
          * Timestamp     (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
          * NEC           (NEC) <U1 'N' 'E' 'C'
          * TFA_Time      (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
        Data variables:
            Spacecraft    (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
            B_NEC         (Timestamp, NEC) float64 3.137e+03 388.2 ... 4.077e+04
            Longitude     (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
            MLT           (Timestamp) float64 6.569 6.571 6.574 ... 6.172 6.173 6.174
            Radius        (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06
            Latitude      (Timestamp) float64 76.7 76.76 76.82 ... 51.04 51.11 51.17
            B_NEC_IGRF    (Timestamp, NEC) float64 3.139e+03 404.4 ... 4.077e+04
            QDLat         (Timestamp) float64 71.96 72.02 72.08 ... 47.41 47.47 47.54
            TFA_Variable  (TFA_Time) float64 3.137e+03 3.116e+03 ... 1.553e+04 1.55e+04
        Attributes:
            Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
            MagneticModels:  ['IGRF = IGRF(max_degree=13,min_degree=1)']
            AppliedFilters:  []
            PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...

The data have been modified, in this case by adding a new data variable called TFA_Variable. We can inspect it using the usual xarray/matplotlib tooling, for example:

data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"]
<xarray.DataArray 'TFA_Variable' (TFA_Time: 10800)>
array([ 3136.67477723,  3116.43557451,  3096.53316807, ...,
       15560.98496414, 15531.18166644, 15501.39554679])
Coordinates:
  * TFA_Time  (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
Attributes:
    units:        nT
    description:  Magnetic field vector, NEC frame
data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"].plot.line(x="TFA_Time");
[Figure: line plot of TFA_Variable against TFA_Time]

… but in this case, the TFA toolbox has additional tools for inspecting data:

tfa.plotting.time_series(data);
[Figure: time series plot produced by tfa.plotting.time_series]

Saving/loading data#

Since the data is just a normal datatree, we can use the usual xarray tools to write and read files. Some situations where this is useful:

  • Saving preprocessed (i.e. interim) data, then later reloading it for further processing. One might download a whole series of data, then in a second, more iterative workflow, analyse it (without having to wait again for the download)

  • Saving the output of a process to use in other tools

  • Saving the output of a process to later reload just for visualisation

from os import remove
from datatree import open_datatree

# Save the file as NetCDF
data.to_netcdf("testdata.nc")
# Load the data as a new datatree
reloaded_data = open_datatree("testdata.nc")
# Remove that file we just made
remove("testdata.nc")
print(reloaded_data)
DataTree('None', parent=None)
│   Dimensions:  ()
│   Data variables:
│       *empty*
│   Attributes:
│       PAL_meta:  {"TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "timevar"...
└── DataTree('SW_OPER_MAGA_LR_1B')
        Dimensions:       (Timestamp: 10800, NEC: 3, TFA_Time: 10800)
        Coordinates:
          * Timestamp     (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
          * NEC           (NEC) <U1 'N' 'E' 'C'
          * TFA_Time      (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
        Data variables:
            Spacecraft    (Timestamp) <U1 ...
            B_NEC         (Timestamp, NEC) float64 ...
            Longitude     (Timestamp) float64 ...
            MLT           (Timestamp) float64 ...
            Radius        (Timestamp) float64 ...
            Latitude      (Timestamp) float64 ...
            B_NEC_IGRF    (Timestamp, NEC) float64 ...
            QDLat         (Timestamp) float64 ...
            TFA_Variable  (TFA_Time) float64 ...
        Attributes:
            Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
            MagneticModels:  IGRF = IGRF(max_degree=13,min_degree=1)
            AppliedFilters:  []
            PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...

The .swarmpal accessor#

Whenever you import swarmpal, an accessor is registered on datatrees, with extra tools available under <datatree>.swarmpal.<...>. One way in which this is used is to read metadata stored within the datatree. Here we see that the Preprocess process from the TFA toolbox has saved the configuration which was used:

reloaded_data.swarmpal.pal_meta
{'.': {'TFA_Preprocess': {'dataset': 'SW_OPER_MAGA_LR_1B',
   'timevar': 'Timestamp',
   'active_variable': 'B_NEC',
   'active_component': 0,
   'sampling_rate': 1,
   'remove_model': False,
   'model': '',
   'convert_to_mfa': False,
   'use_magnitude': False,
   'clean_by_flags': False,
   'flagclean_varname': '',
   'flagclean_flagname': '',
   'flagclean_maxval': None}},
 'SW_OPER_MAGA_LR_1B': {'analysis_window': ['2020-01-01T00:00:00',
   '2020-01-01T03:00:00'],
  'magnetic_models': {'IGRF': 'IGRF(max_degree=13,min_degree=1)'}}}

Since this metadata is stored within the data itself, it is preserved over round trips through files, so that a following process can see this information, even in a different session.
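
The mechanism behind this can be sketched with only the standard library: PAL_meta is held as a JSON string in the attributes (as seen in the datatree printouts above), so it survives serialisation to NetCDF. The attribute and key names below mirror those shown above, but this is an illustrative sketch, not the swarmpal implementation:

```python
import json

# Configuration recorded by a process (cf. the pal_meta output above).
config = {
    "TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "active_component": 0}
}

# NetCDF attributes hold plain strings, so the dict is serialised to JSON...
attrs = {"PAL_meta": json.dumps(config)}

# ...and a later session (or a following process) parses it back.
recovered = json.loads(attrs["PAL_meta"])
print(recovered["TFA_Preprocess"]["dataset"])  # SW_OPER_MAGA_LR_1B
```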