WARNING! THIS PACKAGE IS IN ACTIVE DEVELOPMENT AND IS NOT YET STABLE!

Quickstart#

There are two main things to understand in SwarmPAL: data and processes. Data live within an xarray DataTree (of type DataTree), and processes are callable objects (of type PalProcess), meaning they behave like functions. Processes act on data to transform them, adding derived parameters into the data object.
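
This data/process pattern can be sketched in plain Python. The MyProcess class below is a hypothetical illustration of the idea, not the actual PalProcess implementation:

```python
# Sketch of the "callable process" pattern: a process object holds
# configuration and is applied to data by calling it like a function.
class MyProcess:
    def set_config(self, **config):
        self.config = config

    def __call__(self, data):
        # Transform the data: add a derived parameter alongside the input.
        data["derived"] = data["raw"] * self.config.get("scale", 1)
        return data

process = MyProcess()
process.set_config(scale=2)
data = process({"raw": 10})
print(data["derived"])  # 20
```

Real SwarmPAL processes follow the same shape: configure with .set_config(), then apply the process to a datatree by calling it.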

Fetching data#

Data are pulled in over the web and organised into a DataTree, using create_paldata and PalDataItem:

from swarmpal.io import create_paldata, PalDataItem

data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    )
)
print(data)
DataTree('paldata', parent=None)
└── DataTree('SW_OPER_MAGA_LR_1B')
        Dimensions:     (Timestamp: 10800, NEC: 3)
        Coordinates:
          * Timestamp   (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
          * NEC         (NEC) <U1 'N' 'E' 'C'
        Data variables:
            Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
            B_NEC       (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
            Longitude   (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
            Latitude    (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
            Radius      (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
        Attributes:
            Sources:         ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
            MagneticModels:  []
            AppliedFilters:  []
            PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...

Now you can skip ahead to Applying Processes, or read on to learn more about data…

Swarm data are fetched from the VirES service; SwarmPAL uses the Python package viresclient underneath to transfer and load the data. Similarly, data can be fetched from any HAPI server, with hapiclient used underneath.

create_paldata and PalDataItem have a few features for flexible use:

  • Pass multiple items to create_paldata to assemble a complex datatree. Pass them as keyword arguments (e.g. HAPI_SW_OPER_MAGA_LR_1B=... below) if you want to manually change the name in the datatree, otherwise they will default to the collection/dataset name.

  • Use .from_vires() and .from_hapi() to fetch data from different services. Note that the argument names and usage are a bit different (though equivalent) in each case. These follow the nomenclature used in viresclient and hapiclient respectively.

data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    ),
    HAPI_SW_OPER_MAGA_LR_1B=PalDataItem.from_hapi(
        server="https://vires.services/hapi",
        dataset="SW_OPER_MAGA_LR_1B",
        parameters="Latitude,Longitude,Radius,B_NEC",
        start="2020-01-01T00:00:00",
        stop="2020-01-01T03:00:00",
    ),
)
print(data)
DataTree('paldata', parent=None)
├── DataTree('SW_OPER_MAGA_LR_1B')
│       Dimensions:     (Timestamp: 10800, NEC: 3)
│       Coordinates:
│         * Timestamp   (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
│         * NEC         (NEC) <U1 'N' 'E' 'C'
│       Data variables:
│           Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
│           B_NEC       (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
│           Longitude   (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
│           Latitude    (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
│           Radius      (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
│       Attributes:
│           Sources:         ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
│           MagneticModels:  []
│           AppliedFilters:  []
│           PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
└── DataTree('HAPI_SW_OPER_MAGA_LR_1B')
        Dimensions:    (Timestamp: 10800, B_NEC_dim1: 3)
        Coordinates:
          * Timestamp  (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
        Dimensions without coordinates: B_NEC_dim1
        Data variables:
            Latitude   (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
            Longitude  (Timestamp) float64 103.0 103.1 103.1 103.2 ... 49.91 49.91 49.91
            Radius     (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
            B_NEC      (Timestamp, B_NEC_dim1) float64 3.137e+03 388.2 ... 4.077e+04
        Attributes:
            PAL_meta:  {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T03:00:...

While you can learn more about using datatrees in the xarray documentation, this should not be necessary for basic usage of SwarmPAL. If you are familiar with xarray, you can access a dataset by browsing the datatree like a dictionary, then use either the .ds accessor to get an immutable view of the dataset, or .to_dataset() to extract a mutable copy.

data["SW_OPER_MAGA_LR_1B"].ds
<xarray.DatasetView>
Dimensions:     (Timestamp: 10800, NEC: 3)
Coordinates:
  * Timestamp   (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
  * NEC         (NEC) <U1 'N' 'E' 'C'
Data variables:
    Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    B_NEC       (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
    Longitude   (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
    Latitude    (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
    Radius      (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
Attributes:
    Sources:         ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
    MagneticModels:  []
    AppliedFilters:  []
    PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
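
The distinction between the two accessors can be sketched with a plain-Python analogy (illustrative only; this is not how xarray implements them): .ds behaves like a read-only view onto the node's variables, while .to_dataset() hands you an independent copy you may modify.

```python
from types import MappingProxyType

# A node's variables, and two ways of handing them out:
variables = {"B_NEC": [3137, 388]}

view = MappingProxyType(variables)  # immutable view, like .ds
copy = dict(variables)              # independent mutable copy, like .to_dataset()

copy["B_NEC_scaled"] = [x * 2 for x in copy["B_NEC"]]  # fine on the copy
try:
    view["B_NEC_scaled"] = []  # rejected on the view
except TypeError as exc:
    print("view is read-only:", exc)
```

Use .ds for inspection, and .to_dataset() when you want to modify the data without affecting the datatree.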

Using the VirES API, additional things can be requested beyond the original dataset (models and auxiliaries). See the viresclient documentation for details, or Swarm Notebooks for more examples. The options argument specifies an extendable dictionary of special options which are passed to viresclient. In this case we specify asynchronous=False to process the request synchronously (faster, but will fail for longer requests), and disable the progress bars with show_progress=False.

data = create_paldata(
    PalDataItem.from_vires(
        server_url="https://vires.services/ows",
        collection="SW_OPER_MAGA_LR_1B",
        measurements=["B_NEC"],
        models=["IGRF"],
        auxiliaries=["QDLat", "MLT"],
        start_time="2020-01-01T00:00:00",
        end_time="2020-01-01T03:00:00",
        options=dict(asynchronous=False, show_progress=False),
    )
)
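
With a model requested, the model prediction appears in the dataset as a variable such as B_NEC_IGRF (as in the output shown later), so a residual field can be formed by subtraction. With the real datatree this would be a one-line xarray operation; the sketch below uses synthetic stand-in numbers so it is self-contained:

```python
# Sketch: forming a residual field B_NEC - B_NEC_IGRF per sample.
# With the fetched datatree this would be, e.g.:
#   ds = data["SW_OPER_MAGA_LR_1B"].ds
#   residual = ds["B_NEC"] - ds["B_NEC_IGRF"]
b_nec = [[3137.0, 388.2, 45000.0], [3116.4, 390.0, 44990.0]]   # measured
b_igrf = [[3139.0, 404.4, 44980.0], [3120.0, 400.0, 44970.0]]  # model
residual = [
    [m - p for m, p in zip(row_m, row_p)]
    for row_m, row_p in zip(b_nec, b_igrf)
]
print([round(x, 1) for x in residual[0]])  # [-2.0, -16.2, 20.0]
```

Note that the TFA toolbox can do this for you: setting remove_model=True together with active_variable="B_NEC_res_Model" in Preprocess (described below) removes the model prediction as part of the process.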

Applying Processes#

A process is a special object type you can import from different toolboxes in SwarmPAL.

First we import the relevant toolbox and create a process from the .processes submodule:

from swarmpal.toolboxes import tfa

process = tfa.processes.Preprocess()

Each process has a .set_config() method which configures the behaviour of the process:

help(process.set_config)
Help on method set_config in module swarmpal.toolboxes.tfa.processes:

set_config(dataset: 'str' = '', timevar: 'str' = 'Timestamp', active_variable: 'str' = '', active_component: 'int | None' = None, sampling_rate: 'float' = 1, remove_model: 'bool' = False, model: 'str' = '', convert_to_mfa: 'bool' = False, use_magnitude: 'bool' = False, clean_by_flags: 'bool' = False, flagclean_varname: 'str' = '', flagclean_flagname: 'str' = '', flagclean_maxval: 'int | None' = None) -> 'None' method of swarmpal.toolboxes.tfa.processes.Preprocess instance
    Set the process configuration
    
    Parameters
    ----------
    dataset : str
        Selects this dataset from the datatree
    timevar : str
        Identifies the name of the time variable, usually "Timestamp" or "Time"
    active_variable : str
        Selects the variable to use from within the dataset
    active_component : int, optional
        Selects the component to use (if active_variable is a vector)
    sampling_rate : float, optional
        Identify the sampling rate of the data input (in Hz), by default 1
    remove_model : bool, optional
        Remove a magnetic model prediction or not, by default False
    model : str, optional
        The name of the model
    convert_to_mfa : bool, optional
        Rotate B to mean-field aligned (MFA) coordinates, by default False
    use_magnitude : bool, optional
        Use the magnitude of a vector instead, by default False
    clean_by_flags : bool, optional
        Whether to apply additional flag cleaning or not, by default False
    flagclean_varname : str, optional
        Name of the variable to clean
    flagclean_flagname : str, optional
        Name of the flag to use to clean by
    flagclean_maxval : int, optional
        Maximum allowable flag value
    
    Notes
    -----
    Some special ``active_variable`` names exist which are added to the dataset on-the-fly:
    
    * "B_NEC_res_Model"
        where a model prediction must be available in the data, like ``"B_NEC_<Model>"``, and ``remove_model`` has been set. The name of the model can be set with, for example, ``model="CHAOS"``.
    * "B_MFA"
        when ``convert_to_mfa`` has been set.
    * "Eh_XYZ" and "Ev_XYZ"
        when using the TCT datasets, with vectors defined in ``("Ehx", "Ehy", "Ehz")`` and ``("Evx", "Evy", "Evz")`` respectively.

process.set_config(
    dataset="SW_OPER_MAGA_LR_1B",
    timevar="Timestamp",
    active_variable="B_NEC",
    active_component=0,
)

Processes are callable, which means they can be used like functions. They act on datatrees to alter them. We can use this process on the data we built above.

data = process(data)
print(data)
DataTree('paldata', parent=None)
│   Dimensions:  ()
│   Data variables:
│       *empty*
│   Attributes:
│       PAL_meta:  {"TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "timevar"...
└── DataTree('SW_OPER_MAGA_LR_1B')
        Dimensions:       (Timestamp: 10800, NEC: 3, TFA_Time: 10800)
        Coordinates:
          * Timestamp     (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
          * NEC           (NEC) <U1 'N' 'E' 'C'
          * TFA_Time      (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
        Data variables:
            Spacecraft    (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
            B_NEC         (Timestamp, NEC) float64 3.137e+03 388.2 ... 4.077e+04
            Longitude     (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
            MLT           (Timestamp) float64 6.569 6.571 6.574 ... 6.172 6.173 6.174
            Radius        (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06
            Latitude      (Timestamp) float64 76.7 76.76 76.82 ... 51.04 51.11 51.17
            B_NEC_IGRF    (Timestamp, NEC) float64 3.139e+03 404.4 ... 4.077e+04
            QDLat         (Timestamp) float64 71.96 72.02 72.08 ... 47.41 47.47 47.54
            TFA_Variable  (TFA_Time) float64 3.137e+03 3.116e+03 ... 1.553e+04 1.55e+04
        Attributes:
            Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
            MagneticModels:  ['IGRF = IGRF(max_degree=13,min_degree=1)']
            AppliedFilters:  []
            PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...

The data have been modified, in this case by adding a new data variable called TFA_Variable. We can inspect it using the usual xarray/matplotlib tooling, for example:

data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"]
<xarray.DataArray 'TFA_Variable' (TFA_Time: 10800)>
array([ 3136.67477723,  3116.43557451,  3096.53316807, ...,
       15560.98496414, 15531.18166644, 15501.39554679])
Coordinates:
  * TFA_Time  (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
Attributes:
    units:        nT
    description:  Magnetic field vector, NEC frame
data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"].plot.line(x="TFA_Time");
[Figure: line plot of TFA_Variable against TFA_Time]

… but in this case, the TFA toolbox has additional tools for inspecting data:

tfa.plotting.time_series(data);
[Figure: time series plot produced by tfa.plotting.time_series]

Saving/loading data#

Since the data is just a normal datatree, we can use the usual xarray tools to write and read files. Some situations where this is useful:

  • Saving preprocessed (i.e. interim) data, then later reloading it for further processing. One might download a whole series of data, then in a second, more iterative workflow, analyse it (without having to wait again for the download)

  • Saving the output of a process to use in other tools

  • Saving the output of a process to later reload just for visualisation

from os import remove
from datatree import open_datatree

# Save the file as NetCDF
data.to_netcdf("testdata.nc")
# Load the data as a new datatree
reloaded_data = open_datatree("testdata.nc")
# Remove that file we just made
remove("testdata.nc")
print(reloaded_data)
DataTree('None', parent=None)
│   Dimensions:  ()
│   Data variables:
│       *empty*
│   Attributes:
│       PAL_meta:  {"TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "timevar"...
└── DataTree('SW_OPER_MAGA_LR_1B')
        Dimensions:       (Timestamp: 10800, NEC: 3, TFA_Time: 10800)
        Coordinates:
          * Timestamp     (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
          * NEC           (NEC) <U1 'N' 'E' 'C'
          * TFA_Time      (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
        Data variables:
            Spacecraft    (Timestamp) <U1 ...
            B_NEC         (Timestamp, NEC) float64 ...
            Longitude     (Timestamp) float64 ...
            MLT           (Timestamp) float64 ...
            Radius        (Timestamp) float64 ...
            Latitude      (Timestamp) float64 ...
            B_NEC_IGRF    (Timestamp, NEC) float64 ...
            QDLat         (Timestamp) float64 ...
            TFA_Variable  (TFA_Time) float64 ...
        Attributes:
            Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
            MagneticModels:  IGRF = IGRF(max_degree=13,min_degree=1)
            AppliedFilters:  []
            PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...

The .swarmpal accessor#

Whenever you import swarmpal, an accessor is registered on datatrees, with extra tools available under <datatree>.swarmpal.<...>. One way in which this is used is to read metadata stored within the datatree. Here we see that the Preprocess process from the TFA toolbox has saved the configuration which was used:

reloaded_data.swarmpal.pal_meta
{'.': {'TFA_Preprocess': {'dataset': 'SW_OPER_MAGA_LR_1B',
   'timevar': 'Timestamp',
   'active_variable': 'B_NEC',
   'active_component': 0,
   'sampling_rate': 1,
   'remove_model': False,
   'model': '',
   'convert_to_mfa': False,
   'use_magnitude': False,
   'clean_by_flags': False,
   'flagclean_varname': '',
   'flagclean_flagname': '',
   'flagclean_maxval': None}},
 'SW_OPER_MAGA_LR_1B': {'analysis_window': ['2020-01-01T00:00:00',
   '2020-01-01T03:00:00'],
  'magnetic_models': {'IGRF': 'IGRF(max_degree=13,min_degree=1)'}}}

Since this metadata is stored within the data itself, it is preserved over round trips through files, so that a following process can see this information, even in a different session.
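
The mechanism behind this can be sketched with only the standard library: PAL_meta is held as a JSON string in the attributes (as seen in the datatree printouts above), so it survives serialisation to NetCDF. The attribute and key names below mirror those shown above, but this is an illustrative sketch, not the swarmpal implementation:

```python
import json

# Configuration recorded by a process (cf. the pal_meta output above).
config = {
    "TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "active_component": 0}
}

# NetCDF attributes hold plain strings, so the dict is serialised to JSON...
attrs = {"PAL_meta": json.dumps(config)}

# ...and a later session (or a following process) parses it back.
recovered = json.loads(attrs["PAL_meta"])
print(recovered["TFA_Preprocess"]["dataset"])  # SW_OPER_MAGA_LR_1B
```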