Quickstart#
There are two main things to understand in SwarmPAL: data and processes. Data live within an xarray DataTree (type DataTree), and processes are callable objects (type PalProcess), that is, they behave like functions. Processes act on data to transform them, adding derived parameters into the data object.
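As an orientation, the callable-process pattern can be sketched in plain Python. This is a hypothetical illustration of the idea, not SwarmPAL's actual implementation: a process is configured once, then applied to data like a function.

```python
# Minimal sketch of the "configure, then call" process pattern
# (hypothetical; SwarmPAL's PalProcess is more elaborate)
class ExampleProcess:
    def set_config(self, scale=1.0):
        self.scale = scale

    def __call__(self, data):
        # Transform the data, returning it with a derived parameter added
        data["derived"] = data["value"] * self.scale
        return data

process = ExampleProcess()
process.set_config(scale=2.0)
result = process({"value": 10.0})
print(result["derived"])  # 20.0
```

SwarmPAL processes follow this shape, except that the data they act on are datatrees rather than plain dictionaries.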
Fetching data#
Data are pulled in over the web and organised as a DataTree, using create_paldata and PalDataItem:
from swarmpal.io import create_paldata, PalDataItem
data = create_paldata(
PalDataItem.from_vires(
server_url="https://vires.services/ows",
collection="SW_OPER_MAGA_LR_1B",
measurements=["B_NEC"],
start_time="2020-01-01T00:00:00",
end_time="2020-01-01T03:00:00",
options=dict(asynchronous=False, show_progress=False),
)
)
print(data)
DataTree('paldata', parent=None)
└── DataTree('SW_OPER_MAGA_LR_1B')
Dimensions: (Timestamp: 10800, NEC: 3)
Coordinates:
* Timestamp (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
* NEC (NEC) <U1 'N' 'E' 'C'
Data variables:
Spacecraft (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
B_NEC (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
Longitude (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
Latitude (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
Radius (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
Attributes:
Sources: ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
MagneticModels: []
AppliedFilters: []
PAL_meta: {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
Now you can skip ahead to Applying Processes, or read on to learn more about data…
Swarm data are fetched from the VirES service, and SwarmPAL uses the Python package viresclient underneath to transfer and load the data. Similarly, any HAPI server can also be used, with hapiclient used underneath.
create_paldata and PalDataItem have a few features for flexible use:
* Pass multiple items to create_paldata to assemble a complex datatree. Pass them as keyword arguments (e.g. HAPI_SW_OPER_MAGA_LR_1B=... below) if you want to manually change the name in the datatree; otherwise they default to the collection/dataset name.
* Use .from_vires() and .from_hapi() to fetch data from different services. Note that the argument names and usage are a bit different (though equivalent) in each case. These follow the nomenclature used in viresclient and hapiclient respectively.
data = create_paldata(
PalDataItem.from_vires(
server_url="https://vires.services/ows",
collection="SW_OPER_MAGA_LR_1B",
measurements=["B_NEC"],
start_time="2020-01-01T00:00:00",
end_time="2020-01-01T03:00:00",
options=dict(asynchronous=False, show_progress=False),
),
HAPI_SW_OPER_MAGA_LR_1B=PalDataItem.from_hapi(
server="https://vires.services/hapi",
dataset="SW_OPER_MAGA_LR_1B",
parameters="Latitude,Longitude,Radius,B_NEC",
start="2020-01-01T00:00:00",
stop="2020-01-01T03:00:00",
),
)
print(data)
DataTree('paldata', parent=None)
├── DataTree('SW_OPER_MAGA_LR_1B')
│ Dimensions: (Timestamp: 10800, NEC: 3)
│ Coordinates:
│ * Timestamp (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
│ * NEC (NEC) <U1 'N' 'E' 'C'
│ Data variables:
│ Spacecraft (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
│ B_NEC (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
│ Longitude (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
│ Latitude (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
│ Radius (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
│ Attributes:
│ Sources: ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
│ MagneticModels: []
│ AppliedFilters: []
│ PAL_meta: {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
└── DataTree('HAPI_SW_OPER_MAGA_LR_1B')
Dimensions: (Timestamp: 10800, B_NEC_dim1: 3)
Coordinates:
* Timestamp (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
Dimensions without coordinates: B_NEC_dim1
Data variables:
Latitude (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
Longitude (Timestamp) float64 103.0 103.1 103.1 103.2 ... 49.91 49.91 49.91
Radius (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
B_NEC (Timestamp, B_NEC_dim1) float64 3.137e+03 388.2 ... 4.077e+04
Attributes:
PAL_meta: {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T03:00:...
While you can learn more about using datatrees in the xarray documentation, this should not be necessary for basic usage of SwarmPAL. If you are familiar with xarray, you can access a dataset by browsing the datatree like a dictionary, then use either the .ds accessor to get an immutable view of the dataset, or .to_dataset() to extract a mutable copy.
data["SW_OPER_MAGA_LR_1B"].ds
<xarray.DatasetView>
Dimensions:     (Timestamp: 10800, NEC: 3)
Coordinates:
  * Timestamp   (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
  * NEC         (NEC) <U1 'N' 'E' 'C'
Data variables:
    Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    B_NEC       (Timestamp, NEC) float64 3.137e+03 388.2 ... 2.519e+03 4.077e+04
    Longitude   (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
    Latitude    (Timestamp) float64 76.7 76.76 76.82 76.89 ... 51.04 51.11 51.17
    Radius      (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06 6.806e+06
Attributes:
    Sources:         ['SW_PREL_MAGA_LR_1B_20200101T000000_20200101T235959_060...
    MagneticModels:  []
    AppliedFilters:  []
    PAL_meta:        {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
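To sketch what working with a mutable copy looks like, here is a toy example in plain xarray. The dataset below is synthetic, standing in for what .to_dataset() would return, with made-up B_NEC values chosen so the magnitudes are easy to check:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a dataset extracted with .to_dataset():
# a tiny B_NEC vector variable with known magnitudes
ds = xr.Dataset(
    {"B_NEC": (("Timestamp", "NEC"), [[3.0, 4.0, 0.0], [0.0, 0.0, 5.0]])},
    coords={"NEC": ["N", "E", "C"]},
)
# Because this is a mutable copy, derived variables can be added freely,
# e.g. the field magnitude along the NEC dimension
ds["B_total"] = np.sqrt((ds["B_NEC"] ** 2).sum(dim="NEC"))
print(ds["B_total"].values)  # [5. 5.]
```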
Using the VirES API, additional things can be requested beyond the original dataset (models and auxiliaries). See the viresclient documentation for details, or Swarm Notebooks for more examples. The extra options argument below specifies an extendable dictionary of special options which are passed to viresclient. In this case we specify asynchronous=False to process the request synchronously (faster, but it will fail for longer requests), and disable the progress bars with show_progress=False.
data = create_paldata(
PalDataItem.from_vires(
server_url="https://vires.services/ows",
collection="SW_OPER_MAGA_LR_1B",
measurements=["B_NEC"],
models=["IGRF"],
auxiliaries=["QDLat", "MLT"],
start_time="2020-01-01T00:00:00",
end_time="2020-01-01T03:00:00",
options=dict(asynchronous=False, show_progress=False),
)
)
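With a model included, a residual field (data minus model prediction) can be formed by simple subtraction. The sketch below uses plain xarray with made-up values; in SwarmPAL the TFA toolbox can handle this for you via the remove_model option described in the next section:

```python
import xarray as xr

# Toy values standing in for B_NEC and the model prediction B_NEC_IGRF
ds = xr.Dataset(
    {
        "B_NEC": (("Timestamp", "NEC"), [[3137.0, 388.0, 100.0]]),
        "B_NEC_IGRF": (("Timestamp", "NEC"), [[3139.0, 404.0, 90.0]]),
    }
)
# The residual (data minus model) is a plain element-wise subtraction
ds["B_NEC_res_IGRF"] = ds["B_NEC"] - ds["B_NEC_IGRF"]
print(ds["B_NEC_res_IGRF"].values)  # [[ -2. -16.  10.]]
```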
Applying Processes#
A process is a special object type you can import from different toolboxes in SwarmPAL.
First we import the relevant toolbox and create a process from the .processes
submodule:
from swarmpal.toolboxes import tfa
process = tfa.processes.Preprocess()
Each process has a .set_config()
method which configures the behaviour of the process:
help(process.set_config)
Help on method set_config in module swarmpal.toolboxes.tfa.processes:
set_config(dataset: 'str' = '', timevar: 'str' = 'Timestamp', active_variable: 'str' = '', active_component: 'int | None' = None, sampling_rate: 'float' = 1, remove_model: 'bool' = False, model: 'str' = '', convert_to_mfa: 'bool' = False, use_magnitude: 'bool' = False, clean_by_flags: 'bool' = False, flagclean_varname: 'str' = '', flagclean_flagname: 'str' = '', flagclean_maxval: 'int | None' = None) -> 'None' method of swarmpal.toolboxes.tfa.processes.Preprocess instance
Set the process configuration
Parameters
----------
dataset : str
Selects this dataset from the datatree
timevar : str
Identifies the name of the time variable, usually "Timestamp" or "Time"
active_variable : str
Selects the variable to use from within the dataset
active_component : int, optional
Selects the component to use (if active_variable is a vector)
sampling_rate : float, optional
Identify the sampling rate of the data input (in Hz), by default 1
remove_model : bool, optional
Remove a magnetic model prediction or not, by default False
model : str, optional
The name of the model
convert_to_mfa : bool, optional
Rotate B to mean-field aligned (MFA) coordinates, by default False
use_magnitude : bool, optional
Use the magnitude of a vector instead, by default False
clean_by_flags : bool, optional
Whether to apply additional flag cleaning or not, by default False
flagclean_varname : str, optional
Name of the variable to clean
flagclean_flagname : str, optional
Name of the flag to use to clean by
flagclean_maxval : int, optional
Maximum allowable flag value
Notes
-----
Some special ``active_variable`` names exist which are added to the dataset on-the-fly:
* "B_NEC_res_Model"
where a model prediction must be available in the data, like ``"B_NEC_<Model>"``, and ``remove_model`` has been set. The name of the model can be set with, for example, ``model="CHAOS"``.
* "B_MFA"
when ``convert_to_mfa`` has been set.
* "Eh_XYZ" and "Ev_XYZ"
when using the TCT datasets, with vectors defined in ``("Ehx", "Ehy", "Ehz")`` and ``("Evx", "Evy", "Evz")`` respectively.
process.set_config(
dataset="SW_OPER_MAGA_LR_1B",
timevar="Timestamp",
active_variable="B_NEC",
active_component=0,
)
Processes are callable, which means they can be used like functions. They act on datatrees to alter them. We can apply this process to the data we built above.
data = process(data)
print(data)
DataTree('paldata', parent=None)
│ Dimensions: ()
│ Data variables:
│ *empty*
│ Attributes:
│ PAL_meta: {"TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "timevar"...
└── DataTree('SW_OPER_MAGA_LR_1B')
Dimensions: (Timestamp: 10800, NEC: 3, TFA_Time: 10800)
Coordinates:
* Timestamp (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
* NEC (NEC) <U1 'N' 'E' 'C'
* TFA_Time (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
Data variables:
Spacecraft (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
B_NEC (Timestamp, NEC) float64 3.137e+03 388.2 ... 4.077e+04
Longitude (Timestamp) float64 103.0 103.1 103.1 ... 49.91 49.91 49.91
MLT (Timestamp) float64 6.569 6.571 6.574 ... 6.172 6.173 6.174
Radius (Timestamp) float64 6.803e+06 6.803e+06 ... 6.806e+06
Latitude (Timestamp) float64 76.7 76.76 76.82 ... 51.04 51.11 51.17
B_NEC_IGRF (Timestamp, NEC) float64 3.139e+03 404.4 ... 4.077e+04
QDLat (Timestamp) float64 71.96 72.02 72.08 ... 47.41 47.47 47.54
TFA_Variable (TFA_Time) float64 3.137e+03 3.116e+03 ... 1.553e+04 1.55e+04
Attributes:
Sources: ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
MagneticModels: ['IGRF = IGRF(max_degree=13,min_degree=1)']
AppliedFilters: []
PAL_meta: {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
The data have been modified, in this case by adding a new data variable called TFA_Variable. We can inspect it using the usual xarray/matplotlib tooling, for example:
data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"]
<xarray.DataArray 'TFA_Variable' (TFA_Time: 10800)>
array([ 3136.67477723,  3116.43557451,  3096.53316807, ...,
       15560.98496414, 15531.18166644, 15501.39554679])
Coordinates:
  * TFA_Time  (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
Attributes:
    units:        nT
    description:  Magnetic field vector, NEC frame
data["SW_OPER_MAGA_LR_1B"]["TFA_Variable"].plot.line(x="TFA_Time");
… but in this case, the TFA toolbox has additional tools for inspecting data:
tfa.plotting.time_series(data);
Saving/loading data#
Since data is just a normal datatree, we can use the usual xarray tools to write and read files. Some situations in which this might be useful:
* Saving preprocessed (i.e. interim) data, then later reloading it for further processing. One might download a whole series of data, then, in a second, more iterative workflow, analyse it (without having to wait again for the download)
* Saving the output of a process to use in other tools
* Saving the output of a process to later reload just for visualisation
from os import remove
from datatree import open_datatree
# Save the file as NetCDF
data.to_netcdf("testdata.nc")
# Load the data as a new datatree
reloaded_data = open_datatree("testdata.nc")
# Remove that file we just made
remove("testdata.nc")
print(reloaded_data)
DataTree('None', parent=None)
│ Dimensions: ()
│ Data variables:
│ *empty*
│ Attributes:
│ PAL_meta: {"TFA_Preprocess": {"dataset": "SW_OPER_MAGA_LR_1B", "timevar"...
└── DataTree('SW_OPER_MAGA_LR_1B')
Dimensions: (Timestamp: 10800, NEC: 3, TFA_Time: 10800)
Coordinates:
* Timestamp (Timestamp) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
* NEC (NEC) <U1 'N' 'E' 'C'
* TFA_Time (TFA_Time) datetime64[ns] 2020-01-01 ... 2020-01-01T02:59:59
Data variables:
Spacecraft (Timestamp) <U1 ...
B_NEC (Timestamp, NEC) float64 ...
Longitude (Timestamp) float64 ...
MLT (Timestamp) float64 ...
Radius (Timestamp) float64 ...
Latitude (Timestamp) float64 ...
B_NEC_IGRF (Timestamp, NEC) float64 ...
QDLat (Timestamp) float64 ...
TFA_Variable (TFA_Time) float64 ...
Attributes:
Sources: ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
MagneticModels: IGRF = IGRF(max_degree=13,min_degree=1)
AppliedFilters: []
PAL_meta: {"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T...
The .swarmpal accessor#
Whenever you import swarmpal, an accessor is registered on datatrees, with extra tools available under <datatree>.swarmpal.<...>. One way in which this is used is to read metadata (stored within the datatree). Here we see that the Preprocess process from the TFA toolbox has saved the configuration which was used:
reloaded_data.swarmpal.pal_meta
{'.': {'TFA_Preprocess': {'dataset': 'SW_OPER_MAGA_LR_1B',
'timevar': 'Timestamp',
'active_variable': 'B_NEC',
'active_component': 0,
'sampling_rate': 1,
'remove_model': False,
'model': '',
'convert_to_mfa': False,
'use_magnitude': False,
'clean_by_flags': False,
'flagclean_varname': '',
'flagclean_flagname': '',
'flagclean_maxval': None}},
'SW_OPER_MAGA_LR_1B': {'analysis_window': ['2020-01-01T00:00:00',
'2020-01-01T03:00:00'],
'magnetic_models': {'IGRF': 'IGRF(max_degree=13,min_degree=1)'}}}
Since this is stored within the data itself, this is preserved over round trips through files so that a following process can see this information, even in a different session.
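This works because PAL_meta is serialised as a JSON string in the dataset attributes (as visible in the printed outputs above), which plain NetCDF attributes can carry. Recovering it by hand is straightforward; the string below copies an example from the output above:

```python
import json

# PAL_meta as it appears in the attributes: a JSON string
pal_meta_raw = '{"analysis_window": ["2020-01-01T00:00:00", "2020-01-01T03:00:00"]}'
pal_meta = json.loads(pal_meta_raw)
print(pal_meta["analysis_window"])  # ['2020-01-01T00:00:00', '2020-01-01T03:00:00']
```

The .swarmpal.pal_meta accessor does this parsing for you across every node of the tree.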