Data I/O (old)

A temporary demonstration of how the input data for the toolboxes can be loaded.

The ExternalData class is not intended to be used directly. It defines general utilities for handling the input (and output) data of each toolbox. Each toolbox defines its own subclasses of it, e.g. SecsInputs and TfaMagInputs. These subclasses define which datasets to connect to, supply default configuration for those datasets, and perform any preprocessing (e.g. generating auxiliary or derived parameters).

An example subclass, MagExternalData(ExternalData), is provided and used below to demonstrate the general behaviour of the data objects.
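The division of labour described above (a generic base class, with toolbox-specific subclasses that choose datasets, defaults and preprocessing) can be sketched with plain Python classes. This is a hypothetical minimal sketch, not swarmpal's implementation: the names `BaseInputs`, `ExampleMagInputs`, the `preprocess` hook and the derived `F_res` parameter are illustrative assumptions. Only the idea of class-level `COLLECTIONS` and `DEFAULTS` that subclasses replace is taken from this page.

```python
class BaseInputs:
    """Hypothetical stand-in for a generic input-data base class."""

    # Unset in the base class; each subclass replaces these.
    COLLECTIONS: list = []
    DEFAULTS: dict = {"measurements": [], "model": "", "auxiliaries": []}

    def preprocess(self, records):
        """Hook for generating derived parameters; the base class does nothing."""
        return records


class ExampleMagInputs(BaseInputs):
    """Hypothetical toolbox-specific subclass (illustrative only)."""

    COLLECTIONS = ["SW_OPER_MAGA_LR_1B", "SW_OPER_MAGB_LR_1B"]
    DEFAULTS = {
        "measurements": ["F", "B_NEC"],
        "model": "IGRF",
        "auxiliaries": ["QDLat", "MLT"],
    }

    def preprocess(self, records):
        # Hypothetical derived parameter: field-intensity residual to the model.
        for rec in records:
            rec["F_res"] = rec["F"] - rec["F_Model"]
        return records


# Illustrative values, not real measurements
records = [{"F": 24690.0, "F_Model": 24710.0}]
print(ExampleMagInputs().preprocess(records)[0]["F_res"])  # -20.0
print(BaseInputs.COLLECTIONS)  # []
```

Keeping the configuration as class attributes means it can be inspected without creating an object, which is how `ExternalData.COLLECTIONS` and `MagExternalData.DEFAULTS` are queried later on this page.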

Some behaviours of ExternalData:

  • Each subclass defines which datasets to connect to and which parameters to fetch from them.

  • The user chooses which particular dataset (collection) to fetch.

  • An ExternalData object holds only a single time series, represented as an xarray Dataset.

  • Remote datasets are configured by default to come from VirES.
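Since each subclass fixes the datasets it supports while the user picks one of them, a natural check is that the chosen collection belongs to the subclass's list. The sketch below is hypothetical: the helper `choose_collection` and the decision to raise `ValueError` are assumptions, and this page does not state whether swarmpal performs such a check. The collection names are copied from the `MagExternalData.COLLECTIONS` output shown on this page.

```python
def choose_collection(available, collection):
    """Hypothetical helper: return `collection` if supported, else raise ValueError."""
    if collection not in available:
        raise ValueError(
            f"{collection!r} is not one of the supported collections: {available}"
        )
    return collection


# Names as listed for MagExternalData.COLLECTIONS on this page
MAG_COLLECTIONS = [
    "SW_OPER_MAGA_LR_1B",
    "SW_OPER_MAGB_LR_1B",
    "SW_OPER_MAGC_LR_1B",
    "SW_OPER_MAGA_HR_1B",
    "SW_OPER_MAGB_HR_1B",
    "SW_OPER_MAGC_HR_1B",
]

print(choose_collection(MAG_COLLECTIONS, "SW_OPER_MAGA_LR_1B"))
try:
    # A collection outside the subclass's list is rejected
    choose_collection(MAG_COLLECTIONS, "SW_OPER_EFIA_LP_1B")
except ValueError as e:
    print("rejected:", e)
```

Because one object holds a single time series, it takes exactly one collection; fetching another dataset means creating another object.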

Some features make the usage more flexible:

  • The expensive step (fetching the data) happens in:
    .initialise()
    This runs automatically on object creation when fetching data from VirES, but can be deferred by passing initialise=False.

  • Preloaded data can be saved to a file:
    .to_file("filename.nc") (saves the xarray object as a netCDF file)
    This can be useful for preparing a bulk dataset to be processed later (i.e. download all the data first, then apply the algorithms from the toolbox).

  • Choose where the data used to initialise the object come from. On object creation, pass source = "vires" | "swarmpal_file" | "manual":

    • "vires" (default) to fetch from VirES

    • "swarmpal_file" to load a file prepared with the .to_file() method

    • "manual" to pass an xarray Dataset directly

    Data from a file or passed manually are loaded after object creation, in a second step:
      .initialise(dataset)  for source="manual", where dataset is an xarray Dataset
      .initialise("filename.nc")  for source="swarmpal_file"
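The deferred-loading and source-selection behaviour listed above can be sketched as a small class. This is a hypothetical minimal sketch, not swarmpal's code: the class `LazyData` and the `fetch` and `load` callables (stand-ins for the VirES request and the file reader) are assumptions. Only the three source names, the `initialise` flag and the error message are taken from this page.

```python
class LazyData:
    """Hypothetical sketch: deferred loading from a selectable source."""

    SOURCES = ("vires", "swarmpal_file", "manual")

    def __init__(self, source="vires", initialise=True, fetch=None, load=None):
        if source not in self.SOURCES:
            raise ValueError(f"source must be one of {self.SOURCES}, got {source!r}")
        self.source = source
        self._fetch = fetch  # stand-in for the expensive remote request
        self._load = load  # stand-in for reading a prepared file
        self._data = None
        # Only the remote source can be initialised without an argument,
        # so only it is initialised eagerly on creation.
        if source == "vires" and initialise:
            self.initialise()

    @property
    def xarray(self):
        if self._data is None:
            raise AttributeError("xarray not set. Run .initialise() to fetch the data")
        return self._data

    def initialise(self, arg=None):
        if self.source == "vires":
            self._data = self._fetch()
        elif self.source == "swarmpal_file":
            self._data = self._load(arg)
        else:  # "manual": use the supplied object as-is
            self._data = arg


d = LazyData(source="vires", initialise=False, fetch=lambda: {"F": [1.0, 2.0]})
try:
    d.xarray
except AttributeError as e:
    print(e)  # xarray not set. Run .initialise() to fetch the data
d.initialise()
print(d.xarray)  # {'F': [1.0, 2.0]}
```

Raising `AttributeError` from the property means `hasattr(d, "xarray")` returns False until the data are loaded, which matches the error caught in the examples on this page.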

# This allows module code to be reloaded live
# - useful for testing out things when working on an editable install of the package
%load_ext autoreload
%autoreload 2
from swarmpal.io import ExternalData, MagExternalData

Properties of ExternalData objects

The base ExternalData class has unset collections and defaults. These configure which dataset (collection) to connect to, and the default (user-overridable) parameters to pass to VirES.

ExternalData.COLLECTIONS
[]
ExternalData.DEFAULTS
{'measurements': [],
 'model': '',
 'auxiliaries': [],
 'sampling_step': None,
 'pad_times': None}

Subclasses replace these to configure the data they require access to:

MagExternalData.COLLECTIONS
['SW_OPER_MAGA_LR_1B',
 'SW_OPER_MAGB_LR_1B',
 'SW_OPER_MAGC_LR_1B',
 'SW_OPER_MAGA_HR_1B',
 'SW_OPER_MAGB_HR_1B',
 'SW_OPER_MAGC_HR_1B']
MagExternalData.DEFAULTS
{'measurements': ['F', 'B_NEC', 'Flags_B'],
 'model': 'IGRF',
 'auxiliaries': ['QDLat', 'QDLon', 'MLT'],
 'sampling_step': None}
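The DEFAULTS are described as user-overridable. One common way to realise that is to merge the user's keyword arguments onto a copy of the defaults, with user values taking precedence. This is a hypothetical sketch: the helper `resolve_options`, the rejection of unknown keys, and the override values `"CHAOS"` and `"PT10S"` are illustrative assumptions, not documented swarmpal behaviour. The defaults dictionary is copied from the `MagExternalData.DEFAULTS` output above.

```python
import copy

# Copied from the MagExternalData.DEFAULTS output above
MAG_DEFAULTS = {
    "measurements": ["F", "B_NEC", "Flags_B"],
    "model": "IGRF",
    "auxiliaries": ["QDLat", "QDLon", "MLT"],
    "sampling_step": None,
}


def resolve_options(defaults, **overrides):
    """Hypothetical helper: merge user overrides onto a copy of the defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    merged = copy.deepcopy(defaults)  # never mutate the class-level defaults
    merged.update(overrides)  # user-supplied values win
    return merged


opts = resolve_options(MAG_DEFAULTS, model="CHAOS", sampling_step="PT10S")
print(opts["model"], opts["sampling_step"])  # CHAOS PT10S
print(opts["measurements"])  # ['F', 'B_NEC', 'Flags_B']
print(MAG_DEFAULTS["model"])  # IGRF
```

The deep copy matters because the defaults hold lists: a shallow merge would let one object's changes to, say, `measurements` leak into every later object of the same class.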

Get data from VirES

The user creates a data object, specifying the particular collection and time window to use:

d_vires = MagExternalData(
    collection="SW_OPER_MAGA_LR_1B",
    model="IGRF",
    start_time="2022-01-01",
    end_time="2022-01-01T01:00:00",
    viresclient_kwargs=dict(
        asynchronous=True, show_progress=True
    ),  # optional (default)
    source="vires",  # optional (default)
    initialise=False,  # defaults to True
)

The data are stored in the .xarray property.

This is not yet available because we set initialise=False:

# catch the error and just print the error message
try:
    d_vires.xarray
except AttributeError as e:
    print(e)
xarray not set. Run .initialise() to fetch the data
d_vires.initialise()
d_vires.xarray
<xarray.Dataset>
Dimensions:      (Timestamp: 3600, NEC: 3)
Coordinates:
  * Timestamp    (Timestamp) datetime64[ns] 2022-01-01 ... 2022-01-01T00:59:59
  * NEC          (NEC) <U1 'N' 'E' 'C'
Data variables:
    Spacecraft   (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    QDLon        (Timestamp) float64 78.27 78.26 78.24 ... -120.7 -120.8 -120.8
    Latitude     (Timestamp) float64 -10.7 -10.77 -10.83 ... 62.03 62.1 62.16
    Radius       (Timestamp) float64 6.81e+06 6.81e+06 ... 6.797e+06 6.797e+06
    B_NEC        (Timestamp, NEC) float64 1.65e+04 -2.005e+03 ... 4.328e+04
    Flags_B      (Timestamp) uint8 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    QDLat        (Timestamp) float64 -22.54 -22.6 -22.66 ... 57.5 57.57 57.64
    Longitude    (Timestamp) float64 5.527 5.526 5.525 ... 175.0 175.0 175.0
    F            (Timestamp) float64 2.469e+04 2.468e+04 ... 4.527e+04 4.529e+04
    F_Model      (Timestamp) float64 2.471e+04 2.47e+04 ... 4.526e+04 4.528e+04
    B_NEC_Model  (Timestamp, NEC) float64 1.652e+04 -1.977e+03 ... 4.327e+04
    MLT          (Timestamp) float64 0.04764 0.04704 0.04643 ... 11.79 11.79
Attributes:
    Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
    MagneticModels:  ['Model = IGRF(max_degree=13,min_degree=1)']
    AppliedFilters:  []
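The Timestamp dimension above has 3600 entries, running from 2022-01-01T00:00:00 to 2022-01-01T00:59:59. That count follows from the one-hour window if the LR (low-rate) magnetic product is sampled at 1 Hz and the window is treated as half-open, [start_time, end_time). Both points are inferred here from the printed output rather than stated on this page. The arithmetic can be checked with the standard library:

```python
from datetime import datetime, timedelta

start = datetime.fromisoformat("2022-01-01")
end = datetime.fromisoformat("2022-01-01T01:00:00")
step = timedelta(seconds=1)  # assumed 1 Hz sampling of the LR product

n_samples = (end - start) // step  # half-open window: end_time is excluded
last = start + (n_samples - 1) * step
print(n_samples)  # 3600
print(last.isoformat())  # 2022-01-01T00:59:59
```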

Create ExternalData manually from the data above

d_manual = MagExternalData(source="manual")
try:
    d_manual.xarray
except AttributeError as e:
    print(e)
xarray not set. Run .initialise() to fetch the data

Initialise it with the data fetched earlier.

Any data could be supplied here, but it is up to the user to ensure that the data are valid input.

d_manual.initialise(d_vires.xarray.copy())
d_manual.xarray
<xarray.Dataset>
Dimensions:      (Timestamp: 3600, NEC: 3)
Coordinates:
  * Timestamp    (Timestamp) datetime64[ns] 2022-01-01 ... 2022-01-01T00:59:59
  * NEC          (NEC) <U1 'N' 'E' 'C'
Data variables:
    Spacecraft   (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    QDLon        (Timestamp) float64 78.27 78.26 78.24 ... -120.7 -120.8 -120.8
    Latitude     (Timestamp) float64 -10.7 -10.77 -10.83 ... 62.03 62.1 62.16
    Radius       (Timestamp) float64 6.81e+06 6.81e+06 ... 6.797e+06 6.797e+06
    B_NEC        (Timestamp, NEC) float64 1.65e+04 -2.005e+03 ... 4.328e+04
    Flags_B      (Timestamp) uint8 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    QDLat        (Timestamp) float64 -22.54 -22.6 -22.66 ... 57.5 57.57 57.64
    Longitude    (Timestamp) float64 5.527 5.526 5.525 ... 175.0 175.0 175.0
    F            (Timestamp) float64 2.469e+04 2.468e+04 ... 4.527e+04 4.529e+04
    F_Model      (Timestamp) float64 2.471e+04 2.47e+04 ... 4.526e+04 4.528e+04
    B_NEC_Model  (Timestamp, NEC) float64 1.652e+04 -1.977e+03 ... 4.327e+04
    MLT          (Timestamp) float64 0.04764 0.04704 0.04643 ... 11.79 11.79
Attributes:
    Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
    MagneticModels:  ['Model = IGRF(max_degree=13,min_degree=1)']
    AppliedFilters:  []
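Because validating manually supplied data is left to the user, a quick pre-check is to confirm that the variables the subclass expects are present before calling .initialise(). The helper `missing_variables` below is hypothetical, and the required list is only an assumption built from the `MagExternalData.DEFAULTS` measurements and auxiliaries shown earlier. Iterating over an xarray Dataset's `data_vars` yields variable names, so in the notebook one could pass `d_vires.xarray.data_vars` as `available`; the sketch uses a plain list so it runs on its own.

```python
def missing_variables(required, available):
    """Hypothetical helper: return required names absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Assumed requirements, taken from MagExternalData.DEFAULTS shown earlier
required = ["F", "B_NEC", "Flags_B", "QDLat", "QDLon", "MLT"]

# Illustrative variable names of an incomplete dataset
available = ["Spacecraft", "F", "B_NEC", "QDLat", "MLT"]
print(missing_variables(required, available))  # ['Flags_B', 'QDLon']
```

This only checks names; it says nothing about units, dtypes or time coverage, which would still need checking separately.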

Create ExternalData from a file

Suppose we prepared the input in an earlier step:

d1 = MagExternalData(
    collection="SW_OPER_MAGA_LR_1B",
    model="IGRF",
    start_time="2022-01-01",
    end_time="2022-01-01T01:00:00",
    viresclient_kwargs=dict(asynchronous=True, show_progress=True),
)
d1.to_file("test_file.nc")
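For the bulk workflow mentioned at the top of this page (download everything first, process later), a longer period could be split into chunks with one saved file per chunk. This is a hypothetical sketch: the generator `daily_windows`, the one-day chunk size and the filename pattern are illustrative choices. The commented-out calls use only the arguments and methods demonstrated on this page and are not executed here.

```python
from datetime import datetime, timedelta


def daily_windows(start, end):
    """Yield (start, end) pairs covering [start, end) in chunks of at most one day."""
    step = timedelta(days=1)
    t0 = start
    while t0 < end:
        t1 = min(t0 + step, end)
        yield t0, t1
        t0 = t1


for t0, t1 in daily_windows(datetime(2022, 1, 1), datetime(2022, 1, 3, 12)):
    filename = f"MAGA_{t0:%Y%m%dT%H%M%S}.nc"  # hypothetical naming scheme
    print(t0.isoformat(), t1.isoformat(), filename)
    # For each window one could then run (not executed here):
    # d = MagExternalData(collection="SW_OPER_MAGA_LR_1B", model="IGRF",
    #                     start_time=t0.isoformat(), end_time=t1.isoformat())
    # d.to_file(filename)
```

Consecutive windows share their boundary times, so with half-open windows no sample is fetched twice or skipped.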

Now we can create the object from this file directly:

d2 = MagExternalData(source="swarmpal_file")
d2.initialise("test_file.nc")
d2.xarray
<xarray.Dataset>
Dimensions:      (Timestamp: 3600, NEC: 3)
Coordinates:
  * Timestamp    (Timestamp) datetime64[ns] 2022-01-01 ... 2022-01-01T00:59:59
  * NEC          (NEC) object 'N' 'E' 'C'
Data variables:
    Spacecraft   (Timestamp) object ...
    QDLon        (Timestamp) float64 ...
    Latitude     (Timestamp) float64 ...
    Radius       (Timestamp) float64 ...
    B_NEC        (Timestamp, NEC) float64 ...
    Flags_B      (Timestamp) uint8 ...
    QDLat        (Timestamp) float64 ...
    Longitude    (Timestamp) float64 ...
    F            (Timestamp) float64 ...
    F_Model      (Timestamp) float64 ...
    B_NEC_Model  (Timestamp, NEC) float64 ...
    MLT          (Timestamp) float64 ...
Attributes:
    Sources:         ['SW_OPER_AUX_IGR_2__19000101T000000_20241231T235959_010...
    MagneticModels:  Model = IGRF(max_degree=13,min_degree=1)
    AppliedFilters:  []
from os import remove

remove("test_file.nc")
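As an alternative to removing the file by hand, a temporary directory can hold the intermediate file and is deleted automatically when the `with` block exits, even if an error occurs part-way. In the notebook, `d1.to_file(path)` and `d2.initialise(path)` would go inside the block; the sketch writes an empty placeholder file instead so that it runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test_file.nc")
    # In the notebook: d1.to_file(path); d2.initialise(path)
    with open(path, "wb") as f:  # placeholder write so the sketch is standalone
        f.write(b"")
    print(os.path.exists(path))  # True
print(os.path.exists(path))  # False
```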