WARNING! THIS PACKAGE IS IN ACTIVE DEVELOPMENT AND IS NOT YET STABLE!

swarmpal.toolboxes.tfa#

swarmpal.toolboxes.tfa.processes#

class swarmpal.toolboxes.tfa.processes.Clean(config: dict | None = None, active_tree: str = '/', inplace: bool = True)[source]#

Bases: PalProcess

Clean TFA_Variable by removing outliers and interpolating gaps

property active_tree: str#

Defines which branch of the datatree will be used

property config: dict#

Dictionary that configures the process behaviour

property process_name: str#
set_config(window_size: int = 10, method: str = 'iqr', multiplier: float = 0.5) → None[source]#

Set the process configuration

Parameters:
  • window_size (int, optional) – The size (number of points) of the rolling window, by default 10

  • method (str, optional) – “normal” or “iqr”, by default “iqr”

  • multiplier (float, optional) – Indicates the spread of the zone of accepted values, by default 0.5
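The cleaning idea described here (flag outliers inside a rolling window, then interpolate over them) can be sketched with plain NumPy. This is a minimal illustration and not the swarmpal implementation: the helper name clean_sketch, the reflect padding at the edges, the linear interpolation and the odd window_size of 11 are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def clean_sketch(x, window_size=11, multiplier=1.5):
    """Flag rolling-IQR outliers and linearly interpolate over them.

    Hypothetical helper; assumes a 1-D input and an odd window_size.
    """
    x = np.asarray(x, dtype=float)
    half = window_size // 2
    # Pad so that every sample sits at the centre of a full window.
    padded = np.pad(x, half, mode="reflect")
    windows = sliding_window_view(padded, window_size)
    q25, q75 = np.percentile(windows, [25, 75], axis=1)
    iqr = q75 - q25
    is_outlier = (x < q25 - multiplier * iqr) | (x > q75 + multiplier * iqr)
    # Replace flagged samples by linear interpolation between accepted ones.
    idx = np.arange(x.size)
    cleaned = x.copy()
    cleaned[is_outlier] = np.interp(idx[is_outlier], idx[~is_outlier], x[~is_outlier])
    return cleaned, is_outlier


theta = np.linspace(0, 2 * np.pi, 200)
signal_in = np.sin(theta)
signal_in[20] += 10.0  # inject a single spike
cleaned, is_outlier = clean_sketch(signal_in, window_size=11, multiplier=1.5)
print(np.flatnonzero(is_outlier))
```

A smaller multiplier narrows the zone of accepted values, so more points are flagged and replaced.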

class swarmpal.toolboxes.tfa.processes.Filter(config: dict | None = None, active_tree: str = '/', inplace: bool = True)[source]#

Bases: PalProcess

High-pass filter the TFA_Variable, using the SciPy Chebyshev Type II filter

property active_tree: str#

Defines which branch of the datatree will be used

property config: dict#

Dictionary that configures the process behaviour

property process_name: str#
set_config(cutoff_frequency: float = 0.02) → None[source]#

Set the process configuration

Parameters:

cutoff_frequency (float, optional) – The cutoff frequency (in Hz), by default 0.02 (i.e. 20/1000)

class swarmpal.toolboxes.tfa.processes.Preprocess(config: dict | None = None, active_tree: str = '/', inplace: bool = True)[source]#

Bases: PalProcess

Prepare data for input to other TFA tools

property active_component#
property active_tree: str#

Defines which branch of the datatree will be used

property active_variable#
property config: dict#

Dictionary that configures the process behaviour

property process_name: str#
set_config(dataset: str = '', timevar: str = 'Timestamp', active_variable: str = '', active_component: int | None = None, sampling_rate: float = 1, remove_model: bool = False, model: str = '', convert_to_mfa: bool = False, use_magnitude: bool = False, clean_by_flags: bool = False, flagclean_varname: str = '', flagclean_flagname: str = '', flagclean_maxval: int | None = None) → None[source]#

Set the process configuration

Parameters:
  • dataset (str) – Selects this dataset from the datatree

  • timevar (str) – Identifies the name of the time variable, usually “Timestamp” or “Time”

  • active_variable (str) – Selects the variable to use from within the dataset

  • active_component (int, optional) – Selects the component to use (if active_variable is a vector)

  • sampling_rate (float, optional) – Identify the sampling rate of the data input (in Hz), by default 1

  • remove_model (bool, optional) – Remove a magnetic model prediction or not, by default False

  • model (str, optional) – The name of the model

  • convert_to_mfa (bool, optional) – Rotate B to mean-field aligned (MFA) coordinates, by default False

  • use_magnitude (bool, optional) – Use the magnitude of a vector instead, by default False

  • clean_by_flags (bool, optional) – Whether to apply additional flag cleaning or not, by default False

  • flagclean_varname (str, optional) – Name of the variable to clean

  • flagclean_flagname (str, optional) – Name of the flag to use to clean by

  • flagclean_maxval (int, optional) – Maximum allowable flag value

Notes

Some special active_variable names exist which are added to the dataset on-the-fly:

  • “B_NEC_res_Model”

    where a model prediction must be available in the data, like "B_NEC_<Model>", and remove_model has been set. The name of the model can be set with, for example, model="CHAOS".

  • “B_MFA”

    when convert_to_mfa has been set.

  • “Eh_XYZ” and “Ev_XYZ”

    when using the TCT datasets, with vectors defined in ("Ehx", "Ehy", "Ehz") and ("Evx", "Evy", "Evz") respectively.
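As an illustration of how these options fit together, the keyword arguments for a model-residual analysis could be collected in a plain dictionary. The keys follow the set_config signature documented above; the dataset name and the component index are assumed values chosen only for the example.

```python
# Keyword arguments mirroring the documented set_config signature.
preprocess_config = dict(
    dataset="SW_OPER_MAGA_LR_1B",       # assumed dataset name, for illustration only
    timevar="Timestamp",
    active_variable="B_NEC_res_Model",  # special name, requires remove_model=True
    active_component=2,                 # assumed component index of the vector
    sampling_rate=1,
    remove_model=True,
    model="CHAOS",                      # the data must then contain "B_NEC_CHAOS"
)

# With swarmpal installed, the dictionary would be passed on like this
# (left as comments so that the sketch runs without the package):
# from swarmpal.toolboxes.tfa.processes import Preprocess
# process = Preprocess()
# process.set_config(**preprocess_config)
print(sorted(preprocess_config))
```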

class swarmpal.toolboxes.tfa.processes.WaveDetection(config: dict | None = None, active_tree: str = '/', inplace: bool = True)[source]#

Bases: PalProcess

Screen out potential false waves

Removes part of the wavelet spectrum that might be due to spikes, data gaps, ESFs or trailing parts of wave activity from either above or below the range of frequencies that were used to perform the wavelet transform.

property active_tree: str#

Defines which branch of the datatree will be used

property config: dict#

Dictionary that configures the process behaviour

property process_name: str#
set_config()[source]#
class swarmpal.toolboxes.tfa.processes.Wavelet(config: dict | None = None, active_tree: str = '/', inplace: bool = True)[source]#

Bases: PalProcess

Apply wavelet analysis

property active_tree: str#

Defines which branch of the datatree will be used

property config: dict#

Dictionary that configures the process behaviour

property process_name: str#
set_config(min_frequency: float | None = None, max_frequency: float | None = None, min_scale: float | None = None, max_scale: float | None = None, dj: float = 0.1) → None[source]#

Set the process configuration

Parameters:
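  • min_frequency (float | None, optional) – Lower bound of the frequency range to analyse, by default None

  • max_frequency (float | None, optional) – Upper bound of the frequency range to analyse, by default None

  • min_scale (float | None, optional) – The smallest scale to use for the wavelet transform, by default None

  • max_scale (float | None, optional) – The largest scale to use for the wavelet transform, by default None

  • dj (float, optional) – The step size used for generating the wavelet scales, by default 0.1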

swarmpal.toolboxes.tfa.tfalib#

@author: constantinos@noa.gr

swarmpal.toolboxes.tfa.tfalib.constant_cadence(t_obs, x_obs, sampling_rate, interp=False)[source]#

Set data points to a new time-series with constant sampling time.

Even though data are supposed to be provided at constant time steps, e.g. at times t = 1, 2, 3, etc., they often are not: they can be given instead at times t = 1.01, 2.03, 2.99, etc., or data points along with their timestamps might be missing entirely, or even be duplicated one or more times. This function corrects these errors by producing a new array with timestamps at exactly the requested cadence. Depending on the value of the interp parameter, it either moves the existing values to their proper new place or interpolates values at the new timestamps based on the old ones.

t_obs is a one-dimensional array with the time in seconds, x_obs is a one- or two-dimensional array with real values, and sampling_rate is a real number, given in Hz.

Note: Gaps in the data will NOT be filled, even when interp is True.

Parameters:
  • t_obs ((N,) array_like) – A 1-D array of real values.

  • x_obs ((...,N,...) array_like) – An N-D array of real values. The length of x_obs along the first axis must be equal to the length of t_obs.

  • sampling_rate (float) – The sampling rate of the output series in Hz (e.g. 2 means two measurements per second, i.e. a time step of 0.5 sec between each).

  • interp (bool, optional) – If False the function will move the existing data values to their new time stamps. If True, it will interpolate the values at the new time stamps based on the original values, using a linear interpolation scheme.

Returns:

  • t_rec ((N,) array_like) – A 1-D array of the new time values, set at constant cadence.

  • x_rec ((…,N,…) array_like) – An N-D array of real values, with the values of x_obs set at constant cadence.

  • nn_mask ((…,N,…) array_like, bool) – An N-D array of boolean values
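The "move existing values to their proper place" behaviour (interp=False) can be pictured with a small NumPy sketch. It is not the library code: the helper name snap_to_constant_cadence, the rounding to the nearest grid slot, the NaN fill for empty slots and the meaning given to the returned mask are assumptions made for this illustration, which also expects 1-D, time-sorted input.

```python
import numpy as np


def snap_to_constant_cadence(t_obs, x_obs, sampling_rate):
    """Move each sample to the nearest slot of a regular time grid.

    Hypothetical helper; empty slots stay NaN and duplicates overwrite.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    # Index of the nearest grid point for every observation.
    steps = np.round(t_obs * sampling_rate).astype(int)
    slots = steps - steps[0]
    n_slots = slots.max() + 1
    t_rec = (steps[0] + np.arange(n_slots)) / sampling_rate
    x_rec = np.full(n_slots, np.nan)
    x_rec[slots] = x_obs
    # True where a slot received an observed value (gaps remain False).
    filled = np.zeros(n_slots, dtype=bool)
    filled[slots] = True
    return t_rec, x_rec, filled


t = np.array([1.01, 2.03, 2.99, 5.02])  # jittered times with a gap at t = 4
x = np.array([10.0, 20.0, 30.0, 50.0])
t_rec, x_rec, filled = snap_to_constant_cadence(t, x, sampling_rate=1)
print(t_rec, x_rec, filled)
```

As in the note above, the gap at t = 4 is not filled; it is only marked.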

Examples

>>> import numpy as np
>>> from swarmpal.toolboxes.tfa import tfalib
>>> import matplotlib.pyplot as plt
>>> N = 10 # number of points to generate
>>> FS = 50 # sampling rate in Hz
>>> # create time vector
>>> t = np.arange(N) * (1/FS)
>>> # add some small deviations in time, just to complicate things
>>> n = 0.1 * np.random.randn(N) * (1/FS)
>>> t = 12.81 + t + n
>>> # produce x values from a simple linear relation
>>> x = 10 + 0.01 * t
>>> # remove some values at random
>>> inds_to_rem = np.random.permutation(np.arange(1,N-1))[:int(N/4)]
>>> t_obs = np.delete(t, inds_to_rem)
>>> x_obs = np.delete(x, inds_to_rem)
>>> (t_rec, x_rec, nn) = tfalib.constant_cadence(t_obs, x_obs, FS, False)
>>> (t_int, x_int, nn) = tfalib.constant_cadence(t_obs, x_obs, FS, True)
>>> plt.figure(3)
>>> plt.plot(t_obs, x_obs, '--xk', t_rec, x_rec, 'or', t_int, x_int, '+b')
>>> plt.legend(('Original Points', 'Moved', 'Interpolated'))
>>> plt.grid(True)
>>> plt.show()
swarmpal.toolboxes.tfa.tfalib.filter(x, sampling_rate, cutoff)[source]#

High-pass filter the data

This is just a wrapper of the Chebyshev Type II filter of SciPy. A low-pass filtered version of the series is first produced by means of cheby2() and is then subtracted from the data series, so that only the high-pass component remains.
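The subtraction approach can be sketched directly with SciPy. The filter order, the stop-band attenuation and the use of zero-phase filtering (sosfiltfilt) are assumptions chosen for this example and are not taken from the swarmpal source; the helper name highpass_by_subtraction is likewise hypothetical.

```python
import numpy as np
from scipy import signal


def highpass_by_subtraction(x, sampling_rate, cutoff, order=4, stop_atten_db=40.0):
    """High-pass a series by subtracting its Chebyshev Type II low-pass part.

    Hypothetical helper; order and stop_atten_db are assumed values.
    """
    sos = signal.cheby2(order, stop_atten_db, cutoff, btype="low",
                        output="sos", fs=sampling_rate)
    # Zero-phase low-pass, so the subtraction does not introduce a time shift.
    low = signal.sosfiltfilt(sos, x, axis=0)
    return x - low


fs = 2.0                               # sampling rate in Hz
t = np.arange(0, 3600, 1 / fs)
slow = np.sin(2 * np.pi * t / 2000)    # 0.5 mHz, well below the cutoff
fast = np.sin(2 * np.pi * t / 20)      # 50 mHz, well above the cutoff
hp = highpass_by_subtraction(slow + fast, fs, cutoff=3 / 1000)
# Away from the edges, the result is close to the fast component alone.
print(np.max(np.abs(hp[3000:4200] - fast[3000:4200])))
```

Note that for cheby2() the critical frequency marks the edge of the stop band, so components just below the cutoff are only partly removed, in line with the description of the cutoff parameter.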

Parameters:
  • x ((...,N,...) array_like) – A 1-D or 2-D array of real values. If 2-D then each column is being treated separately.

  • sampling_rate (float) – The sampling rate of the data, i.e. the reciprocal of the time step

  • cutoff (float) – The cutoff frequency that the filter will use. Sinusoidal waveforms with frequencies below this cutoff will be reduced in amplitude (ideally to zero, but frequencies close to the cutoff will be less affected), while those with frequencies above this cutoff will remain unchanged.

Returns:

filtered – Array, the same size as x with the result of the filtering process.

Return type:

(…,N,…) array_like

Examples

>>> import numpy as np
>>> from swarmpal.toolboxes.tfa import tfalib
>>> import matplotlib.pyplot as plt
>>> T = np.arange(0, 3600, 0.5)
>>> Y = np.sin(2*np.pi*T/500)*(np.exp(-(T/1000)**2)) + np.sin(2*np.pi*T/250)*(np.exp(-((T-np.max(T))/1000)**2))
>>> F = tfalib.filter(Y, 2, 3/1000)
>>> plt.figure(1)
>>> plt.plot(T, Y, color=[.5,.5,.5], linewidth=5)
>>> plt.plot(T, F, '-r')
>>> plt.legend(('Original Series', 'High-Pass Filtered'))
>>> plt.grid(True)
>>> plt.show()
swarmpal.toolboxes.tfa.tfalib.magn(X)[source]#

Return the row-wise magnitude of elements in 2D array ‘X’ as a single-column array.
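In NumPy terms, a row-wise magnitude returned as a single column can be written as below. This is a sketch of the described behaviour, not the library's code, and the helper name magn_sketch is hypothetical.

```python
import numpy as np


def magn_sketch(X):
    """Euclidean norm of every row of a 2-D array, returned as an (N, 1) column."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X, axis=1, keepdims=True)


B = np.array([[3.0, 4.0, 0.0],
              [1.0, 2.0, 2.0]])
print(magn_sketch(B))
```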

swarmpal.toolboxes.tfa.tfalib.mfa(B_NEC, B_MEAN_NEC, R_NEC=None)[source]#
swarmpal.toolboxes.tfa.tfalib.morlet_wave(N=600, scale=1, dx=0.01, omega=6.203607835633639, roll=True, norm=True)[source]#

Generate a Morlet wave-function to be used with wavelet_transform()

This generates the complex-conjugate, scaled and time-reversed form of the Morlet wavelet, so that it can be immediately used in the wavelet transform function.

Parameters:
  • N (integer) – Number of points to generate

  • scale (float) – The scale, i.e. period of the generated waveform

  • dx (float) – The time step of the data, i.e. the reciprocal of the sampling rate

  • omega (float) – The omega_zero parameter of the Morlet function. The default value is 6.2036 which is the value for which the wavelet scales directly correspond to the Fourier periods

  • roll (boolean) – If False, the signal is generated as is, centered at zero. If True, it is translated so that zero becomes the first element in the time series and the part of the wavelet that corresponds to negative x values is folded back at the end of the series. Use False to plot and see the wavelet, but True to use it with the wavelet transform!

  • norm (boolean) – If True, the function is normalized by multiplication with the factor sqrt(dx/scale), so that its sum of squares is 1 and the sum of squares of its FFT coefficients is N. Use True with the wavelet transform!

Returns:

  • wavelet ((N,) array_like) – 1-D array with the complex values of the Morlet wavelet

  • x ((N,) array_like) – 1-D array with the x values that correspond to the wavelet (use for plotting only, otherwise ignore)

  • wavelet_norm_factor (float) – A number to be used for the normalization of the result of the wavelet transform.
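Two statements above can be checked numerically with a short sketch: that the default omega makes scales coincide with Fourier periods, and that the sqrt(dx/scale) factor gives a unit sum of squares. The sketch uses the textbook Morlet form and the usual Fourier-period relation for the Morlet wavelet, period = 4π·scale / (omega + sqrt(2 + omega²)). It builds the plain wavelet, not the conjugated and time-reversed variant the function returns, and the helper name morlet_sketch is hypothetical.

```python
import numpy as np

OMEGA0 = 6.203607835633639  # default omega from the signature above


def morlet_sketch(n=600, scale=50.0, dx=0.5, omega=OMEGA0):
    """Plain Morlet wavelet, centred at zero, scaled by sqrt(dx/scale)."""
    x = (np.arange(n) - n // 2) * dx
    u = x / scale
    psi = np.pi ** -0.25 * np.exp(1j * omega * u) * np.exp(-0.5 * u ** 2)
    return psi * np.sqrt(dx / scale), x


# Ratio of Fourier period to scale for the Morlet wavelet.
fourier_factor = 4 * np.pi / (OMEGA0 + np.sqrt(2 + OMEGA0 ** 2))
wavelet, x = morlet_sketch()
energy = np.sum(np.abs(wavelet) ** 2)
print(fourier_factor, energy)
```

The factor evaluates to 1 because this omega equals 2π - 1/(4π), which is the solution of omega + sqrt(2 + omega²) = 4π.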

Examples

>>> import numpy as np
>>> from scipy.fft import fft
>>> from swarmpal.toolboxes.tfa import tfalib
>>> import matplotlib.pyplot as plt
>>> N_wave = 600
>>> s_wave = 50
>>> dx_wave = .5
>>> m, m_x, m_norm = tfalib.morlet_wave(N_wave, s_wave, dx_wave, roll=False, norm=True)
>>> plt.figure()
>>> plt.plot(m_x, np.real(m), '-b', m_x, np.imag(m), '-r')
>>> plt.grid(True)
>>> plt.show()
>>> # Test wavelet function's properties
>>> print('Wavelet Integral = %f + i %f (should be zero)'%(np.trapz(np.real(m), dx=dx_wave), np.trapz(np.imag(m), dx=dx_wave)))
>>> print('Sum of squares = %f (should be 1)'%np.sum(np.abs(m)**2))
>>> print('Sum of squares of FFT = %f (should be N)'%np.sum(np.abs(fft(m, norm='backward'))**2))
swarmpal.toolboxes.tfa.tfalib.moving_mean_and_stdev(x, window_size, unbiased_std=True)[source]#

Calculate the moving average and moving standard deviation

Parameters:
  • x ((...,N,...) array_like) – A 1-D or 2-D array of real values. If 2-D then each column is being treated separately.

  • window_size (int) – The size of the rolling window (in number of points)

  • unbiased_std (bool, optional) – If True, the unbiased estimator of the standard deviation will be used, i.e. dividing by N-1. If False, the standard deviation will be computed by dividing by N.

Returns:

  • moving_mean ((…,N,…) array_like) – Moving mean, the same size as x.

  • moving_stdev ((…,N,…) array_like) – Moving standard deviation, the same size as x.
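The difference between the two estimators is easy to see in a small rolling-window sketch. The code below is an illustration only: the helper name moving_mean_std_sketch, the centred window and the reflect padding at the edges are assumptions, and the library may treat the edges differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_mean_std_sketch(x, window_size, unbiased_std=True):
    """Centred moving mean and standard deviation of a 1-D array.

    Hypothetical helper; assumes an odd window_size.
    """
    x = np.asarray(x, dtype=float)
    half = window_size // 2
    windows = sliding_window_view(np.pad(x, half, mode="reflect"), window_size)
    ddof = 1 if unbiased_std else 0  # divide by N-1 or by N
    return windows.mean(axis=1), windows.std(axis=1, ddof=ddof)


ramp = np.arange(10.0)
mean_u, std_u = moving_mean_std_sketch(ramp, 3, unbiased_std=True)
mean_b, std_b = moving_mean_std_sketch(ramp, 3, unbiased_std=False)
print(std_u[1:-1], std_b[1:-1])
```

For a window of three consecutive integers the unbiased estimate is exactly 1, while dividing by N gives sqrt(2/3).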

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from swarmpal.toolboxes.tfa import tfalib
>>> N = 1000
>>> t = np.linspace(0, 2*np.pi, N)
>>> x = np.sin(2*np.pi*t/np.pi) + 0.1*np.random.randn(N)
>>> m, s = tfalib.moving_mean_and_stdev(x, 50)
>>> plt.plot(t, x, 'xk', t, m, '-b', t, m + 3*s, '-r', t, m - 3*s, '-r')
>>> plt.show()
swarmpal.toolboxes.tfa.tfalib.moving_q25_and_q75(x, window_size)[source]#

Calculate moving 25th and 75th percentiles

The difference between these two percentiles is called the inter-quartile range (iqr) and can be used for outlier detection, i.e. accept only points that lie within the region from q25 - 1.5*iqr up to q75 + 1.5*iqr and discard the rest.

NOTE: It is recommended to use an odd integer number for window_size

Parameters:
  • x ((...,N,...) array_like) – A 1-D or 2-D array of real values. If 2-D then each column is being treated separately.

  • window_size (int) – The size of the rolling window (in number of points)

Returns:

  • moving_q25 ((…,N,…) array_like) – Moving 25th percentile, the same size as x.

  • moving_q75 ((…,N,…) array_like) – Moving 75th percentile, the same size as x.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from swarmpal.toolboxes.tfa import tfalib
>>> N = 1000
>>> t = np.linspace(0, 2*np.pi, N)
>>> x = np.sin(2*np.pi*t/np.pi) + 0.1*np.random.randn(N)
>>> q25, q75 = tfalib.moving_q25_and_q75(x, 50)
>>> iqr = q75 - q25
>>> plt.plot(t, x, 'xk', t, q25 - 1.5*iqr, '-r', t, q75 + 1.5*iqr, '-r')
>>> plt.show()
swarmpal.toolboxes.tfa.tfalib.outliers(x, window_size, method='iqr', multiplier=nan)[source]#

Find statistical outliers in data

This uses a moving window to identify outliers, based on how much larger or smaller data points are than their neighbours within the window. Two methods are available:

normal: Assumes a Gaussian distribution. Calculates the mean and standard deviation inside a window of length window_size and flags as outliers the points that lie below the window mean - M times that standard deviation or above the window mean + M times that standard deviation, with M being defined by the multiplier parameter.

iqr: As above, but using the quartiles q25 and q75 and the inter-quartile range iqr, to define the zone of acceptable measurements. Outliers will lie below q25 - M*iqr or above q75 + M*iqr, with M being the multiplier parameter.

multiplier can be either a single float or a list of two numbers, in which case the first will be used to define the lower limit and the second the upper one. If you want to search only for e.g. high outliers, set the first element of multiplier to numpy.inf, so that no value falls below the lower limit.

Parameters:
  • x ((...,N,...) array_like) – A 1-D or 2-D array of real values. If 2-D then each column is being treated separately.

  • window_size (int) – The size of the rolling window (in number of points)

  • method (string) – Can be either ‘normal’ or ‘iqr’ and signifies the method used

  • multiplier (float or list (of two floats)) – The number that indicates the spread of the zone of accepted values

Returns:

outlier_inds – Boolean array, the same size as x with True where an outlier has been detected and False elsewhere.

Return type:

(…,N,…) array_like
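The effect of a two-element multiplier can be illustrated with a sketch of the "normal" method. It is not the library implementation: the helper name normal_outliers_sketch, the centred window, the reflect padding and the unbiased standard deviation are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def normal_outliers_sketch(x, window_size, multiplier):
    """Flag points outside window mean -/+ M * window standard deviation.

    Hypothetical helper; multiplier is a float or a (lower, upper) pair.
    """
    x = np.asarray(x, dtype=float)
    lower_m, upper_m = np.broadcast_to(np.asarray(multiplier, dtype=float), (2,))
    half = window_size // 2
    windows = sliding_window_view(np.pad(x, half, mode="reflect"), window_size)
    mean = windows.mean(axis=1)
    std = windows.std(axis=1, ddof=1)
    return (x < mean - lower_m * std) | (x > mean + upper_m * std)


series = 0.01 * np.sin(np.arange(101.0))  # small background variation
series[30] += 20.0                        # one high spike
series[70] -= 20.0                        # one low spike
both = normal_outliers_sketch(series, 11, 2.5)
high_only = normal_outliers_sketch(series, 11, [np.inf, 2.5])
print(np.flatnonzero(both), np.flatnonzero(high_only))
```

With an infinite lower multiplier the lower limit drops to minus infinity, so only the high spike is reported.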

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from swarmpal.toolboxes.tfa import tfalib
>>> N = 1000
>>> t = np.linspace(0, 2*np.pi, N)
>>> x = np.sin(2*np.pi*t/np.pi) + np.random.randn(N)
>>> M = 100 # number of outliers
>>> A = 5   # intensity of outliers
>>> spkinds = np.random.permutation(N)[0:M]
>>> x[spkinds[0:int(np.floor(M/2))]] += A # first half to be increased
>>> x[spkinds[int(np.ceil(M/2)):]] -= A # second half to be decreased
>>> outlier_inds = tfalib.outliers(x, 25, method = 'iqr', multiplier = 1.5)
>>> plt.plot(t, x, 'xk', t[outlier_inds], x[outlier_inds], 'or')
>>> plt.show()
swarmpal.toolboxes.tfa.tfalib.wavelet_normalize(wave_sq_matrix, scales, dx, dj, wavelet_norm_factor)[source]#

Apply a normalization to the squared magnitude of the output of the wavelet transform so that its results are compatible with the FFT.

Parameters:
  • wave_sq_matrix ((M,N) Array like) – The square of the magnitude of the output of the wavelet transform

  • scales ((M,) array_like) – 1-D array with the values of the scales that were used for the wavelet transform

  • dx (float) – The time step of the data, i.e. the reciprocal of the sampling rate

  • dj (float) –

    The step size to use for generating the scales that will be used for the wavelet transform. Scales are generated using the form:

    scales = minScale * np.power(2, np.arange(0, M, dj))

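    with M being given by np.log2(maxScale/minScale)+dj, ensuring that the generated scales reach maxScale (the largest scale equals maxScale only when np.log2(maxScale/minScale) is a multiple of dj; otherwise it slightly exceeds it)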

  • wavelet_norm_factor (float) – The wavelet-specific normalization factor that needs to be applied

Returns:

normalized_wave_sq_matrix – The normalized square of the magnitude of the output of the wavelet transform

Return type:

(M,N) Array like

swarmpal.toolboxes.tfa.tfalib.wavelet_scales(minScale, maxScale, dj)[source]#
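This function carries no description here. Assuming that it follows the formula quoted for the dj parameter of wavelet_normalize() and wavelet_transform(), the scale grid can be sketched as below; the helper name scales_sketch is hypothetical.

```python
import numpy as np


def scales_sketch(min_scale, max_scale, dj):
    """Scales spaced by dj in log2, following the formula quoted in the docs."""
    m = np.log2(max_scale / min_scale) + dj
    return min_scale * np.power(2, np.arange(0, m, dj))


scales = scales_sketch(2, 1000, 0.1)
print(len(scales), scales[0], scales[-1])
```

For minScale = 2, maxScale = 1000 and dj = 0.1 this gives 91 scales. The last one is about 1024, i.e. the first grid value that reaches maxScale rather than maxScale itself.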
swarmpal.toolboxes.tfa.tfalib.wavelet_transform(x, dx, minScale, maxScale, wavelet_function=<function morlet_wave>, dj=0.1)[source]#

Apply the wavelet transform on time series data.

Parameters:
  • x ((N,) Array like) – Input time series

  • dx (float) – The time step of the data, i.e. the reciprocal of the sampling rate

  • wavelet_function (function) – The wavelet mother function to use in the transform

  • minScale (float) – The smallest scale to use for the wavelet transform

  • maxScale (float) – The largest scale to use for the wavelet transform

  • dj (float) –

    The step size to use for generating the scales that will be used for the wavelet transform. Scales are generated using the form:

    scales = minScale * np.power(2, np.arange(0, M, dj))

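    with M being given by np.log2(maxScale/minScale)+dj, ensuring that the generated scales reach maxScale (the largest scale equals maxScale only when np.log2(maxScale/minScale) is a multiple of dj; otherwise it slightly exceeds it)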

Returns:

  • wave_mat ((M,N) array_like) – 2-D array with the complex values of the wavelet transform. Each row is a different scale and each column a different moment in time

  • scales ((M,) array_like) – 1-D array with the values of the scales that were used for the wavelet transform

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
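>>> from swarmpal.toolboxes.tfa import tfalib
>>> fs = 8
>>> T = np.arange(0, 10000, 1/fs)
>>> N = len(T)
>>> dj = 0.1
>>> # assumed test signal for illustration: a sine wave with a 100 s period
>>> X = np.sin(2*np.pi*T/100)
>>> # positional arguments follow the signature: x, dx, minScale, maxScale, wavelet_function, dj
>>> W, scales = tfalib.wavelet_transform(X, 1/fs, 2, 1000, tfalib.morlet_wave, dj)
>>> Wsq = np.abs(W)**2
>>> log2scales = np.log2(scales)
>>> plt.figure()
>>> # flip the rows so that the largest scale is drawn at the top of the image
>>> plt.imshow(Wsq[::-1, :], aspect='auto',
...            extent=[T[0], T[-1], log2scales[0], log2scales[-1]])
>>> plt.yticks(np.arange(log2scales[0], log2scales[-1]+dj),
...            labels=2**np.arange(log2scales[0], log2scales[-1]+dj))
>>> plt.show()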