Data management

Management of data which can be saved in an h5 file.

These data structures provide a framework for saving a lot of heterogeneous data coherently to the _same_ h5 file, with each kind of data stored as a separate group.

Each subclass of SavableData has a default group name which will become the name of the h5 file group in which that data is saved.

class SavableData

Generic container for data which might need to be saved to file.

Subclasses should also be decorated with @dataclass.

Instances should only store numpy arrays, so that they remain compatible with h5py.

abstract property group_name: str

Name of the group this data should be saved as.

save_to_file(file: h5py.File) → None

Save the data to a group in an h5 file.
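The pattern described above (an abstract group name, dataclass subclasses, one group per container) can be illustrated with a minimal sketch. The names SavableSketch and ToyResiduals are hypothetical, and writing one dataset per dataclass field is an assumption; this is not the library's implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, fields
from typing import TYPE_CHECKING, ClassVar

import numpy as np

if TYPE_CHECKING:  # h5py is only needed for the type annotation below
    import h5py


class SavableSketch(ABC):
    """Hypothetical stand-in for SavableData (illustration only)."""

    @property
    @abstractmethod
    def group_name(self) -> str:
        """Name of the h5 group this data is saved in."""

    def save_to_file(self, file: "h5py.File") -> None:
        # Assumption: one group per container, one dataset per dataclass field.
        group = file.create_group(self.group_name)
        for field in fields(self):
            group.create_dataset(field.name, data=getattr(self, field.name))


@dataclass
class ToyResiduals(SavableSketch):
    """Hypothetical subclass; the class attribute overrides the abstract property."""

    amplitude_residuals: np.ndarray
    phase_residuals: np.ndarray

    group_name: ClassVar[str] = "toy_residuals"
```

With h5py installed, an instance could be written with something like ToyResiduals(amp, phase).save_to_file(h5py.File("data.h5", "w")). Note that create_group raises an error if the group already exists, so a real implementation would need some policy for overwriting.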

class Residuals(amplitude_residuals: numpy.ndarray, phase_residuals: numpy.ndarray)

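Dataclass which contains the amplitude and phase residuals for a set of waveforms. The sample frequencies are not stored here; methods which need them, such as flatten_phase, take them as an argument.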

Parameters

amplitude_residuals (np.ndarray) – Amplitude residuals. This array should have shape (number_of_waveforms, number_of_amplitude_sample_points).
phase_residuals (np.ndarray) – Phase residuals. This array should have shape (number_of_waveforms, number_of_phase_sample_points).

Class Attributes

group_name (str = “residuals”) – Name of the group in the h5 file these will be saved in.

property combined: numpy.ndarray

Combine the amplitude and phase residuals into a single array, with shape (number_of_waveforms, number_of_amplitude_sample_points+number_of_phase_sample_points).

Returns: Combined residuals.
Return type: np.ndarray
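The shape bookkeeping of the combined array can be checked with plain numpy. The array sizes below are arbitrary example values, and the arrays are placeholders rather than real residuals.

```python
import numpy as np

# Arbitrary example sizes.
n_waveforms, n_amp, n_phase = 3, 5, 7
amplitude_residuals = np.zeros((n_waveforms, n_amp))
phase_residuals = np.ones((n_waveforms, n_phase))

# Append the phase columns after the amplitude columns, waveform by waveform.
combined = np.concatenate((amplitude_residuals, phase_residuals), axis=1)
print(combined.shape)  # (3, 12)
```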

flatten_phase(frequencies: numpy.ndarray, first_section_flat: float = 0.2) → numpy.ndarray

Subtract a linear term from the phase, such that it is often close to 0.

Parameters

frequencies (np.ndarray) – Frequencies to which the phase points correspond. Required for the linear term subtraction.
first_section_flat (float, optional) – Fraction of the frequency array used to fix the linear term: the term is chosen so that the first phase residual is zero, and so is the one at the index corresponding to this fraction of the frequencies. Defaults to 0.2.

Returns

timeshifts – Timeshifts, in seconds if the frequencies given are in Hz.

Return type

np.ndarray
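The linear-term subtraction can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name flatten_phase_sketch is hypothetical, the anchor index is taken as the integer part of first_section_flat times the number of points, the timeshift is taken as the subtracted slope divided by 2π (sign and convention assumed), and the sketch returns a new array rather than modifying anything in place.

```python
import numpy as np


def flatten_phase_sketch(phase_residuals, frequencies, first_section_flat=0.2):
    """Remove a constant and a linear-in-frequency term from each row.

    phase_residuals has shape (number_of_waveforms, number_of_phase_sample_points),
    frequencies has shape (number_of_phase_sample_points,).
    """
    n_points = len(frequencies)
    # Assumed rule for picking the second anchor point; clamped to a valid index.
    index = min(n_points - 1, max(1, int(first_section_flat * n_points)))

    # Make the first residual of each waveform zero.
    shifted = phase_residuals - phase_residuals[:, :1]

    # Slope of the line through the first point and the anchor point.
    slopes = shifted[:, index] / (frequencies[index] - frequencies[0])

    # Subtract the line, so the residual at the anchor index is zero as well.
    flattened = shifted - np.outer(slopes, frequencies - frequencies[0])

    # A phase term 2*pi*f*dt corresponds to a time shift dt (assumed convention).
    timeshifts = slopes / (2 * np.pi)
    return flattened, timeshifts
```

For a phase which is exactly linear in frequency, the flattened residuals vanish everywhere and the returned timeshift equals the slope over 2π.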

classmethod from_combined_residuals(combined_residuals: np.ndarray, numbers_of_points: tuple[int, int]) → Residuals

Generate object from a np.ndarray containing the combined residuals: amplitude and phase appended to each other.

The number of points these will each contain is given as the argument numbers_of_points == (amp_points, phase_points).
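Splitting a combined array back into its two parts amounts to slicing along the last axis. The helper below, split_combined, is a hypothetical name used only for illustration; it assumes the amplitude columns come first, as in the combined shape described earlier.

```python
import numpy as np


def split_combined(combined_residuals, numbers_of_points):
    """Split a (n_waveforms, amp_points + phase_points) array into two arrays."""
    amp_points, phase_points = numbers_of_points
    if combined_residuals.shape[-1] != amp_points + phase_points:
        raise ValueError("numbers_of_points does not match the combined array")
    amplitude_residuals = combined_residuals[:, :amp_points]
    phase_residuals = combined_residuals[:, amp_points:]
    return amplitude_residuals, phase_residuals


example = np.arange(24.0).reshape(2, 12)
amp, phase = split_combined(example, (5, 7))
print(amp.shape, phase.shape)  # (2, 5) (2, 7)
```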

classmethod from_two_waveform_datasets(waveforms_1: mlgw_bns.data_management.FDWaveforms, waveforms_2: mlgw_bns.data_management.FDWaveforms)

Create a dataset of residuals corresponding to the difference between the two given datasets.

Parameters

waveforms_1 (FDWaveforms) – First waveform dataset.
waveforms_2 (FDWaveforms) – Second waveform dataset.

class DownsamplingIndices(amplitude_indices: list[int], phase_indices: list[int])

Indices to be used to select a subset of frequencies at which the waveform’s amplitude and phase can be sampled while retaining a good degree of accuracy.

The corresponding frequencies are the ones obtained when slicing the “standard frequency array” at those indices.

This array can be obtained by accessing the frequencies (natural units) or frequencies_hz (SI units) property of a Dataset instance.

It is an equally-spaced frequency array, starting at an initial frequency, finishing at half of the time-domain interpolation rate, with a step equal to the inverse of the length of the time-domain waveform.
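The construction of the standard frequency array and its slicing can be sketched with numpy. All numerical values and index lists below are made-up examples, and including the endpoint at half the rate is an assumption about the convention.

```python
import numpy as np

# Made-up example values.
initial_frequency_hz = 20.0
time_domain_rate_hz = 4096.0
waveform_length_s = 4.0

# The step is the inverse of the time-domain waveform length.
delta_f = 1.0 / waveform_length_s

# Equally spaced grid from the initial frequency up to half the rate.
# The stop value is padded by half a step so that the endpoint is included.
frequencies_hz = np.arange(
    initial_frequency_hz, time_domain_rate_hz / 2 + delta_f / 2, delta_f
)

# Hypothetical downsampling indices; amplitude and phase may use different ones.
amplitude_indices = [0, 10, 100, 1000]
phase_indices = [0, 5, 50, 500, 5000]

amplitude_frequencies = frequencies_hz[amplitude_indices]
phase_frequencies = frequencies_hz[phase_indices]
```

Indexing a numpy array with a list of integers returns the elements at those positions, so the two index lists select two (generally different) subsets of the same grid.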

Parameters

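amplitude_indices (list[int]) – Indices of the frequencies at which the amplitude is sampled.
phase_indices (list[int]) – Indices of the frequencies at which the phase is sampled.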

class FDWaveforms(amplitudes: numpy.ndarray, phases: numpy.ndarray)

Dataclass which contains the amplitude and phase of a set of frequency-domain waveforms.

Parameters

amplitudes (np.ndarray) – Amplitude of the waveforms. An array with shape (n_waveforms, n_samples_amp).
phases (np.ndarray) – Phases of the waveforms. An array with shape (n_waveforms, n_samples_phi).

Class Attributes

group_name (str) – Defaults to “waveforms”.

class PrincipalComponentData(eigenvectors: numpy.ndarray, eigenvalues: numpy.ndarray, mean: numpy.ndarray, principal_components_scaling: numpy.ndarray)

Dataclass which contains all the data required for a PCA model to work: eigenvalues and eigenvectors of the covariance matrix, mean of the data, and reference scaling for the principal component representation.

In the parameter definitions, the number of dimensions is the \(N\) such that each data point belongs to \(\mathbb{R}^N\), while the number of components, typically denoted as \(K\), is the number of the principal components we choose to keep when reducing the dimensionality of the data.

Parameters

eigenvectors (np.ndarray) – Eigenvectors from the PCA. This array should have shape (number_of_dimensions, number_of_components).
eigenvalues (np.ndarray) – Eigenvalues from the PCA. This array should have shape (number_of_components, ).
mean (np.ndarray) – Mean subtracted from the data before decomposing the covariance matrix. This array should have shape (number_of_dimensions, ).
principal_components_scaling (np.ndarray) – Scale by which to divide the principal components, typically computed as the maximum of each in the training. Dividing the principal components by this allows for their scale to always be between 0 and 1. This array should have shape (number_of_components, ).
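How the four arrays fit together can be seen in a small numpy sketch. The function names are hypothetical, and the choices made here (np.cov for the covariance, the maximum absolute value over the training set as the scaling) are assumptions for illustration rather than the library's procedure.

```python
import numpy as np


def fit_pca_sketch(data, number_of_components):
    """data has shape (number_of_points, number_of_dimensions)."""
    mean = data.mean(axis=0)
    centered = data - mean
    covariance = np.cov(centered, rowvar=False)

    # eigh returns eigenvalues in ascending order; keep the K largest.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:number_of_components]
    eigenvalues = eigenvalues[order]          # shape (K,)
    eigenvectors = eigenvectors[:, order]     # shape (N, K)

    # Assumed scaling: largest magnitude of each component on the training data.
    components = centered @ eigenvectors
    scaling = np.max(np.abs(components), axis=0)  # shape (K,)
    return eigenvectors, eigenvalues, mean, scaling


def reduce_sketch(data, eigenvectors, mean, scaling):
    """Project onto the kept eigenvectors and rescale each component."""
    return (data - mean) @ eigenvectors / scaling


def reconstruct_sketch(reduced, eigenvectors, mean, scaling):
    """Invert the rescaling and the projection (exact only when K equals N)."""
    return (reduced * scaling) @ eigenvectors.T + mean
```

With this choice of scaling, the rescaled components of the training data have magnitude at most 1; data outside the training set may exceed that.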

phase_unwrapping(waveform_cartesian: np.ndarray, eps: float = 0.01, set_zero_at_start: bool = True) → tuple[np.ndarray, np.ndarray]

Starting from an array of cartesian-form complex numbers, returns two real arrays: amplitude and phase.
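The conversion from cartesian form to amplitude and phase can be sketched with standard numpy calls. This sketch, phase_unwrapping_sketch, is a hypothetical stand-in: it relies on np.unwrap, and it leaves out the eps parameter because its role is not described here.

```python
import numpy as np


def phase_unwrapping_sketch(waveform_cartesian, set_zero_at_start=True):
    """Return (amplitude, phase) for complex samples along the last axis."""
    amplitude = np.abs(waveform_cartesian)
    # np.angle gives values in (-pi, pi]; np.unwrap removes the 2*pi jumps.
    phase = np.unwrap(np.angle(waveform_cartesian), axis=-1)
    if set_zero_at_start:
        # Shift so that the first phase sample is exactly zero.
        phase = phase - phase[..., :1]
    return amplitude, phase
```

np.unwrap can only recover a continuous phase when consecutive samples differ by less than π, so the input needs to be sampled densely enough.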