Data management
Management of data which can be saved in an h5 file.
The idea of these data structures is to provide a framework so that a lot of heterogeneous data can be saved coherently to the _same_ h5 file, with each kind of data stored in a separate group.
Each subclass of SavableData has a default group name, which becomes the name of the h5 file group in which that data is saved.
- class SavableData[source]
Generic container for data which might need to be saved to file.
Subclasses should also be decorated with @dataclass.
Instances should only store numpy arrays, so that the h5py compatibility will work.
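The grouping scheme described above can be sketched as follows. This is only an illustration, not the library's implementation: ToyResiduals, ToyWaveforms, save_to_file and load_from_file are hypothetical names, and the only assumptions carried over from the text are that each class has a group_name and stores numpy arrays as dataclass fields.

```python
import os
import tempfile
from dataclasses import dataclass, fields

import h5py
import numpy as np


@dataclass
class ToyResiduals:
    """Hypothetical stand-in for a SavableData subclass."""

    amplitude_residuals: np.ndarray
    phase_residuals: np.ndarray

    # No type annotation, so this is a plain class attribute, not a dataclass field.
    group_name = "residuals"


@dataclass
class ToyWaveforms:
    """Second hypothetical subclass, saved to a different group."""

    amplitudes: np.ndarray
    phases: np.ndarray

    group_name = "waveforms"


def save_to_file(obj, file):
    """Write every array field of obj as a dataset inside the group obj.group_name."""
    group = file.require_group(obj.group_name)
    for field in fields(obj):
        group.create_dataset(field.name, data=getattr(obj, field.name))


def load_from_file(cls, file):
    """Rebuild an instance of cls from the datasets in the group cls.group_name."""
    group = file[cls.group_name]
    return cls(**{field.name: group[field.name][...] for field in fields(cls)})


residuals = ToyResiduals(np.zeros((3, 4)), np.ones((3, 5)))
waveforms = ToyWaveforms(np.full((3, 4), 2.0), np.full((3, 5), 3.0))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h5")
    # Both objects go into the same file, each under its own group.
    with h5py.File(path, "w") as f:
        save_to_file(residuals, f)
        save_to_file(waveforms, f)
    with h5py.File(path, "r") as f:
        group_names = sorted(f.keys())
        loaded = load_from_file(ToyResiduals, f)
```

Restricting the fields to numpy arrays is what keeps the save and load functions this short: every field maps directly to one h5py dataset.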
- class Residuals(amplitude_residuals: numpy.ndarray, phase_residuals: numpy.ndarray)[source]
Dataclass which contains a set of sample frequencies as well as amplitude and phase residuals.
- Parameters
amplitude_residuals (np.ndarray) – Amplitude residuals. This array should have shape (number_of_waveforms, number_of_amplitude_sample_points).
phase_residuals (np.ndarray) – Phase residuals. This array should have shape (number_of_waveforms, number_of_phase_sample_points).
- Class Attributes
group_name (str = “residuals”) – Name of the group in the h5 file these will be saved in.
- property combined: numpy.ndarray
Combine the amplitude and phase residuals into a single array, with shape (number_of_waveforms, number_of_amplitude_sample_points+number_of_phase_sample_points).
- Returns
Combined residuals.
- Return type
np.ndarray
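As a shape check, the combination described for this property is consistent with concatenating the two arrays along the sample axis. The snippet below is a sketch with toy arrays; that the property uses np.concatenate internally is an assumption.

```python
import numpy as np

# Toy data: 3 waveforms, 4 amplitude sample points, 5 phase sample points.
amplitude_residuals = np.zeros((3, 4))
phase_residuals = np.ones((3, 5))

# Append the phase columns after the amplitude columns, waveform by waveform.
combined = np.concatenate((amplitude_residuals, phase_residuals), axis=1)
```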
- flatten_phase(frequencies: numpy.ndarray, first_section_flat: float = 0.2) numpy.ndarray[source]
Subtract a linear term from the phase, such that it is often close to 0.
- Parameters
frequencies (np.ndarray) – Frequencies to which the phase points correspond. Required for the linear term subtraction.
first_section_flat (float, optional) – The linear term is chosen so that the first phase residual is zero, and so is the one corresponding to this fraction of the frequencies. Defaults to 0.2.
- Returns
timeshifts – Timeshifts, in seconds if the frequencies given are in Hz.
- Return type
np.ndarray
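One way to read the description above is sketched below. It is not the library's code: flatten_phase_sketch is a hypothetical function, the returned time shift is assumed to be the subtracted slope divided by 2π, and the sign convention is likewise an assumption.

```python
import numpy as np


def flatten_phase_sketch(phase_residuals, frequencies, first_section_flat=0.2):
    """Remove a constant and a linear-in-frequency term from each row of phases.

    After the subtraction, the phase is zero at the first frequency and at the
    frequency found at the fraction first_section_flat of the array.
    """
    index = int(first_section_flat * frequencies.shape[0])

    # Constant term: make the first phase residual of every waveform zero.
    shifted = phase_residuals - phase_residuals[:, :1]

    # Linear term: slope of the line through the first point and the point at `index`.
    slopes = shifted[:, index] / (frequencies[index] - frequencies[0])
    flattened = shifted - slopes[:, np.newaxis] * (frequencies - frequencies[0])[np.newaxis, :]

    # Assumed convention: a phase 2*pi*f*t corresponds to a time shift t.
    timeshifts = slopes / (2 * np.pi)
    return flattened, timeshifts


# Toy input: phases that are exactly linear in frequency, plus a constant offset.
frequencies = np.linspace(20.0, 2048.0, 101)
true_shifts = np.array([0.01, -0.003])
phase = 2 * np.pi * true_shifts[:, np.newaxis] * frequencies[np.newaxis, :] + 1.5

flattened, timeshifts = flatten_phase_sketch(phase, frequencies)
```

For purely linear input such as this, the flattened phase vanishes everywhere and the recovered time shifts match the ones used to build the toy data.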
- classmethod from_combined_residuals(combined_residuals: np.ndarray, numbers_of_points: tuple[int, int]) Residuals[source]
Generate object from a np.ndarray containing the combined residuals: amplitude and phase appended to each other.
The number of points these will each contain is given as the argument numbers_of_points == (amp_points, phase_points).
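The inverse of the combination can be sketched as a column split. split_combined is a hypothetical helper, not part of the documented API, and the shape check is an added safeguard rather than documented behaviour.

```python
import numpy as np


def split_combined(combined_residuals, numbers_of_points):
    """Split a combined array into (amplitude, phase) blocks of columns."""
    amp_points, phase_points = numbers_of_points
    if combined_residuals.shape[1] != amp_points + phase_points:
        raise ValueError("numbers_of_points does not match the array width")
    # The first amp_points columns are amplitude, the remaining ones are phase.
    return combined_residuals[:, :amp_points], combined_residuals[:, amp_points:]


combined = np.arange(18.0).reshape(2, 9)
amplitude_residuals, phase_residuals = split_combined(combined, (4, 5))
```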
- classmethod from_two_waveform_datasets(waveforms_1: mlgw_bns.data_management.FDWaveforms, waveforms_2: mlgw_bns.data_management.FDWaveforms)[source]
Create a dataset of residuals corresponding to the difference between the two given datasets.
- Parameters
waveforms_1 (FDWaveforms) –
waveforms_2 (FDWaveforms) –
- class DownsamplingIndices(amplitude_indices: list[int], phase_indices: list[int])[source]
Indices to be used to select a subset of frequencies at which the waveform’s amplitude and phase can be sampled while retaining a good degree of accuracy.
The corresponding frequencies are the ones obtained when slicing the “standard frequency array” at those indices.
This array can be obtained by accessing the frequencies (natural units) or frequencies_hz (SI units) property of a Dataset instance.
It is an equally-spaced frequency array, starting at an initial frequency, finishing at half of the time-domain interpolation rate, with a step equal to the inverse of the length of the time-domain waveform.
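The construction of the standard frequency array and its slicing can be illustrated with plain numpy. The numerical values and index lists below are made up for the example, and the array is built by hand here rather than taken from a Dataset instance.

```python
import numpy as np

# Illustrative values only, in SI units.
initial_frequency_hz = 20.0
interpolation_rate_hz = 4096.0  # time-domain interpolation rate
duration_s = 4.0                # length of the time-domain waveform

# Step equal to the inverse of the waveform length, ending at half the rate.
delta_f = 1.0 / duration_s
frequencies_hz = np.arange(
    initial_frequency_hz, interpolation_rate_hz / 2 + delta_f / 2, delta_f
)

# Hypothetical downsampling indices: amplitude and phase may use different subsets.
amplitude_indices = [0, 10, 100, 1000, 8112]
phase_indices = [0, 5, 50, 500, 5000, 8112]

# Slicing the standard array at the indices gives the sample frequencies.
amplitude_frequencies = frequencies_hz[amplitude_indices]
phase_frequencies = frequencies_hz[phase_indices]
```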
- class FDWaveforms(amplitudes: numpy.ndarray, phases: numpy.ndarray)[source]
Dataclass which contains the amplitude and phase of a set of frequency-domain waveforms.
- Parameters
amplitudes (np.ndarray) – Amplitude of the waveforms. An array with shape (n_waveforms, n_samples_amp).
phases (np.ndarray) – Phases of the waveforms. An array with shape (n_waveforms, n_samples_phi).
- Class Attributes
group_name (str) – Defaults to “waveforms”.
- class PrincipalComponentData(eigenvectors: numpy.ndarray, eigenvalues: numpy.ndarray, mean: numpy.ndarray, principal_components_scaling: numpy.ndarray)[source]
Dataclass which contains all the data required for a PCA model to work: eigenvalues and eigenvectors of the covariance matrix, mean of the data, and reference scaling for the principal component representation.
In the parameter definitions, the number of dimensions is the \(N\) such that each data point belongs to \(\mathbb{R}^N\), while the number of components, typically denoted as \(K\), is the number of the principal components we choose to keep when reducing the dimensionality of the data.
- Parameters
eigenvectors (np.ndarray) – Eigenvectors from the PCA. This array should have shape (number_of_dimensions, number_of_components).
eigenvalues (np.ndarray) – Eigenvalues from the PCA. This array should have shape (number_of_components, ).
mean (np.ndarray) – Mean subtracted from the data before decomposing the covariance matrix. This array should have shape (number_of_dimensions, ).
principal_components_scaling (np.ndarray) – Scale by which to divide the principal components, typically computed as the maximum of each in the training. Dividing the eigenvalues by this allows for the scale of the principal components to always be between 0 and 1. This array should have shape (number_of_components, ).
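To make the four arrays and their shapes concrete, here is a sketch of a PCA computed directly with numpy on random toy data. It is not the library's training code. In particular, taking the scaling as the maximum absolute value of each component over the training set is an assumption, and with that choice the scaled components lie between -1 and 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_dims, n_components = 200, 6, 3  # N = 6 dimensions, K = 3 components

# Toy training data with correlated dimensions.
data = rng.normal(size=(n_points, n_dims)) @ rng.normal(size=(n_dims, n_dims))

# Mean subtracted before decomposing the covariance matrix: shape (N,).
mean = data.mean(axis=0)
covariance = np.cov(data - mean, rowvar=False)  # shape (N, N)

# eigh returns eigenvalues in ascending order; keep the K largest.
all_eigenvalues, all_eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(all_eigenvalues)[::-1][:n_components]
eigenvalues = all_eigenvalues[order]        # shape (K,)
eigenvectors = all_eigenvectors[:, order]   # shape (N, K)

# Project the data onto the kept eigenvectors: shape (n_points, K).
components = (data - mean) @ eigenvectors

# Assumed scaling: largest absolute value of each component in the training set.
principal_components_scaling = np.max(np.abs(components), axis=0)  # shape (K,)
scaled_components = components / principal_components_scaling

# Approximate reconstruction from the reduced representation.
reconstructed = (scaled_components * principal_components_scaling) @ eigenvectors.T + mean
```

The four arrays mean, eigenvalues, eigenvectors and principal_components_scaling are all that is needed to move between the full and the reduced representation, which is why they are stored together.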