Tools

This module contains helper functions and classes that are not dependent on any of DISSTANS’s classes.

For more specialized processing functions, see processing.

Functions

best_utmzone

disstans.tools.best_utmzone(longitudes)[source]

Given a list of longitudes, find the UTM zone that is appropriate.

Parameters: longitudes (ndarray) – Array of longitudes [°].
Return type: int
Returns: UTM zone at the average input longitude.
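
As an illustration, the standard relationship between a longitude and its UTM zone number can be sketched as follows. This is a minimal sketch and not necessarily how DISSTANS computes the zone: the helper name utm_zone_sketch is hypothetical, and the special zones around Norway and Svalbard as well as inputs straddling the antimeridian are ignored.

```python
import numpy as np


def utm_zone_sketch(longitudes):
    """Return the UTM zone (1-60) at the mean of the input longitudes [°]."""
    lon_mean = float(np.mean(longitudes))
    # wrap the mean longitude into the interval [-180, 180)
    lon_wrapped = (lon_mean + 180.0) % 360.0 - 180.0
    # zones are 6° wide, and zone 1 starts at -180°
    return int((lon_wrapped + 180.0) // 6) + 1
```

Averaging first and converting second mirrors the documented behavior of returning the zone at the average input longitude.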

block_permutation

disstans.tools.block_permutation(n_outer, n_inner)[source]

Convenience function to calculate a permutation matrix used to rearrange (permute) blockwise-ordered submatrices in a big matrix. n_outer outside blocks of individual n_inner-sized blocks will become n_inner outside blocks of individual n_outer-sized blocks.

Transposing the result is equivalent to calling this function with swapped arguments.

Parameters

n_outer (int) – Number of sub-matrices.
n_inner (int) – Size of the individual sub-matrices.

Return type

ndarray

Returns

Square permutation matrix with dimensions \(n = \text{n_outer} * \text{n_inner}\). To permute a matrix \(A\), calculate \(P A P^T\).

Example

>>> import numpy as np
>>> from disstans.tools import block_permutation
>>> n_outer, n_inner = 2, 2
>>> A = np.block([[np.arange(n_inner**2).reshape(n_inner, n_inner),
...                np.zeros((n_inner, n_inner))], [np.zeros((n_inner, n_inner)),
...                np.ones((n_inner, n_inner))]])
>>> A
array([[0., 1., 0., 0.],
       [2., 3., 0., 0.],
       [0., 0., 1., 1.],
       [0., 0., 1., 1.]])
>>> P = block_permutation(n_outer, n_inner)
>>> P @ A @ P.T
array([[0., 0., 1., 0.],
       [0., 1., 0., 1.],
       [2., 0., 3., 0.],
       [0., 1., 0., 1.]])

cov2corr

disstans.tools.cov2corr(cov)[source]

Function that converts a covariance matrix into a (Pearson) correlation matrix, taking into account zero-valued variances and setting the respective correlation entries to NaN.

Parameters: cov (ndarray) – Covariance matrix.
Return type: ndarray
Returns: Correlation matrix.
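
The conversion described above can be sketched with plain NumPy. This is a minimal sketch under the assumption that any entry involving a zero variance should become NaN; the name cov2corr_sketch is hypothetical and this is not the DISSTANS source.

```python
import numpy as np


def cov2corr_sketch(cov):
    """Convert a covariance matrix into a Pearson correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    # dividing by a zero standard deviation yields inf or nan, so the
    # floating-point warnings are silenced and the entries fixed afterwards
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = cov / np.outer(std, std)
    # every entry that involves a zero variance is set to NaN
    corr[~np.isfinite(corr)] = np.nan
    return corr
```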

create_powerlaw_noise

disstans.tools.create_powerlaw_noise(size, exponent, seed=None)[source]

Creates synthetic noise according to a Power Law model [langbein04].

Parameters

size (int | list | tuple) – Number of (equally-spaced) noise samples of the output noise array or a shape where the first entry defines the number of noise samples for the remaining dimensions.
exponent (int) – Exponent of the power law noise model. E.g. 0 corresponds to white (Gaussian) noise, 1 to flicker (pink) noise, and 2 to random walk (red, Brownian) noise.
seed (UnionType[int, Generator, None], default: None) – Pass an initial seed to the random number generator, or pass a Generator instance.

Return type

ndarray

Returns

Noise output array.

Notes

This function uses Timmer and König’s [timmerkoenig95] approach to generate the noise, and Felix Patzelt’s colorednoise code to calculate the theoretical standard deviation.

References

langbein04: Langbein, J. (2004), Noise in two-color electronic distance meter measurements revisited, J. Geophys. Res., 109, B04406, doi:10.1029/2003JB002819.
timmerkoenig95: Timmer, J.; König, M. (1995), On generating power law noise, Astronomy and Astrophysics, v.300, p.707.
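
To illustrate the idea behind the Timmer and König approach, the following minimal sketch draws Gaussian Fourier coefficients, scales their amplitude by \(f^{-\text{exponent}/2}\), and transforms back to the time domain. It is an assumption-laden simplification: the name powerlaw_noise_sketch is hypothetical, only 1D output is supported, and the result is normalized by its sample standard deviation rather than by the theoretical one mentioned in the Notes.

```python
import numpy as np


def powerlaw_noise_sketch(size, exponent, seed=None):
    """Generate `size` samples of unit-variance power-law noise."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(size)
    # amplitude scaling: power ~ f^(-exponent), so amplitude ~ f^(-exponent/2)
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-exponent / 2)  # the zero frequency stays at 0
    # independent Gaussian real and imaginary parts for each frequency
    real = rng.standard_normal(freqs.size) * scale
    imag = rng.standard_normal(freqs.size) * scale
    noise = np.fft.irfft(real + 1j * imag, n=size)
    # normalize with the sample standard deviation (simplification)
    return noise / noise.std()
```

With exponent=0 the scaling is flat and the output is white noise; larger exponents put more power into the low frequencies.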

date2decyear

disstans.tools.date2decyear(dates)[source]

Convert dates (just year, month, day, each day assumed to be centered at noon) to decimal years, assuming all years have 365.25 days (JPL convention for GIPSY timeseries, also used by UNR NGL).

Parameters: dates (Series | DatetimeIndex | Timestamp | datetime) – Input date(s). If a Series, needs to be a series of Timestamp-convertible data types.
Return type: ndarray
Returns: Date(s) as sorted decimal year(s).

download_unr_data

disstans.tools.download_unr_data(station_list_or_bbox, data_dir, solution='final', rate='24h', reference='IGS14', min_solutions=100, t_min=None, t_max=None, verbose=False, no_pbar=False)[source]

Downloads GNSS timeseries data from the University of Nevada at Reno’s Nevada Geodetic Laboratory. When using this data, please cite [blewitt18], as well as all the original data providers (the relevant info will be downloaded as well).

Files will only be downloaded if there is no matching file already present, or the remote file is newer than the local one.

Parameters

station_list_or_bbox (list[str] | list[float]) – Defines which stations to look for data and download. It can be either a list of station names (list of strings), a list of bounding box coordinates (the four floats [lon_min, lon_max, lat_min, lat_max] in degrees), or a three-element list defining a circle (location in degrees and radius in kilometers [center_lon, center_lat, radius]).
data_dir (str) – Folder for data.
solution (Literal['final', 'rapid', 'ultra'], default: 'final') – Which timeseries solution to download. See the Notes for approximate latency times.
rate (Literal['24h', '5min'], default: '24h') – Which sample rate to download. See the Notes for a table of which rates are available for each solution.
reference (str, default: 'IGS14') – The UNR abbreviation for the reference frame in which to download the data. Applies only for daily sample rates and final or rapid orbit solutions.
min_solutions (int, default: 100) – Only consider stations with at least a certain number of all-time solutions according to the station list file.
t_min (UnionType[str, Timestamp, None], default: None) – Only consider stations that have data on or after t_min.
t_max (UnionType[str, Timestamp, None], default: None) – Only consider stations that have data on or before t_max.
verbose (bool, default: False) – If True, individual actions are printed.
no_pbar (bool, default: False) – Suppress the progress bar with True.

Return type

DataFrame

Returns

A DataFrame, built from UNR’s data holding list, subset to the stations actually selected for download.

Notes

The following combinations of solution and sample rates are available. Note that not all stations are equipped to provide all data types. Furthermore, only the daily files will be available in a plate reference frame.

orbit solutions	24 hours	5 minutes	latency
final	yes	yes	approx. 2 weeks
rapid	yes	yes	approx. 24 hours
ultra	no	yes	approx. 2 hours

Warning

It is your responsibility that different reference frames or solution types are not downloaded into the same folders, because this could lead to the overwriting of data or ambiguities as to which files represent which solutions. This is because this script does not rename files or change the folder structure that it finds on UNR’s servers.

References

blewitt18: Blewitt, G., Hammond, W., & Kreemer, C. (2018). Harnessing the GPS Data Explosion for Interdisciplinary Science. Eos, 99. doi:10.1029/2018EO104623

See also

parse_unr_steps: Function to download and parse UNR’s main step file.

estimate_euler_pole

disstans.tools.estimate_euler_pole(locations, velocities, covariances=None, enu=True)[source]

Estimate a best-fit Euler pole assuming all velocities lie on the same rigid plate on a sphere. The calculations are based on [goudarzi14].

Parameters

locations (ndarray) – Array of shape \((\text{num_stations}, \text{num_components})\) containing the locations of each station (observation), where \(\text{num_components}=2\) if the locations are given by longitudes and latitudes [°] (enu=True) or \(\text{num_components}=3\) if the locations are given in the cartesian Earth-Centered, Earth-Fixed (ECEF) reference frame [m] (enu=False).
velocities (ndarray) – Array of shape \((\text{num_stations}, \text{num_components})\) containing the velocities [m/time] at different stations (observations), where \(\text{num_components}=2\) if the velocities are given in the East-North local geodetic reference frame (enu=True) or \(\text{num_components}=3\) if the velocities are given in the cartesian Earth-Centered, Earth-Fixed (ECEF) reference frame (enu=False).
covariances (Optional[ndarray], default: None) – Array containing the (co)variances of the velocities [m^2/time^2], allowing for different input shapes depending on what uncertainties are available. If None, all observations are weighted equally. If enu=True, the array should have shape \((\text{num_stations}, 2)\) if only variances are present, \((\text{num_stations}, 3)\) if the covariances are also present but given as a column, or \((\text{num_stations}, 2, 2)\) if the full \(2 \times 2\) covariance matrices are given. If enu=False, the array should have shape \((\text{num_stations}, 3)\), \((\text{num_stations}, 6)\), or \((\text{num_stations}, 3, 3)\), respectively.
enu (bool, default: True) – See locations and velocities.

Return type

tuple[ndarray, ndarray]

Returns

rotation_vector – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.
rotation_covariance – Formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.

Notes

The ENU solution assumes a spherical Earth with radius 6378137 meters.

If the covariances are given in columns, the formatting of Timeseries is being used.

Contrary to [goudarzi14], the estimated covariance matrix is not scaled by the a posteriori sigma, to match the covariance definition throughout the rest of DISSTANS. Also unlike [goudarzi14], the time unit is not assumed to be years, and the results are not rescaled to millions of years.

See also

rotvec2eulerpole: Convert the rotation vector into an Euler pole and magnitude.

References

goudarzi14: Goudarzi, M. A., Cocard, M., & Santerre, R. (2014), EPC: Matlab software to estimate Euler pole parameters, GPS Solutions, 18(1), 153–162, doi:10.1007/s10291-013-0354-4.
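
For the ECEF case (enu=False), the rigid-plate assumption means that each velocity is \(\mathbf{v} = \mathbf{\omega} \times \mathbf{r}\), which is linear in the rotation vector. The following minimal sketch solves that linear system with unweighted least squares. It is not the DISSTANS implementation: the function names are hypothetical, and covariances and the ENU case are left out.

```python
import numpy as np


def skew(r):
    """Return the matrix [r]_x such that [r]_x @ w equals np.cross(r, w)."""
    x, y, z = r
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def fit_rotation_vector_sketch(locations_ecef, velocities_ecef):
    """Unweighted least-squares rotation vector from ECEF data."""
    # v = omega x r = -(r x omega) = -[r]_x @ omega, stacked for all stations
    G = np.concatenate([-skew(r) for r in locations_ecef], axis=0)
    d = np.asarray(velocities_ecef, dtype=float).ravel()
    omega, *_ = np.linalg.lstsq(G, d, rcond=None)
    return omega
```

Because each station contributes three rows but constrains only two independent directions, at least two stations at different locations are needed to resolve all three components.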

eulerpole2rotvec

disstans.tools.eulerpole2rotvec(euler_pole, euler_pole_covariance=None)[source]

Convert an Euler pole (and optionally, its formal covariance) into a rotation vector and associated covariance matrix. Based on [goudarzi14].

Parameters

euler_pole (ndarray) – NumPy Array containing the longitude [rad], latitude [rad], and rotation rate [rad/time] of the Euler pole.
euler_pole_covariance (Optional[ndarray], default: None) – Formal covariance of the Euler pole's three components (longitude, latitude, and rotation rate).

Return type

tuple[ndarray, ...]

Returns

rotation_vector – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.
rotation_covariance – If euler_pole_covariance was given, formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.

See also

rotvec2eulerpole: Inverse function
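
Ignoring the covariance propagation, the conversion between the two representations is the usual spherical-to-cartesian transformation. The sketch below assumes the component order documented above (longitude [rad], latitude [rad], rotation rate); the function names are hypothetical and the code is not taken from DISSTANS.

```python
import numpy as np


def eulerpole_to_rotvec_sketch(euler_pole):
    """Euler pole (lon [rad], lat [rad], rate) to cartesian rotation vector."""
    lon, lat, rate = euler_pole
    return rate * np.array([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])


def rotvec_to_eulerpole_sketch(rotation_vector):
    """Cartesian rotation vector to Euler pole (lon [rad], lat [rad], rate)."""
    wx, wy, wz = rotation_vector
    rate = np.sqrt(wx**2 + wy**2 + wz**2)
    lon = np.arctan2(wy, wx)
    lat = np.arcsin(wz / rate)
    return np.array([lon, lat, rate])
```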

full_cov_mat_to_columns

disstans.tools.full_cov_mat_to_columns(cov_mat, num_components, include_covariance=False, return_single=False)[source]

Converts a full variance(-covariance) matrix with multiple components into a column-based representation like the one used by Model or Timeseries. This extraction implicitly assumes that the cross-parameter/cross-observation covariances are negligible.

It is assumed that the individual elements are ordered such that all components of one parameter or observation are in neighboring rows/columns (i.e. the first parameter or observation occupies the first num_components rows/columns, the second one the second num_components rows/columns, etc.).

Parameters

cov_mat (ndarray) – Square array with dimensions \(\text{num_elements} * \text{num_components}\) where \(\text{num_elements}\) is the number of elements (e.g. observations or parameters) in each of the \(\text{num_components}\) dimensions.
num_components (int) – Number of components cov_mat contains.
include_covariance (bool, default: False) – If True, also extract the off-diagonal covariances of each element between its components. Defaults to False, i.e. only the variances on the diagonal are extracted.
return_single (bool, default: False) – If False, return two arrays; if True, concatenate the two.

Return type

tuple[ndarray, ...]

Returns

variance – Array of shape \((\text{num_elements}, \text{num_components})\). If include_covariance=True and return_single=True, this array is concatenated horizontally with covariance, leading to shape \((\text{num_elements}, \text{num_components} + (\text{num_components}*(\text{num_components}-1))/2)\) instead.
covariance – If include_covariance=True and return_single=False, array of shape \((\text{num_elements}, (\text{num_components}*(\text{num_components}-1))/2)\).
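
The extraction can be pictured as reading the block diagonal of cov_mat, one num_components-sized block per element. The following sketch is a hypothetical re-implementation (cov_columns_sketch is not a DISSTANS function) that assumes the covariance columns follow the row-major upper-triangle ordering described for make_cov_index_map.

```python
import numpy as np


def cov_columns_sketch(cov_mat, num_components):
    """Extract per-element variance and covariance columns."""
    cov_mat = np.asarray(cov_mat, dtype=float)
    num_elements = cov_mat.shape[0] // num_components
    # variances are simply the main diagonal, regrouped by element
    variance = np.diag(cov_mat).reshape(num_elements, num_components)
    # covariances are the strict upper triangle of each diagonal block
    rows, cols = np.triu_indices(num_components, k=1)
    blocks = [cov_mat[i * num_components:(i + 1) * num_components,
                      i * num_components:(i + 1) * num_components]
              for i in range(num_elements)]
    covariance = np.stack([block[rows, cols] for block in blocks])
    return variance, covariance
```

Everything outside the diagonal blocks is discarded, which is where the assumption of negligible cross-element covariance enters.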

get_cov_dims

disstans.tools.get_cov_dims(num_components)[source]

Given a number of components, return the number of covariances that exist between the components.

Parameters: num_components (int) – Number of components of timeseries or model.
Return type: int
Returns: Number of covariances, calculated as \((\text{num_components}*(\text{num_components}-1))/2\).

See also

make_cov_index_map: For an example.

make_cov_index_map

disstans.tools.make_cov_index_map(num_components)[source]

Given a number of components, create a matrix that shows where the covariance columns present in a timeseries’ or model’s 2D dataframe appear in the covariance matrix of a single observation or parameter. Also provides the ordering in a 1D array which can be used together with reshape() to create the variance-covariance matrix from the columns.

Parameters

num_components (int) – Number of components of timeseries or model.

Return type

tuple[ndarray, ndarray]

Returns

index_map – Array of shape \((\text{num_components}, \text{num_components})\) that is NaN everywhere except in the upper triangle, where integer numbers denote where the columns of a timeseries’ or model’s 2D dataframe belong.
var_cov_map – Array of shape \((\text{num_components}^2, )\) that can be used to assemble the variance-covariance matrix from the columns given a particular timestep or parameter.

Example

>>> import numpy as np
>>> from disstans.tools import get_cov_dims, make_cov_index_map
>>> num_observations, num_components = 5, 2
>>> print(f"For {num_components} components, there should be:\n"
...       f"- {num_components} data columns,\n"
...       f"- {num_components} variance columns,\n"
...       f"- and {get_cov_dims(num_components)} covariance columns.")
For 2 components, there should be:
- 2 data columns,
- 2 variance columns,
- and 1 covariance columns.
>>> index_map, var_cov_map = make_cov_index_map(num_components)
>>> test_varcov = np.stack([np.ones(5), np.arange(5)*2, np.ones(5)*0.5], axis=1)
>>> test_varcov
array([[1. , 0. , 0.5],
       [1. , 2. , 0.5],
       [1. , 4. , 0.5],
       [1. , 6. , 0.5],
       [1. , 8. , 0.5]])

The first two columns are the variances, and the third column is the covariance column (since there is only one possible covariance). index_map will show where the covariance columns fit into, indexed from 0 to get_cov_dims(num_components) - 1. Since there is only one, the column index 0 will feature in the upper right corner:

>>> index_map
array([[nan,  0.],
       [nan, nan]])

If we want the full, symmetric variance-covariance matrix for the third observation, we use var_cov_map:

>>> var_cov_map
array([0, 2, 2, 1])
>>> test_varcov[2, var_cov_map].reshape(num_components, num_components)
array([[1. , 0.5],
       [0.5, 4. ]])

get_cov_indices

disstans.tools.get_cov_indices(icomp, index_map=None, num_components=None)[source]

Given a data or variance component index, retrieve the indices in the covariance columns of a timeseries or model that are associated with that component. Exactly one of index_map or num_components must be provided as input.

Parameters

icomp (int) – Index of the component.
index_map (Optional[ndarray], default: None) – Output of make_cov_index_map().
num_components (Optional[int], default: None) – Number of components of timeseries or model. (Function will call make_cov_index_map() to get index_map.)

Return type

list[int]

Returns

List of integer covariance column indices associated with icomp.

Example

In a 3D dataset, the second component is associated with two covariances - between the first and the second, and the second and the third. In a timeseries or model covariance dataframe, this corresponds to the following columns:

>>> from disstans.tools import get_cov_indices
>>> get_cov_indices(1, num_components=3)
[0, 2]

get_hom_vel_strain_rot

disstans.tools.get_hom_vel_strain_rot(locations, velocities, covariances=None, utmzone=None, reference=0)[source]

For a set of horizontal velocities on a 2D cartesian grid, estimate the best-fit displacement gradient matrix to calculate a homogeneous velocity field characterized by a single translation vector, strain tensor, and rotation tensor. See [tape09] for an introduction.

This function uses a local approximation to the spherical Earth by converting all station locations into a suitable UTM zone, and only considering the horizontal velocities.

Parameters

locations (ndarray) – Array of shape \((\text{num_stations}, 2)\) containing the longitude and latitude [°] of the observations (stations).
velocities (ndarray) – Array of shape \((\text{num_stations}, 2)\) containing the East and North velocities [m/time] of the observations (stations).
covariances (Optional[ndarray], default: None) – Array of shape \((\text{num_stations}, 2)\) containing the variances in the East and North velocities [m^2/time^2]. Alternatively, array of shape \((\text{num_stations}, 3)\) additionally containing the East-North covariance [m^2/time^2].
utmzone – If provided, the UTM zone to use for the horizontal approximation. If None, the average longitude will be calculated, and the respective UTM zone will be used.
reference (int | list, default: 0) – Reference station to be used by the calculation. This can be either a longitude-latitude [°] list, or the index of the reference station in locations.

Return type

tuple[ndarray, ndarray, ndarray]

Returns

v_O – Velocity of the origin \(\mathbf{v}_O\).
epsilon – \(2 \times 2\) strain tensor \(\mathbf{\varepsilon}\).
omega – \(2 \times 2\) rotation tensor \(\mathbf{\omega}\).

See also

strain_rotation_invariants: For calculation of invariants of the tensors.

References

tape09: Tape, C., Musé, P., Simons, M., Dong, D., & Webb, F. (2009), Multiscale estimation of GPS velocity fields, Geophysical Journal International, 179(2), 945–971, doi:10.1111/j.1365-246X.2009.04337.x.
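
Once the stations are in a local cartesian frame, a homogeneous field is \(\mathbf{v}(\mathbf{x}) = \mathbf{v}_O + \mathbf{L} \mathbf{x}\), and the displacement gradient \(\mathbf{L}\) splits into a symmetric strain part and an antisymmetric rotation part. The following minimal sketch shows this with unweighted least squares. It is not the DISSTANS implementation: the function name is hypothetical, the positions are assumed to be already projected and given relative to the reference point, and covariances are ignored.

```python
import numpy as np


def fit_homogeneous_field_sketch(xy, vel):
    """Fit v = v_O + L @ x and split L into strain and rotation tensors."""
    xy = np.asarray(xy, dtype=float)
    vel = np.asarray(vel, dtype=float)
    # one row [1, x, y] per station; both velocity components solved at once
    G = np.column_stack([np.ones(xy.shape[0]), xy])
    coeffs, *_ = np.linalg.lstsq(G, vel, rcond=None)  # shape (3, 2)
    v_O = coeffs[0]
    L = coeffs[1:].T  # L[i, j] = d v_i / d x_j
    epsilon = (L + L.T) / 2  # symmetric part: strain tensor
    omega = (L - L.T) / 2    # antisymmetric part: rotation tensor
    return v_O, epsilon, omega
```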

parallelize

disstans.tools.parallelize(func, iterable, num_threads=None, chunksize=1)[source]

Convenience wrapper that, given a function, an iterable set of inputs, and parallelization settings, automatically runs the function either in serial or in parallel.

Warning

By default on most systems, NumPy already uses multiple cores and threads in its routines (you can check this by running some very large and time-consuming math and monitoring the usage of your processors). Simply starting multiple Python threads gives each new thread that same default number of NumPy threads, which overloads the system because it runs out of processors and slows the computations down considerably. The Python multiprocessing module does not change these settings, since it is apparently hard to guess which backend NumPy uses, see this thread on GitHub. So, it is currently up to the user to disable this behavior when using multiple Python threads as achieved with this function. For example, this snippet might be enough to put at the beginning of a script: import os; os.environ['OMP_NUM_THREADS'] = '1'. Then, the number of DISSTANS cores can be set by e.g. import disstans; disstans.defaults["general"]["num_threads"] = 10. Another important note: if you are experiencing problems when running a script, make sure the settings and the rest of the script are encapsulated in the standard if __name__ == "__main__": ... clause.

Parameters

func (Callable[[Any], Any]) – Function to wrap, can only have a single input argument.
iterable (Iterable) – Iterable object (list, generator expression, etc.) that contains all the arguments that func should be called with.
num_threads (Optional[int], default: None) – Number of threads to use. Set to 0 if no parallelization is desired. None defaults to the value in defaults.
chunksize (int, default: 1) – Chunk size used in the parallelization pool, see imap().

Yields

result – Each result, yielded as soon as it has been calculated.

Return type

Iterator[Any]

Example

Consider a simple loop to add two numbers:

>>> from numpy import sum
>>> iterable = [(1, 2), (2, 3)]
>>> print([sum(i) for i in iterable])
[3, 5]

In parallel with 2 threads, this could look like this:

>>> from multiprocessing import Pool
>>> with Pool(2) as p:
...     print([result for result in p.imap(sum, iterable)])
...
[3, 5]

Using parallelize(), both cases simplify to:

>>> from disstans.tools import parallelize
>>> print([result for result in parallelize(sum, iterable, num_threads=0)])
[3, 5]
>>> print([result for result in parallelize(sum, iterable, num_threads=2)])
[3, 5]
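
For readers who want to see how the pieces of the warning above fit into a script, here is a minimal sketch of a serial-or-parallel dispatcher built only on the standard library. The function run_serial_or_parallel is hypothetical and merely imitates the documented behavior of parallelize(); whether OMP_NUM_THREADS is the relevant variable depends on the NumPy backend in use.

```python
import os

# limit the numerical backend's threading before NumPy is imported anywhere
os.environ.setdefault("OMP_NUM_THREADS", "1")

from multiprocessing import Pool


def run_serial_or_parallel(func, iterable, num_threads=0, chunksize=1):
    """Yield func(item) for each item, using a pool if num_threads > 0."""
    if num_threads and num_threads > 0:
        with Pool(num_threads) as pool:
            yield from pool.imap(func, iterable, chunksize)
    else:
        # serial fallback: no processes are spawned at all
        yield from map(func, iterable)


if __name__ == "__main__":
    # keep the settings and the work inside the main guard
    inputs = [(1, 2), (2, 3)]
    print(list(run_serial_or_parallel(sum, inputs, num_threads=0)))
    print(list(run_serial_or_parallel(sum, inputs, num_threads=2)))
```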

parse_maintenance_table

disstans.tools.parse_maintenance_table(csvpath, sitecol, datecols, siteformatter=None, delimiter=',', codecol=None, exclude=None, include=None, verbose=False)[source]

Function that loads a maintenance table from a .csv file (or similar) and returns a list of step times for each station. It also provides an interface to ignore certain maintenance codes (if present), and modify the site names when loading.

Parameters

csvpath (str) – Path of the file to load.
sitecol (int) – Column index of the station names.
datecols (list) – List of indices that contain the ingredients to convert the input to a valid Timestamp. It should fail gracefully, i.e. return a string if Pandas cannot interpret the column(s) appropriately.
siteformatter (Optional[Callable[[str], str]], default: None) – Function that will be called element-wise on the loaded station names to produce the output station names.
delimiter (str, default: ',') – Delimiter character for the input file.
codecol (Optional[int], default: None) – Column index of the maintenance code.
exclude (Optional[list[str]], default: None) – Maintenance records that exactly match an element in exclude will be ignored. codecol has to be set.
include (Optional[list[str]], default: None) – Only maintenance records that include an element of include will be used. No exact match is required. codecol has to be set.
verbose (bool, default: False) – If True, print loading information.

Return type

tuple[DataFrame, dict[str, list]]

Returns

maint_table – Parsed maintenance table.
maint_dict – Dictionary that maps the station names to a list of step times.

Notes

If running into problems, also consult the Pandas read_csv() function (used to load the csvpath file) and DataFrame (object on which the filtering happens).
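
The difference between exclude (exact match) and include (substring match) can be illustrated with a small standard-library sketch. This is not how DISSTANS parses the table (it uses Pandas, as noted above): the function steps_by_station_sketch, its single datecol argument, and the sample records are all hypothetical, and dates are kept as strings for brevity.

```python
import csv
import io
from collections import defaultdict


def steps_by_station_sketch(text, sitecol, datecol, codecol,
                            exclude=None, include=None, delimiter=","):
    """Map station names to lists of step dates, filtered by code."""
    steps = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        code = row[codecol]
        # exclude: drop records whose code exactly matches an entry
        if exclude and code in exclude:
            continue
        # include: keep only records whose code contains one of the entries
        if include and not any(term in code for term in include):
            continue
        steps[row[sitecol]].append(row[datecol])
    return dict(steps)
```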

parse_unr_steps

disstans.tools.parse_unr_steps(filepath, check_update=True, only_stations=None, verbose=False)[source]

This function parses the main step file from UNR and produces two step databases, one for maintenance and one for earthquake-related events. If a newer step file is found online, the local copy is updated.

See download_unr_data() for more information about UNR’s dataset, as well as how to access and cite it.

Parameters

filepath (str) – Path to the step file.
check_update (bool, default: True) – If True, check UNR’s server for an updated step file.
only_stations (Optional[list[str]], default: None) – If specified, a list of station IDs. Other stations are not included in the output.
verbose (bool, default: False) – If True, print actions.

Return type

tuple[DataFrame, dict[str, list], DataFrame, dict[str, list]]

Returns

maint_table – Parsed maintenance table.
maint_dict – Dictionary that maps the station names to a list of maintenance step times.
eq_table – Parsed earthquake table.
eq_dict – Dictionary that maps the station names to a list of earthquake-related step times.

R_ecef2enu

disstans.tools.R_ecef2enu(lon, lat)[source]

Generate the rotation matrix used to express a vector written in ECEF (XYZ) coordinates as a vector written in local east, north, up (ENU) coordinates at the position defined by geodetic latitude and longitude. See Chapter 4 and Appendix 4.A in [misraenge2010] for details.

Parameters

lon (float) – Longitude [°] of vector position.
lat (float) – Latitude [°] of vector position.

Return type

ndarray

Returns

The 3-by-3 rotation matrix.

See also

R_enu2ecef: The inverse matrix.

References

misraenge2010: Misra, P., & Enge, P. (2010), Global Positioning System: Signals, Measurements, and Performance, Lincoln, Mass: Ganga-Jamuna Press.
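
For reference, the textbook form of this rotation matrix can be written down directly. The sketch below uses the standard ECEF-to-ENU expression with inputs in degrees, as documented above; the function name is hypothetical and the code is not copied from DISSTANS.

```python
import numpy as np


def ecef_to_enu_matrix_sketch(lon, lat):
    """Rotation matrix from ECEF (XYZ) to local ENU at lon/lat [°]."""
    lam, phi = np.deg2rad(lon), np.deg2rad(lat)
    sin_lam, cos_lam = np.sin(lam), np.cos(lam)
    sin_phi, cos_phi = np.sin(phi), np.cos(phi)
    return np.array([
        [-sin_lam, cos_lam, 0.0],                             # east
        [-sin_phi * cos_lam, -sin_phi * sin_lam, cos_phi],    # north
        [cos_phi * cos_lam, cos_phi * sin_lam, sin_phi],      # up
    ])
```

Since the matrix is orthogonal, its transpose performs the reverse ENU-to-ECEF rotation.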

R_enu2ecef

disstans.tools.R_enu2ecef(lon, lat)[source]

Generate the rotation matrix used to express a vector written in local ENU coordinates as a vector written in ECEF (XYZ) coordinates at the position defined by geodetic latitude and longitude. This is the transpose of the rotation matrix computed by R_ecef2enu().

Parameters

lon (float) – Longitude [°] of vector position.
lat (float) – Latitude [°] of vector position.

Return type

ndarray

Returns

The 3-by-3 rotation matrix.

See also

R_ecef2enu: The inverse matrix.

rotvec2eulerpole

disstans.tools.rotvec2eulerpole(rotation_vector, rotation_covariance=None)[source]

Convert a rotation vector containing the diagonals of a \(3 \times 3\) rotation matrix (and optionally, its formal covariance) into an Euler Pole and associated magnitude. Based on [goudarzi14].

Parameters

rotation_vector (ndarray) – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.
rotation_covariance (Optional[ndarray], default: None) – Formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.

Return type

tuple[ndarray, ...]

Returns

euler_pole – NumPy Array containing the longitude [rad], latitude [rad], and rotation rate [rad/time] of the Euler pole.
euler_pole_covariance – If rotation_covariance was given, a NumPy Array of the propagated uncertainty for the Euler Pole for all three components.

See also

eulerpole2rotvec: Inverse function

strain_rotation_invariants

disstans.tools.strain_rotation_invariants(epsilon=None, omega=None)[source]

Given a strain (rate) and/or rotation (rate) tensor, calculate scalar invariant quantities of interest. See [tape09] for an introduction.

Parameters

epsilon (Optional[ndarray], default: None) – Strain (rate) tensor \(\mathbf{\varepsilon}\).
omega (Optional[ndarray], default: None) – Rotation (rate) tensor \(\mathbf{\omega}\).

Return type

tuple[float, ...]

Returns

dilatation – Only if epsilon is provided. Scalar dilatation (rate) as defined by the first invariant of the strain (rate) tensor \(\Theta = \text{Tr} \left( \mathbf{\varepsilon} \right)\).
strain – Only if epsilon is provided. Scalar strain (rate) as defined by the Frobenius norm of the strain (rate) tensor \(\Sigma = \lVert \mathbf{\varepsilon} \rVert_F\).
shear – Only if epsilon is provided. Scalar shearing (rate) as defined by the square root of the second invariant of the deviatoric strain (rate) tensor \(\text{T} = \sqrt{\frac{1}{2} \text{Tr}(\mathbf{\varepsilon}^2) - \frac{1}{6} \text{Tr}(\mathbf{\varepsilon})^2}\).
rotation – Only if omega is provided. Scalar rotation (rate) as defined by \(\Omega = \frac{1}{\sqrt{2}} \lVert \mathbf{\omega} \rVert_F\).
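
The four formulas above translate almost literally into NumPy. The following is a minimal sketch with a hypothetical function name; unlike the documented function, it returns a dictionary rather than a tuple so that the quantities are labeled.

```python
import numpy as np


def invariants_sketch(epsilon=None, omega=None):
    """Evaluate the scalar invariants for the tensors that are given."""
    out = {}
    if epsilon is not None:
        eps = np.asarray(epsilon, dtype=float)
        trace = np.trace(eps)
        out["dilatation"] = trace
        out["strain"] = np.linalg.norm(eps, "fro")
        # Tr(eps^2) / 2 - Tr(eps)^2 / 6, as in the shear formula
        out["shear"] = np.sqrt(np.trace(eps @ eps) / 2 - trace**2 / 6)
    if omega is not None:
        om = np.asarray(omega, dtype=float)
        out["rotation"] = np.linalg.norm(om, "fro") / np.sqrt(2)
    return out
```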

tvec_to_numpycol

disstans.tools.tvec_to_numpycol(timevector, t_reference=None, time_unit='D')[source]

Converts a Pandas timestamp series into a NumPy array of relative time to a reference time in the given time unit.

Parameters

timevector (Series | DatetimeIndex) – Series of Timestamp or alternatively a DatetimeIndex of when to evaluate the model.
t_reference (UnionType[str, Timestamp, None], default: None) – Reference Timestamp or datetime-like string that can be converted to one. None chooses the first element of timevector.
time_unit (Optional[str], default: 'D') – Time unit for parameters. Refer to Timedelta for more details.

Return type

ndarray

Returns

Array of time differences.
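
The underlying operation is a subtraction of the reference time followed by a division by one unit of time. The sketch below reproduces that idea with NumPy's datetime64 type instead of Pandas objects; the function name is hypothetical, and only NumPy's fixed-length unit codes (such as "D", "h", or "s") are assumed to be passed.

```python
import numpy as np


def relative_time_sketch(timevector, t_reference=None, time_unit="D"):
    """Return times relative to a reference as floats in the given unit."""
    times = np.asarray(timevector, dtype="datetime64[ns]")
    # default reference: the first element of the time vector
    if t_reference is None:
        t_ref = times[0]
    else:
        t_ref = np.datetime64(t_reference, "ns")
    # dividing two time differences yields a plain float array
    return (times - t_ref) / np.timedelta64(1, time_unit)
```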

weighted_median

disstans.tools.weighted_median(values, weights, axis=0, percentile=0.5, keepdims=False, visualize=False)[source]

Calculates the weighted median along a given axis.

Parameters

values (ndarray) – Values to calculate the medians for.
weights (ndarray) – Weights of each value along the given axis.
axis (int, default: 0) – Axis along which to calculate the median.
percentile (float, default: 0.5) – Percentile (between 0 and 1) to calculate instead of the median, which corresponds to 0.5.
keepdims (bool, default: False) – If False, squeezes out the axis along which the median was calculated.
visualize (bool, default: False) – If True, show a plot of the weighted median calculation.

Return type

ndarray

Returns

Weighted median of input.
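
One common definition of a weighted percentile is the smallest value at which the cumulative weight reaches the requested fraction of the total weight. The following 1D sketch implements that definition. It is an assumption that this matches the DISSTANS convention (other definitions interpolate between values), and the function name is hypothetical.

```python
import numpy as np


def weighted_percentile_1d_sketch(values, weights, percentile=0.5):
    """Weighted percentile of a 1D array, without interpolation."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # sort the values and accumulate their weights in the same order
    order = np.argsort(values)
    cum_weights = np.cumsum(weights[order])
    # first position where the cumulative weight reaches the target
    target = percentile * cum_weights[-1]
    idx = np.searchsorted(cum_weights, target)
    return values[order][idx]
```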

Classes

Click

class disstans.tools.Click(ax, func, button=MouseButton.LEFT)[source]

Class that enables a GUI to distinguish between clicks (mouse press and release) and dragging events (mouse press, move, then release).

Parameters

ax – Axis on which to look for clicks.
func – Function to call, with the Matplotlib clicking Event as its first argument.
button (default: <MouseButton.LEFT: 1>) – Which mouse button to operate on, see MouseButton for accepted values.

RINEXDataHolding

class disstans.tools.RINEXDataHolding(df=None)[source]

Container class for a database of RINEX files.

A new object can be created by one of the two classmethods:

From one or multiple folder(s) using from_folders()
From a previously-saved file using from_file()

An object can be saved by using Pandas’ to_pickle() on the instance’s df attribute (it is recommended to add the .gz extension to enable compression).

The location information and availability metrics can be saved in the same way. To load a previously-saved file, you can use the convenience functions load_locations_from_file() and load_metrics_from_file(), specify the respective paths in the call to from_file(), or alternatively, load the data directly with Pandas and assign it to the respective instance attributes.

COLUMNS = ('station', 'station_raw', 'year', 'day', 'date', 'sequence', 'type', 'compression', 'filesize', 'filetimeutc', 'network', 'basefolder'): The necessary information about each RINEX file.

COMPRFILEEXTS = ('.Z', '.gz'): The valid (compressed) RINEX file extensions to search for.

GLOBPATTERN = '[0-9][0-9][0-9][0-9]/[0-9][0-9][0-9]/*': The YYYY/DDD folder pattern in a glob-readable format.

METRICCOLS = ('number', 'age', 'recency', 'length', 'reliability'): The metrics that can be calculated.

RINEXPATTERN = '(?P<site>\\w{4})(?P<day>\\d{3})(?P<sequence>\\w{1})\\.(?P<yy>\\d{2})(?P<type>\\w{1})\\.(?P<compression>\\w+)': The regex-style filename pattern for RINEX files.
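
To show what the named groups in RINEXPATTERN capture, the pattern can be applied to a file name with Python's re module. The file name below is a made-up example that follows the short RINEX naming scheme; how DISSTANS itself applies the pattern is not shown here.

```python
import re

# the pattern as documented above, split over two lines for readability
RINEXPATTERN = (r"(?P<site>\w{4})(?P<day>\d{3})(?P<sequence>\w{1})\."
                r"(?P<yy>\d{2})(?P<type>\w{1})\.(?P<compression>\w+)")

# hypothetical file name: station "abcd", day of year 001, year 2021
match = re.fullmatch(RINEXPATTERN, "abcd0010.21d.Z")
fields = match.groupdict()
print(fields)
```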

calculate_availability_metrics(sampling=Timedelta('1 days 00:00:00'))[source]

Calculates the following metrics and stores them in the metrics DataFrame:

'number': Number of available observations.
'age': Time of first observation.
'recency': Time of last observation.
'length': Time between first and last observation.
'reliability': Reliability defined as the number of observations divided by the maximum number of possible observations between the first and last acquisition given the assumed sampling interval of the data.

Parameters: sampling (Timedelta, default: Timedelta('1 days 00:00:00')) – Assumed sampling interval of the data files.
Return type: None
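
For a single station, the five metrics can be sketched with the standard library as follows. The function name is hypothetical, and counting both the first and the last epoch in the maximum number of possible observations (the "+ 1" below) is an assumption about how the reliability is defined.

```python
from datetime import datetime, timedelta


def availability_metrics_sketch(dates, sampling=timedelta(days=1)):
    """Compute availability metrics for one station's observation dates."""
    dates = sorted(dates)
    number = len(dates)
    age, recency = dates[0], dates[-1]
    length = recency - age
    # assumption: both end points count as possible observations
    max_possible = length // sampling + 1
    return {"number": number, "age": age, "recency": recency,
            "length": length, "reliability": number / max_possible}
```

A station with observations on four out of five consecutive days would, under this assumption, have a reliability of 0.8.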

property df: DataFrame: Pandas DataFrame object containing the RINEX files database.

classmethod from_file(db_file, locations_file=None, metrics_file=None, verbose=False)[source]

Convenience class method that creates a new RINEXDataHolding object from a file using load_db_from_file() and then optionally loads the locations and metrics from their respective files.

Parameters

db_file (str) – Path of the main file.
locations_file (Optional[str], default: None) – Path of the locations file.
metrics_file (Optional[str], default: None) – Path of the metrics file.
verbose (bool, default: False) – If True, print database size and a sample entry.

Return type

RINEXDataHolding

Returns

The newly created RINEXDataHolding object.

classmethod from_folders(folders, verbose=False, no_pbar=False)[source]

Convenience class method that creates a new RINEXDataHolding object and directly calls load_db_from_folders().

Parameters

folders (tuple | list[tuple]) – Folder(s) in which the different year-folders are found, formatted as a single tuple or a list of tuples with name and respective folder ([('network1', '/folder/one/'), ...]).
verbose (bool, default: False) – If True, print final database size and a sample entry.
no_pbar (bool, default: False) – Suppress the progress bar with True.

Return type

RINEXDataHolding

Returns

The newly created RINEXDataHolding object.

get_files_by(station=None, network=None, year=None, between=None, verbose=False)[source]

Return a subset of the database by criteria.

Parameters

station (UnionType[str, list[str], None], default: None) – Return only files of this/these station(s).
network (UnionType[str, list[str], None], default: None) – Return only files of this/these network(s).
year (UnionType[int, list[int], None], default: None) – Return only files of this/these year(s).
between (Optional[tuple], default: None) – Return only files between the start and end date (inclusive) given by the length-two tuple.
verbose (bool, default: False) – If True, print the number of selected entries.

Return type

DataFrame

Returns

The DataFrame subset.
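The selection can be pictured as boolean masks on a DataFrame. The sketch below is not the method's implementation: the small table is made up, and combining the criteria with a logical AND is an assumption. It does follow the inclusive date range described for between.

```python
import pandas as pd

# Hypothetical database with a few of the columns of the RINEX database.
df = pd.DataFrame({
    "station": ["ABCD", "ABCD", "EFGH"],
    "network": ["network1", "network1", "network2"],
    "year": [2020, 2021, 2021],
    "date": pd.to_datetime(["2020-12-31", "2021-01-01", "2021-01-02"]),
})

# Example criteria: one station and an inclusive date range.
station = ["ABCD"]
between = (pd.Timestamp("2021-01-01"), pd.Timestamp("2021-01-02"))

# Assumption: all given criteria must hold at the same time.
mask = df["station"].isin(station)
mask &= (df["date"] >= between[0]) & (df["date"] <= between[1])
subset = df[mask]
print(subset)
```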

get_location(station, lla=True)[source]

Returns the approximate location of a station.

Parameters

station (str) – Name of the station.
lla (bool, default: True) – If True, returns the coordinates in Longitude [°], Latitude [°] & Altitude [m], otherwise in XYZ [m] coordinates.

Return type

Series

Returns

The location of the station in the specified coordinate system.
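The two coordinate systems selected by lla are related through the standard geodetic-to-Cartesian formulas on the WGS-84 ellipsoid. The sketch below shows that relationship in one direction. The helper lla_to_xyz is hypothetical and not part of DISSTANS, and it assumes that the altitude is the height above the ellipsoid.

```python
import math

# WGS-84 ellipsoid constants.
WGS84_A = 6378137.0                  # semi-major axis [m]
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


def lla_to_xyz(lon_deg, lat_deg, alt_m):
    """Convert longitude [deg], latitude [deg], altitude [m] to XYZ [m]."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


# A point on the equator at the prime meridian lies on the X axis.
print(lla_to_xyz(0.0, 0.0, 0.0))
```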

get_rinex_header(filepath)[source]

Open a RINEX file, read the header, and format it as a dictionary. No data type conversion or stripping of whitespaces is performed.

Parameters: filepath (str) – Path to RINEX file.
Return type: dict[str, str]
Returns: Dictionary of header lines.
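For orientation, RINEX header records use a fixed column layout: the content occupies columns 1 to 60, the record label columns 61 to 80, and the header ends with an END OF HEADER record. The sketch below parses such records from an in-memory list. The function parse_rinex_header is hypothetical, and both the use of the stripped label as the key and the joining of repeated labels are assumptions about the dictionary layout. Values are left unconverted and unstripped, as described above.

```python
def parse_rinex_header(lines):
    """Collect header records into a dict keyed by the label in columns 61-80."""
    header = {}
    for line in lines:
        line = line.rstrip("\n")
        content, label = line[:60], line[60:].strip()
        if label == "END OF HEADER":
            break
        if label in header:
            # Assumption: repeated labels (e.g. COMMENT) are joined by newlines.
            header[label] += "\n" + content
        else:
            header[label] = content
    return header


# Hypothetical records, padded to the fixed column layout.
sample = [
    f"{'abcd':<60}MARKER NAME",
    f"{'  -2700000.0000 -4300000.0000  3850000.0000':<60}APPROX POSITION XYZ",
    f"{'':<60}END OF HEADER",
]
header = parse_rinex_header(sample)
print(header["APPROX POSITION XYZ"].split())
```

The APPROX POSITION XYZ record is the kind of entry from which an approximate station location can be read.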

property list_stations: list[str]: List of stations in the database.

load_db_from_file(db_file, verbose=False)[source]

Loads a RINEXDataHolding object from a pickled Pandas DataFrame file.

Parameters

db_file (str) – Path of the main file.
verbose (bool, default: False) – If True, print database size and a sample entry.

Return type

None

load_db_from_folders(folders, verbose=False, no_pbar=False)[source]

Loads a RINEX database from folders in the file system. The data should be located in one or multiple folder structure(s) organized by YYYY/DDD, where YYYY is a four-digit year and DDD is the three-digit day of the year.

Parameters

folders (tuple | list[tuple]) – Folder(s) in which the different year-folders are found, formatted as a single tuple or a list of tuples with name and respective folder ([('network1', '/folder/one/'), ...]).
verbose (bool, default: False) – If True, print final database size and a sample entry.
no_pbar (bool, default: False) – Suppress the progress bar with True.

Return type

None
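The YYYY/DDD layout can be explored with the standard library alone. The sketch below builds a tiny throwaway archive, finds candidate files with the glob pattern, and keeps only names that match the RINEX filename pattern. The network name, folder, and filenames are made up, and the loop only illustrates the folder structure; it is not how load_db_from_folders() is implemented.

```python
import glob
import os
import re
import tempfile

GLOBPATTERN = "[0-9][0-9][0-9][0-9]/[0-9][0-9][0-9]/*"
RINEXPATTERN = (r"(?P<site>\w{4})(?P<day>\d{3})(?P<sequence>\w{1})"
                r"\.(?P<yy>\d{2})(?P<type>\w{1})\.(?P<compression>\w+)")

with tempfile.TemporaryDirectory() as basefolder:
    # Build a hypothetical archive: one RINEX-like file and one stray file.
    daydir = os.path.join(basefolder, "2021", "001")
    os.makedirs(daydir)
    for name in ("abcd0010.21d.Z", "notes.txt"):
        open(os.path.join(daydir, name), "w").close()

    found = []
    pattern = os.path.join(glob.escape(basefolder), GLOBPATTERN)
    for path in sorted(glob.glob(pattern)):
        match = re.fullmatch(RINEXPATTERN, os.path.basename(path))
        if match:  # skip anything that is not named like a RINEX file
            found.append(("network1", match["site"], match["yy"], match["day"]))

print(found)
```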

load_locations_from_file(filepath)[source]

Load a previously-saved DataFrame containing the locations of each station.

Parameters: filepath (str) – Path to the pickled DataFrame.
Return type: None

load_locations_from_rinex(keep='last', replace_not_found=False, no_pbar=True)[source]

Scan the RINEX files’ headers for approximate locations for plotting purposes.

Parameters

keep (Literal['last', 'first', 'mean'], default: 'last') – Determine which location to use. Possible values are 'last' (only scan the most recent file), 'first' (only scan the oldest file) or 'mean' (load all files and calculate average). Note that 'mean' could take a substantial amount of time, since all files have to be opened, decompressed and searched.
replace_not_found (bool, default: False) – If a location is not found and replace_not_found=True, the location of Null Island (0° Longitude, 0° Latitude) is used and a warning is issued. If False, an error is raised instead.
no_pbar (bool, default: True) – Suppress the progress bar with True.

Return type

None

load_metrics_from_file(filepath)[source]

Load a previously-saved DataFrame containing the calculated availability metrics.

Parameters: filepath (str) – Path to the pickled DataFrame.
Return type: None

property locations_lla: DataFrame: Approximate positions of stations in WGS-84 (longitude [°], latitude [°], altitude [m]) coordinates.

property locations_xyz: DataFrame: Approximate positions of stations in WGS-84 (x, y, z) [m] coordinates.

make_filenames(db)[source]

Recreate the full paths to the individual RINEX files from the database or a subset thereof.

Parameters: db (DataFrame) – df or a subset thereof.
Return type: list[str]
Returns: List of paths.
Raises: NotImplementedError – If GLOBPATTERN or RINEXPATTERN for this instance are not the same as the default values. In this case, redefine this function with the appropriate folder and file patterns.
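With the default folder and file patterns, a path can in principle be rebuilt from the database columns. The sketch below shows one way this might look for a single row. The helper make_filename is hypothetical, and the storage of year and day as integers and of the compression extension without its leading dot are assumptions about the database contents.

```python
import posixpath


def make_filename(row):
    """Rebuild one path from a database row (hypothetical helper)."""
    filename = (f"{row['station_raw']}{row['day']:03d}{row['sequence']}"
                f".{row['year'] % 100:02d}{row['type']}.{row['compression']}")
    # Folder layout follows the YYYY/DDD pattern below the base folder.
    return posixpath.join(row["basefolder"], f"{row['year']:04d}",
                          f"{row['day']:03d}", filename)


# Made-up row using the column names of the RINEX database.
row = {"station_raw": "abcd", "year": 2021, "day": 1, "sequence": "0",
       "type": "d", "compression": "Z", "basefolder": "/folder/one"}
path = make_filename(row)
print(path)
```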

property metrics: DataFrame: Contains the station metrics calculated by calculate_availability_metrics().

property num_files: int: Number of files in the database.

property num_stations: int: Number of stations in the database.

plot_availability(sampling=Timedelta('1 days 00:00:00'), sort_by_latitude=True, saveas=None)[source]

Create an availability figure for the dataset.

Parameters

sampling (Timedelta, default: Timedelta('1 days 00:00:00')) – Assume that breaks strictly larger than sampling constitute a data gap.
sort_by_latitude (bool, default: True) – If True, sort the stations by latitude, otherwise alphabetically. (Always falls back to alphabetical sorting if location information is missing.)
saveas (str, default: None) – If provided, the figure will be saved at this location.

Return type

None
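The gap criterion used for sampling can be illustrated with a few made-up dates. The sketch below uses only the standard library and is independent of the plotting code. It flags every step between consecutive observations that is strictly larger than the sampling interval.

```python
from datetime import date, timedelta

# Hypothetical observation dates with two missing days in the middle.
dates = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 5), date(2021, 1, 6)]
sampling = timedelta(days=1)

# A gap is any step between consecutive observations strictly larger
# than the sampling interval; a step of exactly one interval is not a gap.
gaps = [(prev, curr) for prev, curr in zip(dates, dates[1:])
        if curr - prev > sampling]
print(gaps)
```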

plot_map(metric=None, orientation='horizontal', annotate_stations=True, figsize=None, saveas=None, dpi=None, gui_kw_args={})[source]

Plot a map of all the stations present in the RINEX database. The markers can be colored by the different availability metrics calculated by calculate_availability_metrics().

Parameters

metric (Optional[str], default: None) – Calculate the marker color (and respective colormap) given a certain metric. If None, no color is applied.
orientation (Literal['horizontal', 'vertical'], default: 'horizontal') – Colorbar orientation, see colorbar().
annotate_stations (bool, default: True) – If True, add the station names to the map.
figsize (Optional[tuple], default: None) – Set the figure size (width, height) in inches.
saveas (Optional[str], default: None) – If provided, the figure will be saved at this location.
dpi (Optional[float], default: None) – Use this DPI for saved figures.
gui_kw_args (dict[str, Any], default: {}) – Override default GUI settings of defaults.

Return type

None

Timedelta

class disstans.tools.Timedelta(*args, **kwargs)[source]

static __new__(cls, *args, **kwargs)[source]

DISSTANS Timedelta, subclassed from the pandas Timedelta but with support for the 'Y' year time unit, defined as always exactly 365.25 days. Other possible values are:

W, D, days, day, hours, hour, hr, h, m, minute, min, minutes, T, S, seconds, sec, second, ms, milliseconds, millisecond, milli, millis, L, us, microseconds, microsecond, micro, micros, U, ns, nanoseconds, nano, nanos, nanosecond, N
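The fixed year length makes the arithmetic behind the 'Y' unit easy to check. The sketch below uses the standard library's timedelta rather than the class itself, purely to show the conversion. The helper years is hypothetical.

```python
from datetime import timedelta

DAYS_PER_YEAR = 365.25  # fixed length of the 'Y' unit stated above


def years(n):
    """Hypothetical helper: express n 'Y' units as a plain timedelta."""
    return timedelta(days=DAYS_PER_YEAR * n)


one_year = years(1)
four_years = years(4)
print(one_year, four_years)
```

One such year is 365 days and 6 hours, so four of them add up to exactly 1461 days.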