Tools

This module contains helper functions and classes that are not dependent on any of DISSTANS’s classes.

For more specialized processing functions, see processing.

Functions

best_utmzone

disstans.tools.best_utmzone(longitudes)[source]

Given a list of longitudes, find the UTM zone that is appropriate.

Parameters

longitudes (ndarray) – Array of longitudes [°].

Return type

int

Returns

UTM zone at the average input longitude.
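
As a rough illustration of the idea (not the DISSTANS implementation), the standard 6°-wide UTM zones can be derived from the mean longitude as sketched below. The name utm_zone_from_longitudes is hypothetical, and the sketch ignores both the special zones around Norway and Svalbard and longitudes that straddle the antimeridian.

```python
import math


def utm_zone_from_longitudes(longitudes):
    """Hypothetical helper: standard UTM zone (1-60) at the mean longitude [deg]."""
    mean_lon = sum(longitudes) / len(longitudes)
    # Zones are 6 degrees wide and zone 1 starts at 180 degrees West.
    # The modulo maps a longitude of exactly +180 degrees back to zone 1.
    return int(math.floor((mean_lon + 180.0) / 6.0)) % 60 + 1
```

For example, longitudes around -117.5° (Southern California) fall into zone 11.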

block_permutation

disstans.tools.block_permutation(n_outer, n_inner)[source]

Convenience function to calculate a permutation matrix used to rearrange (permute) blockwise-ordered submatrices in a big matrix. n_outer outside blocks of individual n_inner-sized blocks will become n_inner outside blocks of individual n_outer-sized blocks.

Transposing the result is equivalent to calling this function with swapped arguments.

Parameters
  • n_outer (int) – Number of sub-matrices.

  • n_inner (int) – Size of the individual sub-matrices.

Return type

ndarray

Returns

Square permutation matrix with dimensions \(n = \text{n_outer} * \text{n_inner}\). To permute a matrix \(A\), calculate \(~P A P^T\).

Example

>>> import numpy as np
>>> from disstans.tools import block_permutation
>>> n_outer, n_inner = 2, 2
>>> A = np.block([[np.arange(n_inner**2).reshape(n_inner, n_inner),
...                np.zeros((n_inner, n_inner))], [np.zeros((n_inner, n_inner)),
...                np.ones((n_inner, n_inner))]])
>>> A
array([[0., 1., 0., 0.],
       [2., 3., 0., 0.],
       [0., 0., 1., 1.],
       [0., 0., 1., 1.]])
>>> P = block_permutation(n_outer, n_inner)
>>> P @ A @ P.T
array([[0., 0., 1., 0.],
       [0., 1., 0., 1.],
       [2., 0., 3., 0.],
       [0., 1., 0., 1.]])
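
One way such a permutation matrix could be built is sketched below in plain Python. This is an assumption about the construction, not the DISSTANS source: every index in outer-major ordering is mapped to its position in inner-major ordering. The names block_permutation_sketch and transpose are hypothetical.

```python
def block_permutation_sketch(n_outer, n_inner):
    """Hypothetical construction of the permutation matrix as nested lists."""
    n = n_outer * n_inner
    P = [[0] * n for _ in range(n)]
    for i_outer in range(n_outer):
        for i_inner in range(n_inner):
            old = i_outer * n_inner + i_inner  # position in outer-major ordering
            new = i_inner * n_outer + i_outer  # position in inner-major ordering
            P[new][old] = 1
    return P


def transpose(M):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*M)]
```

With this construction, transposing the result for (n_outer, n_inner) gives the result for (n_inner, n_outer), in line with the statement above.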

cov2corr

disstans.tools.cov2corr(cov)[source]

Function that converts a covariance matrix into a (Pearson) correlation matrix, taking into account zero-valued variances and setting the respective correlation entries to NaN.

Parameters

cov (ndarray) – Covariance matrix.

Return type

ndarray

Returns

Correlation matrix.
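
The conversion divides each covariance by the product of the two standard deviations. A minimal plain-Python sketch of that rule, including the NaN handling for zero variances, could look like this (cov2corr_sketch is a hypothetical name, and nested lists stand in for the ndarray):

```python
import math


def cov2corr_sketch(cov):
    """Hypothetical covariance-to-correlation conversion for nested lists."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    nan = float("nan")
    # corr_ij = cov_ij / (std_i * std_j), NaN wherever a variance is zero
    return [[cov[i][j] / (std[i] * std[j]) if std[i] > 0 and std[j] > 0 else nan
             for j in range(n)]
            for i in range(n)]
```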

create_powerlaw_noise

disstans.tools.create_powerlaw_noise(size, exponent, seed=None)[source]

Creates synthetic noise according to a Power Law model [langbein04].

Parameters
  • size (int | list | tuple) – Number of (equally-spaced) noise samples of the output noise array or a shape where the first entry defines the number of noise samples for the remaining dimensions.

  • exponent (int) – Exponent of the power law noise model. E.g. 0 corresponds to white (Gaussian) noise, 1 to flicker (pink) noise, and 2 to random walk (red, Brownian) noise.

  • seed (UnionType[int, Generator, None], default: None) – Pass an initial seed to the random number generator, or pass a Generator instance.

Return type

ndarray

Returns

Noise output array.

Notes

This function uses Timmer and König’s [timmerkoenig95] approach to generate the noise, and Felix Patzelt’s colorednoise code to calculate the theoretical standard deviation.
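
The Timmer and König idea can be sketched in a few lines: draw Gaussian real and imaginary Fourier coefficients, scale them so the power spectrum follows f^(-exponent), and transform back to the time domain. The sketch below is an illustration only. It uses an explicit O(n²) inverse transform instead of an FFT, does not normalize the output amplitude, and powerlaw_noise_sketch is a hypothetical name.

```python
import math
import random


def powerlaw_noise_sketch(n, exponent, seed=None):
    """Hypothetical, unnormalized power-law noise of length n."""
    rng = random.Random(seed)
    noise = [0.0] * n
    # Loop over the positive Fourier frequencies f_k = k / n.
    # The zero frequency is skipped, so the output has zero mean.
    for k in range(1, n // 2 + 1):
        # Amplitude ~ f^(-exponent / 2), i.e. power ~ f^(-exponent)
        scale = (k / n) ** (-exponent / 2)
        a = rng.gauss(0.0, 1.0) * scale
        b = rng.gauss(0.0, 1.0) * scale
        for t in range(n):
            phase = 2.0 * math.pi * k * t / n
            # Real part of (a + ib) * exp(i * phase)
            noise[t] += a * math.cos(phase) - b * math.sin(phase)
    return noise
```

A practical implementation would use an inverse FFT and rescale the result to the desired standard deviation.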

References

langbein04

Langbein, J. (2004), Noise in two-color electronic distance meter measurements revisited, J. Geophys. Res., 109, B04406, doi:10.1029/2003JB002819.

timmerkoenig95

Timmer, J.; König, M. (1995), On generating power law noise, Astronomy and Astrophysics, v.300, p.707.

date2decyear

disstans.tools.date2decyear(dates)[source]

Convert dates (just year, month, day, each day assumed to be centered at noon) to decimal years, assuming all years have 365.25 days (JPL convention for GIPSY timeseries, also used by UNR NGL).

Parameters

dates (Series | DatetimeIndex | Timestamp | datetime) – Input date(s). If a Series, needs to be a series of Timestamp-convertible data types.

Return type

ndarray

Returns

Date(s) as sorted decimal year(s).
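
A minimal sketch of the 365.25-day convention for a single date is given below. It assumes that the reference epoch 2000-01-01 (at noon) corresponds to the decimal year 2000.0. That epoch is an assumption of this sketch, not something stated above, and date2decyear_sketch is a hypothetical name.

```python
from datetime import date


def date2decyear_sketch(d):
    """Hypothetical decimal-year conversion with 365.25-day years."""
    # Assumed reference epoch: 2000-01-01 (noon) corresponds to 2000.0.
    # Since every day is centered at noon, only whole-day differences matter.
    return 2000.0 + (d - date(2000, 1, 1)).days / 365.25
```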

download_unr_data

disstans.tools.download_unr_data(station_list_or_bbox, data_dir, solution='final', rate='24h', reference='IGS14', min_solutions=100, t_min=None, t_max=None, verbose=False, no_pbar=False)[source]

Downloads GNSS timeseries data from the University of Nevada at Reno’s Nevada Geodetic Laboratory. When using this data, please cite [blewitt18], as well as all the original data providers (the relevant info will be downloaded as well).

Files will only be downloaded if there is no matching file already present, or the remote file is newer than the local one.

Parameters
  • station_list_or_bbox (list[str] | list[float]) – Defines which stations to look for data and download. It can be either a list of station names (list of strings), a list of bounding box coordinates (the four floats [lon_min, lon_max, lat_min, lat_max] in degrees), or a three-element list defining a circle (location in degrees and radius in kilometers [center_lon, center_lat, radius]).

  • data_dir (str) – Folder for data.

  • solution (Literal['final', 'rapid', 'ultra'], default: 'final') – Which timeseries solution to download. See the Notes for approximate latency times.

  • rate (Literal['24h', '5min'], default: '24h') – Which sample rate to download. See the Notes for a table of which rates are available for each solution.

  • reference (str, default: 'IGS14') – The UNR abbreviation for the reference frame in which to download the data. Applies only for daily sample rates and final or rapid orbit solutions.

  • min_solutions (int, default: 100) – Only consider stations with at least a certain number of all-time solutions according to the station list file.

  • t_min (UnionType[str, Timestamp, None], default: None) – Only consider stations that have data on or after t_min.

  • t_max (UnionType[str, Timestamp, None], default: None) – Only consider stations that have data on or before t_max.

  • verbose (bool, default: False) – If True, individual actions are printed.

  • no_pbar (bool, default: False) – Suppress the progress bar with True.

Return type

DataFrame

Returns

A DataFrame, built from UNR’s data holding list, subset to the stations actually selected for download.

Notes

The following combinations of solution and sample rates are available. Note that not all stations are equipped to provide all data types. Furthermore, only the daily files will be available in a plate reference frame.

orbit solutions   24 hours   5 minutes   latency
final             yes        yes         approx. 2 weeks
rapid             yes        yes         approx. 24 hours
ultra             no         yes         approx. 2 hours
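
The availability table can also be expressed as a small lookup to check a solution and rate combination before requesting a download. The names AVAILABLE and is_available are hypothetical and not part of DISSTANS.

```python
# (solution, rate) -> available, transcribed from the table in the Notes
AVAILABLE = {
    ("final", "24h"): True, ("final", "5min"): True,
    ("rapid", "24h"): True, ("rapid", "5min"): True,
    ("ultra", "24h"): False, ("ultra", "5min"): True,
}


def is_available(solution, rate):
    """Return True if the combination is listed as available."""
    return AVAILABLE.get((solution, rate), False)
```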

Warning

It is your responsibility to ensure that different reference frames or solution types are not downloaded into the same folder. This function does not rename files or change the folder structure that it finds on UNR’s servers, so mixing them could overwrite data or make it ambiguous which files represent which solutions.

References

blewitt18

Blewitt, G., Hammond, W., & Kreemer, C. (2018). Harnessing the GPS Data Explosion for Interdisciplinary Science. Eos, 99. doi:10.1029/2018EO104623

See also

parse_unr_steps

Function to download and parse UNR’s main step file.

estimate_euler_pole

disstans.tools.estimate_euler_pole(locations, velocities, covariances=None, enu=True)[source]

Estimate a best-fit Euler pole assuming all velocities lie on the same rigid plate on a sphere. The calculations are based on [goudarzi14].

Parameters
  • locations (ndarray) – Array of shape \((\text{num_stations}, \text{num_components})\) containing the locations of each station (observation), where \(\text{num_components}=2\) if the locations are given by longitudes and latitudes [°] (enu=True) or \(\text{num_components}=3\) if the locations are given in the cartesian Earth-Centered, Earth-Fixed (ECEF) reference frame [m] (enu=False).

  • velocities (ndarray) – Array of shape \((\text{num_stations}, \text{num_components})\) containing the velocities [m/time] at different stations (observations), where \(\text{num_components}=2\) if the velocities are given in the East-North local geodetic reference frame (enu=True) or \(\text{num_components}=3\) if the velocities are given in the cartesian Earth-Centered, Earth-Fixed (ECEF) reference frame (enu=False).

  • covariances (Optional[ndarray], default: None) – Array containing the (co)variances of the velocities [m^2/time^2], allowing for different input shapes depending on what uncertainties are available. If None, all observations are weighted equally. If enu=True, the array should have shape \((\text{num_stations}, 2)\) if only variances are present, \((\text{num_stations}, 3)\) if also the covariances are present but are given as a column, or \((\text{num_stations}, 2, 2)\) if the full \(2 \times 2\) variance-covariance matrices are given. If enu=False, the array should be of shape \((\text{num_stations}, 3)\), \((\text{num_stations}, 6)\), or \((\text{num_stations}, 3, 3)\), respectively.

  • enu (bool, default: True) – See locations and velocities.

Return type

tuple[ndarray, ndarray]

Returns

  • rotation_vector – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.

  • rotation_covariance – Formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.

Notes

The ENU solution assumes a spherical Earth with radius 6378137 meters.

If the covariances are given in columns, the column ordering of Timeseries is assumed.

Contrary to [goudarzi14], the estimated covariance matrix is not scaled by the a posteriori sigma, to match the covariance definition throughout the rest of DISSTANS. Also unlike [goudarzi14], the time unit is not assumed to be years, and the results are not rescaled to millions of years.
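
The rigid-plate assumption means that every station velocity follows v = ω × r. The sketch below shows only this forward model on the spherical Earth quoted above, not the least-squares estimation itself. The name predict_en_velocity is hypothetical.

```python
import math

EARTH_RADIUS = 6378137.0  # [m], spherical radius quoted in the Notes


def predict_en_velocity(lon_deg, lat_deg, rotation_vector):
    """Hypothetical forward model: East and North velocity from a rotation vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    # Station position on the sphere in ECEF coordinates [m]
    r = (EARTH_RADIUS * math.cos(lat) * math.cos(lon),
         EARTH_RADIUS * math.cos(lat) * math.sin(lon),
         EARTH_RADIUS * math.sin(lat))
    wx, wy, wz = rotation_vector
    # Rigid rotation: v = omega x r
    v = (wy * r[2] - wz * r[1], wz * r[0] - wx * r[2], wx * r[1] - wy * r[0])
    # Local East and North unit vectors expressed in ECEF
    east = (-math.sin(lon), math.cos(lon), 0.0)
    north = (-math.sin(lat) * math.cos(lon), -math.sin(lat) * math.sin(lon), math.cos(lat))
    v_east = sum(vi * ei for vi, ei in zip(v, east))
    v_north = sum(vi * ni for vi, ni in zip(v, north))
    return v_east, v_north
```

For a rotation about the z-axis, a station on the equator moves purely eastward at the rotation rate times the Earth radius, and the East velocity shrinks with the cosine of the latitude.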

See also

rotvec2eulerpole

Convert the rotation vector into an Euler pole and magnitude.

References

goudarzi14

Goudarzi, M. A., Cocard, M., & Santerre, R. (2014), EPC: Matlab software to estimate Euler pole parameters, GPS Solutions, 18(1), 153–162, doi:10.1007/s10291-013-0354-4.

eulerpole2rotvec

disstans.tools.eulerpole2rotvec(euler_pole, euler_pole_covariance=None)[source]

Convert an Euler pole (and optionally, its formal covariance) into a rotation vector and associated covariance matrix. Based on [goudarzi14].

Parameters
  • euler_pole (ndarray) – NumPy Array containing the longitude [rad], latitude [rad], and rotation rate [rad/time] of the Euler pole.

  • euler_pole_covariance (Optional[ndarray], default: None) – Covariance (uncertainty) of the three Euler pole components, in the form returned by rotvec2eulerpole().

Return type

tuple[ndarray, ...]

Returns

  • rotation_vector – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.

  • rotation_covariance – If euler_pole_covariance was given, formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.

See also

rotvec2eulerpole

Inverse function
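
Without the covariance propagation, the conversion between an Euler pole and a rotation vector is the usual spherical-to-cartesian transformation. A minimal sketch of both directions follows. The function names are hypothetical, and all angles are in radians as in the parameter description.

```python
import math


def eulerpole2rotvec_sketch(lon, lat, rate):
    """Hypothetical conversion: Euler pole (lon, lat, rate) to rotation vector."""
    return (rate * math.cos(lat) * math.cos(lon),
            rate * math.cos(lat) * math.sin(lon),
            rate * math.sin(lat))


def rotvec2eulerpole_sketch(wx, wy, wz):
    """Hypothetical inverse: rotation vector to Euler pole (lon, lat, rate)."""
    rate = math.sqrt(wx ** 2 + wy ** 2 + wz ** 2)
    return (math.atan2(wy, wx), math.asin(wz / rate), rate)
```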

full_cov_mat_to_columns

disstans.tools.full_cov_mat_to_columns(cov_mat, num_components, include_covariance=False, return_single=False)[source]

Converts a full variance(-covariance) matrix with multiple components into a column-based representation like the one used by Model or Timeseries. The extraction discards the cross-parameter/cross-observation covariances, i.e. it assumes that they are negligible.

It is assumed that the individual elements are ordered such that all components of one parameter or observation are in neighboring rows/columns (i.e. the first parameter or observation occupies the first num_components rows/columns, the second one the second num_components rows/columns, etc.).

Parameters
  • cov_mat (ndarray) – Square array with dimensions \(\text{num_elements} * \text{num_components}\) where \(\text{num_elements}\) is the number of elements (e.g. observations or parameters) in each of the \(\text{num_components}\) dimensions.

  • num_components (int) – Number of components cov_mat contains.

  • include_covariance (bool, default: False) – If True, also extract the off-diagonal covariances of each element between its components. Defaults to False, i.e. only the variances on the diagonal are extracted.

  • return_single (bool, default: False) – If False, return two arrays; if True, concatenate the two.

Return type

tuple[ndarray, ...]

Returns

  • variance – Array of shape \((\text{num_elements}, \text{num_components})\). If include_covariance=True and return_single=True, this array is concatenated horizontally with covariance, leading to \(\text{num_components} + (\text{num_components}*(\text{num_components}-1))/2\) columns instead.

  • covariance – If include_covariance=True and return_single=False, array of shape \((\text{num_elements}, (\text{num_components}*(\text{num_components}-1))/2)\).
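
The extraction can be pictured as walking along the diagonal blocks of the full matrix. The sketch below uses nested lists instead of an ndarray. It assumes that the covariance columns follow the row-major order of the upper triangle of each block, and cov_mat_to_columns_sketch is a hypothetical name.

```python
def cov_mat_to_columns_sketch(cov_mat, num_components):
    """Hypothetical extraction of per-element variances and covariances."""
    num_elements = len(cov_mat) // num_components
    variances, covariances = [], []
    for e in range(num_elements):
        o = e * num_components  # offset of this element's diagonal block
        variances.append([cov_mat[o + c][o + c] for c in range(num_components)])
        # Upper triangle of the block, row by row (assumed column order)
        covariances.append([cov_mat[o + r][o + c]
                            for r in range(num_components)
                            for c in range(r + 1, num_components)])
    return variances, covariances
```

All entries outside the diagonal blocks are ignored, which is the negligible cross-covariance assumption mentioned above.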

get_cov_dims

disstans.tools.get_cov_dims(num_components)[source]

Given a number of components, return the number of covariances that exist between the components.

Parameters

num_components (int) – Number of components of timeseries or model.

Return type

int

Returns

Number of covariances, calculated as \((\text{num_components}*(\text{num_components}-1))/2\).

See also

make_cov_index_map

For an example.

make_cov_index_map

disstans.tools.make_cov_index_map(num_components)[source]

Given a number of components, create a matrix that shows the indexing of where covariance columns present in a timeseries’ or model’s 2D dataframe will show in the covariance matrix of a single observation or parameter. Also provides the ordering in a 1D array which can be used together with reshape() to create the variance-covariance matrix from the columns.

Parameters

num_components (int) – Number of components of timeseries or model.

Return type

tuple[ndarray, ndarray]

Returns

  • index_map – Array of shape \((\text{num_components}, \text{num_components})\) that is NaN everywhere except in the upper triangle, where integer numbers denote where each column of a timeseries’ or model’s 2D dataframe belongs.

  • var_cov_map – Array of shape \((\text{num_components}^2, )\) that can be used to assemble the variance-covariance matrix from the columns given a particular timestep or parameter.

Example

>>> import numpy as np
>>> from disstans.tools import get_cov_dims, make_cov_index_map
>>> num_observations, num_components = 5, 2
>>> print(f"For {num_components} components, there should be:\n"
...       f"- {num_components} data columns,\n"
...       f"- {num_components} variance columns,\n"
...       f"- and {get_cov_dims(num_components)} covariance columns.")
For 2 components, there should be:
- 2 data columns,
- 2 variance columns,
- and 1 covariance columns.
>>> index_map, var_cov_map = make_cov_index_map(num_components)
>>> test_varcov = np.stack([np.ones(5), np.arange(5)*2, np.ones(5)*0.5], axis=1)
>>> test_varcov
array([[1. , 0. , 0.5],
       [1. , 2. , 0.5],
       [1. , 4. , 0.5],
       [1. , 6. , 0.5],
       [1. , 8. , 0.5]])

The first two columns are the variances, and the third column is the covariance column (since there is only one possible covariance). index_map will show where the covariance columns fit into, indexed from 0 to get_cov_dims(num_components) - 1. Since there is only one, the column index 0 will feature in the upper right corner:

>>> index_map
array([[nan,  0.],
       [nan, nan]])

If we want the full, symmetric variance-covariance matrix for the third observation, we use var_cov_map:

>>> var_cov_map
array([0, 2, 2, 1])
>>> test_varcov[2, var_cov_map].reshape(num_components, num_components)
array([[1. , 0.5],
       [0.5, 4. ]])

get_cov_indices

disstans.tools.get_cov_indices(icomp, index_map=None, num_components=None)[source]

Given a data or variance component index, retrieve the indices in the covariance columns of a timeseries or model that are associated with that component. Exactly one of index_map or num_components must be provided as input.

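Parameters
  • icomp – Index of the data or variance component.

  • index_map (default: None) – Index map as returned by make_cov_index_map().

  • num_components (default: None) – Number of components of the timeseries or model.
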
Return type

list[int]

Returns

List of integer covariance column indices associated with icomp.

Example

In a 3D dataset, the second component is associated with two covariances - between the first and the second, and the second and the third. In a timeseries or model covariance dataframe, this corresponds to the following columns:

>>> from disstans.tools import get_cov_indices
>>> get_cov_indices(1, num_components=3)
[0, 2]
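
The same result can be reproduced by enumerating the component pairs of the upper triangle row by row and keeping those that involve icomp. This is a plain-Python sketch of the indexing logic under that ordering assumption, and cov_indices_sketch is a hypothetical name.

```python
def cov_indices_sketch(icomp, num_components):
    """Hypothetical lookup of covariance column indices for one component."""
    # Pairs (row, column) of the upper triangle, in row-major order
    pairs = [(r, c) for r in range(num_components)
             for c in range(r + 1, num_components)]
    return [i for i, pair in enumerate(pairs) if icomp in pair]
```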

get_hom_vel_strain_rot

disstans.tools.get_hom_vel_strain_rot(locations, velocities, covariances=None, utmzone=None, reference=0)[source]

For a set of horizontal velocities on a 2D cartesian grid, estimate the best-fit displacement gradient matrix to calculate a homogeneous velocity field characterized by a single translation vector, strain tensor, and rotation tensor. See [tape09] for an introduction.

This function uses a local approximation to the spherical Earth by converting all station locations into a suitable UTM zone, and only considering the horizontal velocities.

Parameters
  • locations (ndarray) – Array of shape \((\text{num_stations}, 2)\) containing the longitude and latitude [°] of the observations (stations).

  • velocities (ndarray) – Array of shape \((\text{num_stations}, 2)\) containing the East and North velocities [m/time] of the observations (stations).

  • covariances (Optional[ndarray], default: None) – Array of shape \((\text{num_stations}, 2)\) containing the variances in the East and North velocities [m^2/time^2]. Alternatively, array of shape \((\text{num_stations}, 3)\) additionally containing the East-North covariance [m^2/time^2].

  • utmzone – If provided, the UTM zone to use for the horizontal approximation. If None, the average longitude will be calculated, and the respective UTM zone will be used.

  • reference (int | list, default: 0) – Reference station to be used by the calculation. This can be either a longitude-latitude [°] list, or the index of the reference station in locations.

Return type

tuple[ndarray, ndarray, ndarray]

Returns

  • v_O – Velocity of the origin \(\mathbf{v}_O\).

  • epsilon – \(2 \times 2\) strain tensor \(\mathbf{\varepsilon}\).

  • omega – \(2 \times 2\) rotation tensor \(\mathbf{\omega}\).
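
The strain and rotation tensors are the symmetric and antisymmetric parts of the displacement gradient matrix. A minimal sketch of that split for a 2×2 matrix given as nested lists is shown below. The name split_gradient is hypothetical, and the sign convention of the rotation tensor is an assumption.

```python
def split_gradient(L):
    """Hypothetical split of a 2x2 gradient matrix L[i][j] = d v_i / d x_j."""
    # Symmetric part: strain tensor
    epsilon = [[0.5 * (L[i][j] + L[j][i]) for j in range(2)] for i in range(2)]
    # Antisymmetric part: rotation tensor
    omega = [[0.5 * (L[i][j] - L[j][i]) for j in range(2)] for i in range(2)]
    return epsilon, omega
```

By construction, adding both tensors recovers the original gradient matrix.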

See also

strain_rotation_invariants

For calculation of invariants of the tensors.

References

tape09

Tape, C., Musé, P., Simons, M., Dong, D., & Webb, F. (2009), Multiscale estimation of GPS velocity fields, Geophysical Journal International, 179(2), 945–971, doi:10.1111/j.1365-246X.2009.04337.x.

parallelize

disstans.tools.parallelize(func, iterable, num_threads=None, chunksize=1)[source]

Convenience wrapper that, given a function, an iterable set of inputs, and parallelization settings, automatically runs the function either in serial or in parallel.

Warning

By default on most systems, NumPy will already use multiple cores and threads in its routines (you can check this by running some very large and time-consuming math, and monitoring the usage of your processors). Simply starting multiple Python threads gives each of them NumPy’s default number of threads, which overloads the system because it runs out of processors, slowing down the computations considerably.

The Python multiprocessing module does not change these settings, since it is apparently hard to guess which backend NumPy uses, see this thread on GitHub. So, it is currently up to the user to disable this behavior when using multiple Python threads as achieved with this function. For example, this snippet might be enough to put at the beginning of a script: import os; os.environ['OMP_NUM_THREADS'] = '1'. Then, the number of DISSTANS cores can be set by e.g. import disstans; disstans.defaults["general"]["num_threads"] = 10.

If you are experiencing problems when running a script, make sure the settings and the rest of the script are encapsulated in the standard if __name__ == "__main__": ... clause.
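
A script skeleton that follows this advice might look as sketched below. OMP_NUM_THREADS is the variable named in the warning. The two additional variables are assumptions meant to cover the OpenBLAS and MKL backends, and main is a hypothetical entry point.

```python
import os

# Set these before NumPy (or DISSTANS) is imported anywhere in the script.
# OMP_NUM_THREADS is the variable named above; the other two are assumptions
# covering the OpenBLAS and MKL backends.
for name in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[name] = "1"


def main():
    # Heavy imports and the actual processing would go here, e.g.:
    #   import disstans
    #   disstans.defaults["general"]["num_threads"] = 10
    return os.environ["OMP_NUM_THREADS"]


if __name__ == "__main__":
    main()
```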

Parameters
  • func (Callable[[Any], Any]) – Function to wrap, can only have a single input argument.

  • iterable (Iterable) – Iterable object (list, generator expression, etc.) that contains all the arguments that func should be called with.

  • num_threads (Optional[int], default: None) – Number of threads to use. Set to 0 if no parallelization is desired. None defaults to the value in defaults.

  • chunksize (int, default: 1) – Chunk size used in the parallelization pool, see imap().

Yields

result – Each result, yielded as soon as it has been calculated.

Return type

Iterator[Any]

Example

Consider a simple loop to add two numbers:

>>> from numpy import sum
>>> iterable = [(1, 2), (2, 3)]
>>> print([sum(i) for i in iterable])
[3, 5]

In parallel with 2 threads, this could look like this:

>>> from multiprocessing import Pool
>>> with Pool(2) as p:
...     print([result for result in p.imap(sum, iterable)])
...
[3, 5]

Using parallelize(), both cases simplify to:

>>> from disstans.tools import parallelize
>>> print([result for result in parallelize(sum, iterable, num_threads=0)])
[3, 5]
>>> print([result for result in parallelize(sum, iterable, num_threads=2)])
[3, 5]

parse_maintenance_table

disstans.tools.parse_maintenance_table(csvpath, sitecol, datecols, siteformatter=None, delimiter=',', codecol=None, exclude=None, include=None, verbose=False)[source]

Function that loads a maintenance table from a .csv file (or similar) and returns a list of step times for each station. It also provides an interface to ignore certain maintenance codes (if present), and modify the site names when loading.

Parameters
  • csvpath (str) – Path of the file to load.

  • sitecol (int) – Column index of the station names.

  • datecols (list) – List of indices that contain the ingredients to convert the input to a valid Timestamp. It should fail gracefully, i.e. return a string if Pandas cannot interpret the column(s) appropriately.

  • siteformatter (Optional[Callable[[str], str]], default: None) – Function that will be called element-wise on the loaded station names to produce the output station names.

  • delimiter (str, default: ',') – Delimiter character for the input file.

  • codecol (Optional[int], default: None) – Column index of the maintenance code.

  • exclude (Optional[list[str]], default: None) – Maintenance records that exactly match an element in exclude will be ignored. codecol has to be set.

  • include (Optional[list[str]], default: None) – Only maintenance records that include an element of include will be used. No exact match is required. codecol has to be set.

  • verbose (bool, default: False) – If True, print loading information.

Return type

tuple[DataFrame, dict[str, list]]

Returns

  • maint_table – Parsed maintenance table.

  • maint_dict – Dictionary that maps the station names to a list of steptimes.

Notes

If running into problems, also consult the Pandas read_csv() function (used to load the csvpath file) and DataFrame (object on which the filtering happens).

parse_unr_steps

disstans.tools.parse_unr_steps(filepath, check_update=True, only_stations=None, verbose=False)[source]

This function parses the main step file from UNR and produces two step databases, one for maintenance and one for earthquake-related events. If a newer step file is found online, the local copy is updated.

See download_unr_data() for more information about UNR’s dataset, as well as how to access and cite it.

Parameters
  • filepath (str) – Path to the step file.

  • check_update (bool, default: True) – If True, check UNR’s server for an updated step file.

  • only_stations (Optional[list[str]], default: None) – If specified, a list of station IDs. Other stations are not included in the output.

  • verbose (bool, default: False) – If True, print actions.

Return type

tuple[DataFrame, dict[str, list], DataFrame, dict[str, list]]

Returns

  • maint_table – Parsed maintenance table.

  • maint_dict – Dictionary that maps the station names to a list of maintenance steptimes.

  • eq_table – Parsed earthquake table.

  • eq_dict – Dictionary that maps the station names to a list of earthquake-related steptimes.

R_ecef2enu

disstans.tools.R_ecef2enu(lon, lat)[source]

Generate the rotation matrix used to express a vector written in ECEF (XYZ) coordinates as a vector written in local east, north, up (ENU) coordinates at the position defined by geodetic latitude and longitude. See Chapter 4 and Appendix 4.A in [misraenge2010] for details.

Parameters
  • lon (float) – Longitude [°] of vector position.

  • lat (float) – Latitude [°] of vector position.

Return type

ndarray

Returns

The 3-by-3 rotation matrix.

See also

R_enu2ecef

The inverse matrix.

References

misraenge2010

Misra, P., & Enge, P. (2010), Global Positioning System: Signals, Measurements, and Performance, Lincoln, Mass: Ganga-Jamuna Press.

R_enu2ecef

disstans.tools.R_enu2ecef(lon, lat)[source]

Generate the rotation matrix used to express a vector written in local ENU coordinates as a vector written in ECEF (XYZ) coordinates at the position defined by geodetic latitude and longitude. This is the transpose of the rotation matrix computed by R_ecef2enu().

Parameters
  • lon (float) – Longitude [°] of vector position.

  • lat (float) – Latitude [°] of vector position.

Return type

ndarray

Returns

The 3-by-3 rotation matrix.

See also

R_ecef2enu

The inverse matrix.
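
For reference, the textbook form of the two rotation matrices can be sketched in plain Python as follows. The rows of the ECEF-to-ENU matrix are the local East, North, and Up unit vectors expressed in ECEF, and the inverse is its transpose. The function names are hypothetical, and nested lists stand in for the ndarray.

```python
import math


def R_ecef2enu_sketch(lon_deg, lat_deg):
    """Hypothetical ECEF-to-ENU rotation matrix for longitude/latitude in degrees."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    sl, cl = math.sin(lon), math.cos(lon)
    sp, cp = math.sin(lat), math.cos(lat)
    return [[-sl, cl, 0.0],             # East
            [-sp * cl, -sp * sl, cp],   # North
            [cp * cl, cp * sl, sp]]     # Up


def R_enu2ecef_sketch(lon_deg, lat_deg):
    """Hypothetical inverse: transpose of the ECEF-to-ENU matrix."""
    return [list(col) for col in zip(*R_ecef2enu_sketch(lon_deg, lat_deg))]


def apply(R, v):
    """Multiply a 3x3 matrix (nested lists) with a 3-vector."""
    return [sum(r * x for r, x in zip(row, v)) for row in R]
```

At longitude 0° and latitude 0°, the ECEF x-axis points straight up, so apply(R_ecef2enu_sketch(0, 0), [1, 0, 0]) yields a pure Up vector.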

rotvec2eulerpole

disstans.tools.rotvec2eulerpole(rotation_vector, rotation_covariance=None)[source]

Convert a rotation vector containing the diagonals of a \(3 \times 3\) rotation matrix (and optionally, its formal covariance) into an Euler Pole and associated magnitude. Based on [goudarzi14].

Parameters
  • rotation_vector (ndarray) – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.

  • rotation_covariance (Optional[ndarray], default: None) – Formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.

Return type

tuple[ndarray, ...]

Returns

  • euler_pole – NumPy Array containing the longitude [rad], latitude [rad], and rotation rate [rad/time] of the Euler pole.

  • euler_pole_covariance – If rotation_covariance was given, a NumPy Array of the propagated uncertainty for the Euler Pole for all three components.

See also

eulerpole2rotvec

Inverse function

strain_rotation_invariants

disstans.tools.strain_rotation_invariants(epsilon=None, omega=None)[source]

Given a strain (rate) and/or rotation (rate) tensor, calculate scalar invariant quantities of interest. See [tape09] for an introduction.

Parameters
  • epsilon (Optional[ndarray], default: None) – Strain (rate) tensor \(\mathbf{\varepsilon}\).

  • omega (Optional[ndarray], default: None) – Rotation (rate) tensor \(\mathbf{\omega}\).

Return type

tuple[float, ...]

Returns

  • dilatation – Only if epsilon is provided. Scalar dilatation (rate) as defined by the first invariant of the strain (rate) tensor \(\Theta = \text{Tr} \left( \mathbf{\varepsilon} \right)\).

  • strain – Only if epsilon is provided. Scalar strain (rate) as defined by the Frobenius norm of the strain (rate) tensor \(\Sigma = \lVert \mathbf{\varepsilon} \rVert_F\).

  • shear – Only if epsilon is provided. Scalar shearing (rate) as defined by the square root of the second invariant of the deviatoric strain (rate) tensor \(\text{T} = \sqrt{\frac{1}{2} \text{Tr}(\mathbf{\varepsilon}^2) - \frac{1}{6} \text{Tr}(\mathbf{\varepsilon})^2}\).

  • rotation – Only if omega is provided. Scalar rotation (rate) as defined by \(\Omega = \frac{1}{\sqrt{2}} \lVert \mathbf{\omega} \rVert_F\).
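
The four formulas above translate directly into code. The sketch below evaluates them for small tensors given as nested lists. The name invariants_sketch and the helper functions are hypothetical.

```python
import math


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def trace_of_square(M):
    n = len(M)
    return sum(M[i][j] * M[j][i] for i in range(n) for j in range(n))


def frobenius(M):
    return math.sqrt(sum(x * x for row in M for x in row))


def invariants_sketch(epsilon, omega):
    """Hypothetical evaluation of dilatation, strain, shear, and rotation."""
    dilatation = trace(epsilon)
    strain = frobenius(epsilon)
    shear = math.sqrt(0.5 * trace_of_square(epsilon) - dilatation ** 2 / 6.0)
    rotation = frobenius(omega) / math.sqrt(2.0)
    return dilatation, strain, shear, rotation
```

For a pure shear tensor with entries 1 and -1 on the diagonal, the dilatation is zero and the shear is 1.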

tvec_to_numpycol

disstans.tools.tvec_to_numpycol(timevector, t_reference=None, time_unit='D')[source]

Converts a Pandas timestamp series into a NumPy array of relative time to a reference time in the given time unit.

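Parameters
  • timevector – Pandas timestamp series to convert.

  • t_reference (default: None) – Reference time that the relative times are calculated from.

  • time_unit (default: 'D') – Time unit of the output, e.g. 'D' for days.
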
Return type

ndarray

Returns

Array of time differences.
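
The underlying operation is a difference to the reference time divided by the length of one time unit. A standard-library sketch for a unit of one day is shown below. The name relative_time_sketch is hypothetical, and datetime objects stand in for the Pandas types.

```python
from datetime import datetime, timedelta


def relative_time_sketch(timestamps, t_reference, unit=timedelta(days=1)):
    """Hypothetical conversion of timestamps to floats relative to t_reference."""
    # Dividing two timedelta objects yields a float
    return [(t - t_reference) / unit for t in timestamps]
```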

weighted_median

disstans.tools.weighted_median(values, weights, axis=0, percentile=0.5, keepdims=False, visualize=False)[source]

Calculates the weighted median along a given axis.

Parameters
  • values (ndarray) – Values to calculate the medians for.

  • weights (ndarray) – Weights of each value along the given axis.

  • axis (int, default: 0) – Axis along which to calculate the median.

  • percentile (float, default: 0.5) – Percentile (between 0 and 1) to calculate. The default of 0.5 corresponds to the median.

  • keepdims (bool, default: False) – If False, squeezes out the axis along which the median was calculated.

  • visualize (bool, default: False) – If True, show a plot of the weighted median calculation.

Return type

ndarray

Returns

Weighted median of input.
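
One common definition of the weighted median picks the smallest value at which the cumulative weight reaches the requested fraction of the total weight. The 1D sketch below implements that definition. DISSTANS may handle ties or interpolation differently, and weighted_percentile_sketch is a hypothetical name.

```python
def weighted_percentile_sketch(values, weights, percentile=0.5):
    """Hypothetical 1D weighted percentile without interpolation."""
    pairs = sorted(zip(values, weights))
    target = percentile * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]
```

With equal weights this reduces to an ordinary (lower) median, while a single heavy weight pulls the result towards its value.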

Classes

Click

class disstans.tools.Click(ax, func, button=MouseButton.LEFT)[source]

Class that enables a GUI to distinguish between clicks (mouse press and release) and dragging events (mouse press, move, then release).

Parameters
  • ax – Axis on which to look for clicks.

  • func – Function to call, with the Matplotlib clicking Event as its first argument.

  • button (default: <MouseButton.LEFT: 1>) – Which mouse button to operate on, see MouseButton for accepted values.

RINEXDataHolding

class disstans.tools.RINEXDataHolding(df=None)[source]

Container class for a database of RINEX files.

A new object can be created by one of the two classmethods:

  • from_folders()

  • from_file()

An object can be saved by using Pandas’ to_pickle() on the instance’s df attribute (it is recommended to add the .gz extension to enable compression).

The location information and availability metrics can be saved in the same way. To load a previously-saved file, you can use the convenience functions load_locations_from_file() and load_metrics_from_file(), specify the respective paths in the call to from_file(), or alternatively, load the data directly with Pandas and assign it to the respective instance attributes.

COLUMNS = ('station', 'station_raw', 'year', 'day', 'date', 'sequence', 'type', 'compression', 'filesize', 'filetimeutc', 'network', 'basefolder')

The necessary information about each RINEX file.

COMPRFILEEXTS = ('.Z', '.gz')

The valid (compressed) RINEX file extensions to search for.

GLOBPATTERN = '[0-9][0-9][0-9][0-9]/[0-9][0-9][0-9]/*'

The YYYY/DDD folder pattern in a glob-readable format.

METRICCOLS = ('number', 'age', 'recency', 'length', 'reliability')

The metrics that can be calculated.

RINEXPATTERN = '(?P<site>\\w{4})(?P<day>\\d{3})(?P<sequence>\\w{1})\\.(?P<yy>\\d{2})(?P<type>\\w{1})\\.(?P<compression>\\w+)'

The regex-style filename pattern for RINEX files.
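
The two patterns can be tried out with the standard library alone. In the sketch below, the file path is a made-up example and the variable names are hypothetical. The patterns themselves are copied from the class attributes above.

```python
import re
from fnmatch import fnmatch

RINEXPATTERN = (r"(?P<site>\w{4})(?P<day>\d{3})(?P<sequence>\w{1})\."
                r"(?P<yy>\d{2})(?P<type>\w{1})\.(?P<compression>\w+)")
GLOBPATTERN = "[0-9][0-9][0-9][0-9]/[0-9][0-9][0-9]/*"

path = "2020/001/p0010010.20d.Z"  # hypothetical example file
# Does the relative path follow the YYYY/DDD folder layout?
folder_ok = fnmatch(path, GLOBPATTERN)
# Split the filename into its named components
fields = re.fullmatch(RINEXPATTERN, path.split("/")[-1]).groupdict()
```

Here, fields contains the site 'p001', the day '001', the sequence '0', the two-digit year '20', the type 'd', and the compression 'Z'.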

calculate_availability_metrics(sampling=Timedelta('1 days 00:00:00'))[source]

Calculates the following metrics and stores them in the metrics DataFrame:

  • 'number': Number of available observations.

  • 'age': Time of first observation.

  • 'recency': Time of last observation.

  • 'length': Time between first and last observation.

  • 'reliability': Reliability defined as number of observations divided by the maximum amount of possible observations between the first and last acquisition given the assumed sampling interval of the data.
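
For a single station, the metrics above could be computed as in the following sketch. It assumes that the maximum number of possible observations counts both the first and the last acquisition (hence the "+ 1"), which is an assumption of this sketch. The name availability_metrics_sketch is hypothetical.

```python
from datetime import datetime, timedelta


def availability_metrics_sketch(times, sampling=timedelta(days=1)):
    """Hypothetical per-station metrics from a list of observation times."""
    times = sorted(times)
    number = len(times)
    age, recency = times[0], times[-1]
    length = recency - age
    # Assumption: both endpoints count as possible observations
    max_possible = length / sampling + 1
    return {"number": number, "age": age, "recency": recency,
            "length": length, "reliability": number / max_possible}
```

Four daily files spread over five consecutive days would give a reliability of 0.8.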

Parameters

sampling (Timedelta, default: Timedelta('1 days 00:00:00')) – Assumed sampling interval of the data files.

Return type

None

property df: DataFrame

Pandas DataFrame object containing the RINEX files database.

classmethod from_file(db_file, locations_file=None, metrics_file=None, verbose=False)[source]

Convenience class method that creates a new RINEXDataHolding object from a file using load_db_from_file() and then optionally loads the locations and metrics from their respective files.

Parameters
  • db_file (str) – Path of the main file.

  • locations_file (Optional[str], default: None) – Path of the locations file.

  • metrics_file (Optional[str], default: None) – Path of the metrics file.

  • verbose (bool, default: False) – If True, print database size and a sample entry.

Return type

RINEXDataHolding

Returns

The newly created RINEXDataHolding object.

classmethod from_folders(folders, verbose=False, no_pbar=False)[source]

Convenience class method that creates a new RINEXDataHolding object and directly calls load_db_from_folders().

Parameters
  • folders (tuple | list[tuple]) – Folder(s) in which the different year-folders are found, formatted as a single tuple or a list of tuples with name and respective folder ([('network1', '/folder/one/'), ...]).

  • verbose (bool, default: False) – If True, print final database size and a sample entry.

  • no_pbar (bool, default: False) – Suppress the progress bar with True.

Return type

RINEXDataHolding

Returns

The newly created RINEXDataHolding object.

get_files_by(station=None, network=None, year=None, between=None, verbose=False)[source]

Return a subset of the database by criteria.

Parameters
  • station (UnionType[str, list[str], None], default: None) – Return only files of this/these station(s).

  • network (UnionType[str, list[str], None], default: None) – Return only files of this/these network(s).

  • year (UnionType[int, list[int], None], default: None) – Return only files of this/these year(s).

  • between (Optional[tuple], default: None) – Return only files between the start and end date (inclusive) given by the length-two tuple.

  • verbose (bool, default: False) – If True, print the number of selected entries.

Return type

DataFrame

Returns

The DataFrame subset.
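To make the selection semantics concrete, here is a rough pandas sketch of a combined station and date filter with an inclusive date range. The column names `station`, `network` and `date` and all values are hypothetical and chosen for illustration only; the actual layout of the database is not described in this section.

```python
import pandas as pd

# Hypothetical stand-in for the database (column names are assumptions)
db = pd.DataFrame({
    "station": ["abcd", "abcd", "efgh", "efgh"],
    "network": ["net1", "net1", "net2", "net2"],
    "date": pd.to_datetime(["2019-12-31", "2020-01-01", "2020-01-01", "2020-01-03"]),
})

stations = ["abcd", "efgh"]
between = ("2020-01-01", "2020-01-02")  # start and end date, both inclusive
start, end = pd.to_datetime(list(between))

# Series.between() includes both endpoints by default
mask = db["station"].isin(stations) & db["date"].between(start, end)
subset = db[mask]
print(subset)
```

Only the two files dated 2020-01-01 remain, since the start date itself is part of the selected range.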

get_location(station, lla=True)[source]

Returns the approximate location of a station.

Parameters
  • station (str) – Name of the station.

  • lla (bool, default: True) – If True, returns the coordinates in Longitude [°], Latitude [°] & Altitude [m], otherwise in XYZ [m] coordinates.

Return type

Series

Returns

The location of the station in the specified coordinate system.

get_rinex_header(filepath)[source]

Open a RINEX file, read the header, and format it as a dictionary. No data type conversion or stripping of whitespace is performed.

Parameters

filepath (str) – Path to RINEX file.

Return type

dict[str, str]

Returns

Dictionary of header lines.
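The following standalone sketch shows one way such a header dictionary could be built, and is not the implementation of this method. It relies on the RINEX convention that each header line carries its content in columns 1-60 and its label in columns 61-80, with the header closed by an `END OF HEADER` line. The helper names `parse_rinex_header` and `make_line`, the sample values, and the handling of repeated labels (joined with newlines) are assumptions.

```python
def parse_rinex_header(lines):
    """Collect header lines into a dict keyed by the label in columns 61-80."""
    header = {}
    for line in lines:
        line = line.rstrip("\r\n")
        label = line[60:80].strip()  # label is stripped only to serve as key
        if label == "END OF HEADER":
            break
        content = line[:60]          # content is kept as-is, unstripped
        if label in header:
            # Assumption: repeated labels (e.g. comments) are joined by newlines
            header[label] += "\n" + content
        else:
            header[label] = content
    return header


def make_line(content, label):
    """Pad the content to 60 columns and append the label (sample data only)."""
    return f"{content:<60}{label}\n"


# Made-up header lines for illustration
sample = [
    make_line("     2.11           OBSERVATION DATA    G (GPS)", "RINEX VERSION / TYPE"),
    make_line("ABCD", "MARKER NAME"),
    make_line("  -2500000.0000  -4600000.0000   3600000.0000", "APPROX POSITION XYZ"),
    make_line("", "END OF HEADER"),
    "this would be observation data and is never parsed\n",
]

header = parse_rinex_header(sample)
# Values are raw strings; conversion is left to the caller
xyz = [float(value) for value in header["APPROX POSITION XYZ"].split()]
print(header["MARKER NAME"].strip(), xyz)
```

Because the values are unconverted and padded, callers need to strip and cast them, as done for the approximate position here.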

property list_stations: list[str]

List of stations in the database.

load_db_from_file(db_file, verbose=False)[source]

Loads a RINEXDataHolding object from a pickled Pandas DataFrame file.

Parameters
  • db_file (str) – Path of the main file.

  • verbose (bool, default: False) – If True, print database size and a sample entry.

Return type

None

load_db_from_folders(folders, verbose=False, no_pbar=False)[source]

Loads a RINEX database from folders in the file system. The data should be located in one or multiple folder structure(s) organized by YYYY/DDD, where YYYY is a four-digit year and DDD is the three-digit day of the year.

Parameters
  • folders (tuple | list[tuple]) – Folder(s) in which the different year-folders are found, formatted as a single tuple or a list of tuples with name and respective folder ([('network1', '/folder/one/'), ...]).

  • verbose (bool, default: False) – If True, print final database size and a sample entry.

  • no_pbar (bool, default: False) – Suppress the progress bar with True.

Return type

None
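As a sketch of the expected layout, the snippet below builds the relative path of a single file from a date, combining the YYYY/DDD folder structure with the fields of RINEXPATTERN. The helper `rinex_relpath`, its default arguments and the site name are hypothetical and not part of the library.

```python
from datetime import date
from pathlib import PurePosixPath


def rinex_relpath(site, day, sequence="0", filetype="d", compression="Z"):
    """Relative YYYY/DDD/filename path for one site and day (illustrative)."""
    yyyy = f"{day:%Y}"  # four-digit year folder
    ddd = f"{day:%j}"   # three-digit, zero-padded day of the year
    yy = f"{day:%y}"    # two-digit year used inside the filename
    filename = f"{site}{ddd}{sequence}.{yy}{filetype}.{compression}"
    return PurePosixPath(yyyy, ddd, filename)


# 1 February 2020 is day 032 of the year
path = rinex_relpath("abcd", date(2020, 2, 1))
print(path)  # 2020/032/abcd0320.20d.Z
```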

load_locations_from_file(filepath)[source]

Load a previously-saved DataFrame containing the locations of each station.

Parameters

filepath (str) – Path to the pickled DataFrame.

Return type

None

load_locations_from_rinex(keep='last', replace_not_found=False, no_pbar=True)[source]

Scan the RINEX files’ headers for approximate locations for plotting purposes.

Parameters
  • keep (Literal['last', 'first', 'mean'], default: 'last') – Determine which location to use. Possible values are 'last' (only scan the most recent file), 'first' (only scan the oldest file) or 'mean' (load all files and calculate the average). Note that 'mean' could take a substantial amount of time, since all files have to be opened, decompressed and searched.

  • replace_not_found (bool, default: False) – If a location is not found and replace_not_found=True, the location of Null Island (0° Longitude, 0° Latitude) is used and a warning is issued. If False, an error is raised instead.

  • no_pbar (bool, default: True) – Suppress the progress bar with True.

Return type

None

load_metrics_from_file(filepath)[source]

Load a previously-saved DataFrame containing the calculated availability metrics.

Parameters

filepath (str) – Path to the pickled DataFrame.

Return type

None

property locations_lla: DataFrame

Approximate positions of stations in WGS-84 (longitude [°], latitude [°], altitude [m]) coordinates.

property locations_xyz: DataFrame

DataFrame of approximate positions of stations in WGS-84 (x, y, z) [m] coordinates.

make_filenames(db)[source]

Recreate the full paths to the individual RINEX files from the database or a subset thereof.

Parameters

db (DataFrame) – df or a subset thereof.

Return type

list[str]

Returns

List of paths.

Raises

NotImplementedError – If GLOBPATTERN or RINEXPATTERN for this instance are not the same as the default values. In this case, redefine this function with the appropriate folder and file patterns.

property metrics: DataFrame

Contains the station metrics calculated by calculate_availability_metrics().

property num_files: int

Number of files in the database.

property num_stations: int

Number of stations in the database.

plot_availability(sampling=Timedelta('1 days 00:00:00'), sort_by_latitude=True, saveas=None)[source]

Create an availability figure for the dataset.

Parameters
  • sampling (Timedelta, default: Timedelta('1 days 00:00:00')) – Assume that breaks strictly larger than sampling constitute a data gap.

  • sort_by_latitude (bool, default: True) – If True, sort the stations by latitude, otherwise alphabetically. (Always falls back to alphabetical sorting if location information is missing.)

  • saveas (str, default: None) – If provided, the figure will be saved at this location.

Return type

None
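The gap criterion for sampling can be sketched with pandas on made-up observation times: only steps strictly larger than the sampling interval are flagged. This illustrates the stated rule and is not the plotting code itself.

```python
import pandas as pd

# Hypothetical observation times of one station (2020-01-03 is missing)
times = pd.Series(pd.to_datetime(
    ["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"]
))
sampling = pd.Timedelta(1, "D")

# Time step to the previous observation; the first entry has none (NaT)
steps = times.diff()

# Steps exactly equal to the sampling interval are not gaps;
# comparisons involving NaT evaluate to False
is_gap = steps > sampling
print(is_gap.tolist())  # [False, False, True, False]
```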

plot_map(metric=None, orientation='horizontal', annotate_stations=True, figsize=None, saveas=None, dpi=None, gui_kw_args={})[source]

Plot a map of all the stations present in the RINEX database. The markers can be colored by the different availability metrics calculated by calculate_availability_metrics().

Parameters
  • metric (Optional[str], default: None) – Calculate the marker color (and respective colormap) given a certain metric. If None, no color is applied.

  • orientation (Literal['horizontal', 'vertical'], default: 'horizontal') – Colorbar orientation, see colorbar().

  • annotate_stations (bool, default: True) – If True, add the station names to the map.

  • figsize (Optional[tuple], default: None) – Set the figure size (width, height) in inches.

  • saveas (Optional[str], default: None) – If provided, the figure will be saved at this location.

  • dpi (Optional[float], default: None) – Use this DPI for saved figures.

  • gui_kw_args (dict[str, Any], default: {}) – Override default GUI settings of defaults.

Return type

None

Timedelta

class disstans.tools.Timedelta(*args, **kwargs)[source]
static __new__(cls, *args, **kwargs)[source]

DISSTANS Timedelta subclassed from the pandas Timedelta but with support for the 'Y' year time unit, defined as always exactly 365.25 days. Other possible unit values are:

W, D, days, day, hours, hour, hr, h, m, minute, min, minutes, T, S, seconds, sec, second, ms, milliseconds, millisecond, milli, millis, L, us, microseconds, microsecond, micro, micros, U, ns, nanoseconds, nano, nanos, nanosecond, N
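To show what the fixed-length year amounts to, the sketch below reproduces the stated definition with a plain pandas Timedelta rather than with this class. The variable names are illustrative.

```python
import pandas as pd

# One 'Y' unit as defined above: always exactly 365.25 days
one_year = pd.Timedelta(365.25, "D")

# 2.5 years are then 913.125 days, i.e. 913 days and 3 hours
span = 2.5 * one_year
print(one_year, span)

# Expressing an arbitrary duration in units of such years
n_years = pd.Timedelta(days=1461) / one_year  # four years
print(n_years)
```

Since the unit has a fixed length, it ignores leap-year boundaries and is meant for durations, not for calendar arithmetic.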