Tools
This module contains helper functions and classes that are not dependent on any of DISSTANS’s classes.
For more specialized processing functions, see processing.
Functions
best_utmzone
block_permutation
- disstans.tools.block_permutation(n_outer, n_inner)
Convenience function to calculate a permutation matrix used to rearrange (permute) blockwise-ordered submatrices in a big matrix. n_outer outside blocks of individual n_inner-sized blocks will become n_inner outside blocks of individual n_outer-sized blocks.
Transposing the result is equivalent to calling this function with swapped arguments.
- Parameters
- Return type
- Returns
Square permutation matrix with dimensions \(n = \text{n_outer} * \text{n_inner}\). To permute a matrix \(A\), calculate \(~P A P^T\).
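The reordering described above can be sketched in plain NumPy. This is a minimal illustration under the assumption that rows are indexed as (outer block, position within block); the function name block_permutation_sketch is hypothetical and this is not necessarily how DISSTANS builds the matrix.

```python
import numpy as np


def block_permutation_sketch(n_outer, n_inner):
    """Hypothetical stand-in for illustration, not the DISSTANS implementation."""
    n = n_outer * n_inner
    old = np.arange(n)
    # Decompose each original index into (outer block, position within block)
    outer, inner = np.divmod(old, n_inner)
    # In the new ordering, the former inner position selects the outer block
    new = inner * n_outer + outer
    P = np.zeros((n, n))
    P[new, old] = 1.0
    return P


A = np.array([[0., 1., 0., 0.],
              [2., 3., 0., 0.],
              [0., 0., 1., 1.],
              [0., 0., 1., 1.]])
P = block_permutation_sketch(2, 2)
permuted = P @ A @ P.T
```

For the 2-by-2 case, permuted reproduces the matrix shown in the example of this entry, and transposing the sketch's output matches calling it with swapped arguments.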
Example
>>> import numpy as np
>>> from disstans.tools import block_permutation
>>> n_outer, n_inner = 2, 2
>>> A = np.block([[np.arange(n_inner**2).reshape(n_inner, n_inner),
...                np.zeros((n_inner, n_inner))],
...               [np.zeros((n_inner, n_inner)),
...                np.ones((n_inner, n_inner))]])
>>> A
array([[0., 1., 0., 0.],
       [2., 3., 0., 0.],
       [0., 0., 1., 1.],
       [0., 0., 1., 1.]])
>>> P = block_permutation(n_outer, n_inner)
>>> P @ A @ P.T
array([[0., 0., 1., 0.],
       [0., 1., 0., 1.],
       [2., 0., 3., 0.],
       [0., 1., 0., 1.]])
cov2corr
create_powerlaw_noise
- disstans.tools.create_powerlaw_noise(size, exponent, seed=None)
Creates synthetic noise according to a Power Law model [langbein04].
- Parameters
size (int | list | tuple) – Number of (equally-spaced) noise samples of the output noise array or a shape where the first entry defines the number of noise samples for the remaining dimensions.
exponent (int) – Exponent of the power law noise model. E.g. 0 corresponds to white (Gaussian) noise, 1 to flicker (pink) noise, and 2 to random walk (red, Brownian) noise.
seed (UnionType[int, Generator, None], default: None) – Pass an initial seed to the random number generator, or pass a Generator instance.
- Return type
- Returns
Noise output array.
Notes
This function uses Timmer and König’s [timmerkoenig95] approach to generate the noise, and Felix Patzelt’s colorednoise code to calculate the theoretical standard deviation.
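The spectral approach named above can be sketched as follows. This is a simplified, one-dimensional illustration and not the DISSTANS implementation: the function name powerlaw_noise_sketch is hypothetical, and the output is simply rescaled to unit sample standard deviation instead of using the theoretical standard deviation mentioned in the note.

```python
import numpy as np


def powerlaw_noise_sketch(size, exponent, seed=None):
    """Hypothetical 1D sketch: shape white Fourier coefficients by f^(-exponent/2)."""
    rng = np.random.default_rng(seed)  # accepts an int, a Generator, or None
    freqs = np.fft.rfftfreq(size)
    # Amplitude scaling; the zero-frequency term is dropped to get zero-mean noise
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-exponent / 2)
    # Independent Gaussian real and imaginary parts for every frequency
    real = rng.standard_normal(freqs.size) * scale
    imag = rng.standard_normal(freqs.size) * scale
    if size % 2 == 0:
        imag[-1] = 0.0  # the Nyquist coefficient of a real signal is real
    noise = np.fft.irfft(real + 1j * imag, n=size)
    # Simplification: normalize to unit sample standard deviation
    return noise / noise.std()


flicker = powerlaw_noise_sketch(256, exponent=1, seed=0)
```

Passing the same seed twice yields the same series, which is convenient when building reproducible synthetic tests.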
References
- langbein04
Langbein, J. (2004), Noise in two-color electronic distance meter measurements revisited, J. Geophys. Res., 109, B04406, doi:10.1029/2003JB002819.
- timmerkoenig95
Timmer, J.; König, M. (1995), On generating power law noise, Astronomy and Astrophysics, v.300, p.707.
date2decyear
download_unr_data
- disstans.tools.download_unr_data(station_list_or_bbox, data_dir, solution='final', rate='24h', reference='IGS14', min_solutions=100, t_min=None, t_max=None, verbose=False, no_pbar=False)
Downloads GNSS timeseries data from the University of Nevada at Reno’s Nevada Geodetic Laboratory. When using this data, please cite [blewitt18], as well as all the original data providers (the relevant info will be downloaded as well).
Files will only be downloaded if there is no matching file already present, or the remote file is newer than the local one.
- Parameters
station_list_or_bbox (list[str] | list[float]) – Defines which stations to look for data and download. It can be either a list of station names (list of strings), a list of bounding box coordinates (the four floats [lon_min, lon_max, lat_min, lat_max] in degrees), or a three-element list defining a circle (location in degrees and radius in kilometers [center_lon, center_lat, radius]).
data_dir (str) – Folder for data.
solution (Literal['final', 'rapid', 'ultra'], default: 'final') – Which timeseries solution to download. See the Notes for approximate latency times.
rate (Literal['24h', '5min'], default: '24h') – Which sample rate to download. See the Notes for a table of which rates are available for each solution.
reference (str, default: 'IGS14') – The UNR abbreviation for the reference frame in which to download the data. Applies only for daily sample rates and final or rapid orbit solutions.
min_solutions (int, default: 100) – Only consider stations with at least a certain number of all-time solutions according to the station list file.
t_min (UnionType[str, Timestamp, None], default: None) – Only consider stations that have data on or after t_min.
t_max (UnionType[str, Timestamp, None], default: None) – Only consider stations that have data on or before t_max.
verbose (bool, default: False) – If True, individual actions are printed.
no_pbar (bool, default: False) – Suppress the progress bar with True.
- Return type
- Returns
A DataFrame, built from UNR’s data holding list, subset to the stations actually selected for download.
Notes
The following combinations of solution and sample rates are available. Note that not all stations are equipped to provide all data types. Furthermore, only the daily files will be available in a plate reference frame.
orbit solutions   24 hours   5 minutes   latency
final             yes        yes         approx. 2 weeks
rapid             yes        yes         approx. 24 hours
ultra             no         yes         approx. 2 hours
Warning
It is your responsibility to ensure that different reference frames or solution types are not downloaded into the same folders, because this could lead to the overwriting of data or ambiguities as to which files represent which solutions. This is because this script does not rename files or change the folder structure that it finds on UNR’s servers.
References
- blewitt18
Blewitt, G., Hammond, W., & Kreemer, C. (2018). Harnessing the GPS Data Explosion for Interdisciplinary Science. Eos, 99. doi:10.1029/2018EO104623
See also
parse_unr_steps
Function to download and parse UNR’s main step file.
estimate_euler_pole
- disstans.tools.estimate_euler_pole(locations, velocities, covariances=None, enu=True)
Estimate a best-fit Euler pole assuming all velocities lie on the same rigid plate on a sphere. The calculations are based on [goudarzi14].
- Parameters
locations (ndarray) – Array of shape \((\text{num_stations}, \text{num_components})\) containing the locations of each station (observation), where \(\text{num_components}=2\) if the locations are given by longitudes and latitudes [°] (enu=True) or \(\text{num_components}=3\) if the locations are given in the cartesian Earth-Centered, Earth-Fixed (ECEF) reference frame [m] (enu=False).
velocities (ndarray) – Array of shape \((\text{num_stations}, \text{num_components})\) containing the velocities [m/time] at different stations (observations), where \(\text{num_components}=2\) if the velocities are given in the East-North local geodetic reference frame (enu=True) or \(\text{num_components}=3\) if the velocities are given in the cartesian Earth-Centered, Earth-Fixed (ECEF) reference frame (enu=False).
covariances (Optional[ndarray], default: None) – Array containing the (co)variances of the velocities [m^2/time^2], allowing for different input shapes depending on what uncertainties are available. If None, all observations are weighted equally. If enu=True, the array should have shape \((\text{num_stations}, 2)\) if only variances are present, \((\text{num_stations}, 3)\) if also the covariances are present but are given as a column, or \((\text{num_stations}, 2, 2)\) if the \(2 \times 2\) covariance matrices are given. If enu=False, the arrays should be of shapes \((\text{num_stations}, 3)\), \((\text{num_stations}, 6)\), or \((\text{num_stations}, 3, 3)\), respectively.
enu (bool, default: True) – See locations and velocities.
- Return type
- Returns
rotation_vector – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.
rotation_covariance – Formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.
Notes
The ENU solution assumes a spherical Earth with radius 6378137 meters.
If the covariances are given in columns, the column format of Timeseries is used.
Contrary to [goudarzi14], the estimated covariance matrix is not scaled by the a posteriori sigma, to match the covariance definition throughout the rest of DISSTANS. Also unlike [goudarzi14], the time unit is not assumed to be years, and the results are not rescaled to millions of years.
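For the ECEF case, the rigid-plate assumption means each velocity is the cross product of the rotation vector with the station position, which makes the estimation a linear least-squares problem. The following is a minimal, unweighted sketch of that idea; the function names are hypothetical, no covariance is propagated, and it is not the DISSTANS or [goudarzi14] implementation.

```python
import numpy as np


def cross_matrix(r):
    """Matrix [r]x such that cross_matrix(r) @ w equals np.cross(r, w)."""
    x, y, z = r
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def rotation_vector_sketch(locations_xyz, velocities_xyz):
    """Hypothetical unweighted fit of v = w x r for all stations at once."""
    # v = w x r = -(r x w), so each station contributes three rows of -[r]x
    G = np.concatenate([-cross_matrix(r) for r in locations_xyz], axis=0)
    d = np.asarray(velocities_xyz).reshape(-1)
    rotvec, *_ = np.linalg.lstsq(G, d, rcond=None)
    return rotvec


# Synthetic check: stations on a sphere moving with a known rotation vector
rng = np.random.default_rng(1)
directions = rng.standard_normal((10, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
locations = 6378137.0 * directions
rotvec_true = np.array([1e-9, -2e-9, 3e-9])
velocities = np.cross(rotvec_true, locations)
rotvec_est = rotation_vector_sketch(locations, velocities)
```

With noise-free synthetic velocities, rotvec_est recovers rotvec_true up to floating-point error.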
See also
rotvec2eulerpole
Convert the rotation vector into an Euler pole and magnitude.
References
- goudarzi14
Goudarzi, M. A., Cocard, M., & Santerre, R. (2014), EPC: Matlab software to estimate Euler pole parameters, GPS Solutions, 18(1), 153–162, doi:10.1007/s10291-013-0354-4.
eulerpole2rotvec
- disstans.tools.eulerpole2rotvec(euler_pole, euler_pole_covariance=None)
Convert an Euler pole (and optionally, its formal covariance) into a rotation vector and associated covariance matrix. Based on [goudarzi14].
- Parameters
- Return type
- Returns
rotation_vector – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.
rotation_covariance – If euler_pole_covariance was given, formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.
See also
rotvec2eulerpole
Inverse function
full_cov_mat_to_columns
- disstans.tools.full_cov_mat_to_columns(cov_mat, num_components, include_covariance=False, return_single=False)
Converts a full variance(-covariance) matrix with multiple components into a column-based representation like the one used by Model or Timeseries. This extraction implies the assumption that the cross-parameter/cross-observation covariance is negligible.
It is assumed that the individual elements are ordered such that all components of one parameter or observation are in neighboring rows/columns (i.e. the first parameter or observation occupies the first num_components rows/columns, the second one the second num_components rows/columns, etc.).
- Parameters
cov_mat (ndarray) – Square array with dimensions \(\text{num_elements} * \text{num_components}\) where \(\text{num_elements}\) is the number of elements (e.g. observations or parameters) in each of the \(\text{num_components}\) dimensions.
num_components (int) – Number of components cov_mat contains.
include_covariance (bool, default: False) – If True, also extract the off-diagonal covariances of each element between its components. Defaults to False, i.e. only the diagonal variances.
return_single (bool, default: False) – If False, return two arrays; if True, concatenate the two.
- Return type
- Returns
variance – Array of shape \((\text{num_elements}, \text{num_components})\). If include_covariance=True and return_single=True, this array is concatenated horizontally with covariance, leading to \(\text{num_components} + (\text{num_components}*(\text{num_components}-1))/2\) columns instead.
covariance – If include_covariance=True and return_single=False, array of shape \((\text{num_elements}, (\text{num_components}*(\text{num_components}-1))/2)\).
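The extraction of per-element variances and covariances from the block diagonal can be sketched as below. This is an illustration only: the function name is hypothetical, and the covariance column order (row-major over the upper triangle of each block) is an assumption.

```python
import numpy as np


def cov_mat_to_columns_sketch(cov_mat, num_components):
    """Hypothetical sketch: keep only the diagonal blocks of the full matrix."""
    num_elements = cov_mat.shape[0] // num_components
    # Upper-triangle positions within one block (assumed column order)
    rows, cols = np.triu_indices(num_components, k=1)
    variance = np.empty((num_elements, num_components))
    covariance = np.empty((num_elements, rows.size))
    for i in range(num_elements):
        start = i * num_components
        stop = start + num_components
        block = cov_mat[start:stop, start:stop]
        variance[i] = np.diag(block)
        covariance[i] = block[rows, cols]
    return variance, covariance


# Three elements with two components each, all sharing the same 2x2 block
block = np.array([[1.0, 0.5],
                  [0.5, 4.0]])
cov_mat = np.kron(np.eye(3), block)
variance, covariance = cov_mat_to_columns_sketch(cov_mat, 2)
```

Everything outside the diagonal blocks is discarded, which is the negligible cross-covariance assumption stated in the description.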
get_cov_dims
- disstans.tools.get_cov_dims(num_components)
Given a number of components, return the number of covariances that exist between the components.
- Parameters
num_components (int) – Number of components of timeseries or model.
- Return type
- Returns
Number of covariances, calculated as \((\text{num_components}*(\text{num_components}-1))/2\).
See also
make_cov_index_map
For an example.
make_cov_index_map
- disstans.tools.make_cov_index_map(num_components)
Given a number of components, create a matrix that shows the indexing of where covariance columns present in a timeseries’ or model’s 2D dataframe will show in the covariance matrix of a single observation or parameter. Also provides the ordering in a 1D array which can be used together with reshape() to create the variance-covariance matrix from the columns.
- Parameters
num_components (int) – Number of components of timeseries or model.
- Return type
- Returns
index_map – Array of shape \((\text{num_components}, \text{num_components})\) that is NaN everywhere except in the upper triangle, where integer numbers denote where the columns of a timeseries’ or model’s 2D dataframe belong.
var_cov_map – Array of shape \((\text{num_components}^2, )\) that can be used to assemble the variance-covariance matrix from the columns given a particular timestep or parameter.
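One way to construct two such arrays is sketched below. The function name is hypothetical and the row-major numbering of the upper triangle is an assumption, chosen because it is consistent with the outputs documented for this function and for get_cov_indices().

```python
import numpy as np


def cov_index_map_sketch(num_components):
    """Hypothetical construction of an index map and a flattened var-cov map."""
    rows, cols = np.triu_indices(num_components, k=1)
    cov_index = np.arange(rows.size)
    # NaN everywhere except the upper triangle, which holds covariance column indices
    index_map = np.full((num_components, num_components), np.nan)
    index_map[rows, cols] = cov_index
    # Full symmetric layout: variances first, covariance columns appended after them
    full = np.zeros((num_components, num_components), dtype=int)
    full[np.diag_indices(num_components)] = np.arange(num_components)
    full[rows, cols] = num_components + cov_index
    full[cols, rows] = num_components + cov_index
    return index_map, full.ravel()


index_map_2, var_cov_map_2 = cov_index_map_sketch(2)
index_map_3, _ = cov_index_map_sketch(3)
# Covariance columns that involve the second component (index 1) of a 3D dataset
involved = np.concatenate([index_map_3[1, :], index_map_3[:, 1]])
indices_comp1 = sorted(int(v) for v in involved if not np.isnan(v))
```

The last two lines mimic what a lookup by component index has to do: collect the non-NaN entries in that component's row and column.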
Example
>>> import numpy as np
>>> from disstans.tools import get_cov_dims, make_cov_index_map
>>> num_observations, num_components = 5, 2
>>> print(f"For {num_components} components, there should be:\n"
...       f"- {num_components} data columns,\n"
...       f"- {num_components} variance columns,\n"
...       f"- and {get_cov_dims(num_components)} covariance columns.")
For 2 components, there should be:
- 2 data columns,
- 2 variance columns,
- and 1 covariance columns.
>>> index_map, var_cov_map = make_cov_index_map(num_components)
>>> test_varcov = np.stack([np.ones(5), np.arange(5)*2, np.ones(5)*0.5], axis=1)
>>> test_varcov
array([[1. , 0. , 0.5],
       [1. , 2. , 0.5],
       [1. , 4. , 0.5],
       [1. , 6. , 0.5],
       [1. , 8. , 0.5]])
The first two columns are the variances, and the third column is the covariance column (since there is only one possible covariance). index_map will show where the covariance columns fit into, indexed from 0 to get_cov_dims(num_components) - 1. Since there is only one, the column index 0 will feature in the upper right corner:
>>> index_map
array([[nan,  0.],
       [nan, nan]])
If we want the full, symmetric variance-covariance matrix for the third observation, we use var_cov_map:
>>> var_cov_map
array([0, 2, 2, 1])
>>> test_varcov[2, var_cov_map].reshape(num_components, num_components)
array([[1. , 0.5],
       [0.5, 4. ]])
get_cov_indices
- disstans.tools.get_cov_indices(icomp, index_map=None, num_components=None)
Given a data or variance component index, retrieve the indices in the covariance columns of a timeseries or model that are associated with that component. Exactly one of index_map or num_components must be provided as input.
- Parameters
icomp (int) – Index of the component.
index_map (Optional[ndarray], default: None) – Output of make_cov_index_map().
num_components (Optional[int], default: None) – Number of components of timeseries or model. (Function will call make_cov_index_map() to get index_map.)
- Return type
- Returns
List of integer covariance column indices associated with icomp.
Example
In a 3D dataset, the second component is associated with two covariances - between the first and the second, and the second and the third. In a timeseries or model covariance dataframe, this corresponds to the following columns:
>>> from disstans.tools import get_cov_indices
>>> get_cov_indices(1, num_components=3)
[0, 2]
get_hom_vel_strain_rot
- disstans.tools.get_hom_vel_strain_rot(locations, velocities, covariances=None, utmzone=None, reference=0)
For a set of horizontal velocities on a 2D cartesian grid, estimate the best-fit displacement gradient matrix to calculate a homogeneous velocity field characterized by a single translation vector, strain tensor, and rotation tensor. See [tape09] for an introduction.
This function uses a local approximation to the spherical Earth by converting all station locations into a suitable UTM zone, and only considering the horizontal velocities.
- Parameters
locations (ndarray) – Array of shape \((\text{num_stations}, 2)\) containing the longitude and latitude [°] of the observations (stations).
velocities (ndarray) – Array of shape \((\text{num_stations}, 2)\) containing the East and North velocities [m/time] of the observations.
covariances (Optional[ndarray], default: None) – Array of shape \((\text{num_stations}, 2)\) containing the variances in the East and North velocities [m^2/time^2]. Alternatively, array of shape \((\text{num_stations}, 3)\) additionally containing the East-North covariance [m^2/time^2].
utmzone – If provided, the UTM zone to use for the horizontal approximation. If None, the average longitude will be calculated, and the respective UTM zone will be used.
reference (int | list, default: 0) – Reference station to be used by the calculation. This can be either a longitude-latitude [°] list, or the index of the reference station in locations.
- Return type
- Returns
v_O – Velocity of the origin \(\mathbf{v}_O\).
epsilon – \(2 \times 2\) strain tensor \(\mathbf{\varepsilon}\).
omega – \(2 \times 2\) rotation tensor \(\mathbf{\omega}\).
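The decomposition into translation, strain, and rotation can be sketched as an unweighted least-squares fit of a velocity gradient matrix. This is a simplified illustration: the function name is hypothetical, the inputs are assumed to be already projected to cartesian coordinates in meters (the UTM conversion is skipped), and no covariances are used.

```python
import numpy as np


def fit_velocity_gradient_sketch(xy, velocities, reference=0):
    """Hypothetical fit of v = v_O + L @ (x - x_ref), unweighted."""
    dxy = xy - xy[reference]
    n = xy.shape[0]
    # Unknowns: [v_O east, v_O north, L11, L12, L21, L22]
    G = np.zeros((2 * n, 6))
    G[0::2, 0] = 1.0
    G[1::2, 1] = 1.0
    G[0::2, 2:4] = dxy   # east velocity rows use the first row of L
    G[1::2, 4:6] = dxy   # north velocity rows use the second row of L
    params, *_ = np.linalg.lstsq(G, velocities.reshape(-1), rcond=None)
    v_O = params[:2]
    L = params[2:].reshape(2, 2)
    epsilon = (L + L.T) / 2   # symmetric part: strain tensor
    omega = (L - L.T) / 2     # antisymmetric part: rotation tensor
    return v_O, epsilon, omega


# Synthetic check with a known gradient matrix
rng = np.random.default_rng(0)
xy = rng.uniform(-1e4, 1e4, size=(8, 2))
L_true = np.array([[1e-7, 3e-7],
                   [-1e-7, 2e-7]])
v_O_true = np.array([0.01, -0.02])
velocities = v_O_true + (xy - xy[0]) @ L_true.T
v_O, epsilon, omega = fit_velocity_gradient_sketch(xy, velocities)
```

The strain tensor is the symmetric part of the fitted gradient and the rotation tensor is its antisymmetric part, so their sum returns the gradient itself.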
See also
strain_rotation_invariants
For calculation of invariants of the tensors.
References
- tape09
Tape, C., Musé, P., Simons, M., Dong, D., & Webb, F. (2009), Multiscale estimation of GPS velocity fields, Geophysical Journal International, 179(2), 945–971, doi:10.1111/j.1365-246X.2009.04337.x.
parallelize
- disstans.tools.parallelize(func, iterable, num_threads=None, chunksize=1)
Convenience wrapper that, given a function, an iterable set of inputs, and parallelization settings, automatically runs the function either in serial or in parallel.
Warning
By default on most systems, NumPy will already use multiple cores and threads in its routines (you can check this by running some very large and time-consuming math, and monitoring the usage of your processors). Naively using multiple Python threads gives each new Python thread NumPy’s default number of threads, which overloads the system because it runs out of processors and slows down the computations considerably. The Python multiprocessing module does not change these settings, since it is apparently hard to guess which backend NumPy uses, see this thread on GitHub. So, it is currently up to the user to disable this behavior when using multiple Python threads as achieved with this function. For example, this snippet might be enough to put at the beginning of a script: import os; os.environ['OMP_NUM_THREADS'] = '1'. Then, the number of DISSTANS cores can be set by e.g. import disstans; disstans.defaults["general"]["num_threads"] = 10. Another important note is that if you’re experiencing problems when running a script, make sure the settings and the rest of the script are encapsulated in the standard if __name__ == "__main__": ... clause.
- Parameters
func (Callable[[Any], Any]) – Function to wrap, can only have a single input argument.
iterable (Iterable) – Iterable object (list, generator expression, etc.) that contains all the arguments that func should be called with.
num_threads (Optional[int], default: None) – Number of threads to use. Set to 0 if no parallelization is desired. None defaults to the value in defaults.
chunksize (int, default: 1) – Chunk size used in the parallelization pool, see imap().
- Yields
result – Whenever a result is calculated, return it.
- Return type
Example
Consider a simple loop that sums pairs of numbers:
>>> from numpy import sum
>>> iterable = [(1, 2), (2, 3)]
>>> print([sum(i) for i in iterable])
[3, 5]
In parallel with 2 threads, this could look like this:
>>> from multiprocessing import Pool
>>> with Pool(2) as p:
...     print([result for result in p.imap(sum, iterable)])
...
[3, 5]
Using parallelize(), both cases simplify to:
>>> from disstans.tools import parallelize
>>> print([result for result in parallelize(sum, iterable, num_threads=0)])
[3, 5]
>>> print([result for result in parallelize(sum, iterable, num_threads=2)])
[3, 5]
parse_maintenance_table
- disstans.tools.parse_maintenance_table(csvpath, sitecol, datecols, siteformatter=None, delimiter=',', codecol=None, exclude=None, include=None, verbose=False)
Function that loads a maintenance table from a .csv file (or similar) and returns a list of step times for each station. It also provides an interface to ignore certain maintenance codes (if present), and modify the site names when loading.
- Parameters
csvpath (str) – Path of the file to load.
sitecol (int) – Column index of the station names.
datecols (list) – List of indices that contain the ingredients to convert the input to a valid Timestamp. It should fail gracefully, i.e. return a string if Pandas cannot interpret the column(s) appropriately.
siteformatter (Optional[Callable[[str], str]], default: None) – Function that will be called element-wise on the loaded station names to produce the output station names.
delimiter (str, default: ',') – Delimiter character for the input file.
codecol (Optional[int], default: None) – Column index of the maintenance code.
exclude (Optional[list[str]], default: None) – Maintenance records that exactly match an element in exclude will be ignored. codecol has to be set.
include (Optional[list[str]], default: None) – Only maintenance records that include an element of include will be used. No exact match is required. codecol has to be set.
verbose (bool, default: False) – If True, print loading information.
- Return type
- Returns
maint_table – Parsed maintenance table.
maint_dict – Dictionary that maps the station names to a list of steptimes.
Notes
If running into problems, also consult the Pandas read_csv() function (used to load the csvpath file) and DataFrame (object on which the filtering happens).
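To make the kind of output described above concrete, here is a small Pandas-only sketch that turns a maintenance table into a dictionary of step times per station. The table contents, column names, and filtering steps are hypothetical; this only mirrors the idea of the site formatter and the exclude list and is not the function's actual implementation.

```python
import io

import pandas as pd

# Hypothetical maintenance table with site, date, and maintenance code columns
csv_text = """site,date,code
abcd,2015-03-01,antenna
abcd,2017-06-15,firmware
efgh,2016-01-20,receiver
"""

table = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
# Plays the role of a site formatter: normalize the station names
table["site"] = table["site"].str.upper()
# Plays the role of an exclude list: drop records with exactly matching codes
table = table[~table["code"].isin(["firmware"])]
# Map every station name to the list of its step times
maint_dict = {site: group["date"].tolist() for site, group in table.groupby("site")}
```

Each dictionary value is a list of Pandas Timestamp objects, one per remaining maintenance record of that station.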
parse_unr_steps
- disstans.tools.parse_unr_steps(filepath, check_update=True, only_stations=None, verbose=False)
This function parses the main step file from UNR and produces two step databases, one for maintenance and one for earthquake-related events. If a newer step file is found online, the local copy is updated.
See download_unr_data() for more information about UNR’s dataset, as well as how to access and cite it.
- Parameters
filepath (str) – Path to the step file.
check_update (bool, default: True) – If True, check UNR’s server for an updated step file.
only_stations (Optional[list[str]], default: None) – If specified, a list of station IDs. Other stations are not included in the output.
verbose (bool, default: False) – If True, print actions.
- Return type
tuple[DataFrame, dict[str, list], DataFrame, dict[str, list]]
- Returns
maint_table – Parsed maintenance table.
maint_dict – Dictionary that maps the station names to a list of maintenance steptimes.
eq_table – Parsed earthquake table.
eq_dict – Dictionary that maps the station names to a list of earthquake-related steptimes.
R_ecef2enu
- disstans.tools.R_ecef2enu(lon, lat)
Generate the rotation matrix used to express a vector written in ECEF (XYZ) coordinates as a vector written in local east, north, up (ENU) coordinates at the position defined by geodetic latitude and longitude. See Chapter 4 and Appendix 4.A in [misraenge2010] for details.
- Parameters
- Return type
- Returns
The 3-by-3 rotation matrix.
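The standard textbook form of this rotation matrix can be sketched as follows. The function name is hypothetical, and taking the angles in degrees is an assumption, since the parameter units are not listed in this entry.

```python
import numpy as np


def ecef2enu_matrix_sketch(lon_deg, lat_deg):
    """Hypothetical sketch; input angles are assumed to be in degrees."""
    lon, lat = np.deg2rad(lon_deg), np.deg2rad(lat_deg)
    return np.array([
        # East unit vector expressed in ECEF
        [-np.sin(lon), np.cos(lon), 0.0],
        # North unit vector expressed in ECEF
        [-np.sin(lat) * np.cos(lon), -np.sin(lat) * np.sin(lon), np.cos(lat)],
        # Up unit vector expressed in ECEF
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
    ])


# At longitude 0 and latitude 0, ECEF x points up, y points east, z points north
R = ecef2enu_matrix_sketch(0.0, 0.0)
up_from_x = R @ np.array([1.0, 0.0, 0.0])
```

Because the matrix is orthogonal, its transpose performs the opposite conversion from ENU to ECEF.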
See also
R_enu2ecef
The inverse matrix.
References
- misraenge2010
Misra, P., & Enge, P. (2010), Global Positioning System: Signals, Measurements, and Performance, Lincoln, Mass: Ganga-Jamuna Press.
R_enu2ecef
- disstans.tools.R_enu2ecef(lon, lat)
Generate the rotation matrix used to express a vector written in local ENU coordinates as a vector written in ECEF (XYZ) coordinates at the position defined by geodetic latitude and longitude. This is the transpose of the rotation matrix computed by R_ecef2enu().
- Parameters
- Return type
- Returns
The 3-by-3 rotation matrix.
See also
R_ecef2enu
The inverse matrix.
rotvec2eulerpole
- disstans.tools.rotvec2eulerpole(rotation_vector, rotation_covariance=None)
Convert a rotation vector containing the diagonals of a \(3 \times 3\) rotation matrix (and optionally, its formal covariance) into an Euler Pole and associated magnitude. Based on [goudarzi14].
- Parameters
rotation_vector (ndarray) – Rotation vector [rad/time] containing the diagonals of the \(3 \times 3\) rotation matrix specifying the Euler pole in cartesian, ECEF coordinates.
rotation_covariance (Optional[ndarray], default: None) – Formal \(3 \times 3\) covariance matrix [rad^2/time^2] of the rotation vector.
- Return type
- Returns
euler_pole – NumPy Array containing the longitude [rad], latitude [rad], and rotation rate [rad/time] of the Euler pole.
euler_pole_covariance – If rotation_covariance was given, a NumPy Array of the propagated uncertainty for the Euler Pole for all three components.
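Without the covariance propagation, the conversion between a rotation vector and an Euler pole is a change between cartesian and spherical coordinates. The sketch below assumes the component order documented above (longitude, latitude, rate) and uses hypothetical function names; it is not the DISSTANS implementation.

```python
import numpy as np


def rotvec_to_pole_sketch(rotation_vector):
    """Hypothetical sketch: cartesian rotation vector to (lon, lat, rate)."""
    wx, wy, wz = rotation_vector
    rate = np.sqrt(wx**2 + wy**2 + wz**2)
    lon = np.arctan2(wy, wx)
    lat = np.arcsin(wz / rate)
    return np.array([lon, lat, rate])


def pole_to_rotvec_sketch(euler_pole):
    """Hypothetical inverse: (lon, lat, rate) back to the cartesian vector."""
    lon, lat, rate = euler_pole
    return rate * np.array([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])


rotvec = np.array([1e-9, -2e-9, 3e-9])
pole = rotvec_to_pole_sketch(rotvec)
roundtrip = pole_to_rotvec_sketch(pole)
```

A round trip through both functions returns the original rotation vector, which is a quick consistency check for the chosen conventions.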
See also
eulerpole2rotvec
Inverse function
strain_rotation_invariants
- disstans.tools.strain_rotation_invariants(epsilon=None, omega=None)
Given a strain (rate) and/or rotation (rate) tensor, calculate scalar invariant quantities of interest. See [tape09] for an introduction.
- Parameters
- Return type
- Returns
dilatation – Only if epsilon is provided. Scalar dilatation (rate) as defined by the first invariant of the strain (rate) tensor \(\Theta = \text{Tr} \left( \mathbf{\varepsilon} \right)\).
strain – Only if epsilon is provided. Scalar strain (rate) as defined by the Frobenius norm of the strain (rate) tensor \(\Sigma = \lVert \mathbf{\varepsilon} \rVert_F\).
shear – Only if epsilon is provided. Scalar shearing (rate) as defined by the square root of the second invariant of the deviatoric strain (rate) tensor \(\text{T} = \sqrt{\frac{1}{2} \text{Tr}(\mathbf{\varepsilon}^2) - \frac{1}{6} \text{Tr}(\mathbf{\varepsilon})^2}\).
rotation – Only if omega is provided. Scalar rotation (rate) as defined by \(\Omega = \frac{1}{\sqrt{2}} \lVert \mathbf{\omega} \rVert_F\).
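The four formulas above translate directly into NumPy. The sketch below evaluates them for a pure-shear strain tensor and a unit rotation tensor; the function name is hypothetical.

```python
import numpy as np


def invariants_sketch(epsilon, omega):
    """Hypothetical direct evaluation of the four formulas listed above."""
    dilatation = np.trace(epsilon)
    strain = np.linalg.norm(epsilon, "fro")
    shear = np.sqrt(np.trace(epsilon @ epsilon) / 2 - np.trace(epsilon) ** 2 / 6)
    rotation = np.linalg.norm(omega, "fro") / np.sqrt(2)
    return dilatation, strain, shear, rotation


# Pure shear (no area change) and a rotation tensor of unit magnitude
epsilon = np.array([[1.0, 0.0],
                    [0.0, -1.0]])
omega = np.array([[0.0, -1.0],
                  [1.0, 0.0]])
dilatation, strain, shear, rotation = invariants_sketch(epsilon, omega)
```

For this input the dilatation vanishes because the trace is zero, while the shear and rotation both evaluate to one.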
tvec_to_numpycol
- disstans.tools.tvec_to_numpycol(timevector, t_reference=None, time_unit='D')
Converts a Pandas timestamp series into a NumPy array of relative time to a reference time in the given time unit.
- Parameters
timevector (Series | DatetimeIndex) – Series of Timestamp or alternatively a DatetimeIndex of when to evaluate the model.
t_reference (UnionType[str, Timestamp, None], default: None) – Reference Timestamp or datetime-like string that can be converted to one. None chooses the first element of timevector.
time_unit (Optional[str], default: 'D') – Time unit for parameters. Refer to Timedelta for more details.
- Return type
- Returns
Array of time differences.
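The conversion can be sketched with plain Pandas as below. The function name is hypothetical, and the sketch only covers time units that the Pandas Timedelta itself accepts (so not the 'Y' unit added by the DISSTANS Timedelta class).

```python
import pandas as pd


def tvec_to_numpycol_sketch(timevector, t_reference=None, time_unit="D"):
    """Hypothetical sketch: relative time as a float array in the given unit."""
    timevector = pd.DatetimeIndex(timevector)
    # Fall back to the first timestamp if no reference time is given
    t_ref = timevector[0] if t_reference is None else pd.Timestamp(t_reference)
    # Dividing time differences by a unit-length Timedelta yields plain floats
    return ((timevector - t_ref) / pd.Timedelta(1, time_unit)).to_numpy()


t = pd.date_range("2000-01-01", periods=4, freq="D")
days = tvec_to_numpycol_sketch(t)
days_shifted = tvec_to_numpycol_sketch(t, t_reference="1999-12-31")
hours = tvec_to_numpycol_sketch(t, time_unit="h")
```

Choosing an earlier reference time simply shifts all values, and changing the unit rescales them.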
weighted_median
- disstans.tools.weighted_median(values, weights, axis=0, percentile=0.5, keepdims=False, visualize=False)
Calculates the weighted median along a given axis.
- Parameters
values (ndarray) – Values to calculate the medians for.
weights (ndarray) – Weights of each value along the given axis.
axis (int, default: 0) – Axis along which to calculate the median.
percentile (float, default: 0.5) – Percentile (between 0 and 1) to calculate; the default of 0.5 corresponds to the median.
keepdims (bool, default: False) – If False, squeezes out the axis along which the median was calculated; if True, the axis is kept.
visualize (bool, default: False) – If True, show a plot of the weighted median calculation.
- Return type
- Returns
Weighted median of input.
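For a single 1D array, one common definition of a weighted percentile picks the first sorted value at which the cumulative share of the weights reaches the requested percentile. The sketch below uses that definition; the function name is hypothetical, and the DISSTANS function may handle ties, interpolation, and multiple axes differently.

```python
import numpy as np


def weighted_percentile_1d_sketch(values, weights, percentile=0.5):
    """Hypothetical 1D sketch of one common weighted-percentile definition."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    # Cumulative share of the total weight, in order of increasing value
    cdf = np.cumsum(weights[order]) / np.sum(weights)
    # First sorted value whose cumulative share reaches the percentile
    index = min(np.searchsorted(cdf, percentile), values.size - 1)
    return values[order][index]


equal = weighted_percentile_1d_sketch([3.0, 1.0, 2.0], [1.0, 1.0, 1.0])
heavy = weighted_percentile_1d_sketch([3.0, 1.0, 2.0], [10.0, 1.0, 1.0])
```

With equal weights the result is the ordinary median, while a dominant weight pulls the result to the corresponding value.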
Classes
Click
- class disstans.tools.Click(ax, func, button=MouseButton.LEFT)
Class that enables a GUI to distinguish between clicks (mouse press and release) and dragging events (mouse press, move, then release).
- Parameters
ax – Axis on which to look for clicks.
func – Function to call, with the Matplotlib clicking Event as its first argument.
button (default: <MouseButton.LEFT: 1>) – Which mouse button to operate on, see MouseButton for accepted values.
RINEXDataHolding
- class disstans.tools.RINEXDataHolding(df=None)
Container class for a database of RINEX files.
A new object can be created by one of the two classmethods:
From one or multiple folder(s) using from_folders()
From a previously-saved file using from_file()
An object can be saved by using Pandas’ to_pickle() on the instance’s df attribute (it is recommended to add the .gz extension to enable compression).
The location information and availability metrics can be saved in the same way. To load a previously-saved file, you can use the convenience functions load_locations_from_file() and load_metrics_from_file(), specify the respective paths in the call to from_file(), or alternatively, load the data directly with Pandas and assign it to the respective instance attributes.
- COLUMNS = ('station', 'station_raw', 'year', 'day', 'date', 'sequence', 'type', 'compression', 'filesize', 'filetimeutc', 'network', 'basefolder')
The necessary information about each RINEX file.
- COMPRFILEEXTS = ('.Z', '.gz')
The valid (compressed) RINEX file extensions to search for.
- GLOBPATTERN = '[0-9][0-9][0-9][0-9]/[0-9][0-9][0-9]/*'
The YYYY/DDD folder pattern in a glob-readable format.
- METRICCOLS = ('number', 'age', 'recency', 'length', 'reliability')
The metrics that can be calculated.
- RINEXPATTERN = '(?P<site>\\w{4})(?P<day>\\d{3})(?P<sequence>\\w{1})\\.(?P<yy>\\d{2})(?P<type>\\w{1})\\.(?P<compression>\\w+)'
The regex-style filename pattern for RINEX files.
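To see what the named groups of this pattern capture, it can be applied to a filename with Python's re module. The filename below is a made-up example that follows the short RINEX naming scheme.

```python
import re

# Same pattern as RINEXPATTERN above, written as raw strings
RINEXPATTERN = (r"(?P<site>\w{4})(?P<day>\d{3})(?P<sequence>\w{1})\."
                r"(?P<yy>\d{2})(?P<type>\w{1})\.(?P<compression>\w+)")

# Hypothetical filename: station "abcd", day of year 001, year 2020
match = re.fullmatch(RINEXPATTERN, "abcd0010.20d.Z")
fields = match.groupdict()
no_match = re.fullmatch(RINEXPATTERN, "readme.txt")
```

Filenames that do not follow the scheme yield no match, which is how non-RINEX files in a folder can be skipped.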
- calculate_availability_metrics(sampling=Timedelta('1 days 00:00:00'))
Calculates the following metrics and stores them in the metrics DataFrame:
'number': Number of available observations.
'age': Time of first observation.
'recency': Time of last observation.
'length': Time between first and last observation.
'reliability': Reliability defined as number of observations divided by the maximum amount of possible observations between the first and last acquisition given the assumed sampling interval of the data.
- classmethod from_file(db_file, locations_file=None, metrics_file=None, verbose=False)
Convenience class method that creates a new RINEXDataHolding object from a file using load_db_from_file() and then optionally loads the locations and metrics from their respective files.
- Parameters
- Return type
- Returns
The newly created RINEXDataHolding object.
- classmethod from_folders(folders, verbose=False, no_pbar=False)
Convenience class method that creates a new RINEXDataHolding object and directly calls load_db_from_folders().
- Parameters
folders (tuple | list[tuple]) – Folder(s) in which the different year-folders are found, formatted as a single tuple or a list of tuples with name and respective folder ([('network1', '/folder/one/'), ...]).
verbose (bool, default: False) – If True, print final database size and a sample entry.
no_pbar (bool, default: False) – Suppress the progress bar with True.
- Return type
- Returns
The newly created RINEXDataHolding object.
- get_files_by(station=None, network=None, year=None, between=None, verbose=False)
Return a subset of the database by criteria.
- Parameters
station (UnionType[str, list[str], None], default: None) – Return only files of this/these station(s).
network (UnionType[str, list[str], None], default: None) – Return only files of this/these network(s).
year (UnionType[int, list[int], None], default: None) – Return only files of this/these year(s).
between (Optional[tuple], default: None) – Return only files between the start and end date (inclusive) given by the length-two tuple.
verbose (bool, default: False) – If True, print the number of selected entries.
- Return type
- Returns
The DataFrame subset.
- get_rinex_header(filepath)
Open a RINEX file, read the header, and format it as a dictionary. No data type conversion or stripping of whitespaces is performed.
- load_db_from_file(db_file, verbose=False)
Loads a RINEXDataHolding object from a pickled Pandas DataFrame file.
- load_db_from_folders(folders, verbose=False, no_pbar=False)
Loads a RINEX database from folders in the file system. The data should be located in one or multiple folder structure(s) organized by YYYY/DDD, where YYYY is a four-digit year and DDD is the three-digit day of the year.
- Parameters
folders (tuple | list[tuple]) – Folder(s) in which the different year-folders are found, formatted as a single tuple or a list of tuples with name and respective folder ([('network1', '/folder/one/'), ...]).
verbose (bool, default: False) – If True, print final database size and a sample entry.
no_pbar (bool, default: False) – Suppress the progress bar with True.
- Return type
- load_locations_from_file(filepath)
Load a previously-saved DataFrame containing the locations of each station.
- load_locations_from_rinex(keep='last', replace_not_found=False, no_pbar=True)
Scan the RINEX files’ headers for approximate locations for plotting purposes.
- Parameters
keep (Literal['last', 'first', 'mean'], default: 'last') – Determine which location to use. Possible values are 'last' (only scan the most recent file), 'first' (only scan the oldest file) or 'mean' (load all files and calculate average). Note that 'mean' could take a substantial amount of time, since all files have to be opened, decompressed and searched.
replace_not_found (bool, default: False) – If a location is not found and replace_not_found=True, the location of Null Island (0° Longitude, 0° Latitude) is used and a warning is issued. If False, an error is raised instead.
no_pbar (bool, default: True) – Suppress the progress bar with True.
- Return type
- load_metrics_from_file(filepath)
Load a previously-saved DataFrame containing the calculated availability metrics.
- property locations_lla: DataFrame
Approximate positions of stations in WGS-84 (longitude [°], latitude [°], altitude [m]) coordinates.
- property locations_xyz: DataFrame
Dataframe of approximate positions of stations in WGS-84 (x, y, z) [m] coordinates.
- make_filenames(db)
Recreate the full paths to the individual RINEX files from the database or a subset thereof.
- Parameters
- Return type
- Returns
List of paths.
- Raises
NotImplementedError – If GLOBPATTERN or RINEXPATTERN for this instance are not the same as the default values. In this case, redefine this function with the appropriate folder and file patterns.
- property metrics: DataFrame
Contains the station metrics calculated by calculate_availability_metrics().
- plot_availability(sampling=Timedelta('1 days 00:00:00'), sort_by_latitude=True, saveas=None)
Create an availability figure for the dataset.
- Parameters
sampling (Timedelta, default: Timedelta('1 days 00:00:00')) – Assume that breaks strictly larger than sampling constitute a data gap.
sort_by_latitude (bool, default: True) – If True, sort the stations by latitude, else alphabetical. (Always falls back to alphabetical if location information is missing.)
saveas (str, default: None) – If provided, the figure will be saved at this location.
- Return type
- plot_map(metric=None, orientation='horizontal', annotate_stations=True, figsize=None, saveas=None, dpi=None, gui_kw_args={})
Plot a map of all the stations present in the RINEX database. The markers can be colored by the different availability metrics calculated by calculate_availability_metrics().
- Parameters
metric (Optional[str], default: None) – Calculate the marker color (and respective colormap) given a certain metric. If None, no color is applied.
orientation (Literal['horizontal', 'vertical'], default: 'horizontal') – Colorbar orientation, see colorbar().
annotate_stations (bool, default: True) – If True, add the station names to the map.
figsize (Optional[tuple], default: None) – Set the figure size (width, height) in inches.
saveas (Optional[str], default: None) – If provided, the figure will be saved at this location.
dpi (Optional[float], default: None) – Use this DPI for saved figures.
gui_kw_args (dict[str, Any], default: {}) – Override default GUI settings of defaults.
- Return type
Timedelta
- class disstans.tools.Timedelta(*args, **kwargs)
- static __new__(cls, *args, **kwargs)
DISSTANS Timedelta subclassed from the Pandas Timedelta but with support for the 'Y' year time unit, defined as always exactly 365.25 days. Other possible values are: W, D, days, day, hours, hour, hr, h, m, minute, min, minutes, T, S, seconds, sec, second, ms, milliseconds, millisecond, milli, millis, L, us, microseconds, microsecond, micro, micros, U, ns, nanoseconds, nano, nanos, nanosecond, N