Timeseries

This module contains the Timeseries base class and other formats included by default in DISSTANS.

Timeseries (Parent Class)

class disstans.timeseries.Timeseries(dataframe, src, data_unit, data_cols, var_cols=None, cov_cols=None, remove_initial_offset=False)[source]

Object that expands the functionality of a DataFrame object for better integration into DISSTANS. Apart from the data itself, it contains information about the source and units of the data. It also performs input checks and uses property setters/getters to ensure consistency.

It also supports performing math directly on timeseries objects.

Parameters

dataframe (DataFrame) – The timeseries’ data as a DataFrame. The index should be time, whereas the columns can hold both the data and their uncertainties.
src (str) – Source description.
data_unit (str) – Data unit.
data_cols (list[str]) – List of strings with the names of the columns of dataframe that contain the data. The length corresponds to the number of components num_components.
var_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of dataframe that contain the data’s variance. Must have the same length as data_cols. None defaults to no data variance columns.
cov_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of dataframe that contain the data’s covariance. Must have length (num_components * (num_components - 1)) / 2, where the order of the elements is determined by their row-by-row, sequential position in the covariance matrix (see Notes). None defaults to no covariance columns.
remove_initial_offset (bool, default: False) – If True, the data timeseries will be shifted such that it starts at zero. The offset will be recorded in offset if it needs to be recovered.

Notes

In terms of mapping the covariance matrix of observations into the format for the Timeseries class, consider this example for observations with three components:

`var_cols[0]`	`cov_cols[0]`	`cov_cols[1]`
(symmetric)	`var_cols[1]`	`cov_cols[2]`
(symmetric)	(symmetric)	`var_cols[2]`
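
The layout above can be sketched in plain Python. The helper name full_covariance is hypothetical and not part of DISSTANS; it only illustrates how variance entries and covariance entries in row-by-row order would fill a symmetric matrix.

```python
from itertools import combinations


def full_covariance(variances, covariances):
    """Assemble a symmetric covariance matrix as a list of lists.

    Hypothetical helper for illustration only; not part of DISSTANS.
    """
    n = len(variances)
    # The strict upper triangle holds n * (n - 1) / 2 unique covariances.
    if len(covariances) != n * (n - 1) // 2:
        raise ValueError("unexpected number of covariance entries")
    matrix = [[0.0] * n for _ in range(n)]
    for i, var in enumerate(variances):
        matrix[i][i] = var
    # combinations() walks the upper triangle row by row:
    # (0, 1), (0, 2), (1, 2) for three components.
    for (i, j), cov in zip(combinations(range(n), 2), covariances):
        matrix[i][j] = cov
        matrix[j][i] = cov  # mirror into the lower triangle
    return matrix


example = full_covariance([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
```

Here the three variances land on the diagonal, and the covariance entries 0, 1 and 2 land at positions (0, 1), (0, 2) and (1, 2), matching the table.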

__add__(other)[source]

Special function that allows two timeseries instances (or a timeseries and an equivalently shaped NumPy array) to be added together element-wise.

Parameters: other (Timeseries) – Timeseries to add to instance.
Return type: Timeseries
Returns: New timeseries object containing the sum of the two timeseries.

See also

prepare_math: Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Add two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 + ts2

__getitem__(columns)[source]

Convenience special function that provides a shorthand notation to access the timeseries’ columns.

Parameters: columns (str | list[str]) – String or list of strings of the columns to return.
Return type: Series | DataFrame
Returns: Returns the requested data as a Series (if a single column) or DataFrame (if multiple columns).

Example

If ts is a Timeseries instance and columns a list of column names, the following two are equivalent:

ts.df[columns]
ts[columns]

__mul__(other)[source]

Special function that allows two timeseries instances (or a timeseries and an equivalently shaped NumPy array) to be multiplied together element-wise.

Parameters: other (Timeseries) – Timeseries to multiply the instance by.
Return type: Timeseries
Returns: New timeseries object containing the product of the two timeseries.

See also

prepare_math: Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Multiply two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 * ts2

__radd__(other)[source]

Reflected operation of __add__() (necessary if first operand is a NumPy array).

Return type: Timeseries

__rmul__(other)[source]

Reflected operation of __mul__() (necessary if first operand is a NumPy array).

Return type: Timeseries

__rsub__(other)[source]

Reflected operation of __sub__() (necessary if first operand is a NumPy array).

Return type: Timeseries

__rtruediv__(other)[source]

Reflected operation of __truediv__() (necessary if first operand is a NumPy array).

Return type: Timeseries

__str__()[source]

Special function that returns a readable summary of the timeseries. Accessed, for example, by Python’s print() built-in function.

Return type: str
Returns: Timeseries summary.

__sub__(other)[source]

Special function that allows a timeseries instance (or an equivalently shaped NumPy array) to be subtracted from another timeseries instance element-wise.

Parameters: other (Timeseries) – Timeseries to subtract from instance.
Return type: Timeseries
Returns: New timeseries object containing the difference of the two timeseries.

See also

prepare_math: Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Subtract two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 - ts2

__truediv__(other)[source]

Special function that allows a timeseries instance to be divided by another timeseries instance (or by an equivalently shaped NumPy array) element-wise.

Parameters: other (Timeseries) – Timeseries to divide instance by.
Return type: Timeseries
Returns: New timeseries object containing the quotient of the two timeseries.

See also

prepare_math: Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Divide two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 / ts2

add_uncertainties(timeseries=None, var_data=None, var_cols=None, cov_data=None, cov_cols=None)[source]

Add variance and covariance data and column names to the timeseries.

Parameters

timeseries (Optional[Timeseries], default: None) – Another timeseries object that contains uncertainty information. If set, the function will ignore the rest of the arguments.
var_data (Optional[ndarray], default: None) – New data variance.
var_cols (Optional[list[str]], default: None) – List of variance column names.
cov_data (Optional[ndarray], default: None) – New data covariance. Setting this but not var_data requires there to already be data variance.
cov_cols (Optional[list[str]], default: None) – List of covariance column names.

Return type

None

Notes

If ts is a Timeseries instance, just using:

ts.vars = new_variance
ts.covs = new_covariance

will only work when the respective columns already exist in the dataframe. (The same applies to renaming variance columns that do not exist.) If the columns do not exist, the calls will result in an error because no column names exist, in an effort to make the inner workings more transparent and rigorous.

This function allows overriding the default behavior, and can also generate column names by itself if none are specified.

convert_units(factor, new_data_unit)[source]

Convert the data and covariances to a new data unit by providing a conversion factor.

Parameters

factor (float) – Factor to multiply the data by to obtain the data in the new units.
new_data_unit (str) – New data unit to be saved in the data_unit attribute.

Return type

None
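
As a minimal sketch of what a unit conversion implies for uncertainties: when data are multiplied by a factor, standard error propagation scales variances and covariances by the square of that factor. The function scale_units below is hypothetical and works on plain lists; it is not the DISSTANS implementation.

```python
def scale_units(data, variances, covariances, factor):
    """Scale data by `factor` and (co)variances by `factor` squared.

    Hypothetical illustration of linear error propagation; the actual
    DISSTANS method operates on the timeseries' DataFrame in-place.
    """
    scaled_data = [value * factor for value in data]
    scaled_vars = [var * factor**2 for var in variances]
    scaled_covs = [cov * factor**2 for cov in covariances]
    return scaled_data, scaled_vars, scaled_covs


# Example: convert from meters to millimeters (factor 1000).
data_mm, vars_mm, covs_mm = scale_units([0.002], [4e-6], [1e-6], 1000.0)
```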

copy(only_data=False, src=None)[source]

Return a deep copy of the timeseries instance.

Parameters

only_data (bool, default: False) – If True, only copy the data columns and ignore any uncertainty information.
src (Optional[str], default: None) – Set a new source information attribute for the copy. Uses the current one if None.

Returns

The copy of the timeseries instance.

Return type

Timeseries

cov_at(t)[source]

Returns the covariance matrix of the timeseries at a given time or index.

Parameters: t (Timestamp | str | int) – A timestamp or timestamp-convertible string to return the covariance matrix for. Alternatively, an integer index.
Return type: ndarray
Returns: The full covariance matrix at time t.

property cov_cols: list[str] | None: List of the column names in df that contain data covariances.

property covs: DataFrame: Returns the covariances from df.

cut(t_min=None, t_max=None, i_min=None, i_max=None, keep_inside=True)[source]

Cut the timeseries to contain only data between certain times or indices. If both a minimum (maximum) timestamp and index are provided, the later (earlier, respectively) one is used (i.e., the more restrictive one). Also provides the reverse operation, i.e., removing only the data between the given dates.

This operation changes the timeseries in-place; if it should be done on a new timeseries, use copy() first.

Parameters

t_min (Timestamp | str | None, default: None) – A timestamp or timestamp-convertible string of the earliest observation to keep.
t_max (Timestamp | str | None, default: None) – A timestamp or timestamp-convertible string of the latest observation to keep.
i_min (Optional[int], default: None) – The index of the earliest observation to keep.
i_max (Optional[int], default: None) – The index of the latest observation to keep.
keep_inside (bool, default: True) – If True, keeps data inside of the specified date range. If False, keeps only data outside the specified date range.

Return type

None
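
The "more restrictive bound wins" rule described for cut() can be sketched on a plain sorted list of dates. The function cut_indices is hypothetical, and treating both bounds as inclusive is an assumption of this sketch rather than documented DISSTANS behavior.

```python
from datetime import date


def cut_indices(times, t_min=None, t_max=None, i_min=None, i_max=None,
                keep_inside=True):
    """Return the positions of `times` (sorted) that would be kept.

    Hypothetical helper; bounds are assumed to be inclusive.
    """
    n = len(times)
    lo, hi = 0, n - 1
    if t_min is not None:
        # First position at or after t_min (n if there is none).
        lo = max(lo, next((i for i, t in enumerate(times) if t >= t_min), n))
    if i_min is not None:
        lo = max(lo, i_min)  # the later start is the more restrictive one
    if t_max is not None:
        # Last position at or before t_max (-1 if there is none).
        hi = min(hi, max((i for i, t in enumerate(times) if t <= t_max),
                         default=-1))
    if i_max is not None:
        hi = min(hi, i_max)  # the earlier end is the more restrictive one
    inside = set(range(lo, hi + 1))
    return [i for i in range(n) if (i in inside) == keep_inside]


days = [date(2020, 1, d) for d in range(1, 11)]
kept = cut_indices(days, t_min=date(2020, 1, 3), i_min=4,
                   t_max=date(2020, 1, 8))
removed_inside = cut_indices(days, t_min=date(2020, 1, 3), i_min=4,
                             t_max=date(2020, 1, 8), keep_inside=False)
```

In this example, i_min=4 is later than the position of t_min (index 2), so the index bound determines the start of the kept range.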

property data: DataFrame: View of only the data columns in df.

property data_cols: list[str]: List of the column names in df that contain data.

property data_unit: str: Data unit.

property df: DataFrame: The entire timeseries’ DataFrame.

classmethod from_array(timevector, data, src, data_unit, data_cols, var=None, var_cols=None, cov=None, cov_cols=None)[source]

Constructor method to create a Timeseries instance from a NumPy ndarray.

Parameters

timevector (Series | DatetimeIndex) – Series of Timestamp or alternatively a DatetimeIndex containing the timestamps of each observation.
data (ndarray) – 2D NumPy array of shape \((\text{n_observations},\text{n_components})\) containing the data.
src (str) – Source description.
data_unit (str) – Data unit.
data_cols (list[str]) – List of strings with the names of the columns of data.
var (Optional[ndarray], default: None) – 2D NumPy array of shape \((\text{n_observations},\text{n_components})\) containing the data variances. None defaults to no data uncertainty.
var_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of data that contain the data’s variance. Must have the same length as data_cols. If var is given but var_cols is not, it defaults to appending '_var' to data_cols.
cov (Optional[ndarray], default: None) – 2D NumPy array of shape \((\text{n_observations},\text{n_components}\,(\text{n_components}-1)/2)\) containing the data covariances (as defined in Timeseries). None defaults to no data covariances.
cov_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of data that contain the data’s covariance. Must have length (num_components * (num_components - 1)) / 2, as defined in Timeseries. If cov is given but cov_cols is not, it defaults to appending '_cov' to the two respective entries of data_cols.

Return type

Timeseries

Returns

The generated Timeseries object.

See also

date_range(): Quick function to generate a timevector.
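
The default column naming described for var_cols and cov_cols can be sketched as follows. The function default_uncertainty_cols is hypothetical, and the exact pattern used to join the two data column names for a covariance column (here an underscore) is an assumption; only the '_var' and '_cov' suffixes come from the description above.

```python
from itertools import combinations


def default_uncertainty_cols(data_cols):
    """Generate default variance and covariance column names.

    Hypothetical helper; the joining pattern for covariance names is
    an assumption, not confirmed DISSTANS behavior.
    """
    var_cols = [f"{col}_var" for col in data_cols]
    # One covariance column per unique pair, in row-by-row order.
    cov_cols = [f"{a}_{b}_cov" for a, b in combinations(data_cols, 2)]
    return var_cols, cov_cols


var_names, cov_names = default_uncertainty_cols(["east", "north", "up"])
```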

classmethod from_fit(data_unit, data_cols, fit)[source]

Import a fit dictionary and create a Timeseries instance.

Parameters

data_unit (str) – Data unit.
data_cols (list[str]) – List of strings containing the data column names. Uncertainty column names are generated by appending '_var'.
fit (dict[str, Optional[ndarray]]) – Dictionary with the keys 'time', 'fit', 'var' and 'cov' (the latter two can be set to None).

Return type

Timeseries

Returns

Timeseries instance created from fit.

See also

disstans.models.Model.evaluate: Evaluating a model produces the fit dictionary.

get_arch()[source]

Build a dictionary describing the architecture of this timeseries, to be used when creating a network JSON configuration file.

Without subclassing Timeseries, this function will return an empty dictionary by default, since it is unknown how to recreate a general Timeseries object from just a JSON-compatible dictionary.

Return type: dict

See also

disstans.network.Network.to_json: Export the Network configuration as a JSON file.
disstans.timeseries.Timeseries.get_arch: Get the architecture dictionary of a Timeseries instance.
disstans.models.Model.get_arch: Get the architecture dictionary of a Model instance.

index_map: Matrix that contains the rolling indices of each matrix element used by get_cov_indices().

property length: timedelta64: Returns the length of the timeseries.

mask_out(dcol)[source]

Mask out an entire data column (and if present, its uncertainty column) by setting the entire column to NaN. Converts it to a sparse representation to save memory.

Parameters: dcol (str) – Name of the data column to mask out.
Return type: None

property num_components: int: Number of data columns.

property num_observations: int: Number of observations (rows in df).

offset: Offset applied to the timeseries data such that it starts at zero.

static prepare_math(left, right, operation)[source]

Tests two timeseries’ ability to be cast together in a mathematical operation, and returns output characteristics. Currently, only addition, subtraction, multiplication, and division are supported.

All uncertainty information is lost during mathematical operations.

One of the objects can be a NumPy array. In this case, the array has to have the exact same shape as the data in the Timeseries instance. Furthermore, the resulting Timeseries object will have the same src, data_unit and data_cols attributes as the Timeseries operand (instead of a combination of both operands).

Parameters

left (Timeseries | ndarray) – Left term of the operation.
right (Timeseries | ndarray) – Right term of the operation.
operation (Literal['+', '-', '*', '/']) – Operation to perform.

Return type

tuple[ndarray, ndarray, str, str, list[str], Index]

Returns

left_data – View of the 2D left data array of the operation with shape (len(out_time), num_components).
right_data – View of the 2D right data array of the operation with shape (len(out_time), num_components).
out_src – Combines the sources of each object to a new string.
out_data_unit – Combines the data units of each object into a new unit.
out_data_cols – List of strings containing the new data column names.
out_time – Index object containing the indices of all timestamps common to both.

Raises

TypeError – If one of the operands is not a Timeseries or ndarray, or if both are ndarray (since then this function would never be called anyway).
ValueError – If the number of data columns is not equal between the two operands, or if the data units are not the same when adding or subtracting.
AssertionError – If one of the operands is a NumPy array but does not have the same number of rows as the other operand.

Warning

This method is called under-the-hood whenever a mathematical operation is performed, and should not need to be used by normal users.

See also

__add__: Addition for two Timeseries or a Timeseries and a NumPy array
__radd__: Addition for a NumPy array and a Timeseries.
__sub__: Subtraction for two Timeseries or a Timeseries and a NumPy array
__rsub__: Subtraction for a NumPy array and a Timeseries.
__mul__: Multiplication for two Timeseries or a Timeseries and a NumPy array
__rmul__: Multiplication for a NumPy array and a Timeseries.
__truediv__: Division for two Timeseries or a Timeseries and a NumPy array
__rtruediv__: Division for a NumPy array and a Timeseries.
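
The casting rules described above (common timestamps only, matching units for addition and subtraction, element-wise operation) can be sketched with plain dictionaries mapping timestamps to lists of component values. The function combine and its arguments are hypothetical and only mimic the documented behavior; they are not the DISSTANS implementation.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def combine(left, right, operation, left_unit, right_unit):
    """Apply `operation` element-wise on timestamps common to both inputs.

    Hypothetical sketch: `left` and `right` map a timestamp to a list of
    component values.
    """
    if operation not in OPS:
        raise TypeError(f"unsupported operation: {operation!r}")
    if operation in ("+", "-") and left_unit != right_unit:
        raise ValueError("units must match when adding or subtracting")
    out_time = sorted(set(left) & set(right))  # timestamps in both inputs
    func = OPS[operation]
    return {t: [func(a, b) for a, b in zip(left[t], right[t])]
            for t in out_time}


ts1 = {1: [1.0, 2.0], 2: [3.0, 4.0], 3: [5.0, 6.0]}
ts2 = {2: [10.0, 20.0], 3: [30.0, 40.0], 4: [50.0, 60.0]}
total = combine(ts1, ts2, "+", "mm", "mm")
```

Only timestamps 2 and 3 appear in the result, because they are the only ones present in both inputs.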

property reliability: float

Returns the reliability (between 0 and 1), defined as the number of available observations divided by the number of expected observations. The number of expected observations is calculated by taking the median time span between observations and then dividing the total time span by it.

(Essentially, this assumes that there are no “close-by” observations, e.g., two observations for the same day but at different hours in a dataset of otherwise daily observations.)
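
A minimal sketch of this calculation on a list of dates follows. The function reliability_sketch is hypothetical; adding 1 to the expected count, so that a gap-free series yields exactly 1, is an assumption of this sketch and may differ from how DISSTANS handles the endpoints.

```python
from datetime import date
from statistics import median


def reliability_sketch(times):
    """Ratio of available to expected observations (needs >= 2 times).

    Hypothetical helper; the "+ 1" endpoint handling is an assumption.
    """
    times = sorted(times)
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    nominal_step = median(steps)  # typical spacing between observations
    # Dividing two timedeltas yields a float number of nominal steps.
    expected = (times[-1] - times[0]) / nominal_step + 1
    return len(times) / expected


# Ten nominally daily observations with two days missing.
observed = [date(2020, 1, d) for d in (1, 2, 3, 6, 7, 8, 9, 10)]
ratio = reliability_sketch(observed)
```

With eight observations over a nine-day span and a median spacing of one day, the sketch expects ten observations.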

property shape: tuple[int, int]: Returns the shape tuple (similar to NumPy) of the timeseries, which is of shape \((\text{n_observations},\text{n_components})\).

property sigmas: DataFrame: View of only the data standard deviation columns in df.

property src: str: Source information.

property time: Index: Timestamps of the timeseries (index of df).

property var_cols: list[str] | None: List of the column names in df that contain data variance.

property var_cov: DataFrame: Returns the variance as well as covariance columns from df, to be indexed by var_cov_map to yield the full variance-covariance matrix.

var_cov_map: Contains the column indices needed to create the full variance-covariance matrix for a single time.

property vars: DataFrame: Returns the variances from df.

Specialized Classes

GipsyTimeseries

class disstans.timeseries.GipsyTimeseries(path, show_warnings=True, data_unit='mm', **kw_args)[source]

Subclasses Timeseries.

Timeseries subclass for GNSS measurements in JPL’s Gipsy(X) .tseries file format.

Parameters

path (str) – Path to the timeseries file.
show_warnings (bool, default: True) – If True, warn if there are data inconsistencies encountered while loading.
data_unit (Literal['mm', 'm'], default: 'mm') – Can be 'mm' or 'm'.

Additional keyword arguments will be passed on to Timeseries.

Notes

The column format is described on JPL’s website:

Columns	Description
Column 1	Decimal year computed with 365.25 days/yr
Columns 2-4	East, North and Vertical [m]
Columns 5-7	East, North and Vertical standard deviation [m]
Columns 8-10	East, North and Vertical correlation [-]
Column 11	Time in Seconds past J2000
Columns 12-17	Time in YEAR MM DD HR MN SS

Time is GPS time, and the time series are relative to each station’s first epoch.
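
Because the file stores standard deviations and correlations while Timeseries works with variances and covariances, a conversion between the two is needed at some point. The sketch below shows the standard relation (variance is the squared standard deviation; covariance is the correlation times both standard deviations). The function name and the pair order East-North, East-Vertical, North-Vertical are assumptions of this sketch, not confirmed details of the DISSTANS loader.

```python
from itertools import combinations


def to_variance_covariance(sigmas, correlations):
    """Convert standard deviations and correlations to (co)variances.

    Hypothetical helper; `correlations` is assumed to be ordered by
    component pairs (0, 1), (0, 2), (1, 2).
    """
    variances = [sigma**2 for sigma in sigmas]
    pairs = combinations(range(len(sigmas)), 2)
    covariances = [rho * sigmas[i] * sigmas[j]
                   for (i, j), rho in zip(pairs, correlations)]
    return variances, covariances


# Synthetic values chosen for easy arithmetic, not real measurements.
variances, covariances = to_variance_covariance([1.0, 2.0, 4.0],
                                                [0.5, 0.25, -0.5])
```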

get_arch()[source]

Returns a JSON-compatible dictionary with all the information necessary to recreate the Timeseries instance (provided the data file is available).

Returns: JSON-compatible dictionary sufficient to recreate the GipsyTimeseries instance.
Return type: dict

See also

Timeseries.get_arch: For further information.

UNRTimeseries

class disstans.timeseries.UNRTimeseries(path, show_warnings=True, data_unit='mm', **kw_args)[source]

Subclasses Timeseries.

Timeseries subclass for GNSS measurements in UNR’s .tenv3 file format.

Parameters

path (str) – Path to the timeseries file.
show_warnings (bool, default: True) – If True, warn if there are data inconsistencies encountered while loading.
data_unit (Literal['mm', 'm'], default: 'mm') – Can be 'mm' or 'm'.

Additional keyword arguments will be passed on to Timeseries.

Notes

The column format is described on UNR’s website:

Columns	Description
Column 1	Station name
Column 2	Date
Column 3	Decimal year
Column 4	Modified Julian day
Columns 5-6	GPS week and day
Column 7	Longitude [°] of reference meridian
Columns 8-9	Easting [m] from ref. mer., integer and fraction
Columns 10-11	Northing [m] from equator, integer and fraction
Columns 12-13	Vertical [m], integer and fraction
Column 14	Antenna height [m]
Columns 15-17	East, North, Vertical standard deviation [m]
Column 18	East-North correlation coefficient [-]
Column 19	East-Vertical correlation coefficient [-]
Column 20	North-Vertical correlation coefficient [-]

Newer files also contain the following three columns:

Column 21	Latitude [°]
Column 22	Longitude [°]
Column 23	Altitude [m]

The time series are relative to each station’s first integer epoch.
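
To illustrate how the columns above fit together, the sketch below splits a whitespace-separated record, converts the Modified Julian day to a calendar date (MJD 0 corresponds to 1858-11-17), and joins the integer and fraction parts of each coordinate. The record is synthetic, the function names are hypothetical, and summing integer and fraction directly is an assumption; this is not the parser used by UNRTimeseries.

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # calendar date of Modified Julian day 0


def mjd_to_date(mjd):
    """Convert an integer Modified Julian day to a calendar date."""
    return MJD_EPOCH + timedelta(days=mjd)


def parse_record(line):
    """Parse the first 20 columns of one record (hypothetical helper)."""
    f = line.split()
    return {
        "station": f[0],
        "date": mjd_to_date(int(f[3])),
        # Coordinates are stored as integer and fraction columns.
        "east": int(f[7]) + float(f[8]),
        "north": int(f[9]) + float(f[10]),
        "up": int(f[11]) + float(f[12]),
        "sigmas": tuple(float(x) for x in f[14:17]),
        "correlations": tuple(float(x) for x in f[17:20]),
    }


# Synthetic record for illustration only; not taken from a real file.
line = ("ABCD 00JAN01 2000.0000 51544 1042 6 -120.0 12 0.345 "
        "4000000 0.678 100 0.012 0.0 0.001 0.001 0.003 0.1 -0.2 0.3")
record = parse_record(line)
```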

get_arch()[source]

Returns a JSON-compatible dictionary with all the information necessary to recreate the Timeseries instance (provided the data file is available).

Returns: JSON-compatible dictionary sufficient to recreate the UNRTimeseries instance.
Return type: dict

See also

Timeseries.get_arch: For further information.