Timeseries

This module contains the Timeseries base class and other formats included by default in DISSTANS.

Timeseries (Parent Class)

class disstans.timeseries.Timeseries(dataframe, src, data_unit, data_cols, var_cols=None, cov_cols=None, remove_initial_offset=False)[source]

Object that expands the functionality of a DataFrame object for better integration into DISSTANS. Apart from the data itself, it contains information about the source and units of the data. It also performs input checks and uses property setters/getters to ensure consistency.

It also supports performing math on timeseries directly.

Parameters
  • dataframe (DataFrame) – The timeseries’ data as a DataFrame. The index should be time, whereas data columns can be both data and their uncertainties.

  • src (str) – Source description.

  • data_unit (str) – Data unit.

  • data_cols (list[str]) – List of strings with the names of the columns of dataframe that contain the data. The length corresponds to the number of components num_components.

  • var_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of dataframe that contain the data’s variance. Must have the same length as data_cols. None defaults to no data variance columns.

  • cov_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of dataframe that contain the data’s covariance. Must have length (num_components * (num_components - 1)) / 2, where the order of the elements is determined by their row-by-row, sequential position in the covariance matrix (see Notes). None defaults to no covariance columns.

  • remove_initial_offset (bool, default: False) – If True, the data timeseries will be shifted such that it starts at zero. The offset will be recorded in offset if it needs to be recovered.

Notes

In terms of mapping the covariance matrix of observations into the format for the Timeseries class, consider this example for observations with three components:

var_cols[0]    cov_cols[0]    cov_cols[1]
(symmetric)    var_cols[1]    cov_cols[2]
(symmetric)    (symmetric)    var_cols[2]
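
As a minimal sketch of this mapping (plain Python, not DISSTANS code; the helper name full_covariance is hypothetical), the variances fill the diagonal and the covariances fill the upper triangle row by row, mirrored below the diagonal:

```python
from itertools import combinations


def full_covariance(variances, covariances):
    """Assemble a symmetric covariance matrix (nested lists) for one epoch.

    Hypothetical helper for illustration only; it is not part of DISSTANS.
    """
    n = len(variances)
    if len(covariances) != n * (n - 1) // 2:
        raise ValueError("expected n * (n - 1) / 2 covariance values")
    matrix = [[0.0] * n for _ in range(n)]
    for i, var in enumerate(variances):
        matrix[i][i] = var
    # combinations() walks the upper triangle row by row: (0, 1), (0, 2), (1, 2)
    for (i, j), cov in zip(combinations(range(n), 2), covariances):
        matrix[i][j] = cov
        matrix[j][i] = cov  # mirrored entry below the diagonal
    return matrix


# Illustrative values for a three-component observation
example = full_covariance([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
for row in example:
    print(row)
```

For three components, the three covariance values therefore land at matrix positions (0, 1), (0, 2) and (1, 2), in that order.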

__add__(other)[source]

Special function that allows two timeseries instances (or a timeseries and an equivalently shaped NumPy array) to be added together element-wise.

Parameters

other (Timeseries) – Timeseries to add to instance.

Return type

Timeseries

Returns

New timeseries object containing the sum of the two timeseries.

See also

prepare_math

Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Add two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 + ts2
__getitem__(columns)[source]

Convenience special function that provides a shorthand notation to access the timeseries’ columns.

Parameters

columns (str | list[str]) – String or list of strings of the columns to return.

Return type

Series | DataFrame

Returns

Returns the requested data as a Series (if a single column) or DataFrame (if multiple columns).

Example

If ts is a Timeseries instance and columns a list of column names, the following two are equivalent:

ts.df[columns]
ts[columns]
__mul__(other)[source]

Special function that allows two timeseries instances (or a timeseries and an equivalently shaped NumPy array) to be multiplied together element-wise.

Parameters

other (Timeseries) – Timeseries to multiply to instance.

Return type

Timeseries

Returns

New timeseries object containing the product of the two timeseries.

See also

prepare_math

Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Multiply two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 * ts2
__radd__(other)[source]

Reflected operation of __add__() (necessary if first operand is a NumPy array).

Return type

Timeseries

__rmul__(other)[source]

Reflected operation of __mul__() (necessary if first operand is a NumPy array).

Return type

Timeseries

__rsub__(other)[source]

Reflected operation of __sub__() (necessary if first operand is a NumPy array).

Return type

Timeseries

__rtruediv__(other)[source]

Reflected operation of __truediv__() (necessary if first operand is a NumPy array).

Return type

Timeseries

__str__()[source]

Special function that returns a readable summary of the timeseries. Accessed, for example, by Python’s print() built-in function.

Return type

str

Returns

Timeseries summary.

__sub__(other)[source]

Special function that allows a timeseries instance (or a timeseries and an equivalently shaped NumPy array) to be subtracted from another element-wise.

Parameters

other (Timeseries) – Timeseries to subtract from instance.

Return type

Timeseries

Returns

New timeseries object containing the difference of the two timeseries.

See also

prepare_math

Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Subtract two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 - ts2
__truediv__(other)[source]

Special function that allows a timeseries instance (or a timeseries and an equivalently shaped NumPy array) to be divided by another element-wise.

Parameters

other (Timeseries) – Timeseries to divide instance by.

Return type

Timeseries

Returns

New timeseries object containing the quotient of the two timeseries.

See also

prepare_math

Prepares the two instances for the mathematical operation. Refer to it for more details about how the two objects are cast together.

Example

Divide two Timeseries ts1 and ts2 and save the result as ts3:

ts3 = ts1 / ts2
add_uncertainties(timeseries=None, var_data=None, var_cols=None, cov_data=None, cov_cols=None)[source]

Add variance and covariance data and column names to the timeseries.

Parameters
  • timeseries (Optional[Timeseries], default: None) – Another timeseries object that contains uncertainty information. If set, the function will ignore the rest of the arguments.

  • var_data (Optional[ndarray], default: None) – New data variance.

  • var_cols (Optional[list[str]], default: None) – List of variance column names.

  • cov_data (Optional[ndarray], default: None) – New data covariance. Setting this but not var_data requires there to already be data variance.

  • cov_cols (Optional[list[str]], default: None) – List of covariance column names.

Return type

None

Notes

If ts is a Timeseries instance, just using:

ts.vars = new_variance
ts.covs = new_covariance

will only work when the respective columns already exist in the dataframe. (The same holds for renaming variance columns that do not exist.) If they do not exist, the calls will result in an error because no column names exist, in an effort to make the inner workings more transparent and rigorous.

This function overrides that default behavior, and can also generate column names by itself if none are specified.

convert_units(factor, new_data_unit)[source]

Convert the data and covariances to a new data unit by providing a conversion factor.

Parameters
  • factor (float) – Factor to multiply the data by to obtain the data in the new units.

  • new_data_unit (str) – New data unit to be saved in the data_unit attribute.

Return type

None
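
For intuition, here is a plain-Python sketch of what a unit conversion implies for the uncertainties: the data scale with factor, while variances and covariances scale with factor squared. This is standard error propagation and is assumed, not confirmed, to match how convert_units treats those columns; the helper name scale_units is hypothetical.

```python
def scale_units(data, variances, covariances, factor):
    """Scale one observation and its uncertainties by a conversion factor.

    Hypothetical helper for illustration only; it is not part of DISSTANS.
    """
    new_data = [factor * d for d in data]
    # Var(a * X) = a**2 * Var(X) and Cov(a * X, a * Y) = a**2 * Cov(X, Y)
    new_vars = [factor**2 * v for v in variances]
    new_covs = [factor**2 * c for c in covariances]
    return new_data, new_vars, new_covs


# Convert a two-component observation from meters to millimeters
data_mm, vars_mm, covs_mm = scale_units([0.5, 1.5], [0.25, 0.5], [0.125], 1000.0)
print(data_mm, vars_mm, covs_mm)
```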

copy(only_data=False, src=None)[source]

Return a deep copy of the timeseries instance.

Parameters
  • only_data (bool, default: False) – If True, only copy the data columns and ignore any uncertainty information.

  • src (Optional[str], default: None) – Set a new source information attribute for the copy. Uses the current one if None.

Returns

The copy of the timeseries instance.

Return type

Timeseries

cov_at(t)[source]

Returns the covariance matrix of the timeseries at a given time or index.

Parameters

t (Timestamp | str | int) – A timestamp or timestamp-convertible string to return the covariance matrix for. Alternatively, an integer index.

Return type

ndarray

Returns

The full covariance matrix at time t.

property cov_cols: list[str] | None

List of the column names in df that contain data covariances.

property covs: DataFrame

Returns the covariances from df.

cut(t_min=None, t_max=None, i_min=None, i_max=None, keep_inside=True)[source]

Cut the timeseries to contain only data between certain times or indices. If both a timestamp and an index are provided for the minimum (maximum), the later (earlier) of the two is used, i.e., the more restrictive one. The reverse operation is also available, i.e., removing only the data between the bounds.

This operation changes the timeseries in-place; if it should be done on a new timeseries, use copy() first.

Parameters
  • t_min (Timestamp | str | None, default: None) – A timestamp or timestamp-convertible string of the earliest observation to keep.

  • t_max (Timestamp | str | None, default: None) – A timestamp or timestamp-convertible string of the latest observation to keep.

  • i_min (Optional[int], default: None) – The index of the earliest observation to keep.

  • i_max (Optional[int], default: None) – The index of the latest observation to keep.

  • keep_inside (bool, default: True) – If True, keeps data inside of the specified date range. If False, keeps only data outside the specified date range.

Return type

None
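
The way the bounds combine can be sketched in plain Python. This is not the DISSTANS implementation: the function name cut_positions is hypothetical, the timestamps are assumed to be sorted, and the bounds are assumed to be inclusive.

```python
from datetime import date


def cut_positions(times, t_min=None, t_max=None, i_min=None, i_max=None,
                  keep_inside=True):
    """Return the positions of the observations that survive a cut.

    Hypothetical sketch; assumes sorted timestamps and inclusive bounds.
    """
    n = len(times)
    lo, hi = 0, n - 1
    if t_min is not None:
        # First position at or after t_min (n if there is none)
        lo = max(lo, next((i for i, t in enumerate(times) if t >= t_min), n))
    if i_min is not None:
        lo = max(lo, i_min)  # the later lower bound wins
    if t_max is not None:
        # Last position at or before t_max (-1 if there is none)
        hi = min(hi, max((i for i, t in enumerate(times) if t <= t_max), default=-1))
    if i_max is not None:
        hi = min(hi, i_max)  # the earlier upper bound wins
    inside = set(range(lo, hi + 1))
    return [i for i in range(n) if (i in inside) == keep_inside]


# Ten daily observations, 2020-01-01 through 2020-01-10
times = [date(2020, 1, day) for day in range(1, 11)]
kept = cut_positions(times, t_min=date(2020, 1, 3), i_min=4, t_max=date(2020, 1, 8))
removed_inside = cut_positions(times, t_min=date(2020, 1, 3), i_min=4,
                               t_max=date(2020, 1, 8), keep_inside=False)
print(kept, removed_inside)
```

Here t_min alone would keep positions from 2 onward, but i_min=4 is more restrictive and therefore sets the lower bound.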

property data: DataFrame

View of only the data columns in df.

property data_cols: list[str]

List of the column names in df that contain data.

property data_unit: str

Data unit.

property df: DataFrame

The entire timeseries’ DataFrame.

classmethod from_array(timevector, data, src, data_unit, data_cols, var=None, var_cols=None, cov=None, cov_cols=None)[source]

Constructor method to create a Timeseries instance from a NumPy ndarray.

Parameters
  • timevector (Series | DatetimeIndex) – Series of Timestamp or alternatively a DatetimeIndex containing the timestamps of each observation.

  • data (ndarray) – 2D NumPy array of shape \((\text{n_observations},\text{n_components})\) containing the data.

  • src (str) – Source description.

  • data_unit (str) – Data unit.

  • data_cols (list[str]) – List of strings with the names of the columns of data.

  • var (Optional[ndarray], default: None) – 2D NumPy array of shape \((\text{n_observations},\text{n_components})\) containing the data variances. None defaults to no data uncertainty.

  • var_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of data that contain the data’s variance. Must have the same length as data_cols. If var is given but var_cols is not, it defaults to appending '_var' to data_cols.

  • cov (Optional[ndarray], default: None) – 2D NumPy array of shape \((\text{n_observations},\text{n_components})\) containing the data covariances (as defined in Timeseries). None defaults to no data uncertainty.

  • cov_cols (Optional[list[str]], default: None) – List of strings with the names of the columns of data that contain the data’s covariance. Must have the same length as data_cols. If cov is given but cov_cols is not, it defaults to appending '_cov' to the two respective entries of data_cols.
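
The default column names can be pictured with the short sketch below. The '_var' and '_cov' suffixes come from the descriptions above, but joining the two component names with an underscore is an assumption, and the helper name default_uncertainty_cols is hypothetical.

```python
from itertools import combinations


def default_uncertainty_cols(data_cols):
    """Generate default variance and covariance column names (sketch)."""
    var_cols = [f"{col}_var" for col in data_cols]
    # Assumed naming: join both component names, then append "_cov".
    # combinations() yields the pairs in row-by-row order.
    cov_cols = [f"{a}_{b}_cov" for a, b in combinations(data_cols, 2)]
    return var_cols, cov_cols


var_cols, cov_cols = default_uncertainty_cols(["east", "north", "up"])
print(var_cols)
print(cov_cols)
```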

Return type

Timeseries

Returns

The generated Timeseries object.

See also

date_range()

Quick function to generate a timevector.

classmethod from_fit(data_unit, data_cols, fit)[source]

Import a fit dictionary and create a Timeseries instance.

Parameters
  • data_unit (str) – Data unit.

  • data_cols (list[str]) – List of strings containing the data column names. Uncertainty column names are generated by appending '_var'.

  • fit (dict[str, Optional[ndarray]]) – Dictionary with the keys 'time', 'fit', 'var' and 'cov' (the latter two can be set to None).

Return type

Timeseries

Returns

Timeseries instance created from fit.

See also

disstans.models.Model.evaluate

Evaluating a model produces the fit dictionary.

get_arch()[source]

Build a dictionary describing the architecture of this timeseries, to be used when creating a network JSON configuration file.

Without subclassing Timeseries, this function will return an empty dictionary by default, since it is unknown how to recreate a general Timeseries object from just a JSON-compatible dictionary.

Return type

dict

See also

disstans.network.Network.to_json

Export the Network configuration as a JSON file.

disstans.timeseries.Timeseries.get_arch

Get the architecture dictionary of a Timeseries instance.

disstans.models.Model.get_arch

Get the architecture dictionary of a Model instance.

index_map

Matrix that contains the rolling indices of each matrix element used by get_cov_indices().

property length: timedelta64

Returns the length of the timeseries.

mask_out(dcol)[source]

Mask out an entire data column (and if present, its uncertainty column) by setting the entire column to NaN. Converts it to a sparse representation to save memory.

Parameters

dcol (str) – Name of the data column to mask out.

Return type

None

property num_components: int

Number of data columns.

property num_observations: int

Number of observations (rows in df).

offset

Offset applied to the timeseries data such that it starts at zero.

static prepare_math(left, right, operation)[source]

Tests two timeseries’ ability to be cast together in a mathematical operation, and returns output characteristics. Currently, only addition, subtraction, multiplication, and division are supported.

All uncertainty information is lost during mathematical operations.

One of the objects can be a NumPy array. In this case, the array has to have the exact same shape as the data in the Timeseries instance. Furthermore, the resulting Timeseries object will have the same src, data_unit and data_cols attributes as the Timeseries operand (instead of a combination of both).

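Parameters
  • left – Left operand of the operation (a Timeseries or a NumPy array).

  • right – Right operand of the operation (a Timeseries or a NumPy array).

  • operation – The mathematical operation to perform (addition, subtraction, multiplication, or division).
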
Return type

tuple[ndarray, ndarray, str, str, list[str], Index]

Returns

  • left_data – View of the 2D left data array of the operation with shape (len(out_time), num_components).

  • right_data – View of the 2D right data array of the operation with shape (len(out_time), num_components).

  • out_src – Combines the sources of each object to a new string.

  • out_data_unit – Combines the data units of each object into a new unit.

  • out_data_cols – List of strings containing the new data column names.

  • out_time – Index object containing the indices of all timestamps common to both.

Raises
  • TypeError – If one of the operands is not a Timeseries or ndarray, or if both are ndarray (since then this function would never be called anyway).

  • ValueError – If the number of data columns is not equal between the two operands, or if the data units are not the same when adding or subtracting.

  • AssertionError – If one of the operands is a NumPy array but does not have the same number of rows as the other operand.
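
The checks and the alignment on common timestamps can be sketched as follows. This is a simplified stand-in and not the DISSTANS code: SimpleSeries and align_for_math are hypothetical names, and passing the operation as a string such as "+" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SimpleSeries:
    """Hypothetical stand-in for a Timeseries: timestamps, data rows, unit."""
    times: list
    rows: list
    unit: str


def align_for_math(left, right, operation):
    """Validate two series and return their rows at the common timestamps."""
    if len(left.rows[0]) != len(right.rows[0]):
        raise ValueError("operands must have the same number of data columns")
    if operation in ("+", "-") and left.unit != right.unit:
        raise ValueError("adding or subtracting requires identical data units")
    left_lookup = dict(zip(left.times, left.rows))
    right_lookup = dict(zip(right.times, right.rows))
    # Keep only timestamps present in both series, in the order of the left one
    common = [t for t in left.times if t in right_lookup]
    return ([left_lookup[t] for t in common],
            [right_lookup[t] for t in common],
            common)


ts1 = SimpleSeries(["2020-01-01", "2020-01-02", "2020-01-03"],
                   [[1.0], [2.0], [3.0]], "mm")
ts2 = SimpleSeries(["2020-01-02", "2020-01-03", "2020-01-04"],
                   [[10.0], [20.0], [30.0]], "mm")
left_rows, right_rows, common_times = align_for_math(ts1, ts2, "+")
# Element-wise sum on the overlapping timestamps only
summed = [[a + b for a, b in zip(lrow, rrow)]
          for lrow, rrow in zip(left_rows, right_rows)]
print(common_times, summed)
```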

Warning

This method is called under-the-hood whenever a mathematical operation is performed, and should not need to be used by normal users.

See also

__add__

Addition for two Timeseries or a Timeseries and a NumPy array.

__radd__

Addition for a NumPy array and a Timeseries.

__sub__

Subtraction for two Timeseries or a Timeseries and a NumPy array.

__rsub__

Subtraction for a NumPy array and a Timeseries.

__mul__

Multiplication for two Timeseries or a Timeseries and a NumPy array.

__rmul__

Multiplication for a NumPy array and a Timeseries.

__truediv__

Division for two Timeseries or a Timeseries and a NumPy array.

__rtruediv__

Division for a NumPy array and a Timeseries.

property reliability: float

Returns the reliability (between 0 and 1), defined as the number of available observations divided by the number of expected observations. The number of expected observations is calculated by dividing the total time span by the median time step between observations.

(Essentially, this assumes that there are no “close-by” observations, e.g., two observations on the same day but at different hours in a dataset of otherwise daily observations.)
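
A rough plain-Python sketch of this calculation follows. It is not the DISSTANS implementation: the function name is hypothetical, and counting both end points of the time span (the "+ 1") is an assumption made so that a gap-free series yields exactly 1.

```python
from datetime import date
from statistics import median


def reliability(times):
    """Fraction of expected observations that are present (sketch)."""
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    typical_step = median(steps)
    # Assumption: count both end points of the time span, hence the "+ 1"
    expected = (times[-1] - times[0]) / typical_step + 1
    return len(times) / expected


# Daily observations over ten days, with days 5 and 8 missing
observed = [date(2020, 1, day) for day in (1, 2, 3, 4, 6, 7, 9, 10)]
value = reliability(observed)
print(value)
```

With eight of ten expected daily observations present, the sketch yields a reliability of 0.8.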

property shape: tuple[int, int]

Returns the shape tuple (similar to NumPy) of the timeseries, which is of shape \((\text{n_observations},\text{n_components})\).

property sigmas: DataFrame

View of only the data standard deviation columns in df.

property src: str

Source information.

property time: Index

Timestamps of the timeseries (index of df).

property var_cols: list[str] | None

List of the column names in df that contain data variance.

property var_cov: DataFrame

Returns the variance as well as covariance columns from df, to be indexed by var_cov_map to yield the full variance-covariance matrix.

var_cov_map

Contains the column indices needed to create the full variance-covariance matrix for a single time.

property vars: DataFrame

Returns the variances from df.

Specialized Classes

GipsyTimeseries

class disstans.timeseries.GipsyTimeseries(path, show_warnings=True, data_unit='mm', **kw_args)[source]

Subclasses Timeseries.

Timeseries subclass for GNSS measurements in JPL’s Gipsy(X) .tseries file format.

Parameters
  • path (str) – Path to the timeseries file.

  • show_warnings (bool, default: True) – If True, warn if there are data inconsistencies encountered while loading.

  • data_unit (Literal['mm', 'm'], default: 'mm') – Can be 'mm' or 'm'.

Additional keyword arguments will be passed onto Timeseries.

Notes

The column format is described on JPL’s website:

Columns          Description
Column 1         Decimal year computed with 365.25 days/yr
Columns 2-4      East, North and Vertical [m]
Columns 5-7      East, North and Vertical standard deviation [m]
Columns 8-10     East, North and Vertical correlation [-]
Column 11        Time in Seconds past J2000
Columns 12-17    Time in YEAR MM DD HR MN SS

Time is GPS time, and the time series are relative to each station’s first epoch.

get_arch()[source]

Returns a JSON-compatible dictionary with all the information necessary to recreate the Timeseries instance (provided the data file is available).

Returns

JSON-compatible dictionary sufficient to recreate the GipsyTimeseries instance.

Return type

dict

See also

Timeseries.get_arch

For further information.

UNRTimeseries

class disstans.timeseries.UNRTimeseries(path, show_warnings=True, data_unit='mm', **kw_args)[source]

Subclasses Timeseries.

Timeseries subclass for GNSS measurements in UNR’s .tenv3 file format.

Parameters
  • path (str) – Path to the timeseries file.

  • show_warnings (bool, default: True) – If True, warn if there are data inconsistencies encountered while loading.

  • data_unit (Literal['mm', 'm'], default: 'mm') – Can be 'mm' or 'm'.

Additional keyword arguments will be passed onto Timeseries.

Notes

The column format is described on UNR’s website:

Columns          Description
Column 1         Station name
Column 2         Date
Column 3         Decimal year
Column 4         Modified Julian day
Columns 5-6      GPS week and day
Column 7         Longitude [°] of reference meridian
Columns 8-9      Easting [m] from ref. mer., integer and fraction
Columns 10-11    Northing [m] from equator, integer and fraction
Columns 12-13    Vertical [m], integer and fraction
Column 14        Antenna height [m]
Columns 15-17    East, North, Vertical standard deviation [m]
Column 18        East-North correlation coefficient [-]
Column 19        East-Vertical correlation coefficient [-]
Column 20        North-Vertical correlation coefficient [-]

Newer files also contain the following three columns:

Column 21        Latitude [°]
Column 22        Longitude [°]
Column 23        Altitude [m]

The time series are relative to each station’s first integer epoch.
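
The file stores standard deviations and correlation coefficients, whereas the Timeseries parent class works with variances and covariances. The standard conversion between the two is sketched below in plain Python; this is not necessarily how UNRTimeseries performs it internally, the helper name to_var_cov is hypothetical, and the numbers are purely illustrative.

```python
def to_var_cov(sigmas, correlations):
    """Convert standard deviations and correlations to variances and covariances.

    sigmas: (east, north, vertical) standard deviations.
    correlations: (east-north, east-vertical, north-vertical) coefficients,
    i.e., the order of columns 18-20.
    """
    sig_e, sig_n, sig_u = sigmas
    corr_en, corr_eu, corr_nu = correlations
    variances = [sig_e**2, sig_n**2, sig_u**2]
    # Cov(X, Y) = corr(X, Y) * sigma_X * sigma_Y
    covariances = [corr_en * sig_e * sig_n,
                   corr_eu * sig_e * sig_u,
                   corr_nu * sig_n * sig_u]
    return variances, covariances


variances, covariances = to_var_cov((2.0, 3.0, 4.0), (0.5, 0.25, -0.5))
print(variances, covariances)
```

The resulting covariance order (east-north, east-vertical, north-vertical) matches the row-by-row order expected for cov_cols in the parent class.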

get_arch()[source]

Returns a JSON-compatible dictionary with all the information necessary to recreate the Timeseries instance (provided the data file is available).

Returns

JSON-compatible dictionary sufficient to recreate the UNRTimeseries instance.

Return type

dict

See also

Timeseries.get_arch

For further information.