Models
This module contains all models that can be used to fit the data or generate synthetic timeseries.
Model (Parent Class)
- class disstans.models.Model(num_parameters, regularize=False, time_unit=None, t_start=None, t_end=None, t_reference=None, zero_before=True, zero_after=True)[source]
Base class for models.
The class implements some common methods as to how one interacts with models, with the goal that subclasses of it can focus on as few details as possible.
A model, generally speaking, is defined by parameters (and optionally their covariance), and can be evaluated given a series of timestamps. Most models will make use of the
time_unit
andt_reference
attributes to relate time series into data space. Solvers can check theregularize
attribute to regularize the model during fitting.Models are usually linear (but can be overriden in subclasses), and adhere to the nomenclature
\[\mathbf{G}(\mathbf{t}) \cdot \mathbf{m} = \mathbf{d}\]where \(\mathbf{G}\) is the mapping matrix, \(\mathbf{t}\) is the time vector, \(\mathbf{m}\) are the model parameters, and \(\mathbf{d}\) are the observations.
Models are always active by default, i.e. they implement a certain functional form that can be evaluated at any time. By setting the
t_start
andt_end
attributes, this behavior can be changed, so that it is only defined on that interval, and is zero or continuous outside of that interval (see the attributeszero_before
andzero_after
).The usual workflow is to instantiate a model, then fit it to a timeseries, saving the parameters, and then evaluate it later. For synthetic timeseries, it is instantiated and the parameters are set manually.
A minimal user-defined subclass should look similar to
Polynomial
orExponential
. Three methods need to be provided: an__init__()
function that takes in any model-specific parameters and passes all other parameters into the parent class throughsuper().__init__()
, as well as bothget_mapping_single()
and_get_arch()
(see the base class’ documentation for expected in- and output).Appendix A.2 of [koehne23] describes in detail the approach to models in DISSTANS.
- Parameters
num_parameters (
int
) – Number of model parameters.regularize (
bool
, default:False
) – IfTrue
, regularization-capable solvers will regularize the parameters of this model.time_unit (
str
, default:None
) – Time unit for parameters. Refer toTimedelta
for more details.t_start (
UnionType
[str
,Timestamp
,None
], default:None
) – Sets the model start time (attributest_start
andt_start_str
).t_end (
UnionType
[str
,Timestamp
,None
], default:None
) – Sets the model end time (attributest_end
andt_end_str
).t_reference (
UnionType
[str
,Timestamp
,None
], default:None
) – Sets the model reference time (attributest_reference
andt_reference_str
).zero_before (
bool
, default:True
) – Defines whether the model is zero beforet_start
, or if the boundary value should be used (attributezero_before
).zero_after (
bool
, default:True
) – Defines whether the model is zero aftert_end
, or if the boundary value should be used (attributezero_after
).
References
- koehne23(1,2,3,4,5,6,7,8,9,10,11)
Köhne, T., Riel, B., & Simons, M. (2023). Decomposition and Inference of Sources through Spatiotemporal Analysis of Network Signals: The DISSTANS Python package. Computers & Geosciences, 170, 105247. doi:10.1016/j.cageo.2022.10524
- EVAL_PREDVAR_PRECISION = dtype('float32')
To reduce memory impact when estimating the full covariance of the predicted timeseries when calling
evaluate()
, this attribute is by default set to single precision, but can be changed to double precision if desired.
- __eq__(other)[source]
Special function that allows for the comparison of models based on their type and architecture, regardless of model parameters.
Example
>>> from disstans.models import Step, Sinusoid >>> step1, step2 = Step(["2020-01-01"]), Step(["2020-01-02"]) >>> sin1, sin2 = Sinusoid(1, "2020-01-01"), Sinusoid(1, "2020-01-01") >>> step1 == step2 False >>> sin1 == sin2 True
Note that obviously, the objects themselves are still different:
>>> step1 is step1 True >>> step1 is step2 False
See also
get_arch
Function used to determine the equality.
- __str__()[source]
Special function that returns a readable summary of the Model. Accessed, for example, by Python’s
print()
built-in function.- Return type
- Returns
Model summary.
- active_parameters
By default, all parameters in the model are considered active, and this attribute is set to
None
. Otherwise, this attribute contains an array of shape \((\text{num_parameters}, )\) withTrue
where parameters are active, andFalse
otherwise.
- convert_units(factor)[source]
Convert the parameter and covariances to a new unit by providing a conversion factor.
- copy(parameters=True, covariances=True, active_parameters=True)[source]
Copy the model object.
- Parameters
parameters (
bool
, default:True
) – IfTrue
, include the read-in parameters in the copy (par
), otherwise leave empty.covariances (
bool
, default:True
) – IfTrue
, include the read-in (co)variances in the copy (cov
), otherwise leave empty.active_parameters (
bool
, default:True
) – IfTrue
, include the active parameter setting in the copy (active_parameters
), otherwise leave empty.
- Return type
- Returns
A copy of the model, based on
get_arch()
.
- property cov: numpy.ndarray | None
Square array property with dimensions \(\text{num_parameters} * \text{num_components}\) that contains the parameter’s full covariance matrix as a NumPy array. The rows (and columns) are ordered such that they first correspond to the covariances between all components for the first parameter, then the covariance between all components for the second parameter, and so forth.
- evaluate(timevector, return_full_covariance=False)[source]
Evaluate the model given a time vector, calculating the predicted timeseries \(\mathbf{d} = \mathbf{Gm}\) and (if applicable) its formal covariance matrix \(\mathbf{C}_d^{\text{pred}} = \mathbf{G} \mathbf{C}_m \mathbf{G}^T\).
This method ignores the parameters being set invalid by
freeze()
.- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.return_full_covariance (
bool
, default:False
) – By default (False
) the covariances between timesteps are ignored, and the returned dictionary will only include the component variances and covariances for each timestep. IfTrue
, the full covariance matrix (between components and timesteps) will be returned instead.
- Return type
- Returns
Dictionary with the keys
time
containing the input time vector,fit
containing \(\mathbf{d}\),var
containing the formal variance (orNone
, if not present), andcov
containing the formal covariance (orNone
, if not present).fit
,var
andcov
(if notNone
) arendarray
objects. Ifreturn_full_covariance=True
,var
will be omitted and the full covariance matrix will be returned incov
.- Raises
RuntimeError – If the model parameters have not yet been set with
read_parameters()
.
- freeze(zero_threshold=1e-10)[source]
In case some parameters are estimated to be close to zero and should not be considered in future fits and evaluations, this function “freezes” the model by setting parameters below the threshold
zero_threshold
to be invalid. The mask will be kept inactive_parameters
.Only valid parameters will be used by
get_mapping()
andevaluate()
.- Parameters
zero_threshold (
float
, default:1e-10
) – Model parameters with absolute values belowzero_threshold
will be set to zero and set inactive.- Return type
See also
unfreeze
The reverse method.
- get_active_period(timevector)[source]
Given a time vector, return at each point whether the model is active.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
active – Array of same length as
timevector
, withTrue
where active.first – Index of the first active timestamp.
last – Index of the last active Timestamp.
- get_arch()[source]
Get a dictionary that describes the model fully and allows it to be recreated. Requires the model to be subclassed and implement a
_get_arch()
method that expands the base model keywords to the subclassed model details.- Return type
- Returns
Model keyword dictionary.
- Raises
NotImplementedError – If the model has not been subclassed and
_get_arch()
has not been added.
- get_mapping(timevector, return_observability=False, ignore_active_parameters=False)[source]
Builds the mapping matrix \(\mathbf{G}\) given a time vector \(\mathbf{t}\). Requires the model to be subclassed and implement a
get_mapping_single()
method.This method has multiple steps: it first checks the active period of the model using
get_active_period()
. Iftimevector
is outside the active period, it skips the actual calculation and returns an empty sparse matrix. If there is at least one timestamp where the model is active, it calls the actualget_mapping_single()
mapping matrix calculation method only for the timestamps where the model is active in order to reduce the computational load. Lastly, the dense, evaluated mapping matrix gets padded before and after with empty sparse matrices (if the model is zero outside its boundaries) or the values at the boundaries themselves.This method respects the parameters being set invalid by
freeze()
, and will interpret those parameters to be unobservable.- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.return_observability (
bool
, default:False
) – If true, the function will check if there are any all-zero columns, which would point to unobservable parameters, and return a boolean mask with the valid indices.ignore_active_parameters (
bool
, default:False
) – IfTrue
, do not set inactive parameters to zero to avoid estimation.
- Return type
- Returns
mapping – Sparse mapping matrix.
observable – Returned if
return_observability=True
. A boolean NumPy array of the same length asmapping
has columns.False
indicates (close to) all-zero columns (unobservable parameters).
- Raises
NotImplementedError – If the model has not been subclassed and
get_mapping_single()
has not been added.
- get_mapping_single(timevector)[source]
Build the mapping matrix \(\mathbf{G}\) given a time vector \(\mathbf{t}\) for the active period. Called inside
get_mapping()
.- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation. It can and should be assumed that all included timestamps are valid (i.e., defined by the model’szero_before
andzero_after
).- Return type
- Returns
Mapping matrix with the same number of rows as
timevector
andnum_parameters
columns.
- num_parameters
Number of parameters that define the model and can be solved for.
- property par: ndarray
Array property of shape \((\text{num_parameters}, \text{num_components})\) that contains the parameters as a NumPy array.
- read_parameters(parameters, covariances=None)[source]
Reads in the parameters \(\mathbf{m}\) (optionally also their covariance) and stores them in the instance attributes.
- Parameters
parameters (
ndarray
) – Model parameters of shape \((\text{num_parameters}, \text{num_components})\).covariances (
Optional
[ndarray
], default:None
) – Model component (co-)variances that can either have the same shape asparameters
, in which case every parameter and component only has a variance, or it is square with dimensions \(\text{num_parameters} * \text{num_components}\), in which case it represents a full variance-covariance matrix.
- Return type
- regularize
Indicate to solvers to regularize this model (
True
) or not.
- t_end_str
String representation of the end time (or
None
).
- t_reference_str
String representation of the reference time (or
None
).
- t_start_str
String representation of the start time (or
None
).
- time_unit
Stores the time unit of the parameters as a string.
- tvec_to_numpycol(timevector)[source]
Convenience wrapper for
tvec_to_numpycol()
for Model objects that have thetime_unit
andt_reference
attributes set. :rtype:ndarray
See also
tvec_to_numpycol()
Convert a Timestamp vector into a NumPy array.
- unfreeze()[source]
Resets previous model freezing done by
freeze()
such that all parameters are active again. :rtype:None
See also
freeze
The reverse method.
- property var: numpy.ndarray | None
Array property of shape \((\text{num_parameters}, \text{num_components})\) that returns the parameter’s individual variances as a NumPy array.
- zero_after
If
True
, model will evaluate to zero after the end time, otherwise the model value at the end time will be used for all times after that.
- zero_before
If
True
, model will evaluate to zero before the start time, otherwise the model value at the start time will be used for all times before that.
- disstans.models.check_model_dict(models)[source]
Checks whether a dictionary has the appropriate structure to be used to create
Model
objects.- Parameters
models (
dict
[str
,dict
]) – Dictionary of structure{model_name: {"type": modelclass, "kw_args": {**kw_args}}}
that contains the names, types and necessary keyword arguments to create each model object.- Raises
AssertionError – If the dictionary structure is invalid.
- Return type
Model Collection
- class disstans.models.ModelCollection[source]
Class that contains
Model
objects and is mainly used to keep track of across-model variables and relations such as the cross-model covariances. It also contains convenience functions that wrap individual models’ functions likeevaluate()
,get_mapping()
orread_parameters()
or attributes likepar
.- EVAL_PREDVAR_PRECISION = dtype('float32')
To reduce memory impact when estimating the full covariance of the predicted timeseries when calling
evaluate()
, this attribute is by default set to single precision, but can be changed to double precision if desired.
- __contains__(model_description)[source]
Special function that allows to check whether a certain model description is in the collection.
- Return type
Example
If
mc
is aModelCollection
instance, and we want to check whether'mymodel'
is a model in the collection, the following two are equivalent:# long version 'mymodel' in mc.collection # short version 'mymodel' in mc
- __delitem__(model_description)[source]
Convenience special function to delete a model contained in
collection
. Deleting a model forces the collection’s parameter and covariance matrices to be reset toNone
.- Return type
- __eq__(other)[source]
Special function that allows for the comparison of model collection based on their contents, regardless of model parameters.
See also
disstans.models.Model.__eq__
For more details.
- __getitem__(model_description)[source]
Convenience special function to the models contained in
collection
.- Return type
- __iter__()[source]
Convenience special function that allows for a shorthand notation to quickly iterate over all models in
collection
.Example
If
mc
is aModelCollection
instance, then the following two loops are equivalent:# long version for model in mc.collection.values(): pass # shorthand for model in mc: pass
- __len__()[source]
Special function that gives quick access to the number of models in the collection using Python’s built-in
len()
function to make interactions with iterators easier.- Return type
- __setitem__(model_description, model)[source]
Convenience special function to add or update a model contained in
collection
. Setting or updating a model forces the collection’s parameter and covariance matrices to be reset toNone
.- Return type
- __str__()[source]
Special function that returns a readable summary of the model collection. Accessed, for example, by Python’s
print()
built-in function.- Return type
- Returns
Model collection summary.
- property active_parameters: ndarray
Either
None
, if all parameters are active, or an array of shape \((\text{num_parameters}, )\) that containsTrue
for all active parameters, andFalse
otherwise.
- static build_LS(ts, G, obs_mask, icomp=None, return_W_G=False, use_data_var=True, use_data_cov=True)[source]
Helper function that builds the necessary matrices to solve the least-squares problem for the observable parameters given observations.
If the problem only attempts to solve a single data component (by specifying its index in
icomp
), it simply takes the input mapping matrix \(\mathbf{G}\), creates the weight matrix \(\mathbf{W}\), and computes \(\mathbf{G}^T \mathbf{W} \mathbf{G}\) as well as \(\mathbf{G}^T \mathbf{W} \mathbf{d}\).If the problem is joint, i.e. there are multiple data components with covariance between them, this function brodcasts the mapping matrix to the components, creates the multi-component weight matrix, and then computes those same \(\mathbf{G}^T \mathbf{W} \mathbf{G}\) and \(\mathbf{G}^T \mathbf{W} \mathbf{d}\) matrices.
- Parameters
ts (
Timeseries
) – The timeseries whose time indices are used to calculate the mapping matrix.G (
spmatrix
) – Single-component mapping matrix.obs_mask (
ndarray
) – Observability mask.icomp (
Optional
[int
], default:None
) – If provided, the integer index of the component of the data to be fitted.return_W_G (
bool
, default:False
) – IfTrue
, also return the \(\mathbf{G}\) and \(\mathbf{W}\) matrices, reduced to the observable parameters and given observations.use_data_var (
bool
, default:True
) – IfTrue
, use the data variance if present. IfFalse
, ignore it even if it is present.use_data_cov (
bool
, default:True
) – IfTrue
, use the data covariance if present. IfFalse
, ignore it even if it is present.
- Return type
- Returns
G – (If
return_W_G=True
.) Reduced \(\mathbf{G}\) matrix.W – (If
return_W_G=True
.) Reduced \(\mathbf{W}\) matrix.GtWG – Reduced \(\mathbf{G}^T \mathbf{W} \mathbf{G}\) matrix.
GtWd – Reduced \(\mathbf{G}^T \mathbf{W} \mathbf{d}\) matrix.
See also
prepare_LS
Function that precedes this one in the regular solving process.
- convert_units(factor)[source]
Convert the parameter and covariances to a new unit by providing a conversion factor.
- copy(parameters=True, covariances=True, active_parameters=True)[source]
Copy the model collection object.
- Parameters
parameters (
bool
, default:True
) – IfTrue
, include the read-in parameters in the copy (par
), otherwise leave empty.covariances (
bool
, default:True
) – IfTrue
, include the read-in (co)variances in the copy (cov
), otherwise leave empty.active_parameters (
bool
, default:True
) – IfTrue
, include the active parameter setting in the copy (active_parameters
), otherwise leave empty.
- Return type
- Returns
A copy of the model collection, based on the individual models’
copy()
method.
- property cov: ndarray
Square array property with dimensions \(\text{num_elements} * \text{num_components}\) that contains the parameter’s full covariance matrix as a NumPy array. The rows (and columns) are ordered such that they first correspond to the covariances between all components for the first parameter, then the covariance between all components for the second parameter, and so forth.
- evaluate(timevector, return_full_covariance=False)
Evaluate the model given a time vector, calculating the predicted timeseries \(\mathbf{d} = \mathbf{Gm}\) and (if applicable) its formal covariance matrix \(\mathbf{C}_d^{\text{pred}} = \mathbf{G} \mathbf{C}_m \mathbf{G}^T\).
This method ignores the parameters being set invalid by
freeze()
.- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.return_full_covariance (
bool
, default:False
) – By default (False
) the covariances between timesteps are ignored, and the returned dictionary will only include the component variances and covariances for each timestep. IfTrue
, the full covariance matrix (between components and timesteps) will be returned instead.
- Return type
- Returns
Dictionary with the keys
time
containing the input time vector,fit
containing \(\mathbf{d}\),var
containing the formal variance (orNone
, if not present), andcov
containing the formal covariance (orNone
, if not present).fit
,var
andcov
(if notNone
) arendarray
objects. Ifreturn_full_covariance=True
,var
will be omitted and the full covariance matrix will be returned incov
.- Raises
RuntimeError – If the model parameters have not yet been set with
read_parameters()
.
- freeze(model_list=None, zero_threshold=1e-10)[source]
Convenience function that calls
freeze()
for all models (or a subset thereof) contained in the collection.- Parameters
- Return type
- classmethod from_model_dict(model_dict)[source]
Creates an empty
ModelCollection
object, and adds a dictionary of model objects to it.- Parameters
model_dict (
dict
[str
,Model
]) – Dictionary with model names as keys, andModel
object instances as values.- Return type
- Returns
The new
ModelCollection
object.
- get_arch()[source]
Get a dictionary that describes the model collection fully and allows it to be recreated.
- Return type
- Returns
Model keyword dictionary.
- get_mapping(timevector, return_observability=False, ignore_active_parameters=False)[source]
Builds the mapping matrix \(\mathbf{G}\) given a time vector \(\mathbf{t}\) by concatenating the individual mapping matrices from each contained model using their method
get_mapping()
(see for more details).This method respects the parameters being set invalid by
freeze()
, and will interpret those parameters to be unobservable.- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.return_observability (
bool
, default:False
) – If true, the function will check if there are any all-zero columns, which would point to unobservable parameters, and return a boolean mask with the valid indices.ignore_active_parameters (
bool
, default:False
) – IfTrue
, do not set inactive parameters to zero to avoid estimation.
- Return type
- Returns
mapping – Sparse mapping matrix.
observable – Returned if
return_observability=True
. A boolean NumPy array of the same length asmapping
has columns.False
indicates (close to) all-zero columns (unobservable parameters).
- property internal_scales: ndarray
Array of shape \((\text{num_parameters}, )\) that collects all the models’ internal scales.
- items()[source]
Convenience function that returns a key-value-iterator from
collection
.- Return type
- property num_parameters: int
Number of parameters in the model collection, calculated as the sum of all the parameters in the contained models.
- property par: ndarray
Array property of shape \((\text{num_parameters}, \text{num_components})\) that contains the parameters as a NumPy array.
- plot_covariance(title=None, fname=None, use_corr_coef=False, plot_empty=True, save_kw_args={'format': 'png'})[source]
Plotting method that displays the covariance (or correlation coefficient) matrix. The axes are labeled by model names for easier interpretation.
- Parameters
title (
Optional
[str
], default:None
) – If provided, the title that is added to the figure.fname (
Optional
[str
], default:None
) – By default (None
), the figure is shown interarctively to enable zooming in etc. If anfname
is provided, the figure is instead directly saved to the provided filename.use_corr_coef (
bool
, default:False
) – By default (False
), the method plots the covariance matrix. IfTrue
, the correlation coefficient matrix is plotted instead.plot_empty (
bool
, default:True
) – By default (True
), the full matrix is plotted. If it is sparse, it will be hard to identify the interplay between the different parameters. Therefore, settingplot_empty=False
will only plot the rows and columns corresponding to nonzero parameters.save_kw_args (
dict
, default:{'format': 'png'}
) – Additional keyword arguments passed tosavefig()
, used whenfname
is provided.
- Return type
- prepare_LS(ts, include_regularization=True, reweight_init=None, use_internal_scales=False, check_constraints=False)[source]
Helper function that concatenates the mapping matrices of the collection models given the timevector in in the input timeseries, and returns some relevant sizes.
It can also combine the regularization masks and reshape the input weights into the format used by the solvers, optionally taking into account the internal scales.
- Parameters
ts (
Timeseries
) – The timeseries whose time indices are used to calculate the mapping matrix.include_regularization (
bool
, default:True
) – IfTrue
, expands the returned variables (see below) and computes the regularization mask for the observable parameters only.reweight_init (
UnionType
[ndarray
,list
[ndarray
],dict
[str
,ndarray
],None
], default:None
) – Contains the initial weights for the current iteration of the least squares problem. It can be a Numpy array or a list of Numpy arrays, in which case it (or the array created by concatenating the list) need to already have the right output shape (no check is performed). If it is a dictionary, the keys need to be model names, and the values are then the Numpy arrays which will be arranged properly to match the mapping matrix.use_internal_scales (
bool
, default:False
) – IfTrue
, also return the internal model scales, subset to the observable and regularized parameters.check_constraints (
bool
, default:False
) – IfTrue
, also return an array that contains the signs that should be enforced for the parameters.
- Return type
tuple
[spmatrix
,ndarray
,int
,int
,int
,int
,int
,ndarray
,ndarray
,ndarray
,ndarray
]- Returns
G – Mapping matrix computed by
get_mapping()
.obs_mask – Observability mask computed by
get_mapping()
.num_time – Length of the timeseries.
num_params – Number of total parameters present in the model collection.
num_comps – Number of components in the timeseries.
num_obs – Number of observable parameters.
num_reg – (Only if
include_regularization=True
.) Number of observable and regularized parameters.reg_mask – (Only if
include_regularization=True
.) Numpy array of shape \((\text{num_obs}, )\) that for each observable parameter denotes whether that parameter is regularized (True
) or not.init_weights – (Only if
include_regularization=True
.) Numpy array of shape \((\text{num_reg}, )\) that for each observable and regularized parameter contains the initial weights.None
ifreweight_init=None
.weights_scaling – (Only if
include_regularization=True
.) Numpy array of shape \((\text{num_reg}, )\) that for each observable and regularized parameter contains the internal model scale.None
ifuse_internal_scales=False
.sign_constraints – (Only if
check_constraints=True
.) Numpy array of shape \((\text{num_obs}, \text{num_comps})\) that for each observable parameter denotes whether it should be positive (+1
), negative (-1
), or unconstrained (NaN
).
See also
build_LS
Function that follows this one in the regular solving process.
- read_parameters(parameters, covariances=None)[source]
Reads in the entire collection’s parameters \(\mathbf{m}\) (optionally also their covariance) and stores them in the instance attributes.
- Parameters
parameters (
ndarray
) – Model collection parameters of shape \((\text{num_parameters}, \text{num_components})\).covariances (
Optional
[ndarray
], default:None
) – Model collection component (co-)variances that can either have the same shape asparameters
, in which case every parameter and component only has a variance, or it is square with dimensions \(\text{num_parameters} * \text{num_components}\), in which case it represents a full variance-covariance matrix.
- Return type
- property regularized_mask: ndarray
A boolean array mask of shape \((\text{num_parameters}, )\) where
True
denotes a regularized parameter (False
otherwise``).
- unfreeze(model_list=None)[source]
Convenience function that calls
unfreeze()
for all models (or a subset thereof) contained in the collection.
Fit Collection
- class disstans.models.FitCollection(*args, **kw_args)[source]
Class that contains
Timeseries
model fits, and can be used just like adict
. Has an additionalallfits
attribute that stores the sum of all fits.- allfits
Attribute that contains the sum of all fits in
FitCollection
.
Basic Models
Arctangent
- class disstans.models.Arctangent(tau, t_reference, time_unit='D', zero_before=False, zero_after=False, **model_kw_args)[source]
Subclasses
Model
.This model provides the arctangent \(\arctan(\mathbf{t}/\tau)\), stretched with a given time constant and normalized to be between \((0, 1)\) at the limits.
Because this model is usually transient, it is recommended not to use it in the estimation of parameters, even when using
t_start
andt_end
to make the tails constant (since that introduces high-frequency artifacts). (Usingt_start
and/ort_end
might be desirable for creating syntehtic data, however.)- Parameters
tau (
float
) – Arctangent time constant \(\tau\). It represents the time at which, after zero-crossing at the reference time, the arctangent reaches the value \(\pi/4\) (before model scaling), i.e. half of the one-sided amplitude.
See
Model
for attribute descriptions and more keyword arguments.- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\left( \frac{1}{\pi} \arctan \left( \frac{t}{\tau} \right) + 0.5 \right)\]where \(\tau\) is the arctangent time constant.
See Appendix A.2.2 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- tau
Arctangent time constant.
Exponential
- class disstans.models.Exponential(tau, t_reference, sign_constraint=None, time_unit='D', t_start=None, zero_after=False, **model_kw_args)[source]
Subclasses
Model
.This model provides the “geophysical” exponential \(1-\exp(-\mathbf{t}/\tau)\) with a given time constant, zero for \(\mathbf{t} < 0\), and approaching one asymptotically.
- Parameters
tau (
float
|list
[float
] |ndarray
) – Exponential time constant(s) \(\tau\). It represents the amount of time that it takes for the (general) exponential function’s value to be multiplied by \(e\). Applied to this model, for a given relative amplitude \(a\) (so \(0 < a < 1\), before model scaling) to be reached at given \(\Delta t\) pastt_start
, \(\tau = - \frac{\Delta t}{\ln(1 - a)}\)sign_constraint (
UnionType
[int
,list
[int
],None
], default:None
) – Can be+1
or-1
, and tells the solver to constrain fitted parameters to this sign, avoiding sign flips between individual exponentials. This is useful if the resulting curve should be monotonous. It can also be a list, where each entry applies to one data component (needs to be known at initialization). IfNone
, no constraint is enforced.
See
Model
for attribute descriptions and more keyword arguments.- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\left( 1 - \exp \left( -\frac{t}{\tau} \right) \right)\]where \(\tau\) is the exponential time constant.
See Appendix A.2.2 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- sign_constraint
Flag whether the sign of the fitted parameters should be constrained.
- tau
Exponential time constant(s).
Hyperbolic Tangent
- class disstans.models.HyperbolicTangent(tau, t_reference, time_unit='D', zero_before=False, zero_after=False, **model_kw_args)[source]
Subclasses
Model
.This model provides the hyprbolic tangent \(\arctan(\mathbf{t}/\tau)\), stretched with a given time constant and normalized to be between \((0, 1)\) at the limits.
- Parameters
tau (
float
) – Time constant \(\tau\). To determine the constant from a characteristic time scale \(T\) and a percentage \(0<q<1\) of the fraction of magnitude change to have happened in that time scale (as counted in both directions from the reference time, and given the specified time unit), use the following formula: \(\tau = T / \left(2 \tanh^{-1} q \right)\).
See
Model
for attribute descriptions and more keyword arguments.- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\left( \frac{1}{2} \tanh \left( \frac{t}{\tau} \right) + 0.5 \right)\]where \(\tau\) is the hyperbolic tangent time constant.
See Appendix A.2.2 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- tau
Time constant.
Logarithmic
- class disstans.models.Logarithmic(tau, t_reference, sign_constraint=None, time_unit='D', t_start=None, zero_after=False, **model_kw_args)[source]
Subclasses
Model
.This model provides the “geophysical” logarithmic \(\ln(1 + \mathbf{t}/\tau)\) with a given time constant and zero for \(\mathbf{t} < 0\).
- Parameters
tau (
float
|list
[float
] |ndarray
) – Logarithmic time constant(s) \(\tau\). It represents the time at which, after zero-crossing at the reference time, the logarithm reaches the value 1 (before model scaling).sign_constraint (
UnionType
[int
,list
[int
],None
], default:None
) – Can be+1
or-1
, and tells the solver to constrain fitted parameters to this sign, avoiding sign flips between individual logarithms. This is useful if the resulting curve should be monotonous. It can also be a list, where each entry applies to one data component (needs to be known at initialization). IfNone
, no constraint is enforced.
See
Model
for attribute descriptions and more keyword arguments.- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\log \left( 1 + \frac{t}{\tau} \right)\]where \(\tau\) is the logairthmic time constant.
See Appendix A.2.2 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- sign_constraint
Flag whether the sign of the fitted parameters should be constrained.
- tau
Logarithmic time constant(s).
Polynomial
- class disstans.models.Polynomial(order, t_reference, min_exponent=0, time_unit='D', zero_before=False, zero_after=False, **model_kw_args)[source]
Subclasses
Model
.Polynomial model of given order.
- Parameters
See
Model
for attribute descriptions and more keyword arguments.- get_exp_index(exponent)[source]
Return the row index for
par
andvar
of a given exponent.- Parameters
exponent (
int
) – Exponent for which to return the index. E.g.,1
would correspond to the index of the linear term.- Return type
- Returns
Index of the exponent’s term.
- Raises
ValueError – Raised if the desired exponent is not present in the model..
- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[t^l\]where \(l\) are the integer exponents of the model.
See Appendix A.2.2 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
Step
- class disstans.models.Step(steptimes, zero_after=False, **model_kw_args)[source]
Subclasses
Model
.Model that introduces steps at discrete times.
- Parameters
steptimes (
list
[str
]) – List of datetime-like strings that can be converted intoTimestamp
. Length of it equals the number of model parameters.
See
Model
for attribute descriptions and more keyword arguments.- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[H \left( t - t_l^{\text{step}} \right)\]where \(H \left( t \right)\) is the Heaviside step function and \(t_l^{\text{step}}\) are the step times.
See Appendix A.2.2 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- steptimes
List of step times as datetime-like strings.
Sinusoid
- class disstans.models.Sinusoid(period, t_reference, time_unit='D', **model_kw_args)[source]
Subclasses
Model
.This model defines a fixed-frequency periodic sinusoid signal with constant amplitude and phase to be estimated.
See
Model
for attribute descriptions and more keyword arguments.Notes
Implements the relationship
\[\mathbf{g}(\mathbf{t}) = A \cos ( 2 \pi \mathbf{t} / T - \phi ) = a \cos ( 2 \pi \mathbf{t} / T ) + b \sin ( 2 \pi \mathbf{t} / T )\]with
period
\(T\),phase
\(\phi=\text{atan2}(b,a)\) andamplitude
\(A=\sqrt{a^2 + b^2}\).- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\left( \cos \left( \omega t \right), \sin \left( \omega t \right) \right)\]where \(\omega\) is the period of the sinusoid.
See Appendix A.2.2 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- period
Period of the sinusoid.
Spline Models
BSpline
- class disstans.models.BSpline(degree, scale, t_reference, regularize=True, time_unit='D', num_splines=1, spacing=None, obs_scale=1.0, **model_kw_args)[source]
Subclasses
Model
.Model defined by cardinal, centralized B-Splines of certain order/degree and time scale. Used for transient temporary signals that return to zero after a given time span.
- Parameters
degree (
int
) – Degree of the B-Splines.scale (
float
) – Scale of the B-Splines, see Notes.t_reference (
str
|Timestamp
) – Reference (center) time for (first) spline.time_unit (
str
, default:'D'
) – Time unit of scale, spacing and model parameters.num_splines (
int
, default:1
) – Number of splines, separated byspacing
.spacing (
Optional
[float
], default:None
) – Spacing between the center times when multiple splines are created.None
defaults toscale
.obs_scale (
float
, default:1.0
) – Determines how many factors ofscale
should be sampled by thetimevector
input toget_mapping()
to accept an individual spline as observable.
See
Model
for attribute descriptions and more keyword arguments.Notes
For an analytic representation of the B-Splines, see [butzer88] or [schoenberg73]. Further examples can be found at https://bsplines.org/flavors-and-types-of-b-splines/.
It is important to note that the function will be non-zero on the interval
\[-(p+1)/2 < x < (p+1)/2\]where \(p\) is the degree of the cardinal B-spline (and the degree of the resulting polynomial). The order \(n\) is related to the degree by the relation \(n = p + 1\). The scale determines the width of the spline in the time domain, and corresponds to the interval [0, 1] of the B-Spline. The full non-zero time span of the spline is therefore \(\text{scale} \cdot (p+1) = \text{scale} \cdot n\).
num_splines
will increase the number of splines by shifting the reference point \((\text{num_splines} - 1)\) times by the spacing (which must be given in the same units as the scale).If no spacing is given but multiple splines are requested, the scale will be used as the spacing.
References
- butzer88(1,2)
Butzer, P., Schmidt, M., & Stark, E. (1988). Observations on the History of Central B-Splines. Archive for History of Exact Sciences, 39(2), 137-156. Retrieved May 14, 2020, from https://www.jstor.org/stable/41133848
- schoenberg73(1,2)
Schoenberg, I. J. (1973). Cardinal Spline Interpolation. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611970555
- degree
Degree \(p\) of the B-Splines.
- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\sum_{k=0}^{n} \frac{{\left( -1 \right)}^{k}}{p!} \cdot \binom{n}{k} \cdot {\left( t_j^\prime + \frac{n}{2} - k \right)}^p\]where \(p\) is the degree and \(n=p+1\) is the order (see [butzer88] and [schoenberg73]). \(t^\prime\) is the normalized time:
\[t_j^\prime = \frac{ \left( t - t_{\text{ref}} \right) - j \cdot \rho}{\rho}\]where \(t_{\text{ref}}\) and \(\rho\) are the model’s reference time and timescale, respectively.
See Appendix A.2.3 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- get_transient_period(timevector)[source]
Returns a mask-like array of where each spline is currently transient (not staying constant).
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
NumPy array with
True
when a spline is currently transient,False
otherwise.
- observability_scale
Observability scale factor.
- order
Order \(n=p+1\) of the B-Splines.
- scale
Scale of the splines.
- spacing
Spacing between the center times of the splines.
ISpline
- class disstans.models.ISpline(degree, scale, t_reference, regularize=True, time_unit='D', num_splines=1, spacing=None, zero_after=False, obs_scale=1.0, **model_kw_args)[source]
Subclasses
Model
.Integral of cardinal, centralized B-Splines of certain order/degree and time scale, with an amplitude of 1. The degree \(p\) given in the initialization is the degree of the spline before the integration, i.e. the resulting ISpline is a piecewise polynomial of degree \(p + 1\). Used for transient permanent signals that stay at their maximum value after a given time span.
See also
disstans.models.BSpline
More details about B-Splines and the available keyword arguments.
- degree
Degree \(p\) of the B-Splines.
- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\sum_{k=0}^{n} \frac{{\left( -1 \right)}^{k}}{\left( p+1 \right) !} \cdot \binom{n}{k} \cdot {\left( t_j^\prime + \frac{n}{2} - k \right)}^{p+1}\]which is the integral over time of
get_mapping_single()
.See Appendix A.2.3 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- get_transient_period(timevector)[source]
Returns a mask-like array of where each spline is currently transient (not staying constant).
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
NumPy array with
True
when a spline is currently transient,False
otherwise.
- observability_scale
Observability scale factor.
- order
Order \(n=p+1\) of the B-Splines.
- scale
Scale of the splines.
- spacing
Spacing between the center times of the splines.
BaseSplineSet
- class disstans.models.BaseSplineSet(splines, internal_scaling=True, **model_kw_args)[source]
Subclasses
Model
.Raw container class for lists of splines, which are usually created with subclasses like
SplineSet
. The common functionalities are implemented here.- Parameters
splines (
list
[BSpline
|ISpline
]) – List of spline model objects.internal_scaling (
bool
, default:True
) – By default, in order to influence the tradeoff between splines of different timescales, the mapping matrix of each spline is scaled by its own time scale to promote using fewer components. Without this, there would be an ambiguity for the solver as to whether fit the signal using many smaller scales or with one large scale, as the fit would be almost identical. This behavior can be disabled by settinginternal_scaling=False
.
See
Model
for attribute descriptions and more keyword arguments.- freeze(zero_threshold=1e-10)[source]
In case some parameters are estimated to be close to zero and should not be considered in future fits and evaluations, this function “freezes” the model by setting parameters below the threshold
zero_threshold
to be invalid. The mask will be kept inactive_parameters
.Only valid parameters will be used by
get_mapping()
andevaluate()
.- Parameters
zero_threshold (
float
, default:1e-10
) – Model parameters with absolute values belowzero_threshold
will be set inactive.- Return type
See also
unfreeze
The reverse method.
- get_mapping_single(timevector)[source]
Calculate the mapping factors at times by accumulating the mapping factors of the different scales.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- internal_scales
If
internal_scaling
isTrue
, this NumPy array holds the relative scaling factors of all parameters over all the sub-splines.
- internal_scaling
Trackes whether to scale the sub-splines relative to their lengths.
- make_scalogram(t_left, t_right, cmaprange=None, resolution=1000, min_param_mag=0.0)[source]
Create a scalogram figure of the model parameters.
A scalogram shows the amplitude of each model parameter plotted over time and for all the different scales contained. Model parameters that have overlapping influence are also shown as overlapping. The height of each parameter’s patch is defined by the weight of that parameter relative to the other parameters (excluding splines that are not transient at that time).
- Parameters
t_left – Left boundary of the time axis.
t_right – Right boundary of the time axis.
cmaprange (default:
None
) – Maximum absolute amplitude of the color scale to use.None
defaults to the 95th percentile of the absolute amplitudes of all parameters.resolution (default:
1000
) – Number of points inside the time span to evaluate the scalogram at.min_param_mag (default:
0.0
) – The absolute value under which any value is plotted as zero.
- Returns
fig – Figure object of the scalogram.
ax – Axes object of the scalogram.
- Raises
NotImplementedError – If the generation method for the scalogram given the SplineSet’s spline class is not defined in this method yet.
- min_scale
Minimum scale of the sub-splines.
- read_parameters(parameters, covariances=None)[source]
Reads in the parameters \(\mathbf{m}\) (optionally also their variances) of all the sub-splines and stores them in the respective attributes.
Note that the main
SplineSet
object will still contain all the cross-spline covariances (if contained incovariances
), but the sub-splines cannot.- Parameters
parameters (
ndarray
) – Model parameters of shape \((\text{num_parameters}, \text{num_components})\).covariances (
Optional
[ndarray
], default:None
) – Model component (co-)variances that can either have the same shape asparameters
, in which case every parameter and component only has a variance, or it is square with dimensions \(\text{num_parameters} * \text{num_components}\), in which case it represents a full variance-covariance matrix.
- Return type
- splines
List of spline object contained within the SplineSet.
SplineSet
- class disstans.models.SplineSet(degree, t_center_start, t_center_end, time_unit='D', list_scales=None, list_num_knots=None, splineclass=<class 'disstans.models.ISpline'>, complete=True, regularize=True, **model_kw_args)[source]
Subclasses
BaseSplineSet
.Contains a list of splines that cover an entire timespan, and that share a common degree, but different center times and scales.
The set is constructed from a time span (
t_center_start
andt_center_end
) and numbers of centerpoints or length scales. By default (complete=True
), the number of splines and center points for each scale will then be chosen such that the resulting set of splines will be complete over the input time scale. This means it will contain all splines that are non-zero at least somewhere in the time span. Otherwise, the spline set will only have center times at or betweent_center_start
andt_center_end
.This class also sets the spacing equal to the scale.
- Parameters
degree (
int
) – Degree of the splines to be created.t_center_start (
str
|Timestamp
) – Time span start of the spline set.t_center_end (
str
|Timestamp
) – Time span end of the spline set.time_unit (
str
, default:'D'
) – Time unit of scale, spacing and model parameters.list_scales (
Optional
[list
[float
]], default:None
) – List of scales to use for each of the sub-splines. Mutually exclusive to settinglist_num_knots
.list_num_knots (
Optional
[list
[int
]], default:None
) – List of number of knots to divide the time span into for each of the sub-splines. Mutually exclusive to settinglist_scales
.splineclass (
BSpline
|ISpline
, default:<class 'disstans.models.ISpline'>
) – Model class to use for the splines.complete (
bool
, default:True
) – See usage description.
See
BaseSplineSet
andModel
for attribute descriptions and more keyword arguments.- complete
Sets whether the spline coverage of the time span is considered to be complete or not (see class documentation).
- degree
Degree of the splines.
- list_num_knots
List of number of knots the time span is divided into for each of the sub-splines.
- list_scales
List of scales of each of the sub-splines.
- splineclass
Class of the splines contained.
- t_center_end
Relevant time span end.
- t_center_start
Relevant time span start.
DecayingSplineSet
- class disstans.models.DecayingSplineSet(degree, t_center_start, list_scales, list_num_splines, time_unit='D', splineclass=<class 'disstans.models.ISpline'>, regularize=True, **model_kw_args)[source]
Subclasses
BaseSplineSet
.Contains a list of splines that cover a one-sided timespan, sharing a common degree, but with different center times and scales.
The set is constructed from a starting center time
t_center_start
and then a list of spline scales with associated number of splines. The splines are repeated by the spacing in positive time. Note that if you want to make the spline set start at the center time, you need to specifyt_start
manually.This class defaults to setting the spacing equal to the scale.
- Parameters
degree (
int
) – Degree of the splines to be created.t_center_start (
str
|Timestamp
) – First center time of the spline set.list_scales (
list
[float
]) – List of scales to use for each of the sub-splines.list_num_splines (
int
|list
[int
]) – Number of splines to create for each scale.time_unit (
str
, default:'D'
) – Time unit of scale, spacing and model parameters.splineclass (
BSpline
|ISpline
, default:<class 'disstans.models.ISpline'>
) – Model class to use for the splines.
See
Model
for attribute descriptions and more keyword arguments.- degree
Degree of the splines.
- list_num_splines
List of number of splines for each scale.
- list_scales
List of scales of each of the sub-splines.
- splineclass
Class of the splines contained.
- t_center_start
Relevant time span start.
AmpPhModulatedSinusoid
- class disstans.models.AmpPhModulatedSinusoid(period, degree, num_bases, t_start, t_end, t_reference=None, time_unit='D', obs_scale=2.0, regularize=True, **model_kw_args)[source]
Subclasses
Model
.This model defines a periodic sinusoid signal with a nominal frequency that allows the amplitude and phase (and therefore instantaneous frequency) to vary. This is accomplished by enabling the \(a\) and \(b\) parameters of the
Sinusoid
model to be time-varying. The functional form of these parameters is a full B-Spline basis set, defined bybasis_element()
(not aSplineSet
of purely cardinal splines).- Parameters
period (
float
) – Nominal period length \(T\) intime_unit
units.degree (
int
) – Degree \(p\) of the B-spline to be used.num_bases (
int
) – Number of basis functions in the B-Spline. Needs to be at least2
.obs_scale (
float
, default:2.0
) – Determines how many factors of the average scale should be sampled by thetimevector
input toget_mapping()
to accept an individual B-spline as observable.
See also
Sinusoid
For the definition of the functional form of the sinusoid.
- degree
Degree \(p\) of the B-Spline.
- get_inst_amplitude_phase(num_points=1000)[source]
Calculate the instantaenous (time-varying) amplitude and phase of the sinusoid over its entire fitted domain.
- Parameters
num_points (
int
, default:1000
) – Number of points to use in the discretized, normalized phase vector used in the evaluation of the basis functions. For plotting purposes, this value should be the length of the timeseries to which this model was fitted.- Return type
- Returns
amplitude – Amplitude timeseries for each point (rows) and component (columns).
phase – Phase timeseries in radians (with the same shape as
amplitude
).
- get_mapping_single(timevector)[source]
Calculate the mapping factors at times \(t\) as
\[\left( h_j (t) \cos \left( \omega t \right), h_j (t) \sin \left( \omega t \right) \right)\]where \(h_j\) are envelopes based on B-Splines calculated by
BSpline
, and \(\omega\) is the period.See Appendix A.2.4 in [koehne23] for more details.
- Parameters
timevector (
Series
|DatetimeIndex
) –Series
ofTimestamp
or alternatively aDatetimeIndex
containing the timestamps of each observation.- Return type
- Returns
Coefficients of the mapping matrix.
- num_bases
Number of basis functons in the B-Spline
- observability_scale
Observability scale factor.
- order
Order \(n=p+1\) of the B-Splines.
- period
Nominal period of the sinusoid.