Tutorial 1: The first synthetic station
=======================================

.. sidebar:: Download full script

   :download:`tutorial_1_simple_timeseries.py <../../scripts/tutorial_1_simple_timeseries.py>`

This tutorial will give the very first insight into the basic workings of DISSTANS.
We will have a look at the :class:`~disstans.timeseries.Timeseries`,
:class:`~disstans.models.Model`, and :class:`~disstans.station.Station` objects,
and show how standard Python, NumPy and Matplotlib procedures can be used directly
with DISSTANS.

.. contents:: Table of contents
    :local:

Building a Model collection
---------------------------

Let's create an empty synthetic station "TUT" located in Los Angeles using
:class:`~disstans.station.Station`:

.. doctest::

    >>> from disstans import Station
    >>> synth_stat = Station(name="TUT", location=(34.05, -118.25, 93.0))

Now, let's create a dictionary of :class:`~disstans.models.Model` objects which we would
like to use to create our synthetic timeseries. We'll start with a linear trend, an annual
sinusoid and some steps.

.. doctest::

    >>> import numpy as np
    >>> from disstans.models import Polynomial, Sinusoid, Step
    >>> mdl_secular = Polynomial(order=1, time_unit="D", t_reference="2000-01-01")
    >>> mdl_secular.read_parameters(np.array([-1, 5e-3]))
    >>> mdl_annual = Sinusoid(period=365.25, time_unit="D", t_reference="2000-01-01")
    >>> mdl_annual.read_parameters(np.array([0.3, 0]))
    >>> mdl_steps = Step(steptimes=["2000-03-01", "2000-10-10", "2000-10-15"])
    >>> mdl_steps.read_parameters(np.array([0.5, 0.2, -0.01]))

Let's combine the models in a dictionary for easy access later.

.. doctest::

    >>> collection = {"Secular": mdl_secular,
    ...               "Annual": mdl_annual,
    ...               "Steps": mdl_steps}

Next, we need the time span to evaluate our models over. We can use Pandas'
:func:`~pandas.date_range` for this very nicely.

.. doctest::

    >>> import pandas as pd
    >>> timevector = pd.date_range(start="2000-01-01", end="2000-12-31", freq="1D")

Now, we can evaluate the models, and sum them all up. :meth:`~disstans.models.Model.evaluate`
returns a dictionary and the modeled output can be found in the ``'fit'`` key:

.. doctest::

    >>> sum_models = np.zeros((timevector.size, 1))
    >>> for model_description, model in collection.items():
    ...     evaluated = model.evaluate(timevector)
    ...     sum_models += evaluated["fit"]

Creating Timeseries objects
---------------------------

A :class:`~disstans.timeseries.Timeseries` can now be created using the data we just
made up, and we can add it to our station:

.. doctest::

    >>> from disstans import Timeseries
    >>> synth_ts = Timeseries.from_array(timevector=timevector,
    ...                                  data=sum_models,
    ...                                  src="synthetic",
    ...                                  data_unit="m",
    ...                                  data_cols=["total"])
    >>> synth_stat["Data"] = synth_ts

.. note::

    ``synth_stat["Data"] = synth_ts`` is equivalent to
    ``synth_stat.add_timeseries("Data", synth_ts)``.

The ``data_unit``, ``data_cols`` and timeseries name are obviously a bit arbitrary
for synthetic data.

But oh no - we forgot to add noise! We can either

- go back to the beginning, change the ``sum_models`` NumPy array, recreate ``synth_ts``,
  and then replace the the timeseries saved in the station (replacing is just adding a new
  timeseries with an existing name),
- add the noise to the timeseries directly and replace the timeseries saved in the station, or
- just overwrite the timeseries' data.

They are all equivalent, but because the first doesn't show any new code, and the second
is literally just doing ``synth_ts = synth_ts + noise`` (where ``noise`` is a NumPy array
of the same shape as the timeseries), we're going to use the third way for the sake of
the tutorial:

.. doctest::

    >>> rng = np.random.default_rng(1)  # make this example reproducible
    >>> noise = rng.normal(scale=0.01, size=synth_stat["Data"].shape)
    >>> synth_stat["Data"].data += noise

Finally, let's have a look at the timeseries we created. DISSTANS provides a straightforward
interface to plot timeseries with Matplotlib::

    >>> import matplotlib.pyplot as plt
    >>> plt.plot(synth_stat["Data"].data)
    >>> plt.savefig("tutorial_1a.png")
    >>> plt.close()

This should give us an output like this:

.. image:: ../img/tutorial_1a.png

Fitting the models
------------------

We've successfully created a synthetic timeseries for a single station. Now, let's see
if a linear regression solver can estimate the model parameters we originally put
in. To this end, we must first associate our models with the timeseries at the station.

.. doctest::

    >>> for model_description, model in collection.items():
    ...     synth_stat.add_local_model("Data", model_description, model)

Under the hood, adding individual models will create a
:class:`~disstans.models.ModelCollection` object at the station, which keeps track
of all the different models. This collection object is necessary because once
we solve for the model parameters, there will be covariances between the different
models which are important to track for the uncertainty estimation.
The model collection object is accessed as ``synth_stat.models["Data"]``.

For the fitting, we first import a solver function from the :mod:`~disstans.solvers`
module, and then call it to give us a :class:`~disstans.solvers.Solution` object of
the fitted parameters.
We will start with basic, linear, non-regularized least-squares, which returns
a :class:`~disstans.solvers.Solution` object:

.. doctest::

    >>> from disstans.solvers import linear_regression
    >>> result = linear_regression(ts=synth_stat["Data"],
    ...                            models=synth_stat.models["Data"])

With the solver finished, we want to do multiple things for all models:

1. Give the models their best-fit parameters as returned by the solver using the
   :meth:`~disstans.models.ModelCollection.read_parameters` method.
2. Calculate the trajectory of each individual model given the best-fit parameters
   using the :meth:`~disstans.models.Model.evaluate` method.
3. Calculate the combined trajectory of all models together (i.e., the best fit to
   the timeseries) using the :meth:`~disstans.models.ModelCollection.evaluate` method.
   (One could also combine the timeseries produces in point 2 and sum them, but this
   will not work if we want to estimate the predicted fit covariance.)
4. Add the evaluated trajectories to the station as a "fit" using the
   :meth:`~disstans.station.Station.add_fit` method.

We can do this in the following way:

.. doctest::

    >>> # quick access to the model collection
    >>> stat_coll = synth_stat.models["Data"]
    >>> # give the model collection the best-fit parameters and covariances
    >>> stat_coll.read_parameters(result.parameters, result.covariances)
    >>> # evaluate each individual model and add as a fit
    >>> for model_description in stat_coll.model_names:
    ...     modeled = stat_coll[model_description].evaluate(timevector)
    ...     fit_ts = synth_stat.add_fit(ts_description="Data",
    ...                                 fit=modeled,
    ...                                 model_description=model_description)
    >>> # evaluate the entire model collection at once
    >>> modeled = stat_coll.evaluate(timevector)
    >>> fit_ts = synth_stat.add_fit(ts_description="Data", fit=modeled)

Not providing a model description to :meth:`~disstans.station.Station.add_fit`
implies that this is the fit of all models jointly.

.. note::

    Note that in this process, the model parameters we set in the beginning
    to each model object instance (e.g. ``mdl_secular``) will be
    overwritten. If we want to keep them, we can either create the models again
    without giving them parameters, or create a (deep) copy of the model dictionary.

Lastly, we want to make a timeseries object from the joint model fit for plotting purposes.
We can do that by directly accessing :attr:`~disstans.models.FitCollection.allfits`:

.. doctest::

    >>> synth_stat.add_timeseries(ts_description="Modeled",
    ...                           timeseries=synth_stat.fits["Data"].allfits,
    ...                           override_src="model", override_data_cols=synth_ts.data_cols)

.. note::

    It should be pointed out here that this is a lot of code that doesn't need to be
    repeated by the user every time, but is intended to illustrate what is happening
    under the hood. For ease of use, the :class:`~disstans.network.Network` class
    can do all of these things with its two methods :meth:`~disstans.network.Network.fit`
    and :meth:`~disstans.network.Network.evaluate`.

Plotting the fit and residuals
------------------------------

At this point, we first want to have a look at the fit to see if everything worked
before actually looking at the parameters themselves or residuals. The ``evaluate``
function already created a new timeseries object on the way, which we can now plot::

    >>> plt.plot(synth_stat["Data"].data, label="Data")
    >>> plt.plot(synth_stat["Modeled"].data, label="Modeled")
    >>> plt.legend()
    >>> plt.savefig("tutorial_1b.png")
    >>> plt.close()

As we can see, the model matches the data pretty well:

.. image:: ../img/tutorial_1b.png

Lastly, let's calculate the residual and some statistics on it. Timeseries objects
can directly be used in (basic) mathematical operations, with the output being a new
Timeseries object that can be added to the station.

.. doctest::

    >>> synth_stat["Residual"] = synth_stat["Data"] - synth_stat["Modeled"]
    >>> stats_dict = synth_stat.analyze_residuals(ts_description="Residual",
    ...                                           mean=True, std=True, verbose=True)
    TUT: Residual          Mean  Standard Deviation
    total-total    5.342115e-14            0.009065

And we can plot it like before::

    >>> plt.plot(synth_stat["Residual"].data)
    >>> plt.savefig("tutorial_1c.png")
    >>> plt.close()

.. image:: ../img/tutorial_1c.png

.. note::

    This is very simple plotting. The :class:`~disstans.network.Network` class provides
    some better plotting tools to use with larger networks or stations with more
    timeseries which we will explore later. This is again just intended to show how
    the data is accessible.