GluonTS - Probabilistic Time Series Modeling in Python

GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (incubating).

GluonTS provides utilities for loading and iterating over time series datasets, state-of-the-art models ready to be trained, and building blocks to define your own models and quickly experiment with different solutions.

Installation

GluonTS requires Python 3.6, and the easiest way to install it is via pip:

pip install --upgrade mxnet~=1.7 gluonts

Dockerfiles

Dockerfiles compatible with Amazon Sagemaker can be found in the examples/dockerfiles folder.

Quick start guide

This simple example illustrates how to train a model from GluonTS on some data, and then use it to make predictions. As a first step, we need to collect some data: in this example we will use the volume of tweets mentioning the AMZN ticker symbol.

import pandas as pd
url = "https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv"
df = pd.read_csv(url, header=0, index_col=0)

The first 100 data points look as follows:

import matplotlib.pyplot as plt
df[:100].plot(linewidth=2)
plt.grid(which='both')
plt.show()

[Plot of the first 100 data points]

We can now prepare a training dataset for our model to train on. Datasets in GluonTS are essentially iterable collections of dictionaries: each dictionary represents a time series, possibly with associated features. For this example, we only have one entry, with the "start" field giving the timestamp of the first data point and the "target" field containing the time series data. For training, we will use data up to midnight on April 5th, 2015.

from gluonts.dataset.common import ListDataset
training_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-05 00:00:00"]}],
    freq = "5min"
)
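Conceptually, nothing GluonTS-specific is needed to understand this structure: a dataset is just an iterable of dictionaries. A standalone sketch of the same layout (synthetic values, timestamps made up for illustration):

```python
import numpy as np
import pandas as pd

# A GluonTS-style dataset entry is a plain dict: "start" marks the first
# timestamp, "target" holds the observed values (synthetic here).
dataset = [
    {
        "start": pd.Timestamp("2015-02-26 21:42:53"),
        "target": np.random.rand(500).astype(np.float32),
    }
]

for entry in dataset:
    # every entry must at least carry these two fields
    assert {"start", "target"} <= entry.keys()
```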

A forecasting model in GluonTS is a predictor object. One way of obtaining predictors is by training a corresponding estimator. Instantiating an estimator requires specifying the frequency of the time series that it will handle, as well as the number of time steps to predict. In our example we're using 5-minute data, so freq="5min", and we will train a model to predict the next hour, so prediction_length=12. We also specify some minimal training options.

from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.trainer import Trainer

estimator = DeepAREstimator(freq="5min", prediction_length=12, trainer=Trainer(epochs=10))
predictor = estimator.train(training_data=training_data)

During training, useful information about the progress will be displayed. To get a full overview of the available options, please refer to the documentation of DeepAREstimator (or other estimators) and Trainer.

We're now ready to make predictions: we will forecast the hour following midnight on April 15th, 2015.

test_data = ListDataset(
    [{"start": df.index[0], "target": df.value[:"2015-04-15 00:00:00"]}],
    freq = "5min"
)

from gluonts.dataset.util import to_pandas

for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    to_pandas(test_entry)[-60:].plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
plt.grid(which='both')

[Forecast plot]

Note that the forecast is displayed in terms of a probability distribution: the shaded areas represent the 50% and 90% prediction intervals, respectively, centered around the median (dark green line).
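Sample-based forecasts boil down to an array of sample paths, and the plotted intervals are empirical quantiles over those paths. A minimal numpy sketch of the idea (standalone, not using GluonTS's Forecast objects):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 simulated sample paths for a 12-step forecast horizon
samples = rng.normal(loc=100.0, scale=10.0, size=(200, 12))

median = np.quantile(samples, 0.5, axis=0)           # the dark center line
lo, hi = np.quantile(samples, [0.05, 0.95], axis=0)  # bounds of the 90% interval
```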

Further examples

The following are good entry-points to understand how to use many features of GluonTS:

The following modules illustrate how custom models can be implemented:

Contributing

If you wish to contribute to the project, please refer to our contribution guidelines.

Citing

If you use GluonTS in a scientific publication, we encourage you to add the following references to the related papers:

@article{gluonts_jmlr,
  author  = {Alexander Alexandrov and Konstantinos Benidis and Michael Bohlke-Schneider
    and Valentin Flunkert and Jan Gasthaus and Tim Januschowski and Danielle C. Maddix
    and Syama Rangapuram and David Salinas and Jasper Schulz and Lorenzo Stella and
    Ali Caner Türkmen and Yuyang Wang},
  title   = {{GluonTS: Probabilistic and Neural Time Series Modeling in Python}},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {116},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/19-820.html}
}
@article{gluonts_arxiv,
  author  = {Alexandrov, A. and Benidis, K. and Bohlke-Schneider, M. and
    Flunkert, V. and Gasthaus, J. and Januschowski, T. and Maddix, D. C.
    and Rangapuram, S. and Salinas, D. and Schulz, J. and Stella, L. and
    Türkmen, A. C. and Wang, Y.},
  title   = {{GluonTS: Probabilistic Time Series Modeling in Python}},
  journal = {arXiv preprint arXiv:1906.05264},
  year    = {2019}
}

Video

Further Reading

Overview tutorials

Introductory material

Owner: Amazon Web Services - Labs (AWS Labs)
Comments
  • Remove mandatory `freq` attribute of `Predictor`.

    Issue #, if available:

    Description of changes:

    Follow-up changes to #1997

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

  • Implemented model iteration averaging to reduce model variance

    Issue #, if available:

    Description of changes:

    1. In model_iteration_averaging.py, implemented model averaging across iterations during training instead of epochs after training
    2. Implemented 3 different averaging triggers: NTA (NTA_V1 is the ICLR version: https://openreview.net/pdf?id=SyyGPP0TZ, NTA_V2 is the arxiv version: https://arxiv.org/pdf/1708.02182.pdf), and Alpha Suffix (https://arxiv.org/pdf/1109.5647.pdf)
    3. Integrated both epoch averaging and iteration averaging in Trainer (mx/trainer/_base.py)
    4. Wrote test in test/trainer/test_model_iteration_averaging.py

    The overall goal is to reduce the model variance. We test iteration averaging on DeepAR anomaly detection (examples/anomaly_detection.py, electricity data). We train the model with 20 different random seeds, and report the variance on the same batch of target sequences (take the variance at each timestamp, then average over the entire sequence and all samples). The results are as follows:

    |                 | n or alpha | var     | var/mean | std     | std/mean  | RMSE    |
    |-----------------|------------|---------|----------|---------|-----------|---------|
    | SelectNBestMean | 1          | 9552.24 | 0.508395 | 22.5279 | 0.0318269 | 414.924 |
    | SelectNBestMean | 5          | 8236.13 | 0.41966  | 19.9947 | 0.0253164 | 411.92  |
    | NTA_V1          | 5          | 5888.36 | 0.387781 | 16.7624 | 0.0253107 | 412.792 |
    | NTA_V2          | 5          | 6422.11 | 0.394004 | 17.7947 | 0.0237186 | 416.328 |
    | Alpha_Suffix    | 0.2        | 5877.92 | 0.384664 | 16.6868 | 0.030484  | 408.711 |
    | Alpha_Suffix    | 0.4        | 5814.86 | 0.378298 | 16.6081 | 0.0290987 | 409.952 |

    Although we haven't tuned the hyperparameters, we've already obtained smaller variance and better RMSE.
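The core idea of iteration averaging can be sketched in a few lines (illustrative only, not the PR's `model_iteration_averaging.py` API): keep a running mean of the parameter vector over training iterations and predict with the averaged parameters to reduce variance.

```python
import numpy as np

# Minimal sketch of iteration averaging: maintain an incremental mean of
# the parameters seen at each training iteration.
class IterationAverager:
    def __init__(self):
        self.avg = None
        self.n = 0

    def update(self, params):
        self.n += 1
        if self.avg is None:
            self.avg = params.astype(float).copy()
        else:
            self.avg += (params - self.avg) / self.n  # incremental mean update

averager = IterationAverager()
params = np.zeros(3)
for _ in range(4):
    params = params + 1.0  # stand-in for one SGD update
    averager.update(params)
# averaged parameters are the mean of [1, 2, 3, 4] = 2.5 per coordinate
```

The NTA and alpha-suffix triggers referenced above differ only in *which* iterations enter this average, not in the averaging itself.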

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

  • Predictions are way too high when modeling an intermittent count series with DeepAR and NegBin distribution?

    I'm trying to model a simulated series of weekly seasonal intermittent sales, with values between 0 and 4. I generated 5 years of simulated data:

    [Screenshot: plot of the simulated series]

    I trained a DeepAR model with the output distribution set to Negative Binomial (all other settings were the default settings), on 3 years, and generated predictions for the next two. I got the following results (plotting the [70.0, 80.0, 95.0] predictions intervals):

    [Screenshot: forecast with the [70.0, 80.0, 95.0] prediction intervals]

    Increasing the number of training epochs doesn't change anything: the loss falls to its lowest value around the 8th to 10th epoch and hovers more or less around there, whether I train for 10 or 100 epochs. I thought training on 3 years and testing on 2 might be too ambitious, so I tried a 4y/1y split instead, and the results got much worse - and downright strange - this time with values climbing into the 100s, even though the largest historical value the series ever reaches is 4 (I'm using the same input series, but it seems flat now because the scale is completely skewed by how large the predictions are):

    [Screenshot: forecast with predicted values climbing into the 100s]

    I'm wondering if I am doing anything wrong? Are there any special settings for DeepAR when applied to intermittent series?

    For comparison, the DeepAREstimator worked pretty well out of the box for more traditional series (using Student's distribution), for example:

    [Screenshot: forecast for a more traditional series]

    Details:

    Train data: [{'start': Timestamp('2014-01-05 00:00:00', freq='W-SUN'), 'target': array([1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 2., 0., 0., 1., 2., 2., 1., 4., 1., 2., 1., 0., 0., 2., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 2., 1., 2., 0., 1., 1., 2., 3., 2., 2., 1., 1., 3., 4., 1., 1., 0., 0., 3., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 2., 1., 1., 0., 1., 0., 1., 2., 2., 1., 2., 3., 3., 1., 2., 2., 0., 0., 2., 0., 3., 0., 1., 2., 0., 1., 1.], dtype=float32), 'source': SourceContext(source='list_data', row=1)}]

    Test data: {'start': Timestamp('2017-01-08 00:00:00', freq='W-SUN'), 'target': array([2., 1., 2., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 1., 0., 1., 2., 3., 1., 0., 3., 2., 1., 0., 0., 2., 2., 2., 1., 0., 2., 0., 2., 2., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 2., 0., 0., 4., 1., 2., 2., 1., 3., 1., 2., 1., 2., 1., 2., 3., 3., 1., 2., 0.], dtype=float32), 'source': SourceContext(source='list_data', row=1)}

    Estimator used:

    estimator = DeepAREstimator(freq="W", prediction_length=105, trainer=Trainer(epochs=10), distr_output=NegativeBinomialOutput())
    predictor = estimator.train(training_data=training_data)

  • PyTorch implementation of DeepAR

    Work in progress, open for comments.

    This ports the PyTorch implementation of DeepAR from PyTorchTS (cc @kashif), with some changes:

    • The estimator class was slightly refactored, and in particular the way data loaders are set up is more in line with other estimators (but I want to try out a few things here, this is giving me some thoughts)
    • No specific "trainer" class was implemented, and instead the estimator relies on PyTorch Lightning for this.
    • The network is now down to a single class implementing both loss computation and sample paths prediction, following torch's .training convention
    • A thin extension to the network provides the interface used by Lightning

    A few surrounding, related changes are also included.

    Some open questions:

    1. should the dtype and device be specified at constructor time for the estimator? Or is it something we want to pass to the train method?
    2. the base estimator class is really PyTorch Lightning oriented: should it be called PyTorchLightningEstimator?
    3. we would now have gluonts.model containing existing models (mxnet based) and gluonts.torch.model containing this one; should the mxnet ones be moved to gluonts.mx.model for the sake of symmetry?

    TODOs (partial list, probably):

    • [x] cover also validation data in tests
    • [x] remove the input_size parameter from the estimator (this should probably be inferred from the other ones)
    • [x] re-include the option to pseudo-shuffle batches at training time
    • [x] improve tests (also serde and so on)
    • [x] open issue on the left-over features of the model, and make it release-blocking
    • [x] run experiments to check the model accuracy

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

  • potential bottleneck in training

    Description

    I profiled my training, which was taking too long, and here is the part that I believe is taking the longest:

    Profile stats for: run_training_epoch
             309984 function calls (302234 primitive calls) in 82.921 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000   82.921   82.921 base.py:168(run)
        520/2    0.000    0.000   82.921   41.460 {built-in method builtins.next}
            2    0.000    0.000   82.792   41.396 fetching.py:271(_fetch_next_batch)
            4    0.000    0.000   82.792   20.698 apply_func.py:73(apply_to_collection)
            2    0.000    0.000   82.792   41.396 supporters.py:547(__next__)
            2    0.000    0.000   82.792   41.396 supporters.py:555(request_next_batch)
            2    0.000    0.000   82.792   41.396 itertools.py:174(__iter__)
            2    0.000    0.000   82.792   41.396 dataloader.py:639(__next__)
            2    0.001    0.000   82.792   41.396 dataloader.py:680(_next_data)
            2    0.001    0.001   82.791   41.395 fetch.py:24(fetch)
          512    0.000    0.000   82.777    0.162 util.py:140(__iter__)
      992/512    0.001    0.000   82.777    0.162 _base.py:102(__iter__)
     5312/512    0.018    0.000   82.776    0.162 _base.py:121(__call__)
          512    0.005    0.000   82.773    0.162 _base.py:174(__call__)
          479    0.001    0.000   81.289    0.170 itertools.py:68(__iter__)
          479    0.013    0.000   78.146    0.163 feature.py:354(map_transform)
          479    0.014    0.000   76.209    0.159 feature.py:367(<listcomp>)
         2395    0.026    0.000   73.799    0.031 extension.py:67(fget)
        14851    0.015    0.000   73.575    0.005 {built-in method builtins.getattr}
         2395    0.020    0.000   73.559    0.031 period.py:97(f)
         2395   73.505    0.031   73.505    0.031 {pandas._libs.tslibs.period.get_period_field_arr}
            1    0.000    0.000   42.221   42.221 training_epoch_loop.py:157(advance)
    ...
    

    To reproduce, kindly train the PyTorch DeepAR estimator with:

    ...
    trainer_kwargs=dict(..., profiler="advanced"),
    ...
    

    and train with num_workers=0
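The hot spot in the profile above is pandas' period-field extraction (`get_period_field_arr`) being invoked on every batch. One general mitigation (a sketch of the idea, not a GluonTS patch) is to compute integer time features for the whole index once, up front:

```python
import numpy as np
import pandas as pd

# Build the period index once and extract integer time features in a single
# vectorized pass, rather than touching pandas period fields per batch.
idx = pd.period_range("2021-01-01", periods=1000, freq="D")
time_features = np.stack([
    idx.dayofweek.to_numpy(),  # 0 = Monday .. 6 = Sunday
    idx.month.to_numpy(),      # 1 .. 12
])
```

Downstream code can then slice `time_features` by position instead of re-deriving fields from `pd.Period` objects.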

  • Add `TimeLimitCallback` to `mx/trainer` callbacks.

    Issue #, if available:

    Description of changes: Add TimeLimitCallback so that users can set a time limit on the training process.
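A time-limit callback can be sketched as follows. The hook names here (on_train_start / on_epoch_end) are assumed for illustration and are not necessarily the actual gluonts.mx.trainer callback interface:

```python
import time

# Hypothetical sketch of a time-limit callback: training stops once the
# elapsed wall-clock time exceeds the configured limit.
class TimeLimitCallback:
    def __init__(self, limit_seconds: float):
        self.limit_seconds = limit_seconds
        self.start_time = None

    def on_train_start(self) -> None:
        self.start_time = time.time()

    def on_epoch_end(self) -> bool:
        # Returning False signals the trainer to stop training early.
        return (time.time() - self.start_time) < self.limit_seconds
```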

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

  • Predict for future date without target value

    I need to predict for future dates, with some dates missing between the training data and the date I want to predict, so I won't have any target values for those dates. When I use NaN for the target series, my forecast is mostly 0.

  • Use `pd.Period` instead of `pd.Timestamp`.

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

  • Multiprocessing data loader.

    Issue, if available: With a multiprocessing data loader we should overcome data loading bottlenecks. Will fix issue https://github.com/awslabs/gluon-ts/issues/682.

    Description of changes:

    • Datasets use the class attributes of MPWorkerInfo to get information about their multiprocessing environment.
    • Datasets are replicated among workers (only the object reference though, not the physical dataset); this happens exactly once at the beginning of training
    • Datasets are not cached by default (caching not implemented so far)
    • Data loading can now be done in a multiprocessing fashion by specifying the number of workers; this works for the training and validation sets (for inference there is some bug for now, but that has the least impact on performance of all datasets)
    • Parallelisation for datasets is modulo based, i.e. every num_workers-th time series will be assigned to the corresponding worker. However, this does not guarantee that the batches will always be sampled from equidistant locations for training, since some workers could potentially be slower or faster
    • The data loaders return batches of transformed samples of batch_size in the requested context. The transformation is done according to the provided transformation.
    • There is no threading support (wouldn't make sense since we are also doing computation heavy transformations), there is no memory pinning support (not necessary since we load the batches into the right context right away)
    • Which exact batches and samples one gets is nondeterministic if num_workers > 0
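The modulo-based assignment described above can be sketched in a few lines (illustrative only, not the PR's implementation):

```python
# Worker w out of W handles every W-th time series of the dataset.
def shard(dataset, worker_id, num_workers):
    return [ts for i, ts in enumerate(dataset) if i % num_workers == worker_id]

series = [f"ts{i}" for i in range(10)]
# with 3 workers, worker 0 gets the series at indices 0, 3, 6, 9
```

Every series lands on exactly one worker, so the shards form a partition of the dataset.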

    Future extensibility:

    • the main functions to modify will be the batching function and the stacking function; the transformation can already be replaced by any that produces a list of samples when applied to a dataset
    • any dataset that makes use of the MPWorkerInfo class can be effectively parallelized

    Missing functionality:

    • Shuffling (beyond a single batch)
    • Dataset caching
    • Correct documentation

    Current bugs:

    • No mp support for windows due to pickling error.
    • No mp support for InferenceDataLoader due to pickling error.

    Possible improvements

    • Create named tuple for all the different data the worker processes use
    • Only pass subset of dataset to worker
    • Switch away from Pool to something that allows for more fine-grained control, like manually creating Processes as seen in Pytorch's data-loader or make use of libraries like Ray using Actors

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

  • Multivariate time series forecasting question

    My apologies for the ignorant questions in advance; while I’m not necessarily new to deep learning, I’m fairly new to time series forecasting, especially when using deep learning techniques for it.

    Since gluon-ts makes use of DL-based approaches, dealing with non-stationarity in training datasets is not necessary, unlike when using AR/MA and VAR based models, correct? This appears to be outlined here.

    Also, I am working with a multivariate time series dataset in which the target/dependent variable is related and/or dependent on other features/independent variables. So, while I’m only trying to predict one target variable, the relationship between this target variable and the other features is important; consequently, this leads to two questions.

    First, since the relationship between the target variable and other features is important, are the most applicable models deepvar and gpvar or will other models in gluon-ts work and I’m just thinking too much in terms of classical time series forecasting?

    Second, if I’m using deepvar or gpvar, I’m assuming that when making the dataset, the target should be a vector of vectors which include my target variable and the other features, right? However, if I’m thinking too much in terms of classical time series forecasting, target should be a vector of the target variable and I should store the other features as vectors of vectors in either dynamic_feat or cat, right?
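The two layouts being asked about can be made concrete with a small sketch (field names follow GluonTS's FieldName string conventions; the shapes are the essential point):

```python
import numpy as np

T = 100  # number of time steps

# 1) Multivariate target (deepvar/gpvar style): target is 2-D,
#    shape (num_series, T) - the target variable and related features
#    are modeled jointly.
multivariate_entry = {
    "start": "2020-01-01",
    "target": np.random.rand(3, T),
}

# 2) Univariate target plus covariates as dynamic real features
#    (DeepAR style): target is 1-D, extra series go in feat_dynamic_real.
univariate_entry = {
    "start": "2020-01-01",
    "target": np.random.rand(T),                # shape (T,)
    "feat_dynamic_real": np.random.rand(2, T),  # shape (num_features, T)
}
```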

    Again, I’m sorry for my ignorance. Thanks in advance for any assistance you provide.

  • Introducing baseline models. Includes new Naive 2 model, new OWA metric, and a lot of refactoring.

    Added the Naïve 2 model, as well as the OWA (overall weighted average) metric as described in the M4 competition, to GluonTS.

    The naive seasonal model has been moved to m_competitions as well, since it shares so many similarities with the Naïve 2 model and is an integral part of the M competitions. Both models consist only of a predictor.

    For the Naïve 2 model I followed the R implementation that can be found here. However, I optimized performance in a few places (no model impact). I also added a potential TODO optimization with marginal model impact.

    There is an additional requirement: statsmodels~=0.11

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

  • Breaking: Move MXNet-based models to `mx.model`.

    Closes: #2117

    Description of changes: move the original models from gluonts.model to gluonts.mx.model, and use from gluonts.mx import xxEstimator

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

  • Manual api docs

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

  • Move MXNet based models from `gluonts.model` to `gluonts.mx`.

    When we started, we put all models into gluonts.model. However, when adding support for PyTorch, we started to put these models into gluonts.torch.model, since they were re-implementations of existing models.

    Ideally, we replicate this setup for mxnet based models, and move all these models to gluonts.mx.model.

    Further we can then import models directly from mx:

    from gluonts.mx import DeepAREstimator
    

    For v0.11 we add a deprecation warning and remove the warning in v0.12.

  • using pandasdataset on dataset repositories

    Issue #, if available:

    Description of changes: Using PandasDataset to handle dataset repositories.

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    Please tag this pr with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

  • make_evaluation_predictions() not producing proper forecasts when DeepAR is trained with custom Tweedie distribution output

    Hi,

    I have trained a DeepAR model using a custom Tweedie output distribution, but I am not getting any output when I try to get the predictions using make_evaluation_predictions(). I am testing the code on the M5 dataset.

    Could someone kindly help me with the same?

    To Reproduce

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import json
    import os
    from tqdm import tqdm
    from pathlib import Path
    from gluonts.dataset.common import load_datasets, ListDataset
    from gluonts.dataset.field_names import FieldName
    
    from gluonts.torch.model.deepar import DeepARModel
    from gluonts.torch.model.deepar.estimator import DeepAREstimator
    from gluonts.torch.modules.distribution_output import NegativeBinomialOutput
    import pytorch_lightning as pl
    import torch
    import numpy as np
    from torch.distributions.distribution import Distribution
    from gluonts.torch.modules.distribution_output import DistributionOutput
    import torch.nn.functional as F
    from typing import Callable, Dict, Optional, Tuple, List
    from gluonts.evaluation.backtest import make_evaluation_predictions
    
    single_prediction_length = 28
    submission_prediction_length = single_prediction_length * 2
    m5_input_path="./Data/m5-forecasting-accuracy"
    submission=False
    
    if submission:
        prediction_length = submission_prediction_length
    else:
        prediction_length = single_prediction_length
    
    calendar = pd.read_csv(m5_input_path+'/calendar.csv')
    sales_train_validation = pd.read_csv(m5_input_path+'/sales_train_validation.csv')
    sample_submission = pd.read_csv(m5_input_path+'/sample_submission.csv')
    sell_prices = pd.read_csv(m5_input_path+'/sell_prices.csv')
    
    cal_features = calendar.drop(
        ['date', 'wm_yr_wk', 'weekday', 'wday', 'month', 'year', 'event_name_1', 'event_name_2', 'd'], 
        axis=1
    )
    cal_features['event_type_1'] = cal_features['event_type_1'].apply(lambda x: 0 if str(x)=="nan" else 1)
    cal_features['event_type_2'] = cal_features['event_type_2'].apply(lambda x: 0 if str(x)=="nan" else 1)
    
    test_cal_features = cal_features.values.T
    if submission:
        train_cal_features = test_cal_features[:,:-submission_prediction_length]
    else:
        train_cal_features = test_cal_features[:,:-submission_prediction_length-single_prediction_length]
        test_cal_features = test_cal_features[:,:-submission_prediction_length]
    
    test_cal_features_list = [test_cal_features] * len(sales_train_validation)
    train_cal_features_list = [train_cal_features] * len(sales_train_validation)
    
    state_ids = sales_train_validation["state_id"].astype('category').cat.codes.values
    state_ids_un , state_ids_counts = np.unique(state_ids, return_counts=True)
    
    store_ids = sales_train_validation["store_id"].astype('category').cat.codes.values
    store_ids_un , store_ids_counts = np.unique(store_ids, return_counts=True)
    
    cat_ids = sales_train_validation["cat_id"].astype('category').cat.codes.values
    cat_ids_un , cat_ids_counts = np.unique(cat_ids, return_counts=True)
    
    dept_ids = sales_train_validation["dept_id"].astype('category').cat.codes.values
    dept_ids_un , dept_ids_counts = np.unique(dept_ids, return_counts=True)
    
    item_ids = sales_train_validation["item_id"].astype('category').cat.codes.values
    item_ids_un , item_ids_counts = np.unique(item_ids, return_counts=True)
    
    stat_cat_list = [item_ids, dept_ids, cat_ids, store_ids, state_ids]
    
    stat_cat = np.concatenate(stat_cat_list)
    stat_cat = stat_cat.reshape(len(stat_cat_list), len(item_ids)).T
    
    stat_cat_cardinalities = [len(item_ids_un), len(dept_ids_un), len(cat_ids_un), len(store_ids_un), len(state_ids_un)]
    
    
    
    train_df = sales_train_validation.drop(["id","item_id","dept_id","cat_id","store_id","state_id"], axis=1)
    train_target_values = train_df.values
    
    if submission == True:
        test_target_values = [np.append(ts, np.ones(submission_prediction_length) * np.nan) for ts in train_df.values]
    else:
        test_target_values = train_target_values.copy()
        train_target_values = [ts[:-single_prediction_length] for ts in train_df.values]
    
    m5_dates = [pd.Timestamp("2011-01-29", freq='1D') for _ in range(len(sales_train_validation))]
    
    train_ds = ListDataset([
        {
            FieldName.TARGET: target,
            FieldName.START: start,
            FieldName.FEAT_DYNAMIC_REAL: fdr,
            FieldName.FEAT_STATIC_CAT: fsc
        }
        for (target, start, fdr, fsc) in zip(train_target_values,
                                             m5_dates,
                                             train_cal_features_list,
                                             stat_cat)
    ], freq="D")
    
    test_ds = ListDataset([
        {
            FieldName.TARGET: target,
            FieldName.START: start,
            FieldName.FEAT_DYNAMIC_REAL: fdr,
            FieldName.FEAT_STATIC_CAT: fsc
        }
        for (target, start, fdr, fsc) in zip(test_target_values,
                                             m5_dates,
                                             test_cal_features_list,
                                             stat_cat)
    ], freq="D")
    
    def est_lambda(mu, p):
        return mu ** (2 - p) / (2 - p)
    
    def est_alpha(p):
        return (2 - p) / (p - 1)    
    
    def est_beta(mu, p):
        return mu ** (1 - p) / (p - 1)
    
    
    class Tweedie(Distribution):
        r"""
        Creates a Tweedie distribution, i.e. distribution
        
        Args:    
            log_mu (Tensor): log(mean)
            rho (Tensor): tweedie_variance_power (1 ~ 2)
        """
    
        def __init__(self, log_mu, rho, validate_args=None):
            self.log_mu = log_mu
            self.rho = rho
    
            batch_shape = log_mu.size()
            super(Tweedie, self).__init__(batch_shape, validate_args=validate_args)
    
        @property
        def mean(self):
            return torch.exp(self.log_mu)
    
        @property
        def variance(self):
            return torch.ones_like(self.log_mu) #TODO need to be assigned
    
        def sample(self, sample_shape=torch.Size()):
            shape = self._extended_shape(sample_shape)
    
            mu = self.mean
            p = self.rho
            phi = 1 #TODO
    
            
            rate = est_lambda(mu, p) / phi     #rate for poisson
            alpha = est_alpha(p)             #alpha for Gamma distribution
            beta = est_beta(mu, p) / phi     #beta for Gamma distribution
    
            N = torch.poisson(rate)
    
            gamma = torch.distributions.gamma.Gamma(N*alpha, beta)
            samples = gamma.sample()
            samples[N==0] = 0
    
            return samples
    
        def log_prob(self, y_true):
            rho = self.rho
            y_pred = self.log_mu
    
            a = y_true * torch.exp((1 - rho) * y_pred) / (1 - rho)
            b = torch.exp((2 - rho) * y_pred) / (2 - rho)
    
            return a - b
    
        @property
        def args(self) -> List:
            return [self.log_mu, self.rho]
    
    class TweedieOutput(DistributionOutput):
        args_dim: Dict[str, int] = {"log_mu": 1, "rho": 1} #, "dispersion": 1} TODO: add dispersion
    
    
        def domain_map(self, log_mu, rho):
            rho = 1.5 * torch.ones_like(log_mu)
    
            return log_mu.squeeze(-1), rho.squeeze(-1)
    
        def distribution(
            self, distr_args, scale: Optional[torch.Tensor] = None
        ) -> Tweedie:
            log_mu, rho = distr_args
            return Tweedie(log_mu, rho)
    
        @property
        def event_shape(self) -> Tuple:
            return ()
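Dropping the normalizing term in `log_prob` is still sufficient for fitting the mean, because the remaining quasi-log-likelihood is maximized exactly at mu = y_true. A quick pure-Python check of that property (rho = 1.5, matching `domain_map` above):

```python
def tweedie_quasi_ll(y, mu, rho=1.5):
    # Same expression as Tweedie.log_prob above, with dispersion phi = 1
    # and the y-dependent normalizing term dropped.
    a = y * mu ** (1 - rho) / (1 - rho)
    b = mu ** (2 - rho) / (2 - rho)
    return a - b

y = 3.0
candidates = [1.0, 2.0, 3.0, 4.0, 5.0]
best = max(candidates, key=lambda mu: tweedie_quasi_ll(y, mu))
print(best)  # → 3.0: the quasi-likelihood peaks at mu == y
```

Analytically, d/dmu of the expression is mu**(-rho) * (y - mu), which vanishes at mu = y.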
    
    
    deepar = DeepAREstimator(
        prediction_length=prediction_length,
        freq="D",
        distr_output=TweedieOutput(),
        num_feat_dynamic_real=5,
        num_feat_static_cat=5,
        cardinality=stat_cat_cardinalities,
        batch_size=32,
        trainer_kwargs={
            "auto_lr_find": True,
            "max_epochs": 20,
        },
    )

    model = deepar.train(train_ds)
    
    forecast_it, ts_it = make_evaluation_predictions(
        dataset=test_ds,
        predictor=model,
        num_samples=100
    )
    
    print("Obtaining time series conditioning values ...")
    tss = list(tqdm(ts_it, total=len(test_ds)))
    print("Obtaining time series predictions ...")
    forecasts = list(tqdm(forecast_it, total=len(test_ds)))
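`make_evaluation_predictions` yields sample-based forecasts: each forecast carries `num_samples` sampled trajectories of length `prediction_length`, and quantiles are computed per time step across those samples. A minimal pure-Python illustration of that reduction, using synthetic trajectories rather than real forecast objects:

```python
# Synthetic "forecast samples": num_samples trajectories x 4 forecast steps;
# trajectory i takes the values i, i+1, i+2, i+3.
samples = [[float(i + t) for t in range(4)] for i in range(100)]

def quantile_per_step(samples, q):
    """Empirical q-quantile at each forecast step (nearest-rank)."""
    num_samples = len(samples)
    idx = min(int(q * num_samples), num_samples - 1)
    out = []
    for t in range(len(samples[0])):
        step_values = sorted(s[t] for s in samples)
        out.append(step_values[idx])
    return out

median = quantile_per_step(samples, 0.5)
print(median)  # → [50.0, 51.0, 52.0, 53.0]
```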
    
    

    Error message or code output

    ValueError                                Traceback (most recent call last)
    /Users/poulamisarkar/Documents/TUM/sem2/seminar/glutons /m5_gluonts_template.ipynb Cell 27' in <cell line: 12>()
         10 tss = list(tqdm(ts_it, total=len(test_ds)))
         11 print("Obtaining time series predictions ...")
    ---> 12 forecasts = list(tqdm(forecast_it, total=len(test_ds)))
    
    File ~/opt/anaconda3/envs/forcasting/lib/python3.9/site-packages/tqdm/std.py:1195, in tqdm.__iter__(self)
       1192 time = self._time
       1194 try:
    -> 1195     for obj in iterable:
       1196         yield obj
       1197         # Update and possibly print the progressbar.
       1198         # Note: does not call self.update(1) for speed optimisation.
    
    File ~/opt/anaconda3/envs/forcasting/lib/python3.9/site-packages/gluonts/torch/model/predictor.py:82, in PyTorchPredictor.predict(self, dataset, num_samples)
         79 self.prediction_net.eval()
         81 with torch.no_grad():
    ---> 82     yield from self.forecast_generator(
         83         inference_data_loader=inference_data_loader,
         84         prediction_net=self.prediction_net,
         85         input_names=self.input_names,
         86         freq=self.freq,
         87         output_transform=self.output_transform,
         88         num_samples=num_samples,
    ...
            [1.],
            ...,
            [1.],
            [1.],
            [2.]])
    

    Environment

    • Operating system: macOS
    • Python version: 3.9.12
    • GluonTS version: 0.9.4