Greykite: A flexible, intuitive and fast forecasting library

Why Greykite?

The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

The Silverkite algorithm works well on most time series and is especially adept at those with changepoints in trend or seasonality, event/holiday effects, and temporal dependencies. Its forecasts are interpretable and therefore useful for trusted decision-making and insights.

The Greykite library provides a framework that makes it easy to develop a good forecast model, with exploratory data analysis, outlier/anomaly preprocessing, feature extraction and engineering, grid search, evaluation, benchmarking, and plotting. Other open source algorithms can be supported through Greykite’s interface to take advantage of this framework, as listed below.

For a demo, please see our quickstart.

Distinguishing Features

  • Flexible design
    • Provides time series regressors to capture trend, seasonality, holidays, changepoints, and autoregression, and lets you add your own.
    • Fits the forecast using a machine learning model of your choice.
  • Intuitive interface
    • Provides powerful plotting tools to explore seasonality, interactions, changepoints, etc.
    • Provides model templates (default parameters) that work well based on data characteristics and forecast requirements (e.g. daily long-term forecast).
    • Produces interpretable output, with model summary to examine individual regressors, and component plots to visually inspect the combined effect of related regressors.
  • Fast training and scoring
    • Facilitates interactive prototyping, grid search, and benchmarking. Grid search is useful for model selection and semi-automatic forecasting of multiple metrics.
  • Extensible framework
    • Exposes multiple forecast algorithms in the same interface, making it easy to try algorithms from different libraries and compare results.
    • The same pipeline provides preprocessing, cross-validation, backtest, forecast, and evaluation with any algorithm.

Algorithms currently supported within Greykite's modeling framework include Silverkite (the flagship algorithm) and Facebook Prophet.

Notable Components

Greykite offers components that could be used within other forecasting libraries or even outside the forecasting context.

  • ModelSummary() - R-like summaries of scikit-learn and statsmodels regression models.
  • ChangepointDetector() - changepoint detection based on adaptive lasso, with visualization.
  • SimpleSilverkiteForecast() - Silverkite algorithm with forecast_simple and predict methods.
  • SilverkiteForecast() - low-level interface to Silverkite algorithm with forecast and predict methods.

Usage Examples

You can obtain forecasts with only a few lines of code:

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

# df = ...  # your input timeseries!
metadata = MetadataParam(
    time_col="ts",     # time column in `df`
    value_col="y"      # value column in `df`
)
forecaster = Forecaster()  # creates forecasts and stores the result
forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        # uses the SILVERKITE model template parameters
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,         # 95% prediction intervals
        metadata_param=metadata
    )
)
# Access the result
forecaster.forecast_result
# ...

For a demo, please see our quickstart.

Setup and Installation

Greykite is available on PyPI and can be installed with pip:

pip install greykite

For more installation tips, see installation.

Documentation

Please find our full documentation here.

Learn More

Citation

Please cite Greykite in your publications if it helps your research:

@misc{reza2021greykite-github,
  author = {Reza Hosseini and
            Albert Chen and
            Kaixu Yang and
            Sayan Patra and
            Rachit Arora},
  title  = {Greykite: a flexible, intuitive and fast forecasting library},
  url    = {https://github.com/linkedin/greykite},
  year   = {2021}
}

License

Copyright (c) LinkedIn Corporation. All rights reserved. Licensed under the BSD 2-Clause License.

Comments
  • Why pin runtime dependencies so tightly?

    Hi,

    Looking at the setup.py file, it looks like the following are all required runtime dependencies, all of which need to be pinned very precisely:

    requirements = [
        "Cython==0.29.23",
        "cvxpy==1.1.12",
        "fbprophet==0.5",
        "holidays==0.9.10",  # 0.10.2
        "ipykernel==4.8.2",
        "ipython==7.1.1",
        "ipywidgets==7.2.1",
        "jupyter==1.0.0",
        "jupyter-client==6.1.5",
        "jupyter-console==6.",  # used version 6 to avoid conflict with ipython version
        "jupyter-core==4.7.1",
        "matplotlib==3.4.1",
        "nbformat==5.1.3",
        "notebook==5.4.1",
        "numpy==1.20.2",
        "osqp==0.6.1",
        "overrides==2.8.0",
        "pandas==1.1.3",
        "patsy==0.5.1",
        "Pillow==8.0.1",
        "plotly==3.10.0",
        "pystan==2.18.0.0",
        "pyzmq==22.0.3",
        "scipy==1.5.4",
        "seaborn==0.9.0",
        "six==1.15.0",
        "scikit-learn==0.24.1",
        "Sphinx==3.2.1",
        "sphinx-gallery==0.6.1",
        "sphinx-rtd-theme==0.4.2",
        "statsmodels==0.12.2",
        "testfixtures==6.14.2",
        "tornado==5.1.1",
        "tqdm==4.52.0",
    ]
    

    My question is - why pin them so tightly, and are all of them really necessary? E.g. do I really need sphinx-gallery? Such tight pins make it very difficult to integrate into any existing project. Why not just require a lower bound for many/most of these?
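    For illustration, a relaxed specification with lower bounds only might look like the following sketch (the version floors below are assumptions for illustration, not tested minimum versions):

```python
# Hypothetical relaxed pins: floors instead of exact versions (untested).
requirements = [
    "holidays>=0.9.10",
    "matplotlib>=3.4",
    "numpy>=1.20",
    "pandas>=1.1",
    "plotly>=3.10",
    "scikit-learn>=0.24",
    "scipy>=1.5",
    "statsmodels>=0.12",
]
```

    Doc-building dependencies such as Sphinx would then move to an extras group rather than the runtime requirements.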

  • Seasonality changepoint detection does not seem to work with cross-validation for Silverkite

    Hi,

First of all, thank you for open-sourcing this library. It's really complete and well thought out (as is the Silverkite algorithm itself).

    However, I think I have spotted a potential bug:

It seems that the option seasonality_changepoints_dict in ModelComponentsParam breaks some functionality within pandas when running Silverkite with cross-validation.

    Here's a complete example (using Greykite 0.2.0):

    import pandas as pd
    import numpy as np
    
    # Load airline passengers dataset (with monthly data):
    air_passengers = pd.read_csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv")
    air_passengers["Month"] = pd.to_datetime(air_passengers["Month"])
    air_passengers = air_passengers.set_index("Month").asfreq("MS").reset_index()
    
    # Prepare Greykite configs:
    from greykite.framework.templates.autogen.forecast_config import (ComputationParam, 
                                                                      EvaluationMetricParam, 
                                                                      EvaluationPeriodParam,
                                                                      ForecastConfig, 
                                                                      MetadataParam, 
                                                                      ModelComponentsParam)
    
    # Metadata:
    metadata_params = MetadataParam(date_format=None,  # infer
                                    freq="MS",
                                    time_col="Month",
                                    train_end_date=None,
                                    value_col="Passengers")
    
    # Eval metric:
    evaluation_metric_params = EvaluationMetricParam(agg_func=np.sum,   # Sum all forecasts...
                                                     agg_periods=12,    # ...Over 12 months
                                                     cv_report_metrics=["MeanSquaredError", "MeanAbsoluteError", "MeanAbsolutePercentError"],
                                                     cv_selection_metric="MeanAbsolutePercentError",
                                                     null_model_params=None,
                                                     relative_error_tolerance=None)
    
    # Eval procedure (CV & backtest):
    evaluation_period_params = EvaluationPeriodParam(cv_expanding_window=False,
                                                     cv_horizon=0,   # No CV for now. CHANGE THIS
                                                     cv_max_splits=5,
                                                     cv_min_train_periods=24,
                                                     cv_periods_between_splits=6,
                                                     cv_periods_between_train_test=0,
                                                     cv_use_most_recent_splits=False,
                                                     periods_between_train_test=0,
                                                     test_horizon=12)
    
    # Config for seasonality changepoints
    seasonality_components_df = pd.DataFrame({"name": ["conti_year"],
                                              "period": [1.0],
                                              "order": [5],
                                              "seas_names": ["yearly"]})
    
    # Model components (quite long):
    model_components_params = ModelComponentsParam(autoregression={"autoreg_dict": "auto"},
                                                   
                                                   changepoints={"changepoints_dict":  [{"method":"auto",
                                                                                         "potential_changepoint_n": 50,
                                                                                         "no_changepoint_proportion_from_end": 0.2,
                                                                                         "regularization_strength": 0.01}],
                                                                 
                                                                 # Seasonality changepoints
                                                                 "seasonality_changepoints_dict": [{"regularization_strength": 0.6,
                                                                                                    "no_changepoint_proportion_from_end": 0.8,
                                                                                                    "seasonality_components_df": seasonality_components_df,
                                                                                                    "potential_changepoint_n": 50,
                                                                                                    "resample_freq":"MS"},
                                                                                                   ]
                                                                },
                                                   
                                                   custom={"fit_algorithm_dict": [{"fit_algorithm": "linear"},
                                                                                  ],
                                                           "feature_sets_enabled": "auto",
                                                           "min_admissible_value": 0.0},
                                                   
                                                   events={"holiday_lookup_countries": None,
                                                           "holidays_to_model_separately": None,
                                                           },
                                                   
                                                   growth={"growth_term":["linear"]},
                                                   
                                                   hyperparameter_override={"input__response__outlier__z_cutoff": [100.0],
                                                                            "input__response__null__impute_algorithm": ["ts_interpolate"]},
                                                   
                                                   regressors=None,
                                                   
                                                   lagged_regressors=None,
                                                   
                                                   seasonality={"yearly_seasonality": [5],
                                                                "quarterly_seasonality": ["auto"],
                                                                "monthly_seasonality": False,
                                                                "weekly_seasonality": False,
                                                                "daily_seasonality": False},
                                                   
                                                   uncertainty=None)
    
    # Computation
    computation_params = ComputationParam(n_jobs=1,
                                          verbose=3)
    
    
    # Define forecaster:
    from greykite.framework.templates.forecaster import Forecaster
    
    # defines forecast configuration
    config=ForecastConfig(model_template="SILVERKITE",
                          forecast_horizon=12,
                          coverage=0.8,
                          metadata_param=metadata_params,
                          evaluation_metric_param=evaluation_metric_params,
                          evaluation_period_param=evaluation_period_params,
                          model_components_param=model_components_params,
                          computation_param=computation_params,
                         )
    
    # Run:
    # creates forecast
    forecaster = Forecaster()
    result = forecaster.run_forecast_config(df=air_passengers, 
                                            config=config 
                                            )
    

If we run the piece of code above, everything works as expected. However, if we activate cross-validation (e.g. by increasing cv_horizon to 5), Greykite crashes unless we remove the seasonality changepoints (by removing seasonality_changepoints_dict).

    The crash traceback looks as follows:

    5 fits failed out of a total of 5.
    The score on these train-test partitions for these parameters will be set to nan.
    If these failures are not expected, you can try to debug them by setting error_score='raise'.
    
    Below are more details about the failures:
    --------------------------------------------------------------------------------
    5 fits failed with the following error:
    Traceback (most recent call last):
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_validation.py", line 681, in _fit_and_score
        estimator.fit(X_train, y_train, **fit_params)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\pipeline.py", line 394, in fit
        self._final_estimator.fit(Xt, y, **fit_params_last_step)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\sklearn\estimator\simple_silverkite_estimator.py", line 239, in fit
        self.model_dict = self.silverkite.forecast_simple(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_simple_silverkite.py", line 708, in forecast_simple
        trained_model = super().forecast(**parameters)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\forecast\silverkite\forecast_silverkite.py", line 719, in forecast
        seasonality_changepoint_result = get_seasonality_changepoints(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 1177, in get_seasonality_changepoints
        result = cd.find_seasonality_changepoints(**seasonality_changepoint_detection_args)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\common\python_utils.py", line 787, in fn_ignore
        return fn(*args, **kwargs)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoint_detector.py", line 736, in find_seasonality_changepoints
        seasonality_df = build_seasonality_feature_df_with_changes(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\greykite\algo\changepoint\adalasso\changepoints_utils.py", line 237, in build_seasonality_feature_df_with_changes
        fs_truncated_df.loc[(features_df["datetime"] < date).values, cols] = 0
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 719, in __setitem__
        indexer = self._get_setitem_indexer(key)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 646, in _get_setitem_indexer
        self._ensure_listlike_indexer(key)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexing.py", line 709, in _ensure_listlike_indexer
        self.obj._mgr = self.obj._mgr.reindex_axis(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\base.py", line 89, in reindex_axis
        return self.reindex_indexer(
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\internals\managers.py", line 670, in reindex_indexer
        self.axes[axis]._validate_can_reindex(indexer)
      File "C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\pandas\core\indexes\base.py", line 3785, in _validate_can_reindex
        raise ValueError("cannot reindex from a duplicate axis")
    ValueError: cannot reindex from a duplicate axis
    
    
    C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:
    
    One or more of the test scores are non-finite: [nan]
    
    C:\Users\SOTOVJU1\Anaconda3\envs\greykite\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning:
    
    One or more of the train scores are non-finite: [nan]
    

It would be great to be able to cross-validate when seasonality changepoint detection is activated, as it allows learning multiplicative seasonalities in a similar fashion to Prophet or Orbit.
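    For what it's worth, the bottom of the traceback can be reproduced with pandas alone: a .loc assignment that has to create a new column fails when the existing columns contain duplicate labels. A minimal sketch (hypothetical data, not Greykite code):

```python
import pandas as pd

# A frame whose columns contain duplicate labels.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "x"])

error = None
try:
    # Assigning to a not-yet-existing column forces a column reindex,
    # which pandas refuses on an axis with duplicate labels.
    df.loc[[True, False], ["x", "new_col"]] = 0
except ValueError as err:
    error = err
```

    If something similar happens inside build_seasonality_feature_df_with_changes, duplicate generated seasonality column names might be the root cause, though that is only a guess from the traceback.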

    Thank you!

  • "cv_selection_metric" & "cv_report_metrics"

    Hello all,

I am running Greykite without cross-validation (cv_max_splits = 0) because I am using the LassoCV() algorithm, which itself uses 5-fold CV. The ForecastConfig() is as follows; in particular, evaluation_metric is all set to None because cv_max_splits = 0:

[screenshot: ForecastConfig settings]

However, the console output suggests that at least 3 metrics are evaluated. My response contains zeros, so I do not want MAPE and MedAPE to be reported, and I do not want "Correlation" to be reported either. In fact, since the loss function in LassoCV() is MSE (L2 norm), I am not interested in anything other than MSE. Unless the loss function in LassoCV() can be changed to MAE (L1 norm), in which case I would be interested in MAE instead of MSE:

[screenshot: console output with evaluation metrics]

    Do you have any suggestions please ?

    Best regards, Dario

  • Greykite suitable for pure linear increasing series?

    Hello

I'm working with some house price time series in Greykite, but for some reason the forecast I get is just a median price between the upper and lower bounds (ARIMA). Is this a known issue with Greykite when we have a purely linearly increasing series?

Thank you, Aktham Momani

[screenshot: greykite_forecast]

  • Setting of "cv_max_splits" when using "fit_algorithm": "lasso"

    Hi all,

When setting fit_algorithm_params={"cv": 5} to use 5-fold CV with sklearn LassoCV() on the training set, how should the global parameter "cv_max_splits" be set? (Should it be zero, None (equivalent to 3), or 5?)

    Best regards, Dario
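    One option (a sketch; the keys mirror the ModelComponentsParam and EvaluationPeriodParam settings shown in the issue above, and disabling the outer CV is an assumption, not official guidance) is to let LassoCV's internal folds do the model selection and set the outer loop to zero splits:

```python
# Sketch: rely on LassoCV's internal 5-fold CV and disable Greykite's
# outer cross-validation loop entirely.
custom = {
    "fit_algorithm_dict": {
        "fit_algorithm": "lasso",
        "fit_algorithm_params": {"cv": 5},  # sklearn LassoCV folds
    },
}
evaluation_period = {
    "cv_horizon": 0,      # no outer CV splits
    "cv_max_splits": 0,
}
```

    With this setup the two CV loops do not nest, so training cost stays proportional to the five inner folds.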

  • Getting Various Warnings while running time series prediction

    • I'm trying to fit GreyKite Model to my time series data.

    • I have attached the csv file for reference.

    • Even though the model works, it raises a bunch of warnings that I'd like to avoid.

    • Since some of my target values are zero it tells me that MAPE is undefined.

• Also, since I'm only forecasting one step into the future, it gives me an UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.

    • I have attached a few images displaying the warnings.

    • Any help to get rid of these warnings would be appreciated!

    • This is the code I'm using to fit the data:

    import pandas as pd

    from greykite.framework.templates.autogen.forecast_config import ForecastConfig
    from greykite.framework.templates.autogen.forecast_config import MetadataParam
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum


    class GreyKiteModel(AnomalyModel):  # AnomalyModel is our own base class

        def __init__(self, *args, model_kwargs=None, **kwargs) -> None:
            super().__init__(*args, **kwargs)
            self.model_kwargs = model_kwargs or {}

        def predict(self, df: pd.DataFrame) -> pd.DataFrame:
            """Takes a pd.DataFrame with 2 columns, dt and y, and returns a
            pd.DataFrame with 4 columns: dt, y, yhat_lower, and yhat_upper.

            :param df: Input dataframe with dt, y columns
            :type df: pd.DataFrame
            :return: Output dataframe with dt, y, yhat_lower, yhat_upper columns
            :rtype: pd.DataFrame
            """
            df = df.rename(columns={"dt": "ds", "y": "y"})
            metadata = MetadataParam(
                time_col="ds",  # name of the time column
                value_col="y",  # name of the value column
                freq="D"        # "H" for hourly, "D" for daily, "W" for weekly, etc.
            )
            forecaster = Forecaster()  # creates forecasts and stores the result
            result = forecaster.run_forecast_config(  # result is also stored as forecaster.forecast_result
                df=df,
                config=ForecastConfig(
                    model_template=ModelTemplateEnum.SILVERKITE.name,
                    forecast_horizon=1,  # forecasts 1 step
                    coverage=0.95,
                    metadata_param=metadata
                )
            )
            forecast_df = result.forecast.df
            forecast_df = forecast_df.drop(columns=["actual"])
            forecast_df = forecast_df.rename(columns={
                "ds": "dt",
                "forecast": "y",
                "forecast_lower": "yhat_lower",
                "forecast_upper": "yhat_upper",
            })
            return forecast_df
    

    df.csv

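    If the warnings are expected (zeros in the target, a one-step horizon), one workaround is to filter just those messages with Python's warnings module. A sketch (the message patterns below are assumptions and should be matched to the actual warning text):

```python
import warnings

def quiet_call(fn, *args, **kwargs):
    """Run a forecasting call with known benign warnings filtered out."""
    with warnings.catch_warnings():
        # Illustrative patterns; adjust to the exact warning messages seen.
        warnings.filterwarnings("ignore", message=".*MAPE.*")
        warnings.filterwarnings("ignore", message=".*less than two samples.*")
        return fn(*args, **kwargs)

# Usage (hypothetical): result = quiet_call(forecaster.run_forecast_config, df=df, config=config)
```

    Filtering by message keeps other, genuinely informative warnings visible.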

  • Load Model from a GCP Cloud Function

    I'm trying to deploy my greykite model on GCP via a cloud function. The existing read and write functions only work for local directories and not cloud blob storage options. I've adjusted the write function to write to cloud storage but the load function is proving to be a bit challenging.
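    One common pattern is to stage the blob in a temporary directory and reuse the existing local load function on the downloaded file. A sketch with caller-supplied callables (download_blob and local_load are hypothetical stand-ins for a GCS download such as Blob.download_to_filename and the library's local loader):

```python
import os
import tempfile

def load_from_blob_storage(download_blob, blob_name, local_load):
    """Download a serialized model to a local temp directory, then
    reuse the existing local load function on the downloaded file."""
    tmpdir = tempfile.mkdtemp()
    local_path = os.path.join(tmpdir, os.path.basename(blob_name))
    download_blob(blob_name, local_path)  # e.g. wraps a GCS client download
    return local_load(local_path)
```

    Cloud Functions allow writes under /tmp, so tempfile.mkdtemp() lands in a writable location there.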

  • Predictions taking too long

    Hi Greykite Team!

    I am trying to use Greykite to predict at scale and I am not sure if I am doing something wrong but even with the example code, the predictions take a long time to calculate. Sometime in the 20, 30, 40 seconds and others in the minutes. Any help will be greatly appreciated. Below is a sample code I am running that takes about 17 or so seconds.

    from greykite.framework.templates.autogen.forecast_config import ForecastConfig
    from greykite.framework.templates.autogen.forecast_config import MetadataParam
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum
    import numpy as np
    import pandas as pd

    np.random.seed(1)

    rows, cols = 365, 1
    data = np.random.rand(rows, cols)
    tidx = pd.date_range('2019-01-01', periods=rows, freq='MS')
    data_frame = pd.DataFrame(data, columns=['y'], index=tidx)
    data_frame = data_frame.reset_index()
    data_frame.columns = ['ts', 'y']

    metadata = MetadataParam(
        time_col="ts",  # time column in df
        value_col="y"   # value in df
    )
    forecaster = Forecaster()  # creates forecasts and stores the result
    forecaster.run_forecast_config(
        df=data_frame,
        config=ForecastConfig(
            # uses the SILVERKITE model template parameters
            model_template=ModelTemplateEnum.SILVERKITE.name,
            forecast_horizon=365,  # forecasts 365 steps ahead
            coverage=0.95,         # 95% prediction intervals
            metadata_param=metadata
        )
    )

    forecaster.forecast_result

  • Training the model on all data

    Hello,

    First of all, thanks for this library!

    I want to train the model on all of my data, then create a future dataframe and let the model forecast those timesteps. This is to simulate a real-world situation where you actually want to predict the future, in which you don't have any data to validate on.

    The last timestamp in my dataset is 2020-02-20 09:00:00. So I set the train_end_date to this timestamp in MetadataParam like this:

    metadata = MetadataParam(
        time_col="ts",
        value_col="y",
        freq="H",
        train_end_date=datetime.datetime(2020, 2, 20, 9)
    )

Then, in forecaster.run_forecast_config, I tried commenting out forecast_horizon, which needs to be >= 1.

    forecaster = Forecaster()  # Creates forecasts and stores the result
    result = forecaster.run_forecast_config(
        df=df_model,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,  # model template
            # forecast_horizon=1,
            coverage=0.95,  # 95% prediction intervals
            metadata_param=metadata,
            model_components_param=model_components,
            evaluation_period_param=evaluation_period
        )
    )

    Running this I get the Error message: ValueError: fut_df must be a dataframe of non-zero size.

So the closest I have come to achieving what I want is to set train_end_date=datetime.datetime(2020, 2, 20, 8), an hour before the last timestamp in the dataset, and use forecast_horizon=1. However, I still want the model to train on this last hour, since I intend to run a short-term forecast.

    So, the question I have is; how do I train the model on all of my data, without forecasting on it before I give the model a future dataframe?
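    For reference, a sketch of the configuration described above (parameter names follow MetadataParam and ForecastConfig; treating forecast_horizon=1 with train_end_date at the final timestamp as the way to train on all data is an assumption, not verified):

```python
import datetime

# Hypothetical configuration values mirroring the issue's setup.
last_timestamp = datetime.datetime(2020, 2, 20, 9)  # final point in the data
metadata_kwargs = dict(
    time_col="ts",
    value_col="y",
    freq="H",
    train_end_date=last_timestamp,  # include every observation in training
)
forecast_kwargs = dict(forecast_horizon=1)  # horizon must be >= 1
```

    With train_end_date at the true last timestamp, a horizon of 1 would forecast the first hour past the data rather than holding out the final observation.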

  • TimeSeries features

    Hi all,

    Great library and work! I was curious if there is a recommended way to get the time series features as a dataframe without running the model? I am looking to compare with other models.

    Thanks, George
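    Greykite builds its features internally, but as a rough stand-in for comparison with other models, common calendar features can be derived with pandas alone. A sketch (this is not Greykite's API):

```python
import pandas as pd

def basic_time_features(ts: pd.Series) -> pd.DataFrame:
    """Pandas-only sketch of common calendar features for a timestamp series."""
    dt = pd.to_datetime(ts)
    return pd.DataFrame({
        "year": dt.dt.year,
        "month": dt.dt.month,
        "dayofweek": dt.dt.dayofweek,  # Monday=0 .. Sunday=6
        "hour": dt.dt.hour,
    })
```

    Such a frame can then feed any other regression model for a side-by-side comparison.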

  • Can't save model

After fitting the model, I would like to persist it for later use in my app. I tried to save the model (result.model), the forecaster, and forecaster.forecast_result, and none of them could be persisted using pickle or joblib.

    That's the error I get. Any advice?

    ---------------------------------------------------------------------------
    PicklingError                             Traceback (most recent call last)
    <ipython-input-77-0716155adc48> in <module>
    ----> 1 joblib.dump(result.model, model_path)
    
    /work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in dump(value, filename, compress, protocol, cache_size)
        478     elif is_filename:
        479         with open(filename, 'wb') as f:
    --> 480             NumpyPickler(f, protocol=protocol).dump(value)
        481     else:
        482         NumpyPickler(filename, protocol=protocol).dump(value)
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in dump(self, obj)
        435         if self.proto >= 4:
        436             self.framer.start_framing()
    --> 437         self.save(obj)
        438         self.write(STOP)
        439         self.framer.end_framing()
    
    /work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
        280             return
        281 
    --> 282         return Pickler.save(self, obj)
        283 
        284 
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
        547 
        548         # Save the reduce() output and finally memoize the object
    --> 549         self.save_reduce(obj=obj, *rv)
        550 
        551     def persistent_id(self, obj):
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
        660 
        661         if state is not None:
    --> 662             save(state)
        663             write(BUILD)
        664 
    
    /work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
        280             return
        281 
    --> 282         return Pickler.save(self, obj)
        283 
        284 
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
        502         f = self.dispatch.get(t)
        503         if f is not None:
    --> 504             f(self, obj) # Call unbound method with explicit self
        505             return
        506 
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in save_dict(self, obj)
        857 
        858         self.memoize(obj)
    --> 859         self._batch_setitems(obj.items())
        860 
        861     dispatch[dict] = save_dict
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in _batch_setitems(self, items)
        883                 for k, v in tmp:
        884                     save(k)
    --> 885                     save(v)
        886                 write(SETITEMS)
        887             elif n:
    
    [remaining frames repeat the same pickle/joblib save cycle]
        883                 for k, v in tmp:
        884                     save(k)
    --> 885                     save(v)
        886                 write(SETITEMS)
        887             elif n:
    
    /work/y435/crypto-forecast/lib/python3.7/site-packages/joblib/numpy_pickle.py in save(self, obj)
        280             return
        281 
    --> 282         return Pickler.save(self, obj)
        283 
        284 
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
        502         f = self.dispatch.get(t)
        503         if f is not None:
    --> 504             f(self, obj) # Call unbound method with explicit self
        505             return
        506 
    
    /work/y435/crypto-forecast/lib/python3.7/pickle.py in save_global(self, obj, name)
        958             raise PicklingError(
        959                 "Can't pickle %r: it's not found as %s.%s" %
    --> 960                 (obj, module_name, name)) from None
        961         else:
        962             if obj2 is not obj:
    
    PicklingError: Can't pickle <function add_finite_filter_to_scorer.<locals>.score_func_finite at 0x7f490e750d40>: it's not found as greykite.common.evaluation.add_finite_filter_to_scorer.<locals>.score_func_finite
    
  • Greykite Forecaster Model is Unpickle-able


    Even a basic implementation of Greykite (see below) does not pickle properly, due to some of the design choices within Greykite (e.g. nested functions and namedtuple definitions inside function and class bodies).

    Was this a purposeful design choice? Is there another method to save a trained model state and reuse the model to create inferences downstream? Integrations with deployment tools become much more challenging if we need to retrain the model every time and can't save the model state. Looking for guidance here on best practice - thanks!

    Here's code to reproduce the issue:

    from greykite.framework.templates.autogen.forecast_config import ForecastConfig
    from greykite.framework.templates.autogen.forecast_config import MetadataParam
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum
    
    import pandas as pd
    import numpy as np
    
    date_list = pd.date_range(start='2020-01-01', end='2022-01-01', freq='W-FRI')
    df_train = pd.DataFrame(
        {
            'week_end_date': date_list,
            'data': np.random.rand(len(date_list))
        }
    )
    
    metadata = MetadataParam(
        time_col="week_end_date",
        value_col=df_train.columns[-1],
        freq='W-FRI'
    )
    
    fc = Forecaster()
    result = fc.run_forecast_config(
        df=df_train,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,
            forecast_horizon=52,
            coverage=0.95,         # 95% prediction intervals
            metadata_param=metadata
        )
    )
    
    import dill
    
    with open("pickle_out.b", "wb") as fp:
        dill.dump(result.model, fp)  # fails with a PicklingError like the one above
    
    with open("pickle_out.b", "rb") as fp:  # reopen for reading before loading
        output_set = dill.load(fp)
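
    The failure is standard pickle behavior rather than anything specific to dill, and it can be reproduced without Greykite at all. A minimal sketch (the function names here are illustrative, mimicking the `add_finite_filter_to_scorer.<locals>.score_func_finite` reference in the traceback):

```python
import pickle

def make_scorer():
    # pickle serializes plain functions by qualified name, e.g.
    # "module.make_scorer.<locals>.score_func". A nested (local) function
    # is not importable under that name, so pickling fails at dump time --
    # the same failure mode as Greykite's nested score_func_finite.
    def score_func(x):
        return x + 1
    return score_func

scorer = make_scorer()
try:
    pickle.dumps(scorer)
except (pickle.PicklingError, AttributeError) as exc:
    print(f"cannot pickle: {exc}")
```

    dill can serialize many local functions by value, but once such a function is buried inside a fitted sklearn pipeline the dump can still fail, which is consistent with the report above.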
    
  • Does multi-stage forecasting support weekly aggregation as well?


    Hi Team,

    Can you please confirm whether multi-stage forecasting works with weekly aggregation as well?

    I tried it with data that has daily frequency: for one stage I kept the daily frequency, and for the next stage I used the weekly aggregation of the daily data.

    But I get the error below:

    TypeError: '<' not supported between instances of 'pandas._libs.tslibs.offsets.Day' and 'pandas._libs.tslibs.offsets.Week'
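
    The error itself is reproducible outside Greykite: pandas offsets of different kinds (Day, a fixed-length offset, vs. Week, a calendar-based one) do not define an ordering relative to each other, so any internal comparison of the two stage frequencies raises exactly this TypeError. A minimal illustration (my reading of the mechanism, not traced through Greykite's multistage code):

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

day = to_offset("D")   # pandas Day offset (fixed length)
week = to_offset("W")  # pandas Week offset (calendar-based)

try:
    day < week  # ordering is not defined across these offset types
except TypeError as exc:
    print(exc)
```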

  • Different Precision Causes IndexError


    Hello,

    A difference in variable-type precision causes an index error exactly like the one found in #64.

    If I change the precision of my target variable, Greykite throws an error. Should the number of bits used cause an IndexError?

    I created a minimal working example. Below, I change the precision of y in the Peyton Manning example from float64 to float32, and it throws an error.

    Example:

    from greykite.common.data_loader import DataLoader
    from greykite.framework.templates.autogen.forecast_config import ForecastConfig
    from greykite.framework.templates.autogen.forecast_config import MetadataParam
    from greykite.framework.templates.forecaster import Forecaster
    from greykite.framework.templates.model_templates import ModelTemplateEnum
    from greykite.framework.utils.result_summary import summarize_grid_search_results
    
    # Loads dataset into pandas DataFrame
    dl = DataLoader()
    df = dl.load_peyton_manning()
    
    # copy pd.Dataframe and change precision
    df2 = df.copy()
    df2.y = df2.y.astype('float32')
    
    
    # specify dataset information
    # specify dataset information
    metadata = MetadataParam(
        time_col="ts",  # name of the time column ("date" in example above)
        value_col="y",  # name of the value column ("sessions" in example above)
        freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
    )
    
    forecaster = Forecaster()  # Creates forecasts and stores the result
    result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
        df=df2,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,
            forecast_horizon=365,  # forecasts 365 steps ahead
            coverage=0.95,         # 95% prediction intervals
            metadata_param=metadata
        )
    )
    

    Fitting 3 folds for each of 1 candidates, totalling 3 fits
    /databricks/python/lib/python3.8/site-packages/greykite/algo/forecast/silverkite/forecast_simple_silverkite_helper.py:129: UserWarning: Requested holiday 'Easter Monday [England, Wales, Northern Ireland]' does not occur in the provided countries
      warnings.warn(
    /databricks/python/lib/python3.8/site-packages/sklearn/model_selection/_validation.py:610: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
    Traceback (most recent call last):
      File "/databricks/python/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 593, in _fit_and_score
        estimator.fit(X_train, y_train, **fit_params)
      File "/databricks/python_shell/dbruntime/MLWorkloadsInstrumentation/_sklearn.py", line 29, in patch_function
        original_result = original(self, *args, **kwargs)
      File "/databricks/python/lib/python3.8/site-packages/sklearn/pipeline.py", line 346, in fit
        self._final_estimator.fit(Xt, y, **fit_params_last_step)
      File "/databricks/python/lib/python3.8/site-packages/greykite/sklearn/estimator/simple_silverkite_estimator.py", line 250, in fit
        self.model_dict = self.silverkite.forecast_simple(
      File "/databricks/python/lib/python3.8/site-packages/greykite/algo/forecast/silverkite/forecast_simple_silverkite.py", line 747, in forecast_simple
        trained_model = super().forecast(**parameters)
      File "/databricks/python/lib/python3.8/site-packages/greykite/algo/forecast/silverkite/forecast_silverkite.py", line 939, in forecast
        trained_model = fit_ml_model_with_evaluation(
      File "/databricks/python/lib/python3.8/site-packages/greykite/algo/common/ml_models.py", line 806, in fit_ml_model_with_evaluation
        training_evaluation[R2_null_model_score] = r2_null_model_score(
      File "/databricks/python/lib/python3.8/site-packages/greykite/common/evaluation.py", line 333, in r2_null_model_score
        y_true, y_pred, y_train, y_pred_null = valid_elements_for_evaluation(
      File "/databricks/python/lib/python3.8/site-packages/greykite/common/evaluation.py", line 157, in valid_elements_for_evaluation
        return [array[keep] for array in reference_arrays] + [np.array(array)[keep] if (
      File "/databricks/python/lib/python3.8/site-packages/greykite/common/evaluation.py", line 157, in <listcomp>
        return [array[keep] for array in reference_arrays] + [np.array(array)[keep] if (
    IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

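    The last frame of the traceback points at the likely mechanism: valid_elements_for_evaluation calls np.array(array)[keep], and a quantity that has collapsed to a scalar becomes a 0-dimensional array, which cannot be indexed with a boolean mask. Whether the float32 cast is what produces the scalar upstream is a guess, but the numpy behavior in the error message is easy to see in isolation:

```python
import numpy as np

keep = np.array([True, False, True])  # 1-dimensional boolean mask

vec = np.array([1.0, 2.0, 3.0])
print(vec[keep])  # masking a 1-d array works: [1. 3.]

scalar = np.array(np.float32(2.0))  # np.array() on a scalar gives a 0-d array
try:
    scalar[keep]
except IndexError as exc:
    print(exc)  # too many indices for array: array is 0-dimensional, but 1 were indexed
```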
  • problems with additional regressors in low-level template


    Hi. I've been evaluating the Greykite Forecasting algorithm for use with additional regressors. I am using the low-level template ModelTemplateEnum.SK.

    The documentation shown here: https://linkedin.github.io/greykite/docs/0.3.0/html/gallery/tutorials/0200_templates.html#the-low-level-templates-in-silverkite

    defines the model template as:

    model_components_param_sk = ModelComponentsParam(  
         growth={  
         },  # growth does not accept any parameters, pass growth term via 'extra_pred_cols' instead.  
         seasonality={  
             "fs_components_df": [pd.DataFrame({  
                 "name": ["tod", "tow", "tom", "toq", "toy"],  
                 "period": [24.0, 7.0, 1.0, 1.0, 1.0],  
                 "order": [3, 3, 1, 1, 5],  
                 "seas_names": ["daily", "weekly", "monthly", "quarterly", "yearly"]})],  
         },  
         changepoints={  
             "changepoints_dict": [None],  
             "seasonality_changepoints_dict": [None]  
         },  
         events={  
             "daily_event_df_dict": [None]  
         },  
         autoregression={  
             "autoreg_dict": [None]  
         },  
         regressors={  
             "regressor_cols": [None]  
         },  
         uncertainty={  
             "uncertainty_dict": [None]  
         },  
         custom={  
             "fit_algorithm_dict": {  
                 "fit_algorithm": "ridge",  
                 "fit_algorithm_params": None,  
             },  
             "extra_pred_cols": ["ct1"],  # linear growth  
             "min_admissible_value": [None],  
             "max_admissible_value": [None],  
         }  
    )
    

    However, using this template as-is results in the following error:

    ---------------------------------------------------------------------------  
    ValueError                                Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_20796/3149020205.py in <module>
    ----> 1 result = forecaster.run_forecast_config(  # result is also stored as 'forecaster.forecast_result'.
          2      df=df_train,
          3      config=ForecastConfig(
          4          model_template=ModelTemplateEnum.SK.name,
          5          forecast_horizon=None,  # forecast window
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\forecaster.py in run_forecast_config(self, df, config)
        321             according to the ``df`` and ``config`` configuration parameters.
        322         """
    --> 323         pipeline_parameters = self.apply_forecast_config(
        324             df=df,
        325             config=config)
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\forecaster.py in apply_forecast_config(self, df, config)
        291         self.template_class = self.__get_template_class(self.config)
        292         self.template = self.template_class()
    --> 293         self.pipeline_params = self.template.apply_template_for_pipeline_params(df=df, config=self.config)
        294         self.__apply_forecast_one_by_one_to_pipeline_parameters()
        295         return self.pipeline_params
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\silverkite_template.py in process_wrapper(self, df, config)
        625                 raise ValueError(f"SilverkiteTemplate only supports config.model_template='SK', "
        626                                  f"found '{config.model_template}'")
    --> 627             pipeline_params = func(self, df, config)
        628             return pipeline_params
        629         return process_wrapper
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\silverkite_template.py in apply_template_for_pipeline_params(self, df, config)
        651             '~greykite.framework.pipeline.pipeline.forecast_pipeline'.
        652         """
    --> 653         return super().apply_template_for_pipeline_params(df=df, config=config)
        654 
        655     apply_template_decorator = staticmethod(apply_template_decorator)
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\base_template.py in process_wrapper(self, df, config)
        280             config = self.apply_forecast_config_defaults(config)
        281             # <optional processing before calling 'func', if needed>
    --> 282             pipeline_params = func(self, df, config)
        283             # <optional postprocessing after calling 'func', if needed>
        284             return pipeline_params
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\base_template.py in apply_template_for_pipeline_params(self, df, config)
        320         self.pipeline = self.get_pipeline()
        321         self.time_properties = self.get_forecast_time_properties()
    --> 322         self.hyperparameter_grid = self.get_hyperparameter_grid()
        323 
        324         self.pipeline_params = dict(
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\silverkite_template.py in get_hyperparameter_grid(self)
        566             The output dictionary values are lists, combined in grid search.
        567         """
    --> 568         self.config.model_components_param = apply_default_model_components(
        569             model_components=self.config.model_components_param,
        570             time_properties=self.time_properties)
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\framework\templates\silverkite_template.py in apply_default_model_components(model_components, time_properties)
        171 
        172     default_regressors = {}
    --> 173     model_components.regressors = update_dictionary(
        174         default_regressors,
        175         overwrite_dict=model_components.regressors,
    
    C:\Anaconda3\envs\aml\lib\site-packages\greykite\common\python_utils.py in update_dictionary(default_dict, overwrite_dict, allow_unknown_keys)
         71         extra_keys = overwrite_dict.keys() - default_dict.keys()
         72         if extra_keys:
    ---> 73             raise ValueError(f"Unexpected key(s) found: {extra_keys}. "
         74                              f"The valid keys are: {default_dict.keys()}")
         75 
    
    ValueError: Unexpected key(s) found: {'regressor_cols'}. The valid keys are: dict_keys([])
    

    When I trace the code I find the issue on lines 173-177 of .\greykite\framework\templates\silverkite_template.py:

    default_regressors = {}  
    model_components.regressors = update_dictionary(  
         default_regressors,  
         overwrite_dict=model_components.regressors,  
         allow_unknown_keys=False)
    

    There is no default dictionary key “regressor_cols” specified on the line default_regressors = {}. As such, no key/value pair can be accepted as it fails the verification step before updating the dictionary since the key is considered unknown. The posted template does not appear compatible with the structure of the code. Setting regressors=None in ModelComponentsParam bypasses this issue.
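
    The rejection is easy to see in a standalone re-creation of the check (paraphrased from the traceback's excerpt of python_utils.update_dictionary, not Greykite's exact source): with default_regressors = {} and allow_unknown_keys=False, every key in the overwrite dict counts as unknown, so any regressors setting at all is rejected.

```python
def update_dictionary(default_dict, overwrite_dict=None, allow_unknown_keys=True):
    """Merge overwrite_dict into default_dict, optionally rejecting new keys.

    Paraphrase of the check shown in the traceback above; not Greykite's exact source.
    """
    if overwrite_dict is None:
        overwrite_dict = {}
    if not allow_unknown_keys:
        extra_keys = overwrite_dict.keys() - default_dict.keys()
        if extra_keys:
            raise ValueError(f"Unexpected key(s) found: {extra_keys}. "
                             f"The valid keys are: {default_dict.keys()}")
    merged = dict(default_dict)
    merged.update(overwrite_dict)
    return merged

# default_regressors is {} in silverkite_template.py, so ANY key is "unexpected":
try:
    update_dictionary({}, {"regressor_cols": [None]}, allow_unknown_keys=False)
except ValueError as exc:
    print(exc)
```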

    Going one step further, I noticed comments in silverkite_template.py (lines 64-66, 327-328), stating that the extra_pred_cols parameter should be used for additional regressors. Is this correct? If so, would you please update the documentation to show how to use additional regressors with the low-level template? I tried adding one additional regressor to this parameter but it resulted in another error.

    Thanks.

  • Wrong assignment to summary prediction categories


    Hi all,

    I have added 69 regressors to ModelComponentsParam, so my model instance is as follows: ModelComponentsParam(autoregression={'autoreg_dict': 'auto'}, changepoints={'changepoints_dict': [None, {'method': 'auto'}]}, custom={'fit_algorithm_dict': [{'fit_algorithm': 'elastic_net', 'fit_algorithm_params': {'l1_ratio': array([0.01, 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9 , 0.99]), 'n_alphas': 100, 'alphas': None, 'fit_intercept': True, 'cv': None, 'tol': 0.001, 'max_iter': 1000}}], 'feature_sets_enabled': 'auto', 'min_admissible_value': None, 'max_admissible_value': None}, events={'holiday_lookup_countries': [], 'holidays_to_model_separately': None, 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': None}, hyperparameter_override={'input__response__outlier__use_fit_baseline': False, 'input__response__outlier__z_cutoff': None, 'input__response__null__impute_algorithm': None, 'input__regressors_numeric__outlier__use_fit_baseline': False, 'input__regressors_numeric__outlier__z_cutoff': None, 'input__regressors_numeric__null__impute_algorithm': None, 'input__regressors_numeric__normalize__normalize_algorithm': 'RobustScaler', 'input__regressors_numeric__normalize__normalize_params': {'quantile_range': (10.0, 90.0)}, 'degenerate__drop_degenerate': False}, regressors={'regressor_cols': ['ownSame_5_NET_PRICE_BAG', 'media_total_spend', 'Discount_Depth', 'weather_wghtd_avg_tmp_flslk_2m_f_max_low', 'ownSame_3_NET_PRICE_BAG', 'media_tv_traditional', 'hol_EasterSunday', 'cg_STRNGNCY_LGCY_INDX_MEAN', 'hol_RestorationofIndependence_LAG1', 'NET_PRICE_BAG', 'hol_AllSaintsDay_LAG3', 'hol_holiday_count', 'hol_AssumptionDay_lead4', 'hol_AssumptionDay_lead2', 'cc_AVG_IR_MEAN', 'hol_spain_perc_wknd', 'media_digital', 'hol_CorpusChristi_LAG2', 'hol_AssumptionDay_lead3', 'hol_CorpusChristi_LAG3', 'inp_actual_inventory', 'weather_wghtd_avg_tmp_flslk_2m_f_min_high', 
'gm_AVG_PRKS_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_max_high', 'hol_holiday_count_lead4', 'hol_spain_hol_flag', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_mid', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_high', 'hol_ConstitutionDay_lead2', 'hol_portugal_perc_longwknd', 'gm_AVG_RSDNTL_AND_PHRMCY_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_min_low', 'weather_presence_of_snow', 'hol_ImmaculateConception', 'hol_portugal_fri_mon_flag', 'ownSame_2_NET_PRICE_BAG', 'gm_AVG_RTL_AND_RCRTN_PCNT_CHNG_FRM_BSLNE_MEAN', 'hol_spain_fri_mon_flag', 'hol_NationalDay_LAG4', 'Dollar_Discount', 'hol_portugal_perc_wknd', 'weather_wghtd_avg_tmp_flslk_2m_f_min_mid', 'hol_NationalDay_LAG3', 'ownSame_1_NET_PRICE_BAG', 'hol_portugal_hol_flag', 'cg_STRNGNCY_INDX_MEAN', 'hol_ConstitutionDay', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_low', 'hol_Epiphany_lead4', 'cg_CNTNMT_HLTH_INDX_MEAN', 'cc_AVG_CFR_MEAN', 'hol_spain_perc_longwknd', 'weather_wghtd_avg_tmp_flslk_2m_f_max_mid', 'cg_GOVT_RSPNS_INDX_MEAN', 'inp_projected_inventory', 'ph_Unemployment_persons', 'ownSame_4_NET_PRICE_BAG', 'cc_REC_MEAN', 'inp_actual_inventory_flag', 'hol_NationalDay_LAG2', 'hol_AssumptionDay_lead1', 'hol_GoodFriday', 'hol_CorpusChristi_LAG4', 'hol_spain_wknd_flag', 'hol_portugal_wknd_flag', 'weather_wghtd_avg_cld_cvr_tot_pct_max_mid', 'weather_wghtd_avg_cld_cvr_tot_pct_max_low', 'weather_wghtd_avg_cld_cvr_tot_pct_max_high', 'media_total_spend_lag4']}, lagged_regressors=None, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': False, 'daily_seasonality': False}, uncertainty=None)

    When checking the model output summary, I only find 65 regressors in the regressor_features category. Of the missing 4, 3 have ended up in the trend_features category, and 1 in the lag_features category:

    summary = result.model[-1].summary()
    for key, val in summary.pred_category.items():
        print(key)
        print(len(val))
        print(val)
    

    intercept 1 ['Intercept']
    time_features 0 []
    event_features 0 []
    trend_features 3 ['weather_wghtd_avg_cld_cvr_tot_pct_max_mid', 'weather_wghtd_avg_cld_cvr_tot_pct_max_low', 'weather_wghtd_avg_cld_cvr_tot_pct_max_high']
    seasonality_features 44 ['sin1_tom_monthly', 'cos1_tom_monthly', 'sin2_tom_monthly', 'cos2_tom_monthly', 'sin1_toq_quarterly', 'cos1_toq_quarterly', 'sin2_toq_quarterly', 'cos2_toq_quarterly', 'sin3_toq_quarterly', 'cos3_toq_quarterly', 'sin4_toq_quarterly', 'cos4_toq_quarterly', 'sin5_toq_quarterly', 'cos5_toq_quarterly', 'sin1_ct1_yearly', 'cos1_ct1_yearly', 'sin2_ct1_yearly', 'cos2_ct1_yearly', 'sin3_ct1_yearly', 'cos3_ct1_yearly', 'sin4_ct1_yearly', 'cos4_ct1_yearly', 'sin5_ct1_yearly', 'cos5_ct1_yearly', 'sin6_ct1_yearly', 'cos6_ct1_yearly', 'sin7_ct1_yearly', 'cos7_ct1_yearly', 'sin8_ct1_yearly', 'cos8_ct1_yearly', 'sin9_ct1_yearly', 'cos9_ct1_yearly', 'sin10_ct1_yearly', 'cos10_ct1_yearly', 'sin11_ct1_yearly', 'cos11_ct1_yearly', 'sin12_ct1_yearly', 'cos12_ct1_yearly', 'sin13_ct1_yearly', 'cos13_ct1_yearly', 'sin14_ct1_yearly', 'cos14_ct1_yearly', 'sin15_ct1_yearly', 'cos15_ct1_yearly']
    lag_features 1 ['media_total_spend_lag4']
    regressor_features 65 ['ownSame_5_NET_PRICE_BAG', 'media_total_spend', 'Discount_Depth', 'weather_wghtd_avg_tmp_flslk_2m_f_max_low', 'ownSame_3_NET_PRICE_BAG', 'media_tv_traditional', 'hol_EasterSunday', 'cg_STRNGNCY_LGCY_INDX_MEAN', 'hol_RestorationofIndependence_LAG1', 'NET_PRICE_BAG', 'hol_AllSaintsDay_LAG3', 'hol_holiday_count', 'hol_AssumptionDay_lead4', 'hol_AssumptionDay_lead2', 'cc_AVG_IR_MEAN', 'hol_spain_perc_wknd', 'media_digital', 'hol_CorpusChristi_LAG2', 'hol_AssumptionDay_lead3', 'hol_CorpusChristi_LAG3', 'inp_actual_inventory', 'weather_wghtd_avg_tmp_flslk_2m_f_min_high', 'gm_AVG_PRKS_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_max_high', 'hol_holiday_count_lead4', 'hol_spain_hol_flag', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_mid', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_high', 'hol_ConstitutionDay_lead2', 'hol_portugal_perc_longwknd', 'gm_AVG_RSDNTL_AND_PHRMCY_PCNT_CHNG_FRM_BSLNE_MEAN', 'weather_wghtd_avg_tmp_flslk_2m_f_min_low', 'weather_presence_of_snow', 'hol_ImmaculateConception', 'hol_portugal_fri_mon_flag', 'ownSame_2_NET_PRICE_BAG', 'gm_AVG_RTL_AND_RCRTN_PCNT_CHNG_FRM_BSLNE_MEAN', 'hol_spain_fri_mon_flag', 'hol_NationalDay_LAG4', 'Dollar_Discount', 'hol_portugal_perc_wknd', 'weather_wghtd_avg_tmp_flslk_2m_f_min_mid', 'hol_NationalDay_LAG3', 'ownSame_1_NET_PRICE_BAG', 'hol_portugal_hol_flag', 'cg_STRNGNCY_INDX_MEAN', 'hol_ConstitutionDay', 'weather_wghtd_avg_tmp_flslk_2m_f_mean_low', 'hol_Epiphany_lead4', 'cg_CNTNMT_HLTH_INDX_MEAN', 'cc_AVG_CFR_MEAN', 'hol_spain_perc_longwknd', 'weather_wghtd_avg_tmp_flslk_2m_f_max_mid', 'cg_GOVT_RSPNS_INDX_MEAN', 'inp_projected_inventory', 'ph_Unemployment_persons', 'ownSame_4_NET_PRICE_BAG', 'cc_REC_MEAN', 'inp_actual_inventory_flag', 'hol_NationalDay_LAG2', 'hol_AssumptionDay_lead1', 'hol_GoodFriday', 'hol_CorpusChristi_LAG4', 'hol_spain_wknd_flag', 'hol_portugal_wknd_flag']
    interaction_features 0 []

    Best, Dario

  • Extract components from forecast


    Hi, I was wondering if it is possible to extract the different modeling components (e.g. trend, holidays, seasonalities) from the forecasted time series. It's possible to do this in the Prophet framework, see: https://github.com/facebook/prophet/issues/1920

    The reason is that I would like to use a custom trend component calculated outside of Greykite.
