ArviZ is a Python package for exploratory analysis of Bayesian models


ArviZ

ArviZ (pronounced "AR-vees") is a Python package for exploratory analysis of Bayesian models. It includes functions for posterior analysis, data storage, model checking, comparison and diagnostics.
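
A minimal usage sketch (using ArviZ's bundled example data):

import arviz as az

idata = az.load_arviz_data("centered_eight")  # example InferenceData object
az.plot_posterior(idata, var_names=["mu", "tau"])  # posterior analysis
az.summary(idata)  # diagnostics such as ess and r_hat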

ArviZ in other languages

ArviZ also has a Julia wrapper available, ArviZ.jl.

Documentation

The ArviZ documentation can be found in the official docs. First-time users may find the quickstart helpful. Additional guidance can be found in the usage documentation.

Installation

Stable

ArviZ is available for installation from PyPI. The latest stable version can be installed using pip:

pip install arviz

ArviZ is also available through conda-forge.

conda install -c conda-forge arviz

Development

The latest development version can be installed from the main branch using pip:

pip install git+https://github.com/arviz-devs/arviz.git

Another option is to clone the repository and install using git and setuptools:

git clone https://github.com/arviz-devs/arviz.git
cd arviz
python setup.py install

Gallery

Example plots in the gallery include ridge, parallel, trace, density, posterior, joint, posterior predictive, pair, energy, violin, forest, and autocorrelation plots.

Dependencies

ArviZ is tested on Python 3.6, 3.7 and 3.8, and depends on NumPy, SciPy, xarray, and Matplotlib.

Citation

If you use ArviZ and want to cite it, please use the JOSS DOI (10.21105/joss.01143).

Here is the citation in BibTeX format:

@article{arviz_2019,
  doi = {10.21105/joss.01143},
  url = {https://doi.org/10.21105/joss.01143},
  year = {2019},
  publisher = {The Open Journal},
  volume = {4},
  number = {33},
  pages = {1143},
  author = {Ravin Kumar and Colin Carroll and Ari Hartikainen and Osvaldo Martin},
  title = {ArviZ a unified library for exploratory analysis of Bayesian models in Python},
  journal = {Journal of Open Source Software}
}

Contributions

ArviZ is a community project and welcomes contributions. Additional information can be found in the Contributing README.

Code of Conduct

ArviZ wishes to maintain a positive community. Additional details can be found in the Code of Conduct.

Donations

ArviZ is a non-profit project under the NumFOCUS umbrella. If you want to support ArviZ financially, you can donate here.

Sponsors

NumFOCUS

Comments
  • Use xarray throughout

    Use xarray throughout

    There have been proposals to use xarray as a common language for PyMC3, PyStan, and PyMC4. This library might be a good place to start, by implementing utility functions that translate PyMC3's MultiTrace and PyStan's OrderedDict into xarray objects, and then having all plotting functions work with xarray objects.
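
    A minimal sketch of the idea using plain xarray (not a proposed API), assuming samples arrive as NumPy arrays shaped (chain, draw, *shape):

    import numpy as np
    import xarray as xr

    samples = {"mu": np.random.randn(4, 500), "theta": np.random.randn(4, 500, 8)}

    data_vars = {}
    for name, ary in samples.items():
        # dims beyond chain/draw get generic per-variable names
        extra_dims = [f"{name}_dim_{i}" for i in range(ary.ndim - 2)]
        data_vars[name] = (["chain", "draw", *extra_dims], ary)

    dataset = xr.Dataset(data_vars)  # one structure all plotting functions could consume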

  • Model comparison issue:

    Model comparison issue: "Found several log likelihood arrays var_name cannot be None"

    Hi community, I have two models that I would like to compare using LOO and WAIC.

    Model 1:

    with pm.Model() as Model_1:
        α_v = pm.HalfCauchy('α_v',2)
        α_a = pm.HalfCauchy('α_a', 2) 
        α_max_ampl = pm.HalfCauchy('α_max_ampl',2)  
        α_tau = pm.HalfCauchy('α_tau', 2)     
        α_ter = pm.HalfCauchy('α_ter', 2)  
        
        β = pm.Normal('β', 0.334, 0.1)
    
        γ_v = pm.Normal('γ_v', params_group[-1,0], .1)
        γ_a = pm.Normal('γ_a', params_group[-1,1], .1) 
        γ_ter = pm.Normal('γ_ter', params_group[-1, 2], .1)     
        γ_max_ampl = pm.Normal('γ_max_ampl', params_group[-1,3], .1)
        γ_tau = pm.Normal('γ_tau', params_group[-1,4], .1)     
    
        μ_v = -α_v*pm.math.exp(-β*Age_years_group) + γ_v
        μ_a = α_a*pm.math.exp(-β*Age_years_group) + γ_a      
        μ_max_ampl = α_max_ampl*pm.math.exp(-β*Age_years_group) + γ_max_ampl
        μ_tau = α_tau*pm.math.exp(-β*Age_years_group) + γ_tau 
        μ_ter = α_ter*pm.math.exp(-β*Age_years_group) + γ_ter     
        
        σ_v = pm.HalfNormal('σ_v', .1)
        σ_a = pm.HalfNormal('σ_a', .1)
        σ_max_ampl = pm.HalfNormal('σ_max_ampl', .1)
        σ_tau = pm.HalfNormal('σ_tau', .1)      
        σ_ter = pm.HalfNormal('σ_ter', .1)     
        
        y_v = pm.Normal('y_v', μ_v, σ_v, observed=params_group[:, 0])    
        y_a = pm.Normal('y_a', μ_a, σ_a, observed=params_group[:, 1])     
        y_ter = pm.Normal('y_ter', μ_ter, σ_ter, observed=params_group[:, 2]) 
        y_max_ampl = pm.Normal('y_max_ampl', μ_max_ampl, σ_max_ampl, observed=params_group[:, 3])     
        y_tau = pm.Normal('y_tau', μ_tau, σ_tau, observed=params_group[:, 4])     
    
    

    Model_2:

    with pm.Model() as Model_2:
        α_v = pm.HalfCauchy('α_v',2)
        α_a = pm.HalfCauchy('α_a', 2) 
        α_max_ampl = pm.HalfCauchy('α_max_ampl',2)  
        α_tau = pm.HalfCauchy('α_tau', 2)     
        α_ter = pm.HalfCauchy('α_ter', 2)  
        
        β_v = pm.Normal('β_v', 0.334, 0.1)#Kail (1991)
        β_a = pm.Normal('β_a', 0.334, 0.1)#Kail (1991)
        β_max_ampl = pm.Normal('β_max_ampl', 0.334, 0.1)#Kail (1991)
        β_tau = pm.Normal('β_tau', 0.334, 0.1)#Kail (1991)
        β_ter = pm.Normal('β_ter', 0.334, 0.1)#Kail (1991)
          
        #asymptote (get the mean value of adults)
        γ_v = pm.Normal('γ_v', params_group[-1,0], .1)
        γ_a = pm.Normal('γ_a', params_group[-1,1], .1) 
        γ_ter = pm.Normal('γ_ter', params_group[-1, 2], .1)     
        γ_max_ampl = pm.Normal('γ_max_ampl', params_group[-1,3], .1)
        γ_tau = pm.Normal('γ_tau', params_group[-1,4], .1)     
    
        μ_v = -α_v*pm.math.exp(-β_v*Age_years_group) + γ_v
        μ_a = α_a*pm.math.exp(-β_a*Age_years_group) + γ_a      
        μ_max_ampl = α_max_ampl*pm.math.exp(-β_max_ampl*Age_years_group) + γ_max_ampl
        μ_tau = α_tau*pm.math.exp(-β_tau*Age_years_group) + γ_tau 
        μ_ter = α_ter*pm.math.exp(-β_ter*Age_years_group) + γ_ter     
        
        σ_v = pm.HalfNormal('σ_v', .1)
        σ_a = pm.HalfNormal('σ_a', .1)
        σ_max_ampl = pm.HalfNormal('σ_max_ampl', .1)
        σ_tau = pm.HalfNormal('σ_tau', .1)      
        σ_ter = pm.HalfNormal('σ_ter', .1)     
        
        y_v = pm.Normal('y_v', μ_v, σ_v, observed=params_group[:, 0])    
        y_a = pm.Normal('y_a', μ_a, σ_a, observed=params_group[:, 1])     
        y_ter = pm.Normal('y_ter', μ_ter, σ_ter, observed=params_group[:, 2]) 
        y_max_ampl = pm.Normal('y_max_ampl', μ_max_ampl, σ_max_ampl, observed=params_group[:, 3])     
        y_tau = pm.Normal('y_tau', μ_tau, σ_tau, observed=params_group[:, 4])           
    

    I thus tried to compute:

    compare_dict = {Model_1: trace, Model_2: trace_2}
    compare_LOO = az.compare(compare_dict) 
    
    

    and got the following error:

    Traceback (most recent call last):
    
      File "<ipython-input-35-26e3f30f8929>", line 1, in <module>
        compare_LOO = az.compare(compare_dict)
    
      File "C:\Users\mservant\Anaconda3\lib\site-packages\arviz\stats\stats.py", line 211, in compare
        ics = ics.append([ic_func(dataset, pointwise=True, scale=scale)])
    
      File "C:\Users\mservant\Anaconda3\lib\site-packages\arviz\stats\stats.py", line 646, in loo
        log_likelihood = _get_log_likelihood(inference_data, var_name=var_name)
    
      File "C:\Users\mservant\Anaconda3\lib\site-packages\arviz\stats\stats_utils.py", line 412, in get_log_likelihood
        "Found several log likelihood arrays {}, var_name cannot be None".format(var_names)
    
    TypeError: Found several log likelihood arrays ['y_v', 'y_a', 'y_ter', 'y_max_ampl', 'y_tau'], var_name cannot be None
    

    Could you please help me solve this issue? I am confused because I could compute LOO and WAIC using a previous version of ArviZ.
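
    From the error message it seems that, with several observed variables, each log likelihood now has to be selected explicitly. An untested sketch of what I suppose is being asked for (the az.from_pymc3 conversion is my assumption):

    import arviz as az

    idata_1 = az.from_pymc3(trace, model=Model_1)
    idata_2 = az.from_pymc3(trace_2, model=Model_2)

    # LOO for one observed variable at a time
    loo_v1 = az.loo(idata_1, var_name="y_v")
    loo_v2 = az.loo(idata_2, var_name="y_v")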

  • Improved tooltip information for bokeh pair_plot

    Improved tooltip information for bokeh pair_plot

    Description

    Checklist

    • [x] Follows official PR format
    • [ ] Includes a sample plot to visually illustrate the changes (only for plot-related functions)
    • [ ] New features are properly documented (with an example if appropriate)?
    • [ ] Includes new or updated tests to cover the new feature
    • [ ] Code style correct (follows pylint and black guidelines)
    • [ ] Changes are listed in changelog

    Added improved tooltip parameters to the Bokeh backend plot files, to fix #952

  • Initial prototype of plot_lm

    Initial prototype of plot_lm

    Description

    This PR and #1747 summarizes my work done in Google Summer of Code 2021. I added 2 plots, plot_lm and plot_ts. The first one is covered in this PR and the next one in #1747. The project workflow was like this:

    1. Started with a high-level design: Discussed the user API function with the mentors and other community members. Discussions covered input variable names, accepted datatypes, accepted values and other input details. I also showed a sample output visualization. Once it was approved, I moved on to opening this PR. This GitHub Gist shows the design and some discussions related to it. The design process took an initial week.

    2. Submitted a prototype: This PR was earlier a prototype. This step is basically implementing the design decision made in the previous step. In addition to the user API, backend functions are also added. As ArviZ uses 2 backends, artist functions for both of them were added. This step took another week.

    3. Improved it according to reviews: In this step, mentors reviewed my code to provide feedback and improvement tips. I learned best coding practices in this step. Ideally, improving never ends: it is important to maintain the added code after it is merged to keep it bug-free, and I aim to provide support after GSoC. This step was a bit lengthy and complex and thus took 2 weeks.

    4. Tested the functionality: Added unit tests using pytest. Aimed for, and achieved, coverage of all the added functionality under tests. Similar to step 2, added tests for both of the backends. Also solved some simple to complex bugs that arose while testing. This step took another week.

    5. Added examples: Examples are added to the docstring as well as to the examples folder. Please check out the files changed tab to know more about this step. This step was quick, took only half a week.

    6. Added documentation: If you want to know how to use plot_lm, check out this blog. However, if you want to go low-level and understand the workings in detail, I would suggest taking a look at the docstring in the ArviZ docs and following the comments sequentially. Another week was consumed in this step.

    Output : image

    Checklist

    • [x] Follows official PR format
    • [x] Includes a sample plot to visually illustrate the changes (only for plot-related functions)
    • [x] New features are properly documented (with an example if appropriate)?
    • [x] Includes new or updated tests to cover the new feature
    • [x] Code style correct (follows pylint and black guidelines)
    • [ ] Changes are listed in changelog
  • [WIP] Start adding Dask support

    [WIP] Start adding Dask support

    Description

    Start adding Dask compatibility starting from diagnostics/stats ufuncs.

    Checklist

    • [x] Follows official PR format
    • [x] New features are properly documented (with an example if appropriate)?
    • [ ] Includes new or updated tests to cover the new feature
    • [ ] Code style correct (follows pylint and black guidelines)
    • [ ] Changes are listed in changelog
  • Explain sample_stats naming convention

    Explain sample_stats naming convention

    Description

    fixes #1053

    Checklist

    • [x] Does the PR follow official PR format?
    • [x] Is the code style correct (follows pylint and black guidelines)?
    • [x] Is the change listed in changelog?
  • Update doc-Refitting NumPyro models with ArviZ (and xarray)

    Update doc-Refitting NumPyro models with ArviZ (and xarray)

    Description

    fixes #1801

    Checklist

    • [x] Follows official PR format
    • [ ] Includes a sample plot to visually illustrate the changes (only for plot-related functions)
    • [x] New features are properly documented (with an example if appropriate)?
    • [ ] Includes new or updated tests to cover the new feature
    • [x] Code style correct (follows pylint and black guidelines)
    • [ ] Changes are listed in changelog

    Changes done:

    • Replaced az.reloo with {func}`~arviz.reloo`
    • Fixed some grammatical mistakes
    • Replaced PyStanSamplingWrapper terms with NumPyroSamplingWrapper
    • Replaced "Stan Code" with "NumPyro Code"
    • Replaced pystan_wrapper with numpyro_wrapper

  • Skip tests for optional/extra dependencies when not installed

    Skip tests for optional/extra dependencies when not installed

    Description

    When running all tests locally, ImportErrors are thrown if optional/extra dependencies are not installed. By skipping these tests when the dependencies are not installed, users can still run the entire test suite if desired. The entire test suite should still be run via CI.

    This behavior is largely achieved by changing

    import xx
    

    to

    xx = pytest.importorskip("xx")
    

    which will try to import module xx, and will skip all tests in the file if the module is not installed.

    Individual tests may also be skipped using the decorator:

    @pytest.mark.skipif(importlib.util.find_spec("xx") is None, reason="xx tests only required for CI")
    
    

    The last remaining issue is numba, which is used in 20+ tests across a variety of files. One option is to make numba an entry in requirements-dev.txt instead of requirements-optional.txt; another is to add the skip decorators to each of the 20+ tests.

    This PR addresses Issue #1112 .

    Checklist

    • [X] Follows official PR format
    • [X] New features are properly documented (with an example if appropriate)?
    • [X] Code style correct (follows pylint and black guidelines)
    • [X] Changes are listed in changelog
  • Add more data to InferenceData objects [GSOC]

    Add more data to InferenceData objects [GSOC]

    ~For now should solve only half of #220, the part related to constant_data.~ I would also like to support multiple log_likelihoods somehow (related to #771).

    EDIT: constant data support was added in #814


    EDIT

    I have updated the description of what this PR does and what is still pending.

    Does

    • Supports storage of multiple observed variables each with its own log likelihood data
      • dict
      • emcee
      • pymc
      • pystan
    • Updates loo, waic... to get the log likelihood data from the new group. To do so, a function get_log_likelihood has been added in stats_utils. From here on, log likelihood values should be accessed using this helper function (it could even become public or a method of InferenceData; we have to think about this)

    Note: the lp variable is left in the sample_stats group; there should be no change involving it.

    Pending

    • support storage of multiple observed values
      • cmdstan
      • cmdstanpy
      • numpyro
      • pyro
    • update ic functions to use the data available
      • add a var_name/obs_name to waic, loo, compare...
      • take into account different cases, such as an ic value per observation, or considering all observations a single variable...
    • add new group to inference data schema
  • [DOC][WIP] Added user guide for dask support inside ArviZ

    [DOC][WIP] Added user guide for dask support inside ArviZ

    This pull request adds a user guide showing the current Dask capabilities in ArviZ. Closes #1631

    Link to the notebook

    Edits to be made:

    • [x] Note on out of memory computation
    • [x] Provide a link to best practices doc.
    • [x] Section on how to get dask backed InferenceData objects.
    • [x] For both the non-dask and dask enabled methods, initialize the data at the start instead of loading it every time.
    • [x] Small note on Client and ResourceProfiler
    • [x] High level explanation of az.Dask and a link to its API docs
    • [x] Check if dask="allowed" works or not
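
    For reference, a minimal sketch of the intended usage (the import path and kwargs below are assumptions to be confirmed in the guide):

    import arviz as az
    from arviz.utils import Dask  # assumed location of the Dask helper

    # kwargs are forwarded to xarray.apply_ufunc; values here are assumptions
    Dask.enable_dask(dask_kwargs={"dask": "parallelized", "output_dtypes": [float]})

    idata = az.load_arviz_data("centered_eight")
    idata.posterior = idata.posterior.chunk({"draw": 500})  # dask-backed data
    az.ess(idata)  # diagnostics now run through dask

    Dask.disable_dask()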
  • loading cmdstanpy csv output

    loading cmdstanpy csv output

    Hello, I am trying to load a pretty big (1.1 GB) cmdstanpy csv output file using arviz, with data = az.from_cmdstan(posterior="path-to-csv") or with data = az.from_cmdstan(posterior=["path-to-csv", "parameter-name"]). Both ways take more than an hour and use many GB of memory (at this point even 20 GB is not enough and the python process crashes). Is there a more efficient way to load the sampling output data?

    This issue was also discussed here - https://discourse.mc-stan.org/t/loading-cmstanpy-output-in-python/17828

    arviz: current master version (0.9.0)
    cmdstanpy: 0.9.63
    cmdstan: 2.24.1
    python: 3.7.0
    OS: CentOS 7

  • Swapped the netCDF4 dependency to h5netcdf

    Swapped the netCDF4 dependency to h5netcdf

    Description

    Addresses https://github.com/arviz-devs/arviz/pull/2029 and https://github.com/arviz-devs/arviz/issues/2028 by swapping the netCDF4 dependency for h5netcdf, which should be a lighter dependency since it avoids needing the netCDF4 C binaries.

    Checklist

    • [x] Follows official PR format
    • [x] Includes new or updated tests to cover the new feature
    • [x] Code style correct (follows pylint and black guidelines)
    • [x] Changes are listed in changelog

    :books: Documentation preview :books:: https://arviz--2122.org.readthedocs.build/en/2122/

  • Compare raises InvalidIndexError: slice(None, None, None) when provided with dictionary of inference data objects

    Compare raises InvalidIndexError: slice(None, None, None) when provided with dictionary of inference data objects

    Describe the bug: Weird issue with a fresh installation, using CmdStanPy. I created two MCMC objects from CmdStanPy; both Stan programs used the default name for the log likelihood. I put both of them into a dictionary, converting them to InferenceData objects. Both are proper objects, and when displayed everything is OK. Individual loo and waic using az.loo and az.waic also work. When submitted to az.compare, an exception is raised.

    To Reproduce: Two cmdstanpy sampling objects are needed

    import arviz as az
    comp_dict = {'Fractional': az.from_cmdstanpy(result2), 'Integer order': az.from_cmdstanpy(result3)}
    

    Both work and contain a log_lik object.

    az.compare(comp_dict)
    

    returns:

    { "name": "InvalidIndexError", "message": "slice(None, None, None)", "stack": "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexes/base.py:3800\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3799\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m-> 3800\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_engine\u001b[39m.\u001b[39;49mget_loc(casted_key)\n\u001b[1;32m 3801\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mKeyError\u001b[39;00m \u001b[39mas\u001b[39;00m err:\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/_libs/index.pyx:138\u001b[0m, in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/libs/index.pyx:144\u001b[0m, in \u001b[0;36mpandas.libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n\n\u001b[0;31mTypeError\u001b[0m: 'slice(None, None, None)' is an invalid key\n\nDuring handling of the above exception, another exception occurred:\n\n\u001b[0;31mInvalidIndexError\u001b[0m Traceback (most recent call last)\nCell \u001b[0;32mIn [45], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43maz\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcompare\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcomp_dict\u001b[49m\u001b[43m)\u001b[49m\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/arviz/stats/stats.py:306\u001b[0m, in \u001b[0;36mcompare\u001b[0;34m(compare_dict, ic, method, b_samples, alpha, seed, scale, var_name)\u001b[0m\n\u001b[1;32m 304\u001b[0m std_err \u001b[39m=\u001b[39m ses\u001b[39m.\u001b[39mloc[val]\n\u001b[1;32m 305\u001b[0m weight \u001b[39m=\u001b[39m weights[idx]\n\u001b[0;32m--> 306\u001b[0m df_comp\u001b[39m.\u001b[39;49mat[val] \u001b[39m=\u001b[39m (\n\u001b[1;32m 307\u001b[0m idx,\n\u001b[1;32m 308\u001b[0m res[ic],\n\u001b[1;32m 309\u001b[0m res[p_ic],\n\u001b[1;32m 310\u001b[0m d_ic,\n\u001b[1;32m 311\u001b[0m weight,\n\u001b[1;32m 312\u001b[0m std_err,\n\u001b[1;32m 313\u001b[0m d_std_err,\n\u001b[1;32m 314\u001b[0m res[\u001b[39m"\u001b[39m\u001b[39mwarning\u001b[39m\u001b[39m"\u001b[39m],\n\u001b[1;32m 315\u001b[0m res[scale_col],\n\u001b[1;32m 316\u001b[0m )\n\u001b[1;32m 318\u001b[0m df_comp[\u001b[39m"\u001b[39m\u001b[39mrank\u001b[39m\u001b[39m"\u001b[39m] \u001b[39m=\u001b[39m df_comp[\u001b[39m"\u001b[39m\u001b[39mrank\u001b[39m\u001b[39m"\u001b[39m]\u001b[39m.\u001b[39mastype(\u001b[39mint\u001b[39m)\n\u001b[1;32m 319\u001b[0m df_comp[\u001b[39m"\u001b[39m\u001b[39mwarning\u001b[39m\u001b[39m"\u001b[39m] \u001b[39m=\u001b[39m df_comp[\u001b[39m"\u001b[39m\u001b[39mwarning\u001b[39m\u001b[39m"\u001b[39m]\u001b[39m.\u001b[39mastype(\u001b[39mbool\u001b[39m)\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexing.py:2438\u001b[0m, in \u001b[0;36m_AtIndexer.setitem\u001b[0;34m(self, key, value)\u001b[0m\n\u001b[1;32m 2435\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mobj\u001b[39m.\u001b[39mloc[key] \u001b[39m=\u001b[39m value\n\u001b[1;32m 2436\u001b[0m \u001b[39mreturn\u001b[39;00m\n\u001b[0;32m-> 2438\u001b[0m \u001b[39mreturn\u001b[39;00m 
\u001b[39msuper\u001b[39;49m()\u001b[39m.\u001b[39;49m\u001b[39m__setitem\u001b[39;49m(key, value)\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexing.py:2393\u001b[0m, in \u001b[0;36m_ScalarAccessIndexer.setitem\u001b[0;34m(self, key, value)\u001b[0m\n\u001b[1;32m 2390\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(key) \u001b[39m!=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mndim:\n\u001b[1;32m 2391\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39m"\u001b[39m\u001b[39mNot enough indexers for scalar access (setting)!\u001b[39m\u001b[39m"\u001b[39m)\n\u001b[0;32m-> 2393\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mobj\u001b[39m.\u001b[39;49m_set_value(\u001b[39m*\u001b[39;49mkey, value\u001b[39m=\u001b[39;49mvalue, takeable\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_takeable)\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/frame.py:4208\u001b[0m, in \u001b[0;36mDataFrame._set_value\u001b[0;34m(self, index, col, value, takeable)\u001b[0m\n\u001b[1;32m 4206\u001b[0m iindex \u001b[39m=\u001b[39m cast(\u001b[39mint\u001b[39m, index)\n\u001b[1;32m 4207\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m-> 4208\u001b[0m icol \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mcolumns\u001b[39m.\u001b[39;49mget_loc(col)\n\u001b[1;32m 4209\u001b[0m iindex \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mindex\u001b[39m.\u001b[39mget_loc(index)\n\u001b[1;32m 4210\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_mgr\u001b[39m.\u001b[39mcolumn_setitem(icol, iindex, value)\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexes/base.py:3807\u001b[0m, in \u001b[0;36mIndex.get_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m 3802\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mKeyError\u001b[39;00m(key) \u001b[39mfrom\u001b[39;00m \u001b[39merr\u001b[39;00m\n\u001b[1;32m 3803\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mTypeError\u001b[39;00m:\n\u001b[1;32m 3804\u001b[0m \u001b[39m# If we have a listlike key, _check_indexing_error will raise\u001b[39;00m\n\u001b[1;32m 3805\u001b[0m \u001b[39m# InvalidIndexError. Otherwise we fall through and re-raise\u001b[39;00m\n\u001b[1;32m 3806\u001b[0m \u001b[39m# the TypeError.\u001b[39;00m\n\u001b[0;32m-> 3807\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_check_indexing_error(key)\n\u001b[1;32m 3808\u001b[0m \u001b[39mraise\u001b[39;00m\n\u001b[1;32m 3810\u001b[0m \u001b[39m# GH#42269\u001b[39;00m\n\nFile \u001b[0;32m~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexes/base.py:5963\u001b[0m, in \u001b[0;36mIndex._check_indexing_error\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 5959\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m_check_indexing_error\u001b[39m(\u001b[39mself\u001b[39m, key):\n\u001b[1;32m 5960\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m is_scalar(key):\n\u001b[1;32m 5961\u001b[0m \u001b[39m# if key is not a scalar, directly raise an error (the code below\u001b[39;00m\n\u001b[1;32m 5962\u001b[0m \u001b[39m# would convert to numpy arrays and raise later any way) - GH29926\u001b[39;00m\n\u001b[0;32m-> 5963\u001b[0m \u001b[39mraise\u001b[39;00m InvalidIndexError(key)\n\n\u001b[0;31mInvalidIndexError\u001b[0m: slice(None, None, None)" }


    TypeError                                 Traceback (most recent call last)
    File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/core/indexes/base.py:3800, in Index.get_loc(self, key, method, tolerance)
       3799 try:
    -> 3800     return self._engine.get_loc(casted_key)
       3801 except KeyError as err:

    File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

    File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/pandas/_libs/index.pyx:144, in pandas._libs.index.IndexEngine.get_loc()

    TypeError: 'slice(None, None, None)' is an invalid key

    During handling of the above exception, another exception occurred:

    InvalidIndexError                         Traceback (most recent call last)
    Cell In [45], line 1
    ----> 1 az.compare(comp_dict)

    File ~/opt/anaconda3/envs/stan_sep2022/lib/python3.10/site-packages/arviz/stats/stats.py:306, in compare(compare_dict, ic, method, b_samples, alpha, seed, scale, var_name)
        304 std_err = ses.loc[val]
        305 weight = weights[idx]
    --> 306 df_comp.at[val] = (
        307     idx,
        308     res[ic],
        ...
       5961 # if key is not a scalar, directly raise an error (the code below
       5962 # would convert to numpy arrays and raise later any way) - GH29926
    -> 5963 raise InvalidIndexError(key)

    InvalidIndexError: slice(None, None, None)

    Expected behavior: I'd expect a dataframe with the LOO comparison, especially since the values can be computed individually.

    Additional context:
    Arviz version: 0.12.1
    CmdStanPy version: 1.0.7
    Python version: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:54) [Clang 13.0.1]
    M1 Mac Mini, macOS Monterey 12.5.1

  • Change homepage layout

    Change homepage layout

    PR for #2098 (update homepage and layout)

    The index.rst file divides the content into the following sections:

    • Section: h1 Overview (note that I had to section it this way in order to use Sphinx's grid feature and get the CSS flexbox working)
      • section: h2 ArviZ
      • section: h2 Example Gallery
    • Section: h1 Key Features - This entire section is a draft. I tossed all the sentences from the existing homepage into this section. (@OriolAbril - feel free to edit this part)
    • Section: h1 Support ArviZ
      • div: h3 Contributions
      • div: h3 Citation
      • div: h3 Sponsors
      • div: h3 Donate

    LMK what other changes are needed or if the CSS is too messy...(I could try to comment better or reorganize it) 😅


    :books: Documentation preview :books:: https://arviz--2119.org.readthedocs.build/en/2119/

  • plot_dist_comparison with kind='observed' generates one plot for each observation point

    plot_dist_comparison with kind='observed' generates one plot for each observation point

    Describe the bug: When I call plot_dist_comparison with kind="observed", it tries to make four plots for each data point in my model.

    To Reproduce

    from scipy.stats import gamma
    import pymc as pm
    import aesara.tensor as at
    import arviz as az
    
    obs = gamma.rvs(a=10, loc=1000, size=1000)
    
    
    with pm.Model() as sample:
      # Priors
      α = pm.Normal('α', 8, 7.6)
      shape = pm.Uniform('shape', 0, 100)
      
      # Log-link function
      μ = at.exp(α)
      
      # Likelihood
      pm.Gamma('severity', alpha=shape, beta=shape / μ, observed=obs)
      
      inference_data = pm.sample_prior_predictive()
      inference_data.extend(pm.sample())
      inference_data.extend(pm.sample_posterior_predictive(inference_data))
      
    az.plot_dist_comparison(inference_data, kind='observed')
    

    Expected behavior: I would expect a single plot for each of "observed_data", "prior_predictive", "posterior_predictive", and the combined plot, instead of one for each point of data.

    Additional context:
    arviz version: 0.12.1
    pymc version: 4.1.7
    Running on Azure Databricks

  • [Feature Request] Allow `transform` to support a dictionary

    [Feature Request] Allow `transform` to support a dictionary

    First off, thanks for the great project! I love it!

    I often use pymc to fit GLMs with a log-link function. This means that I am frequently interested in transforming some of the variables, but not others, by applying np.exp in the plots. To do this I currently need to plot the log-transformed variables separately from those that are not. I think it would be great if transform supported a dictionary syntax, for example the following:

    az.plot_forest(inference_data, transform={'α': np.exp})
    

    Now I would do something like:

    az.plot_forest(inference_data, var_names='α', transform=np.exp)
    # ...
    az.plot_forest(inference_data, var_names=['γ', 'σ'])
    
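    In the meantime, a possible workaround (untested sketch, assuming the inference_data from above) is to transform a copy of the posterior by hand and plot once:

    from copy import deepcopy

    import numpy as np
    import arviz as az

    idata_t = deepcopy(inference_data)  # avoid mutating the original
    idata_t.posterior["α"] = np.exp(idata_t.posterior["α"])  # transform only α
    az.plot_forest(idata_t, var_names=["α", "γ", "σ"])
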

    Thanks!

    If there is interest in this feature I'm willing to try to implement this, but may take me a while, time permitting...

  • nested-Rhat

    nested-Rhat

    Add nested-Rhat as described in

    • Charles C. Margossian, Matthew D. Hoffman, Pavel Sountsov, Lionel Riou-Durand, Aki Vehtari, Andrew Gelman (2022). Nested R̂: Assessing the convergence of Markov chain Monte Carlo when running many short chains. https://arxiv.org/abs/2110.13017

    This needs an additional superchain index (chains are grouped into superchains). It would probably be easiest to have an additional argument: an array with as many elements as there are chains, whose values give the superchain id for each chain.
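
    A rough sketch of the computation as described in the paper (the superchain_ids argument follows the proposal above; this is an untested sketch, not a reference implementation):

    import numpy as np

    def nested_rhat(draws, superchain_ids):
        # draws: array of shape (n_chains, n_draws)
        # superchain_ids: superchain id for each chain, as proposed above
        superchain_ids = np.asarray(superchain_ids)
        chain_means = draws.mean(axis=1)
        chain_vars = draws.var(axis=1, ddof=1)
        super_means, within = [], []
        for k in np.unique(superchain_ids):
            idx = superchain_ids == k
            super_means.append(chain_means[idx].mean())
            # within-superchain variance: mean within-chain variance plus
            # spread of the chain means around their superchain mean
            b_k = chain_means[idx].var() if idx.sum() > 1 else 0.0
            within.append(chain_vars[idx].mean() + b_k)
        b_hat = np.var(super_means)  # between-superchain variance
        w_hat = np.mean(within)      # mean within-superchain variance
        return np.sqrt(1 + b_hat / w_hat)

    # e.g. 64 chains grouped into 4 superchains of 16 chains each:
    # nested_rhat(draws, np.repeat(np.arange(4), 16))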
