Open Data Cube analyses continental-scale Earth Observation data through time

Open Data Cube Core


Overview

The Open Data Cube Core provides an integrated gridded data analysis environment for decades of analysis-ready Earth observation data from multiple satellites and other acquisition systems.

Documentation

See the user guide for installation and usage of the datacube, and for documentation of the API.

Join our Slack if you need help setting up or using the Open Data Cube.

Please help us to keep the Open Data Cube community open and inclusive by reading and following our Code of Conduct.

Requirements

System

  • PostgreSQL 10+
  • Python 3.8+

Developer setup

  1. Clone:

     git clone https://github.com/opendatacube/datacube-core.git

  2. Create a Python environment for using the ODC. We recommend conda as the easiest way to handle Python dependencies.

     conda create -n odc -c conda-forge python=3.8 datacube pre_commit
     conda activate odc

  3. Install a development version of datacube-core.

     cd datacube-core
     pip install --upgrade -e .

  4. Install the pre-commit hooks to help follow ODC coding conventions when committing with git.

     pre-commit install

  5. Run unit tests + PyLint:

     ./check-code.sh

     (This script approximates what is run by Travis; you can alternatively run pytest yourself.) Some test dependencies may need to be installed; attempt to install them with:

     pip install --upgrade -e '.[test]'

     If installing these fails, please lodge an issue.

  6. (or) Run all tests, including integration tests:

     ./check-code.sh integration_tests

    • Assumes a password-less Postgres database called agdcintegration running on localhost.
    • Otherwise, copy integration_tests/agdcintegration.conf to ~/.datacube_integration.conf and edit it to customise.

Alternatively, you can use the opendatacube/datacube-tests Docker image to run the tests. This image includes a database server pre-configured for running the integration tests. Add the --with-docker command-line option as the first argument to the ./check-code.sh script.

./check-code.sh --with-docker integration_tests

Developer setup on Ubuntu

Building a Python virtual environment on Ubuntu suitable for development work.

Install dependencies:

sudo apt-get update
sudo apt-get install -y \
  autoconf automake build-essential make cmake \
  graphviz \
  python3-venv \
  python3-dev \
  libpq-dev \
  libyaml-dev \
  libnetcdf-dev \
  libudunits2-dev

Build the python virtual environment:

pyenv="${HOME}/.envs/odc"  # Change to suit your needs
mkdir -p "${pyenv}"
python3 -m venv "${pyenv}"
source "${pyenv}/bin/activate"
pip install -U pip wheel cython numpy
pip install -e '.[dev]'
pip install flake8 mypy pylint autoflake black
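
After installation, a quick way to confirm the environment works is to import the package and print its version (a minimal sketch; constructing a Datacube object additionally requires a configured database, so it is left commented out):

import datacube

print(datacube.__version__)

# Connecting to a database is only possible once one is configured
# (e.g. via ~/.datacube.conf):
# dc = datacube.Datacube()
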
Comments
  • [proposal] Add support for 3D datasets

    [proposal] Add support for 3D datasets

    There are soil and weather datasets that use a third height/Z dimension for storing data. It would be nice to be able to have ODC optionally support datasets with this dimension.

    Is there interest in adding this behavior to ODC?

    @alfredoahds
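
    Purely for illustration (this is an assumption about the shape of such data, not an existing ODC API): a raster cube with an extra vertical dimension is easy to express in xarray, which is what ODC would ultimately need to load and index.

    import numpy as np
    import xarray as xr

    # Hypothetical soil-moisture cube with a vertical "z" (depth) dimension
    # alongside the usual time/y/x dimensions that ODC handles today.
    cube = xr.DataArray(
        np.random.rand(2, 3, 100, 100),
        dims=("time", "z", "y", "x"),
        coords={
            "time": np.array(["2020-01-01", "2020-01-02"], dtype="datetime64[ns]"),
            "z": [0.05, 0.15, 0.30],  # depth levels in metres
        },
        name="soil_moisture",
    )
    print(cube.sel(z=0.15).dims)  # ('time', 'y', 'x')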

  • Dockerfile and third-party libs

    Dockerfile and third-party libs

    There has been some discussion on Slack related to Dockerfile and various choices that were made with respect to choice of pre-compiled third-party libs. Since Slack message threads disappear let's continue this discussion here.

    Some facts about the current system, in no particular order:

    • The Docker image is based on ubuntu:18.04
    • It uses ppa:nextgis/ppa to get a more recent libgdal
    • It builds the Python bindings for GDAL
    • It installs rasterio as a binary wheel, so rasterio ships its own version of libgdal
    • rasterio also ships its own version of libcurl, compiled on a Red Hat derivative, hence the symlink workaround for dealing with the ca-certificates location
    • Tests use requirements-test.txt to pin third-party dependencies; this minimizes false positives where the error is in the environment setup rather than in our code
    • Tests run on Travis, which uses Ubuntu 16.04 and also uses the nextgis/ppa to get a workable environment

    @woodcockr reported on Slack that he has issues installing shapely in the default docker environment.

  • Amazon S3 support

    Amazon S3 support

    When running Open Data Cube in the cloud, I would like to have datasets in Amazon S3 buckets without having to store them in my EC2 instance. I have seen that in datacube-core release 1.5.2 the new features

    • Support for AWS S3 array storage
    • Driver Manager support for NetCDF, S3, S3-file drivers

    were added. I have read the little documentation there is on these features, but I am confused. Is there any documentation on what these features are capable of, or any examples of how to use them?

  • Datacube.load performance for multi band netCDF data

    Datacube.load performance for multi band netCDF data

    Expected behaviour

    Something comparable to xarray.open_dataset('file_to_load.nc')

    Actual behaviour

    On the same infrastructure, the current datacube.load(...) call that loads the same dataset/file is significantly slower: xarray load time = ~8 ms, datacube load = ~28m.

    Simple comparison

    [screenshot: timing comparison]

    Steps to reproduce the behaviour

    ... Include code, command line parameters as appropriate ...
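
    A minimal comparison sketch (assumptions: the file path matches the gdalinfo output below, and "qtot_avg" with the 1912 time range is a placeholder for the indexed product name and query):

    import time

    import xarray as xr
    import datacube

    nc_path = "/data/qtot/qtot_avg_1912.nc"

    # Load the file directly with xarray
    t0 = time.perf_counter()
    ds_xr = xr.open_dataset(nc_path)
    print("xarray.open_dataset:", time.perf_counter() - t0, "seconds")

    # Load the equivalent data through the datacube API
    # (product name and time range are placeholders)
    dc = datacube.Datacube()
    t0 = time.perf_counter()
    ds_dc = dc.load(product="qtot_avg", time=("1912-01-01", "1912-12-31"))
    print("datacube.load:", time.perf_counter() - t0, "seconds")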

    Environment information

    • Which datacube --version are you using? Open Data Cube core, version 1.7

    • What datacube deployment/environment are you running against? CSIRO (@woodcockr) internal deployment

    netCDF metadata

    gdalinfo (output is truncated as there are 366 bands)

    !gdalinfo /data/qtot/qtot_avg_1912.nc
    Warning 1: No UNIDATA NC_GLOBAL:Conventions attribute
    Driver: netCDF/Network Common Data Format
    Files: /data/qtot/qtot_avg_1912.nc
    Size is 841, 681
    Coordinate System is `'
    Origin = (111.974999999999994,-9.975000000000000)
    Pixel Size = (0.050000000000000,-0.050000000000000)
    Metadata:
      latitude#long_name=latitude
      latitude#name=latitude
      latitude#standard_name=latitude
      latitude#units=degrees_north
      longitude#long_name=longitude
      longitude#name=longitude
      longitude#standard_name=longitude
      longitude#units=degrees_east
      NC_GLOBAL#var_name=qtot_avg
      NETCDF_DIM_EXTRA={time}
      NETCDF_DIM_time_DEF={366,4}
      NETCDF_DIM_time_VALUES={4382,4383,4384,4385,4386,4387,4388,4389,4390,4391,4392,4393,4394,4395,4396,4397,4398,4399,4400,4401,4402,4403,4404,4405,4406,4407,4408,4409,4410,4411,4412,4413,4414,4415,4416,4417,4418,4419,4420,4421,4422,4423,4424,4425,4426,4427,4428,4429,4430,4431,4432,4433,4434,4435,4436,4437,4438,4439,4440,4441,4442,4443,4444,4445,4446,4447,4448,4449,4450,4451,4452,4453,4454,4455,4456,4457,4458,4459,4460,4461,4462,4463,4464,4465,4466,4467,4468,4469,4470,4471,4472,4473,4474,4475,4476,4477,4478,4479,4480,4481,4482,4483,4484,4485,4486,4487,4488,4489,4490,4491,4492,4493,4494,4495,4496,4497,4498,4499,4500,4501,4502,4503,4504,4505,4506,4507,4508,4509,4510,4511,4512,4513,4514,4515,4516,4517,4518,4519,4520,4521,4522,4523,4524,4525,4526,4527,4528,4529,4530,4531,4532,4533,4534,4535,4536,4537,4538,4539,4540,4541,4542,4543,4544,4545,4546,4547,4548,4549,4550,4551,4552,4553,4554,4555,4556,4557,4558,4559,4560,4561,4562,4563,4564,4565,4566,4567,4568,4569,4570,4571,4572,4573,4574,4575,4576,4577,4578,4579,4580,4581,4582,4583,4584,4585,4586,4587,4588,4589,4590,4591,4592,4593,4594,4595,4596,4597,4598,4599,4600,4601,4602,4603,4604,4605,4606,4607,4608,4609,4610,4611,4612,4613,4614,4615,4616,4617,4618,4619,4620,4621,4622,4623,4624,4625,4626,4627,4628,4629,4630,4631,4632,4633,4634,4635,4636,4637,4638,4639,4640,4641,4642,4643,4644,4645,4646,4647,4648,4649,4650,4651,4652,4653,4654,4655,4656,4657,4658,4659,4660,4661,4662,4663,4664,4665,4666,4667,4668,4669,4670,4671,4672,4673,4674,4675,4676,4677,4678,4679,4680,4681,4682,4683,4684,4685,4686,4687,4688,4689,4690,4691,4692,4693,4694,4695,4696,4697,4698,4699,4700,4701,4702,4703,4704,4705,4706,4707,4708,4709,4710,4711,4712,4713,4714,4715,4716,4717,4718,4719,4720,4721,4722,4723,4724,4725,4726,4727,4728,4729,4730,4731,4732,4733,4734,4735,4736,4737,4738,4739,4740,4741,4742,4743,4744,4745,4746,4747}
      qtot_avg#long_name=Total runoff: averaged across both HRUs (mm)
      qtot_avg#name=qtot_avg
      qtot_avg#standard_name=qtot_avg
      qtot_avg#units=mm
      qtot_avg#_FillValue=-999
      time#calendar=gregorian
      time#long_name=time
      time#name=time
      time#standard_name=time
      time#units=days since 1900-01-01
    Corner Coordinates:
    Upper Left  ( 111.9750000,  -9.9750000) 
    Lower Left  ( 111.9750000, -44.0250000) 
    Upper Right ( 154.0250000,  -9.9750000) 
    Lower Right ( 154.0250000, -44.0250000) 
    Center      ( 133.0000000, -27.0000000) 
    Band 1 Block=50x1 Type=Float32, ColorInterp=Undefined
      NoData Value=-999
      Unit Type: mm
      Metadata:
        long_name=Total runoff: averaged across both HRUs (mm)
        name=qtot_avg
        NETCDF_DIM_time=4382
        NETCDF_VARNAME=qtot_avg
        standard_name=qtot_avg
        units=mm
        _FillValue=-999
    

    ncdump -h

    netcdf qtot_avg_1912 {
    dimensions:
    	time = UNLIMITED ; // (366 currently)
    	latitude = 681 ;
    	longitude = 841 ;
    variables:
    	int time(time) ;
    		time:name = "time" ;
    		time:long_name = "time" ;
    		time:calendar = "gregorian" ;
    		time:units = "days since 1900-01-01" ;
    		time:standard_name = "time" ;
    	double latitude(latitude) ;
    		latitude:name = "latitude" ;
    		latitude:long_name = "latitude" ;
    		latitude:units = "degrees_north" ;
    		latitude:standard_name = "latitude" ;
    	double longitude(longitude) ;
    		longitude:name = "longitude" ;
    		longitude:long_name = "longitude" ;
    		longitude:units = "degrees_east" ;
    		longitude:standard_name = "longitude" ;
    	float qtot_avg(time, latitude, longitude) ;
    		qtot_avg:_FillValue = -999.f ;
    		qtot_avg:name = "qtot_avg" ;
    		qtot_avg:long_name = "Total runoff: averaged across both HRUs (mm)" ;
    		qtot_avg:units = "mm" ;
    		qtot_avg:standard_name = "qtot_avg" ;
    
    // global attributes:
    		:var_name = "qtot_avg" ;
    }
    
  • ALOS-2 yaml

    ALOS-2 yaml

    Writing to see if you might be able to help us with a VNDC-related issue, please.

    RESTEC are having some issues around the choice of data types in the following files, which are the files Vietnam uses to ingest their ALOS-2 data.

    https://github.com/vndatacube/odc-config-files/blob/master/alos/alos2_tile_productdef.yaml

    https://github.com/vndatacube/odc-config-files/blob/master/alos/alos2_tile_wgs84_50m.yaml

    Okumura-san has confirmed that the data types used here match the data definition. With that said, when the RESTEC team try to run their Python notebook, the dc.load step produces the following error for the incidence angle and mask products:

    TypeError: Cannot cast scalar from dtype('float32') to dtype('uint8') according to the rule 'same_kind'

    FYI, they are running Python 3.5.
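
    For reference, the same casting failure can be reproduced with NumPy alone (a minimal sketch; the -999 value is just an example float32 scalar):

    import numpy as np

    dst = np.zeros(4, dtype="uint8")

    # Raises: TypeError: Cannot cast scalar from dtype('float32') to
    # dtype('uint8') according to the rule 'same_kind'
    np.copyto(dst, np.float32(-999), casting="same_kind")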

    They are able to work around the issue. Two workarounds exist: using int8 (which has issues with negative values, but these can be further worked around) or using int16 (which works without errors, but uses more resources).

    To get this notebook running, there are three options that we see:

    1. Implement a workaround – but this is not ideal, as it would have to be done in every application and (in the int16 case) uses more resources.
    2. Change the VN Cube yamls and re-ingest all of the VN Cube data using the data types that work. However, they would prefer not to change these values, as they are consistent with the data definition, and also wish to avoid re-ingesting all the data.
    3. Edit the load() function to manage the data types correctly. They would need assistance from the core developers to do this.

    Are you able to advise please? Let me know if you need any more info.

    Many thanks in advance.

  • Celery runner

    Celery runner

    Overview

    New executor that uses Celery (with Redis as broker and data backend).

    This provides an alternative to the current setup (dask.distributed). The problem with using dask.distributed is that it requires tasks to be idempotent, since it will sometimes schedule the same task in parallel on different nodes. With many tasks doing I/O, this creates problems.

    Celery, in comparison, has a much simpler execution model and doesn't have the same constraints.

    Redis backend

    Celery supports a number of backends; of these, two are fully supported: RabbitMQ and Redis. I have picked Redis as it is the simplest to get running without root access (NCI environment).

    data_task_options

    A celery option is added to the --executor command-line argument, taking the same host:port argument. The Celery executor will connect to the Redis instance at the given address; if the address is localhost and Redis is not running, it will be launched for the duration of the execution. Workers are not launched automatically, however, so in most cases the app will stall until workers are added to the processing pool (see datacube-worker).

    $HOME/.datacube-redis contains the Redis password; if this file doesn't exist, it will be created with a randomly generated password when the Redis server is launched.

    An executor alias dask is also added, equivalent to distributed. However, now that we have two distributed backends, we should probably favour dask as the name for the dask.distributed backend.

    datacube-worker

    A new app, datacube-worker, was added to support launching workers in either celery or dask mode. It accepts the same --executor option as the task app.

  • Empty NetCDF file was created

    Empty NetCDF file was created

    I tried to ingest a granule from Sentinel-2.

    Configuration:

    The file creation completes without problems. But when I read the bands contained in the NetCDF file (reading each band as an array), all values are -999 (due to my nodata configuration).

    I have verified that the file is not corrupt and that the values in the JP2 files aren't empty.

    Thanks

  • Csiro/s3 driver

    Csiro/s3 driver

    Reason for this pull request

    Improvements to DataCube:

    • S3 storage backend
      • Windows is not supported.
    • driver manager to dynamically load/switch storage drivers (NetCDF, S3, S3-test).
    • Ingest now supports creation of nD Storage Units in the available storage drivers, e.g.:
      • NetCDF: datacube -v ingest -c ls5_nbar_albers.yaml --executor multiproc 8
      • S3: datacube --driver s3 -v ingest -c ls5_nbar_albers_s3.yaml --executor multiproc 8
      • S3-test: datacube --driver s3-test -v ingest -c ls5_nbar_albers_s3_test.yaml --executor multiproc 8
    • load and load_data have optional multi-threaded support via the use_threads flag.

    Improved testing:

    • tests are run for each driver, where possible.
    • tests corner values
    • tests md5 hash equality on load_data
    • tests multiple time slices
    • reduction in data usage
    • reduction in total number of concurrent db connections.

    Proposed changes

    • CLI driver parameter to select the driver; if None, it defaults to NetCDF.
    • support for generating n-dimension storage units on ingest.
      • example ingest yaml: docs/config_samples/ingester/ls5_nbar_nbar_s3.yaml
    • supported drivers:
      • NetCDF: based on existing driver.
      • S3: S3 backend for storage.
      • S3-test: Same as S3 but emulated on disk.
    • datacube.api.load_data - use_threads parameter to enable threaded _fuse_measurement with results stored in a shared memory array.
    • datacube.scripts.ingest - uses slightly modified GridWorkFlow to generate 3D tasks for 3D Storage Unit creation.
    • optional creation of s3 tables via "datacube -v system init -s3"

    Todo:

    • More tests.

    • [ ] Closes #xxxx

    • [ ] Tests added / passed

    • [ ] Fully documented, including docs/about/whats_new.rst for all changes

  • Trouble Ingesting USGS Landsat and MODIS data

    Trouble Ingesting USGS Landsat and MODIS data

    When I built a datacube on my server, I had already installed datacube and initialised the database. How can I index Landsat data such as LC80090452014008LGN00 stored as TIFF files? I tried to use the ‘usgslsprepare.py’ script:

    python datacube-core/utils/usgslsprepare.py /home/tensorx/data/datacube/landsat/*/

    but it failed:

    2017-06-19 15:39:57,274 INFO Processing /home/tensorx/data/datacube/landsat/LC80090452014008LGN00
    Traceback (most recent call last):
      File "datacube-core/utils/usgslsprepare.py", line 265, in <module>
        main()
      File "/home/tensorx/miniconda2/envs/datacube3/lib/python3.5/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/home/tensorx/miniconda2/envs/datacube3/lib/python3.5/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/home/tensorx/miniconda2/envs/datacube3/lib/python3.5/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/tensorx/miniconda2/envs/datacube3/lib/python3.5/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "datacube-core/utils/usgslsprepare.py", line 256, in main
        documents = prepare_datasets(path)
      File "datacube-core/utils/usgslsprepare.py", line 241, in prepare_datasets
        nbar = prep_dataset(fields, nbar_path)
      File "datacube-core/utils/usgslsprepare.py", line 163, in prep_dataset
        with open(os.path.join(str(path), metafile)) as f:
    UnboundLocalError: local variable 'metafile' referenced before assignment
    

    How can I fix this?
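
    For context, the traceback shows the classic pattern where a variable is only assigned inside a conditional that never fires for the given input; a schematic illustration (not the actual usgslsprepare.py code, and the metadata filename convention is an assumption):

    import os

    def find_metadata(path):
        for fname in os.listdir(path):
            if fname.endswith("MTL.txt"):  # assumed naming convention
                metafile = fname
        # If no matching file was found above, `metafile` was never assigned,
        # so the next line raises UnboundLocalError.
        return open(os.path.join(path, metafile))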

  • Better integration with notebook display for core data types

    Better integration with notebook display for core data types

    Summary

    Many datacube users work mostly in notebooks, yet none of our classes take advantage of the rich display capabilities provided by the notebook environment. Here is an example of how easy it is to add a “display on a map” feature to the GeoBox class, leveraging the existing Jupyter ecosystem:

    [screenshot: GeoBox displayed as GeoJSON on an interactive map]

    The full notebook is here: https://gist.github.com/Kirill888/4ce2f64413e660d1638afa23eede6eb0

    Proposal

    1. Implement _ipython_display_ methods on important objects like GeoBox and Dataset
      • Take advantage of the GeoJSON module when available, falling back to a textual representation otherwise (see the sketch below)
    2. Update documentation/example notebooks with instructions on how to best take advantage of the rich display ecosystem available inside the Jupyter environment
    3. Update the various Docker files we have sitting around to include the GeoJSON nbextension, and possibly others
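
    A minimal sketch of point 1 (assuming IPython's built-in GeoJSON display class; GeoBoxLike and its extent_geojson attribute are illustrative stand-ins, not the real GeoBox API):

    from IPython.display import GeoJSON, display

    class GeoBoxLike:
        """Illustrative stand-in for GeoBox, holding a GeoJSON footprint."""

        def __init__(self, extent_geojson: dict):
            self.extent_geojson = extent_geojson

        def _ipython_display_(self):
            # Render the footprint on a map when the GeoJSON nbextension is
            # available; fall back to a textual representation otherwise.
            try:
                display(GeoJSON(self.extent_geojson))
            except Exception:
                print(repr(self.extent_geojson))
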
  • 'DatasetType' object has no attribute '_all_measurements' with dask

    'DatasetType' object has no attribute '_all_measurements' with dask

    Hey all, I am receiving an error when using dask to perform some computations. I'm not entirely sure whether this is an ODC, Dask, or xarray issue, though. Any help would be appreciated. The code example below was extracted from a Jupyter notebook.

    Expected behaviour

    Computed data and plot

    Actual behaviour

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-15-37a4078ad80d> in <module>
    ----> 1 resampled.compute().plot(size=10)
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/xarray/core/dataarray.py in compute(self, **kwargs)
        832         """
        833         new = self.copy(deep=False)
    --> 834         return new.load(**kwargs)
        835 
        836     def persist(self, **kwargs) -> "DataArray":
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/xarray/core/dataarray.py in load(self, **kwargs)
        806         dask.array.compute
        807         """
    --> 808         ds = self._to_temp_dataset().load(**kwargs)
        809         new = self._from_temp_dataset(ds)
        810         self._variable = new._variable
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/xarray/core/dataset.py in load(self, **kwargs)
        652 
        653             # evaluate all the dask arrays simultaneously
    --> 654             evaluated_data = da.compute(*lazy_data.values(), **kwargs)
        655 
        656             for k, data in zip(lazy_data, evaluated_data):
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
        450         postcomputes.append(x.__dask_postcompute__())
        451 
    --> 452     results = schedule(dsk, keys, **kwargs)
        453     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
        454 
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
       2712                     should_rejoin = False
       2713             try:
    -> 2714                 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
       2715             finally:
       2716                 for f in futures.values():
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
       1991                 direct=direct,
       1992                 local_worker=local_worker,
    -> 1993                 asynchronous=asynchronous,
       1994             )
       1995 
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
        832         else:
        833             return sync(
    --> 834                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
        835             )
        836 
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
        337     if error[0]:
        338         typ, exc, tb = error[0]
    --> 339         raise exc.with_traceback(tb)
        340     else:
        341         return result[0]
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/distributed/utils.py in f()
        321             if callback_timeout is not None:
        322                 future = asyncio.wait_for(future, callback_timeout)
    --> 323             result[0] = yield future
        324         except Exception as exc:
        325             error[0] = sys.exc_info()
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/tornado/gen.py in run(self)
        733 
        734                     try:
    --> 735                         value = future.result()
        736                     except Exception:
        737                         exc_info = sys.exc_info()
    
    /opt/conda/envs/datacube/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
       1850                             exc = CancelledError(key)
       1851                         else:
    -> 1852                             raise exception.with_traceback(traceback)
       1853                         raise exc
       1854                     if errors == "skip":
    
    /home/ubuntu/.local/lib/python3.6/site-packages/datacube/api/core.py in fuse_lazy()
    
    /home/ubuntu/.local/lib/python3.6/site-packages/datacube/api/core.py in _fuse_measurement()
    
    /home/ubuntu/.local/lib/python3.6/site-packages/datacube/storage/_base.py in __init__()
    
    /home/ubuntu/.local/lib/python3.6/site-packages/datacube/model/__init__.py in lookup_measurements()
    
    /home/ubuntu/.local/lib/python3.6/site-packages/datacube/model/__init__.py in _resolve_aliases()
    
    AttributeError: 'DatasetType' object has no attribute '_all_measurements'
    

    Steps to reproduce the behaviour

    from dask.distributed import Client
    import datacube
    import xarray
    client = Client('cluster_address')
    dc = datacube.Datacube()
    query = {
        'lat': (48.15, 48.35),
        'lon': (16.3, 16.5),
        'time': ('2017-01-01', '2020-12-31')
    }
    data = dc.load(product='product', 
                   output_crs='EPSG:32633', 
                   resolution=(-10,10),
                   dask_chunks={'x': 250, 'y': 250, 'time':20},
                    **query)
    arr = data.band_1
    resampled = arr.resample(time='1w').mean().mean(axis=(1,2))
    resampled.compute().plot(size=10)
    

    Environment information

    There are some slight mismatches in the package versions; I hope these aren't the issue:

    /opt/conda/envs/datacube/lib/python3.6/site-packages/distributed/client.py:1130: VersionMismatchWarning: Mismatched versions found
    
    +-------------+----------------+---------------+---------------+
    | Package     | client         | scheduler     | workers       |
    +-------------+----------------+---------------+---------------+
    | dask        | 2.27.0         | 2.27.0        | 2.28.0        |
    | distributed | 2.27.0         | 2.27.0        | 2.28.0        |
    | numpy       | 1.19.1         | 1.19.2        | 1.19.2        |
    | python      | 3.6.11.final.0 | 3.6.9.final.0 | 3.6.9.final.0 |
    | toolz       | 0.11.1         | 0.10.0        | 0.11.1        |
    +-------------+----------------+---------------+---------------+
      warnings.warn(version_module.VersionMismatchWarning(msg[0]["warning"]))
    
    • Which datacube --version are you using? 1.8.3
    • What datacube deployment/environment are you running against? JupyterHub running in Docker
  • Database Transaction API - as per EP07.

    Database Transaction API - as per EP07.

    Reason for this pull request

    As discussed in Enhancement Proposal 07 (EP07).

    Proposed changes

    Simple example of new API:

    with dc.index.transaction() as trans:
       # Archive old datasets and add new ones in single transaction
       dc.index.datasets.archive([old_ds1.id, old_ds2.id], transaction=trans)
       dc.index.datasets.add(ds1)
       dc.index.datasets.add(ds2)
    
       # If execution gets to here, the transaction is committed.
       # If an exception was raised by any of the above methods, the transaction is rolled back.
    

    Further details are discussed in Enhancement Proposal 07. The API is documented in docstrings.

    macOS and Windows Conda smoke tests are failing again - some issue with GDAL.

    • [x] Implements EP07
    • [x] Tests added / passed
    • [x] Fully documented, including docs/about/whats_new.rst for all changes
  • datacube core api search for latest datasets

    datacube core api search for latest datasets

    Expected behaviour

    A way to get the latest dataset via the API.

    Actual behaviour

    Not available via the API; an alternative query was suggested by @SpacemanPaul:

    There's no real way to do that through datacube API unfortunately.  You would have to use SQL:
    
    select id
    from agdc.dataset ds, agdc.dataset_type prod
    where ds.dataset_type_ref = prod.id
    and prod.name = 'product_name'
    and ds.added = (select max(added) from agdc.dataset ds2);
    
    (And it probably won't be a very efficient query either)
    

    Steps to reproduce the behaviour

    Nil

    Environment information

    • Which datacube --version are you using?
    • What datacube deployment/environment are you running against?

    Note: Stale issues will be automatically closed after a period of six months with no activity. To ensure critical issues are not closed, tag them with the Github pinned tag. If you are a community member and not a maintainer please escalate this issue to maintainers via GIS StackExchange or Slack.

  • Updating of docker image for GH automated testing is problematic.

    Updating of docker image for GH automated testing is problematic.

    Expected behaviour

    Create a PR that modifies the python environment (e.g. setup.py, docker/constraints.txt) and makes code changes that depend on that environment. Automated tests run on github using a docker image built from the changes in the PR. Tests pass.

    Actual behaviour

    Automated tests run on GitHub using the Docker image last pushed to Docker Hub, which does not include the environment changes made in the PR. Tests fail.

    Steps to reproduce the behaviour

    Create a PR that adds a new library dependency, and imports from that library.

  • Add a pre-commit hook to verify license headers

    Add a pre-commit hook to verify license headers

    We use pre-commit to check for trivial errors in code before committing it. It's useful for keeping formatting consistent and catching errors early that can be picked up with a code linter. Tight feedback loops are better than long ones.

    I think it would be worth setting up a pre-commit hook that checks license headers, to ensure any files we add or change still include the correct ODC license header.
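
    A rough sketch of what such a local hook script could look like (the header marker text is an assumption; an off-the-shelf license-header hook could equally be used):

    #!/usr/bin/env python3
    """Fail the commit if a staged Python file lacks the ODC license header."""
    import sys

    # Assumed marker from the ODC Apache-2.0 header; adjust to the real text.
    HEADER_MARKER = "Apache License, Version 2.0"

    def has_header(path, lines_to_check=10):
        with open(path, encoding="utf-8") as f:
            head = [next(f, "") for _ in range(lines_to_check)]
        return any(HEADER_MARKER in line for line in head)

    def main(paths):
        missing = [p for p in paths if p.endswith(".py") and not has_header(p)]
        for p in missing:
            print(f"Missing license header: {p}")
        return 1 if missing else 0

    if __name__ == "__main__":
        # pre-commit passes the staged file names as arguments.
        sys.exit(main(sys.argv[1:]))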

  • Programmatic datacube initialization in a virtual environment

    Programmatic datacube initialization in a virtual environment

    Expected behaviour

    It should be possible to initialize the datacube programmatically.

    Actual behaviour

    It is possible to run something like this from within Python:

    subprocess.check_call(["python", "-m", "datacube.scripts.cli_app", "-v", "system", "init"])
    

    However, it fails when the Python interpreter is not on the PATH (such as when running a conda environment in WSL2), as the Python interpreter that gets called is a different one (for example, /usr/bin/python, as checked by running subprocess.check_call(["which", "python"])).

    Steps to reproduce the behaviour

    • Prepare an empty PostgreSQL database using docker compose
    • Install PyCharm Professional edition
    • Install WSL2
    • Install conda in WSL2
    • Create a conda environment
    • In PyCharm, create a WSL2 project linking to the Python interpreter in the conda environment (not the default Python interpreter)
    • Execute code like this:
    import os
    import subprocess
    from typing import Optional

    def init_database(hostname: str, port: Optional[int] = None, username: Optional[str] = None,
                      password: Optional[str] = None, database: str = "datacube"):
    
        # The initialization may be done using the following environment variables
        # DB_HOSTNAME, DB_USERNAME, DB_PASSWORD and DB_DATABASE
        if port is not None:
            os.environ["DB_HOSTNAME"] = f"{hostname}:{port}"
        else:
            os.environ["DB_HOSTNAME"] = hostname
        if username:
            os.environ["DB_USERNAME"] = username
        if password:
            os.environ["DB_PASSWORD"] = password
        os.environ["DB_DATABASE"] = database
    
        # Execute the command
        from datacube.scripts.system import database_init
        subprocess.check_call(["which", "python"])
        subprocess.check_call(["python", "-m", "datacube.scripts.cli_app", "-v", "system", "init"])
    
    init_database(...)
    

    Environment information

    datacube=1.8.7

    Additional comments

    I have seen that there are functions for initializing the database in https://github.com/opendatacube/datacube-core/blob/629fc6475618024455ed0377c13ca78333179169/datacube/scripts/system.py#L48, but I do not know what argument to pass as index.

    And thanks for your great project!
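
    A minimal sketch of a possible workaround (an assumption, not part of the original report): invoking the CLI module with sys.executable guarantees the subprocess uses the same interpreter, and hence the same environment, as the calling code.

    import subprocess
    import sys

    # Use the interpreter running this script (e.g. the conda env in WSL2),
    # rather than whatever "python" resolves to on PATH.
    subprocess.check_call(
        [sys.executable, "-m", "datacube.scripts.cli_app", "-v", "system", "init"]
    )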

  • Allow simple defaults for crs, resolution and alignment

    Allow simple defaults for crs, resolution and alignment

    Reason for this pull request

    It would be convenient if dc.load(), when all the datasets to load have the same CRS, resolution and alignment, used those values as defaults (when they are not given as parameters, via like, or via a grid_spec in the product).

    Proposed changes

    • If output_crs, resolution and/or align are not specified in dc.load(), and neither is like, and the product has no grid_spec, and all the datasets to load share the same CRS, resolution and/or alignment, then use those values as defaults (see the sketch below).
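
    A rough sketch of the defaulting logic (illustrative only; resolution_of() is a hypothetical helper standing in for however per-dataset resolution/alignment is actually derived in core):

    def derive_load_defaults(datasets, output_crs=None, resolution=None, resolution_of=None):
        """Fill in output_crs/resolution only when every dataset agrees on them."""
        if output_crs is None:
            crss = {ds.crs for ds in datasets}
            if len(crss) == 1:
                output_crs = crss.pop()

        if resolution is None and resolution_of is not None:
            # resolution_of(ds) is a placeholder returning a dataset's (y, x) resolution
            resolutions = {resolution_of(ds) for ds in datasets}
            if len(resolutions) == 1:
                resolution = resolutions.pop()

        return output_crs, resolution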

    • [X] Closes #1301

    • [x] Tests passed (existing ones, no new tests added for this yet)

    • [ ] Fully documented, including docs/about/whats_new.rst for all changes
