TorchGeo

TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.

The goal of this library is to make it simple:

  1. for machine learning experts to use geospatial data in their workflows, and
  2. for remote sensing experts to use their data in machine learning workflows.

See our installation instructions, documentation, and examples to learn how to use torchgeo.

Installation instructions

Until the first release, you can install an environment compatible with torchgeo with conda, pip, or spack as shown below.

Conda

Note: if you do not have access to a GPU or are running on macOS, replace pytorch-gpu with pytorch-cpu in the environment.yml file.

$ conda config --set channel_priority strict
$ conda env create --file environment.yml
$ conda activate torchgeo

Pip

With Python 3.6 or later:

$ pip install -r requirements.txt

Spack

$ spack env activate .
$ spack install

Documentation

You can find the documentation for torchgeo on ReadTheDocs.

Example usage

The following sections give basic examples of what you can do with torchgeo. For more examples, check out our tutorials.

Train and test models using our PyTorch Lightning based training script

We provide a script, train.py, for training models on a subset of the datasets. It uses the PyTorch Lightning LightningModules and LightningDataModules implemented under the torchgeo.trainers namespace. The train.py script is configurable via the command line and/or via YAML configuration files. See the conf/ directory for example configuration files that can be customized for different training runs.

$ python train.py config_file=conf/landcoverai.yaml

Download and use the Tropical Cyclone Wind Estimation Competition dataset

This dataset is from a competition hosted by Driven Data in collaboration with Radiant Earth. See here for more information.

Using this dataset in torchgeo is as simple as importing and instantiating the appropriate class.

import torchgeo.datasets

dataset = torchgeo.datasets.TropicalCycloneWindEstimation(split="train", download=True)
print(dataset[0]["image"].shape)
print(dataset[0]["wind_speed"])

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Comments
  • Increase coverage of trainers

    Since we no longer run integration tests on main, our trainer modules are currently the least well-tested code. This PR attempts to add unit tests for these trainers. I'm a bit stuck at the moment, so opening this up for feedback on better ways to test this code while I focus on more important things.

  • Dependabot for more stable CI

    Our CI has been incredibly unstable lately. Every time a new version of a dependency is released, something in our tests breaks, especially mypy.

    Using dependabot, we can pin all of our dependencies to a specific version. The bot will then periodically check for updates and open a PR to update the dependency version. That way, only the version update PR will break, not everyone else's PRs.

    Side note: a lot of our deps don't yet have wheels for Python 3.10, or have never had wheels for Windows. I was thinking about switching from pip to conda for all CI. Unfortunately, it looks like dependabot only supports pip, not conda.

  • Datasets: consistent capitalization of band names

    There seems to be a 50/50 mix of RGB_BANDS/ALL_BANDS and rgb_bands/all_bands in our datasets. The GeoDataset base class uses lowercase, so this PR changes all other datasets to match. From what I can tell, PEP-8 doesn't seem to distinguish between class attributes and instance attributes, so I'm not sure if these are considered variables or global variables.

    @ashnair1 not sure if this affects #687

  • IDTReeS bugs

    Description

    While using IDTReeS to train an object detector, I noticed quite a few bugs within this dataset.

    1. Coordinates are swapped

    Fixed by #683

    [Image: OSBS_27_before]

    The above image is a comparison of the actual ground truth of OSBS_27 and the plot produced by torchgeo.

    2. Missing polygon

    Shapefile contains 5 polygons whereas the plot only shows 4. This issue seems to be at the dataset level.

    [Image: OSBS_27_missingpoly]

    3. Negative box coordinates in pixel space

    This is a weird one. I noticed that some boxes had negative pixel coordinates. I observed this in MLBS_5, but it's possible that it is present in others as well.

    Comparing QGIS and the corrected torchgeo plot for MLBS_5 didn't reveal much.

    [Image: MLBS_5_initial]

    As you can see, the top 4 polygons are missing. Unlike the previous issue, these polygons do exist in the data. However, they seem to lie outside the bounds of the image.

    [Image: MLBS_5_second]

    This could explain the negative coordinates, but I'm not sure why there is a difference between the plots.
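
To illustrate how a polygon outside the image bounds can end up with negative pixel coordinates, here is a minimal sketch. It assumes a north-up raster with square pixels; the function world_to_pixel and all numbers are made up for illustration and are not taken from IDTReeS or torchgeo.

```python
def world_to_pixel(x, y, minx, maxy, res):
    """Map a geographic (x, y) point to (col, row) pixel coordinates.

    Assumes a north-up raster whose top-left corner is (minx, maxy)
    and whose pixels are square with side length `res`.
    """
    col = (x - minx) / res
    row = (maxy - y) / res
    return col, row


# Illustrative raster: top-left corner at (100.0, 200.0), 0.5 units per pixel.
inside = world_to_pixel(105.0, 195.0, minx=100.0, maxy=200.0, res=0.5)
outside = world_to_pixel(105.0, 203.0, minx=100.0, maxy=200.0, res=0.5)

print(inside)   # (10.0, 10.0)
print(outside)  # (10.0, -6.0): the point lies above the top edge
```

Under these assumptions, any vertex north of the top edge or west of the left edge maps to a negative row or column, which would be consistent with the boxes described above.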

    Steps to reproduce

    Most of these issues can be seen by downloading the IDTReeS dataset and comparing the torchgeo plots vs the shapefiles.

    Version

    0.4.0.dev0

  • ValueError: empty range for randrange()

    When using RandomBatchGeoSampler, 50% of the time the following error will occur. With no code change, this runs perfectly fine the other 50% of the time.

    code:

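    import torch
    from torch.utils.data import DataLoader
    from torchgeo.datasets import stack_samples
    from torchgeo.samplers import RandomBatchGeoSampler

    # ds is the GeoDataset being sampled (its construction is not shown in the report)
    sampler = RandomBatchGeoSampler(ds, size=1024, batch_size=5, length=5 * 5)
    dl = DataLoader(ds, batch_sampler=sampler, collate_fn=stack_samples)

    for idx, batch in enumerate(dl):
        for idx_s, image in enumerate(batch['image']):
            image = torch.squeeze(image)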
    

    error:

      File "/shared/ritwik/miniconda3/envs/dino/lib/python3.7/site-packages/torchgeo/samplers/batch.py", line 115, in __iter__
        bounding_box = get_random_bounding_box(bounds, self.size, self.res)
      File "/shared/ritwik/miniconda3/envs/dino/lib/python3.7/site-packages/torchgeo/samplers/utils.py", line 49, in get_random_bounding_box
        minx = random.randrange(int(width)) * res + bounds.minx
      File "/shared/ritwik/miniconda3/envs/dino/lib/python3.7/random.py", line 190, in randrange
        raise ValueError("empty range for randrange()")
    ValueError: empty range for randrange()
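
The last torchgeo frame in the traceback calls random.randrange(int(width)). That call raises ValueError whenever int(width) is 0, which may be what happens when a randomly chosen tile leaves no room to move a window of the requested size; the report does not confirm this. A minimal sketch of the failure and of one possible guard follows. The helper random_offset is hypothetical and is not part of torchgeo.

```python
import random


def random_offset(num_offsets: int) -> int:
    """Pick a random offset in [0, num_offsets), tolerating an empty range.

    Hypothetical helper, not torchgeo code: when there is no room to move
    the window (num_offsets <= 0), fall back to offset 0 instead of raising.
    """
    if num_offsets <= 0:
        return 0
    return random.randrange(num_offsets)


# random.randrange(0) has nothing to choose from and raises ValueError.
try:
    random.randrange(int(0.5))  # int() truncates 0.5 to 0
    raised = False
except ValueError:
    raised = True

print(raised)            # True
print(random_offset(0))  # 0
```

If this reading is right, the intermittent behaviour would follow from which tile the sampler happens to pick on a given run.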
    
  • Indices Transforms

    ~~This is simply a draft of a transform for computing NDVI and concatenating to the sample["image"] channels. Open to suggestions/feedback.~~

    ~~Note: I just realized that requiring the index of the channels for computing the indices doesn't scale. See the Enhanced Vegetation Index, which requires 4+ bands to compute EVI = G * ((NIR - R) / (NIR + C1 * R - C2 * B + L)).~~

    Update:

    This PR adds the following:

    • AugmentationSequential wrapper around kornia.augmentation.AugmentationSequential which supports our sample/batch dicts.
    • ndbi, ndsi, ndvi, and ndwi functionals which, given specific multispectral band input, compute the associated index. AppendNDBI, AppendNDSI, AppendNDVI, and AppendNDWI transform modules which take a batch dict as input, compute the desired index, and append it to the channel dimension.
    • Unit tests
    • kornia>=0.5.4 as a dependency; 0.5.4 is the version in which kornia.augmentation.AugmentationSequential was added. Additionally, kornia's only dependency is pytorch.
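
As a rough illustration of the compute-and-append idea, here is a pure-Python stand-in that works on a single pixel stored as a list of channel values. It is only a sketch: the PR's functionals and modules operate on tensors and batch dicts, and the names append_ndvi, index_nir, index_red, and the eps value are assumptions made for this example.

```python
def ndvi(nir, red, eps=1e-10):
    """Normalized difference vegetation index for one pixel: (NIR - R) / (NIR + R).

    `eps` avoids division by zero when both bands are 0; its value is an
    assumption for this sketch.
    """
    return (nir - red) / (nir + red + eps)


def append_ndvi(pixel, index_nir, index_red):
    """Return a new channel list with NDVI appended as the last channel."""
    return pixel + [ndvi(pixel[index_nir], pixel[index_red])]


# One pixel with channels [red, green, blue, nir] (made-up reflectances).
pixel = [0.1, 0.2, 0.1, 0.5]
out = append_ndvi(pixel, index_nir=3, index_red=0)
print(len(out))           # 5
print(round(out[-1], 4))  # 0.6667
```

The other normalized difference indices have the same (a - b) / (a + b) shape and differ only in which two bands are used.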

    TODO:

    • Add a Jupyter notebook to docs/tutorials to display the usage of the indices transforms
    • Fix notebook sphinx errors

    Closes #112

  • SpaceNet 2

    • Created a SpaceNet metaclass that works on mlhub collections instead of datasets.
    • Refactored SpaceNet1
    • Added SpaceNet 2
    • Added tests and test data for SpaceNet 2

    Note: There is a slight issue with the Vegas collection. See radiantearth/radiant-mlhub#65. Simple solution: hard code the location of the img1 label.

  • Failed conda install on Windows 10

    I tried to install torchgeo on Windows 10 using conda in a new environment and got the following error:

    >conda create -n torch-geo-test python=3.10
    
    >conda install torchgeo -c conda-forge
    
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.
    Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
    Collecting package metadata (repodata.json): done
    Solving environment: failed with initial frozen solve. Retrying with flexible solve.
    Solving environment: -
    Found conflicts! Looking for incompatible packages.
    This can take several minutes.  Press CTRL-C to abort.
    Examining @/win-64::__win==0=0: 100%|████████████████████████████████████████████████████| 5/5 [00:01<00:00,  2.72it/s]|
    failed
    
    UnsatisfiableError: The following specifications were found to be incompatible with each other:
    
    Output in format: Requested package -> Available versions
    Note that strict channel priority may have removed packages required for satisfiability.
    

    However, I installed it in another environment for testing a week ago, and it worked perfectly. I also tried the same thing on WSL2 (Ubuntu 22.04), and it worked.

  • Can't import Sentinel 2 class

    Hi, I have tried with both the normal conda installation of torch and the development one. When I try to import the Sentinel 2 class, I get the following error:

    from torchgeo.datasets import Sentinel2

    cannot import name 'draw_segmentation_masks' from 'torchvision.utils' (/Users/gracecolverd/opt/miniconda3/envs/torch_array/lib/python3.8/site-packages/torchvision/utils.py)

    Thanks, Grace

  • Jupyter Notebook tutorials

    We need to figure out how to render Jupyter Notebooks in our documentation so that we can provide easy-to-use tutorials for new users. This should work similarly to https://pytorch.org/tutorials/.

    Ideally I would like to be able to test these tutorials so that they stay up-to-date.

  • GridGeoSampler: change stride of last patch to sample entire ROI

    This PR changes the way in which GridGeoSampler samples patches from each tile if the tile size is not a multiple of the stride. We want to cover the entire tile, requiring us to adjust the stride of the final row/column.

    This also changes the number of patches returned from each tile. Let $i$ be the size of the input tile. Let $k$ be the requested size of the output patch. Let $s$ be the requested stride. Let $o$ be the number of output rows/columns sampled from each tile.

    Before

    $$ o = \left\lfloor \frac{i - k}{s} \right\rfloor + 1 $$

    After

    $$ o = \left\lceil \frac{i - k}{s} \right\rceil + 1 $$
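
The difference between the two formulas is easy to check numerically. The following sketch simply evaluates both expressions; the function names patches_before and patches_after are illustrative, and it assumes $i \ge k$ and $s > 0$.

```python
import math


def patches_before(i, k, s):
    """Rows/columns per tile when a final partial stride is dropped."""
    return math.floor((i - k) / s) + 1


def patches_after(i, k, s):
    """Rows/columns per tile when one extra patch covers the remaining edge."""
    return math.ceil((i - k) / s) + 1


# Tile of size 10, patch size 4, stride 4: 10 - 4 = 6 is not a multiple of 4.
print(patches_before(10, 4, 4))  # 2
print(patches_after(10, 4, 4))   # 3
# When (i - k) is a multiple of s, both formulas agree.
print(patches_before(12, 4, 4), patches_after(12, 4, 4))  # 3 3
```

In the first case the two original patches cover pixels 0 to 8 only, so the extra patch is what picks up the last 2 pixels of the tile.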

    Reboot of #448 Fixes #431

  • Convert all index transforms to Kornia

    This PR converts all of our index transforms to be valid Kornia augmentations. These transforms can now be used with AugmentationSequential just like all of the other transforms we use.

  • DataModules: run all data augmentation on the GPU

    This PR overhauls all of our data modules to improve uniformity. This includes the following changes:

    • [x] Add GeoDataModule and NonGeoDataModule base classes to reduce code duplication
    • [x] Only instantiate the datasets that are needed for a particular stage
    • [x] Replace torchvision with kornia (better support for MSI, GPU, inverse)
    • [x] Replace dataset transforms with on_after_batch_transfer (CPU -> GPU, sample -> batch, faster)
    • [x] Remove instance methods for preprocessing (fixes #886)
    • [ ] Documentation

    In a future PR, I'm planning on extending this to the rest of our transforms:

    • Rewrite all index transforms to be compatible with Kornia (#999)
    • Update tutorials to use Kornia with our transforms
    • Upstream and remove our custom transforms and AugmentationSequential hacks
  • Remove tests/datamodules

    Summary

    I would like to remove (almost) all tests in tests/datamodules and replace them with new tests in tests/trainers that actually ensure our datamodules and trainers are compatible.

    Rationale

    The current tests simply ensure that the data loaders don't crash. They don't actually test that the datamodules are compatible with our trainers.

    Implementation

    The bulk of these have already been converted in #329. The remaining tests are:

    • [ ] chesapeake (tests invalid arguments)
    • [ ] fair1m (requires rotated ObjectDetectionTask trainer: #840)
    • [x] inria (#975)
    • [x] loveda (#966)
    • [x] nasa_marine_debris (#979)
    • [ ] oscd (requires ChangeDetectionTask trainer)
    • [x] potsdam (#929)
    • [ ] usavars (requires multi-label RegressionTask trainer)
    • [x] vaihingen (#853)
    • [ ] xview2 (requires ChangeDetectionTask trainer)
    • [ ] utils (coverage for test_pct == None, do we need this?)

    Alternatives

    We may end up keeping some of these that test invalid arguments. The important thing is not that we don't test datamodules standalone, but that we test them with trainers whenever possible.

  • Add Multi-Weight Support API

    This PR closes #762 by adding support for loading pretrained weights from various sources, specifically weights trained on Earth observation data, following the multi-weight API in torchvision.

    Example:

    from torchgeo.models import ResNet50_Weights
    from torchgeo.trainers import ClassificationTask
    
    task = ClassificationTask(
        classification_model="resnet50",
        loss="ce",
        num_classes=10,
        in_channels=3,
        weights=ResNet50_Weights.SENTINEL2_ALL_MOCO.get_state_dict()
    )
    

    Some things I have encountered:

    • I think torchvision is able to make a cleaner API because all of its available weights are exclusively for models implemented in torchvision itself. For Earth observation data, the pretrained weights I have found come from a variety of sources but mostly use timm, which aligns with torchgeo. However, there are sometimes hoops to jump through before the state dict can actually be loaded into the timm model. This is currently handled by get_state_dict().
    • With the few weights collected so far, I have already run into the naming-scheme issues discussed in #804. Collecting pretrained weights from a variety of sources for specific use cases seems likely to make a proper naming scheme unclear, but maybe there is a more clever way to handle it with the accompanying metadata?
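
One common kind of hoop when reusing third-party checkpoints is that parameter names carry a wrapper prefix that the target model does not expect. The sketch below shows that kind of key remapping on a plain dict. It is an assumption about what such preprocessing can look like, not a description of what get_state_dict() does in this PR; strip_prefix and the key names are made up.

```python
def strip_prefix(state_dict, prefix):
    """Keep only entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Stand-in checkpoint: plain values instead of tensors, made-up key names.
checkpoint = {
    "encoder.conv1.weight": "w0",
    "encoder.fc.weight": "w1",
    "projection_head.weight": "w2",
}

print(strip_prefix(checkpoint, "encoder."))
# {'conv1.weight': 'w0', 'fc.weight': 'w1'}
```

Entries that do not belong to the backbone, such as the projection head here, are dropped so that the remaining keys can match the target model's parameter names.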

    Questions:

    • Where should the weights be integrated within the torchgeo file hierarchy?
    • I am not sure what the best way of unit testing this is.