Sequence Modeling Benchmarks and Temporal Convolutional Networks (TCN)

This repository contains the experiments from the paper An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling by Shaojie Bai, J. Zico Kolter and Vladlen Koltun.

We specifically target a comprehensive set of tasks that have been repeatedly used to compare the effectiveness of different recurrent networks, and evaluate a simple, generic but powerful (purely) convolutional network on the recurrent nets' home turf.

Experiments are done in PyTorch. If you find this repository helpful, please cite our work:

@article{BaiTCN2018,
	author    = {Shaojie Bai and J. Zico Kolter and Vladlen Koltun},
	title     = {An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling},
	journal   = {arXiv:1803.01271},
	year      = {2018},
}

Domains and Datasets

Update: The code should be directly runnable with PyTorch v1.0.0 or above (a version newer than v1.3.0 is strongly recommended). Older versions of PyTorch are no longer supported.

This repository contains benchmarks for the following tasks, with details explained in each sub-directory:

  • The Adding Problem with various T (we evaluated on T=200, 400, 600)
  • Copying Memory Task with various T (we evaluated on T=500, 1000, 2000)
  • Sequential MNIST digit classification
  • Permuted Sequential MNIST (based on Seq. MNIST, but more challenging)
  • JSB Chorales polyphonic music
  • Nottingham polyphonic music
  • PennTreebank [SMALL] word-level language modeling (LM)
  • Wikitext-103 [LARGE] word-level LM
  • LAMBADA [LARGE] word-level LM and textual understanding
  • PennTreebank [MEDIUM] char-level LM
  • text8 [LARGE] char-level LM

While some of the large datasets are not included in this repo, we use the observations package to download them, which can be easily installed using pip.

Usage

Each task is contained in its own directory, with the following structure:

[TASK_NAME] /
    data/
    [TASK_NAME]_test.py
    models.py
    utils.py

To run the TCN model on a task, you only need to run [TASK_NAME]_test.py (e.g. add_test.py). To tune the hyperparameters, pass them as command-line arguments; the available options can be seen via the -h flag.

Owner
CMU Locus Lab (Zico Kolter's Research Group)
Comments
  • About different sequence input

    My sequences have very different lengths: the shortest is about 100 words and the longest about 5000. I tried zero-padding them all to a length of 5000, but the classification results are terrible. If I instead feed the sequences at their original lengths, which means using a batch_size of 1, it works well. I don't know why this happens.
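One thing worth checking when padding (an assumption about the setup, since the comment does not show the model): if sequences are right-padded with zeros and the classifier reads the output at the final time step, then for a 100-word sequence that step lies far past the last real input. The sketch below is plain Python; the helper name `last_valid_outputs` and the nested-list layout are hypothetical. It reads each sequence's output at its last real step instead.

```python
def last_valid_outputs(outputs, lengths):
    """Pick, for each sequence, the output at its last unpadded time step.

    outputs[b][t] is the model output for sequence b at step t (right-padded);
    lengths[b] is the true, unpadded length of sequence b.
    """
    picked = []
    for seq_out, length in zip(outputs, lengths):
        if not 1 <= length <= len(seq_out):
            raise ValueError("length must be between 1 and the padded length")
        picked.append(seq_out[length - 1])
    return picked


# Two sequences padded to 4 steps; their true lengths are 2 and 4.
outputs = [["a0", "a1", "pad", "pad"], ["b0", "b1", "b2", "b3"]]
print(last_valid_outputs(outputs, [2, 4]))  # ['a1', 'b3']
```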

  • Can the TCN module be made faster?

    I'm using your TCN module for a language modeling task. My code follows the structure of your char_cnn code. It works but the performance is very bad compared to an LSTM network. Each epoch with the TCN network takes about 10 times longer. Do you know if the performance can be improved? Here is the forward method from the TCN class:

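        def forward(self, x):
            # x: (N, L) token indices; emb: (N, L, C) after embedding and dropout
            emb = self.drop(self.encoder(x))
            # the TCN works on channels-first input, so transpose to (N, C, L)
            y = self.tcn(emb.transpose(1, 2))
            # transpose back to (N, L, C) before the decoder
            o = self.decoder(y.transpose(1, 2))
            return o.contiguous()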

    Perhaps it is the transpose calls that are making the code slow?

  • When I use a sequence length of 28 for LSTM and TCN, LSTM is much faster than TCN.

    It seems to me that LSTM is faster when the sequence length is short (say 28). When the sequence length is long (say 784), LSTM will be much slower than TCN.

    It seems to me for TCN, the computation time is independent of the sequence length.

    Am I correct?

  • RNN/LSTM Baselines?

    This is a great set of experiments! I'm wondering if the code for the RNN/LSTM baselines reported in the paper is available somewhere. At present, I only see code for the TCN model.

    Thanks!

  • why temporal pad at all?

    Nice work! I'm researching time series regression using machine learning so I'm looking at LSTM, TCN and Transformers based models and getting good results with your model.

    One general question: I'm not sure I understand why we pad each layer of a TCN at all. I understand that padding ensures each layer produces a sequence of the same length, so there's a benefit in that your predictions are aligned with your inputs. But it's very similar to initialising an AR(p) model with a vector of zeros when you predict forward: the initial predictions will all be "wrong" until the effect of the initial state has decayed out. LSTMs also have this issue: most applications seem to set the initial state per batch to zero, which results in transient errors at the start of the batch (some authors train a separate model to estimate the initial state, which I've had good success with). I would assume this impacts training as well, so it seems to make sense to mask out the start of the output sequence when calculating the loss; otherwise the model may try to adapt to "fix" the impact of the wrong initial condition.

    Certainly when I train a regression-based TCN I can observe transient errors at the start of the prediction: the diagram below underpredicts for the first 96 samples (that's 1 day of 15-minute electricity consumption), then overpredicts for the first week before settling down. Interested in your thoughts.

    Also, one general observation: the predictions from the TCN seem noisier than those from the LSTM; I thought the long AR window might filter out more noise than it has. Plus it's quite sensitive to the learning rate: a low learning rate produces a very noisy output sequence.

    image
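The masking idea raised in this comment can be sketched in plain Python. This is an illustration only, not code from the repository: `masked_mse` and its `warmup` argument are hypothetical names, and how many initial steps to skip (for example, the network's receptive field minus one) is left to the caller.

```python
def masked_mse(pred, target, warmup):
    """Mean squared error that ignores the first `warmup` time steps.

    pred and target are equal-length sequences of floats. The skipped steps
    are those whose outputs were computed mostly from zero padding.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    pairs = list(zip(pred[warmup:], target[warmup:]))
    if not pairs:
        raise ValueError("warmup covers the whole sequence")
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


# The first step is far off, but it is excluded from the loss.
print(masked_mse([0.0, 1.0, 3.0], [1.0, 1.0, 1.0], warmup=1))  # 2.0
```

With `warmup=0` the function reduces to the ordinary mean squared error, so the two settings can be compared directly on the same predictions.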

  • Recommendation for image to text

    My goal is to train a model that can output sequences of text from image inputs. Using the IAM handwriting dataset for example, we would pass the model an image

    image

    and expect it to return "broadcast and television report on his". Historically, the common (i.e. recurrent) way to accomplish this would be an encoder (CNN) + decoder (LSTM) architecture like OpenNMT's implementation. I am interested in replacing the decoder with a TCN, but am unsure how to approach the image data. The CNN encoder will create a batch of N feature maps with reduced spatial dimensions (H', W')

    image

    The issue is a TCN expects 3D tensors (N, L, C) whereas each "timestep" of the image is 2D (N, H, W, C). Following the p-MNIST example in the paper, we could flatten the image into a 1D sequence with length H' x W'. Then the TCN would effectively snake through the pseudo-timesteps like below

    image

    However, if we want one prediction per timestep, it makes much more sense to define a left-to-right sequence instead of a snaking one, since that's the direction the text is depicted in the image. Did you experiment at all with image-to-text models, and if so, how did you choose to represent the images?

    I also wonder about the loss function for training a TCN decoder. Assuming you divide the image width into more timesteps than your maximum expected sequence length, it seems like connectionist temporal classification (CTC) would be a good choice. Then you do not have to worry about alignment between the target sequence and model's prediction. For instance, "bbb--ee-cau--sssss----e" would be collapsed to "because" by combining neighboring duplicates and removing blanks. Do you agree or is there a different loss function you would suggest?
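The collapse rule described in this comment (merge neighboring duplicates, then drop blanks) can be sketched in a few lines of plain Python. The function name `ctc_collapse` is hypothetical, and the blank symbol "-" follows the example in the comment; this shows only the decoding step, not the CTC loss itself.

```python
from itertools import groupby


def ctc_collapse(path, blank="-"):
    """Collapse a per-timestep label path into the final output string."""
    # groupby yields one key per run of identical neighboring symbols
    merged = (symbol for symbol, _run in groupby(path))
    # blanks are removed only after merging, so "a-a" keeps both letters
    return "".join(symbol for symbol in merged if symbol != blank)


print(ctc_collapse("bbb--ee-cau--sssss----e"))  # because
```

Because blanks are removed after merging, a blank between two identical letters is what allows a genuine double letter to survive the collapse.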

  • Clarification on figure 3(a)

    Hello and thanks for the paper and the helpful codebase!

    I just wanted to clarify how the convergence plots in the paper were generated, particularly fig 3(a). The Y axis is labelled test accuracy, but the X values seem to be more frequent than once per epoch. Could you confirm what data is being evaluated on here and whether smoothing is taking place? Thanks

  • How to reproduce results from the paper?

    Is it just the testing result in the last epoch using default parameters? I have tried to run add_test.py, and below are the results I get for the 10 epochs.

        Test set: Average loss: 0.168699
        Test set: Average loss: 0.001142
        Test set: Average loss: 0.000922
        Test set: Average loss: 0.000345
        Test set: Average loss: 0.000143
        Test set: Average loss: 0.000188
        Test set: Average loss: 0.000121
        Test set: Average loss: 0.000028
        Test set: Average loss: 0.000244
        Test set: Average loss: 0.000042

    Which one should I use for benchmarking? In the paper, the result of TCN was 5.8e-5 but it seems like we can use 2.8e-5 or 4.2e-5 here.

  • Cutting off effective history when evaluating char_cnn model.

    I don't understand why, at test time (or when evaluating the model on a validation set), the loss is computed only on the part of each sequence that has sufficient history rather than on the whole sequence. The model is then evaluated on only a sub-part of the dataset, not the whole of it. Are the results reliable, or even comparable to those of other models (LSTM, etc.) that don't use this method?
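To make the coverage question concrete, here is a minimal sketch in plain Python. It assumes, as I understand the scheme, that evaluation slides a window of `seq_len` steps forward by `valid_len` steps at a time and scores only the last `valid_len` outputs of each window. The function and argument names are hypothetical and this is not the repository's evaluation loop.

```python
def scored_positions(total_len, seq_len, valid_len):
    """Return the positions whose loss is counted under sliding-window evaluation."""
    if not 0 < valid_len <= seq_len:
        raise ValueError("need 0 < valid_len <= seq_len")
    eff_history = seq_len - valid_len  # context-only steps at the start of each window
    scored = []
    for start in range(0, total_len, valid_len):
        end = min(start + seq_len, total_len)
        if start + eff_history >= end:
            break  # no position left that has full history
        scored.extend(range(start + eff_history, end))
    return scored


positions = scored_positions(total_len=20, seq_len=8, valid_len=4)
print(positions == list(range(4, 20)))  # True
```

Under these assumptions, every position except the first `seq_len - valid_len` is scored exactly once, because consecutive windows overlap by exactly the context length.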

  • Is TCN suitable for time series regression?

    Hello,

    Thank you for your great paper and sharing!

    I'm wondering how to use TCN to solve time series regression problems. In my time series scenario, the data for each moment contains multiple variables, and each variable is a real number. For example, the data for time step 0 is something like "vector_0 = <0.1, 0.2, 0.3, ...>", and I want to use the previous k vectors to predict the next vector.

    I have developed an LSTM model for this problem. The input shape of the LSTM model is "batch_size, time_steps(k), input_size(length of each vector)", and the prediction is the last output of the LSTM. Then I can calculate the MSE loss and run the backward pass. How can I use TCN to solve this problem?

    Best Regards
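The data layout described in this comment (k past vectors as input, the next vector as target) can be sketched in plain Python. The helper `make_windows` is hypothetical and independent of any model; the same pairs could be fed to an LSTM or, after moving the feature dimension to the channel position, to a TCN whose output at the last time step is used as the prediction.

```python
def make_windows(series, k):
    """Split a multivariate series into (k-step history, next vector) pairs.

    series is a list of equal-length vectors, one per time step.
    Returns (inputs, targets): inputs[i] holds k consecutive vectors and
    targets[i] is the vector that immediately follows them.
    """
    if not 1 <= k < len(series):
        raise ValueError("k must be at least 1 and shorter than the series")
    count = len(series) - k
    inputs = [series[i:i + k] for i in range(count)]
    targets = [series[i + k] for i in range(count)]
    return inputs, targets


series = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
inputs, targets = make_windows(series, k=2)
print(len(inputs), targets)  # 2 [[0.5, 0.6], [0.7, 0.8]]
```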

  • Causal Transposed Convolution

    Hi,

    Thanks for this great paper ... I am trying to use this architecture in an auto-encoder setting, where the encoder part is a stack of strided, dilated, causal conv layers, and I am now thinking about the decoder part.

    In terms of up-sampling using transposed convolutions, does it follow the same intuition in order to have causal up-sampling (i.e. to exclude the reconstructions of future part) ? Or shall we generate sample-by-sample without transposed conv layers ?

    With many thanks in advance Best Regards

  • How should I choose the correct number of layers?

    Suppose my input sequences are of varying lengths (between 15 ~ 400) and 3 features.

    What should I pass to TemporalConvNet(3, [3] * layers) for layers?
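One common rule of thumb is to pick enough levels for the receptive field to cover the longest sequence. The sketch below assumes two dilated causal convolutions per level with dilation 2**i at level i, which is my understanding of how TemporalConvNet is built; treat the formula as an assumption and check it against the code. The function names are hypothetical.

```python
def receptive_field(kernel_size, num_levels, convs_per_level=2):
    """Number of input steps one output step can see, with dilation doubling per level."""
    return 1 + convs_per_level * (kernel_size - 1) * (2 ** num_levels - 1)


def min_levels(seq_len, kernel_size, convs_per_level=2):
    """Smallest number of levels whose receptive field covers seq_len steps."""
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2")
    levels = 1
    while receptive_field(kernel_size, levels, convs_per_level) < seq_len:
        levels += 1
    return levels


print(receptive_field(2, 8))             # 511
print(min_levels(400, kernel_size=2))    # 8
print(min_levels(400, kernel_size=3))    # 7
```

For sequences of up to 400 steps, this suggests about 8 levels at kernel size 2, or fewer with a larger kernel. Note that the list passed to TemporalConvNet also sets the number of channels per level, which is a separate choice from depth.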

  • Why is AssertionError("Torch not compiled with CUDA enabled") raised?

        Namespace(batch_size=32, clip=-1, cuda=True, dropout=0.0, epochs=10, ksize=7, levels=8, log_interval=100, lr=0.004, nhid=30, optim='Adam', seed=1111, seq_len=400)
        Producing data...
        Traceback (most recent call last):
          File "D:\javaworkspace\TCN\TCN\adding_problem\add_test.py", line 63, in <module>
            model.cuda()
          File "D:\anaconda3\envs\script\lib\site-packages\torch\nn\modules\module.py", line 688, in cuda
            return self._apply(lambda t: t.cuda(device))
          File "D:\anaconda3\envs\script\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
            module._apply(fn)
          File "D:\anaconda3\envs\script\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
            module._apply(fn)
          File "D:\anaconda3\envs\script\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
            module._apply(fn)
          [Previous line repeated 1 more time]
          File "D:\anaconda3\envs\script\lib\site-packages\torch\nn\modules\module.py", line 601, in _apply
            param_applied = fn(param)
          File "D:\anaconda3\envs\script\lib\site-packages\torch\nn\modules\module.py", line 688, in <lambda>
            return self._apply(lambda t: t.cuda(device))
          File "D:\anaconda3\envs\script\lib\site-packages\torch\cuda\__init__.py", line 210, in _lazy_init
            raise AssertionError("Torch not compiled with CUDA enabled")
        AssertionError: Torch not compiled with CUDA enabled

  • copy memory questions

    Hi,

    Good day. I notice that the sequence length for y is same as X in this case.

    How can we modify the code so that the sequence length of y is different from that of X? I tried changing the sequence length of y, but it ends with an error in the cross-entropy loss.

    Hopefully can get reply soon Thanks

  • Training on variable-length sequences

  • Time series classification

    Hi, your work is great. I have a small question about TCN. I am trying to apply TCN to time series classification by feeding the output of the last time step to a fully connected layer, but the results are not as good as those of a conventional CNN with global pooling. Is that normal? Maybe I did something wrong. Could you please give me some help? Thanks a lot.
