

Efficient, reusable RNNs and LSTMs for torch


Statistics on torch-rnn

Number of watchers on GitHub: 1852
Number of open issues: 98
Average time to close an issue: 10 days
Main language: Lua
Average time to merge a PR: 5 days
Open pull requests: 26+
Closed pull requests: 4+
Last commit: about 2 years ago
Repo created: over 3 years ago
Repo last updated: over 1 year ago
Size: 846 KB
Organization / author: jcjohnson


torch-rnn provides high-performance, reusable RNN and LSTM modules for torch7, and uses these modules for character-level language modeling similar to char-rnn.

You can find documentation for the RNN and LSTM modules here; they have no dependencies other than torch and nn, so they should be easy to integrate into existing projects.

Compared to char-rnn, torch-rnn is up to 1.9x faster and uses up to 7x less memory. For more details see the Benchmark section below.


Docker Images

Cristian Baldi has prepared Docker images for both CPU-only mode and GPU mode; you can find them here.

System setup

You'll need to install the header files for Python 2.7 and the HDF5 library. On Ubuntu you should be able to install like this:

sudo apt-get -y install python2.7-dev
sudo apt-get install libhdf5-dev

Python setup

The preprocessing script is written in Python 2.7; its dependencies are in the file requirements.txt. You can install these dependencies in a virtual environment like this:

virtualenv .env                  # Create the virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install Python dependencies
# Work for a while ...
deactivate                       # Exit the virtual environment

Lua setup

The main modeling code is written in Lua using torch; you can find installation instructions here. You'll need the following Lua packages:

  • torch
  • nn
  • optim
  • lua-cjson
  • torch-hdf5 (installed from GitHub; see below)

After installing torch, you can install / update these packages by running the following:

# Install most things using luarocks
luarocks install torch
luarocks install nn
luarocks install optim
luarocks install lua-cjson

# We need to install torch-hdf5 from GitHub
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec

CUDA support (Optional)

To enable GPU acceleration with CUDA, you'll need to install CUDA 6.5 or higher and the following Lua packages:

  • cutorch
  • cunn

You can install / update them by running:

luarocks install cutorch
luarocks install cunn

OpenCL support (Optional)

To enable GPU acceleration with OpenCL, you'll need to install the following Lua packages:

  • cltorch
  • clnn

You can install / update them by running:

luarocks install cltorch
luarocks install clnn

OSX Installation

Jeff Thompson has written a very detailed installation guide for OSX that you can find here.


Usage

To train a model and use it to generate new text, you'll need to follow three simple steps:

Step 1: Preprocess the data

You can use any text file for training models. Before training, you'll need to preprocess the data using the script scripts/preprocess.py; this will generate an HDF5 file and a JSON file containing a preprocessed version of the data.

If you have training data stored in my_data.txt, you can run the script like this:

python scripts/preprocess.py \
  --input_txt my_data.txt \
  --output_h5 my_data.h5 \
  --output_json my_data.json

This will produce files my_data.h5 and my_data.json that will be passed to the training script.

There are a few more flags you can use to configure preprocessing; read about them here.
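Conceptually, the preprocessing step builds a character vocabulary, encodes the text as a sequence of integer ids, and splits it into train/val/test sets. Here is a minimal Python sketch of that idea (our own simplification, not the actual preprocessing script: the real script writes the encoded splits to HDF5 and the vocabulary to JSON, and its function names and flags differ):

```python
import json

def preprocess(text, val_frac=0.1, test_frac=0.1):
    """Toy version of the preprocessing step: build a character
    vocabulary, encode the text as integer ids, and split it into
    train/val/test contiguous chunks."""
    chars = sorted(set(text))
    # Ids start at 1 because the downstream Lua code is 1-indexed.
    token_to_idx = {ch: i + 1 for i, ch in enumerate(chars)}
    data = [token_to_idx[ch] for ch in text]

    n = len(data)
    n_val, n_test = int(n * val_frac), int(n * test_frac)
    n_train = n - n_val - n_test
    splits = {
        "train": data[:n_train],
        "val": data[n_train:n_train + n_val],
        "test": data[n_train + n_val:],
    }
    # The real script stores `splits` in an HDF5 file and the
    # vocabulary in a JSON file; here we just serialize the vocab.
    vocab_json = json.dumps({"token_to_idx": token_to_idx})
    return splits, vocab_json
```

The JSON vocabulary is what lets the sampling script map the model's integer outputs back to characters.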

Step 2: Train the model

After preprocessing the data, you'll need to train the model using the train.lua script. This will be the slowest step. You can run the training script like this:

th train.lua -input_h5 my_data.h5 -input_json my_data.json

This will read the data stored in my_data.h5 and my_data.json, run for a while, and save checkpoints to files with names like cv/checkpoint_1000.t7.

You can change the RNN model type, hidden state size, and number of RNN layers like this:

th train.lua -input_h5 my_data.h5 -input_json my_data.json -model_type rnn -num_layers 3 -rnn_size 256

By default this will run in GPU mode using CUDA; to run in CPU-only mode, add the flag -gpu -1.

To run with OpenCL, add the flag -gpu_backend opencl.

There are many more flags you can use to configure training; read about them here.

Step 3: Sample from the model

After training a model, you can generate new text by sampling from it using the script sample.lua. Run it like this:

th sample.lua -checkpoint cv/checkpoint_10000.t7 -length 2000

This will load the trained checkpoint cv/checkpoint_10000.t7 from the previous step, sample 2000 characters from it, and print the results to the console.

By default the sampling script will run in GPU mode using CUDA; to run in CPU-only mode add the flag -gpu -1, and to run in OpenCL mode add the flag -gpu_backend opencl.

There are more flags you can use to configure sampling; read about them here.
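Under the hood, each sampling step turns the model's output scores for the next character into a probability distribution and draws from it; sample.lua's -temperature flag controls how peaked that distribution is. A minimal Python sketch of the per-step rule (a hypothetical sample_char helper, not code from sample.lua):

```python
import math
import random

def sample_char(scores, temperature=1.0, rng=random):
    """Draw the index of the next character: divide the raw scores by
    a temperature, apply a softmax, and sample from the result.
    Lower temperatures concentrate mass on the top-scoring characters;
    higher temperatures flatten the distribution."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Inverse-CDF sampling from the categorical distribution.
    r = rng.random()
    cum = 0.0
    for idx, p in enumerate(probs):
        cum += p
        if r < cum:
            return idx
    return len(probs) - 1
```

At temperature 1.0 this samples in proportion to the softmaxed scores; as the temperature approaches 0 it approaches greedy argmax decoding.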


Benchmarks

To benchmark torch-rnn against char-rnn, we use each to train LSTM language models for the tiny-shakespeare dataset with 1, 2 or 3 layers and with an RNN size of 64, 128, 256, or 512. For each we use a minibatch size of 50, a sequence length of 50, and no dropout. For each model size and for both implementations, we record the forward/backward times and GPU memory usage over the first 100 training iterations, and use these measurements to compute the mean time and memory usage.
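The summary statistic is simple averaging: collect per-iteration forward/backward times for each implementation, take the mean of each, and report the ratio. A sketch of that computation (our own illustration with made-up numbers, not the actual benchmark scripts):

```python
def mean_speedup(char_rnn_times, torch_rnn_times):
    """Mean per-iteration time of char-rnn divided by the mean
    per-iteration time of torch-rnn; values above 1.0 mean
    torch-rnn is faster."""
    mean_char = sum(char_rnn_times) / len(char_rnn_times)
    mean_torch = sum(torch_rnn_times) / len(torch_rnn_times)
    return mean_char / mean_torch

# Hypothetical per-iteration timings (seconds) over a few iterations:
speedup = mean_speedup([0.190, 0.192, 0.188], [0.100, 0.101, 0.099])
```

The same mean-and-ratio treatment applies to the GPU memory measurements below.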

All benchmarks were run on a machine with an Intel i7-4790k CPU, 32 GB main memory, and a Titan X GPU.

Below we show the forward/backward times for both implementations, as well as the mean speedup of torch-rnn over char-rnn. We see that torch-rnn is faster than char-rnn at all model sizes, with smaller models giving a larger speedup; for a single-layer LSTM with 128 hidden units, we achieve a 1.9x speedup; for larger models we achieve about a 1.4x speedup.

Below we show the GPU memory usage for both implementations, as well as the mean memory saving of torch-rnn over char-rnn. Again torch-rnn outperforms char-rnn at all model sizes, but here the savings become more significant for larger models: for models with 512 hidden units, we use 7x less memory than char-rnn.


TODOs

  • Get rid of Python / JSON / HDF5 dependencies?
torch-rnn open issues
  • almost 3 years Trouble sampling on Raspberry Pi 2 model B
  • almost 3 years Sampling making sense text
  • almost 3 years expected align(#) on line 579
  • almost 3 years can this works with glibc <2.14 ?
  • almost 3 years Python version required
  • almost 3 years Using stored checkpoint in a java program
  • about 3 years TemporalConvolution question
  • about 3 years Feature ideas: scoring, erasures, binary data
  • about 3 years Crashes on small dataset.
  • about 3 years Feature Proposal - More robust resuming?
  • about 3 years Feature request: Checkpoint on time delta
  • about 3 years Citing this code
  • about 3 years Preprocess
  • about 3 years 'ThCudaCheckFail' Using Cuda7.5 Docker img
  • about 3 years "Could NOT find HDF5"
  • about 3 years How to do back propagation if I only use specific hidden state?
  • about 3 years No LuaRocks module found for cltorch
  • about 3 years Sampling question for small dataset
  • about 3 years Error: error: wrote 0 blocks instead of 1
  • about 3 years Option to store checkpoint on Ctrl+C interrupt
  • about 3 years Beam search for sampling
  • about 3 years Stuck at 1000th iteration and no checkpoint written
  • about 3 years Question about Sampling
  • about 3 years How to deep copy checkpoint.model ?
  • about 3 years Batch norm applied over x Vs. Batch norm applied over Wx
  • about 3 years Torch-RNN poetry
  • about 3 years Unsupported HDF5 version
  • about 3 years Set checkpoint_every as 10% of total iteration
  • about 3 years masks for minibatch.
  • over 3 years [Q] making a simple model for a sequence learning
torch-rnn open pull requests
  • Stream sampled text
  • Update to support Python 3
  • GRU cells
  • Option for non HDF5
  • (Optionally) Stream sampled text
  • Bi-Directional RNN support
  • Modified to accept syllabic prediction...
  • HTTP server for generating samples on demand
  • Avoiding the index being out of bound
  • support UTF-8 start_text
  • Improved bookkeeping of settings when resuming from existing checkpoint.
  • Fix crash when loading old checkpoints with `-reset_iterations 0` flag
  • Documentation: Indicate missing defaults in flags.
  • Output each char individually, avoid cost of appending to static string
  • sample: Add output file option
  • Add preprocess script to use words as tokens with typo and rare word reduction
  • Add python 3 compatibility to python scripts
  • LSTM: Move Wx matrix multiplication out of the loop in forward
  • Bring back and fix TemporalAdapter
  • Add epoch and validation loss to checkpoint
  • Do not use test and validation datasets whilst building the vocabulary
  • Fill and write each array before creating the next one, to save memory.
  • Fixed broken lua-cjson link
  • Updates for compatibility with changes in new HDF5 packages
  • Preprocess words
  • Added a '-seed' option to sample.lua to allow for an identical rerun.…
torch-rnn questions on Stack Overflow
  • Torch: RNN clones run out of GPU memory