

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow


Statistics on char-rnn-tensorflow

Number of watchers on GitHub: 2608
Number of open issues: 48
Average time to close an issue: 17 days
Main language: Python
Average time to merge a PR: 19 days
Open pull requests: 31+
Closed pull requests: 11+
Last commit: almost 2 years ago
Repo created: almost 5 years ago
Repo last updated: 3 months ago
Size: 508 KB
Organization / Author: sherjilozair


Join the chat at https://gitter.im/char-rnn-tensorflow/Lobby

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow.

Inspired by Andrej Karpathy's char-rnn.


Basic Usage

To train with default parameters on the tinyshakespeare corpus, run python train.py. To list all available parameters, run python train.py --help.

To sample from a checkpointed model, run python sample.py. Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on a second GPU. To force CPU mode, run export CUDA_VISIBLE_DEVICES="" before sampling and unset CUDA_VISIBLE_DEVICES afterward (on Windows, use set CUDA_VISIBLE_DEVICES="" and set CUDA_VISIBLE_DEVICES= respectively).
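
The CPU-only trick can also be scripted. The following is a minimal sketch, assuming sample.py is in the current directory and should run under the same Python interpreter. The helper names cpu_only_env and sample_on_cpu are hypothetical and not part of this repository. Because the variable is set only for the child process, nothing has to be unset afterward.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value hides every GPU from CUDA, so the child runs on the CPU.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def sample_on_cpu(script="sample.py"):
    """Run the sampling script in a child process that cannot see any GPU."""
    # Assumption: `script` is the path to this repository's sample.py.
    return subprocess.run([sys.executable, script], env=cpu_only_env(), check=True)
```

Calling sample_on_cpu() from the top-level directory while train.py is still running on the GPU would then sample from the latest checkpoint without touching the shell's own environment.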

To continue training after an interruption, or to train for more epochs, run python train.py --init_from=save


Datasets

You can use any plain text file as input. For example, you could download The Complete Sherlock Holmes like this:

cd data
mkdir sherlock
cd sherlock
wget https://sherlock-holm.es/stories/plain-text/cnus.txt
mv cnus.txt input.txt

Then start training from the top-level directory using python train.py --data_dir=./data/sherlock/
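
To make concrete why any plain text file works as input, here is a minimal sketch of how a character-level model might turn text into training examples. It is an illustration under stated assumptions, not the repository's own loader in utils.py: the function names build_vocab and make_examples are hypothetical, and the real batching may differ.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, most frequent first."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # Sort by descending frequency, then by character for a stable order.
    chars = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: i for i, ch in enumerate(chars)}


def make_examples(text, vocab, seq_length):
    """Split text into (input, target) id sequences of length seq_length.

    Each target is its input shifted one character to the left, so the model
    learns to predict the next character at every position.
    """
    ids = [vocab[ch] for ch in text]
    examples = []
    # Stop early enough that every input has a full shifted target.
    for start in range(0, len(ids) - seq_length, seq_length):
        x = ids[start:start + seq_length]
        y = ids[start + 1:start + seq_length + 1]
        examples.append((x, y))
    return examples
```

With seq_length set to 5, the text "hello world" yields two examples: "hello" paired with "ello ", and " worl" paired with "world".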

A quick tip for concatenating many small, disparate .txt files into one large training file: ls *.txt | xargs -L 1 cat >> input.txt.
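
The same concatenation can be sketched in Python, which also makes it easy to keep input.txt itself out of the list of source files. This is a hedged sketch: the function name concatenate_texts is hypothetical, the files are assumed to be UTF-8, and unlike the >> in the one-liner above it overwrites the output file rather than appending to it.

```python
from pathlib import Path


def concatenate_texts(directory, output_name="input.txt"):
    """Concatenate every .txt file in `directory` into `output_name`."""
    directory = Path(directory)
    output_path = directory / output_name
    # Skip the output file itself so an earlier run is not fed back in.
    sources = sorted(p for p in directory.glob("*.txt") if p.name != output_name)
    with output_path.open("w", encoding="utf-8") as out:
        for source in sources:
            text = source.read_text(encoding="utf-8")
            out.write(text)
            # Keep files separated even when one lacks a trailing newline.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output_path
```

For example, concatenate_texts("data/sherlock") would write data/sherlock/input.txt from the other .txt files in that directory, in sorted filename order.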


Tuning

Tuning your models is kind of a dark art at this point. In general:

  1. Start with as much clean input text as possible, e.g. 50 MiB of input.txt.
  2. Start by establishing a baseline using the default settings.
  3. Use TensorBoard to compare all of your runs visually as you experiment.
  4. Tweak --rnn_size up somewhat from 128 if you have a lot of input data.
  5. Tweak --num_layers from 2 to 3 but no higher unless you have experience.
  6. Tweak --seq_length up from 50 based on the length of a valid input string (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.). An LSTM cell will remember across spans longer than this sequence length, but the effect falls off at longer character distances.
  7. Finally, once you've done all that, and only then, I would suggest adding some dropout. Start with --output_keep_prob 0.8, and move to both --input_keep_prob 0.8 --output_keep_prob 0.5 only after exhausting all of the options above.
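
The order of experiments above can be sketched as a list of train.py invocations. This is only an illustration: the function name tuning_plan is hypothetical, the values 256 and 64 for --rnn_size and --seq_length are placeholder assumptions, and the plan assumes each change is kept before trying the next one. In practice you would keep a change only if it beats the previous run.

```python
def tuning_plan(data_dir, rnn_size=256, seq_length=64):
    """Return train.py argument lists, one per experiment, in suggested order."""
    base = ["python", "train.py", "--data_dir", data_dir]
    # Each stage builds on the previous one (an assumption of this sketch).
    wider = ["--rnn_size", str(rnn_size)]
    deeper = wider + ["--num_layers", "3"]
    longer = deeper + ["--seq_length", str(seq_length)]
    return [
        base,                                          # step 2: default baseline
        base + wider,                                  # step 4: larger rnn_size
        base + deeper,                                 # step 5: three layers
        base + longer,                                 # step 6: longer sequences
        base + longer + ["--output_keep_prob", "0.8"],  # step 7: light dropout
        base + longer + ["--input_keep_prob", "0.8",
                         "--output_keep_prob", "0.5"],  # step 7: heavier dropout
    ]
```

Printing " ".join(cmd) for each entry of tuning_plan("./data/sherlock/") gives six command lines that can be run one at a time and compared side by side.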


Tensorboard

To visualize training progress, model graphs, and internal state histograms, start TensorBoard and point it at your log_dir. For example:

$ tensorboard --logdir=./logs/

Then open a browser to http://localhost:6006, or to whichever host and port you specified.


Roadmap

  • [ ] Add explanatory comments
  • [ ] Expose more command-line arguments
  • [ ] Compare accuracy and performance with char-rnn
  • [ ] More Tensorboard instrumentation


Contributing

Please feel free to:

  • Leave feedback in the issues
  • Open a Pull Request
  • Join the Gitter chat
  • Share your success stories and data sets!

char-rnn-tensorflow open issues
  • almost 4 years Interface to save checkpoint?
  • almost 4 years How to reduce GPU memory?
  • almost 4 years Enhancement - Dockerfile + GIST
  • almost 4 years Tensorflow 0.11 Tuple issue
  • almost 4 years loop function
  • about 4 years W tensorflow/core/framework/op_kernel.cc:909] Resource exhausted:OOM when allocating tensor with shape[1250,25670]
  • over 4 years Char sequence probability
  • over 4 years Support Unicode Input Files
  • over 4 years No dropout option?
  • over 4 years No validation/test?
  • over 4 years Compat issues with latest TF
  • over 4 years MemoryError
  • over 4 years Why we need ydata[-1] = xdata[0]?
  • over 4 years Tuning the temperature
  • over 4 years weighted_pick() can return invalid index
  • over 4 years create_batches in TextLoader in utils.py doesn't seem to transform the data into batches correctly
  • over 4 years Can not convert a list into a Tensor or Operation
  • over 4 years create_batches() from utils.py throws error
  • over 4 years IndexError is out of Bounce
  • almost 5 years How to calculate prob of a new sentence.
  • almost 5 years Curious...
  • almost 5 years Not a big deal but your output files are not part of .gitignore
  • almost 5 years Sampling probablilities do not sum to 1

char-rnn-tensorflow open pull requests
  • Changes which might be interesting
  • when training, model can be initialized from previously saved model
  • Better error message
  • Added the last save
  • Implement temperature
  • Dropouts
  • Open pickled files in binary mode
  • Allow for infinite sampling streams.
  • Not to create unnessary tensorflow variables when sampling
  • More portable way of executing tensor
  • Fix #47
  • add a script to compute the perplexity of test data
  • BRNN + perplexity evaluation
  • Adding a bidirectional neural network
  • Replace deprecated Tensorflow functions
  • Make save_dir if doesn't exist
  • Simple tests on Travis CI
  • Show defaults in --help
  • Add .gitignore
  • better code to employ decaying learning rate
  • Minor fixes
  • Improve readme and command line help
  • Enable execution of main Python scripts train.py and sample.py
  • Removed unnecessary line
  • Use most fequent character as prime instead of ' ' to not fail on mod…
  • select-device
  • cyrillic sampling bugfix
  • Add explanatory comments to files
  • Make sampling with a unicode prime work correctly
  • add tf
  • Implement temperature