

Code for my book on Multi-Armed Bandit Algorithms

Code to Accompany the Book Bandit Algorithms for Website Optimization

This repo contains code in several languages that implements several standard algorithms for solving the Multi-Armed Bandit problem, including:

  • epsilon-Greedy
  • Softmax (Boltzmann)
  • UCB1
  • UCB2
  • Hedge
  • Exp3

It also contains code that provides a testing framework for bandit algorithms based around simple Monte Carlo simulations.
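To make the Monte Carlo idea concrete, here is a hedged Python sketch of the simulation loop: an algorithm repeatedly selects an arm, draws a reward, and updates itself. The class and function names here are stand-ins for illustration, not the repo's actual code.

```python
import random

class BernoulliArm:
    """Stand-in arm: pays 1.0 with probability p, else 0.0."""
    def __init__(self, p):
        self.p = p

    def draw(self):
        return 1.0 if random.random() < self.p else 0.0

class RandomBandit:
    """Stand-in baseline algorithm that picks arms uniformly at random."""
    def __init__(self, n_arms):
        self.n_arms = n_arms

    def select_arm(self):
        return random.randrange(self.n_arms)

    def update(self, chosen_arm, reward):
        pass  # a real algorithm would update its reward estimates here

def simulate(algo, arms, horizon):
    """One Monte Carlo trial: total reward over `horizon` plays."""
    total = 0.0
    for _ in range(horizon):
        chosen = algo.select_arm()
        reward = arms[chosen].draw()
        algo.update(chosen, reward)
        total += reward
    return total
```

Averaging `simulate(...)` over many independent trials estimates an algorithm's expected reward, which is how such a framework compares algorithms against each other.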


This codebase is split up by language. In most languages, there are parallel implementations of the core algorithms and infrastructure for testing the algorithms:

  • Python
  • Julia
  • Ruby

In R, there is a body of code for visualizing the results of simulations and analyzing those results. The R code would benefit from some refactoring to make it DRYer.

If you're interested in seeing how some of these algorithms would be implemented in JavaScript, you should try out Mark Reid's code:

If you're looking for Java code, try Dani Sola's work:

If you're looking for Scala code, try everpeace (Shingo Omura)'s work:

If you're looking for Go code, try Rany Keddo's work:

If you're looking for Clojure code, try Paul Ingles's work:

If you're looking for Swift code, see

For a Flask implementation, see

Getting Started

To try out this code, you can go into the Python or Julia directories and then run the demo script.

In Python, that looks like:

python demo.py
In Julia, that looks like:

julia demo.jl

You should step through that code line-by-line to understand what the functions are doing. The book provides more in-depth explanations of how the algorithms work.

The Ruby code was contributed by Kashif Rasul. If you're interested in translating the code into another language, please submit a pull request. I will merge any new implementations as soon as I can.

Adding New Algorithms: API Expectations

As described in the book, a Bandit algorithm should implement two methods:

  • select_arm(): A method that returns the index of the Arm that the Bandit object selects on the current play. No arguments are required.
  • update(): A method that updates the internal state of the Bandit object in response to its most recently selected arm's reward. The index of the chosen arm and the amount of reward received must be passed as arguments.

As described in the book, an Arm simulator should implement:

  • draw(): A method that returns a single instance of reward from the arm that was pulled. No arguments are required.
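For instance, an Arm satisfying this contract might look like the following sketch: a Gaussian-reward arm (hypothetical names, not necessarily the repo's exact class).

```python
import random

class NormalArm:
    """Arm whose reward is drawn from a Normal(mu, sigma) distribution."""
    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def draw(self):
        # Returns a single reward sample; takes no arguments, as required.
        return random.gauss(self.mu, self.sigma)
```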

In addition, the Bandit algorithms are designed to implement one additional method used in simulations:

  • initialize(): A method that returns nothing. Instead, this method resets all of the data-driven variables in a Bandit object. For most objects, this resets the counts and values fields to their initial states. No arguments are required.
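Putting the three methods together, an epsilon-Greedy object implementing this interface might be sketched as follows. This is a hypothetical simplification; the book's own implementations differ in detail.

```python
import random

class EpsilonGreedy:
    def __init__(self, epsilon, n_arms):
        self.epsilon = epsilon
        self.n_arms = n_arms
        self.initialize()

    def initialize(self):
        # Reset the data-driven state: play counts and value estimates.
        self.counts = [0] * self.n_arms
        self.values = [0.0] * self.n_arms

    def select_arm(self):
        # Explore a random arm with probability epsilon; otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        return self.values.index(max(self.values))

    def update(self, chosen_arm, reward):
        # Keep a running average of each arm's observed rewards.
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        self.values[chosen_arm] += (reward - self.values[chosen_arm]) / n
```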

Beyond the testing framework described in the book, I am currently providing an additional system built around the concept of an Environment. Environment objects encapsulate not only a set of Arms, but also a mechanism for having those Arms change over time. This allows you to simulate complex scenarios that aren't well described by a constant set of arms.

If you would like to implement your own Environment, you will need to provide a very simple interface. The Environment interface requires you to implement two methods:

  • arms(): A method that returns the array of arms that exist at time T. You must pass T as an argument.
  • n_arms(): A method that returns the number of arms that the environment will return with each call to arms(). While the arms may change over time, the number of arms should not. No arguments are required.
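A hypothetical Environment honoring this interface (names invented for illustration) could be as simple as a fixed switch between two arm sets:

```python
class SwitchingEnvironment:
    """Serves one arm set before time `switch_at` and another after it.

    Any objects implementing draw() can serve as arms; the two sets must
    have the same length so that n_arms() stays constant over time.
    """
    def __init__(self, before_arms, after_arms, switch_at):
        assert len(before_arms) == len(after_arms)
        self.before_arms = before_arms
        self.after_arms = after_arms
        self.switch_at = switch_at

    def arms(self, t):
        # The arm set at time t; t must be passed by the caller.
        return self.before_arms if t < self.switch_at else self.after_arms

    def n_arms(self):
        # Constant even though the arms themselves change over time.
        return len(self.before_arms)
```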