

self-healing etcd on mesos!


Statistics on etcd-mesos

Number of watchers on GitHub: 60
Number of open issues: 43
Average time to close an issue: 16 days
Main language: Go
Average time to merge a PR: 4 days
Open pull requests: 6+
Closed pull requests: 7+
Last commit: over 2 years ago
Repo created: over 4 years ago
Repo last updated: almost 2 years ago
Size: 1.6 MB
Organization / Author: mesosphere
[ALPHA] etcd-mesos

This is an Apache Mesos framework that runs an etcd cluster. It performs periodic health checks to ensure that the cluster has a stable leader and that raft is making progress. It replaces nodes that die.



  • [x] runs, monitors, and administers an etcd cluster of your desired size
  • [x] recovers from n/2-1 failures by reconfiguring the etcd cluster and launching replacement nodes
  • [x] recovers from up to n-1 simultaneous failures by picking a survivor to re-seed a new cluster (ranks survivors by raft index, preferring the replica with the highest commit)
  • [x] etcd proxy configurer (etcd-mesos-proxy) and optional SRV record support via mesos-dns
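The failure thresholds above come down to raft's majority quorum. As a rough illustration (not etcd-mesos's own code), this Go sketch computes the majority an n-member cluster needs and picks a re-seed survivor by highest raft index; the `Survivor` type and the helper names are hypothetical.

```go
package main

import "fmt"

// Survivor is a hypothetical record for a surviving etcd member.
type Survivor struct {
	Name      string
	RaftIndex uint64 // highest raft index (commit) reported by this member
}

// quorum returns the raft majority needed by a cluster of n members.
func quorum(n int) int {
	return n/2 + 1
}

// hasQuorum reports whether alive members out of n can still elect a leader.
func hasQuorum(alive, n int) bool {
	return alive >= quorum(n)
}

// pickSurvivor returns the survivor with the highest raft index, i.e. the
// replica that has seen the most of the log. On ties the first one wins.
// ok is false if the list is empty.
func pickSurvivor(survivors []Survivor) (best Survivor, ok bool) {
	for i, s := range survivors {
		if i == 0 || s.RaftIndex > best.RaftIndex {
			best = s
		}
	}
	return best, len(survivors) > 0
}

func main() {
	fmt.Println(quorum(5), hasQuorum(3, 5), hasQuorum(2, 5)) // 3 true false
	best, ok := pickSurvivor([]Survivor{
		{"etcd-1", 1042}, {"etcd-2", 1051}, {"etcd-3", 998},
	})
	fmt.Println(best.Name, ok) // etcd-2 true
}
```

Ranking by raft index keeps the most up-to-date copy of the data when a new cluster has to be seeded from a single member.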


Marathon spec:

{
  "id": "etcd",
  "container": {
    "docker": {
      "forcePullImage": true,
      "image": "mesosphere/etcd-mesos:0.1.0-alpha-target-23-24-25"
    },
    "type": "DOCKER"
  },
  "cpus": 0.2,
  "env": {
    "FRAMEWORK_NAME": "etcd",
    "WEBURI": "http://etcd.marathon.mesos:$PORT0/stats",
    "MESOS_MASTER": "zk://master.mesos:2181/mesos",
    "ZK_PERSIST": "zk://master.mesos:2181/etcd",
    "AUTO_RESEED": "true",
    "RESEED_TIMEOUT": "240",
    "CLUSTER_SIZE": "3",
    "CPU_LIMIT": "1",
    "DISK_LIMIT": "4096",
    "MEM_LIMIT": "2048",
    "VERBOSITY": "1"
  },
  "healthChecks": [
    {
      "gracePeriodSeconds": 60,
      "intervalSeconds": 30,
      "maxConsecutiveFailures": 0,
      "path": "/healthz",
      "portIndex": 0,
      "protocol": "HTTP"
    }
  ],
  "instances": 1,
  "mem": 128.0,
  "ports": [
    0
  ]
}


First, check out the tagged release for your version of Mesos.

For Mesos versions 23, 24, or 25, check out v0.1.0-alpha-target-23-24-25

For Mesos versions 22 and below (the older the version, the lower the chance of compatibility), check out v0.1.0-alpha-target-22

Next, build it!


The important binaries (etcd-mesos-scheduler, etcd-mesos-proxy, etcd-mesos-executor) are now present in the bin subdirectory.

A typical production invocation will look something like this:

/path/to/etcd-mesos-scheduler \
    -log_dir=/var/log/etcd-mesos \
    -master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
    -framework-name=etcd \
    -cluster-size=5 \
    -executor-bin=/path/to/etcd-mesos-executor \
    -etcd-bin=/path/to/etcd \
    -etcdctl-bin=/path/to/etcdctl

If you'd like to build a new Docker image, change DOCKER_ORG and VERSION in the Makefile, and then run:

make docker

service discovery

Options for finding your etcd nodes on mesos:

  • Run the included proxy binary locally on systems that use etcd. It retrieves the etcd configuration from Mesos and starts an etcd proxy node. Note that this is not a good idea on clusters with many running tasks: the master iterates through every task and returns a fairly large chunk of JSON, so avoid this approach in favor of mesos-dns on larger clusters.

    etcd-mesos-proxy --master=zk://localhost:2181/mesos --framework-name=etcd
  • Use mesos-dns or another system that creates SRV records and have an etcd proxy use SRV discovery:

    etcd --proxy=on --discovery-srv=etcd.mesos
  • Use Mesos DNS or another DNS SRV system and have clients resolve _etcd-server._client.<framework name>.mesos

  • Use another system that builds configuration from Mesos's state.json endpoint. This is how the proxy binary (the first option) works, so check out its code in cmd/etcd-mesos-proxy/app.go if you want to go this route. Be sure to minimize calls to the master for state.json on larger clusters: it becomes an expensive operation that can easily DDoS your master if you are not careful.

  • Query current membership from the etcd-mesos-scheduler's /members HTTP endpoint, which listens on the --admin-port (default 23400).
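To illustrate the state.json route, here is a Go sketch that extracts the hosts running a framework's live tasks from a state.json payload. It assumes a small subset of the master's schema (`frameworks[].name`, `tasks[].state`, `tasks[].slave_id`, `slaves[].id`, `slaves[].hostname`), ignores ports entirely, and takes the JSON as bytes so a caller can fetch it once and cache it rather than polling the master. `runningHosts` is a hypothetical helper; the proxy's real logic lives in cmd/etcd-mesos-proxy/app.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// masterState is an assumed minimal subset of the master's state.json;
// all other fields in the payload are ignored by encoding/json.
type masterState struct {
	Frameworks []struct {
		Name  string `json:"name"`
		Tasks []struct {
			Name    string `json:"name"`
			State   string `json:"state"`
			SlaveID string `json:"slave_id"`
		} `json:"tasks"`
	} `json:"frameworks"`
	Slaves []struct {
		ID       string `json:"id"`
		Hostname string `json:"hostname"`
	} `json:"slaves"`
}

// runningHosts returns the sorted hostnames of agents running TASK_RUNNING
// tasks for the framework called name.
func runningHosts(stateJSON []byte, name string) ([]string, error) {
	var st masterState
	if err := json.Unmarshal(stateJSON, &st); err != nil {
		return nil, err
	}
	hostByID := make(map[string]string, len(st.Slaves))
	for _, s := range st.Slaves {
		hostByID[s.ID] = s.Hostname
	}
	var hosts []string
	for _, fw := range st.Frameworks {
		if fw.Name != name {
			continue
		}
		for _, t := range fw.Tasks {
			if t.State != "TASK_RUNNING" {
				continue
			}
			if h, ok := hostByID[t.SlaveID]; ok {
				hosts = append(hosts, h)
			}
		}
	}
	sort.Strings(hosts)
	return hosts, nil
}

func main() {
	sample := []byte(`{
	  "frameworks": [{"name": "etcd", "tasks": [
	    {"name": "etcd-1", "state": "TASK_RUNNING", "slave_id": "s1"},
	    {"name": "etcd-2", "state": "TASK_LOST", "slave_id": "s2"}]}],
	  "slaves": [{"id": "s1", "hostname": "agent1"}, {"id": "s2", "hostname": "agent2"}]
	}`)
	hosts, err := runningHosts(sample, "etcd")
	fmt.Println(hosts, err) // [agent1] <nil>
}
```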
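For the SRV-based options, Go's standard library can resolve the record directly: `net.LookupSRV(service, proto, name)` queries `_service._proto.name`. This sketch assumes a framework named `etcd` under the `mesos` DNS domain; `srvName` and `resolveEtcd` are hypothetical helpers, and the live lookup only runs when `-resolve` is passed, since it needs a reachable Mesos DNS server.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// srvName builds the record name clients resolve for a framework,
// following the _etcd-server._client.<framework name>.mesos pattern.
func srvName(framework string) string {
	return "_etcd-server._client." + framework + ".mesos"
}

// resolveEtcd looks up that record. net.LookupSRV prepends
// "_<service>._<proto>." to the name, so this queries srvName(framework).
func resolveEtcd(framework string) ([]string, error) {
	_, addrs, err := net.LookupSRV("etcd-server", "client", framework+".mesos")
	if err != nil {
		return nil, err
	}
	endpoints := make([]string, 0, len(addrs))
	for _, a := range addrs {
		host := strings.TrimSuffix(a.Target, ".") // drop the FQDN root dot
		endpoints = append(endpoints, net.JoinHostPort(host, fmt.Sprint(a.Port)))
	}
	return endpoints, nil
}

func main() {
	fmt.Println(srvName("etcd")) // _etcd-server._client.etcd.mesos
	// Only attempt a live DNS query when asked, since it needs Mesos DNS.
	if len(os.Args) > 1 && os.Args[1] == "-resolve" {
		eps, err := resolveEtcd("etcd")
		fmt.Println(eps, err)
	}
}
```

The returned `host:port` strings can be handed to an etcd client as its endpoint list.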

etcd-mesos open issues
  • about 3 years Build is failing...
  • about 3 years tasks not being cleaned up
  • about 3 years Building inside a Docker Container failing
  • over 3 years zk reconnection problems due to shortage of file descriptors
  • over 3 years etcd deployment fails with DCOS if framework found in Zookeeper
  • over 3 years rebuild etcd-mesos dcos image and publish new version to multiverse
  • almost 4 years framework removal detection broken starting with mesos-0.24.0
  • almost 4 years Support Framework Authentication
  • about 4 years proxy v2 api from etcd-mesos scheduler to backing cluster
  • about 4 years support external URL's for etcd, etcdctl, executor bins
  • about 4 years proxy should use zk state, not state.json
  • about 4 years document how to migrate backing ZK data to a new chroot/cluster
  • about 4 years support full-cluster loss with automated backup and recovery
  • about 4 years investigate patching + upstreaming proxy to bail out or requery when reconnected to a different clusterid
  • about 4 years address multiverse packaging feedback
  • about 4 years stricter threshold for lock
  • about 4 years mesos version detector script
  • about 4 years bail out if reregistered version != registered version
  • about 4 years tune down backoffs
  • about 4 years nuke PumpTheBrakes
  • about 4 years add support for mesos persistent volumes for optional recovery
  • about 4 years Add / handler to admin server with a minimal dashboard
  • about 4 years versioned alpha release branches compatible with 0.22, 0.23, 0.24 versions of mesos
  • about 4 years add tests for reseed logic
  • about 4 years configurable static port
  • over 4 years fix logging and log rotation
etcd-mesos open pull requests
  • Add advertise-address flag to handle the scheduler being behind a NAT
  • Allow authentication to clusters
  • Remove deprecated safety.
  • Revamp
  • Move container base image to Alpine Linux 3.5.
  • Decline offers if there are no pending tasks.