backup-utils

GitHub Enterprise Backup Utilities

Statistics on backup-utils

Number of watchers on GitHub 435
Number of open issues 17
Average time to close an issue 6 days
Main language Shell
Average time to merge a PR 6 days
Open pull requests 32+
Closed pull requests 16+
Last commit over 1 year ago
Repo Created over 5 years ago
Repo Last Updated over 1 year ago
Size 1.51 MB
Organization / Author github
Latest Release v2.11.3
Contributors 26

GitHub Enterprise Backup Utilities

This repository includes backup and recovery utilities for GitHub Enterprise.

Features

The backup utilities implement a number of advanced capabilities for backup hosts, built on top of the backup and restore features already included in GitHub Enterprise.

  • Complete GitHub Enterprise backup and recovery system via two simple utilities:
    ghe-backup and ghe-restore.
  • Online backups. The GitHub appliance need not be put in maintenance mode for the duration of the backup run.
  • Incremental backup of Git repository data. Only changes since the last snapshot are transferred, leading to faster backup runs and lower network bandwidth and machine utilization.
  • Efficient snapshot storage. Only data added since the previous snapshot consumes new space on the backup host.
  • Multiple backup snapshots with configurable retention periods.
  • Backup commands run under the lowest CPU/IO priority on the GitHub appliance, reducing performance impact while backups are in progress.
  • Runs under most Linux/Unix environments.
  • MIT licensed, open source software maintained by GitHub, Inc.

Requirements

The backup utilities should be run on a host dedicated to long-term permanent storage. That host must have network connectivity with the GitHub Enterprise appliance.

Backup host requirements

Backup host software requirements are modest: Linux or other modern Unix operating system with bash, git, OpenSSH 5.6 or newer, and rsync v2.6.4 or newer.

The backup host must be able to establish network connections outbound to the GitHub appliance over SSH. TCP port 122 is used to back up GitHub Enterprise 2.0 or newer instances, and TCP port 22 is used for older versions (11.10.34x).
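
As a rough illustration of the port rule above, the following sketch maps an appliance version to the SSH port the backup host needs to reach. The function name ghe_ssh_port is hypothetical and not part of backup-utils, and the sketch assumes that any version outside the 11.10.x series is 2.0 or newer.

```shell
# Hypothetical helper (not part of backup-utils): print the SSH port to use
# for a given GitHub Enterprise version string.
ghe_ssh_port() {
  local version="${1#v}"   # tolerate a leading "v", as in "v2.11.3"
  case "$version" in
    11.10.*) echo 22 ;;    # older 11.10.34x series
    *)       echo 122 ;;   # assumed to be 2.0 or newer
  esac
}
```

The result can be passed to the standard ssh -p option when testing connectivity by hand, for example `ssh -p "$(ghe_ssh_port 2.11.3)" ...`.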

Storage requirements

Storage requirements vary based on current Git repository disk usage and growth patterns of the GitHub appliance. We recommend allocating at least 5x the amount of storage allocated to the primary GitHub appliance for historical snapshots and growth over time.

The backup utilities use hard links to store data efficiently, so the backup snapshots must be written to a filesystem with support for hard links.

Using a case-sensitive file system is strongly recommended to avoid conflicts.
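
Before committing to a storage location, it may help to probe the target directory for both properties. This is a minimal sketch using only standard tools (mktemp, ln, touch); the function names are hypothetical and not part of backup-utils.

```shell
# Hypothetical helper: succeed if hard links can be created inside directory $1.
supports_hard_links() {
  local dir="$1" probe
  probe=$(mktemp "$dir/hardlink-probe.XXXXXX") || return 1
  if ln "$probe" "$probe.link" 2>/dev/null; then
    rm -f "$probe" "$probe.link"
    return 0
  fi
  rm -f "$probe"
  return 1
}

# Hypothetical helper: succeed if the filesystem holding $1 is case-sensitive.
is_case_sensitive() {
  local dir="$1" probe
  probe=$(mktemp -d "$dir/case-probe.XXXXXX") || return 1
  touch "$probe/casetest"
  # On a case-insensitive filesystem the upper-case name resolves to the same file.
  if [ -e "$probe/CASETEST" ]; then
    rm -rf "$probe"
    return 1
  fi
  rm -rf "$probe"
  return 0
}
```

Both functions clean up their probe files, so they can be run against the directory you intend to use for GHE_DATA_DIR, for example `supports_hard_links /path/to/data && is_case_sensitive /path/to/data`.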

GitHub Enterprise version requirements

The backup utilities are fully supported under GitHub Enterprise 2.0 or greater.

The previous release series (11.10.34x) is also supported but must meet minimum version requirements. For online and incremental backup support, the GitHub Enterprise instance must be running version 11.10.342 or above.

Earlier versions are supported, but without online or incremental backups. We strongly recommend upgrading to the latest release if you're running a version prior to 11.10.342. Visit enterprise.github.com to download the most recent GitHub Enterprise version.

Note: You can restore a snapshot that's at most two feature releases behind the restore target's version of GitHub Enterprise. For example, to restore a snapshot of GitHub Enterprise 2.4, the target GitHub Enterprise appliance must be running GitHub Enterprise 2.5.x or 2.6.x. You can't restore a snapshot from 2.4 to 2.7, because that's three releases ahead.
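
The rule in the note above can be expressed as a small check. This is only a sketch: the function name can_restore is hypothetical, it assumes snapshot and target are in the same major series, and it treats a snapshot newer than the target as not restorable, which the note implies but does not state outright.

```shell
# Hypothetical helper: succeed if a snapshot taken on version $1 can be
# restored to an appliance running version $2 (MAJOR.MINOR[.PATCH]).
can_restore() {
  local snapshot="$1" target="$2"
  local snap_major snap_minor target_major target_minor rest gap
  snap_major="${snapshot%%.*}"
  rest="${snapshot#*.}"; snap_minor="${rest%%.*}"
  target_major="${target%%.*}"
  rest="${target#*.}"; target_minor="${rest%%.*}"
  # Assumption: only compare feature releases within the same major series.
  [ "$snap_major" -eq "$target_major" ] || return 1
  gap=$(( target_minor - snap_minor ))
  # The snapshot may be at most two feature releases behind the target.
  [ "$gap" -ge 0 ] && [ "$gap" -le 2 ]
}
```

With this sketch, `can_restore 2.4 2.6` succeeds and `can_restore 2.4 2.7` fails, matching the example in the note.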

Getting started

  1. Download the latest release version and extract the repository using tar:

    tar -xzvf /path/to/github-backup-utils-vMAJOR.MINOR.PATCH.tar.gz

    or clone the repository using Git:

    git clone -b stable https://github.com/github/backup-utils.git

  2. Copy the backup.config-example file to backup.config and modify as necessary. The GHE_HOSTNAME value must be set to the GitHub Enterprise host name. Additional options are available and documented in the configuration file but none are required for basic backup functionality.

     • backup-utils will attempt to load the backup configuration from the following locations, in this order:

       ```
       $GHE_BACKUP_CONFIG (User configurable environment variable)
       $GHE_BACKUP_ROOT/backup.config (Root directory of backup-utils install)
       $HOME/.github-backup-utils/backup.config
       /etc/github-backup-utils/backup.config
       ```

     • In a clustering environment, the `GHE_EXTRA_SSH_OPTS` key must be configured with the `-i <absolute path to private key>` SSH option.

  3. Add the backup host's SSH key to the GitHub appliance as an Authorized SSH key. See Adding an SSH key for shell access for instructions.

  4. Run bin/ghe-host-check to verify SSH connectivity with the GitHub appliance.

  5. Run bin/ghe-backup to perform an initial full backup.
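
The configuration lookup order listed above can be sketched as a shell function that prints the first candidate file that exists. This is an illustration of the documented order only, not the code backup-utils itself uses; the function name find_backup_config is hypothetical.

```shell
# Hypothetical helper: print the first backup.config found, following the
# documented lookup order, or return 1 if none exists.
find_backup_config() {
  local candidate
  for candidate in \
    "${GHE_BACKUP_CONFIG:-}" \
    "${GHE_BACKUP_ROOT:+$GHE_BACKUP_ROOT/backup.config}" \
    "${HOME:+$HOME/.github-backup-utils/backup.config}" \
    "/etc/github-backup-utils/backup.config"
  do
    # Skip candidates whose variable is unset or empty.
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

Running `find_backup_config` before the first backup is a quick way to see which file would take effect when several of these locations are populated.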

Migrating from GitHub Enterprise v11.10.34x to v2.0 or v2.1

If you are migrating from GitHub Enterprise version 11.10.34x to 2.0 or 2.1, please see the Migrating from GitHub Enterprise v11.10.34x documentation in the GitHub Enterprise System Administrator's Guide. It includes important information on using the backup utilities to migrate data from your v11.10.34x instance to v2.0 or v2.1. Note that migrations to versions greater than 2.1 are not officially supported.

Using the backup and restore commands

After the initial backup, use the following commands:

  • The ghe-backup command creates incremental snapshots of repository data, along with full snapshots of all other pertinent data stores.
  • The ghe-restore command restores snapshots to the same or separate GitHub Enterprise appliance. You must add the backup host's SSH key to the target GitHub Enterprise appliance before using this command.

Example backup and restore usage

The following assumes that GHE_HOSTNAME is set to github.example.com in backup.config.

Creating a backup snapshot:

$ ghe-backup
Starting backup of github.example.com in snapshot 20140727T224148
Connect github.example.com OK (v11.10.343)
Backing up GitHub settings ...
Backing up SSH authorized keys ...
Backing up SSH host keys ...
Backing up MySQL database ...
Backing up Redis database ...
Backing up Git repositories ...
Backing up GitHub Pages ...
Backing up Elasticsearch indices ...
Completed backup of github.example.com in snapshot 20140727T224148 at 23:01:58

Restoring from last successful snapshot to a newly provisioned GitHub Enterprise appliance at IP 5.5.5.5:

$ ghe-restore 5.5.5.5
Starting rsync restore of 5.5.5.5 from snapshot 20140727T224148
Connect 5.5.5.5 OK (v11.10.343)
Enabling maintenance mode on 5.5.5.5 ...
Restoring Git repositories ...
Restoring GitHub Pages ...
Restoring MySQL database ...
Restoring Redis database ...
Restoring SSH authorized keys ...
Restoring Elasticsearch indices ...
Restoring SSH host keys ...
Completed restore of 5.5.5.5 from snapshot 20140727T224148
Visit https://5.5.5.5/setup/settings to configure the recovered appliance.

A different backup snapshot may be selected by passing the -s argument and the datestamp-named directory from the backup location.

The ghe-backup and ghe-restore commands also have a verbose output mode (-v) that lists files as they're being transferred. Enabling it is often useful when output is logged to a file.

When restoring to an already configured GHE instance, settings, certificate, and license data are not restored to prevent overwriting manual configuration on the restore host. This behavior can be overridden by passing the -c argument to ghe-restore, forcing settings, certificate, and license data to be overwritten with the backup copy's data.
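
To show how the -s, -v, and -c options described above combine, here is a small dry-run sketch that only prints a ghe-restore command line instead of running it. The function name restore_command and its argument order are hypothetical; the snapshot name and host are taken from the example output above.

```shell
# Hypothetical helper: print (do not run) a ghe-restore command line.
#   $1 = target host, $2 = optional snapshot directory name,
#   $3 = "yes" to overwrite settings, certificate, and license data (-c)
restore_command() {
  local host="$1" snapshot="${2:-}" overwrite="${3:-no}"
  local cmd="ghe-restore -v"
  if [ -n "$snapshot" ]; then
    cmd="$cmd -s $snapshot"
  fi
  if [ "$overwrite" = "yes" ]; then
    cmd="$cmd -c"
  fi
  printf '%s %s\n' "$cmd" "$host"
}
```

For example, `restore_command 5.5.5.5 20140727T224148 yes` prints a command that restores that specific snapshot with verbose output and overwrites the target's existing settings. Review the printed line before running it by hand.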

Scheduling backups

Regular backups should be scheduled using cron(8) or a similar command scheduling service on the backup host. The backup frequency dictates the worst-case recovery point objective (RPO) in your backup plan. We recommend the following:

  • Hourly backups for GitHub Enterprise versions 11.10.342 or greater (due to improved online and incremental backup support).
  • Daily backups for versions prior to 11.10.342.

Note: the time required to do full offline backups of large datasets under GitHub Enterprise versions prior to 11.10.342 may prohibit the use of daily backups. We strongly recommend upgrading to 11.10.342 or greater in that case.

Example scheduling usage

The following examples assume the backup utilities are installed under /opt/backup-utils. Create the crontab entry for the same user that will run manual backup and recovery commands. That user must have write access to the configured GHE_DATA_DIR directory.

Note that the GHE_NUM_SNAPSHOTS option in backup.config should be tuned based on the frequency of backups. The ten most recent snapshots are retained by default. The number should be adjusted based on backup frequency and available storage.
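
As a rough guide to tuning, the number of snapshots needed to cover a given retention window is the number of backups per day multiplied by the number of days. The helper below is a hypothetical sketch of that arithmetic, not part of backup-utils.

```shell
# Hypothetical helper: print how many snapshots are needed to retain
# $2 days of history when taking $1 backups per day.
snapshots_for_retention() {
  local backups_per_day="$1" retention_days="$2"
  echo $(( backups_per_day * retention_days ))
}
```

For example, hourly backups kept for two days need `snapshots_for_retention 24 2`, that is 48 snapshots, whereas the default of ten covers only the last ten hours at that frequency. Check the result against the storage available on the backup host before setting GHE_NUM_SNAPSHOTS.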

To schedule hourly backup snapshots with verbose informational output appended to a log file and errors generating an email, redirect only standard output to the log. Cron mails any output that is not redirected, so anything written to standard error is sent to the MAILTO address:

MAILTO=admin@example.com

0 * * * * /opt/backup-utils/bin/ghe-backup -v 1>>/opt/backup-utils/backup.log

To schedule nightly backup snapshots instead, use:

MAILTO=admin@example.com

0 0 * * * /opt/backup-utils/bin/ghe-backup -v 1>>/opt/backup-utils/backup.log

Backup snapshot file structure

Backup snapshots are stored in rotating increment directories named after the date and time the snapshot was taken. Each snapshot directory contains a full backup snapshot of all relevant data stores. Repository, Search, and Pages data is stored efficiently via hard links.

Please note: symlinks must be maintained when archiving backup snapshots. Dereferencing or excluding symlinks, or storing the snapshot contents on a filesystem that does not support symlinks, will result in operational problems when the data is restored.

The following example shows a snapshot file hierarchy for hourly frequency. There are five snapshot directories, with the current symlink pointing to the most recent successful snapshot:

./data
   |- 20140724T010000
   |- 20140725T010000
   |- 20140726T010000
   |- 20140727T010000
   |- 20140728T010000
      |- authorized-keys.json
      |- elasticsearch/
      |- enterprise.ghl
      |- mysql.sql.gz
      |- pages/
      |- redis.rdb
      |- repositories/
      |- settings.json
      |- ssh-host-keys.tar
      |- strategy
      |- version
   |- current -> 20140728T010000
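
A minimal sketch of inspecting this layout from the shell, assuming the directory structure shown above. The function names are hypothetical and not part of backup-utils. Plain readlink (without -f) is used because it only needs to read the link target.

```shell
# Hypothetical helper: print the snapshot name that "current" points to.
current_snapshot() {
  local data_dir="$1"
  [ -L "$data_dir/current" ] || return 1
  readlink "$data_dir/current"
}

# Hypothetical helper: list snapshot directories, oldest first.
# Datestamp names sort chronologically under the shell's glob ordering.
list_snapshots() {
  local data_dir="$1" entry
  for entry in "$data_dir"/[0-9]*T[0-9]*; do
    # Skip the unexpanded pattern (no matches) and anything that is a symlink.
    if [ -d "$entry" ] && [ ! -L "$entry" ]; then
      basename "$entry"
    fi
  done
  return 0
}
```

Given the hierarchy above, `current_snapshot ./data` would print 20140728T010000, and `list_snapshots ./data` would print the five datestamped directory names.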

Note: the GHE_DATA_DIR variable set in backup.config can be used to change the disk location where snapshots are written.

How do the backup utilities differ from a High Availability replica?

We recommend using both the backup utilities and a High Availability replica as part of a GitHub Enterprise deployment, but they serve different roles.

The purpose of the High Availability replica

The High Availability replica is a fully redundant secondary GitHub Enterprise instance, kept in sync with the primary instance via replication of all major datastores. This active/passive cluster configuration is designed to minimize service disruption in the event of hardware failure or major network outage affecting the primary instance. Because some forms of data corruption or loss may be replicated immediately from primary to replica, it is not a replacement for the backup utilities as part of your disaster recovery plan.

The purpose of the backup utilities

The backup utilities are a disaster recovery tool that takes date-stamped snapshots of all major datastores. These snapshots can be used to restore an instance to a prior state or to set up a new instance without keeping another always-on GitHub Enterprise instance (such as the High Availability replica).

Support

If you find a bug or would like to request a feature in backup-utils, please open an issue or pull request on this repository. If you have a question related to your specific GitHub Enterprise setup or would like assistance with backup site setup or recovery, please contact our Enterprise support team instead.

backup-utils open issues
  • about 3 years Compression on Snapshots
  • about 3 years Tests fail on ghe-backup
  • about 3 years `readlink -f` not supported on OS X
  • over 3 years Docs: Backup host filesystem should be case-sensitive
  • over 3 years Unlink fails on Solaris when run from cron as non root.
  • over 3 years Issues with ps on Solaris
  • over 3 years Custom CA certificates aren't included in backup-utils
  • almost 4 years S3 backups require that the ~/.s3cfg file exists even though its not always needed
  • about 4 years Git issues due to a backup run failure cause all future backups to fail silently
  • about 4 years Tests fail when /usr/bin/python != python2
  • over 4 years Feature: Allow overriding config values on restore
  • over 4 years Status Indicator
  • over 4 years unlink will fail on Solaris unless backup is run as root
  • over 4 years Backup to mounted samba share fails silently
  • almost 5 years Does backup to s3 work?
  • almost 5 years s3cmd feature
  • about 5 years Repository Liberation
  • about 5 years Built in support for warm DR standby
backup-utils open pull requests
  • update new S3 backup/restore method
  • Prevent multiple restores
  • Ask for index name when using Elasticsearch _cat API
  • fall back to system /etc/github-backup-utils/ghe-backup-config if none found in share
  • Add Bash styleguide
  • Minimum version enforcement for cluster restores
  • Use .sync_in_progress file during restores
  • Ignore all dpkg-buildpackage generated files
  • Don't change working directory when loading in ghe-backup-config
  • Benchmarking restores
  • restore host keys for cluster environment as well
  • Added a new GHE_REMOTE_GIT_FSCK option
  • Help and Verbose output for commands in bin/
  • Add backup version note
  • Adding a script to speed up cluster backups on enterprise
  • Bump version: 2.7.2
  • Fix macOS/BSD regressions
  • Ignore errors from ps
  • Fix ghe-backup-config test flakiness
  • Use Travis container infrastructure
  • Set restore status on all cluster nodes
  • Remove redundant hookshot elasticsearch index backups and restores
  • Automate backup-utils releases
  • Add usage instructions for user-facing commands
  • Solaris 11 fixes and new configuration documentation
  • Retry loop for redis-cli BGSAVE
  • Update README.md
  • Improve detection of failures in cluster backup rsync threads
  • move host check so that ES dump scripts can run standalone
  • Remove indices created with ES 1.x
  • Retry with the admin ssh port on network unreachable too.
  • WIP: Unify the backup & restore process
backup-utils latest release notes
v2.11.3 GitHub Enterprise Backup Utilities v2.11.3

Includes general improvements, bug fixes and support for GitHub Enterprise v2.11.3

  • Update argument parsing and help/usage consistency #320
  • Fix failures variable #353
  • Remove other snapshot contents before removing the incomplete file #358
  • Backup and restore the management console password #361
  • Check for git before allowing SSH multiplex #362
  • Cleanup SSH multiplexing on exit #363
  • Filter cluster nodes by role during backup and restore #367
  • Optimise route generation and finalisation during cluster restores of pages #369
  • Allow extra rsync options to override default options #370
v2.11.2 GitHub Enterprise Backup Utilities v2.11.2

Includes general improvements, bug fixes and support for GitHub Enterprise v2.11.2

  • Allow the restoration of configuration to Cluster #347
  • Switch to TMPDIR before initiating SSH multiplexing workaround to prevent locking the destination filesystem #348
v2.11.1 GitHub Enterprise Backup Utilities v2.11.1

Includes general improvements, bug fixes and support for GitHub Enterprise v2.11.0

  • Refresh the existing indices when restoring Elasticsearch indices to cluster #328
  • Fix failure to restore 2.9/2.10 backups to 2.11 prevented by incorrect detection of the audit log migration #333
  • Use git to generate short name for SSH multiplex control path #335
  • Remove use of --literally when computing arbitrary shasum #338
  • Remove -o option from ps use #341