DeepakNess

Backing up important GitHub repos

Lately, GitHub hasn't been very stable, and I also read a story on Reddit about GitHub blocking someone's account, so I don't trust keeping my code on only one platform. I now use my small Raspberry Pi 4B (1 GB) to keep additional copies of my important repos.

Here's the bash script I use to mirror GitHub repos locally, with an optional push to GitLab. By default it backs up every repo you own; it also supports org repos, wikis, LFS, allowlist/blocklist filtering, and a dry-run mode:

First, this backup_github_repos.sh file:

#!/usr/bin/env bash
set -Eeuo pipefail

usage() {
  cat <<'EOF'
Back up all owned GitHub repositories to local mirror clones and optionally push them to GitLab.

Usage:
  ./backup_github_repos.sh [path/to/config.env]

Prerequisites:
  - git
  - gh
  - jq
  - gh auth login
  - gh auth setup-git

Optional:
  - git-lfs (if FETCH_LFS=1)
  - GitLab SSH key (if ENABLE_GITLAB_PUSH=1)

This script:
  - mirrors all repos you own on GitHub
  - can also mirror selected GitHub org repos
  - can limit backups to an exact allowlist of repos
  - can back up wiki repos when they exist
  - can push matching repos to GitLab over SSH

It does NOT back up GitHub issues, pull requests, discussions, or release assets.
EOF
}

log() {
  printf '[%s] %s\n' "$(date '+%F %T')" "$*"
}

warn() {
  printf '[%s] WARNING: %s\n' "$(date '+%F %T')" "$*" >&2
}

die() {
  printf '[%s] ERROR: %s\n' "$(date '+%F %T')" "$*" >&2
  exit 1
}

run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    printf 'DRY_RUN:'
    printf ' %q' "$@"
    printf '\n'
    return 0
  fi
  "$@"
}

trim() {
  local value="${1:-}"
  value="${value#"${value%%[![:space:]]*}"}"
  value="${value%"${value##*[![:space:]]}"}"
  printf '%s' "$value"
}

require_cmd() {
  command -v "$1" >/dev/null 2>&1 || die "Missing required command: $1"
}

csv_list_contains() {
  local needle="$1"
  local csv="$2"
  local item raw_item
  local -a items=()

  # Split on commas, then look for an exact match among the trimmed entries.
  IFS=',' read -r -a items <<<"$csv"
  for raw_item in "${items[@]}"; do
    item="$(trim "$raw_item")"
    [[ -n "$item" ]] || continue
    [[ "$item" == "$needle" ]] && return 0
  done

  return 1
}

gh_repo_stream_owned() {
  gh api --paginate "/user/repos?affiliation=owner&per_page=100" \
    | jq -c '.[] | {
        name,
        full_name,
        private,
        archived,
        fork,
        has_wiki,
        clone_url,
        ssh_url,
        owner: .owner.login
      }'
}

gh_repo_stream_org() {
  local org="$1"
  gh api --paginate "/orgs/${org}/repos?type=all&per_page=100" \
    | jq -c '.[] | {
        name,
        full_name,
        private,
        archived,
        fork,
        has_wiki,
        clone_url,
        ssh_url,
        owner: .owner.login
      }'
}

write_repo_metadata() {
  local repo_json="$1"
  local owner="$2"
  local repo="$3"
  local metadata_dir="${BACKUP_ROOT}/metadata/${owner}"
  mkdir -p "$metadata_dir"
  jq '.' <<<"$repo_json" > "${metadata_dir}/${repo}.json"
}

sync_git_mirror() {
  local source_url="$1"
  local destination="$2"
  local label="$3"

  mkdir -p "$(dirname "$destination")"

  if [[ -d "$destination" ]]; then
    log "Updating mirror: ${label}"
    run git -C "$destination" remote set-url origin "$source_url"
    run git -C "$destination" remote update --prune
  else
    log "Creating mirror: ${label}"
    run git clone --mirror "$source_url" "$destination"
  fi
}

fetch_lfs_objects() {
  local destination="$1"
  local label="$2"

  if [[ "${FETCH_LFS}" != "1" ]]; then
    return 0
  fi

  if ! git lfs version >/dev/null 2>&1; then
    die "FETCH_LFS=1 but git-lfs is not installed"
  fi

  log "Fetching LFS objects: ${label}"
  if ! run git -C "$destination" lfs fetch --all origin; then
    warn "LFS fetch failed for ${label}. The Git mirror is still valid, but LFS content may be incomplete."
  fi
}

gitlab_remote_url() {
  local repo_name="$1"
  printf 'git@gitlab.com:%s/%s.git' "$GITLAB_NAMESPACE" "$repo_name"
}

gitlab_remote_exists() {
  local repo_name="$1"
  git ls-remote "$(gitlab_remote_url "$repo_name")" >/dev/null 2>&1
}

ensure_gitlab_repo() {
  local repo_name="$1"

  if gitlab_remote_exists "$repo_name"; then
    return 0
  fi

  if [[ "${GITLAB_CREATE_REPOS}" != "1" ]]; then
    die "GitLab repo ${GITLAB_NAMESPACE}/${repo_name} does not exist and GITLAB_CREATE_REPOS=0"
  fi

  log "GitLab repo ${GITLAB_NAMESPACE}/${repo_name} will be created on first push if your SSH key can create projects in that namespace"
}

push_git_mirror_to_gitlab() {
  local destination="$1"
  local repo_name="$2"
  local label="$3"
  local remote_url

  remote_url="$(gitlab_remote_url "$repo_name")"

  if git -C "$destination" remote get-url gitlab >/dev/null 2>&1; then
    run git -C "$destination" remote set-url gitlab "$remote_url"
  else
    run git -C "$destination" remote add gitlab "$remote_url"
  fi

  log "Pushing branches and tags to GitLab: ${label}"
  run git -C "$destination" push --prune gitlab \
    '+refs/heads/*:refs/heads/*' \
    '+refs/tags/*:refs/tags/*'

  if [[ "${FETCH_LFS}" == "1" ]]; then
    if ! run git -C "$destination" lfs push --all gitlab; then
      warn "LFS push failed for ${label}. Check GitLab LFS configuration if this repo uses LFS."
    fi
  fi
}

should_backup_repo() {
  local repo_json="$1"
  local archived
  local forked
  local full_name

  archived="$(jq -r '.archived' <<<"$repo_json")"
  forked="$(jq -r '.fork' <<<"$repo_json")"
  full_name="$(jq -r '.full_name' <<<"$repo_json")"

  if [[ -n "${REPO_ALLOWLIST}" ]] && ! csv_list_contains "$full_name" "$REPO_ALLOWLIST"; then
    return 1
  fi

  if [[ -n "${REPO_BLOCKLIST}" ]] && csv_list_contains "$full_name" "$REPO_BLOCKLIST"; then
    return 1
  fi

  if [[ "$archived" == "true" && "${INCLUDE_ARCHIVED_REPOS}" != "1" ]]; then
    return 1
  fi

  if [[ "$forked" == "true" && "${INCLUDE_FORKS}" != "1" ]]; then
    return 1
  fi

  return 0
}

sync_repo_bundle() {
  local repo_json="$1"
  local full_name
  local owner
  local repo
  local clone_url
  local private_flag
  local has_wiki
  local mirror_dir
  local wiki_dir
  local wiki_url
  local wiki_repo_name

  full_name="$(jq -r '.full_name' <<<"$repo_json")"
  owner="${full_name%/*}"
  repo="${full_name#*/}"
  clone_url="$(jq -r '.clone_url' <<<"$repo_json")"
  private_flag="$(jq -r '.private' <<<"$repo_json")"
  has_wiki="$(jq -r '.has_wiki' <<<"$repo_json")"

  mirror_dir="${BACKUP_ROOT}/mirrors/${owner}/${repo}.git"
  write_repo_metadata "$repo_json" "$owner" "$repo"
  sync_git_mirror "$clone_url" "$mirror_dir" "$full_name"
  fetch_lfs_objects "$mirror_dir" "$full_name"

  if [[ "${ENABLE_GITLAB_PUSH}" == "1" ]]; then
    ensure_gitlab_repo "$repo"
    push_git_mirror_to_gitlab "$mirror_dir" "$repo" "$full_name"
  fi

  if [[ "$has_wiki" != "true" || "${INCLUDE_WIKIS}" != "1" ]]; then
    return 0
  fi

  wiki_url="${clone_url%.git}.wiki.git"
  wiki_repo_name="${repo}.wiki"
  wiki_dir="${BACKUP_ROOT}/mirrors/${owner}/${wiki_repo_name}.git"

  if git ls-remote "$wiki_url" >/dev/null 2>&1; then
    sync_git_mirror "$wiki_url" "$wiki_dir" "${full_name} wiki"
    if [[ "${ENABLE_GITLAB_PUSH}" == "1" && "${GITLAB_PUSH_WIKIS}" == "1" ]]; then
      ensure_gitlab_repo "$wiki_repo_name"
      push_git_mirror_to_gitlab "$wiki_dir" "$wiki_repo_name" "${full_name} wiki"
    fi
  else
    warn "Wiki enabled but no wiki repo found for ${full_name}; skipping wiki backup"
  fi
}

if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
  usage
  exit 0
fi

if [[ $# -gt 1 ]]; then
  usage >&2
  exit 1
fi

if [[ $# -eq 1 ]]; then
  ENV_FILE="$1"
  [[ -f "$ENV_FILE" ]] || die "Config file not found: $ENV_FILE"
  set -a
  # shellcheck source=/dev/null
  source "$ENV_FILE"
  set +a
fi

BACKUP_ROOT="${BACKUP_ROOT:-$HOME/github-repo-backups}"
INCLUDE_OWNED_REPOS="${INCLUDE_OWNED_REPOS:-1}"
GITHUB_ORGS="${GITHUB_ORGS:-}"
REPO_ALLOWLIST="${REPO_ALLOWLIST:-}"
REPO_BLOCKLIST="${REPO_BLOCKLIST:-}"
INCLUDE_ARCHIVED_REPOS="${INCLUDE_ARCHIVED_REPOS:-1}"
INCLUDE_FORKS="${INCLUDE_FORKS:-1}"
INCLUDE_WIKIS="${INCLUDE_WIKIS:-1}"
FETCH_LFS="${FETCH_LFS:-0}"
ENABLE_GITLAB_PUSH="${ENABLE_GITLAB_PUSH:-0}"
GITLAB_NAMESPACE="${GITLAB_NAMESPACE:-}"
GITLAB_CREATE_REPOS="${GITLAB_CREATE_REPOS:-1}"
GITLAB_PUSH_WIKIS="${GITLAB_PUSH_WIKIS:-1}"
DRY_RUN="${DRY_RUN:-0}"

require_cmd git
require_cmd gh
require_cmd jq
gh auth status --hostname github.com >/dev/null 2>&1 \
  || die "GitHub CLI is not authenticated. Run: gh auth login"

if [[ "${ENABLE_GITLAB_PUSH}" == "1" ]]; then
  [[ -n "${GITLAB_NAMESPACE}" ]] || die "ENABLE_GITLAB_PUSH=1 requires GITLAB_NAMESPACE"
fi

mkdir -p "${BACKUP_ROOT}/mirrors" "${BACKUP_ROOT}/metadata"

declare -A seen_repos=()
repo_count=0

if [[ "${INCLUDE_OWNED_REPOS}" != "1" && -z "${GITHUB_ORGS}" ]]; then
  die "Nothing to do. Set INCLUDE_OWNED_REPOS=1 and/or GITHUB_ORGS"
fi

log "Backup root: ${BACKUP_ROOT}"

while IFS= read -r repo_json; do
  [[ -n "$repo_json" ]] || continue

  full_name="$(jq -r '.full_name' <<<"$repo_json")"
  if [[ -n "${seen_repos[$full_name]:-}" ]]; then
    continue
  fi
  seen_repos["$full_name"]=1

  if ! should_backup_repo "$repo_json"; then
    log "Skipping repo due to filters: ${full_name}"
    continue
  fi

  sync_repo_bundle "$repo_json"
  ((repo_count+=1))
done < <(
  {
    if [[ "${INCLUDE_OWNED_REPOS}" == "1" ]]; then
      gh_repo_stream_owned
    fi

    IFS=',' read -r -a orgs <<<"${GITHUB_ORGS}"
    for raw_org in "${orgs[@]}"; do
      org="$(trim "$raw_org")"
      [[ -n "$org" ]] || continue
      gh_repo_stream_org "$org"
    done
  }
)

log "Completed backup run for ${repo_count} repositories"
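The mirrors this script writes under `${BACKUP_ROOT}/mirrors/<owner>/<repo>.git` are bare repositories, so there are no checked-out files to browse. Getting a normal working copy back could look roughly like the sketch below. `restore_from_mirror` is a hypothetical helper, not part of the script above, and the paths in the example comment are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a bare mirror back into a normal working copy.
#   $1 = path to the mirror, e.g. .../mirrors/<owner>/<repo>.git
#   $2 = directory to create the working copy in
restore_from_mirror() {
  local mirror_dir="$1"
  local target_dir="$2"

  if [[ ! -d "$mirror_dir" ]]; then
    echo "No mirror at: $mirror_dir" >&2
    return 1
  fi

  # Check object integrity first, then make a regular (non-bare) clone.
  git -C "$mirror_dir" fsck --no-dangling >/dev/null || return 1
  git clone --quiet "$mirror_dir" "$target_dir" || return 1
}

# Example with placeholder paths:
#   restore_from_mirror "$HOME/github-repo-backups/mirrors/OWNER/REPO.git" ./REPO
```

The restored clone has the local mirror as its `origin`, so you would point it at a new remote before pushing anywhere. LFS content, if any, is not covered by this sketch.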

And then this config.env file (any name works, since you pass its path as the first argument):

# Local backup location
BACKUP_ROOT="$HOME/github-backup/repo-backups"

# Backup all repos owned by the authenticated GitHub user.
INCLUDE_OWNED_REPOS=1

# Optional comma-separated GitHub org names to back up too.
GITHUB_ORGS=""

# Optional exact repo allow/block lists using owner/repo names.
# If REPO_ALLOWLIST is non-empty, only those repos are backed up.
REPO_ALLOWLIST=""
REPO_BLOCKLIST=""

# Include archived repos and forks.
INCLUDE_ARCHIVED_REPOS=1
INCLUDE_FORKS=1

# Also back up wiki repos when they exist.
INCLUDE_WIKIS=1

# Set to 1 if you use Git LFS and have git-lfs installed.
FETCH_LFS=0

# Optional GitLab mirror push.
ENABLE_GITLAB_PUSH=0

# The target user or group on GitLab.
GITLAB_NAMESPACE=""

# If set to 1, GitLab may create missing repos on first push if your SSH key
# has permission to create projects in that namespace.
GITLAB_CREATE_REPOS=0

# Also push wiki mirrors to GitLab.
GITLAB_PUSH_WIKIS=0

# Print the git commands instead of running them.
# Note: backup directories and metadata JSON files are still created.
DRY_RUN=0
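If you only want a handful of repos rather than everything, a trimmed config along these lines might be enough. This is a sketch: `your-user/dotfiles`, `your-org/website`, and `your-org` are placeholder names. Anything left out falls back to the defaults set in the script; note that the script defaults `GITLAB_CREATE_REPOS` and `GITLAB_PUSH_WIKIS` to 1, unlike the file above, which only matters once `ENABLE_GITLAB_PUSH=1`.

```shell
# config.env sketch: back up only two named repos, and preview first.
# The repo and org names below are placeholders, not real projects.
BACKUP_ROOT="$HOME/github-backup/repo-backups"

INCLUDE_OWNED_REPOS=1
# The allowlist only filters repos the script has listed, so an
# allowlisted org repo is only seen if its org is named here too.
GITHUB_ORGS="your-org"

# Exact owner/repo matches; the script trims spaces around the commas.
REPO_ALLOWLIST="your-user/dotfiles, your-org/website"

# Print the git commands instead of running them; set to 0 for a real run.
DRY_RUN=1
```

Because the config file is sourced after the environment is read, a `DRY_RUN` value in the file wins over one exported in your shell.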

And then here are some instructions to set this up:

  1. Install dependencies

sudo apt update && sudo apt install git jq

Then install the GitHub CLI:

sudo mkdir -p -m 755 /etc/apt/keyrings
wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update && sudo apt install gh

  2. Authenticate

gh auth login
gh auth setup-git

  3. Place your files

Put backup_github_repos.sh and your config.env somewhere like ~/github-backup/. Make the script executable:

chmod +x backup_github_repos.sh

  4. Test it

./backup_github_repos.sh config.env

  5. Automate with cron

Run crontab -e and add a line like:

0 3 * * * /home/pi/github-backup/backup_github_repos.sh /home/pi/github-backup/config.env >> /home/pi/github-backup/backup.log 2>&1

This runs the backup daily at 3 AM and logs output. Adjust the paths and schedule to your liking.
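Since the cron job runs unattended, it can help to check the log now and then. The sketch below relies only on the log lines the script itself prints ("Completed backup run for N repositories" as the final line, or an "ERROR:" line from `die`). `last_run_status` is a hypothetical helper, and treating a 0-repo run as suspicious is an assumption: as far as I can tell, a failed repo listing inside the `< <(...)` process substitution does not stop the main script, so such a run would still end with "Completed".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: classify the most recent run from the appended log.
# Prints one of: no-log, ok, empty, error, incomplete.
last_run_status() {
  local log_file="$1"
  local last_line

  # Missing or zero-length log: the job has probably never run.
  if [[ ! -s "$log_file" ]]; then
    echo "no-log"
    return 0
  fi

  # The log is append-only, so the final line belongs to the latest run.
  last_line="$(tail -n 1 "$log_file")"
  case "$last_line" in
    *"Completed backup run for 0 repositories"*) echo "empty" ;;
    *"Completed backup run for"*)                echo "ok" ;;
    *"ERROR:"*)                                  echo "error" ;;
    *)                                           echo "incomplete" ;;
  esac
}

# Example with the log path from the cron line above:
#   last_run_status /home/pi/github-backup/backup.log
```

An "incomplete" result usually means the script stopped partway, for example when a git command failed under `set -e`, so the lines just before the end of the log are the place to look.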

I might be missing some steps here, so if you get stuck, give this post to an LLM and ask it to help you with the setup.
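One quick thing to rule out before digging deeper is a missing dependency. This is a minimal sketch of such a check, using the same `command -v` test the script uses; `missing_cmds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each named command that is not found on PATH.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# The script needs git, gh and jq; git-lfs only matters when FETCH_LFS=1.
missing="$(missing_cmds git gh jq)"
if [[ -n "$missing" ]]; then
  printf 'Missing commands:\n%s\n' "$missing"
else
  echo "All required commands found"
fi
```

Cron typically runs with a shorter PATH than an interactive shell, so if the backup works by hand but not from cron, running this same check from a temporary cron entry may show why.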
