Backing up important GitHub repos
Lately, GitHub hasn't been very stable, and I also read a story on Reddit about GitHub blocking someone's account, so I don't trust keeping my code on only one platform. I use my small Raspberry Pi 4B (1 GB) to keep additional copies of my important repos.
Here's a bash script that mirrors specific GitHub repos locally (with an optional push to GitLab). It supports org repos, wikis, LFS, allowlist/blocklist filtering, and a dry-run mode.
First, the backup_github_repos.sh file:
#!/usr/bin/env bash
set -Eeuo pipefail
usage() {
  cat <<'EOF'
Backup all owned GitHub repositories to local mirror clones and optionally push them to GitLab.

Usage:
  ./backup_github_repos.sh [path/to/config.env]

Prerequisites:
  - git
  - gh
  - jq
  - gh auth login
  - gh auth setup-git

Optional:
  - git-lfs (if FETCH_LFS=1)
  - GitLab SSH key (if ENABLE_GITLAB_PUSH=1)

This script:
  - mirrors all repos you own on GitHub
  - can also mirror selected GitHub org repos
  - can limit backups to an exact allowlist of repos
  - can back up wiki repos when they exist
  - can push matching repos to GitLab over SSH

It does NOT back up GitHub issues, pull requests, discussions, or release assets.
EOF
}

log() {
  printf '[%s] %s\n' "$(date '+%F %T')" "$*"
}

warn() {
  printf '[%s] WARNING: %s\n' "$(date '+%F %T')" "$*" >&2
}

die() {
  printf '[%s] ERROR: %s\n' "$(date '+%F %T')" "$*" >&2
  exit 1
}

run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    printf 'DRY_RUN:'
    printf ' %q' "$@"
    printf '\n'
    return 0
  fi
  "$@"
}
trim() {
  local value="${1:-}"
  value="${value#"${value%%[![:space:]]*}"}"
  value="${value%"${value##*[![:space:]]}"}"
  printf '%s' "$value"
}

require_cmd() {
  command -v "$1" >/dev/null 2>&1 || die "Missing required command: $1"
}

csv_list_contains() {
  local needle="$1"
  local csv="$2"
  local -a items
  local raw_item item
  IFS=',' read -r -a items <<<"$csv"
  for raw_item in "${items[@]}"; do
    item="$(trim "$raw_item")"
    [[ -n "$item" ]] || continue
    [[ "$item" == "$needle" ]] && return 0
  done
  return 1
}
gh_repo_stream_owned() {
  gh api --paginate "/user/repos?affiliation=owner&per_page=100" \
    | jq -c '.[] | {
        name,
        full_name,
        private,
        archived,
        fork,
        has_wiki,
        clone_url,
        ssh_url,
        owner: .owner.login
      }'
}

gh_repo_stream_org() {
  local org="$1"
  gh api --paginate "/orgs/${org}/repos?type=all&per_page=100" \
    | jq -c '.[] | {
        name,
        full_name,
        private,
        archived,
        fork,
        has_wiki,
        clone_url,
        ssh_url,
        owner: .owner.login
      }'
}
write_repo_metadata() {
  local repo_json="$1"
  local owner="$2"
  local repo="$3"
  local metadata_dir="${BACKUP_ROOT}/metadata/${owner}"
  mkdir -p "$metadata_dir"
  jq '.' <<<"$repo_json" > "${metadata_dir}/${repo}.json"
}

sync_git_mirror() {
  local source_url="$1"
  local destination="$2"
  local label="$3"
  mkdir -p "$(dirname "$destination")"
  if [[ -d "$destination" ]]; then
    log "Updating mirror: ${label}"
    run git -C "$destination" remote set-url origin "$source_url"
    run git -C "$destination" remote update --prune
  else
    log "Creating mirror: ${label}"
    run git clone --mirror "$source_url" "$destination"
  fi
}

fetch_lfs_objects() {
  local destination="$1"
  local label="$2"
  if [[ "${FETCH_LFS}" != "1" ]]; then
    return 0
  fi
  if ! git lfs version >/dev/null 2>&1; then
    die "FETCH_LFS=1 but git-lfs is not installed"
  fi
  log "Fetching LFS objects: ${label}"
  if ! run git -C "$destination" lfs fetch --all origin; then
    warn "LFS fetch failed for ${label}. The Git mirror is still valid, but LFS content may be incomplete."
  fi
}
gitlab_remote_url() {
  local repo_name="$1"
  printf 'git@gitlab.com:%s/%s.git' "$GITLAB_NAMESPACE" "$repo_name"
}

gitlab_remote_exists() {
  local repo_name="$1"
  git ls-remote "$(gitlab_remote_url "$repo_name")" >/dev/null 2>&1
}

ensure_gitlab_repo() {
  local repo_name="$1"
  if gitlab_remote_exists "$repo_name"; then
    return 0
  fi
  if [[ "${GITLAB_CREATE_REPOS}" != "1" ]]; then
    die "GitLab repo ${GITLAB_NAMESPACE}/${repo_name} does not exist and GITLAB_CREATE_REPOS=0"
  fi
  log "GitLab repo ${GITLAB_NAMESPACE}/${repo_name} will be created on first push if your SSH key can create projects in that namespace"
}
push_git_mirror_to_gitlab() {
  local destination="$1"
  local repo_name="$2"
  local label="$3"
  local remote_url
  remote_url="$(gitlab_remote_url "$repo_name")"
  if git -C "$destination" remote get-url gitlab >/dev/null 2>&1; then
    run git -C "$destination" remote set-url gitlab "$remote_url"
  else
    run git -C "$destination" remote add gitlab "$remote_url"
  fi
  log "Pushing branches and tags to GitLab: ${label}"
  run git -C "$destination" push --prune gitlab \
    '+refs/heads/*:refs/heads/*' \
    '+refs/tags/*:refs/tags/*'
  if [[ "${FETCH_LFS}" == "1" ]]; then
    if ! run git -C "$destination" lfs push --all gitlab; then
      warn "LFS push failed for ${label}. Check GitLab LFS configuration if this repo uses LFS."
    fi
  fi
}

should_backup_repo() {
  local repo_json="$1"
  local archived
  local forked
  local full_name
  archived="$(jq -r '.archived' <<<"$repo_json")"
  forked="$(jq -r '.fork' <<<"$repo_json")"
  full_name="$(jq -r '.full_name' <<<"$repo_json")"
  if [[ -n "${REPO_ALLOWLIST}" ]] && ! csv_list_contains "$full_name" "$REPO_ALLOWLIST"; then
    return 1
  fi
  if [[ -n "${REPO_BLOCKLIST}" ]] && csv_list_contains "$full_name" "$REPO_BLOCKLIST"; then
    return 1
  fi
  if [[ "$archived" == "true" && "${INCLUDE_ARCHIVED_REPOS}" != "1" ]]; then
    return 1
  fi
  if [[ "$forked" == "true" && "${INCLUDE_FORKS}" != "1" ]]; then
    return 1
  fi
  return 0
}
sync_repo_bundle() {
  local repo_json="$1"
  local full_name
  local owner
  local repo
  local clone_url
  local private_flag
  local has_wiki
  local mirror_dir
  local wiki_dir
  local wiki_url
  local wiki_repo_name
  full_name="$(jq -r '.full_name' <<<"$repo_json")"
  owner="${full_name%/*}"
  repo="${full_name#*/}"
  clone_url="$(jq -r '.clone_url' <<<"$repo_json")"
  private_flag="$(jq -r '.private' <<<"$repo_json")"
  has_wiki="$(jq -r '.has_wiki' <<<"$repo_json")"
  mirror_dir="${BACKUP_ROOT}/mirrors/${owner}/${repo}.git"
  write_repo_metadata "$repo_json" "$owner" "$repo"
  sync_git_mirror "$clone_url" "$mirror_dir" "$full_name"
  fetch_lfs_objects "$mirror_dir" "$full_name"
  if [[ "${ENABLE_GITLAB_PUSH}" == "1" ]]; then
    ensure_gitlab_repo "$repo"
    push_git_mirror_to_gitlab "$mirror_dir" "$repo" "$full_name"
  fi
  if [[ "$has_wiki" != "true" || "${INCLUDE_WIKIS}" != "1" ]]; then
    return 0
  fi
  wiki_url="${clone_url%.git}.wiki.git"
  wiki_repo_name="${repo}.wiki"
  wiki_dir="${BACKUP_ROOT}/mirrors/${owner}/${wiki_repo_name}.git"
  if git ls-remote "$wiki_url" >/dev/null 2>&1; then
    sync_git_mirror "$wiki_url" "$wiki_dir" "${full_name} wiki"
    if [[ "${ENABLE_GITLAB_PUSH}" == "1" && "${GITLAB_PUSH_WIKIS}" == "1" ]]; then
      ensure_gitlab_repo "$wiki_repo_name"
      push_git_mirror_to_gitlab "$wiki_dir" "$wiki_repo_name" "${full_name} wiki"
    fi
  else
    warn "Wiki enabled but no wiki repo found for ${full_name}; skipping wiki backup"
  fi
}
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
  usage
  exit 0
fi

if [[ $# -gt 1 ]]; then
  usage >&2
  exit 1
fi

if [[ $# -eq 1 ]]; then
  ENV_FILE="$1"
  [[ -f "$ENV_FILE" ]] || die "Config file not found: $ENV_FILE"
  set -a
  # shellcheck source=/dev/null
  source "$ENV_FILE"
  set +a
fi
BACKUP_ROOT="${BACKUP_ROOT:-$HOME/github-repo-backups}"
INCLUDE_OWNED_REPOS="${INCLUDE_OWNED_REPOS:-1}"
GITHUB_ORGS="${GITHUB_ORGS:-}"
REPO_ALLOWLIST="${REPO_ALLOWLIST:-}"
REPO_BLOCKLIST="${REPO_BLOCKLIST:-}"
INCLUDE_ARCHIVED_REPOS="${INCLUDE_ARCHIVED_REPOS:-1}"
INCLUDE_FORKS="${INCLUDE_FORKS:-1}"
INCLUDE_WIKIS="${INCLUDE_WIKIS:-1}"
FETCH_LFS="${FETCH_LFS:-0}"
ENABLE_GITLAB_PUSH="${ENABLE_GITLAB_PUSH:-0}"
GITLAB_NAMESPACE="${GITLAB_NAMESPACE:-}"
GITLAB_CREATE_REPOS="${GITLAB_CREATE_REPOS:-1}"
GITLAB_PUSH_WIKIS="${GITLAB_PUSH_WIKIS:-1}"
DRY_RUN="${DRY_RUN:-0}"
require_cmd git
require_cmd gh
require_cmd jq
gh auth status --hostname github.com >/dev/null 2>&1 \
  || die "GitHub CLI is not authenticated. Run: gh auth login"

if [[ "${ENABLE_GITLAB_PUSH}" == "1" ]]; then
  [[ -n "${GITLAB_NAMESPACE}" ]] || die "ENABLE_GITLAB_PUSH=1 requires GITLAB_NAMESPACE"
fi
mkdir -p "${BACKUP_ROOT}/mirrors" "${BACKUP_ROOT}/metadata"
declare -A seen_repos=()
repo_count=0
if [[ "${INCLUDE_OWNED_REPOS}" != "1" && -z "${GITHUB_ORGS}" ]]; then
  die "Nothing to do. Set INCLUDE_OWNED_REPOS=1 and/or GITHUB_ORGS"
fi
log "Backup root: ${BACKUP_ROOT}"
while IFS= read -r repo_json; do
  [[ -n "$repo_json" ]] || continue
  full_name="$(jq -r '.full_name' <<<"$repo_json")"
  if [[ -n "${seen_repos[$full_name]:-}" ]]; then
    continue
  fi
  seen_repos["$full_name"]=1
  if ! should_backup_repo "$repo_json"; then
    log "Skipping repo due to filters: ${full_name}"
    continue
  fi
  sync_repo_bundle "$repo_json"
  ((repo_count+=1))
done < <(
  {
    if [[ "${INCLUDE_OWNED_REPOS}" == "1" ]]; then
      gh_repo_stream_owned
    fi
    IFS=',' read -r -a orgs <<<"${GITHUB_ORGS}"
    for raw_org in "${orgs[@]}"; do
      org="$(trim "$raw_org")"
      [[ -n "$org" ]] || continue
      gh_repo_stream_org "$org"
    done
  }
)
log "Completed backup run for ${repo_count} repositories"
And then the config.env file (the path you pass as the script's argument):
# Local backup location
BACKUP_ROOT="$HOME/github-backup/repo-backups"
# Backup all repos owned by the authenticated GitHub user.
INCLUDE_OWNED_REPOS=1
# Optional comma-separated GitHub org names to back up too.
GITHUB_ORGS=""
# Optional exact repo allow/block lists using owner/repo names.
# If REPO_ALLOWLIST is non-empty, only those repos are backed up.
REPO_ALLOWLIST=""
REPO_BLOCKLIST=""
# Include archived repos and forks.
INCLUDE_ARCHIVED_REPOS=1
INCLUDE_FORKS=1
# Also back up wiki repos when they exist.
INCLUDE_WIKIS=1
# Set to 1 if you use Git LFS and have git-lfs installed.
FETCH_LFS=0
# Optional GitLab mirror push.
ENABLE_GITLAB_PUSH=0
# The target user or group on GitLab.
GITLAB_NAMESPACE=""
# If set to 1, GitLab may create missing repos on first push if your SSH key
# has permission to create projects in that namespace.
GITLAB_CREATE_REPOS=0
# Also push wiki mirrors to GitLab.
GITLAB_PUSH_WIKIS=0
# Print actions without changing anything.
DRY_RUN=0
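For example, a config that backs up only a couple of specific repos and mirrors them to GitLab could look like this (the repo names and namespace are placeholders; substitute your own):

```shell
# Hypothetical example config: back up exactly two repos and push them
# to GitLab, previewing first with DRY_RUN=1.
BACKUP_ROOT="$HOME/github-backup/repo-backups"
REPO_ALLOWLIST="youruser/dotfiles,youruser/blog"
ENABLE_GITLAB_PUSH=1
GITLAB_NAMESPACE="youruser"
DRY_RUN=1   # set to 0 once the printed commands look right
```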
Finally, here are the setup steps:
- Install dependencies
sudo apt update && sudo apt install git jq
Then install the GitHub CLI:
sudo mkdir -p -m 755 /etc/apt/keyrings
wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null
sudo apt update && sudo apt install gh
- Authenticate
gh auth login
gh auth setup-git
- Place your files
Put backup_github_repos.sh and your config.env somewhere like ~/github-backup/. Make the script executable:
chmod +x backup_github_repos.sh
- Test it
./backup_github_repos.sh config.env
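For a safe first run, set DRY_RUN=1 (in config.env or the environment). The script's run() wrapper then prints each command, shell-quoted via printf %q, instead of executing it. Here is a standalone sketch of that mechanism so you know what output to expect (the URL is just an example):

```shell
#!/usr/bin/env bash
# Standalone sketch of the backup script's run() wrapper: with DRY_RUN=1
# it prints the command instead of running it.
DRY_RUN=1
run() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    printf 'DRY_RUN:'
    printf ' %q' "$@"   # %q shell-quotes each argument
    printf '\n'
    return 0
  fi
  "$@"
}

run git clone --mirror "https://github.com/example/repo.git" "/tmp/repo.git"
# Prints: DRY_RUN: git clone --mirror https://github.com/example/repo.git /tmp/repo.git
```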
- Automate with cron
Run crontab -e and add a line like:
0 3 * * * /home/pi/github-backup/backup_github_repos.sh /home/pi/github-backup/config.env >> /home/pi/github-backup/backup.log 2>&1
This runs the backup daily at 3 AM and logs output. Adjust the paths and schedule to your liking.
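It's also worth knowing how to get code back out. The backups are bare mirror clones, so restoring is just a regular git clone from the local mirror path. Here is a self-contained demo using a throwaway local repo in place of a real GitHub backup (with a real backup the mirror path would be something like ~/github-repo-backups/mirrors/owner/repo.git):

```shell
#!/usr/bin/env bash
# Demo: restore a working copy from a bare mirror clone.
set -euo pipefail
work="$(mktemp -d)"

# Stand-in for an original repo (in real life: your repo on GitHub).
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

# What the backup script stores: a bare mirror clone with all refs.
git clone -q --mirror "$work/src" "$work/mirror.git"

# Restoring is an ordinary clone from the mirror path.
git clone -q "$work/mirror.git" "$work/restored"
git -C "$work/restored" log --oneline
```

A mirror can also be pushed wholesale to a new remote with git push --mirror, which carries over all branches and tags in one go.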
I might be missing some steps here, so if you get stuck somewhere, paste this post into an LLM and ask it to help you with the setup.