NebulaCB — Couchbase Mission Control

Everything You Need to Manage Couchbase

From rolling upgrades to AI-powered troubleshooting, NebulaCB covers every aspect of Couchbase cluster lifecycle management.

🛰️

Mission Control Cockpit

A NASA-style mission-control view that puts every signal you need on one screen. The layout combines a status pill, a 4-tile top strip (source / target / XDCR flow / load + alerts), a full-width upgrade timeline, and a 3-column bottom row (live logs / data integrity / controls).

Glowing health tiles with pulsing animations
Phase rail with 6-stage upgrade tracking
Live event stream from XDCR + alerts + node state
Tail-style log panel with severity + source filters
Side-by-side data integrity proof + control buttons

⚙️

Upgrade Orchestrator

Automate Couchbase rolling upgrades via the Kubernetes Operator. NebulaCB patches the CouchbaseCluster CR image, monitors pod rollover, and tracks rebalance completion.

Helm-based rolling upgrade via CouchbaseCluster CR
Pre-check validation (all nodes ready)
Node-by-node progress tracking (10s poll)
Downgrade button to roll back to previous version
Abort to stop tracking mid-upgrade
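
The pre-check and CR patch steps above can be sketched in Go as follows. This is a minimal illustration, not NebulaCB's actual orchestrator code: the `node` type, the `allReady` and `imagePatch` helpers, the `spec.image` path, and the image tag are all assumptions made for the example, so check the field path against the CRD version you run.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node is a hypothetical, minimal view of a cluster node for the pre-check.
type node struct {
	Name  string
	Ready bool
}

// allReady reports whether every node is ready.
// An empty node list fails the check.
func allReady(nodes []node) bool {
	if len(nodes) == 0 {
		return false
	}
	for _, n := range nodes {
		if !n.Ready {
			return false
		}
	}
	return true
}

// imagePatch builds a JSON merge-patch body that sets spec.image.
// The spec.image path is an assumption; verify it against your CRD.
func imagePatch(image string) ([]byte, error) {
	return json.Marshal(map[string]any{
		"spec": map[string]any{"image": image},
	})
}

func main() {
	nodes := []node{{"cb-0", true}, {"cb-1", true}, {"cb-2", true}}
	if !allReady(nodes) {
		fmt.Println("pre-check failed: not all nodes are ready")
		return
	}
	body, err := imagePatch("couchbase/server:7.6.0") // illustrative tag
	if err != nil {
		panic(err)
	}
	// The body would be sent as a merge patch against the CouchbaseCluster CR.
	fmt.Println(string(body))
}
```

Gating the patch on the pre-check keeps an upgrade from starting while a node is already unhealthy, which is when a rolling restart is most likely to stall.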

🔁

XDCR Replication Management

Monitor cross-datacenter replication in real time. Track replication lag, pipeline restarts, topology changes, and GOXDCR delay with a 5-minute countdown timer.

Bidirectional XDCR monitoring (source ↔ target)
Pipeline pause / resume / restart / stop controls
GOXDCR delay detection with countdown timer
Topology change tracking during upgrades
Integrated XDCR troubleshooting modal
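
A countdown like the one described above comes down to subtracting elapsed time from a fixed window. The sketch below assumes the 5-minute window starts when the delay is first detected; the `goxdcrWindow` constant and the `remaining` helper are hypothetical names, not NebulaCB identifiers.

```go
package main

import (
	"fmt"
	"time"
)

// goxdcrWindow is the 5-minute countdown window (assumed to start at detection).
const goxdcrWindow = 5 * time.Minute

// remaining returns how much of the window is left after elapsed time.
// It never returns a negative duration.
func remaining(elapsed time.Duration) time.Duration {
	left := goxdcrWindow - elapsed
	if left < 0 {
		return 0
	}
	return left
}

func main() {
	// Pretend the delay was detected 90 seconds ago.
	detectedAt := time.Now().Add(-90 * time.Second)
	left := remaining(time.Since(detectedAt))
	fmt.Printf("GOXDCR delay: about %s left\n", left.Round(time.Second))
}
```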

✅

Data Integrity Validation

Continuously verify that no data is lost during upgrades. Compare source and target clusters with hash verification and sequence gap detection.

Document count timeline with delta convergence chart
SHA-256 hash sampling for content verification
Sequence gap detection for ordered streams
Full audit on demand (all keys compared)
Real-time convergence status (ZERO LOSS CONFIRMED)
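
The hash sampling and sequence gap checks listed above can be illustrated with a short Go sketch. It works on in-memory maps rather than live clusters, and `mismatchedKeys` and `sequenceGaps` are hypothetical helper names chosen for the example, not the validator module's API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// mismatchedKeys compares SHA-256 digests of the sampled documents and
// returns the sorted keys that differ or are missing on either side.
func mismatchedKeys(source, target map[string][]byte, sample []string) []string {
	var bad []string
	for _, k := range sample {
		s, okS := source[k]
		t, okT := target[k]
		if !okS || !okT || sha256.Sum256(s) != sha256.Sum256(t) {
			bad = append(bad, k)
		}
	}
	sort.Strings(bad)
	return bad
}

// sequenceGaps returns the sequence numbers missing between the smallest
// and largest value in seqs. Duplicates are tolerated.
func sequenceGaps(seqs []uint64) []uint64 {
	sorted := append([]uint64(nil), seqs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	var gaps []uint64
	for i := 1; i < len(sorted); i++ {
		for m := sorted[i-1] + 1; m < sorted[i]; m++ {
			gaps = append(gaps, m)
		}
	}
	return gaps
}

func main() {
	source := map[string][]byte{"a": []byte(`{"v":1}`), "b": []byte(`{"v":2}`)}
	target := map[string][]byte{"a": []byte(`{"v":1}`), "b": []byte(`{"v":9}`)}

	fmt.Println("mismatched:", mismatchedKeys(source, target, []string{"a", "b"}))
	fmt.Println("gaps:", sequenceGaps([]uint64{1, 2, 5, 6}))
}
```

Sampling keeps the continuous check cheap, while the full audit compares every key when a definitive answer is needed.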

⚡

Storm Load Generator

Simulate production traffic during upgrades. Generate writes, reads, and deletes with configurable rates, burst patterns, and hot-key distributions.

Configurable writes/reads per second and doc sizes
Burst mode with multiplier and interval
Hot key percentage for realistic access patterns
Standalone xdcr-loadtest script for dual-cluster writes
Real-time latency P50/P95/P99 tracking
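
For the P50/P95/P99 figures, one common approach is the nearest-rank percentile over a window of latency samples. The sketch below shows that method under the assumption that latencies are collected in milliseconds; it is not necessarily how the Storm generator computes them, and `percentile` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (1..100) of the
// latencies, given in milliseconds. It returns 0 for an empty sample set.
func percentile(latenciesMs []float64, p int) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), latenciesMs...)
	sort.Float64s(sorted)

	// Integer form of ceil(p/100 * n), which avoids float rounding surprises.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	samples := []float64{12, 3, 7, 45, 5, 9, 4, 6, 8, 120}
	for _, p := range []int{50, 95, 99} {
		fmt.Printf("P%d = %.0f ms\n", p, percentile(samples, p))
	}
}
```

With only ten samples, P95 and P99 both land on the slowest request, which is why tail percentiles need a reasonably large window to be meaningful.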

🛡️

HA & Automatic Failover

Configure automatic failover with health checks, timeout thresholds, and recovery modes. Supports manual and graceful failover between clusters.

Auto-failover with configurable timeout
Manual and graceful failover triggers
Failover history and event timeline
Cross-region failover support
Preserve data mode for safe recovery

💾

Backup & Restore (EE + CE)

Works on Couchbase Enterprise and Community Edition. EE clusters use cbbackupmgr; CE clusters fall back to a parallel SDK JSONL export — no license required.

Auto-detect engine: cbbackupmgr → SDK JSONL
CE mode: 16-worker KV fetch pool, millions of docs
Restore modal: pick backup from list, target any cluster
Live progress: docs / bytes while running
JSONL + metadata.json on disk, easy to inspect
Cron-based scheduling, retention, compression (EE)
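
The CE fallback described above is a fan-out of keys to a fixed worker pool, with each fetched document written as one JSON line. Here is a rough sketch of that pattern; the `record` shape and the `fetch` callback are assumptions standing in for the real KV client and on-disk format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// record is a hypothetical JSONL line shape: one key and its raw JSON value.
type record struct {
	Key   string          `json:"key"`
	Value json.RawMessage `json:"value"`
}

// exportJSONL fans keys out to a fixed pool of workers, fetches each document
// through fetch, and returns one JSON object per line. Line order is not
// deterministic because workers finish independently.
func exportJSONL(keys []string, workers int, fetch func(string) ([]byte, error)) ([]byte, error) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var (
		mu       sync.Mutex // guards out and firstErr
		out      bytes.Buffer
		firstErr error
		wg       sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for k := range jobs {
				doc, err := fetch(k)
				var line []byte
				if err == nil {
					line, err = json.Marshal(record{Key: k, Value: doc})
				}
				mu.Lock()
				if err != nil {
					if firstErr == nil {
						firstErr = err
					}
				} else {
					out.Write(line)
					out.WriteByte('\n')
				}
				mu.Unlock()
			}
		}()
	}
	for _, k := range keys {
		jobs <- k
	}
	close(jobs)
	wg.Wait()
	return out.Bytes(), firstErr
}

func main() {
	docs := map[string][]byte{"u1": []byte(`{"n":1}`), "u2": []byte(`{"n":2}`)}
	fetch := func(k string) ([]byte, error) { return docs[k], nil } // stand-in for a KV get
	out, err := exportJSONL([]string{"u1", "u2"}, 16, fetch)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```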

📦

Data Migration

Migrate data between clusters with parallel workers, batch processing, and optional transformation rules. Validates integrity after migration.

Parallel worker pool (configurable)
Batch processing with retry logic
Transform rules (rename, convert, filter)
Post-migration validation
Progress tracking with ETA
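
Batch processing with retry, as listed above, can be sketched in a few lines of Go. The `batches` and `withRetry` helpers and the `errTransient` error are hypothetical names for illustration; a production version would likely add backoff between attempts and distinguish retryable from fatal errors.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient simulates a retryable write failure in this example.
var errTransient = errors.New("transient write failure")

// batches splits keys into consecutive slices of at most size elements.
func batches(keys []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		out = append(out, keys[start:end])
	}
	return out
}

// withRetry calls fn until it succeeds or attempts run out,
// returning the last error on failure.
func withRetry(attempts int, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	keys := []string{"k1", "k2", "k3", "k4", "k5"}
	failures := 1 // the first write attempt fails once, then succeeds
	for i, b := range batches(keys, 2) {
		err := withRetry(3, func() error {
			if failures > 0 {
				failures--
				return errTransient
			}
			return nil // a real worker would write batch b to the target here
		})
		fmt.Printf("batch %d (%d docs): err=%v\n", i, len(b), err)
	}
}
```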

☸️

Kubernetes Operator Integration

Deep integration with the Couchbase Autonomous Operator. Auto-discovers pods, manages NodePort exposure, and patches CRDs for upgrades.

Auto port-forwarding for k8s clusters
Direct NodePort access with kv_port config
CouchbaseCluster CR patching for upgrades
Pod discovery and health monitoring
Helm chart for deploying NebulaCB itself

📜

K8s Observability Suite

Four new enterprise tabs that bring the rest of the cluster lifecycle into NebulaCB: pod logs, Kubernetes events, Operator state, and an opinionated runbook library — all without leaving the dashboard.

Pod Logs — live tail across namespaces
Events — real-time K8s event stream with filters
Operator — CouchbaseCluster CR + Operator health
Runbooks — built-in remediation playbooks
Force-reconnect button for SDK pool recovery

📦

System Packages & systemd

First-class native install. Ships as .deb for Ubuntu/Debian, .rpm for CentOS/RHEL/Rocky/Alma/Fedora/openSUSE/SLES, plus a distro-aware shell installer. Runs under a hardened systemd unit on port 8899.

nfpm-built .deb and .rpm with pre/post hooks
install.sh detects Ubuntu/Debian/CentOS/SUSE
Hardened systemd unit (NoNewPrivileges, ProtectSystem)
Dedicated nebulacb system user, /etc/nebulacb config
Make targets: package-deb, package-rpm, install-local

Mission Control Dashboard

A cockpit-style interface with real-time WebSocket updates, live cluster health, XDCR flow visualization, and one-click controls.

📊

Cluster Health & Metrics

Node status, CPU/memory, ops/sec, doc counts, version, edition, rebalance state per cluster.

🔁

XDCR Replication Flow

Visual pipeline: source → target with lag, restarts, topology changes, mutation queue, GOXDCR delay.

✅

Data Loss Proof Panel

Doc count timeline, delta convergence chart, hash sampling results, monitoring duration, zero-loss verdict.

🎮

Control Panel

18 action buttons: load, upgrade, downgrade, XDCR, audit, AI analyze, backup, failover, chaos injection.

Tab Navigation — 10 Workspaces

Cockpit is the new default. The legacy Dashboard tab stays for parity, and four new enterprise tabs bring K8s observability inside NebulaCB.

🛰️
Cockpit
Mission-control grid (default)

◈
Dashboard
Legacy panel-stack view

🤖
Ask AI
Chat with AI about cluster issues

🔍
RCA
Root cause analysis reports

📚
Knowledge
12+ built-in troubleshooting guides

📊
Insights
History of all AI analyses

📜
Pod Logs
Live K8s pod log tail

⚡
Events
Real-time Kubernetes events

☸️
Operator
CouchbaseCluster CR & operator state

📋
Runbooks
Opinionated remediation playbooks

Mission Control Panel — 18 Commands

Every operator action is one click away. Each command is sent to the backend as a POST to /api/v1/command, and live state streams back over WebSocket.

Category	Command	What it does
Load	`start_load`	Start the Storm generator against configured clusters
	`pause_load`	Pause writers without tearing down workers
	`resume_load`	Resume paused writers without re-initialising
	`stop_load`	Stop generation and flush stats
Upgrade	`start_upgrade`	Patch `CouchbaseCluster` CR and track pod-by-pod rollout
	`abort_upgrade`	Stop tracking the in-flight upgrade
	`downgrade`	Roll back to the previous image via Operator rolling restart
XDCR	`pause_xdcr`	Pause replication pipeline
	`resume_xdcr`	Resume after pause
	`stop_xdcr`	Stop and remove the replication
	`restart_xdcr`	Recreate the pipeline (useful after topology change)
	`xdcr_troubleshoot`	Open diagnostics modal with delay history + live state
Validation	`run_audit`	Full source-vs-target comparison (hash + sequence + key diff)
Chaos	`inject_failure`	Inject XDCR partition or node failure for resilience testing
AI	`ai_analyze`	Trigger on-demand AI root cause analysis
Backup	`start_backup`	Start a cluster backup (EE cbbackupmgr or CE SDK JSONL fallback)
Backup	`start_restore`	Restore from a previous backup — modal lists backups and target cluster
HA	`manual_failover`	Promote target, mark source failed (with confirmation modal)
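
Any command in the table can also be issued from code. The sketch below builds the same request the curl examples on this page send (basic auth, JSON body with an `action` field); `newCommandRequest` is a hypothetical helper, and the base URL and default credentials are the ones shown elsewhere on this page.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newCommandRequest builds a POST to /api/v1/command carrying
// {"action": "<action>"} with HTTP basic auth.
func newCommandRequest(baseURL, user, pass, action string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"action": action})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/command", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCommandRequest("http://localhost:8899", "admin", "nebulacb", "run_audit")
	if err != nil {
		panic(err)
	}
	// To actually send it against a running server: http.DefaultClient.Do(req)
	fmt.Println(req.Method, req.URL.String())
}
```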

AI-Powered Analysis with Ollama

Run AI locally with Ollama — no cloud API keys needed. NebulaCB learns from your cluster logs and metrics to provide context-aware recommendations.

💬

Ask AI

Chat with NebulaCB AI about any cluster issue. It has full context of your cluster state, XDCR status, alerts, and metrics. Ask questions in natural language and get actionable answers.

> "Why is XDCR replication lag increasing?"
> "Is the cluster ready for an upgrade?"
> "Analyze performance bottlenecks"
> "What's the best backup strategy?"

🔍

Root Cause Analysis

Trigger AI-powered RCA for specific categories: XDCR issues, upgrade failures, performance problems, failover events, backup errors, or data integrity concerns. Get structured reports with evidence chains and remediation steps.

Result: severity, root cause, evidence chain,
remediation steps with risk levels + commands

📚

Knowledge Base

Built-in library of 12+ common Couchbase issues covering XDCR lag, pipeline restarts, stuck rebalances, auto-failover, backup failures, memory pressure, disk queues, and Kubernetes NodePort configuration. Searchable and filterable.

Categories: XDCR, Upgrade, Failover, Backup,
Performance, Data Integrity, Configuration

🧠

Ollama Integration

Run AI 100% locally with Ollama, so no data leaves your network. Supports llama3, llama4, and any Ollama-compatible model. Anthropic Claude and OpenAI are also supported as cloud providers.

config.json:
"ai": {
  "enabled": true,
  "provider": "ollama",
  "model": "llama3",
  "api_endpoint": "http://127.0.0.1:11434"
}
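
To show how the `ai` block maps onto typed configuration, here is a small Go sketch that decodes it. The struct names and the validation rule are assumptions for illustration and may not match NebulaCB's internal config types; only the JSON field names come from the snippet above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// aiConfig mirrors the "ai" block; the Go type names are hypothetical.
type aiConfig struct {
	Enabled     bool   `json:"enabled"`
	Provider    string `json:"provider"`
	Model       string `json:"model"`
	APIEndpoint string `json:"api_endpoint"`
}

type config struct {
	AI aiConfig `json:"ai"`
}

// parseConfig decodes the config and applies one illustrative sanity check:
// an enabled AI block must name a provider.
func parseConfig(data []byte) (config, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return config{}, fmt.Errorf("parse config: %w", err)
	}
	if c.AI.Enabled && c.AI.Provider == "" {
		return config{}, errors.New("ai.enabled is true but ai.provider is empty")
	}
	return c, nil
}

func main() {
	raw := []byte(`{
	  "ai": {
	    "enabled": true,
	    "provider": "ollama",
	    "model": "llama3",
	    "api_endpoint": "http://127.0.0.1:11434"
	  }
	}`)
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("provider=%s model=%s endpoint=%s\n", c.AI.Provider, c.AI.Model, c.AI.APIEndpoint)
}
```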

Installation

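Several ways to get started with NebulaCB: build from source, install a native package, run the local install script, use Docker Compose, or deploy with Helm.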

Build from Source (Go 1.24+)

# Clone and build
git clone https://github.com/bwalia/nebulacb.git
cd nebulacb
make build

# Configure your clusters
vim config.json

# Start the server
make run

# Open dashboard
open http://localhost:8899

# Login: admin / nebulacb

Run the XDCR Load Test

# Send random writes to both clusters during upgrades
go run ./cmd/xdcr-loadtest/ -rate 100 -duration 30m

# Custom: 70% to source, 30% to target, larger docs
go run ./cmd/xdcr-loadtest/ -rate 500 -ratio 0.7 -doc-min 1024 -doc-max 8192

Enable AI with Ollama

# Install Ollama (macOS)
brew install ollama

# Pull a model
ollama pull llama3

# Update config.json
"ai": {
  "enabled": true,
  "provider": "ollama",
  "model": "llama3",
  "api_endpoint": "http://127.0.0.1:11434"
}

Native Install (Ubuntu / Debian / CentOS / RHEL / Rocky / Alma / Fedora / openSUSE / SLES)

# Build distributable .deb and .rpm with nfpm
make package          # builds both .deb and .rpm into ./dist/
make package-deb      # Ubuntu / Debian only
make package-rpm      # CentOS / RHEL / Rocky / Alma / Fedora / openSUSE / SLES

# Install on a Debian-family host
sudo dpkg -i dist/nebulacb_1.0.0_amd64.deb

# Install on an RPM-family host
sudo rpm -i dist/nebulacb-1.0.0-1.x86_64.rpm
# or, with dependency resolution:
sudo dnf install dist/nebulacb-1.0.0-1.x86_64.rpm
sudo zypper install dist/nebulacb-1.0.0-1.x86_64.rpm

Local Install via Shell Script (no package manager)

# Builds binary + UI, then installs system-wide
make install-local

# Or supply your own config to seed /etc/nebulacb/config.json
make install-local SOURCE_CONFIG=/path/to/config.json

# Service runs as user 'nebulacb' on port 8899
sudo systemctl status nebulacb
sudo journalctl -u nebulacb -f
curl http://localhost:8899/api/v1/health

# Uninstall (preserves /etc/nebulacb)
make uninstall-local
# Full purge (removes config, data, logs, user)
make uninstall-local ARGS=--purge

What Gets Installed

/usr/local/bin/nebulacb                    # Static Go binary (~20 MB)
/usr/local/share/nebulacb/web/...          # React UI build
/etc/nebulacb/config.json                  # Editable config (0640 root:nebulacb)
/etc/systemd/system/nebulacb.service       # Hardened systemd unit
/var/lib/nebulacb/                         # State directory
/var/log/nebulacb/                         # Log directory (journal also works)

# Hardening flags in the unit:
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ProtectKernelTunables=true
ProtectKernelModules=true

After install, edit /etc/nebulacb/config.json to point at your Couchbase clusters and run sudo systemctl restart nebulacb. The dashboard is served at http://<host>:8899.

Docker Compose (includes two Couchbase clusters)

# Start everything: NebulaCB + Couchbase 7.2.2 + Couchbase 7.6.0
docker-compose up -d

# Open dashboard at http://localhost:8080
# Source cluster: http://localhost:8091
# Target cluster: http://localhost:9091

# Tear down
docker-compose down -v

After starting, initialize both Couchbase clusters, create the test bucket, and set up XDCR replication. See the README for step-by-step instructions.

Deploy to Kubernetes with Helm

# Install NebulaCB
helm install nebulacb deploy/helm/nebulacb \
  -n nebulacb --create-namespace

# Upgrade
helm upgrade nebulacb deploy/helm/nebulacb -n nebulacb

# Uninstall
helm uninstall nebulacb -n nebulacb

Expose Couchbase Clusters via NodePort

# Patch Couchbase Operator for external access
kubectl patch couchbasecluster cb-local -n couchbase --type merge \
  -p '{"spec":{"networking":{"exposedFeatures":["client","admin"]}}}'

# Set kv_port in config.json to skip port-forwarding
"source": {
  "host": "192.168.1.193:32451",
  "kv_port": 32419,
  ...
}

Try It Live — Smoke Test After Install

Every endpoint below works against the default install. Use these to verify your deployment in under 60 seconds.

📡

Health & Dashboard

Check the server is up, all clusters are connected, and the dashboard API returns live cluster state.

# Public health probe (no auth)
curl http://localhost:8899/api/v1/health

# Full dashboard state (auth required)
curl -u admin:nebulacb \
  http://localhost:8899/api/v1/dashboard

# Get a session token for the UI
curl -X POST http://localhost:8899/api/v1/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"nebulacb"}'

🔍

Data Integrity Audit

Run a full source-vs-target comparison. Exercises the gocb SDK, REST topology discovery, and the validator module end-to-end.

# Kick off a full audit
curl -u admin:nebulacb \
  -X POST http://localhost:8899/api/v1/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"run_audit"}'

# Same thing via the CLI
bin/nebulacb-cli run-audit
bin/nebulacb-cli status

🔁

XDCR Diagnostics

Get live pipeline state, topology-change history, GOXDCR delay windows, and every diagnostic check the troubleshoot modal runs.

curl -u admin:nebulacb \
  http://localhost:8899/api/v1/xdcr/diagnostics | jq

# Restart the pipeline
curl -u admin:nebulacb \
  -X POST http://localhost:8899/api/v1/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"restart_xdcr"}'

⚡

Dual-Cluster Load Test

Generate random writes against both clusters to stress XDCR during an upgrade. Prints per-cluster throughput every 5 seconds.

# 100 writes/sec, 50/50 split, 5 min
go run ./cmd/xdcr-loadtest/ \
  -rate 100 -duration 5m

# Or trigger the in-process Storm generator
curl -u admin:nebulacb \
  -X POST http://localhost:8899/api/v1/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"start_load"}'

🤖

AI Tabs (Ollama required)

Run AI locally. The Knowledge Base tab always works, but Ask AI, RCA, and Insights require a running Ollama instance.

# One-time setup
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull llama3

# Trigger an AI analysis
curl -u admin:nebulacb \
  -X POST http://localhost:8899/api/v1/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"ai_analyze"}'

☸️

K8s Observability Tabs

Pod Logs, Events, and Operator tabs pull from the kubeconfig set in /etc/nebulacb/config.json. Make sure the nebulacb system user can read that file.

# Place kubeconfig where the service can read it
sudo install -m 0640 -o root -g nebulacb \
  ~/.kube/config /etc/nebulacb/kubeconfig.yaml

# Point config.json at it and restart
sudo sed -i 's|"kubeconfig": ".*"|"kubeconfig": "/etc/nebulacb/kubeconfig.yaml"|' \
  /etc/nebulacb/config.json
sudo systemctl restart nebulacb

Architecture

NebulaCB runs on your machine and connects to Couchbase clusters via REST API and the gocb SDK.

                             React Dashboard (:8899)
                                    |
                            WebSocket + REST API
                                    |
                          NebulaCB Go Server
                    /    |    |    |    |    |    \
              Storm  XDCR  Validator  Orchestrator  Monitor  AI   Failover
                |      |      |           |           |      |      |
             ClientPool (gocb SDK + REST + NodePort connections)
              /                                                   \
   Couchbase Source                                      Couchbase Target
   (k8s / docker / native)                              (k8s / docker / native)
              \_____________________ XDCR _____________________/
                              (bidirectional)

   Local AI: Ollama (llama3) at 127.0.0.1:11434
   Metrics: Prometheus endpoint at :9090/metrics

CLI Commands

Command	Description
`nebulacb-cli status`	Full dashboard status (clusters, upgrade, XDCR, load, integrity, alerts)
`nebulacb-cli start-load`	Start the Storm load generator
`nebulacb-cli stop-load`	Stop load generation
`nebulacb-cli start-upgrade`	Trigger rolling upgrade
`nebulacb-cli abort-upgrade`	Stop tracking the upgrade
`nebulacb-cli restart-xdcr`	Restart XDCR pipeline
`nebulacb-cli run-audit`	Run full data integrity audit
`nebulacb-cli report`	Generate post-upgrade report

Project Structure

nebulacb/
  cmd/
    nebulacb/              # Main server
    cli/                   # CLI client
    xdcr-loadtest/         # Dual-cluster load test
  internal/
    ai/                    # AI analyzer (Ollama/Claude/OpenAI)
    api/                   # HTTP + WebSocket server
    storm/                 # Load generator
    xdcr/                  # XDCR replication engine
    validator/             # Data integrity validation
    orchestrator/          # Upgrade + downgrade
    failover/              # HA & failover
    backup/                # Backup & restore
    migration/             # Data migration
    monitor/               # Multi-cluster polling
    region/                # Multi-region management
  pkg/
    couchbase/             # SDK client + connection pool
    kubernetes/            # K8s client + port-forward
  web/nebulacb-ui/         # React dashboard
  deploy/helm/nebulacb/    # Helm chart

Upgrade Fearlessly.
Validate Everything.
Lose Nothing.
