
Backup & Disaster Recovery

This guide covers backup strategies, recovery procedures, and disaster recovery planning for Lightning Enable deployments.

Azure SQL Automatic Backups

Lightning Enable uses Azure SQL Database, which provides comprehensive automatic backup capabilities.

Point-in-Time Restore (PITR)

Azure SQL automatically creates backups that enable point-in-time restore:

Backup Type | Frequency | Retention
Full backups | Weekly | 7-35 days (configurable)
Differential backups | Every 12-24 hours | 7-35 days
Transaction log backups | Every 5-10 minutes | 7-35 days

Default retention: 7 days for Basic/Standard, 35 days for Premium/Business Critical tiers.

Configure Retention

Increase PITR retention in Azure Portal under your SQL Database > Settings > Backups > Retention policies.

Long-Term Retention (LTR)

For compliance and archival needs, configure long-term retention:

Policy | Description | Max Retention
Weekly (W) | Keep one backup per week | 10 years
Monthly (M) | Keep one backup per month | 10 years
Yearly (Y) | Keep one backup per year | 10 years

Example LTR policy:

  • Weekly backups retained for 4 weeks
  • Monthly backups retained for 12 months
  • Yearly backups retained for 5 years

Geo-Redundant Backup Storage

Azure SQL backups are stored in geo-redundant storage (GRS) by default:

  • Backups are written to read-access geo-redundant storage (RA-GRS) in the primary region
  • Copies are replicated asynchronously to the paired secondary region
  • RPO: up to 1 hour when restoring from these geo-replicated backups (geo-restore)

Storage redundancy options:

Option | Description | Use Case
LRS | Locally redundant (single datacenter) | Cost-sensitive, non-critical
ZRS | Zone-redundant (within region) | High availability within region
GRS | Geo-redundant (cross-region) | Disaster recovery (recommended)

How to Restore from Backup

Via Azure Portal

Point-in-Time Restore:

  1. Navigate to Azure Portal > SQL databases > your database
  2. Click Restore in the toolbar
  3. Select Point-in-time restore type
  4. Choose the restore point (date/time)
  5. Enter a new database name (e.g., LightningEnable_Restored_20260109)
  6. Select target server (same or different)
  7. Click Review + create > Create
  8. Wait for restore to complete (5-30 minutes depending on size)

Restore from LTR Backup:

  1. Navigate to SQL server > Backups > Long-term retention
  2. Select the database
  3. Choose the LTR backup to restore
  4. Click Restore
  5. Configure target database name and server
  6. Click Create

Via Azure CLI

Point-in-Time Restore:

# Restore to a specific point in time
az sql db restore \
--resource-group "your-resource-group" \
--server "your-sql-server" \
--name "LightningEnable" \
--dest-name "LightningEnable_Restored" \
--time "2026-01-09T10:30:00Z"
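The `--time` value is usually derived from the time the incident was noticed. The sketch below computes a restore point a few minutes earlier (the 5-10 minute margin suggested in Scenario 1) and a date-stamped target name following the `LightningEnable_Restored_20260109` convention from the portal steps. It assumes GNU `date` (as on Linux or Azure Cloud Shell); `restore_point` and `restored_db_name` are hypothetical helper names.

```shell
# Hypothetical helpers for building PITR restore arguments.
# Requires GNU date (Linux, Azure Cloud Shell); macOS/BSD date uses other flags.

# restore_point <incident-time-UTC> [minutes-before]
# Prints an ISO 8601 UTC timestamp N minutes (default 10) before the incident.
restore_point() {
  local incident_epoch minutes
  incident_epoch=$(date -u -d "$1" +%s) || return 1
  minutes=${2:-10}
  date -u -d "@$((incident_epoch - minutes * 60))" +%Y-%m-%dT%H:%M:%SZ
}

# restored_db_name <restore-point>
# Prints LightningEnable_Restored_YYYYMMDD for the restore point's UTC date.
restored_db_name() {
  local day
  day=$(date -u -d "$1" +%Y%m%d) || return 1
  printf 'LightningEnable_Restored_%s\n' "$day"
}

# Example usage:
#   point=$(restore_point "2026-01-09T10:40:00Z" 10)   # 2026-01-09T10:30:00Z
#   az sql db restore \
#     --resource-group "your-resource-group" \
#     --server "your-sql-server" \
#     --name "LightningEnable" \
#     --dest-name "$(restored_db_name "$point")" \
#     --time "$point"
```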

Restore from Deleted Database:

# List deleted databases
az sql db list-deleted \
--resource-group "your-resource-group" \
--server "your-sql-server"

# Restore deleted database
az sql db restore \
--resource-group "your-resource-group" \
--server "your-sql-server" \
--name "LightningEnable" \
--dest-name "LightningEnable_Restored" \
--deleted-time "2026-01-08T15:00:00Z"

Restore from LTR Backup:

# List available LTR backups (--location is the server's Azure region)
az sql db ltr-backup list \
--location "your-region" \
--resource-group "your-resource-group" \
--server "your-sql-server" \
--database "LightningEnable"

# Restore from a specific LTR backup
az sql db ltr-backup restore \
--dest-resource-group "your-resource-group" \
--dest-server "your-sql-server" \
--dest-database "LightningEnable_Restored" \
--backup-id "/subscriptions/.../backups/..."

Via PowerShell

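# Point-in-time restore (-ResourceId identifies the source database)
$db = Get-AzSqlDatabase `
-ResourceGroupName "your-resource-group" `
-ServerName "your-sql-server" `
-DatabaseName "LightningEnable"

Restore-AzSqlDatabase -FromPointInTimeBackup `
-PointInTime "2026-01-09T10:30:00Z" `
-ResourceGroupName $db.ResourceGroupName `
-ServerName $db.ServerName `
-ResourceId $db.ResourceId `
-TargetDatabaseName "LightningEnable_Restored"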

Recovery Time Objectives

Understanding your recovery targets helps plan appropriate backup strategies.

RTO (Recovery Time Objective)

Definition: Maximum acceptable time to restore service after an incident.

Scenario | Estimated RTO | Factors
Point-in-time restore (same region) | 5-30 minutes | Database size, transaction log volume
Geo-restore (different region) | < 12 hours | Geo-replication lag, database size
LTR restore | 30-60 minutes | Backup size, network throughput
Full redeployment | 1-4 hours | App Service deployment, DNS propagation

To minimize RTO:

  • Use Premium/Business Critical tier for faster restores
  • Pre-configure deployment scripts and runbooks
  • Maintain infrastructure-as-code (Bicep/Terraform)
  • Test recovery procedures quarterly

RPO (Recovery Point Objective)

Definition: Maximum acceptable data loss measured in time.

Backup Type | RPO | Notes
Transaction log backups | 5-10 minutes | Near real-time recovery
Differential backups | 12-24 hours | Between differential backups
Active geo-replication (if configured) | < 5 seconds | Async replication to a secondary database
LTR backups | Days to weeks | Depends on LTR policy

Lightning Enable effective RPO: ~10 minutes (transaction log backup frequency)

Payment Data Consideration

Lightning payments settle on the Lightning Network independently of Lightning Enable's database. Even if a database restore loses the most recent payment records, the underlying payments remain settled. Webhook delivery logs help reconcile any gaps.
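One way to find those gaps is to compare the payment IDs your Lightning provider reports as settled with the IDs present in the restored Payments table. A minimal sketch, assuming you have exported both lists to text files with one ID per line; the file names, the export method, and the `missing_payments` helper are hypothetical:

```shell
# Hypothetical helper: print payment IDs present in the provider export
# but missing from the restored database export (one ID per line in each).
missing_payments() {
  local provider_ids="$1" db_ids="$2" work
  work=$(mktemp -d) || return 1
  # comm needs both inputs sorted with the same collation
  LC_ALL=C sort -u "$provider_ids" > "$work/provider"
  LC_ALL=C sort -u "$db_ids" > "$work/db"
  # -23 hides lines unique to the DB file and lines common to both files
  LC_ALL=C comm -23 "$work/provider" "$work/db"
  rm -rf "$work"
}

# Example usage:
#   missing_payments opennode_settled_ids.txt restored_payment_ids.txt
```

Each printed ID is a settled payment with no matching record after the restore, so it needs to be reconciled manually.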

Application Data Considerations

What Data Is Backed Up

Azure SQL backups include all Lightning Enable data:

Data Type | Table(s) | Criticality
Merchants | Merchants | Critical - merchant configurations
Payments | Payments | Critical - payment records
Webhooks | WebhookSubscriptions, WebhookDeliveryLogs | High - delivery tracking
Refunds | Refunds | Critical - refund records
L402 Tokens | L402Tokens | Medium - can be regenerated
Hangfire Jobs | Hangfire.* | Low - jobs will re-queue

Encrypted Fields

Lightning Enable encrypts sensitive data at rest using AES-256-GCM:

Table | Encrypted Fields
Merchants | ApiKeyHash, OpenNodeApiKey, WebhookSecretHash
Critical: Encryption Key Management

The DB_ENCRYPTION_KEY is required to decrypt merchant API keys and secrets.

If the encryption key is lost:

  • Encrypted merchant data is permanently unrecoverable
  • Merchants must re-register and regenerate all API keys
  • OpenNode keys must be re-entered

Backup your encryption key:

  1. Store in Azure Key Vault with soft-delete enabled
  2. Export to secure offline storage (hardware security module or encrypted backup)
  3. Document key recovery procedure
  4. Test key restoration annually
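For the annual restoration test, one way to confirm that a recovered copy of DB_ENCRYPTION_KEY matches the key in use, without echoing either secret, is to compare SHA-256 fingerprints. A minimal sketch, assuming the key is handled as a string and `sha256sum` (GNU coreutils) is available; the helper names and the Key Vault vault and secret names in the comments are hypothetical:

```shell
# Hypothetical helpers: compare two copies of a secret by SHA-256 fingerprint.
# printf is a shell builtin, so the key does not appear in a process listing.
key_fingerprint() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

keys_match() {
  [ "$(key_fingerprint "$1")" = "$(key_fingerprint "$2")" ]
}

# Example usage (vault and secret names are placeholders):
#   restored=$(az keyvault secret show --vault-name "your-vault" \
#     --name "your-encryption-key-secret" --query value -o tsv)
#   keys_match "$DB_ENCRYPTION_KEY" "$restored" && echo "Encryption key restore verified"
```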

Hangfire Job State

Hangfire background jobs (webhook retries, cleanup tasks) are stored in SQL:

  • Pending jobs: Will execute after recovery
  • In-progress jobs: May duplicate; ensure idempotent handlers
  • Failed jobs: Retained for retry based on policy

Post-recovery action: Review Hangfire dashboard (/hangfire) for stale or duplicate jobs.

Disaster Recovery Procedures

Scenario 1: Database Corruption or Accidental Deletion

Symptoms: Application errors, missing data, database inaccessible

Procedure:

  1. Assess the situation

    # Check database status
    az sql db show --resource-group "rg" --server "server" --name "LightningEnable"
  2. Identify restore point

    • Determine when corruption/deletion occurred
    • Choose restore point 5-10 minutes before incident
  3. Perform point-in-time restore

    az sql db restore \
    --resource-group "your-rg" \
    --server "your-server" \
    --name "LightningEnable" \
    --dest-name "LightningEnable_Restored" \
    --time "2026-01-09T10:00:00Z"
  4. Update connection string (if using restored database directly)

    • Update App Service configuration
    • Or rename databases (swap restored with original)
  5. Verify data integrity

    • Check merchant count
    • Verify recent payments exist
    • Test API functionality
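The rename option in step 4 can be scripted so the swap is reviewed before it runs. A sketch, assuming the damaged original still exists on the same server and application traffic is stopped during the swap; `run` and `swap_restored_db` are hypothetical helpers, and `DRY_RUN=1` prints the `az sql db rename` commands instead of executing them:

```shell
# Hypothetical helpers: swap a restored database into place via renames.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# swap_restored_db <resource-group> <server> <suffix-for-old-db>
swap_restored_db() {
  local rg="$1" server="$2" suffix="$3"
  # 1. Rename the damaged database out of the way (kept for investigation)
  run az sql db rename --resource-group "$rg" --server "$server" \
    --name LightningEnable --new-name "LightningEnable_Old_$suffix" || return 1
  # 2. Give the restored copy the name the connection string already uses
  run az sql db rename --resource-group "$rg" --server "$server" \
    --name LightningEnable_Restored --new-name LightningEnable
}

# Example usage:
#   DRY_RUN=1 swap_restored_db "your-rg" "your-server" 20260109   # review
#   swap_restored_db "your-rg" "your-server" 20260109             # execute
```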

Scenario 2: Region-Wide Outage

Symptoms: Azure region unavailable, all services down

Procedure:

  1. Initiate geo-restore

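    # Find the geo-replicated backup of the primary database
    az sql db geo-backup list \
    --resource-group "your-rg" \
    --server "your-sql-server"

    # Restore it to a server in the secondary region
    az sql db geo-backup restore \
    --resource-group "your-secondary-rg" \
    --dest-server "your-secondary-server" \
    --dest-database "LightningEnable" \
    --geo-backup-id "/subscriptions/.../recoverableDatabases/LightningEnable"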
  2. Deploy application to secondary region

    • Use infrastructure-as-code templates
    • Deploy to pre-provisioned App Service or create new
  3. Update configuration

    • Connection string for new database
    • Verify all environment variables
  4. Update DNS

    • Point api.lightningenable.com to new App Service
    • Consider Azure Traffic Manager for automatic failover
  5. Verify SSL/TLS

    • Ensure certificates are valid for domain
    • App Service managed certificates may need re-creation

Scenario 3: Application Redeployment

Symptoms: Need to redeploy application (code update, configuration fix)

Procedure:

  1. Database unchanged - No action needed for data

  2. Redeploy via GitHub Actions

    # Trigger deployment workflow
    gh workflow run deploy-production.yml
  3. Or manual deployment

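    # Build and publish
    dotnet publish -c Release -o ./publish

    # Package the published output as a zip for deployment
    (cd publish && zip -r ../publish.zip .)

    # Deploy to Azure
    az webapp deploy \
    --resource-group "your-rg" \
    --name "your-app-service" \
    --src-path ./publish.zip \
    --type zip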
  4. Run database migrations (if any)

    dotnet ef database update --connection "your-connection-string"
  5. Verify deployment

    • Health check endpoint: GET /health
    • Test payment creation: POST /api/payments
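The health check in step 5 can be automated so a runbook or pipeline waits until the app is actually serving. A minimal sketch, assuming `curl` is installed and that GET /health returns HTTP 200 when the app is healthy; `check_health` and its retry defaults are hypothetical:

```shell
# Hypothetical helper: poll GET /health until HTTP 200 or attempts run out.
# Requires curl. Defaults (10 attempts, 15 s apart) are arbitrary.
check_health() {
  local base_url="$1" attempts="${2:-10}" delay="${3:-15}" i=1 status
  while [ "$i" -le "$attempts" ]; do
    # -w prints the status code; curl reports 000 when no response arrives
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base_url/health") || status="000"
    if [ "$status" = "200" ]; then
      echo "Healthy after $i attempt(s)"
      return 0
    fi
    echo "Attempt $i: HTTP $status" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage:
#   check_health https://api.lightningenable.com || echo "Deployment is not healthy"
```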

DNS and SSL Considerations

DNS Failover

For production, configure DNS for resilience:

Option 1: Azure Traffic Manager

  • Automatic health monitoring
  • Failover to secondary endpoint
  • Geographic routing support

Option 2: Manual DNS Update

  • Update A/CNAME record in DNS provider
  • TTL should be low (300 seconds) for faster propagation
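A low TTL only speeds up failover if it is already in place when an incident begins, so it is worth checking ahead of time. A minimal sketch, assuming `dig` is installed; `check_ttl` is a hypothetical helper that reads `dig +noall +answer` output, where the second column is the TTL in seconds. Query an authoritative name server to see the configured TTL, because a caching resolver reports the remaining time instead.

```shell
# Hypothetical helper: fail if any record TTL in `dig +noall +answer`
# output (read from stdin) exceeds the limit (default 300 seconds).
check_ttl() {
  awk -v max="${1:-300}" '
    NF >= 5 {
      found = 1
      # Answer lines look like: name TTL class type data
      if ($2 + 0 > max + 0) {
        printf "TTL %s s exceeds %s s for %s\n", $2, max, $1
        too_high = 1
      }
    }
    END { if (!found || too_high) exit 1; exit 0 }'
}

# Example usage (authoritative server name is a placeholder):
#   dig +noall +answer @your-authoritative-ns api.lightningenable.com | check_ttl 300
```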

SSL/TLS Certificates

Deployment Type | Certificate Management
Azure App Service (managed) | Automatic renewal
Custom domain | Ensure certificate is backed up or re-issuable
Azure Front Door | Managed certificates available

Post-recovery:

  • Verify HTTPS works: curl -I https://api.lightningenable.com/health
  • Check certificate expiration: openssl s_client -connect api.lightningenable.com:443 -servername api.lightningenable.com </dev/null 2>/dev/null | openssl x509 -noout -enddate
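To turn the expiration check into a pass/fail test, `openssl x509 -checkend` exits non-zero when the certificate expires within the given number of seconds. A sketch, assuming OpenSSL is installed; `fetch_cert` and `cert_valid_for_days` are hypothetical helper names:

```shell
# Hypothetical helpers for post-recovery certificate checks (requires openssl).

# fetch_cert <host>: print the server's leaf certificate as PEM.
fetch_cert() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -outform PEM
}

# cert_valid_for_days <pem-file> <days>: exit 0 if the certificate does NOT
# expire within the given number of days, non-zero otherwise.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# Example usage:
#   fetch_cert api.lightningenable.com > /tmp/api-cert.pem
#   cert_valid_for_days /tmp/api-cert.pem 30 || echo "Certificate expires within 30 days"
```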

Verification Checklist

After any recovery, complete this verification checklist:

Database Verification

  • Database is online and accessible
  • Connection string is correctly configured
  • Merchant records exist and count is expected
  • Recent payments are present (check last 24 hours)
  • Encrypted fields decrypt successfully (test merchant API key validation)

Application Verification

  • Health endpoint returns 200: GET /health
  • API authentication works: GET /api/merchants with valid key
  • Payment creation works: POST /api/payments (use test mode)
  • Webhook delivery is functional
  • Hangfire dashboard accessible: /hangfire
  • No duplicate/stuck Hangfire jobs

External Integration Verification

  • OpenNode connectivity: payments reach OpenNode
  • Stripe webhooks: subscription events processed
  • DNS resolves to correct endpoint
  • SSL certificate valid and not expiring soon

Post-Recovery Tasks

  • Notify stakeholders of recovery completion
  • Document incident and recovery timeline
  • Review and reconcile any payment discrepancies
  • Update runbook with lessons learned
  • Schedule post-incident review

Recommended backup settings for production Lightning Enable deployments:

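# Set PITR (short-term) retention to 14 days
az sql db str-policy set \
--resource-group "your-rg" \
--server "your-server" \
--name "LightningEnable" \
--retention-days 14 \
--diffbackup-hours 24

# Keep backups in geo-redundant storage
az sql db update \
--resource-group "your-rg" \
--server "your-server" \
--name "LightningEnable" \
--backup-storage-redundancy "Geo"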

# Configure LTR policy
az sql db ltr-policy set \
--resource-group "your-rg" \
--server "your-server" \
--database "LightningEnable" \
--weekly-retention "P4W" \
--monthly-retention "P12M" \
--yearly-retention "P5Y" \
--week-of-year 1

Next Steps