Backup & Disaster Recovery

This guide covers backup strategies, recovery procedures, and disaster recovery planning for Lightning Enable deployments.

Azure SQL Automatic Backups

Lightning Enable uses Azure SQL Database, which provides comprehensive automatic backup capabilities.

Point-in-Time Restore (PITR)

Azure SQL automatically creates backups that enable point-in-time restore:

Backup Type	Frequency	Retention
Full backups	Weekly	1-35 days (configurable)
Differential backups	Every 12 or 24 hours	1-35 days
Transaction log backups	Every 5-10 minutes	1-35 days

Default retention: 7 days on all service tiers. The Basic tier allows 1-7 days; all other tiers can be configured for 1-35 days.

Configure Retention

Increase PITR retention in Azure Portal under your SQL Database > Settings > Backups > Retention policies.

Long-Term Retention (LTR)

For compliance and archival needs, configure long-term retention:

Policy	Description	Max Retention
Weekly (W)	Keep one backup per week	10 years
Monthly (M)	Keep one backup per month	10 years
Yearly (Y)	Keep one backup per year	10 years

Example LTR policy:

Weekly backups retained for 4 weeks
Monthly backups retained for 12 months
Yearly backups retained for 5 years

Geo-Redundant Backup Storage

Azure SQL backups are stored in geo-redundant storage (GRS) by default:

Primary region: Read-access geo-redundant storage (RA-GRS)
Secondary region: Asynchronously replicated copies
RPO: up to 1 hour when restoring from geo-replicated backups (geo-restore)

Storage redundancy options:

Option	Description	Use Case
LRS	Locally redundant (single datacenter)	Cost-sensitive, non-critical
ZRS	Zone-redundant (within region)	High availability within region
GRS	Geo-redundant (cross-region)	Disaster recovery (recommended)

How to Restore from Backup

Via Azure Portal

Point-in-Time Restore:

Navigate to Azure Portal > SQL databases > your database
Click Restore in the toolbar
Select Point-in-time restore type
Choose the restore point (date/time)
Enter a new database name (e.g., LightningEnable_Restored_20260109)
Select target server (same or different)
Click Review + create > Create
Wait for restore to complete (5-30 minutes depending on size)

Restore from LTR Backup:

Navigate to SQL server > Backups > Long-term retention
Select the database
Choose the LTR backup to restore
Click Restore
Configure target database name and server
Click Create

Via Azure CLI

Point-in-Time Restore:

# Restore to a specific point in time
az sql db restore \
  --resource-group "your-resource-group" \
  --server "your-sql-server" \
  --name "LightningEnable" \
  --dest-name "LightningEnable_Restored" \
  --time "2026-01-09T10:30:00Z"
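
If the restore was started from the Portal, or from the CLI with --no-wait, a script may need to wait until the new database is online before switching traffic. The following is a minimal sketch, not part of Lightning Enable: wait_for_online and restored_db_status are hypothetical helper names, the resource names are placeholders, and the sketch assumes that az sql db show reports a status of Online once the restore has finished.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a status command until it prints "Online".
# Arguments: <status command> [max attempts] [seconds between attempts]
wait_for_online() {
  local status_cmd="$1" attempts="${2:-60}" delay="${3:-30}"
  local i=1 status
  while [ "$i" -le "$attempts" ]; do
    status="$($status_cmd)"
    if [ "$status" = "Online" ]; then
      echo "Database online after $i check(s)"
      return 0
    fi
    echo "Status: ${status:-unknown}; retrying in ${delay}s"
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Timed out waiting for database" >&2
  return 1
}

# Assumed status command; all names below are placeholders.
restored_db_status() {
  az sql db show \
    --resource-group "your-resource-group" \
    --server "your-sql-server" \
    --name "LightningEnable_Restored" \
    --query status --output tsv
}

# Usage (not executed here): check every 30 seconds, up to 60 times
#   wait_for_online restored_db_status 60 30
```

Passing the status command as an argument keeps the polling loop independent of the Azure CLI, so it can be exercised with a stub before it is needed in an incident.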

Restore from Deleted Database:

# List deleted databases
az sql db list-deleted \
  --resource-group "your-resource-group" \
  --server "your-sql-server"

# Restore deleted database
az sql db restore \
  --resource-group "your-resource-group" \
  --server "your-sql-server" \
  --name "LightningEnable" \
  --dest-name "LightningEnable_Restored" \
  --deleted-time "2026-01-08T15:00:00Z"

Restore from LTR Backup:

# List available LTR backups (--location is the region of the source server)
az sql db ltr-backup list \
  --location "your-region" \
  --server "your-sql-server" \
  --database "LightningEnable"

# Restore from specific LTR backup
az sql db ltr-restore \
  --dest-resource-group "your-resource-group" \
  --dest-server "your-sql-server" \
  --dest-database "LightningEnable_Restored" \
  --backup-id "/subscriptions/.../backups/..."

Via PowerShell

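# Point-in-time restore (the source database is identified by its resource ID)
$db = Get-AzSqlDatabase -ResourceGroupName "your-resource-group" -ServerName "your-sql-server" -DatabaseName "LightningEnable"

Restore-AzSqlDatabase `
  -FromPointInTimeBackup `
  -PointInTime "2026-01-09T10:30:00Z" `
  -ResourceGroupName "your-resource-group" `
  -ServerName "your-sql-server" `
  -TargetDatabaseName "LightningEnable_Restored" `
  -ResourceId $db.ResourceId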

Recovery Time Objectives

Understanding your recovery targets helps plan appropriate backup strategies.

RTO (Recovery Time Objective)

Definition: Maximum acceptable time to restore service after an incident.

Scenario	Estimated RTO	Factors
Point-in-time restore (same region)	5-30 minutes	Database size, transaction log volume
Geo-restore (different region)	< 12 hours	Geo-replication lag, database size
LTR restore	30-60 minutes	Backup size, network throughput
Full redeployment	1-4 hours	App Service deployment, DNS propagation

To minimize RTO:

Use Premium/Business Critical tier for faster restores
Pre-configure deployment scripts and runbooks
Maintain infrastructure-as-code (Bicep/Terraform)
Test recovery procedures quarterly

RPO (Recovery Point Objective)

Definition: Maximum acceptable data loss measured in time.

Backup Type	RPO	Notes
Transaction log backups	5-10 minutes	Near real-time recovery
Differential backups	12-24 hours	Between differential backups
Geo-restore (geo-redundant backups)	< 1 hour	Backups replicated asynchronously to the secondary region
LTR backups	Days to weeks	Depends on LTR policy

Lightning Enable effective RPO: ~10 minutes (transaction log backup frequency)

Payment Data Consideration

Lightning payments settle within seconds on the Lightning Network, independently of the Lightning Enable database. Even if database recovery loses recent records, the payments themselves remain settled. Webhook delivery logs help reconcile any gaps.
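
One way to find such gaps is to compare payment identifiers held outside the restored database against those present after the restore. The following is a minimal sketch under stated assumptions: both sides can be exported as text files with one payment ID per line, and missing_payment_ids is a hypothetical helper name. Lightning Enable does not ship this script, and the export step is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print IDs that appear in the external export
# (first file) but not in the restored database export (second file).
# Both inputs are plain text files with one payment ID per line.
missing_payment_ids() {
  local external_ids="$1" restored_ids="$2"
  local tmp_ext tmp_db
  tmp_ext="$(mktemp)"
  tmp_db="$(mktemp)"
  # comm requires sorted input; -u also removes duplicate IDs
  sort -u "$external_ids" > "$tmp_ext"
  sort -u "$restored_ids" > "$tmp_db"
  # -23 suppresses lines unique to the second file and lines common to both
  comm -23 "$tmp_ext" "$tmp_db"
  rm -f "$tmp_ext" "$tmp_db"
}

# Usage (not executed here; file names are placeholders):
#   missing_payment_ids external-ids.txt restored-db-ids.txt
```

Each ID printed is a payment that exists externally but has no record in the restored database, and is a candidate for manual reconciliation.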

Application Data Considerations

What Data Is Backed Up

Azure SQL backups include all Lightning Enable data:

Data Type	Table(s)	Criticality
Merchants	`Merchants`	Critical - merchant configurations
Payments	`Payments`	Critical - payment records
Webhooks	`WebhookSubscriptions`, `WebhookDeliveryLogs`	High - delivery tracking
Refunds	`Refunds`	Critical - refund records
L402 Tokens	`L402Tokens`	Medium - can be regenerated
Hangfire Jobs	`Hangfire.*`	Low - jobs will re-queue

Encrypted Fields

Lightning Enable encrypts sensitive data at rest using AES-256-GCM:

Table	Encrypted Fields
`Merchants`	`ApiKeyHash`, `OpenNodeApiKey`, `WebhookSecretHash`

Critical: Encryption Key Management

The DB_ENCRYPTION_KEY is required to decrypt merchant API keys and secrets.

If the encryption key is lost:

Encrypted merchant data is permanently unrecoverable
Merchants must re-register and regenerate all API keys
OpenNode keys must be re-entered

Backup your encryption key:

Store in Azure Key Vault with soft-delete enabled
Export to secure offline storage (hardware security module or encrypted backup)
Document key recovery procedure
Test key restoration annually
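
A key restoration test is easier if a fingerprint of the key was recorded when it was backed up, because the restored copy can then be checked without displaying the key itself. The following is a minimal sketch, assuming the key is available in the DB_ENCRYPTION_KEY environment variable and that sha256sum is installed. key_fingerprint and verify_key are hypothetical helper names, not Lightning Enable commands.

```bash
#!/usr/bin/env bash
# Print the SHA-256 fingerprint of a secret without echoing the secret.
key_fingerprint() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Compare a restored key against the fingerprint recorded at backup time.
verify_key() {
  local restored_key="$1" expected_fingerprint="$2"
  if [ "$(key_fingerprint "$restored_key")" = "$expected_fingerprint" ]; then
    echo "Key matches recorded fingerprint"
  else
    echo "Key does NOT match recorded fingerprint" >&2
    return 1
  fi
}

# Usage (not executed here):
#   key_fingerprint "$DB_ENCRYPTION_KEY"     # record the output in the runbook
#   verify_key "$DB_ENCRYPTION_KEY" "<recorded fingerprint>"
```

The fingerprint only confirms that the restored value is the same key. The checklist item on decrypting a merchant record still applies as the end-to-end test.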

Hangfire Job State

Hangfire background jobs (webhook retries, cleanup tasks) are stored in SQL:

Pending jobs: Will execute after recovery
In-progress jobs: May duplicate; ensure idempotent handlers
Failed jobs: Retained for retry based on policy

Post-recovery action: Review Hangfire dashboard (/hangfire) for stale or duplicate jobs.

Disaster Recovery Procedures

Scenario 1: Database Corruption or Accidental Deletion

Symptoms: Application errors, missing data, database inaccessible

Procedure:

Assess the situation

# Check database status
az sql db show --resource-group "rg" --server "server" --name "LightningEnable"

Identify restore point
- Determine when corruption/deletion occurred
- Choose restore point 5-10 minutes before incident
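
The restore timestamp can be computed rather than worked out by hand under pressure. A minimal sketch, assuming GNU date (the -d option behaves differently on BSD and macOS); restore_point_before is a hypothetical helper name and the 10-minute default is taken from the guidance above.

```bash
#!/usr/bin/env bash
# Print a UTC timestamp N minutes before the given ISO 8601 time.
# Requires GNU date.
restore_point_before() {
  local incident_time="$1" minutes_before="${2:-10}"
  local incident_epoch
  # Convert to epoch seconds first so the subtraction is plain arithmetic
  incident_epoch="$(date -u -d "$incident_time" +%s)" || return 1
  date -u -d "@$((incident_epoch - minutes_before * 60))" +%Y-%m-%dT%H:%M:%SZ
}

# Example: incident noticed at 10:15 UTC, restore to 10 minutes earlier
restore_point_before "2026-01-09T10:15:00Z" 10   # prints 2026-01-09T10:05:00Z
```

The printed value is in the format that the --time argument of az sql db restore accepts.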

Perform point-in-time restore

az sql db restore \
  --resource-group "your-rg" \
  --server "your-server" \
  --name "LightningEnable" \
  --dest-name "LightningEnable_Restored" \
  --time "2026-01-09T10:00:00Z"

Update connection string (if using restored database directly)
- Update App Service configuration
- Or rename databases (swap restored with original)
Verify data integrity
- Check merchant count
- Verify recent payments exist
- Test API functionality

Scenario 2: Region-Wide Outage

Symptoms: Azure region unavailable, all services down

Procedure:

Initiate geo-restore

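# List geo-replicated backups for the primary server
az sql db geo-backup list \
  --resource-group "your-rg" \
  --server "your-server"

# Restore a geo-backup to a server in the secondary region
az sql db geo-backup restore \
  --resource-group "your-secondary-rg" \
  --dest-server "your-secondary-server" \
  --dest-database "LightningEnable" \
  --geo-backup-id "<id from geo-backup list>"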

Deploy application to secondary region
- Use infrastructure-as-code templates
- Deploy to pre-provisioned App Service or create new
Update configuration
- Connection string for new database
- Verify all environment variables
Update DNS
- Point api.lightningenable.com to new App Service
- Consider Azure Traffic Manager for automatic failover
Verify SSL/TLS
- Ensure certificates are valid for domain
- App Service managed certificates may need re-creation

Scenario 3: Application Redeployment

Symptoms: Need to redeploy application (code update, configuration fix)

Procedure:

Database unchanged - No action needed for data

Redeploy via GitHub Actions

# Trigger deployment workflow
gh workflow run deploy-production.yml

Or manual deployment

# Build and publish
dotnet publish -c Release -o ./publish

# Package the publish output as a zip archive
(cd publish && zip -r ../publish.zip .)

# Deploy to Azure
az webapp deploy \
  --resource-group "your-rg" \
  --name "your-app-service" \
  --src-path ./publish.zip \
  --type zip

Run database migrations (if any)

dotnet ef database update --connection "your-connection-string"

Verify deployment
- Health check endpoint: GET /health
- Test payment creation: POST /api/payments

DNS and SSL Considerations

DNS Failover

For production, configure DNS for resilience:

Option 1: Azure Traffic Manager

Automatic health monitoring
Failover to secondary endpoint
Geographic routing support

Option 2: Manual DNS Update

Update A/CNAME record in DNS provider
TTL should be low (300 seconds) for faster propagation
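
The TTL of the API record can be checked before an incident rather than during one. A minimal sketch, assuming dig is installed; answer_ttl and ttl_is_low are hypothetical helper names, and the 300-second limit is the value recommended above.

```bash
#!/usr/bin/env bash
# Extract the TTL (second column) from the first record in
# `dig +noall +answer` output supplied on stdin.
answer_ttl() {
  awk 'NF >= 5 { print $2; exit }'
}

# Succeed when the TTL is at or below the limit (default 300 seconds).
ttl_is_low() {
  local ttl="$1" limit="${2:-300}"
  [ "$ttl" -le "$limit" ]
}

# Usage (not executed here):
#   ttl="$(dig +noall +answer api.lightningenable.com | answer_ttl)"
#   if ttl_is_low "$ttl"; then echo "TTL ok (${ttl}s)"; else echo "TTL too high (${ttl}s)"; fi
```

A caching resolver reports the remaining TTL, which counts down between refreshes. To see the configured value, query the authoritative name server for the zone.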

SSL/TLS Certificates

Deployment Type	Certificate Management
Azure App Service (managed)	Automatic renewal
Custom domain	Ensure certificate is backed up or re-issuable
Azure Front Door	Managed certificates available

Post-recovery:

Verify HTTPS works: curl -I https://api.lightningenable.com/health
Check certificate expiration: echo | openssl s_client -connect api.lightningenable.com:443 -servername api.lightningenable.com 2>/dev/null | openssl x509 -noout -enddate

Verification Checklist

After any recovery, complete this verification checklist:

Database Verification

Database is online and accessible
Connection string is correctly configured
Merchant records exist and count is expected
Recent payments are present (check last 24 hours)
Encrypted fields decrypt successfully (test merchant API key validation)

Application Verification

Health endpoint returns 200: GET /health
API authentication works: GET /api/merchants with valid key
Payment creation works: POST /api/payments (use test mode)
Webhook delivery is functional
Hangfire dashboard accessible: /hangfire
No duplicate/stuck Hangfire jobs
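
The HTTP checks in this list can be scripted so they run the same way after every recovery. A minimal sketch, assuming curl is installed; http_status and check_endpoint are hypothetical helper names, and only the unauthenticated /health check from the list above is shown.

```bash
#!/usr/bin/env bash
# Print the HTTP status code for a URL (000 if the host is unreachable).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Compare the status of one endpoint against the expected code.
check_endpoint() {
  local name="$1" url="$2" expected="${3:-200}" actual
  actual="$(http_status "$url")"
  if [ "$actual" = "$expected" ]; then
    echo "PASS $name"
  else
    echo "FAIL $name (expected $expected, got ${actual:-no response})"
    return 1
  fi
}

# Usage (not executed here):
#   check_endpoint "health" "https://api.lightningenable.com/health"
```

Authenticated checks such as GET /api/merchants would need an API key header added to the curl call, which is left out here to keep secrets out of the example.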

External Integration Verification

OpenNode connectivity: payments reach OpenNode
Stripe webhooks: subscription events processed
DNS resolves to correct endpoint
SSL certificate valid and not expiring soon

Post-Recovery Tasks

Notify stakeholders of recovery completion
Document incident and recovery timeline
Review and reconcile any payment discrepancies
Update runbook with lessons learned
Schedule post-incident review

Recommended Backup Configuration

For production Lightning Enable deployments:

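# Set PITR retention to 14 days
az sql db str-policy set \
  --resource-group "your-rg" --server "your-server" --name "LightningEnable" \
  --retention-days 14 --diffbackup-hours 24

# Keep backup storage geo-redundant
az sql db update \
  --resource-group "your-rg" \
  --server "your-server" \
  --name "LightningEnable" \
  --backup-storage-redundancy "Geo"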

# Configure LTR policy
az sql db ltr-policy set \
  --resource-group "your-rg" \
  --server "your-server" \
  --name "LightningEnable" \
  --weekly-retention "P4W" \
  --monthly-retention "P12M" \
  --yearly-retention "P5Y" \
  --week-of-year 1

Next Steps

Environment Variables - Secure your configuration
Webhooks - Ensure webhook resilience
Rate Limiting - Protect your API
