Backup & Disaster Recovery
This guide covers backup strategies, recovery procedures, and disaster recovery planning for Lightning Enable deployments.
Azure SQL Automatic Backups
Lightning Enable uses Azure SQL Database, which provides comprehensive automatic backup capabilities.
Point-in-Time Restore (PITR)
Azure SQL automatically creates backups that enable point-in-time restore:
| Backup Type | Frequency | Retention |
|---|---|---|
| Full backups | Weekly | 7-35 days (configurable) |
| Differential backups | Every 12-24 hours | 7-35 days |
| Transaction log backups | Every 5-10 minutes | 7-35 days |
Default retention is 7 days for all tiers. The Basic tier supports a maximum of 7 days; Standard, Premium, and vCore-based tiers can be configured for up to 35 days.
Increase PITR retention in the Azure Portal under your SQL server > Backups > Retention policies, or via the CLI as shown below.
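The same retention setting can also be read and changed from the CLI through the az sql db str-policy command group in recent Azure CLI versions. A minimal sketch checking the current value (confirm the exact flag names with az sql db str-policy show --help):
# Show the current PITR (short-term) retention policy
az sql db str-policy show \
  --resource-group "your-resource-group" \
  --server "your-sql-server" \
  --name "LightningEnable"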
Long-Term Retention (LTR)
For compliance and archival needs, configure long-term retention:
| Policy | Description | Max Retention |
|---|---|---|
| Weekly (W) | Keep one backup per week | 10 years |
| Monthly (M) | Keep one backup per month | 10 years |
| Yearly (Y) | Keep one backup per year | 10 years |
Example LTR policy:
- Weekly backups retained for 4 weeks
- Monthly backups retained for 12 months
- Yearly backups retained for 5 years
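To confirm which LTR policy is currently applied (for example, after running the policy command under Recommended Backup Configuration below), the az sql db ltr-policy show command can be used. A minimal sketch; retention values are returned as ISO 8601 durations (P4W, P12M, P5Y for the example above):
# Show the LTR policy currently applied to the database
az sql db ltr-policy show \
  --resource-group "your-resource-group" \
  --server "your-sql-server" \
  --database "LightningEnable"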
Geo-Redundant Backup Storage
Azure SQL backups are stored in geo-redundant storage (GRS) by default:
- Primary region: Read-access geo-redundant storage (RA-GRS)
- Secondary region: Asynchronously replicated copies
- RPO for geo-restore: < 1 hour (backup copies replicate asynchronously)
Storage redundancy options:
| Option | Description | Use Case |
|---|---|---|
| LRS | Locally redundant (single datacenter) | Cost-sensitive, non-critical |
| ZRS | Zone-redundant (within region) | High availability within region |
| GRS | Geo-redundant (cross-region) | Disaster recovery (recommended) |
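To verify which redundancy option a database currently uses, you can query the database resource. A minimal sketch, assuming the currentBackupStorageRedundancy property name (inspect the full JSON output if the query returns nothing):
# Check the backup storage redundancy currently in effect
az sql db show \
  --resource-group "your-resource-group" \
  --server "your-sql-server" \
  --name "LightningEnable" \
  --query "currentBackupStorageRedundancy"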
How to Restore from Backup
Via Azure Portal
Point-in-Time Restore:
- Navigate to Azure Portal > SQL databases > your database
- Click Restore in the toolbar
- Select Point-in-time restore type
- Choose the restore point (date/time)
- Enter a new database name (e.g., LightningEnable_Restored_20260109)
- Select target server (same or different)
- Click Review + create > Create
- Wait for restore to complete (5-30 minutes depending on size)
Restore from LTR Backup:
- Navigate to SQL server > Backups > Long-term retention
- Select the database
- Choose the LTR backup to restore
- Click Restore
- Configure target database name and server
- Click Create
Via Azure CLI
Point-in-Time Restore:
# Restore to a specific point in time
az sql db restore \
--resource-group "your-resource-group" \
--server "your-sql-server" \
--name "LightningEnable" \
--dest-name "LightningEnable_Restored" \
--time "2026-01-09T10:30:00Z"
Restore from Deleted Database:
# List deleted databases
az sql db list-deleted \
--resource-group "your-resource-group" \
--server "your-sql-server"
# Restore deleted database
az sql db restore \
--resource-group "your-resource-group" \
--server "your-sql-server" \
--name "LightningEnable" \
--dest-name "LightningEnable_Restored" \
--deleted-time "2026-01-08T15:00:00Z"
Restore from LTR Backup:
# List available LTR backups
az sql db ltr-backup list \
--resource-group "your-resource-group" \
--server "your-sql-server" \
--database "LightningEnable"
# Restore from specific LTR backup
az sql db ltr-restore \
--resource-group "your-resource-group" \
--server "your-sql-server" \
--dest-database "LightningEnable_Restored" \
--backup-id "/subscriptions/.../backups/..."
Via PowerShell
# Point-in-time restore (Restore-AzSqlDatabase needs the source database resource ID)
$db = Get-AzSqlDatabase `
  -ResourceGroupName "your-resource-group" `
  -ServerName "your-sql-server" `
  -DatabaseName "LightningEnable"
Restore-AzSqlDatabase `
  -FromPointInTimeBackup `
  -PointInTime "2026-01-09T10:30:00Z" `
  -ResourceGroupName "your-resource-group" `
  -ServerName "your-sql-server" `
  -TargetDatabaseName "LightningEnable_Restored" `
  -ResourceId $db.ResourceId
Recovery Time Objectives
Understanding your recovery targets helps plan appropriate backup strategies.
RTO (Recovery Time Objective)
Definition: Maximum acceptable time to restore service after an incident.
| Scenario | Estimated RTO | Factors |
|---|---|---|
| Point-in-time restore (same region) | 5-30 minutes | Database size, transaction log volume |
| Geo-restore (different region) | < 12 hours | Geo-replication lag, database size |
| LTR restore | 30-60 minutes | Backup size, network throughput |
| Full redeployment | 1-4 hours | App Service deployment, DNS propagation |
To minimize RTO:
- Use Premium/Business Critical tier for faster restores
- Pre-configure deployment scripts and runbooks
- Maintain infrastructure-as-code (Bicep/Terraform)
- Test recovery procedures quarterly
RPO (Recovery Point Objective)
Definition: Maximum acceptable data loss measured in time.
| Backup Type | RPO | Notes |
|---|---|---|
| Transaction log backups | 5-10 minutes | Near real-time recovery |
| Differential backups | 12-24 hours | Between differential backups |
| Active geo-replication (if configured) | < 5 seconds | Async replication to a secondary replica |
| LTR backups | Days to weeks | Depends on LTR policy |
Lightning Enable effective RPO: ~10 minutes (transaction log backup frequency)
Lightning payments settle instantly over the Lightning Network. Even if database recovery loses recent records, the underlying payments remain settled; webhook delivery logs help reconcile any gaps.
Application Data Considerations
What Data Is Backed Up
Azure SQL backups include all Lightning Enable data:
| Data Type | Table(s) | Criticality |
|---|---|---|
| Merchants | Merchants | Critical - merchant configurations |
| Payments | Payments | Critical - payment records |
| Webhooks | WebhookSubscriptions, WebhookDeliveryLogs | High - delivery tracking |
| Refunds | Refunds | Critical - refund records |
| L402 Tokens | L402Tokens | Medium - can be regenerated |
| Hangfire Jobs | Hangfire.* | Low - jobs will re-queue |
Encrypted Fields
Lightning Enable encrypts sensitive data at rest using AES-256-GCM:
| Table | Encrypted Fields |
|---|---|
| Merchants | ApiKeyHash, OpenNodeApiKey, WebhookSecretHash |
The DB_ENCRYPTION_KEY is required to decrypt merchant API keys and secrets.
If the encryption key is lost:
- Encrypted merchant data is permanently unrecoverable
- Merchants must re-register and regenerate all API keys
- OpenNode keys must be re-entered
Backup your encryption key:
- Store in Azure Key Vault with soft-delete enabled (see the sketch after this list)
- Export to secure offline storage (hardware security module or encrypted backup)
- Document key recovery procedure
- Test key restoration annually
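For the Key Vault option above, a minimal sketch using the standard az keyvault commands; the vault name and secret name are placeholders of our choosing, and purge protection is enabled so the secret cannot be permanently deleted during the retention window (soft-delete itself is on by default for new vaults):
# Create a vault with purge protection (soft-delete is enabled by default)
az keyvault create \
  --resource-group "your-rg" \
  --name "lightningenable-kv" \
  --location "eastus" \
  --enable-purge-protection true
# Store the encryption key as a secret
az keyvault secret set \
  --vault-name "lightningenable-kv" \
  --name "DbEncryptionKey" \
  --value "<your-DB_ENCRYPTION_KEY-value>"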
Hangfire Job State
Hangfire background jobs (webhook retries, cleanup tasks) are stored in SQL:
- Pending jobs: Will execute after recovery
- In-progress jobs: May duplicate; ensure idempotent handlers
- Failed jobs: Retained for retry based on policy
Post-recovery action: Review Hangfire dashboard (/hangfire) for stale or duplicate jobs.
Disaster Recovery Procedures
Scenario 1: Database Corruption or Accidental Deletion
Symptoms: Application errors, missing data, database inaccessible
Procedure:
- Assess the situation
  # Check database status
  az sql db show --resource-group "your-rg" --server "your-server" --name "LightningEnable"
- Identify restore point
  - Determine when the corruption/deletion occurred
  - Choose a restore point 5-10 minutes before the incident
- Perform point-in-time restore
  az sql db restore \
    --resource-group "your-rg" \
    --server "your-server" \
    --name "LightningEnable" \
    --dest-name "LightningEnable_Restored" \
    --time "2026-01-09T10:00:00Z"
- Update connection string (if using the restored database directly)
  - Update the App Service configuration
  - Or rename the databases to swap the restored copy with the original (see the sketch after this procedure)
- Verify data integrity
  - Check merchant count
  - Verify recent payments exist
  - Test API functionality
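For the rename/swap approach above, a minimal sketch assuming az sql db rename is available and that the application is stopped (or connections drained) before the swap; the _Corrupt suffix is just a placeholder name:
# Move the damaged database out of the way, then promote the restored copy
az sql db rename \
  --resource-group "your-rg" \
  --server "your-server" \
  --name "LightningEnable" \
  --new-name "LightningEnable_Corrupt"
az sql db rename \
  --resource-group "your-rg" \
  --server "your-server" \
  --name "LightningEnable_Restored" \
  --new-name "LightningEnable"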
Scenario 2: Region-Wide Outage
Symptoms: Azure region unavailable, all services down
Procedure:
- Initiate geo-restore
  # Locate the geo-replicated backup of the source database
  # (geo-restore flags vary by CLI version; confirm with: az sql db geo-backup restore --help)
  az sql db geo-backup show \
    --resource-group "your-resource-group" \
    --server "your-sql-server" \
    --database "LightningEnable"
  # Restore it to a server in the secondary region
  az sql db geo-backup restore \
    --resource-group "your-secondary-rg" \
    --dest-server "your-secondary-server" \
    --dest-database "LightningEnable" \
    --geo-backup-id "/subscriptions/.../recoverableDatabases/..."
- Deploy application to secondary region
  - Use infrastructure-as-code templates
  - Deploy to a pre-provisioned App Service or create a new one
- Update configuration
  - Connection string for the new database (see the sketch after this procedure)
  - Verify all environment variables
- Update DNS
  - Point api.lightningenable.com to the new App Service
  - Consider Azure Traffic Manager for automatic failover
- Verify SSL/TLS
  - Ensure certificates are valid for the domain
  - App Service managed certificates may need re-creation
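For the configuration update step above, the connection string can be pushed to the secondary App Service with az webapp config appsettings set. A minimal sketch; ConnectionStrings__DefaultConnection is an assumed setting name, so substitute whatever key your deployment actually reads the connection string from:
# Point the secondary-region App Service at the geo-restored database
az webapp config appsettings set \
  --resource-group "your-secondary-rg" \
  --name "your-secondary-app-service" \
  --settings "ConnectionStrings__DefaultConnection=Server=tcp:your-secondary-server.database.windows.net,1433;Database=LightningEnable;..."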
Scenario 3: Application Redeployment
Symptoms: Need to redeploy application (code update, configuration fix)
Procedure:
- Database unchanged - no action needed for data
- Redeploy via GitHub Actions
  # Trigger deployment workflow
  gh workflow run deploy-production.yml
- Or manual deployment
  # Build and publish
  dotnet publish -c Release -o ./publish
  # Package the output and deploy to Azure
  cd publish && zip -r ../publish.zip . && cd ..
  az webapp deploy \
    --resource-group "your-rg" \
    --name "your-app-service" \
    --src-path ./publish.zip \
    --type zip
- Run database migrations (if any)
  dotnet ef database update --connection "your-connection-string"
- Verify deployment
  - Health check endpoint: GET /health
  - Test payment creation: POST /api/payments
DNS and SSL Considerations
DNS Failover
For production, configure DNS for resilience:
Option 1: Azure Traffic Manager
- Automatic health monitoring
- Failover to secondary endpoint
- Geographic routing support
Option 2: Manual DNS Update
- Update A/CNAME record in DNS provider
- TTL should be low (300 seconds) for faster propagation
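If the zone happens to be hosted in Azure DNS (an assumption; any DNS provider works for Option 2), the record update can be scripted. A minimal sketch updating a CNAME named api in a hypothetical lightningenable.com zone:
# Repoint api.lightningenable.com at the secondary App Service
az network dns record-set cname set-record \
  --resource-group "your-dns-rg" \
  --zone-name "lightningenable.com" \
  --record-set-name "api" \
  --cname "your-secondary-app-service.azurewebsites.net"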
SSL/TLS Certificates
| Deployment Type | Certificate Management |
|---|---|
| Azure App Service (managed) | Automatic renewal |
| Custom domain | Ensure certificate is backed up or re-issuable |
| Azure Front Door | Managed certificates available |
Post-recovery:
- Verify HTTPS works: curl -I https://api.lightningenable.com/health
- Check certificate expiration: openssl s_client -connect api.lightningenable.com:443 -servername api.lightningenable.com </dev/null 2>/dev/null | openssl x509 -noout -enddate
Verification Checklist
After any recovery, complete this verification checklist:
Database Verification
- Database is online and accessible
- Connection string is correctly configured
- Merchant records exist and count is expected
- Recent payments are present (check last 24 hours)
- Encrypted fields decrypt successfully (test merchant API key validation)
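The record checks above can be scripted with sqlcmd against the restored database. A minimal sketch; the CreatedAt column is an assumption, so substitute whatever timestamp column the Payments table actually uses:
# Merchant count (compare against the expected figure)
sqlcmd -S your-server.database.windows.net -d LightningEnable -U your-admin -P 'your-password' \
  -Q "SELECT COUNT(*) FROM Merchants"
# Payments recorded in the last 24 hours (CreatedAt is an assumed column name)
sqlcmd -S your-server.database.windows.net -d LightningEnable -U your-admin -P 'your-password' \
  -Q "SELECT COUNT(*) FROM Payments WHERE CreatedAt >= DATEADD(HOUR, -24, GETUTCDATE())"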
Application Verification
- Health endpoint returns 200: GET /health
- API authentication works: GET /api/merchants with valid key
- Payment creation works: POST /api/payments (use test mode)
- Webhook delivery is functional
- Hangfire dashboard accessible: /hangfire
- No duplicate/stuck Hangfire jobs
External Integration Verification
- OpenNode connectivity: payments reach OpenNode
- Stripe webhooks: subscription events processed
- DNS resolves to correct endpoint
- SSL certificate valid and not expiring soon
Post-Recovery Tasks
- Notify stakeholders of recovery completion
- Document incident and recovery timeline
- Review and reconcile any payment discrepancies
- Update runbook with lessons learned
- Schedule post-incident review
Recommended Backup Configuration
For production Lightning Enable deployments:
# Store backups in geo-redundant storage (enables geo-restore)
az sql db update \
--resource-group "your-rg" \
--server "your-server" \
--name "LightningEnable" \
--backup-storage-redundancy "Geo"
# Configure LTR policy
az sql db ltr-policy set \
--resource-group "your-rg" \
--server "your-server" \
--database "LightningEnable" \
--weekly-retention "P4W" \
--monthly-retention "P12M" \
--yearly-retention "P5Y" \
--week-of-year 1
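The PITR window itself is configured separately from backup storage redundancy. A minimal sketch setting 14-day retention through the az sql db str-policy command group mentioned in the PITR section above (assumes a recent CLI version; confirm the exact flags with az sql db str-policy set --help):
# Set PITR (short-term) retention to 14 days
az sql db str-policy set \
  --resource-group "your-rg" \
  --server "your-server" \
  --name "LightningEnable" \
  --retention-days 14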
Next Steps
- Environment Variables - Secure your configuration
- Webhooks - Ensure webhook resilience
- Rate Limiting - Protect your API