Deployment Guide: Update and Migration Procedures¶
Quick Reference¶
Use this guide when: - Updating lesson content - Deploying code changes - Performing schema migrations - Rolling back deployments
Pre-Deployment Checklist¶
For Lesson Updates¶
- Update lesson YAML file
- Increment version number appropriately:
- Patch (1.0.0 → 1.0.1): Typos, small fixes
- Minor (1.0.0 → 1.1.0): New content, improved examples
- Major (1.0.0 → 2.0.0): Complete rewrite, breaking changes
- Add changelog entry with date and changes
- Archive old version if major change
- Test lesson rendering locally
- Validate YAML syntax
- Test quiz questions and answers
For Code Updates¶
- All tests passing locally
- Code review completed
- Security scan passed (CodeQL)
- Documentation updated
- Migration scripts ready (if schema change)
- Rollback plan documented
- Monitoring alerts configured
Deployment Procedures¶
Procedure 1: Deploy Lesson Content Update (Patch/Minor)¶
Time Required: 5-10 minutes
Risk Level: Low
Rollback: Automatic via Git
# 1. Update lesson file
vim fiml/bot/content/lessons/01_understanding_stock_prices.yaml
# Update version and changelog
# version: "1.0.1" # or "1.1.0" for minor
# changelog:
# - version: "1.0.1"
# date: "2024-11-24"
# changes:
# - "Fixed typo in explanation"
# 2. Validate lesson
python scripts/validate_lessons.py --lesson 01_understanding_stock_prices.yaml
# 3. Test rendering
python examples/lesson_content_demo.py
# 4. Commit and push
git add fiml/bot/content/lessons/01_understanding_stock_prices.yaml
git commit -m "Update lesson stock_basics_001 to v1.0.1: Fix typo"
git push origin main
# 5. Deploy (blue-green)
./scripts/deploy.sh --environment production --strategy blue-green
# 6. Monitor for 10 minutes
./scripts/monitor_deployment.sh --duration 10m
# Done! Lesson auto-updates for all users.
User Impact: - In-progress lessons: Continue seamlessly - Completed lessons: Can review updated version - Patch updates: No notification - Minor updates: Optional "Updated" badge
Procedure 2: Deploy Major Lesson Update¶
Time Required: 30 minutes
Risk Level: Medium
Rollback: Via version rollback
# 1. Archive current version
mkdir -p fiml/bot/content/lessons/archive/1.0/
cp fiml/bot/content/lessons/01_understanding_stock_prices.yaml \
fiml/bot/content/lessons/archive/1.0/
# 2. Update lesson with major changes
vim fiml/bot/content/lessons/01_understanding_stock_prices.yaml
# Update to major version
# version: "2.0"
# changelog:
# - version: "2.0"
# date: "2024-11-24"
# changes:
# - "Complete rewrite with new examples"
# - "Updated quiz questions"
# - "New FIML data integration"
# 3. Test thoroughly
python scripts/validate_lessons.py --lesson 01_understanding_stock_prices.yaml
python scripts/test_lesson_migration.py --lesson stock_basics_001 \
--from-version 1.0 --to-version 2.0
# 4. Deploy with feature flag
./scripts/deploy.sh --environment production \
--feature-flag lesson_v2_stock_basics_001=10%
# 5. Monitor beta rollout (10% of users)
./scripts/monitor_deployment.sh --duration 1h --alert-threshold 2%
# 6. If healthy, increase rollout
./scripts/feature_flag.sh lesson_v2_stock_basics_001 50% # 50% users
# Monitor 2 hours
./scripts/monitor_deployment.sh --duration 2h
# 7. Full rollout
./scripts/feature_flag.sh lesson_v2_stock_basics_001 100%
# 8. Remove feature flag after 24h
./scripts/feature_flag.sh lesson_v2_stock_basics_001 --remove
User Impact: - In-progress (old version): Prompted to continue or restart - Completed: Can review new version - New users: Get v2.0 automatically
Procedure 3: Deploy Code Update (No Schema Change)¶
Time Required: 20 minutes
Risk Level: Low-Medium
Rollback: Automatic
# 1. Merge PR to main
# (After code review, tests, security scan)
# 2. Tag release
git tag v1.5.0
git push origin v1.5.0
# 3. Build Docker image
docker build -t fiml-bot:v1.5.0 .
docker push fiml-bot:v1.5.0
# 4. Deploy to green environment
./scripts/deploy.sh --environment green --version v1.5.0
# 5. Run smoke tests on green
./scripts/smoke_test.sh green
# 6. Canary deployment (10% traffic)
./scripts/traffic_switch.sh green 10%
# 7. Monitor for 30 minutes
./scripts/monitor_deployment.sh --duration 30m --rollback-on-error 5%
# 8. If healthy, gradual rollout
./scripts/traffic_switch.sh green 50% # Monitor 1h
./scripts/traffic_switch.sh green 100% # Full cutover
# 9. Mark blue as old
./scripts/mark_environment.sh blue old
Automatic Rollback Triggers: - Error rate > 5% - Response time > 2s p95 - User complaints > 10 in 5min - Health check failures
Procedure 4: Deploy Schema Migration¶
Time Required: 1-2 hours
Risk Level: High
Rollback: Manual (with snapshot restore)
# 1. Create database snapshot
./scripts/db_snapshot.sh production --name pre_migration_v1_1
# 2. Test migration on copy
./scripts/db_copy.sh production test_migration
./scripts/db_migrate.sh test_migration --version 1.1 --dry-run
./scripts/db_migrate.sh test_migration --version 1.1 --execute
# 3. Validate test migration
./scripts/validate_migrated_data.sh test_migration
# 4. If test successful, schedule production migration
# (During low-traffic window: 2-4 AM UTC)
# 5. Put app in maintenance mode
./scripts/maintenance_mode.sh enable \
--message "Upgrading for new features. Back in 10 minutes!"
# 6. Run migration
./scripts/db_migrate.sh production --version 1.1 --execute
# 7. Validate migration
./scripts/validate_migrated_data.sh production
# 8. Deploy new code (compatible with both schemas)
./scripts/deploy.sh --environment production --version v1.6.0
# 9. Smoke test
./scripts/smoke_test.sh production
# 10. Disable maintenance mode
./scripts/maintenance_mode.sh disable
# 11. Monitor closely for 4 hours
./scripts/monitor_deployment.sh --duration 4h --page-on-error
Rollback (if issues):
# 1. Enable maintenance mode
./scripts/maintenance_mode.sh enable
# 2. Restore database snapshot
./scripts/db_restore.sh production pre_migration_v1_1
# 3. Deploy old code version
./scripts/deploy.sh --environment production --version v1.5.0 --force
# 4. Validate data integrity
./scripts/validate_user_progress.sh
# 5. Disable maintenance mode
./scripts/maintenance_mode.sh disable
# 6. Post-mortem
./scripts/generate_incident_report.sh
Monitoring Commands¶
Real-Time Metrics¶
# Watch error rate
./scripts/metrics.sh error_rate --live
# Watch lesson completion rate
./scripts/metrics.sh lesson_completions --live
# Watch XP calculations
./scripts/metrics.sh xp_awards --live --validate
# Watch user feedback
./scripts/metrics.sh user_feedback --live --alert-negative
Health Checks¶
# Check all services
./scripts/health_check.sh all
# Check specific component
./scripts/health_check.sh lesson_engine
./scripts/health_check.sh quiz_system
./scripts/health_check.sh gamification
# Check database
./scripts/health_check.sh database --include-migrations
Common Scenarios¶
Scenario: Fix Typo in Lesson¶
# 1. Edit lesson (increment patch version)
# 2. Validate and test
# 3. Deploy
./scripts/quick_deploy_lesson.sh 01_understanding_stock_prices.yaml
# Auto-updates for all users, no notification
Scenario: Add New Lesson Section¶
# 1. Edit lesson (increment minor version)
# 2. Add changelog entry
# 3. Validate and test
# 4. Deploy with notification
./scripts/deploy_lesson_update.sh 01_understanding_stock_prices.yaml --notify
# Users see "Updated" badge, can review new content
Scenario: Complete Lesson Rewrite¶
# 1. Archive old version
# 2. Create new version (major increment)
# 3. Test thoroughly
# 4. Beta rollout
./scripts/deploy_lesson_update.sh 01_understanding_stock_prices.yaml \
--major --beta-percentage 10
# In-progress users: Choice to continue or restart
# Completed users: Can review new version
# New users: Get new version
Scenario: Emergency Bug Fix¶
# 1. Fix bug
# 2. Fast-track review
# 3. Emergency deploy
./scripts/emergency_deploy.sh --version v1.5.1 \
--reason "Fix XP calculation bug" \
--skip-canary
# Deploys immediately to all users
# Monitoring alerts active for 2 hours
Scenario: Rollback Deployment¶
# If automatic rollback didn't trigger:
./scripts/manual_rollback.sh --to-version v1.5.0 \
--reason "High error rate on v1.5.1"
# Switches all traffic back to old version
# Preserves user data
# Generates incident report
Best Practices¶
Version Numbering¶
- ✅ Use semantic versioning
- ✅ Update changelog with every version
- ✅ Archive major versions
- ✅ Test version migrations
Deployments¶
- ✅ Always deploy to staging first
- ✅ Use blue-green for zero downtime
- ✅ Start with 10% canary
- ✅ Monitor for 30min before increasing
- ✅ Have rollback plan ready
User Data¶
- ✅ Never delete user progress
- ✅ Snapshot before schema changes
- ✅ Validate migrations thoroughly
- ✅ Preserve XP even on rollback
Communication¶
- ✅ Notify users of major changes
- ✅ Be transparent about improvements
- ✅ Offer choices for breaking changes
- ✅ Update in-app help
Emergency Contacts¶
On-Call Engineer: (Use PagerDuty)
Database Admin: (For migration issues)
Product Owner: (For user communication)
Incident Response: 1. Assess severity 2. Trigger rollback if needed 3. Page on-call if critical 4. Communicate to users 5. Post-mortem within 24h
Automation Scripts Location¶
All deployment scripts:
scripts/
├── deploy.sh # Main deployment
├── rollback.sh # Rollback deployment
├── traffic_switch.sh # Traffic management
├── db_migrate.sh # Database migration
├── db_snapshot.sh # Backup creation
├── health_check.sh # Health monitoring
├── validate_lessons.sh # Lesson validation
├── monitor_deployment.sh # Deployment monitoring
└── emergency_deploy.sh # Emergency procedures
Note: Create these scripts based on your infrastructure (K8s, Docker, etc.)
Summary¶
✅ Patch updates: Auto-deploy, no user impact
✅ Minor updates: Auto-deploy with notification
✅ Major updates: Beta rollout, user choice
✅ Code updates: Blue-green deployment
✅ Schema changes: Maintenance window, snapshots
✅ Rollbacks: Automatic on errors, manual available
✅ User data: Always preserved, never lost
Key Principle: User progress and XP are sacred - never lose data, always preserve achievements.