# Database Connection Pool Management

**Last Updated:** 2025-01-10
**Status:** Active
**Owner:** Engineering Team

## Overview
This document describes the connection pool management strategy for FlowPOS services connecting to Cloud SQL PostgreSQL instances. Proper connection pool configuration is critical to prevent connection exhaustion errors that can cause service failures.
## Architecture

### Services and Their Connection Requirements
| Service | Max Instances | Connections per Instance | Total Max Connections |
|---|---|---|---|
| Metabase | 2 | 10-15 (HikariCP default) | ~20-30 |
| Backend (Staging) | 10 | 3 (DB_POOL_MAX) | 30 |
| Backend (Production) | 10 | 3 (DB_POOL_MAX) | 30 |
| Reserved/Admin | - | - | ~10 |
| Total Estimated | - | - | ~80-90 |
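The per-service totals in the table are just instances × connections per instance. A quick sketch of that arithmetic (numbers copied from the table; the helper itself is illustrative):

```javascript
// Estimate a service's connection demand: instances × per-instance pool size.
// Pool sizes are expressed as [min, max] ranges, matching the table above.
function estimateRange(maxInstances, [poolMin, poolMax]) {
  return [maxInstances * poolMin, maxInstances * poolMax];
}

const metabase = estimateRange(2, [10, 15]); // HikariCP default pool
const backend = estimateRange(10, [3, 3]);   // DB_POOL_MAX=3

console.log(metabase); // [ 20, 30 ]
console.log(backend);  // [ 30, 30 ]
```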
### Cloud SQL Instance Configuration
- Staging: `metabase-db-staging` (hosts both the `metabase` and `flowpos_staging` databases)
- Production: `metabase-db-production` (hosts both the `metabase` and `flowpos_production` databases)
**Important:** The Metabase metadata database and the application database share the same Cloud SQL instance, so connections to both count against a single instance-wide `max_connections` limit.
## Connection Pool Configuration

### Metabase
Metabase uses HikariCP for connection pooling. The primary control mechanism is instance scaling, not per-instance pool size.
**Configuration:**

- **Max Instances:** 2 (configured in `deploy-metabase.yml`)
- **Min Instances:** 1
- **Pool Size:** Managed by HikariCP (default ~10-15 connections per instance)
- **Control:** Via the `--max-instances` flag in the Cloud Run deployment
**Notes:**

- Metabase's HikariCP pool size is not easily configurable via environment variables when using the `MB_DB_*` connection variables
- The primary control mechanism is limiting the number of Metabase instances
- Each instance typically maintains 10-15 connections in its pool
**To adjust Metabase connections:**

- Reduce `--max-instances` in `deploy-metabase.yml` (currently 2)
- Consider using `MB_DB_CONNECTION_URI` with pool parameters if the Metabase version supports it
### Backend Services
Backend services use `pg` (node-postgres) connection pooling with configurable limits.
**Configuration:**

- **Max Instances:** 10 (configured in the deployment workflows)
- **Connections per Instance:** 3 (via the `DB_POOL_MAX` environment variable)
- **Default:** 3 (if the `DB_POOL_MAX` secret is not set)
- **Control:** Via the `DB_POOL_MAX` GitHub secret or environment variable
**Environment Variables:**

```bash
DB_POOL_MAX=3                    # Max connections per instance (default: 3)
DB_POOL_MIN=0                    # Min idle connections (default: 0)
DB_POOL_IDLE_TIMEOUT=300000      # Idle timeout in ms (default: 5 minutes)
DB_POOL_CONNECTION_TIMEOUT=10000 # Connection timeout in ms (default: 10 seconds)
DB_POOL_MONITORING=true          # Enable pool monitoring (default: true)
```
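As a sketch of how these variables might map onto a `pg` pool, assuming the standard `pg.Pool` option names (`max`, `min`, `idleTimeoutMillis`, `connectionTimeoutMillis`); the helper itself is hypothetical, not the backend's actual wiring:

```javascript
// Hypothetical helper: build pg.Pool options from the DB_POOL_* environment
// variables above, falling back to the documented defaults.
function poolConfigFromEnv(env = process.env) {
  const num = (name, fallback) =>
    env[name] !== undefined ? Number(env[name]) : fallback;
  return {
    max: num('DB_POOL_MAX', 3),                              // max connections per instance
    min: num('DB_POOL_MIN', 0),                              // min idle connections
    idleTimeoutMillis: num('DB_POOL_IDLE_TIMEOUT', 300000),  // 5 minutes
    connectionTimeoutMillis: num('DB_POOL_CONNECTION_TIMEOUT', 10000),
  };
}

// Usage sketch (with pg): const pool = new Pool({ connectionString, ...poolConfigFromEnv() });
console.log(poolConfigFromEnv({}));
// { max: 3, min: 0, idleTimeoutMillis: 300000, connectionTimeoutMillis: 10000 }
```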
**To adjust backend connections:**

- Set the `DB_POOL_MAX` secret in GitHub Actions (staging/production environments)
- Reduce `--max-instances` in the deployment workflows if needed
- Adjust timeout values via environment variables
**Example:** To reduce backend connections:

```bash
# In GitHub Secrets → staging environment
DB_POOL_MAX=2  # Reduces from 3 to 2 per instance (10 instances × 2 = 20 total)
```
## Connection Exhaustion Prevention

### Warning Signs
- **Log Errors:** `FATAL: remaining connection slots are reserved for non-replication superuser connections`
- **Symptoms:**
  - Database connection failures
  - Metabase queries timing out
  - Backend API errors when accessing the database
  - Statement cancellation errors
- **Monitoring:**

  ```sql
  -- Check active connections
  SELECT count(*) as active_connections, datname
  FROM pg_stat_activity
  GROUP BY datname;

  -- Check connections by user
  SELECT count(*) as connections, usename
  FROM pg_stat_activity
  GROUP BY usename;
  ```
### Immediate Fix

Use the emergency fix script:

```bash
cd scripts
./fix-metabase-connection-pool.sh
```
Or manually reduce instances:

```bash
# Reduce Metabase instances
gcloud run services update flowpos-metabase-staging \
  --region=us-central1 \
  --project=barto-dev \
  --max-instances=2

# Reduce backend instances (if needed)
gcloud run services update flowpos-backend \
  --region=us-central1 \
  --project=barto-dev \
  --max-instances=5
```
### Long-term Solutions

1. **Increase Cloud SQL Max Connections:**

   ```bash
   gcloud sql instances patch metabase-db-staging \
     --database-flags=max_connections=200 \
     --project=barto-dev
   ```

   Note: the Cloud SQL connection limit depends on the instance machine type. Check the current setting first:

   ```bash
   gcloud sql instances describe metabase-db-staging \
     --project=barto-dev \
     --format="yaml(settings.databaseFlags)"
   ```

2. **Optimize Connection Pool Settings:**
   - Review actual connection usage patterns
   - Adjust `DB_POOL_MAX` based on real usage, not the theoretical peak
   - Consider reducing max instances if connection usage is consistently high

3. **Implement Connection Monitoring:**
   - Set up alerts for connection usage > 80%
   - Monitor connection pool metrics
   - Track connection creation/destruction patterns
## Monitoring

### Cloud SQL Connection Usage
The deployment workflows include connection monitoring steps that:

- Check the Cloud SQL `max_connections` setting
- Calculate the estimated connection allocation
- Warn if usage exceeds the 60% or 80% thresholds
- Provide recommendations for optimization
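The threshold check those workflow steps perform can be sketched as follows (thresholds from this section; the function is illustrative, not the workflows' actual code):

```javascript
// Classify estimated connection allocation against max_connections,
// using the 60% / 80% warning thresholds described above.
function connectionUsageStatus(estimated, maxConnections) {
  const pct = (estimated / maxConnections) * 100;
  if (pct > 80) return { pct, level: 'critical' };
  if (pct > 60) return { pct, level: 'warning' };
  return { pct, level: 'ok' };
}

console.log(connectionUsageStatus(64, 25).level);  // critical (the staging case: 256%)
console.log(connectionUsageStatus(64, 100).level); // warning
```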
### Manual Monitoring
Check current connection usage:
```sql
-- Total active connections
SELECT count(*) as total_connections FROM pg_stat_activity;

-- Connections by database
SELECT
  datname,
  count(*) as connections,
  count(*) FILTER (WHERE state = 'active') as active,
  count(*) FILTER (WHERE state = 'idle') as idle
FROM pg_stat_activity
GROUP BY datname
ORDER BY connections DESC;

-- Connections by application
SELECT
  application_name,
  count(*) as connections
FROM pg_stat_activity
GROUP BY application_name
ORDER BY connections DESC;
```
Check Cloud SQL instance settings:

```bash
gcloud sql instances describe metabase-db-staging \
  --project=barto-dev \
  --format="yaml(settings.databaseFlags)"
```
### Alerts

Set up monitoring alerts for:

- Connection usage > 80% of `max_connections`
- Connection exhaustion errors in logs
- Connection pool errors from applications
## Best Practices

1. **Conservative Pool Sizing:**
   - Start with lower connection limits and increase only if needed
   - Account for multiple instances (instances × connections_per_instance)
   - Reserve connections for admin/maintenance (typically 10-15%)

2. **Connection Timeouts:**
   - Set reasonable connection timeouts
   - Use idle timeouts to release unused connections
   - Implement retry logic with exponential backoff

3. **Instance Scaling:**
   - Balance between instance scaling and connection limits
   - Consider reducing instances before increasing connection pools
   - Monitor both instance count and connection usage

4. **Regular Review:**
   - Review connection usage patterns monthly
   - Adjust limits based on actual usage, not theoretical maximums
   - Document changes and rationale
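The retry-with-exponential-backoff practice above can be sketched as a generic helper (not code from the backend):

```javascript
// Retry an async operation with exponential backoff, e.g. a pool.query()
// call that failed because no connection slot was available.
async function withRetry(fn, { attempts = 5, baseDelayMs = 100 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;  // out of retries, surface the error
      const delay = baseDelayMs * 2 ** i; // 100, 200, 400, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage sketch: await withRetry(() => pool.query('SELECT 1'));
```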
## Troubleshooting

### Connection Exhaustion Error

**Error:**

```
FATAL: remaining connection slots are reserved for non-replication superuser connections
```

**Steps:**

1. Check current connection usage (see the Monitoring section)
2. Identify which service is using the most connections
3. Temporarily reduce instances or connection pool size
4. Increase the Cloud SQL `max_connections` if appropriate
5. Review the connection pool configuration
### Metabase Connection Issues

**Symptoms:**

- Metabase queries fail
- "Failed to connect to database" errors

**Steps:**

1. Verify Metabase instances are running:

   ```bash
   gcloud run services describe flowpos-metabase-staging \
     --region=us-central1 --project=barto-dev
   ```

2. Check Metabase logs for connection errors:

   ```bash
   gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=flowpos-metabase-staging" \
     --limit=50 --project=barto-dev
   ```

3. Verify the Cloud SQL instance is accessible
4. Check if the connection pool is exhausted (see Monitoring)
### Backend Connection Issues

**Symptoms:**

- Database query timeouts
- Connection pool errors in logs

**Steps:**

1. Check the `DB_POOL_MAX` setting in the deployment
2. Review backend connection pool metrics
3. Verify the database connection string is correct
4. Check whether total connections exceed the Cloud SQL limits
## Critical Finding: Cloud SQL Instance Sizing
**Important:** The staging Cloud SQL instance (`metabase-db-staging`) is currently `db-f1-micro`, which supports a maximum of ~25 connections. The estimated connection usage, however, is ~64 connections (256% of capacity). This is the root cause of the connection exhaustion errors.
**Solutions:**

1. **Upgrade instance tier (recommended):**

   ```bash
   ./scripts/upgrade-cloudsql-instance.sh
   ```

   Or manually:

   ```bash
   gcloud sql instances patch metabase-db-staging \
     --tier=db-g1-small \
     --project=barto-dev
   ```

2. **Increase max_connections (if the instance supports it):**

   ```bash
   gcloud sql instances patch metabase-db-staging \
     --database-flags=max_connections=100 \
     --project=barto-dev
   ```

3. **Reduce connection usage (temporary mitigation):**
   - Reduce backend `DB_POOL_MAX` to 2 (saves 10 connections)
   - Reduce backend max-instances to 5 (saves 15 connections)
   - Keep Metabase at 2 instances (already optimized)
**Cloud SQL Connection Limits by Tier:**

- `db-f1-micro`: ~25 connections (current staging)
- `db-g1-small`: ~100 connections (recommended minimum)
- `db-n1-standard-1`: ~100-200 connections
- `db-n1-standard-2` and above: 200+ connections
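Those approximate limits can be captured in a small lookup for capacity checks. The values are copied from the list above (ranged entries use the conservative lower bound); treat them as rough defaults, since Google adjusts per-tier limits over time:

```javascript
// Approximate max_connections per Cloud SQL tier, taken from the list above.
const TIER_CONNECTION_LIMITS = {
  'db-f1-micro': 25,       // current staging
  'db-g1-small': 100,      // recommended minimum
  'db-n1-standard-1': 100, // documented range: ~100-200
  'db-n1-standard-2': 200, // documented as 200+
};

// Would the estimated demand fit on a given tier?
function fitsOnTier(tier, estimatedConnections) {
  const limit = TIER_CONNECTION_LIMITS[tier];
  return limit !== undefined && estimatedConnections <= limit;
}

console.log(fitsOnTier('db-f1-micro', 64)); // false (matches the staging finding)
console.log(fitsOnTier('db-g1-small', 64)); // true
```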
## Change Log

- 2025-01-10: Initial documentation created
  - Added connection pool configuration details
  - Documented Metabase and backend pool settings
  - Added monitoring and troubleshooting sections
  - Created emergency fix procedures
  - **CRITICAL:** Identified root cause: the Cloud SQL instance is too small (`db-f1-micro`)
  - Added instance upgrade procedures and connection limits by tier