FEL Network Troubleshooting Guide
Problem Summary
The FEL service is experiencing ETIMEDOUT errors when trying to connect to external FEL provider endpoints. This indicates network-level connectivity issues between Google Cloud Run and the FEL provider's servers.
Error Details
- Error Code: ETIMEDOUT
- HTTP Status: 502 Bad Gateway
- Message: "Network timeout - Could not reach FEL provider"
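- Observed Behavior: Request fails in ~550ms, far sooner than the configured 30s timeout, which suggests the failure occurs while establishing the connection rather than from the application-level timeout expiring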
Solution 1: Code Changes (Implemented ✅)
Updated HTTP/HTTPS Agent Configuration
fel.modules.ts now configures explicit socket timeouts and connection management on the HTTP and HTTPS agents:
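import * as http from "http";
import * as https from "https";
// Assumes @nestjs/axios; older NestJS versions export HttpModule from @nestjs/common.
import { HttpModule } from "@nestjs/axios";

HttpModule.register({
  timeout: 30000, // request timeout in ms
  maxRedirects: 5,
  httpAgent: new http.Agent({
    keepAlive: true, // reuse sockets across requests
    keepAliveMsecs: 30000, // initial delay for TCP keep-alive probes, in ms
    timeout: 30000, // socket timeout in ms
    scheduling: "lifo", // prefer the most recently used socket
  }),
  httpsAgent: new https.Agent({
    keepAlive: true,
    keepAliveMsecs: 30000,
    timeout: 30000,
    scheduling: "lifo",
    rejectUnauthorized: true, // verify the provider's certificate chain
  }),
})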
Benefits:
- Explicit socket-level timeout configuration
- Connection keepAlive for better performance
- LIFO scheduling, which reuses the most recently used socket and lowers the risk of picking one the server has already closed due to inactivity
Solution 2: Network Configuration Checks
Step 1: Verify FEL Provider Endpoint
First, verify that the FEL provider's endpoint is accessible:
# Check if the FEL provider endpoint is reachable
# (the URL is quoted so the shell does not interpret & and |)
curl -v "https://<FEL_PROVIDER_ENDPOINT>/sharedInfo?NIT=000017195594&DATA1=SHARED_GETINFONITcom&DATA2=NIT|17195594&USERNAME=<USERNAME>" \
-H "Authorization: Bearer <TOKEN>"
Step 2: Test from Cloud Run Container
Deploy a debug container to test connectivity from within Cloud Run:
# Deploy a debug container
gcloud run deploy debug-container \
--image=gcr.io/google.com/cloudsdktool/cloud-sdk:alpine \
--region=us-central1 \
--project=barto-dev \
--command=/bin/sh \
--args=-c,"sleep 3600"
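# Forward a local port to the service.
# Note: this proxies HTTP requests only. It does not open a shell, and Cloud Run has no exec/SSH access to running instances.
gcloud run services proxy debug-container --region=us-central1 --project=barto-dev
Then test connectivity. Because there is no interactive shell, run the commands below as the container's startup command (for example via --args) and read the output from the service logs. A Cloud Run service must also listen on $PORT to become ready, so a container that only sleeps may fail to deploy; a Cloud Run job is one alternative for one-off commands.
# Commands to run inside the container
apk add curl
curl -v https://<FEL_PROVIDER_ENDPOINT>/sharedInfo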
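The same connectivity test can be run from application code, which avoids the need for a shell. The sketch below times only the TCP connect phase using Node's net module, so a failure at ~550ms can be told apart from a 30s timer expiring. The names probeTcp, describeProbe, and ProbeResult are hypothetical, as is the PROBE_TIMEOUT label; the host is whatever stands in for <FEL_PROVIDER_DOMAIN>.

```typescript
import * as net from "node:net";

// Result of a single TCP connect attempt (hypothetical shape for this sketch).
interface ProbeResult {
  ok: boolean;
  elapsedMs: number;
  errorCode?: string; // e.g. "ETIMEDOUT", "ECONNREFUSED", "ENOTFOUND"
}

// Open a TCP connection and report how long the connect phase took.
function probeTcp(host: string, port: number, timeoutMs: number): Promise<ProbeResult> {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = net.connect({ host, port });
    let settled = false;
    const finish = (ok: boolean, errorCode?: string): void => {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve({ ok, elapsedMs: Date.now() - started, errorCode });
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    // Our own timer fired: label it separately from OS-level error codes.
    socket.once("timeout", () => finish(false, "PROBE_TIMEOUT"));
    socket.once("error", (err: NodeJS.ErrnoException) => finish(false, err.code ?? "UNKNOWN"));
  });
}

// Turn a probe result into a one-line hint for the logs.
function describeProbe(r: ProbeResult, configuredTimeoutMs: number): string {
  if (r.ok) return `connected in ${r.elapsedMs}ms`;
  const early = r.elapsedMs < configuredTimeoutMs;
  return (
    `failed with ${r.errorCode} after ${r.elapsedMs}ms` +
    (early ? " (before the configured timeout, so the error came from the OS or network)" : "")
  );
}
```

A call such as probeTcp("<FEL_PROVIDER_DOMAIN>", 443, 30000).then((r) => console.log(describeProbe(r, 30000))) could be placed behind a temporary diagnostic endpoint and removed once the issue is resolved.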
Step 3: Check Cloud Run Network Settings
3.1 Check Egress Settings
Verify that Cloud Run has proper egress configuration:
# Check current Cloud Run service configuration
# (VPC connector and egress settings appear as annotations on the revision template)
gcloud run services describe flowpos-backend \
--region=us-central1 \
--project=barto-dev \
--format=json | jq '.spec.template.metadata.annotations'
3.2 Configure VPC Connector (if needed)
If the FEL provider requires VPC connectivity:
# Create a VPC connector
gcloud compute networks vpc-access connectors create fel-connector \
--region=us-central1 \
--network=default \
--range=10.8.0.0/28 \
--project=barto-dev
# Update Cloud Run service to use the VPC connector
gcloud run services update flowpos-backend \
--vpc-connector=fel-connector \
--vpc-egress=all-traffic \
--region=us-central1 \
--project=barto-dev
3.3 Check Firewall Rules
Ensure there are no firewall rules blocking outbound traffic:
# List firewall rules
gcloud compute firewall-rules list --project=barto-dev
# If needed, create a rule to allow outbound traffic
gcloud compute firewall-rules create allow-fel-outbound \
--direction=EGRESS \
--priority=1000 \
--network=default \
--action=ALLOW \
--rules=tcp:443,tcp:80 \
--destination-ranges=0.0.0.0/0 \
--project=barto-dev
Step 4: Configure Cloud NAT (Recommended)
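Cloud NAT gives outbound traffic a predictable source IP address that the FEL provider can whitelist. It only applies to traffic routed through the VPC, so the service must use the VPC connector from Step 3.2 with --vpc-egress=all-traffic. Auto-allocated addresses can change over time; reserve a static address if the provider needs a fixed IP.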
# Create a Cloud Router
gcloud compute routers create fel-router \
--network=default \
--region=us-central1 \
--project=barto-dev
# Create a Cloud NAT configuration
gcloud compute routers nats create fel-nat \
--router=fel-router \
--region=us-central1 \
--auto-allocate-nat-external-ips \
--nat-all-subnet-ip-ranges \
--enable-logging \
--project=barto-dev
# Get the allocated NAT IP addresses
# (auto-allocated addresses are reported in the router status, not in the NAT configuration)
gcloud compute routers get-status fel-router \
--region=us-central1 \
--project=barto-dev \
--format="json(result.natStatus)"
Provide these IP addresses to the FEL provider for whitelisting.
Step 5: DNS Resolution Check
Verify DNS resolution works correctly:
# Check DNS resolution for FEL provider
nslookup <FEL_PROVIDER_DOMAIN>
dig <FEL_PROVIDER_DOMAIN>
# From Cloud Run: there is no shell access to a running instance, and
# `gcloud run services proxy` only forwards HTTP requests to the service.
# Run the lookup from application code, or as a command in the debug container from Step 2:
nslookup <FEL_PROVIDER_DOMAIN>
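Since a shell inside Cloud Run is not available, the lookup can also be done from the service's own runtime. The sketch below uses dns.lookup from Node's dns module, which goes through the operating system resolver just as outgoing HTTP requests do by default. The names checkDns, formatDnsReport, and DnsReport are hypothetical.

```typescript
import { promises as dns } from "node:dns";

// Outcome of one lookup (hypothetical shape for this sketch).
interface DnsReport {
  host: string;
  addresses: string[];
  elapsedMs: number;
  errorCode?: string; // e.g. "ENOTFOUND", "EAI_AGAIN"
}

// Resolve a hostname through the OS resolver and time the lookup.
async function checkDns(host: string): Promise<DnsReport> {
  const started = Date.now();
  try {
    const records = await dns.lookup(host, { all: true });
    return { host, addresses: records.map((r) => r.address), elapsedMs: Date.now() - started };
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code ?? "UNKNOWN";
    return { host, addresses: [], elapsedMs: Date.now() - started, errorCode: code };
  }
}

// One-line summary suitable for a log statement.
function formatDnsReport(r: DnsReport): string {
  return r.errorCode
    ? `${r.host}: lookup failed (${r.errorCode}) after ${r.elapsedMs}ms`
    : `${r.host}: ${r.addresses.join(", ")} in ${r.elapsedMs}ms`;
}
```

Logging formatDnsReport(await checkDns("<FEL_PROVIDER_DOMAIN>")) at startup would show whether the name resolves from inside Cloud Run and how long resolution takes.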
Step 6: Check for SSL/TLS Issues
If the FEL provider uses custom certificates:
# Test SSL certificate
openssl s_client -connect <FEL_PROVIDER_ENDPOINT>:443 -servername <FEL_PROVIDER_DOMAIN>
# Check certificate validity
curl -v https://<FEL_PROVIDER_ENDPOINT>
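The certificate can also be inspected from Node, which is useful when openssl is not available in the runtime image. This is a minimal sketch under stated assumptions: inspectCertificate, daysUntilExpiry, and CertSummary are hypothetical names, and rejectUnauthorized is set to false only so that an invalid certificate can still be examined. It should not be copied into the production agent configuration.

```typescript
import * as tls from "node:tls";

// Days remaining until a certificate's expiry date (negative if already expired).
function daysUntilExpiry(validTo: string, now: Date = new Date()): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((new Date(validTo).getTime() - now.getTime()) / msPerDay);
}

// Fields of interest from the peer certificate (hypothetical shape for this sketch).
interface CertSummary {
  subject: string;
  issuer: string;
  validTo: string;
  daysLeft: number;
  authorized: boolean; // true if the chain validated against the trust store
}

// Complete a TLS handshake and summarize the certificate the server presented.
function inspectCertificate(host: string, port = 443, timeoutMs = 10000): Promise<CertSummary> {
  return new Promise((resolve, reject) => {
    // Diagnostic only: validation failures are reported via `authorized`
    // instead of aborting the handshake.
    const socket = tls.connect({ host, port, servername: host, rejectUnauthorized: false }, () => {
      const cert = socket.getPeerCertificate();
      const summary: CertSummary = {
        subject: cert.subject?.CN ?? "",
        issuer: cert.issuer?.CN ?? "",
        validTo: cert.valid_to,
        daysLeft: daysUntilExpiry(cert.valid_to),
        authorized: socket.authorized,
      };
      socket.end();
      resolve(summary);
    });
    socket.setTimeout(timeoutMs, () => socket.destroy(new Error("TLS handshake timed out")));
    socket.once("error", reject);
  });
}
```

If authorized comes back false while the certificate is within its validity period, the likely cause is an untrusted or incomplete chain rather than an expired certificate.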
Step 7: Enable Cloud Run Logging
Ensure detailed logging is enabled to capture network issues:
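# Update Cloud Run service with more verbose logging
# (--update-env-vars adds or changes one variable; --set-env-vars would replace all existing variables)
gcloud run services update flowpos-backend \
--region=us-central1 \
--project=barto-dev \
--update-env-vars="LOG_LEVEL=debug"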
Step 8: Monitor and Alert
Set up monitoring for FEL endpoint availability:
# Create an uptime check in Cloud Monitoring
gcloud monitoring uptime create fel-provider-check \
--resource-type=uptime-url \
--resource-labels=host=<FEL_PROVIDER_DOMAIN>,project_id=barto-dev \
--protocol=https \
--path=/sharedInfo \
--project=barto-dev
Checklist for Network Configuration
- Verify FEL provider endpoint is accessible from your location
- Test connectivity from Cloud Run container
- Check Cloud Run egress settings
- Verify firewall rules allow outbound traffic
- Configure VPC connector if private network access is needed
- Set up Cloud NAT for stable outbound IP
- Provide NAT IP addresses to FEL provider for whitelisting
- Verify DNS resolution works correctly
- Check SSL/TLS certificate validity
- Enable detailed logging
- Set up uptime monitoring and alerts
Common Issues and Solutions
Issue 1: FEL Provider Blocks Cloud Run IPs
Solution: Use Cloud NAT to provide a stable outbound IP and have it whitelisted.
Issue 2: Intermittent Timeouts
Solution:
- Enable keepAlive connections (already implemented)
- Increase retry attempts with exponential backoff
- Use connection pooling
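Retrying with exponential backoff can be sketched as follows. This is an illustration rather than the service's actual implementation: withRetry, backoffDelayMs, and isRetryable are hypothetical names, and the base delay, cap, attempt count, and list of retryable error codes are assumed values to tune. Only retry requests that are safe to repeat; a non-idempotent call to the FEL provider could otherwise be submitted twice.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Network-level error codes treated as transient (an assumption for this sketch).
const RETRYABLE_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "ECONNREFUSED", "EAI_AGAIN"]);

// True if the error carries one of the transient error codes.
function isRetryable(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return typeof code === "string" && RETRYABLE_CODES.has(code);
}

// Run `operation`, retrying transient failures up to `maxAttempts` total attempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Give up on non-transient errors or when attempts are exhausted.
      if (!isRetryable(err) || attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

With the defaults, the waits between attempts are 200ms and 400ms. Adding random jitter to each delay is a common refinement when many instances retry at once.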
Issue 3: DNS Resolution Failures
Solution:
- Use Cloud DNS for reliable DNS resolution
- Add custom DNS configuration to Cloud Run
Issue 4: Certificate Validation Errors
Solution:
- Ensure the FEL provider uses valid SSL certificates
- If using self-signed certificates, configure trust store
Testing the Fix
After implementing the changes:
- Test from Postman/curl:
curl --location --request POST 'https://flowpos-backend-723334209984.us-central1.run.app/fel/get-shared-info' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_TOKEN>' \
--data '{
"businessId": "097f8743-a317-4169-a793-c2a0db8fba2b",
"data1": "SHARED_GETINFONITcom",
"data2": "NIT|17195594"
}'
- Monitor logs:
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=flowpos-backend AND severity>=ERROR" \
--limit=50 \
--project=barto-dev \
--format=json
- Check for successful responses:
- Look for HTTP 200 status codes
- Verify no ETIMEDOUT errors in logs
- Confirm FEL provider returns expected data
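To check the logs for remaining timeouts without reading them by hand, the JSON output of the gcloud logging read command above can be summarized with a small script. The sketch below assumes the output is an array of log entries carrying textPayload or jsonPayload fields; countTimeouts and the trimmed LogEntry interface are hypothetical.

```typescript
// Subset of a Cloud Logging entry used by this sketch.
interface LogEntry {
  timestamp?: string;
  severity?: string;
  textPayload?: string;
  jsonPayload?: Record<string, unknown>;
}

// Count how many entries mention ETIMEDOUT anywhere in their payload.
function countTimeouts(entries: LogEntry[]): { total: number; timeouts: number } {
  let timeouts = 0;
  for (const entry of entries) {
    const text = entry.textPayload ?? JSON.stringify(entry.jsonPayload ?? {});
    if (text.includes("ETIMEDOUT")) timeouts++;
  }
  return { total: entries.length, timeouts };
}
```

For example, after redirecting the command's output to a file, countTimeouts(JSON.parse(fileContents)) should report zero timeouts once the fix is effective.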
Contact
If issues persist after following this guide:
- Check with the FEL provider for any service outages
- Review their API documentation for any network requirements
- Contact their support to verify your IP addresses are whitelisted
- Check their rate limiting policies