FEL Network Troubleshooting Guide

Problem Summary

The FEL service is experiencing ETIMEDOUT errors when trying to connect to external FEL provider endpoints. This indicates network-level connectivity issues between Google Cloud Run and the FEL provider's servers.

Error Details

  • Error Code: ETIMEDOUT
  • HTTP Status: 502 Bad Gateway
  • Message: "Network timeout - Could not reach FEL provider"
  • Observed Behavior: Request times out in ~550ms (much faster than the configured 30s timeout)
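
The gap between the observed ~550ms and the configured 30s suggests the failure happens while the connection is being established, not while waiting for a response. One possibility worth ruling out, assuming the service runs on Node.js 20 or later, is the default autoSelectFamily behavior, which gives each connection attempt only 250 ms before moving to the next address. The sketch below is a hypothetical helper (the name classifyTimeout and the 0.9 slack factor are assumptions, not part of the FEL service) that tags a logged error by how its elapsed time compares with the configured timeout:

```typescript
// Hypothetical log-analysis helper; names and thresholds are illustrative only.
type TimeoutKind = "early-connect-failure" | "configured-timeout" | "not-a-timeout";

function classifyTimeout(
  code: string | undefined,
  elapsedMs: number,
  configuredTimeoutMs: number,
): TimeoutKind {
  if (code !== "ETIMEDOUT") return "not-a-timeout";
  // A timeout that fires well before the configured limit did not come from
  // the application-level request timeout.
  return elapsedMs < configuredTimeoutMs * 0.9
    ? "early-connect-failure"
    : "configured-timeout";
}

console.log(classifyTimeout("ETIMEDOUT", 550, 30000)); // prints "early-connect-failure"
```

An "early-connect-failure" result points toward the network checks in Solution 2 before any further tuning of the 30s request timeout.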

Solution 1: Code Changes (Implemented ✅)

Updated HTTP/HTTPS Agent Configuration

The fel.modules.ts has been updated with explicit TCP socket timeouts and connection management:

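import * as http from "http";
import * as https from "https";
// Assumes the @nestjs/axios package; older NestJS versions export HttpModule from @nestjs/common
import { HttpModule } from "@nestjs/axios";

HttpModule.register({
  timeout: 30000, // request timeout in ms
  maxRedirects: 5,
  httpAgent: new http.Agent({
    keepAlive: true, // reuse TCP connections across requests
    keepAliveMsecs: 30000, // initial delay for TCP keep-alive probes, in ms
    timeout: 30000, // socket timeout in ms
    scheduling: "lifo", // prefer the most recently used socket
  }),
  httpsAgent: new https.Agent({
    keepAlive: true,
    keepAliveMsecs: 30000,
    timeout: 30000,
    scheduling: "lifo",
    rejectUnauthorized: true, // keep TLS certificate verification enabled
  }),
})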

Benefits:

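  • Explicit socket-level timeouts (30s) on both the HTTP and HTTPS agents
  • Connection keepAlive, so TCP and TLS connections are reused instead of being re-established for every request
  • LIFO scheduling, which reuses the most recently used socket and lowers the risk of picking an idle socket that the server has already closed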
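
Because the two agents share most of their settings, one way to keep them in sync is to build the options once. This is a minimal sketch, assuming a hypothetical buildAgentOptions helper that is not part of the current module; it uses only Node's built-in http and https modules:

```typescript
import * as http from "node:http";
import * as https from "node:https";

// Hypothetical helper: returns the socket settings shared by both agents.
function buildAgentOptions(timeoutMs: number): http.AgentOptions {
  return {
    keepAlive: true,
    keepAliveMsecs: timeoutMs,
    timeout: timeoutMs,
    scheduling: "lifo",
  };
}

const agentOptions = buildAgentOptions(30000);
const httpAgent = new http.Agent(agentOptions);
// The HTTPS agent adds TLS-specific options on top of the shared ones.
const httpsAgent = new https.Agent({ ...agentOptions, rejectUnauthorized: true });
```

The resulting httpAgent and httpsAgent can then be passed to HttpModule.register in place of the inline agents.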

Solution 2: Network Configuration Checks

Step 1: Verify FEL Provider Endpoint

First, verify that the FEL provider's endpoint is accessible:

# Check if the FEL provider endpoint is reachable
# Quote the URL so the shell does not interpret "&" and "|"
curl -v "https://<FEL_PROVIDER_ENDPOINT>/sharedInfo?NIT=000017195594&DATA1=SHARED_GETINFONITcom&DATA2=NIT|17195594&USERNAME=<USERNAME>" \
-H "Authorization: Bearer <TOKEN>"
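
The query string above contains characters such as "|" that should be percent-encoded when the URL is built in code. The following sketch shows one way to do that with the standard URL and URLSearchParams classes. The function name, the host fel-provider.example, and the username demo-user are placeholders, not real values:

```typescript
// Hypothetical helper: builds the sharedInfo URL with encoded query parameters.
function buildSharedInfoUrl(
  baseUrl: string,
  paddedNit: string,
  nit: string,
  username: string,
): string {
  const url = new URL("/sharedInfo", baseUrl);
  url.search = new URLSearchParams({
    NIT: paddedNit,
    DATA1: "SHARED_GETINFONITcom",
    DATA2: `NIT|${nit}`, // "|" is encoded as %7C
    USERNAME: username,
  }).toString();
  return url.toString();
}

// Placeholder host and username, for illustration only.
console.log(
  buildSharedInfoUrl("https://fel-provider.example", "000017195594", "17195594", "demo-user"),
);
```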

Step 2: Test from Cloud Run Container

Deploy a debug container to test connectivity from within Cloud Run:

# Deploy a debug container
gcloud run deploy debug-container \
--image=gcr.io/google.com/cloudsdktool/cloud-sdk:alpine \
--region=us-central1 \
--project=barto-dev \
--command=/bin/sh \
--args=-c,"sleep 3600"

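# Forward a local port to the service. Note that this command does not open
# a shell: Cloud Run does not provide shell access to a running container.
gcloud run services proxy debug-container --region=us-central1 --project=barto-dev

Then test connectivity:

# From a shell in the debug image (for example, a Cloud Run job that uses it)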
apk add curl
curl -v https://<FEL_PROVIDER_ENDPOINT>/sharedInfo
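
The same connectivity test can be run from application code, which helps when no shell is available in the Cloud Run environment. Below is a minimal sketch of a TCP connect probe built on Node's net module. The probeTcp name and the result shape are assumptions made for illustration:

```typescript
import * as net from "node:net";

interface ProbeResult {
  ok: boolean;
  elapsedMs: number;
  error: string | null;
}

// Hypothetical helper: attempts a TCP connection and reports how long it took.
function probeTcp(host: string, port: number, timeoutMs: number): Promise<ProbeResult> {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = net.connect({ host, port });
    const finish = (ok: boolean, error: string | null): void => {
      socket.destroy();
      resolve({ ok, elapsedMs: Date.now() - started, error });
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true, null));
    socket.once("timeout", () => finish(false, "timeout"));
    socket.once("error", (err: NodeJS.ErrnoException) =>
      finish(false, err.code ?? err.message),
    );
  });
}

// Example usage (replace the placeholder host before running):
// probeTcp("fel-provider.example", 443, 5000).then(console.log);
```

A result with ok set to false and an elapsedMs far below timeoutMs indicates that the connection was rejected or dropped early instead of waiting out the timeout.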

Step 3: Check Cloud Run Network Settings

3.1 Check Egress Settings

Verify that Cloud Run has proper egress configuration:

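# Check the current Cloud Run service configuration.
# The revision annotations include the VPC connector and egress settings when they are configured.
gcloud run services describe flowpos-backend \
--region=us-central1 \
--project=barto-dev \
--format=json | jq '.spec.template.metadata.annotations'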

3.2 Configure VPC Connector (if needed)

If the FEL provider requires VPC connectivity:

# Create a VPC connector
gcloud compute networks vpc-access connectors create fel-connector \
--region=us-central1 \
--network=default \
--range=10.8.0.0/28 \
--project=barto-dev

# Update Cloud Run service to use the VPC connector
gcloud run services update flowpos-backend \
--vpc-connector=fel-connector \
--vpc-egress=all-traffic \
--region=us-central1 \
--project=barto-dev

3.3 Check Firewall Rules

Ensure there are no firewall rules blocking outbound traffic:

# List firewall rules
gcloud compute firewall-rules list --project=barto-dev

# If needed, create a rule to allow outbound traffic
gcloud compute firewall-rules create allow-fel-outbound \
--direction=EGRESS \
--priority=1000 \
--network=default \
--action=ALLOW \
--rules=tcp:443,tcp:80 \
--destination-ranges=0.0.0.0/0 \
--project=barto-dev

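Step 4: Set Up Cloud NAT

Cloud NAT provides a stable outbound IP address that can be whitelisted by the FEL provider. It only applies to traffic routed through the VPC, so the service must also use the VPC connector from Step 3.2 with all-traffic egress: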

# Create a Cloud Router
gcloud compute routers create fel-router \
--network=default \
--region=us-central1 \
--project=barto-dev

# Create a Cloud NAT configuration
gcloud compute routers nats create fel-nat \
--router=fel-router \
--region=us-central1 \
--auto-allocate-nat-external-ips \
--nat-all-subnet-ip-ranges \
--enable-logging \
--project=barto-dev

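# Get the allocated NAT IP addresses.
# Auto-allocated IPs are reported in the router status (autoAllocatedNatIps), not in the router configuration.
gcloud compute routers get-status fel-router \
--region=us-central1 \
--project=barto-dev \
--format="yaml(result.natStatus)"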

Provide these IP addresses to the FEL provider for whitelisting.

Step 5: DNS Resolution Check

Verify DNS resolution works correctly:

# Check DNS resolution for FEL provider
nslookup <FEL_PROVIDER_DOMAIN>
dig <FEL_PROVIDER_DOMAIN>

# From Cloud Run (if possible): gcloud run services proxy does not provide a shell,
# so run the lookup from the debug environment used in Step 2
nslookup <FEL_PROVIDER_DOMAIN>
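
DNS resolution can also be checked from inside the service with Node's dns module, which uses the same resolver path as outgoing requests. This is a minimal sketch; the resolveHost name and report shape are assumptions made for illustration:

```typescript
import { promises as dns } from "node:dns";

interface DnsReport {
  hostname: string;
  elapsedMs: number;
  addresses: { address: string; family: number }[];
}

// Hypothetical helper: resolves a hostname and records how long the lookup took.
async function resolveHost(hostname: string): Promise<DnsReport> {
  const started = Date.now();
  // all: true returns every address (IPv4 and IPv6) instead of only the first.
  const addresses = await dns.lookup(hostname, { all: true });
  return { hostname, elapsedMs: Date.now() - started, addresses };
}

// Example usage (replace the placeholder domain before running):
// resolveHost("fel-provider.example").then(console.log);
```

If the report lists IPv6 addresses (family 6) that are not reachable from Cloud Run, connection attempts to them can fail before the IPv4 address is tried.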

Step 6: Check for SSL/TLS Issues

If the FEL provider uses custom certificates:

# Test SSL certificate
openssl s_client -connect <FEL_PROVIDER_ENDPOINT>:443 -servername <FEL_PROVIDER_DOMAIN>

# Check certificate validity
curl -v https://<FEL_PROVIDER_ENDPOINT>
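
For a repeatable check, the certificate can also be inspected from Node with the tls module. The sketch below is an assumption-laden example, not the service's own code: inspectCertificate and daysUntilExpiry are hypothetical names, and rejectUnauthorized is disabled only so that an invalid chain is reported instead of aborting the handshake. It should not be used for real traffic.

```typescript
import * as tls from "node:tls";

// Days remaining until the certificate's valid_to date (negative if expired).
function daysUntilExpiry(validTo: string, now: Date = new Date()): number {
  return Math.floor((Date.parse(validTo) - now.getTime()) / 86400000);
}

interface CertReport {
  authorized: boolean;
  authorizationError: string | null;
  validTo: string;
  daysLeft: number;
}

// Hypothetical diagnostic helper: connects, reads the peer certificate, and disconnects.
function inspectCertificate(host: string, port = 443, timeoutMs = 10000): Promise<CertReport> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      // Diagnostics only: verification result is reported, not enforced.
      { host, port, servername: host, rejectUnauthorized: false },
      () => {
        const cert = socket.getPeerCertificate();
        resolve({
          authorized: socket.authorized,
          authorizationError: socket.authorized ? null : String(socket.authorizationError),
          validTo: cert.valid_to,
          daysLeft: daysUntilExpiry(cert.valid_to),
        });
        socket.end();
      },
    );
    socket.setTimeout(timeoutMs, () => socket.destroy(new Error("TLS handshake timed out")));
    socket.once("error", reject);
  });
}

// Example usage (replace the placeholder domain before running):
// inspectCertificate("fel-provider.example").then(console.log);
```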

Step 7: Enable Cloud Run Logging

Ensure detailed logging is enabled to capture network issues:

# Update Cloud Run service with more verbose logging
gcloud run services update flowpos-backend \
--region=us-central1 \
--project=barto-dev \
--set-env-vars="LOG_LEVEL=debug"

Step 8: Monitor and Alert

Set up monitoring for FEL endpoint availability:

# Create an uptime check in Cloud Monitoring
gcloud monitoring uptime create fel-provider-check \
--resource-type=uptime-url \
--resource-labels=host=<FEL_PROVIDER_DOMAIN>,project_id=barto-dev \
--protocol=https \
--path=/sharedInfo \
--project=barto-dev

Checklist for Network Configuration

  • Verify FEL provider endpoint is accessible from your location
  • Test connectivity from Cloud Run container
  • Check Cloud Run egress settings
  • Verify firewall rules allow outbound traffic
  • Configure VPC connector if private network access is needed
  • Set up Cloud NAT for stable outbound IP
  • Provide NAT IP addresses to FEL provider for whitelisting
  • Verify DNS resolution works correctly
  • Check SSL/TLS certificate validity
  • Enable detailed logging
  • Set up uptime monitoring and alerts

Common Issues and Solutions

Issue 1: FEL Provider Blocks Cloud Run IPs

Solution: Use Cloud NAT to provide a stable outbound IP and have it whitelisted.

Issue 2: Intermittent Timeouts

Solution:

  • Enable keepAlive connections (already implemented)
  • Increase retry attempts with exponential backoff
  • Use connection pooling
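
The retry suggestion can be sketched as follows. This is a minimal example, not the service's implementation: withRetry and backoffDelayMs are hypothetical names, the 200 ms base and 5 s cap are arbitrary defaults, and the sketch omits jitter, which is often added in production to avoid synchronized retries.

```typescript
// Delay before the next retry: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: runs fn up to maxAttempts times, waiting longer after each failure.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No delay after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

With the defaults, the waits between the four attempts are 200 ms, 400 ms, and 800 ms. Retries are only safe for requests that the FEL provider treats as idempotent, such as lookups.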

Issue 3: DNS Resolution Failures

Solution:

  • Use Cloud DNS for reliable DNS resolution
  • Add custom DNS configuration to Cloud Run

Issue 4: Certificate Validation Errors

Solution:

  • Ensure the FEL provider uses valid SSL certificates
  • If using self-signed certificates, configure trust store

Testing the Fix

After implementing the changes:

  1. Test from Postman/curl:
curl --location --request POST 'https://flowpos-backend-723334209984.us-central1.run.app/fel/get-shared-info' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_TOKEN>' \
--data '{
"businessId": "097f8743-a317-4169-a793-c2a0db8fba2b",
"data1": "SHARED_GETINFONITcom",
"data2": "NIT|17195594"
}'
  2. Monitor logs:
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=flowpos-backend AND severity>=ERROR" \
--limit=50 \
--project=barto-dev \
--format=json
  3. Check for successful responses:
  • Look for HTTP 200 status codes
  • Verify no ETIMEDOUT errors in logs
  • Confirm FEL provider returns expected data

Contact

If issues persist after following this guide:

  1. Check with the FEL provider for any service outages
  2. Review their API documentation for any network requirements
  3. Contact their support to verify your IP addresses are whitelisted
  4. Check their rate limiting policies