FEL Network Troubleshooting Guide

Problem Summary

The FEL service is experiencing ETIMEDOUT errors when trying to connect to external FEL provider endpoints. This indicates network-level connectivity issues between Google Cloud Run and the FEL provider's servers.

Error Details

Error Code: ETIMEDOUT
HTTP Status: 502 Bad Gateway
Message: "Network timeout - Could not reach FEL provider"
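Observed Behavior: Requests fail after ~550ms, far sooner than the configured 30s timeout, which suggests the failure occurs at the connection level rather than from the application timeout expiring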

Solution 1: Code Changes (Implemented ✅)

Updated HTTP/HTTPS Agent Configuration

fel.modules.ts now sets explicit TCP socket timeouts and connection management options on the HTTP and HTTPS agents:

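import * as http from "node:http";
import * as https from "node:https";
import { HttpModule } from "@nestjs/axios"; // assumed package; older NestJS versions export HttpModule from @nestjs/common

HttpModule.register({
  timeout: 30000, // request timeout in ms
  maxRedirects: 5,
  httpAgent: new http.Agent({
    keepAlive: true, // reuse sockets across requests
    keepAliveMsecs: 30000, // initial delay for TCP keep-alive probes
    timeout: 30000, // socket timeout in ms
    scheduling: "lifo", // prefer the most recently used socket
  }),
  httpsAgent: new https.Agent({
    keepAlive: true,
    keepAliveMsecs: 30000,
    timeout: 30000,
    scheduling: "lifo",
    rejectUnauthorized: true, // keep TLS certificate validation on
  }),
})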

Benefits:

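Explicit socket-level timeouts, so a stalled connection is closed instead of hanging
keepAlive connection reuse, which avoids a new TCP and TLS handshake on every request
LIFO scheduling, which reuses the most recently used socket and lowers the chance of picking one the server has already closed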

Solution 2: Network Configuration Checks

Step 1: Verify FEL Provider Endpoint

First, verify that the FEL provider's endpoint is accessible:

# Check if the FEL provider endpoint is reachable
# (the URL is quoted so the shell does not interpret & and |)
curl -v 'https://<FEL_PROVIDER_ENDPOINT>/sharedInfo?NIT=000017195594&DATA1=SHARED_GETINFONITcom&DATA2=NIT|17195594&USERNAME=<USERNAME>' \
  -H "Authorization: Bearer <TOKEN>"

Step 2: Test from Cloud Run Container

Deploy a debug container to test connectivity from within Cloud Run:

# Deploy a debug container
gcloud run deploy debug-container \
  --image=gcr.io/google.com/cloudsdktool/cloud-sdk:alpine \
  --region=us-central1 \
  --project=barto-dev \
  --command=/bin/sh \
  --args=-c,"sleep 3600"

# Open an authenticated local proxy to the running service
gcloud run services proxy debug-container --region=us-central1 --project=barto-dev

Then test connectivity:

# From inside the container
apk add curl
curl -v https://<FEL_PROVIDER_ENDPOINT>/sharedInfo

Step 3: Check Cloud Run Network Settings

3.1 Check Egress Settings

Verify that Cloud Run has proper egress configuration:

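# Check current Cloud Run service configuration.
# Egress settings appear as the run.googleapis.com/vpc-access-connector and
# run.googleapis.com/vpc-access-egress annotations on the revision template.
gcloud run services describe flowpos-backend \
  --region=us-central1 \
  --project=barto-dev \
  --format=json | jq '.spec.template.metadata.annotations'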

3.2 Configure VPC Connector (if needed)

If the FEL provider requires VPC connectivity:

# Create a VPC connector
gcloud compute networks vpc-access connectors create fel-connector \
  --region=us-central1 \
  --network=default \
  --range=10.8.0.0/28 \
  --project=barto-dev

# Update Cloud Run service to use the VPC connector
gcloud run services update flowpos-backend \
  --vpc-connector=fel-connector \
  --vpc-egress=all-traffic \
  --region=us-central1 \
  --project=barto-dev

3.3 Check Firewall Rules

Ensure there are no firewall rules blocking outbound traffic:

# List firewall rules
gcloud compute firewall-rules list --project=barto-dev

# If needed, create a rule to allow outbound traffic
gcloud compute firewall-rules create allow-fel-outbound \
  --direction=EGRESS \
  --priority=1000 \
  --network=default \
  --action=ALLOW \
  --rules=tcp:443,tcp:80 \
  --destination-ranges=0.0.0.0/0 \
  --project=barto-dev

Step 4: Configure Cloud NAT (Recommended)

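Cloud NAT gives outbound traffic a predictable source IP address that the FEL provider can whitelist. It only applies to Cloud Run traffic that leaves through the VPC, so the service must use the VPC connector from step 3.2 with --vpc-egress=all-traffic: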

# Create a Cloud Router
gcloud compute routers create fel-router \
  --network=default \
  --region=us-central1 \
  --project=barto-dev

# Create a Cloud NAT configuration
gcloud compute routers nats create fel-nat \
  --router=fel-router \
  --region=us-central1 \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges \
  --enable-logging \
  --project=barto-dev

# Get the allocated NAT IP addresses
gcloud compute routers describe fel-router \
  --region=us-central1 \
  --project=barto-dev \
  --format="get(nats[0].natIps)"

Provide these IP addresses to the FEL provider for whitelisting.

Step 5: DNS Resolution Check

Verify DNS resolution works correctly:

# Check DNS resolution for FEL provider
nslookup <FEL_PROVIDER_DOMAIN>
dig <FEL_PROVIDER_DOMAIN>

# From Cloud Run (if possible)
gcloud run services proxy flowpos-backend --region=us-central1 --project=barto-dev
# Then inside the container:
nslookup <FEL_PROVIDER_DOMAIN>

Step 6: Check for SSL/TLS Issues

If the FEL provider uses custom certificates:

# Test SSL certificate
openssl s_client -connect <FEL_PROVIDER_ENDPOINT>:443 -servername <FEL_PROVIDER_DOMAIN>

# Check certificate validity
curl -v https://<FEL_PROVIDER_ENDPOINT>

Step 7: Enable Cloud Run Logging

Ensure detailed logging is enabled to capture network issues:

# Update Cloud Run service with more verbose logging
gcloud run services update flowpos-backend \
  --region=us-central1 \
  --project=barto-dev \
  --set-env-vars="LOG_LEVEL=debug"

Step 8: Monitor and Alert

Set up monitoring for FEL endpoint availability:

# Create an uptime check in Cloud Monitoring
gcloud monitoring uptime create fel-provider-check \
  --resource-type=uptime-url \
  --resource-labels=host=<FEL_PROVIDER_DOMAIN>,project_id=barto-dev \
  --protocol=https \
  --path=/sharedInfo \
  --project=barto-dev

Checklist for Network Configuration

Common Issues and Solutions

Issue 1: FEL Provider Blocks Cloud Run IPs

Solution: Use Cloud NAT to provide a stable outbound IP and have it whitelisted.

Issue 2: Intermittent Timeouts

Solution:

Enable keepAlive connections (already implemented)
Increase retry attempts with exponential backoff
Use connection pooling
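
The retry suggestion above can be sketched as a small helper. This is a minimal illustration, not code from the FEL service: `retryWithBackoff`, `backoffDelay`, and `isTransient` are hypothetical names, the default delays are arbitrary, and the set of retryable error codes beyond ETIMEDOUT is an assumption.

```typescript
// Hypothetical helper: retry an async operation on transient network errors.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// ETIMEDOUT is the error this guide covers; the other codes are an assumption
// about what else is worth retrying.
const TRANSIENT_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "EAI_AGAIN"]);

function isTransient(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const code = (error as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_CODES.has(code);
}

// Delay before retry number `attempt` (1-based): base, 2x base, 4x base, capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: { retries?: number; baseMs?: number; maxMs?: number; sleep?: Sleep } = {},
): Promise<T> {
  const { retries = 3, baseMs = 200, maxMs = 5000, sleep = defaultSleep } = options;
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on non-transient errors or once the retry budget is spent.
      if (!isTransient(error) || attempt > retries) throw error;
      await sleep(backoffDelay(attempt, baseMs, maxMs));
    }
  }
}
```

With the defaults, a call that keeps failing with ETIMEDOUT is attempted four times in total, waiting 200ms, 400ms, and 800ms between attempts. The sketch omits jitter; adding a random component to each delay is a common refinement when many instances retry at once.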

Issue 3: DNS Resolution Failures

Solution:

Use Cloud DNS for reliable DNS resolution
Add custom DNS configuration to Cloud Run

Issue 4: Certificate Validation Errors

Solution:

Ensure the FEL provider uses valid SSL certificates
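If the provider uses self-signed or private-CA certificates, add them to the service's trust store (for Node.js, via the NODE_EXTRA_CA_CERTS environment variable) instead of disabling rejectUnauthorized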

Testing the Fix

After implementing the changes:

Test from Postman/curl:

curl --location --request POST 'https://flowpos-backend-723334209984.us-central1.run.app/fel/get-shared-info' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_TOKEN>' \
--data '{
    "businessId": "097f8743-a317-4169-a793-c2a0db8fba2b",
    "data1": "SHARED_GETINFONITcom",
    "data2": "NIT|17195594"
}'

Monitor logs:

gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=flowpos-backend AND severity>=ERROR" \
  --limit=50 \
  --project=barto-dev \
  --format=json

Check for successful responses:

Look for HTTP 200 status codes
Verify no ETIMEDOUT errors in logs
Confirm FEL provider returns expected data
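
To make the "no ETIMEDOUT errors in logs" check scriptable, the JSON printed by the gcloud logging read command above can be scanned for the error code. The following is a rough sketch under assumptions: `countTimeoutEntries` is a hypothetical helper, and it simply searches each serialized entry for the string ETIMEDOUT rather than relying on a particular payload field.

```typescript
// Counts log entries that mention ETIMEDOUT anywhere in their content.
// `entries` is assumed to be the parsed JSON array printed by
// `gcloud logging read ... --format=json`.
function countTimeoutEntries(entries: readonly unknown[]): number {
  return entries.filter((entry) =>
    // JSON.stringify can yield undefined for some inputs, hence the fallback.
    (JSON.stringify(entry) ?? "").includes("ETIMEDOUT"),
  ).length;
}

// Illustrative entries only; real entries carry many more fields.
const sampleEntries: unknown[] = [
  { severity: "ERROR", textPayload: "connect ETIMEDOUT while calling FEL provider" },
  { severity: "ERROR", jsonPayload: { message: "Network timeout", code: "ETIMEDOUT" } },
  { severity: "ERROR", jsonPayload: { message: "Validation failed" } },
];

console.log(countTimeoutEntries(sampleEntries)); // prints 2
```

A count of zero over the period after the deploy is the outcome the checklist above is looking for.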

Additional Resources

Contact

If issues persist after following this guide:

Check with the FEL provider for any service outages
Review their API documentation for any network requirements
Contact their support to verify your IP addresses are whitelisted
Check their rate limiting policies

URL routing and 406 errors on `getSharedInfo`

This section covers the June 2026 production incident where POST /fel/get-shared-info returned 406 for business 51ebb168 with NIT|3435555.

Root cause summary

Three compounding defects:

| # | Defect | Symptom |
|---|--------|---------|
| 1 | `getSharedInfo` ignored `USE_RPA_FEL_API` | Always hit direct Digifact, never the RPA proxy |
| 2 | Direct Digifact URL hardcoded to test host | Wrong endpoint even without the proxy |
| 3 | Catch block read `errorDetails?.Mensaje`; Digifact returns `REQUEST[0].Mensaje` | "Error fetching shared info" instead of the real provider message |

URL resolution (how it works after the fix)

POST /fel/get-shared-info
         │
         ▼
configService.get("USE_RPA_FEL_API") === "true"?
         │
   YES   │   NO
    ┌────┘    └────────────────────────────────────────┐
    ▼                                                   ▼
getSharedInfoViaRpaFelApi()              getCertifierApiUrl(certifier, NODE_ENV)
→ ProviderRpaFelApiService                + optional DIGIFACT_API_URL override
→ ${baseUrl}/QueryPayerInfo                       │
    │                                              ▼
    │ baseUrl resolved by:          NODE_ENV=production|beta → felgtaws.digifact.com.gt
    │ RPA_FEL_API_URL override      anything else            → felgttestaws.digifact.com.gt
    │ or NODE_ENV=production
    │   → fel.rpapos.com/api/fel
    │ else
    │   → fel-dev.rpapos.com/api/fel

Doppler prd: USE_RPA_FEL_API=true, NODE_ENV=production → RPA path, fel.rpapos.com.

Doppler stg: same flag values → RPA path, fel.rpapos.com.
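
The resolution order in the diagram can be restated as a small pure function. This is a sketch, not the backend's code: `resolveFelTarget`, `FelEnv`, and `FelTarget` are hypothetical names, the `https://` scheme is assumed, the direct path ignores the certifier argument that `getCertifierApiUrl` takes, and only the hosts from the diagram are returned (no API path is appended).

```typescript
// Hypothetical shape of the environment values the diagram refers to.
interface FelEnv {
  USE_RPA_FEL_API?: string;
  NODE_ENV?: string;
  RPA_FEL_API_URL?: string;
  DIGIFACT_API_URL?: string;
}

interface FelTarget {
  route: "rpa" | "direct";
  baseUrl: string;
}

function resolveFelTarget(env: FelEnv): FelTarget {
  if (env.USE_RPA_FEL_API === "true") {
    // RPA path: explicit override first, then NODE_ENV.
    // An empty override counts as unset (an assumption of this sketch).
    const rpaUrl =
      env.RPA_FEL_API_URL ||
      (env.NODE_ENV === "production"
        ? "https://fel.rpapos.com/api/fel"
        : "https://fel-dev.rpapos.com/api/fel");
    return { route: "rpa", baseUrl: rpaUrl };
  }

  // Direct path: DIGIFACT_API_URL override first, then NODE_ENV.
  const isLive = env.NODE_ENV === "production" || env.NODE_ENV === "beta";
  const directUrl =
    env.DIGIFACT_API_URL ||
    (isLive
      ? "https://felgtaws.digifact.com.gt"
      : "https://felgttestaws.digifact.com.gt");
  return { route: "direct", baseUrl: directUrl };
}

// The Doppler prd values listed above resolve to the RPA production host.
console.log(resolveFelTarget({ USE_RPA_FEL_API: "true", NODE_ENV: "production" }));
```

Note that in this sketch each override only affects its own path: DIGIFACT_API_URL is never consulted when USE_RPA_FEL_API is "true".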

Emitter NIT vs certifier registry NIT

These are different NITs and the distinction matters when diagnosing provider errors.

| Field | Where it lives | Purpose |
|-------|----------------|---------|
| Emitter NIT | `business.tax_id` | The merchant's tax ID: who is issuing the document |
| Certifier NIT | `fel_certifier.nit` | Digifact's own SAT-registered NIT |

The error "El NIT 000017677254, no cuenta con acceso API" refers to the emitter NIT (17677254). Digifact's API requires the issuing merchant to be explicitly enabled for API access on their platform. This is separate from having a valid token.

When you see this error:

The token and URL are correct.
Digifact has not enabled API access for that specific emitter NIT on production.
Fix: contact Digifact support and ask them to enable API access for emitter NIT 17677254 on felgtaws.digifact.com.gt.

Reading Digifact error responses

Digifact returns errors in a REQUEST array, not at the top level:

{
  "REQUEST": [
    {
      "Mensaje": "El NIT 000017677254, no cuenta con acceso API",
      "Codigo": "1",
      "Procesador": "Digifact",
      "Descripcion": "NIT sin acceso API",
      "Fecha": "2026-06-12"
    }
  ]
}

extractFelProviderErrorDetails() in apps/backend/src/fel/domain/fel-provider-error.utils.ts parses this and surfaces REQUEST[0].Mensaje as the details field in the 406 response body. The PWA's formatApiErrorForToast then shows it as the toast title instead of the generic "FEL API request failed".
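
As an illustration of reading the nested message, the sketch below pulls `REQUEST[0].Mensaje` out of a response body and falls back to the generic text when the shape is unexpected. It is not the implementation of `extractFelProviderErrorDetails()`; `firstDigifactMessage` and `isRecord` are hypothetical names used only for this example.

```typescript
const GENERIC_MESSAGE = "Error fetching shared info";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns REQUEST[0].Mensaje when present and non-empty, else the generic message.
function firstDigifactMessage(body: unknown): string {
  if (!isRecord(body)) return GENERIC_MESSAGE;
  const entries = body["REQUEST"];
  if (!Array.isArray(entries) || entries.length === 0) return GENERIC_MESSAGE;
  const first: unknown = entries[0];
  if (!isRecord(first)) return GENERIC_MESSAGE;
  const mensaje = first["Mensaje"];
  return typeof mensaje === "string" && mensaje.trim() !== ""
    ? mensaje
    : GENERIC_MESSAGE;
}

// The payload shown above yields the provider's own message.
const sampleBody = {
  REQUEST: [
    { Mensaje: "El NIT 000017677254, no cuenta con acceso API", Codigo: "1" },
  ],
};
console.log(firstDigifactMessage(sampleBody));
```

Reading `Mensaje` from the top level of the same payload would find nothing, which is why the original catch block fell back to the generic text.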

Diagnosing a 406 on production

# 1. Check what the backend actually returned
curl -s -w "\nHTTP %{http_code}\n" \
  -X POST 'https://api.flowandgrow.tech/fel/get-shared-info' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{"businessId":"51ebb168-fcf1-4f9d-9428-4b28f6ffc102","data1":"SHARED_GETINFONITcom","data2":"NIT|3435555"}'

# 2. Read the response body — check the `details` field
# If details == "Error fetching shared info" → error parsing bug (Phase 1 not deployed)
# If details == "El NIT ... no cuenta con acceso API" → Digifact API access not enabled (ops issue)
# If details mentions URL / connection → check USE_RPA_FEL_API and RPA_FEL_API_URL

# 3. Check Cloud Logging for the full provider response
gcloud logging read \
  'resource.type=cloud_run_revision AND resource.labels.service_name=flowpos-backend AND jsonPayload.message:"getSharedInfo"' \
  --limit=10 --project=barto-dev --format=json | jq '.[].jsonPayload'
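
The three branches in step 2 can also be written down as a lookup, which is handy in a support script. This is a sketch under assumptions: `nextStepFor` and the `NextStep` labels are hypothetical, and the substring checks are a loose heuristic based on the messages quoted in this guide.

```typescript
type NextStep =
  | "deploy-phase-1"
  | "ask-digifact-to-enable-api-access"
  | "check-routing-env-vars"
  | "inspect-logs";

// Maps the `details` field of a 406 response body to the follow-up action.
function nextStepFor(details: string): NextStep {
  const text = details.toLowerCase();
  // Generic text means the provider message was never parsed.
  if (text === "error fetching shared info") return "deploy-phase-1";
  // Provider says the emitter NIT has no API access: an ops issue.
  if (text.includes("no cuenta con acceso api")) return "ask-digifact-to-enable-api-access";
  // URL or connection problems point at USE_RPA_FEL_API / RPA_FEL_API_URL.
  if (text.includes("url") || text.includes("connection")) return "check-routing-env-vars";
  // Anything else: fall through to the Cloud Logging query in step 3.
  return "inspect-logs";
}

console.log(nextStepFor("El NIT 000017677254, no cuenta con acceso API"));
// prints "ask-digifact-to-enable-api-access"
```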

Emergency URL override

If Digifact changes their production host before the next deploy, override without a code change:

# Doppler prd — emergency override
doppler secrets set DIGIFACT_API_URL=https://felgtaws.digifact.com.gt/gt.com.fel.api.v3/api \
  --project flowpos --config prd

This is only used on the direct path (USE_RPA_FEL_API=false). When routing through RPA, use RPA_FEL_API_URL instead.

Deploy order

1. Phase 1 (error parsing + PWA): safe to ship independently, improves error messages immediately
2. Ops prerequisite: Digifact enables API access for emitter NIT 17677254 on production
3. Phase 2 (RPA routing): ship after NIT access is confirmed; this is the actual fix

Rollback

If Phase 2 causes regressions, set USE_RPA_FEL_API=false in Doppler prd and redeploy. The direct path code is unchanged and will continue to work (modulo the Digifact API access requirement).

To force the test host temporarily (direct path only):

doppler secrets set DIGIFACT_API_URL=https://felgttestaws.digifact.com.gt/gt.com.fel.api.v3/api \
  --project flowpos --config prd
