FEL Network Troubleshooting Guide

Problem Summary

The FEL service is experiencing ETIMEDOUT errors when trying to connect to external FEL provider endpoints. This indicates network-level connectivity issues between Google Cloud Run and the FEL provider's servers.

Error Details

  • Error Code: ETIMEDOUT
  • HTTP Status: 502 Bad Gateway
  • Message: "Network timeout - Could not reach FEL provider"
  • Observed Behavior: Request times out in ~550ms (much faster than the configured 30s timeout)

Solution 1: Code Changes (Implemented ✅)

Updated HTTP/HTTPS Agent Configuration

The fel.modules.ts has been updated with explicit TCP socket timeouts and connection management:

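import * as http from "http";
import * as https from "https";

HttpModule.register({
  timeout: 30000, // overall request timeout, in ms
  maxRedirects: 5,
  httpAgent: new http.Agent({
    keepAlive: true, // reuse sockets across requests
    keepAliveMsecs: 30000, // initial delay before TCP keep-alive probes, in ms
    timeout: 30000, // socket timeout, in ms
    scheduling: "lifo", // prefer the most recently used socket
  }),
  httpsAgent: new https.Agent({
    keepAlive: true,
    keepAliveMsecs: 30000,
    timeout: 30000,
    scheduling: "lifo",
    rejectUnauthorized: true, // verify the provider's TLS certificate
  }),
})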

Benefits:

  • Explicit socket-level timeout configuration
  • Connection keepAlive for better performance
  • LIFO scheduling, which reuses the most recently used socket and so lowers the chance of picking one the server has already closed for inactivity

Solution 2: Network Configuration Checks

Step 1: Verify FEL Provider Endpoint

First, verify that the FEL provider's endpoint is accessible:

# Check if the FEL provider endpoint is reachable
# (the URL is quoted so the shell does not interpret & and |)
curl -v 'https://<FEL_PROVIDER_ENDPOINT>/sharedInfo?NIT=000017195594&DATA1=SHARED_GETINFONITcom&DATA2=NIT|17195594&USERNAME=<USERNAME>' \
-H "Authorization: Bearer <TOKEN>"

Step 2: Test from Cloud Run Container

Deploy a debug container to test connectivity from within Cloud Run:

# Deploy a debug container
gcloud run deploy debug-container \
--image=gcr.io/google.com/cloudsdktool/cloud-sdk:alpine \
--region=us-central1 \
--project=barto-dev \
--command=/bin/sh \
--args=-c,"sleep 3600"

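# Forward a local port to the service. This proxies HTTP requests only;
# it does not open a shell in the container (Cloud Run has no exec or SSH access).
gcloud run services proxy debug-container --region=us-central1 --project=barto-dev

Then test connectivity. Because there is no shell to attach to, these commands have to run as part of the container's own command or startup script:

# Inside the container
apk add curl
curl -v https://<FEL_PROVIDER_ENDPOINT>/sharedInfo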

Step 3: Check Cloud Run Network Settings

3.1 Check Egress Settings

Verify that Cloud Run has proper egress configuration:

# Check current Cloud Run service configuration
# (the VPC connector and egress settings are recorded as revision annotations)
gcloud run services describe flowpos-backend \
--region=us-central1 \
--project=barto-dev \
--format=json | jq '.spec.template.metadata.annotations'

3.2 Configure VPC Connector (if needed)

If the FEL provider requires VPC connectivity:

# Create a VPC connector
gcloud compute networks vpc-access connectors create fel-connector \
--region=us-central1 \
--network=default \
--range=10.8.0.0/28 \
--project=barto-dev

# Update Cloud Run service to use the VPC connector
gcloud run services update flowpos-backend \
--vpc-connector=fel-connector \
--vpc-egress=all-traffic \
--region=us-central1 \
--project=barto-dev

3.3 Check Firewall Rules

Ensure there are no firewall rules blocking outbound traffic:

# List firewall rules
gcloud compute firewall-rules list --project=barto-dev

# If needed, create a rule to allow outbound traffic
gcloud compute firewall-rules create allow-fel-outbound \
--direction=EGRESS \
--priority=1000 \
--network=default \
--action=ALLOW \
--rules=tcp:443,tcp:80 \
--destination-ranges=0.0.0.0/0 \
--project=barto-dev

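Step 4: Set Up Cloud NAT

Cloud NAT provides a stable outbound IP address that can be whitelisted by the FEL provider. It only applies to traffic that leaves through the VPC, so the service must route its egress through a VPC connector (see 3.2):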

# Create a Cloud Router
gcloud compute routers create fel-router \
--network=default \
--region=us-central1 \
--project=barto-dev

# Create a Cloud NAT configuration
gcloud compute routers nats create fel-nat \
--router=fel-router \
--region=us-central1 \
--auto-allocate-nat-external-ips \
--nat-all-subnet-ip-ranges \
--enable-logging \
--project=barto-dev

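# Get the allocated NAT IP addresses
# (auto-allocated IPs appear in the router status, not in its configuration)
gcloud compute routers get-status fel-router \
--region=us-central1 \
--project=barto-dev \
--format="value(result.natStatus[].autoAllocatedNatIps)"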

Provide these IP addresses to the FEL provider for whitelisting.

Step 5: DNS Resolution Check

Verify DNS resolution works correctly:

# Check DNS resolution for FEL provider
nslookup <FEL_PROVIDER_DOMAIN>
dig <FEL_PROVIDER_DOMAIN>

# From Cloud Run (if possible): there is no shell to attach to, so run the
# lookup from the container's startup script or from application code
nslookup <FEL_PROVIDER_DOMAIN>

Step 6: Check for SSL/TLS Issues

If the FEL provider uses custom certificates:

# Test SSL certificate
openssl s_client -connect <FEL_PROVIDER_ENDPOINT>:443 -servername <FEL_PROVIDER_DOMAIN>

# Check certificate validity
curl -v https://<FEL_PROVIDER_ENDPOINT>

Step 7: Enable Cloud Run Logging

Ensure detailed logging is enabled to capture network issues:

# Update Cloud Run service with more verbose logging
gcloud run services update flowpos-backend \
--region=us-central1 \
--project=barto-dev \
--set-env-vars="LOG_LEVEL=debug"

Step 8: Monitor and Alert

Set up monitoring for FEL endpoint availability:

# Create an uptime check in Cloud Monitoring
gcloud monitoring uptime create fel-provider-check \
--resource-type=uptime-url \
--resource-labels=host=<FEL_PROVIDER_DOMAIN>,project_id=barto-dev \
--protocol=https \
--path=/sharedInfo \
--project=barto-dev

Checklist for Network Configuration

  • Verify FEL provider endpoint is accessible from your location
  • Test connectivity from Cloud Run container
  • Check Cloud Run egress settings
  • Verify firewall rules allow outbound traffic
  • Configure VPC connector if private network access is needed
  • Set up Cloud NAT for stable outbound IP
  • Provide NAT IP addresses to FEL provider for whitelisting
  • Verify DNS resolution works correctly
  • Check SSL/TLS certificate validity
  • Enable detailed logging
  • Set up uptime monitoring and alerts

Common Issues and Solutions

Issue 1: FEL Provider Blocks Cloud Run IPs

Solution: Use Cloud NAT to provide a stable outbound IP and have it whitelisted.

Issue 2: Intermittent Timeouts

Solution:

  • Enable keepAlive connections (already implemented)
  • Increase retry attempts with exponential backoff
  • Use connection pooling
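
The retry suggestion above can be sketched as follows. This is a minimal illustration, not the service's actual retry logic: the helper names (`backoffDelayMs`, `isRetryable`, `withRetry`), the 200 ms base delay, the 5 s cap, and the set of retryable error codes are all assumptions chosen for the example.

```typescript
// Hypothetical helpers; names, delays, and error codes are illustrative assumptions.

// Delay before the next attempt: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Network-level error codes that are usually worth retrying.
const RETRYABLE_CODES = new Set(["ETIMEDOUT", "ECONNRESET", "ECONNREFUSED", "EAI_AGAIN"]);

function isRetryable(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return typeof code === "string" && RETRYABLE_CODES.has(code);
}

// Runs fn up to maxAttempts times, sleeping with exponential backoff between
// attempts. Non-retryable errors and the final failure are rethrown unchanged.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
      attempt++;
    }
  }
}
```

A wrapper like this is only safe around calls that can be repeated without side effects, such as a read-only lookup; whether a given FEL provider call qualifies should be confirmed before retrying it.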

Issue 3: DNS Resolution Failures

Solution:

  • Use Cloud DNS for reliable DNS resolution
  • Add custom DNS configuration to Cloud Run

Issue 4: Certificate Validation Errors

Solution:

  • Ensure the FEL provider uses valid SSL certificates
  • If using self-signed certificates, configure trust store

Testing the Fix

After implementing the changes:

  1. Test from Postman/curl:
curl --location --request POST 'https://flowpos-backend-723334209984.us-central1.run.app/fel/get-shared-info' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_TOKEN>' \
--data '{
"businessId": "097f8743-a317-4169-a793-c2a0db8fba2b",
"data1": "SHARED_GETINFONITcom",
"data2": "NIT|17195594"
}'
  2. Monitor logs:
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=flowpos-backend AND severity>=ERROR" \
--limit=50 \
--project=barto-dev \
--format=json
  3. Check for successful responses:
  • Look for HTTP 200 status codes
  • Verify no ETIMEDOUT errors in logs
  • Confirm FEL provider returns expected data

Contact

If issues persist after following this guide:

  1. Check with the FEL provider for any service outages
  2. Review their API documentation for any network requirements
  3. Contact their support to verify your IP addresses are whitelisted
  4. Check their rate limiting policies

URL routing and 406 errors on getSharedInfo

This section covers the June 2026 production incident where POST /fel/get-shared-info returned 406 for business 51ebb168 with NIT|3435555.

Root cause summary

Three compounding defects:

| # | Defect | Symptom |
|---|--------|---------|
| 1 | getSharedInfo ignored USE_RPA_FEL_API | Always hit direct Digifact, never the RPA proxy |
| 2 | Direct Digifact URL hardcoded to test host | Wrong endpoint even without the proxy |
| 3 | Catch block read errorDetails?.Mensaje; Digifact returns REQUEST[0].Mensaje | "Error fetching shared info" instead of the real provider message |

URL resolution (how it works after the fix)

POST /fel/get-shared-info
         │
         ▼
configService.get("USE_RPA_FEL_API") === "true"?
         │
     YES │ NO
    ┌────┘ └─────────────────────────────────────┐
    ▼                                            ▼
getSharedInfoViaRpaFelApi()             getCertifierApiUrl(certifier, NODE_ENV)
  → ProviderRpaFelApiService              + optional DIGIFACT_API_URL override
  → ${baseUrl}/QueryPayerInfo                    │
    │                                            ▼
    │ baseUrl resolved by:              NODE_ENV=production|beta → felgtaws.digifact.com.gt
    │   RPA_FEL_API_URL override        anything else            → felgttestaws.digifact.com.gt
    │   or NODE_ENV=production
    │     → fel.rpapos.com/api/fel
    │   else
    │     → fel-dev.rpapos.com/api/fel

Doppler prd: USE_RPA_FEL_API=true, NODE_ENV=production → RPA path, fel.rpapos.com.

Doppler stg: same flag values → RPA path, fel.rpapos.com.
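
The resolution rules in the diagram can be restated as a small pure function. This is a sketch for reasoning about which host a given configuration selects, not the backend's implementation: `resolveFelRoute`, `FelEnv`, and `FelRoute` are hypothetical names, the `https://` scheme on the RPA hosts is assumed, and the `/gt.com.fel.api.v3/api` suffix is taken from the override examples later in this guide.

```typescript
// Hypothetical sketch; only the env var names and hostnames come from this guide.
interface FelEnv {
  USE_RPA_FEL_API?: string;
  RPA_FEL_API_URL?: string;
  DIGIFACT_API_URL?: string;
  NODE_ENV?: string;
}

interface FelRoute {
  path: "rpa" | "direct";
  baseUrl: string;
}

function resolveFelRoute(env: FelEnv): FelRoute {
  // RPA path: taken only when the flag is exactly the string "true".
  if (env.USE_RPA_FEL_API === "true") {
    const fallback =
      env.NODE_ENV === "production"
        ? "https://fel.rpapos.com/api/fel"
        : "https://fel-dev.rpapos.com/api/fel";
    return { path: "rpa", baseUrl: env.RPA_FEL_API_URL || fallback };
  }

  // Direct path: production and beta use the live Digifact host, everything else the test host.
  const isProdLike = env.NODE_ENV === "production" || env.NODE_ENV === "beta";
  const host = isProdLike ? "felgtaws.digifact.com.gt" : "felgttestaws.digifact.com.gt";
  return {
    path: "direct",
    baseUrl: env.DIGIFACT_API_URL || `https://${host}/gt.com.fel.api.v3/api`,
  };
}
```

Calling it with the Doppler prd values (`USE_RPA_FEL_API: "true"`, `NODE_ENV: "production"`) selects the RPA path on fel.rpapos.com, matching the summary above.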

Emitter NIT vs certifier registry NIT

These are different NITs and the distinction matters when diagnosing provider errors.

| Field | Where it lives | Purpose |
|-------|----------------|---------|
| Emitter NIT | business.tax_id | The merchant's tax ID (who is issuing the document) |
| Certifier NIT | fel_certifier.nit | Digifact's own SAT-registered NIT |

The error "El NIT 000017677254, no cuenta con acceso API" refers to the emitter NIT (17677254). Digifact's API requires the issuing merchant to be explicitly enabled for API access on their platform. This is separate from having a valid token.

When you see this error:

  1. The token and URL are correct.
  2. Digifact has not enabled API access for that specific emitter NIT on production.
  3. Fix: contact Digifact support and ask them to enable API access for emitter NIT 17677254 on felgtaws.digifact.com.gt.
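
Provider messages show the NIT zero-padded (000017677254) while requests and support tickets use the bare value (17677254), so comparing them as raw strings fails. A minimal sketch of normalizing both forms before comparison is shown here. The helper names are hypothetical, and the 12-character width is inferred from the examples in this guide rather than from Digifact documentation.

```typescript
// Hypothetical helpers; the padded width is an assumption inferred from "000017677254".
const NIT_PADDED_WIDTH = 12;

// Left-pads a NIT with zeros to the width seen in provider messages.
function padNit(nit: string): string {
  return nit.padStart(NIT_PADDED_WIDTH, "0");
}

// Removes leading zeros but always leaves at least one character.
function stripNitPadding(nit: string): string {
  return nit.replace(/^0+(?=.)/, "");
}

// True when two NIT strings refer to the same value regardless of padding.
function sameNit(a: string, b: string): boolean {
  return stripNitPadding(a.trim()) === stripNitPadding(b.trim());
}
```

With these helpers, `sameNit("000017677254", "17677254")` is true, which makes it straightforward to confirm that a provider error refers to the emitter NIT stored in business.tax_id.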

Reading Digifact error responses

Digifact returns errors in a REQUEST array, not at the top level:

{
"REQUEST": [
{
"Mensaje": "El NIT 000017677254, no cuenta con acceso API",
"Codigo": "1",
"Procesador": "Digifact",
"Descripcion": "NIT sin acceso API",
"Fecha": "2026-06-12"
}
]
}

extractFelProviderErrorDetails() in apps/backend/src/fel/domain/fel-provider-error.utils.ts parses this and surfaces REQUEST[0].Mensaje as the details field in the 406 response body. The PWA's formatApiErrorForToast then shows it as the toast title instead of the generic "FEL API request failed".
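
To illustrate the parsing rule, here is a minimal sketch of reading the message from the REQUEST array with a fallback to the old top-level field. It is not the code in fel-provider-error.utils.ts: `extractProviderMessage` is a hypothetical name, and the fallback order and default string are assumptions based on the behavior described in this section.

```typescript
// Hypothetical sketch; the response shape follows the sample above, all fields optional.
interface DigifactErrorEntry {
  Mensaje?: unknown;
  Codigo?: unknown;
  Descripcion?: unknown;
}

interface DigifactErrorBody {
  REQUEST?: unknown;
  Mensaje?: unknown;
}

const GENERIC_MESSAGE = "Error fetching shared info";

function extractProviderMessage(body: unknown): string {
  if (typeof body !== "object" || body === null) return GENERIC_MESSAGE;
  const { REQUEST, Mensaje } = body as DigifactErrorBody;

  // Preferred source: the first entry of the REQUEST array.
  const first: DigifactErrorEntry | undefined =
    Array.isArray(REQUEST) && typeof REQUEST[0] === "object" && REQUEST[0] !== null
      ? (REQUEST[0] as DigifactErrorEntry)
      : undefined;

  // Fall back to a top-level Mensaje, then to the generic message.
  for (const candidate of [first?.Mensaje, Mensaje]) {
    if (typeof candidate === "string" && candidate.trim() !== "") return candidate;
  }
  return GENERIC_MESSAGE;
}
```

Treating the body as `unknown` and checking each level keeps the helper from throwing on empty, non-JSON, or unexpectedly shaped provider responses.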

Diagnosing a 406 on production

# 1. Check what the backend actually returned
curl -s -w "\nHTTP %{http_code}\n" \
-X POST 'https://api.flowandgrow.tech/fel/get-shared-info' \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{"businessId":"51ebb168-fcf1-4f9d-9428-4b28f6ffc102","data1":"SHARED_GETINFONITcom","data2":"NIT|3435555"}'

# 2. Read the response body — check the `details` field
# If details == "Error fetching shared info" → error parsing bug (Phase 1 not deployed)
# If details == "El NIT ... no cuenta con acceso API" → Digifact API access not enabled (ops issue)
# If details mentions URL / connection → check USE_RPA_FEL_API and RPA_FEL_API_URL

# 3. Check Cloud Logging for the full provider response
gcloud logging read \
'resource.type=cloud_run_revision AND resource.labels.service_name=flowpos-backend AND jsonPayload.message:"getSharedInfo"' \
--limit=10 --project=barto-dev --format=json | jq '.[].jsonPayload'
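
The triage in step 2 can also be written as a small classifier, which is handy in scripts or tests that replay captured responses. This is a sketch only: `diagnose406` and its category names are hypothetical, and the pattern used to detect URL or connection problems is an assumption, since the guide does not list the exact wording of those messages.

```typescript
// Hypothetical classifier for the `details` field of a 406 response.
type Diagnosis =
  | "error-parsing-bug" // generic message: provider error was not parsed (Phase 1 not deployed)
  | "api-access-not-enabled" // Digifact has not enabled API access for the emitter NIT
  | "routing-or-connection" // check USE_RPA_FEL_API and RPA_FEL_API_URL
  | "unknown";

function diagnose406(details: string): Diagnosis {
  if (details === "Error fetching shared info") return "error-parsing-bug";
  if (/no cuenta con acceso API/i.test(details)) return "api-access-not-enabled";
  // Assumed markers of a URL or connection problem; adjust to the messages actually seen.
  if (/\burl\b|connect|ETIMEDOUT|ECONNREFUSED|ENOTFOUND/i.test(details)) {
    return "routing-or-connection";
  }
  return "unknown";
}
```

Anything that falls into "unknown" should be checked against the full provider response in Cloud Logging rather than guessed at.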

Emergency URL override

If Digifact changes their production host before the next deploy, override without a code change:

# Doppler prd — emergency override
doppler secrets set DIGIFACT_API_URL=https://felgtaws.digifact.com.gt/gt.com.fel.api.v3/api \
--project flowpos --config prd

This is only used on the direct path (USE_RPA_FEL_API=false). When routing through RPA, use RPA_FEL_API_URL instead.

Deploy order

  1. Phase 1 (error parsing + PWA) — safe to ship independently, improves error messages immediately
  2. Ops prerequisite — Digifact enables API access for emitter NIT 17677254 on production
  3. Phase 2 (RPA routing) — ship after NIT access is confirmed; this is the actual fix

Rollback

If Phase 2 causes regressions, set USE_RPA_FEL_API=false in Doppler prd and redeploy. The direct path code is unchanged and will continue to work (modulo the Digifact API access requirement).

To force the test host temporarily (direct path only):

doppler secrets set DIGIFACT_API_URL=https://felgttestaws.digifact.com.gt/gt.com.fel.api.v3/api \
--project flowpos --config prd