FlowPOS Public Status Page

Public URL: https://flowandgrow.tech/status


Architecture

The status page uses a hybrid probe + manual incident model:

  • Automated probes drive the colored component grid and 90-day uptime timeline.
  • Ops-authored incidents provide the human narrative (investigating → identified → monitoring → resolved).

Probes (BullMQ, per-interval)
  └─► hysteresis evaluator (3-fail / 2-pass)
        └─► status_component.current_status (override wins if not expired)
              ├─► GCS snapshot (on state transition only) ──► CDN ──► /status page
              ├─► Socket.IO /status namespace ──► PWA StatusBanner
              └─► status_audit_log
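
The hysteresis step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual evaluator: it assumes "3-fail / 2-pass" means three consecutive failed probes to mark a component down and two consecutive passes to recover. The status names and the functions `initialState` and `evaluate` are hypothetical.

```typescript
// Hypothetical names throughout; only the 3-fail / 2-pass thresholds come from the diagram.
type Health = "operational" | "down";

interface HysteresisState {
  status: Health;
  consecutiveFails: number;
  consecutivePasses: number;
}

const FAILS_TO_TRIP = 3; // assumed: consecutive failures before marking down
const PASSES_TO_RECOVER = 2; // assumed: consecutive passes before recovering

function initialState(): HysteresisState {
  return { status: "operational", consecutiveFails: 0, consecutivePasses: 0 };
}

// Feed one probe result in; get the next state plus whether the status flipped.
function evaluate(
  prev: HysteresisState,
  probePassed: boolean,
): { next: HysteresisState; transitioned: boolean } {
  // A pass resets the failure streak and a failure resets the pass streak.
  const consecutiveFails = probePassed ? 0 : prev.consecutiveFails + 1;
  const consecutivePasses = probePassed ? prev.consecutivePasses + 1 : 0;

  let status: Health = prev.status;
  if (prev.status === "operational" && consecutiveFails >= FAILS_TO_TRIP) {
    status = "down";
  } else if (prev.status === "down" && consecutivePasses >= PASSES_TO_RECOVER) {
    status = "operational";
  }

  return {
    next: { status, consecutiveFails, consecutivePasses },
    transitioned: status !== prev.status,
  };
}
```

The `transitioned` flag is the natural hook for the three downstream writes in the diagram, since the snapshot is only produced on a state transition. A single failed probe between passes never flips the status under this scheme.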

A GCS snapshot (flowpos-status-snapshots/current.json) is written only on state transitions, not on every probe run. The landing page reads this snapshot first and falls back to the live API. Because the snapshot is served from GCS through the CDN, the status page remains visible even when the backend is down.
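
The snapshot-first read order described above might look like the sketch below. Both fetchers are injected, so the example makes no claim about the real snapshot URL, the live API route, or the payload shape. `loadStatus` and `Fetcher` are hypothetical names.

```typescript
// Hypothetical loader: callers supply the two data sources.
type Fetcher<T> = () => Promise<T>;

async function loadStatus<T>(
  fromSnapshot: Fetcher<T>,
  fromLiveApi: Fetcher<T>,
): Promise<{ source: "snapshot" | "live"; data: T }> {
  try {
    // Preferred path: the static snapshot, which does not depend on the backend.
    return { source: "snapshot", data: await fromSnapshot() };
  } catch {
    // Snapshot missing or unreachable: fall back to the live API.
    return { source: "live", data: await fromLiveApi() };
  }
}
```

If both sources fail, the error from the live API propagates to the caller, which can then render an "unable to load status" state.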


Adding a new component

  1. Add a row to status_component (via migration or admin API).
  2. Add one or more rows to status_component_probe with the appropriate probe_type.
  3. If needed, add a new probe strategy (see probe-types.md).

Probe types and default intervals:

Probe type    Default interval    Use case
http_get      60s                 HTTP health check on a URL
db_query      60s                 PostgreSQL SELECT 1
redis_ping    60s                 Redis PING
vendor_rss    300s                Ingest third-party status RSS
passive       n/a                 Heartbeat from external client
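
As an illustration, the defaults in the table could be encoded as a typed lookup. The `ProbeType` union mirrors the probe_type values listed above. The `intervalMs` helper and its per-row override parameter are assumptions made for this sketch, not part of the documented schema.

```typescript
type ProbeType = "http_get" | "db_query" | "redis_ping" | "vendor_rss" | "passive";

// Default intervals in seconds, copied from the table.
// null marks the passive type, which is driven by external heartbeats rather than a schedule.
const DEFAULT_INTERVAL_SECONDS: Record<ProbeType, number | null> = {
  http_get: 60,
  db_query: 60,
  redis_ping: 60,
  vendor_rss: 300,
  passive: null,
};

// Hypothetical helper: resolve the scheduling interval for a probe row,
// honoring an optional per-row override (an assumption, see lead-in).
function intervalMs(probeType: ProbeType, overrideSeconds?: number): number | null {
  const seconds = overrideSeconds ?? DEFAULT_INTERVAL_SECONDS[probeType];
  return seconds === null ? null : seconds * 1000;
}
```

A scheduler would skip any probe for which `intervalMs` returns null.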

Authoring an incident (v1 — REST)

All admin routes require the MANAGE_STATUS CASL ability (mapped to OWNER + ADMIN roles).

1. Create a draft

POST /api/v1/admin/status/incidents
Authorization: Bearer <token>
Content-Type: application/json

{
  "title": "Elevated error rate on Stripe Payments",
  "severity": "major",
  "componentIds": ["<stripe-component-id>"]
}

2. Publish (fans out emails + Socket.IO push)

POST /api/v1/admin/status/incidents/<id>/publish
Authorization: Bearer <token>

3. Add updates as the incident progresses

POST /api/v1/admin/status/incidents/<id>/updates
Authorization: Bearer <token>
Content-Type: application/json

{
  "state": "identified",
  "bodyMd": "We have identified the root cause as a misconfigured webhook endpoint. A fix is being deployed."
}

4. Resolve

POST /api/v1/admin/status/incidents/<id>/resolve
Authorization: Bearer <token>
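
For scripting, the four requests above can be described with small builder functions. The sketch below only constructs request descriptors (method, path, headers, body) and leaves sending to whichever HTTP client is in use. The function names and the `AdminRequest` shape are hypothetical. `severity` and `state` are typed as plain strings because this page does not enumerate the accepted severity values.

```typescript
interface AdminRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body?: string;
}

const INCIDENTS_BASE = "/api/v1/admin/status/incidents";

// Shared builder: adds the bearer token, and a JSON body only when a payload is given.
function adminRequest(token: string, path: string, payload?: unknown): AdminRequest {
  const headers: Record<string, string> = { Authorization: `Bearer ${token}` };
  if (payload === undefined) return { method: "POST", path, headers };
  headers["Content-Type"] = "application/json";
  return { method: "POST", path, headers, body: JSON.stringify(payload) };
}

// 1. Create a draft
const createDraft = (token: string, title: string, severity: string, componentIds: string[]) =>
  adminRequest(token, INCIDENTS_BASE, { title, severity, componentIds });

// 2. Publish
const publishIncident = (token: string, id: string) =>
  adminRequest(token, `${INCIDENTS_BASE}/${encodeURIComponent(id)}/publish`);

// 3. Add an update
const addUpdate = (token: string, id: string, state: string, bodyMd: string) =>
  adminRequest(token, `${INCIDENTS_BASE}/${encodeURIComponent(id)}/updates`, { state, bodyMd });

// 4. Resolve
const resolveIncident = (token: string, id: string) =>
  adminRequest(token, `${INCIDENTS_BASE}/${encodeURIComponent(id)}/resolve`);
```

Keeping the builders free of network calls makes them easy to unit test and to reuse from a CLI wrapper.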

Valid state transitions: investigating → identified → monitoring → resolved.
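
A client-side guard for this ordering could look like the sketch below. It assumes the chain is strictly linear, so an incident may only advance to the immediately following state. Whether the API also accepts skipping a stage is not stated here, so treat that restriction as an assumption. `canTransition` is a hypothetical name.

```typescript
const INCIDENT_STATES = ["investigating", "identified", "monitoring", "resolved"] as const;
type IncidentState = (typeof INCIDENT_STATES)[number];

// Assumption: only a single forward step is allowed (no skipping, no going back).
function canTransition(from: IncidentState, to: IncidentState): boolean {
  return INCIDENT_STATES.indexOf(to) === INCIDENT_STATES.indexOf(from) + 1;
}
```

Checking this before posting an update avoids a round trip that the server would reject.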


Manual override

Pin a component to a specific status (e.g., during planned maintenance):

PATCH /api/v1/admin/status/components/<id>/override
Authorization: Bearer <token>
Content-Type: application/json

{
  "status": "maintenance",
  "reason": "Planned database upgrade",
  "expiresAt": "2026-04-26T03:00:00Z"
}

expiresAt is required and must be no more than 24 hours in the future. An hourly expiry job clears expired overrides automatically.
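
The expiresAt rule, together with the "override wins if not expired" rule from the architecture diagram, can be sketched as two small pure functions. The names, error strings, and the `OverrideRequest` type are hypothetical. The real validation lives in the API and may differ.

```typescript
const MAX_OVERRIDE_MS = 24 * 60 * 60 * 1000; // 24 hours

interface OverrideRequest {
  status: string;
  reason: string;
  expiresAt?: string; // ISO 8601 timestamp
}

// Returns a list of problems; an empty list means the payload passes these checks.
function validateOverride(req: OverrideRequest, now: Date = new Date()): string[] {
  if (!req.expiresAt) return ["expiresAt is required"];
  const expires = Date.parse(req.expiresAt);
  if (isNaN(expires)) return ["expiresAt is not a valid timestamp"];

  const errors: string[] = [];
  if (expires <= now.getTime()) errors.push("expiresAt must be in the future");
  if (expires - now.getTime() > MAX_OVERRIDE_MS) {
    errors.push("expiresAt must be at most 24h ahead");
  }
  return errors;
}

// "Override wins if not expired": choose the status to display for a component.
function effectiveStatus(
  probeStatus: string,
  override: { status: string; expiresAt: string } | null,
  now: Date = new Date(),
): string {
  if (override && Date.parse(override.expiresAt) > now.getTime()) return override.status;
  return probeStatus;
}
```

Comparing against expiresAt at read time means a stale override stops applying as soon as it expires, even if the hourly job has not yet cleared the row.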

Clear early:

DELETE /api/v1/admin/status/components/<id>/override

Feeds

Format     URL
RSS 2.0    /api/v1/status/feed.rss
JSON       /api/v1/status/summary.json

Audit log

Every status transition, override set/clear, and incident lifecycle event writes an append-only row to status_audit_log. UPDATE and DELETE privileges are revoked on the table (REVOKE UPDATE, DELETE), so the application cannot mutate history.

Query:

GET /api/v1/admin/status/audit?entityType=component&entityId=<id>
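
When building this query from code, the parameters should be URL-encoded. A minimal sketch, where `auditQueryPath` is a hypothetical helper and only the two query parameters shown above are assumed to exist:

```typescript
// Builds the audit query path with both parameters percent-encoded.
function auditQueryPath(entityType: string, entityId: string): string {
  return (
    "/api/v1/admin/status/audit" +
    `?entityType=${encodeURIComponent(entityType)}` +
    `&entityId=${encodeURIComponent(entityId)}`
  );
}
```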

Data retention

Data                                      Retention
Raw status_health_check rows              25 hours (daily prune job)
status_uptime_bucket hourly aggregates    90 days
status_audit_log                          Permanent (append-only)

v2 deferrals

  • Admin authoring UI — v1 ops uses REST/curl.
  • Spanish localization — schema includes language column; templates add .es.html in v2.
  • Auto-detect drafts — probes never auto-create incidents in v1; auto_detected column reserved.
  • Independent hosting — landing page stays on Cloud Run for v1; GCS snapshot covers the outage case.