SRE Manual


This page introduces observability and operational entry points for Univer services, including metrics, logs, and common diagnostics.

Observability

Univer provides:

  • Prometheus metrics
  • Service logs (stdout + persistent volumes)

If you enable the built-in observability stack, you can view the dashboards in Grafana. Otherwise, download the dashboards and import them into your own monitoring system.

Metrics

Built-in Grafana dashboards cover:

  • Infrastructure: RabbitMQ, Redis, RDS
  • Collaboration: collaboration-server
  • Core service: universer
  • Import/export: exchange

Use these dashboards to monitor performance, availability, and business KPIs. To use them in an external system, download the dashboards here:

https://release-univer.oss-cn-shenzhen.aliyuncs.com/release/univer-grafana-dashboards.tar.gz
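As a minimal sketch of preparing the dashboards for import into an external system, the helper below unpacks a downloaded copy of the archive and lists the JSON files it finds. The internal layout of the archive is not described on this page, so the assumption that it contains `.json` dashboard definitions is illustrative, and `extract_dashboards` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path

# URL taken from this page; the archive's internal layout is an assumption.
DASHBOARDS_URL = (
    "https://release-univer.oss-cn-shenzhen.aliyuncs.com"
    "/release/univer-grafana-dashboards.tar.gz"
)


def extract_dashboards(archive_path: str, dest_dir: str) -> list[Path]:
    """Unpack a gzip-compressed tar archive and return the JSON files inside it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, "r:gz") as archive:
        # Refuse member names that would resolve outside the destination.
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    # Assumption: dashboard definitions are shipped as .json files.
    return sorted(dest.rglob("*.json"))
```

After downloading the archive from `DASHBOARDS_URL` (for example with `urllib.request.urlretrieve`), calling `extract_dashboards("univer-grafana-dashboards.tar.gz", "dashboards")` returns the paths you would then import through your own Grafana instance.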

Dashboard Samples

  1. Golden signals (QPS, SLI, error distribution)

[Figure: Golden signals]

  2. Infrastructure performance & availability

[Figure: RDS metrics]

[Figure: Redis & MQ metrics]

  3. Collaboration metrics

[Figure: Collaboration metrics]

  4. USIP integration metrics

[Figure: USIP metrics]

Service Logs

Docker Compose

Logs are written to stdout and to persistent volumes, where they are retained for up to 30 days or 1 GB.

  • universer: /logs/universer/
  • exchange: /logs/worker-exchange/
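For a Docker Compose deployment without Loki, the log directories above can be searched directly. The sketch below assumes the files under those directories are uncompressed text; file names and line format are not documented on this page, so the helper (hypothetically named `find_in_logs`) only does a plain substring match.

```python
from pathlib import Path
from typing import Iterator

# Directories listed on this page for the Docker Compose deployment.
LOG_DIRS = {
    "universer": "/logs/universer/",
    "exchange": "/logs/worker-exchange/",
}


def find_in_logs(log_dir: str, needle: str) -> Iterator[tuple[str, int, str]]:
    """Yield (file name, line number, line) for each line containing needle."""
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file contains stray bytes.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield path.name, number, line.rstrip("\n")
```

For example, `list(find_in_logs(LOG_DIRS["universer"], "biz_code=xxx"))` collects every matching line from the universer logs, with `xxx` replaced by the code you are looking for.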

K8s / Loki

With built-in observability, you can query logs in Grafana -> Explore using Loki:

[Figure: Loki search]
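The same kind of search can be issued outside Grafana through Loki's HTTP API. The sketch below builds a `query_range` request URL with a LogQL line filter. The Loki address and the `{app="universer"}` stream selector are assumptions for illustration, since the labels attached by the built-in stack are not listed on this page.

```python
from urllib.parse import urlencode


def loki_query_url(
    base_url: str,
    selector: str,
    contains: str,
    start_ns: int,
    end_ns: int,
    limit: int = 100,
) -> str:
    """Build a Loki query_range URL that filters a stream by substring."""
    # Escape backslashes and quotes so the filter is a valid LogQL string.
    escaped = contains.replace("\\", "\\\\").replace('"', '\\"')
    logql = f'{selector} |= "{escaped}"'
    params = {
        "query": logql,
        "start": start_ns,  # Unix time in nanoseconds
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```

A call such as `loki_query_url("http://loki:3100", '{app="universer"}', "biz_code=xxx", start_ns, end_ns)` yields a URL you can fetch with any HTTP client. The LogQL expression it encodes can also be pasted directly into the Explore query box.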

Common Diagnostics

  1. Feature outage: check the SLIs for RDS, MQ, and Redis
  2. API errors: inspect the universer SLI and error codes
  3. Log search: query biz_code=xxx in Loki
  4. Capacity issues: monitor QPS, latency, CPU, and memory
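A first availability check can also be scripted against the Prometheus HTTP API. This page does not list the Univer-specific metric names, so the sketch below relies only on Prometheus's built-in `up` metric. The Prometheus address is an assumption, and both function names are hypothetical.

```python
from urllib.parse import urlencode


def prometheus_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def down_targets(payload: dict) -> list[dict]:
    """Return the label sets of scrape targets whose `up` sample is 0.

    `payload` is the decoded JSON body of an instant query for `up`.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample is {"metric": {labels}, "value": [timestamp, "string value"]}.
    return [
        sample["metric"]
        for sample in data["result"]
        if float(sample["value"][1]) == 0.0
    ]
```

Fetching `prometheus_query_url("http://prometheus:9090", "up")` and passing the decoded JSON to `down_targets` gives a quick list of scrape targets that are currently unreachable, which helps narrow down a feature outage before opening the dashboards.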
