grafana/grafana

Alerting (ngalert)

pkg/services/ngalert/ is the unified alerting backend introduced in Grafana 8 (June 2021) and now the only supported alerting engine. The package name ngalert ("next-gen alert") is a historical artifact — there is no other engine.

What ngalert does

  • Stores alert rules (a query + condition + interval).
  • Evaluates rules on schedule via the schedule package.
  • Computes per-rule alert state and stores it in state.
  • Routes firing/resolved alerts through a built-in (or remote) Alertmanager — notifier.
  • Exposes both Grafana-native and Prometheus-compatible APIs.

Layout

pkg/services/ngalert/
├── ngalert.go               # Top-level service constructor (~3.6k LOC)
├── api/                     # /api/v1/provisioning, /api/ruler, /api/alertmanager
├── eval/                    # Rule evaluation (data → result)
├── schedule/                # Scheduler that ticks rules at their interval
├── state/                   # Alert state machine + history
├── store/                   # SQL store (rules, am config, instance state)
├── models/                  # Shared types
├── notifier/                # Alertmanager (built-in + remote variants)
├── remote/                  # Remote Alertmanager client
├── sender/                  # Sends alerts to Alertmanager
├── provisioning/            # File-based provisioning
├── prom/                    # Prometheus-style API compatibility
├── image/                   # Screenshot rendering for notifications
├── writer/                  # Recording rule writer (to remote write)
├── lokiconfig/              # Loki config helpers (history backend)
├── backtesting/             # Replay rules against historical data
├── cluster/                 # HA coordination
├── metrics/                 # Prometheus metrics
└── limits.go                # Rule limits & quotas

High-level flow

graph TD
    Schedule[schedule<br/>scheduler.Run] -->|tick| Eval[eval<br/>EvaluateRule]
    Eval -->|query datasources| Plugins[Plugin host]
    Eval --> StateMgr[state<br/>StateManager]
    StateMgr --> Store[store<br/>instance state DB]
    StateMgr -->|state changes| Sender[sender]
    Sender --> AM[Alertmanager<br/>notifier or remote]
    AM --> Notifiers[Contact points<br/>email, slack, pd, ...]
    StateMgr -->|history| Loki[(Optional Loki<br/>history backend)]

pkg/services/ngalert/schedule/ is the heart of the engine: it owns one scheduler per organization that wakes up at a fixed tick (default 10s), pulls rules due for evaluation, and dispatches each to an evaluator goroutine. The evaluator runs the rule's queries through pkg/services/query/ (which talks to the plugin host), evaluates the condition, and produces a set of alert instances.

The state manager in pkg/services/ngalert/state/ maintains the per-instance state machine (Pending, Firing, Normal, NoData / Error) and writes transitions to the database and to the Alertmanager.

Alertmanager

notifier/ embeds Prometheus' Alertmanager directly so that Grafana can serve as a complete alerting pipeline. Each org has its own AM config (routing tree + receivers) stored in the database.

For users who prefer to run their own external Alertmanager, the remote/ and sender/ packages forward alerts via HTTP to a remote Alertmanager and proxy the API back through Grafana for the UI.

Provisioning

pkg/services/ngalert/provisioning/ implements the file- and API-based provisioning of rules, contact points, mute timings, notification policies, and templates. The /api/v1/provisioning/* endpoints are the canonical interface for IaC tools (e.g. Terraform).

Provisioned objects can be marked disable_provenance: true, which leaves them editable in the UI; with false, they are read-only in the UI. Provenance tracking is in pkg/services/ngalert/provisioning/provenance.go.
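The relationship between provenance and UI editability can be sketched as below. The Provenance type, its three values, and the stored, editableInUI, and describe helpers are assumptions made for this illustration and may not match what provenance.go defines.

```go
package main

import "fmt"

// Provenance records where an object was last written from. The three values
// below are assumptions for this sketch.
type Provenance string

const (
	ProvenanceNone Provenance = ""     // created or edited in the UI
	ProvenanceAPI  Provenance = "api"  // written through the provisioning API
	ProvenanceFile Provenance = "file" // loaded from a provisioning file
)

// stored returns the provenance to persist for a write coming from source.
// With disableProvenance set, the object is saved as if it came from the UI.
func stored(source Provenance, disableProvenance bool) Provenance {
	if disableProvenance {
		return ProvenanceNone
	}
	return source
}

// editableInUI reports whether the UI should allow edits to the object.
func editableInUI(p Provenance) bool {
	return p == ProvenanceNone
}

// describe renders a one-line summary, for example for logs.
func describe(p Provenance) string {
	return fmt.Sprintf("provenance=%q editable=%t", string(p), editableInUI(p))
}

func main() {
	fmt.Println(describe(stored(ProvenanceAPI, false))) // read-only in the UI
	fmt.Println(describe(stored(ProvenanceAPI, true)))  // still editable in the UI
}
```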

Recording rules

Recording rules — pre-computed queries written to a remote-write endpoint — are evaluated by the same scheduler and emitted via pkg/services/ngalert/writer/ to a Prometheus-compatible target.

Multi-tenancy and HA

ngalert respects orgs: each org has its own rule set, AM config, and metrics. The HA cluster code in pkg/services/ngalert/cluster/ lets multiple Grafana replicas share the load using gossip (memberlist) so that each rule is evaluated exactly once per tick across the cluster.

Backtesting

backtesting/ runs rules against arbitrary historical data ranges to preview alert behavior before a rule is committed. It is used by the rule editor's "Preview" feature.

API surface

  • /api/v1/provisioning/* — Grafana-native provisioning API.
  • /api/ruler/* — Prometheus-compatible Ruler API for rule CRUD.
  • /api/alertmanager/* — Alertmanager API (silences, status, …).
  • /api/v1/eval — Evaluate a query+condition without saving.
  • /api/v1/rules/*, /api/v1/alerts — Prometheus-compatible read API.

All of these mount through api/api.go and check RBAC actions like alert.rules:read, alert.rules:write, alert.notifications:read defined in pkg/services/ngalert/accesscontrol/.

Frontend counterpart

The unified alerting UI lives in public/app/features/alerting/unified/ — that directory has its own AGENTS.md and is one of the largest features in the repo. It uses RTK Query against the APIs above.

Key source files

File                                         Purpose
pkg/services/ngalert/ngalert.go              Service constructor + DI assembly
pkg/services/ngalert/schedule/schedule.go    Per-org scheduler
pkg/services/ngalert/eval/                   Rule evaluation
pkg/services/ngalert/state/manager.go        State machine
pkg/services/ngalert/notifier/               Embedded Alertmanager
pkg/services/ngalert/remote/                 Remote Alertmanager client
pkg/services/ngalert/provisioning/           Provisioning + provenance
pkg/services/ngalert/api/                    HTTP handlers
apps/alerting/                               Newer app-platform resources for alerting

Where to start when modifying alerting

  • Add a new contact-point type: implement the receiver in pkg/services/ngalert/notifier/channels/ (re-exported from the embedded Alertmanager fork) and update the frontend's contact-point editor.
  • Change rule evaluation behavior: read schedule/schedule.go and the eval pipeline in eval/.
  • Tune the state machine: state/manager.go and state/state.go.
  • Add a new provisioning endpoint: edit provisioning/<resource>.go and register the route in api/.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
