hashicorp/consul

Debugging

How to figure out what a running Consul cluster is doing.

Logging

  • The agent uses github.com/hashicorp/go-hclog. Configuration lives in logging/ and is wired in agent/setup.go.
  • Log level is configurable via the log_level agent config (or -log-level=). Common values: trace, debug, info, warn, error.
  • Per-subsystem loggers are obtained with logger.Named("rpc"), logger.Named("xds"), etc. Search the codebase for Named( to find all the subsystem names.
  • For high-volume xDS traffic, the agent supports log dropping via agent/log-drop/ to avoid saturating disks.

consul debug

consul debug gathers a bundle of Raft state, metrics, profiles, agent config, and logs over a configurable duration. Code: command/debug/.

consul debug -duration=2m -interval=10s -archive=true

Open the archive with any tar tool. The bundle is what HashiCorp Support typically asks for.

consul troubleshoot

The troubleshoot/ Go module powers the consul troubleshoot CLI for mesh wiring problems:

consul troubleshoot upstreams      # List the Envoy upstreams visible from a sidecar
consul troubleshoot proxy           # Diagnose connectivity from a sidecar's Envoy to an upstream
consul troubleshoot ports           # Check whether Consul ports on a host are reachable

Code in troubleshoot/proxy/ and troubleshoot/connect/. CLI commands in command/troubleshoot/.

Profiling and pprof

The HTTP API exposes pprof when enable_debug=true is set on the agent. Endpoints are mounted in agent/http_register.go under /debug/pprof/.

go tool pprof 'http://127.0.0.1:8500/debug/pprof/profile?seconds=30'

Common error patterns

| Symptom | Likely cause / where to look |
| --- | --- |
| No cluster leader | Raft hasn't elected. Check agent/consul/leader.go start-up, gossip connectivity, time skew |
| connection refused from clients to servers | RPC port (8300) blocked, or agent not started; see agent/consul/server.go:listen |
| RPC failed to server X repeatedly | Server is down or partitioned. Autopilot logs in agent/consul/autopilot.go |
| permission denied on KV/catalog | ACL token missing or insufficient. Trace acl/acl.go and agent/consul/acl_endpoint.go |
| Envoy reports xds: stream config update failed | agent/xds/delta.go rejects an update; usually a stale or invalid resource |
| Mesh mTLS handshake errors | Leaf cert expired or root rotated. See agent/leafcert/ and agent/consul/leader_connect_ca.go |
| intention denied | A service-intentions config entry blocks the call. Inspect via consul intention list |
| Anti-entropy spam in logs | Local services drifting from server view. Check agent/local/ and agent/ae/ |
| peering: invalid token | Peering bootstrap token mismatch; see agent/consul/peering_backend.go |

Tracing slow queries

agent/blockingquery/ and agent/consul/rpc.go contain the blocking query plumbing. To see what's slow:

  • Set telemetry.prometheus_retention_time to a non-zero duration and scrape the agent's /v1/agent/metrics?format=prometheus.
  • Look at consul.rpc.query.* and consul.fsm.* metrics.
  • For xDS, watch consul.xds.server.streamDrained and the per-resource metrics in agent/xds/server.go.

Reproducing in a unit test

Most subsystems have a testing.go with builders. Pattern:

a := agent.NewTestAgent(t, `
    server = true
    bootstrap = true
`)
defer a.Shutdown()

// Hit the HTTP API
req := httptest.NewRequest("GET", "/v1/catalog/services", nil)
resp := httptest.NewRecorder()
a.HTTPAPI().ServeHTTP(resp, req)

Reach into the FSM directly when isolating Raft or state-store regressions:

fsm := fsm.New(...)
fsm.Apply(&raft.Log{Data: encodedRequest})
fsm.State().<index>.<getter>(...)

See Patterns and conventions for the standard ways tests construct dependencies.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
