hashicorp/consul

Gossip and federation

Consul nodes find each other via Serf gossip. There are two pools: a LAN pool per datacenter that includes every agent (clients and servers), and a WAN pool that includes only servers and federates datacenters. Both are built on github.com/hashicorp/serf and github.com/hashicorp/memberlist.

Purpose

  • LAN gossip: failure detection, member discovery, and lightweight event broadcast within a DC. Every node hears about every other node within seconds. Client agents use this pool to find a server to send RPCs to.
  • WAN gossip: connects servers across datacenters so RPCs can be forwarded to the right DC. It also supports replication of federation state (gateway registration, intentions, CA roots).
  • User events: a publish/subscribe layer on top of gossip used by consul event and consul exec.

Directory layout

agent/consul/
├── server_serf.go              # Server-side LAN + WAN serf wiring
├── client_serf.go              # Client-side LAN serf wiring
├── flood.go                    # Periodic re-broadcast helpers
├── filter.go                   # Serf event filtering
├── merge.go, merge_ce.go       # Member metadata merge logic (e.g., DC validation)
├── gateway_locator.go          # Pick mesh gateway for cross-DC traffic
├── leader_federation_state_ae.go  # WAN federation state anti-entropy (leader loop)
├── federation_state_endpoint.go   # RPC endpoints for federation state
├── federation_state_replication.go # Cross-DC FSM replication
├── wanfed/                     # WAN federation forwarder for in-process RPC
agent/router/                   # Server selection and routing area helpers
agent/pool/                     # TCP connection pool used over gossip-discovered addresses
internal/gossip/                # Optional gossip-related code (e.g., consul-specific encryption)

Key abstractions

| Type | File | Purpose |
| --- | --- | --- |
| *serf.Serf | agent/consul/server_serf.go | One per pool (LAN, WAN); owns the gossip state machine |
| Server.handleSerfEvents | agent/consul/server_serf.go | Translates Serf events into Raft membership and routing updates |
| agent/router.Router | agent/router/router.go | Routing area + server preference (per-DC, per-area) |
| pool.ConnPool | agent/pool/pool.go | TCP+TLS connection pool for outbound RPC |
| wanfed.Transport | agent/consul/wanfed/wanfed.go | Forwards RPCs across DCs over the WAN connection pool |
| FederationStateRequest | agent/structs/federation_state.go | Replicated record holding mesh-gateway addresses per DC |
| gateway_locator | agent/consul/gateway_locator.go | Picks a mesh gateway target for cross-DC forwarding |

Lifecycle of a join

sequenceDiagram
    participant New as New agent
    participant Bootstrap as -retry-join target
    participant LAN as LAN serf members

    New->>Bootstrap: Serf join (gossip ping)
    Bootstrap-->>New: alive members + cluster meta
    New->>LAN: gossip its presence (UDP + TCP)
    LAN-->>New: heartbeat, suspicion, refute cycles
    New->>Bootstrap: server discovery (consul.LANEvent)
    Note over New: agent caches server addresses<br/>via agent/router/
    Bootstrap->>LAN: announce new node (Member event)
    LAN-->>LAN: each node updates router.Router
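The first step of the diagram (contacting a -retry-join target until it answers) can be sketched as a small retry loop. This is a minimal illustration, not Consul's implementation: joinFunc, retryJoin, and errRefused are hypothetical names, and the join callback merely stands in for a real Serf join call.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRefused is a hypothetical error used by the simulated join targets.
var errRefused = errors.New("connection refused")

// joinFunc is a hypothetical stand-in for a Serf join call: it contacts one
// address and reports how many members it learned about.
type joinFunc func(addr string) (int, error)

// retryJoin tries every target in order and repeats the whole list up to
// maxAttempts times, sleeping for wait between rounds.
func retryJoin(targets []string, maxAttempts int, wait time.Duration, join joinFunc) (int, error) {
	lastErr := errors.New("join never succeeded")
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		for _, addr := range targets {
			n, err := join(addr)
			if err == nil {
				return n, nil
			}
			lastErr = fmt.Errorf("attempt %d, target %s: %w", attempt, addr, err)
		}
		if attempt < maxAttempts {
			time.Sleep(wait)
		}
	}
	return 0, lastErr
}

func main() {
	calls := 0
	// Simulated bootstrap server that only answers on the third call.
	flaky := func(addr string) (int, error) {
		calls++
		if calls < 3 {
			return 0, errRefused
		}
		return 5, nil
	}
	n, err := retryJoin([]string{"10.0.0.1:8301"}, 5, 10*time.Millisecond, flaky)
	fmt.Println("members learned:", n, "error:", err, "calls:", calls)
}
```

Once the join succeeds, the remaining steps in the diagram (gossiping presence, failure-detection cycles, router updates) are driven by Serf events rather than by the joining code.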

Multi-datacenter routing

When an RPC targets a non-local DC:

  1. agent/consul/rpc.go checks the request's Datacenter field.
  2. If non-local, it resolves a server in that DC via the WAN serf member list.
  3. wanfed.Transport opens a TLS connection (using the WAN address from federation state), forwards the request, and returns the response.
  4. If a mesh gateway is required (when WAN federation runs over mesh gateways instead of direct WAN), gateway_locator picks one.
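The decision in steps 1, 2, and 4 can be sketched as follows. This is a simplified model under stated assumptions: the router type, its fields, and errNoPath are hypothetical and do not mirror agent/router.Router or gateway_locator; real selection also weighs server health and preference, which this sketch ignores.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPath is a hypothetical error for a datacenter with no known route.
var errNoPath = errors.New("no path to datacenter")

// router is a hypothetical, simplified view of cross-DC routing state.
type router struct {
	localDC     string
	wanServers  map[string][]string // DC -> server addresses learned from the WAN pool
	gateways    map[string][]string // DC -> mesh gateway addresses from federation state
	viaGateways bool                // true when WAN federation runs over mesh gateways
}

// route returns "local" for local requests, otherwise the first known
// server or mesh gateway address for the target datacenter.
func (r *router) route(targetDC string) (string, error) {
	if targetDC == "" || targetDC == r.localDC {
		return "local", nil
	}
	candidates := r.wanServers[targetDC]
	if r.viaGateways {
		candidates = r.gateways[targetDC]
	}
	if len(candidates) == 0 {
		return "", fmt.Errorf("%w: %q", errNoPath, targetDC)
	}
	return candidates[0], nil
}

func main() {
	r := &router{
		localDC:    "dc1",
		wanServers: map[string][]string{"dc2": {"10.2.0.1:8300"}},
		gateways:   map[string][]string{"dc2": {"gw.dc2.example:8443"}},
	}
	fmt.Println(r.route("dc1"))
	fmt.Println(r.route("dc2"))
	r.viaGateways = true
	fmt.Println(r.route("dc2"))
	fmt.Println(r.route("dc3"))
}
```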

Federation state

Federation state is a per-DC record stored as a Raft log entry containing:

  • The DC's mesh gateway addresses.
  • Replication tokens for cross-DC ACL replication.
  • Other ops-level metadata.

The leader periodically pushes its DC's federation state to the primary DC and pulls every other DC's state back (see agent/consul/leader_federation_state_ae.go). In effect, this is anti-entropy at datacenter granularity.
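One round of that push/pull cycle might look like the sketch below. It assumes each record carries a monotonically increasing Index so that staleness can be compared; the federationState and store types and the syncOnce function are hypothetical and are not the structures used in leader_federation_state_ae.go.

```go
package main

import "fmt"

// federationState is a hypothetical, trimmed-down per-DC record.
type federationState struct {
	Datacenter   string
	MeshGateways []string
	Index        uint64 // assumed version counter; higher means newer
}

// store maps a datacenter name to the latest known record for it.
type store map[string]federationState

// syncOnce pushes the local DC's record to the primary if the primary's
// copy is missing or older, then pulls every other DC's record that is
// missing or older locally.
func syncOnce(localDC string, local, primary store) (pushed bool, pulled int) {
	if mine, ok := local[localDC]; ok {
		if theirs, found := primary[localDC]; !found || theirs.Index < mine.Index {
			primary[localDC] = mine
			pushed = true
		}
	}
	for dc, theirs := range primary {
		if dc == localDC {
			continue
		}
		if have, found := local[dc]; !found || have.Index < theirs.Index {
			local[dc] = theirs
			pulled++
		}
	}
	return pushed, pulled
}

func main() {
	local := store{"dc2": {Datacenter: "dc2", MeshGateways: []string{"10.2.0.9:8443"}, Index: 7}}
	primary := store{
		"dc1": {Datacenter: "dc1", MeshGateways: []string{"10.1.0.9:8443"}, Index: 3},
		"dc2": {Datacenter: "dc2", Index: 5},
	}
	fmt.Println(syncOnce("dc2", local, primary)) // first round does work
	fmt.Println(syncOnce("dc2", local, primary)) // second round finds nothing to do
}
```

Running the round twice shows the anti-entropy property: once both sides agree, a further round changes nothing.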

User events

agent/userevent.go and agent/consul/event_endpoint.go (plus the operator-side command/event/) use Serf's user-event broadcast:

  • Events are tagged with name, payload, node filter, service filter, tag filter.
  • They propagate via gossip and are de-duplicated using a leader-emitted identifier.
  • Used by consul exec for fan-out remote command execution and by the now-discouraged consul event workflow.
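How a receiving agent might apply the filters and drop duplicates can be sketched as below. The sketch assumes the filters are regular expressions with an empty filter matching everything, and that duplicates are recognized by an event ID; the userEvent and receiver types are hypothetical and simplified relative to agent/userevent.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// userEvent is a hypothetical, simplified event record.
type userEvent struct {
	ID, Name                             string
	Payload                              []byte
	NodeFilter, ServiceFilter, TagFilter string // assumed to be regular expressions
}

// receiver models one agent: its node name, its services (name -> tags),
// and the set of event IDs it has already processed.
type receiver struct {
	nodeName string
	services map[string][]string
	seen     map[string]bool
}

func newReceiver(node string, services map[string][]string) *receiver {
	return &receiver{nodeName: node, services: services, seen: map[string]bool{}}
}

// matches treats an empty pattern as "match all" and an invalid one as no match.
func matches(pattern, s string) bool {
	if pattern == "" {
		return true
	}
	ok, err := regexp.MatchString(pattern, s)
	return err == nil && ok
}

// accept reports whether the event should be delivered on this node.
// Every event ID is recorded, so a re-gossiped copy is dropped.
func (r *receiver) accept(ev userEvent) bool {
	if r.seen[ev.ID] {
		return false
	}
	r.seen[ev.ID] = true
	if !matches(ev.NodeFilter, r.nodeName) {
		return false
	}
	if ev.ServiceFilter == "" {
		return true
	}
	for name, tags := range r.services {
		if !matches(ev.ServiceFilter, name) {
			continue
		}
		if ev.TagFilter == "" {
			return true
		}
		for _, t := range tags {
			if matches(ev.TagFilter, t) {
				return true
			}
		}
	}
	return false
}

func main() {
	r := newReceiver("web-1", map[string][]string{"api": {"v2", "canary"}})
	ev := userEvent{ID: "e1", Name: "deploy", ServiceFilter: "^api$", TagFilter: "canary"}
	fmt.Println(r.accept(ev)) // first delivery
	fmt.Println(r.accept(ev)) // duplicate arriving via another gossip path
}
```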

Failure detection

Serf uses a SWIM-derived gossip protocol. Operationally:

  • Suspicion windows are governed by the gossip_lan and gossip_wan agent config blocks.
  • Dead members are retained for a tombstone window so that membership state does not flap when a node races to reconnect.
  • Rate limits (gossip nodes per packet, retransmit multiplier) are tunable, but the defaults are usually appropriate.
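To give a feel for how the multipliers interact with cluster size, the sketch below scales a suspicion timeout and a retransmit limit logarithmically with the number of nodes. The formulas are an assumption modeled on the kind of scaling memberlist applies, and probeInterval and the multiplier values are illustrative rather than Consul defaults; consult the memberlist source before relying on exact numbers.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// probeInterval is an illustrative probe interval, not a Consul default.
const probeInterval = time.Second

// suspicionTimeout grows with log10 of the cluster size, with a floor of
// one probe interval times the multiplier (assumed scaling).
func suspicionTimeout(suspicionMult, n int, interval time.Duration) time.Duration {
	nodeScale := math.Max(1.0, math.Log10(math.Max(1.0, float64(n))))
	// Scale by 1000 before converting to keep three decimal digits of precision.
	return time.Duration(suspicionMult) * time.Duration(nodeScale*1000) * interval / 1000
}

// retransmitLimit bounds how many times one update is re-gossiped,
// again scaling with log10 of the cluster size (assumed scaling).
func retransmitLimit(retransmitMult, n int) int {
	nodeScale := math.Ceil(math.Log10(float64(n + 1)))
	return retransmitMult * int(nodeScale)
}

func main() {
	for _, n := range []int{5, 50, 500, 5000} {
		fmt.Printf("nodes=%d suspicion=%v retransmits=%d\n",
			n, suspicionTimeout(4, n, probeInterval), retransmitLimit(4, n))
	}
}
```

Under this model a tenfold increase in cluster size adds only a constant amount to both values, which is one reason the defaults tend to hold up as a datacenter grows.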

Encryption

Optional symmetric encryption keys, generated by consul keygen, are propagated through the keyring (agent/keyring.go, command/keyring/). Multiple keys can co-exist during rotation. Encryption is per-pool: LAN, WAN, and any user-defined Enterprise areas can have their own key.
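Key rotation with several co-existing keys can be sketched with the standard library alone. The sketch assumes a 32-byte base64-encoded key and AES-GCM, encrypts with the primary key, and tries every installed key on decrypt; the keyring type and its methods are hypothetical and are not the API of agent/keyring.go or memberlist.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// generateKey returns 32 random bytes, base64-encoded (assumed key format).
func generateKey() (string, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(key), nil
}

// keyring holds decoded keys; keys[0] is the primary used for encryption.
type keyring struct{ keys [][]byte }

// install decodes a key and appends it as a non-primary key.
func (k *keyring) install(encoded string) error {
	key, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return err
	}
	if n := len(key); n != 16 && n != 24 && n != 32 {
		return errors.New("key must be 16, 24, or 32 bytes")
	}
	k.keys = append(k.keys, key)
	return nil
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals the message with the primary key and prepends the nonce.
func (k *keyring) encrypt(plaintext []byte) ([]byte, error) {
	if len(k.keys) == 0 {
		return nil, errors.New("keyring is empty")
	}
	gcm, err := newGCM(k.keys[0])
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt tries each installed key until one authenticates the message.
func (k *keyring) decrypt(msg []byte) ([]byte, error) {
	for _, key := range k.keys {
		gcm, err := newGCM(key)
		if err != nil || len(msg) < gcm.NonceSize() {
			continue
		}
		nonce, ct := msg[:gcm.NonceSize()], msg[gcm.NonceSize():]
		if pt, err := gcm.Open(nil, nonce, ct, nil); err == nil {
			return pt, nil
		}
	}
	return nil, errors.New("no installed key could decrypt the message")
}

func main() {
	oldKey, _ := generateKey()
	newKey, _ := generateKey()
	sender, receiver := &keyring{}, &keyring{}
	_ = sender.install(oldKey)
	_ = receiver.install(newKey) // receiver already rotated its primary
	_ = receiver.install(oldKey) // but still accepts the old key
	msg, _ := sender.encrypt([]byte("ping"))
	pt, err := receiver.decrypt(msg)
	fmt.Println(string(pt), err)
}
```

Trying every installed key on decrypt is what lets nodes at different stages of a rotation keep talking to each other until the old key is removed everywhere.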

Integration points

  • Autopilot (agent/consul/autopilot.go) consumes Serf events to detect dead servers and recommend removal from Raft.
  • Anti-entropy uses the LAN pool to reach a server quickly for catalog reconciliation.
  • xDS capacity (agent/consul/xdscapacity/) tracks how many proxies each server is feeding so the WAN forwarding layer can balance.
  • Cluster peering is the alternative to WAN gossip-based federation; new deployments are encouraged to use peering for cross-cluster connectivity. See Cluster peering.

Entry points for modification

  • Tune gossip parameters: see agent/config/source_default.go (gossip_lan, gossip_wan blocks).
  • Add a new event type: use Serf user events for ops broadcasts; for other new features, prefer streaming subscriptions over gossip.
  • Change cross-DC routing: look at wanfed/ for the transport, gateway_locator.go for the selection logic.
  • Add a member metadata field: extend merge.go and ensure all servers tolerate the field via the merge function.
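The last point can be illustrated with a small merge check. The sketch assumes members carry their datacenter in a "dc" tag and shows one way to read a new, optional field so that members which do not yet send it are still accepted; the member type, validateMerge, tagOrDefault, and the "zone" tag are hypothetical and do not reproduce merge.go.

```go
package main

import "fmt"

// member is a hypothetical, simplified gossip member with metadata tags.
type member struct {
	Name string
	Tags map[string]string
}

// validateMerge rejects a merge if any incoming member lacks a datacenter
// tag or belongs to a different datacenter than the local pool.
func validateMerge(localDC string, incoming []member) error {
	for _, m := range incoming {
		dc, ok := m.Tags["dc"]
		if !ok {
			return fmt.Errorf("member %q has no dc tag", m.Name)
		}
		if dc != localDC {
			return fmt.Errorf("member %q is in datacenter %q, not %q", m.Name, dc, localDC)
		}
	}
	return nil
}

// tagOrDefault reads an optional metadata field, falling back to a default
// so that members running an older version remain valid.
func tagOrDefault(m member, key, fallback string) string {
	if v, ok := m.Tags[key]; ok {
		return v
	}
	return fallback
}

func main() {
	old := member{Name: "srv-1", Tags: map[string]string{"dc": "dc1"}}
	upgraded := member{Name: "srv-2", Tags: map[string]string{"dc": "dc1", "zone": "a"}}
	stray := member{Name: "srv-9", Tags: map[string]string{"dc": "dc2"}}

	fmt.Println(validateMerge("dc1", []member{old, upgraded}))
	fmt.Println(validateMerge("dc1", []member{old, stray}))
	fmt.Println(tagOrDefault(old, "zone", "unknown"), tagOrDefault(upgraded, "zone", "unknown"))
}
```

Treating the new field as optional during validation is what allows a rolling upgrade, since old and new servers share one pool until every node has been replaced.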

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
