hashicorp/consul
Gossip and federation
Consul nodes find each other via Serf gossip. There are two pools: a LAN pool per datacenter that includes every agent (clients and servers), and a WAN pool that includes only servers and federates datacenters. Both are built on github.com/hashicorp/serf and github.com/hashicorp/memberlist.
Purpose
- LAN gossip: failure detection, member discovery, and lightweight event broadcast within a DC. Every node learns about every other node within seconds. Client agents use it to find a server to send RPCs to.
- WAN gossip: connects servers across datacenters so RPCs can be forwarded to the right DC. It also supports replication of federation state (gateway registration, intentions, CA roots).
- User events: a publish/subscribe layer on top of gossip, used by consul event and consul exec.
Directory layout
agent/consul/
├── server_serf.go # Server-side LAN + WAN serf wiring
├── client_serf.go # Client-side LAN serf wiring
├── flood.go # Periodic re-broadcast helpers
├── filter.go # Serf event filtering
├── merge.go, merge_ce.go # Member metadata merge logic (e.g., DC validation)
├── gateway_locator.go # Pick mesh gateway for cross-DC traffic
├── leader_federation_state_ae.go # WAN federation state anti-entropy (leader loop)
├── federation_state_endpoint.go # RPC endpoints for federation state
├── federation_state_replication.go # Cross-DC FSM replication
├── wanfed/ # WAN federation forwarder for in-process RPC
agent/router/ # Server selection and routing area helpers
agent/pool/ # TCP connection pool used over gossip-discovered addresses
internal/gossip/ # Optional gossip-related code (e.g., consul-specific encryption)
Key abstractions
| Type | File | Purpose |
|---|---|---|
| *serf.Serf | agent/consul/server_serf.go | One per pool (LAN, WAN); owns the gossip state machine |
| Server.handleSerfEvents | agent/consul/server_serf.go | Translates Serf events into Raft membership and routing updates |
| agent/router.Router | agent/router/router.go | Routing area + server preference (per-DC, per-area) |
| pool.ConnPool | agent/pool/pool.go | TCP+TLS connection pool for outbound RPC |
| wanfed.Transport | agent/consul/wanfed/wanfed.go | Forwards RPCs across DCs over the WAN connection pool |
| FederationStateRequest | agent/structs/federation_state.go | Replicated record holding mesh-gateway addresses per DC |
| gateway_locator | agent/consul/gateway_locator.go | Picks a mesh gateway target for cross-DC forwarding |
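To make the role of an event handler such as Server.handleSerfEvents more concrete, here is a minimal sketch of translating membership events into a routing table. The types below (EventType, Member, MemberEvent, ServerSet) are hypothetical stand-ins, not the real serf or router packages, and the role=consul and dc tag names are assumptions of this sketch.

```go
package main

import "sort"

// EventType, Member, and MemberEvent are hypothetical stand-ins for Serf's
// member events; they are not the real serf package types.
type EventType int

const (
	MemberJoin EventType = iota
	MemberLeave
	MemberFailed
)

type Member struct {
	Name string
	Addr string
	Tags map[string]string
}

type MemberEvent struct {
	Type    EventType
	Members []Member
}

// ServerSet is a simplified routing table: datacenter -> server name -> address.
type ServerSet struct {
	byDC map[string]map[string]string
}

func NewServerSet() *ServerSet {
	return &ServerSet{byDC: make(map[string]map[string]string)}
}

// Handle applies one membership event. Only members tagged role=consul are
// treated as servers (an assumption of this sketch); other agents are ignored.
func (s *ServerSet) Handle(ev MemberEvent) {
	for _, m := range ev.Members {
		if m.Tags["role"] != "consul" {
			continue
		}
		dc := m.Tags["dc"]
		switch ev.Type {
		case MemberJoin:
			if s.byDC[dc] == nil {
				s.byDC[dc] = make(map[string]string)
			}
			s.byDC[dc][m.Name] = m.Addr
		case MemberLeave, MemberFailed:
			// Deleting from a missing inner map is a no-op in Go.
			delete(s.byDC[dc], m.Name)
		}
	}
}

// Servers returns the known server names for a datacenter in sorted order.
func (s *ServerSet) Servers(dc string) []string {
	names := make([]string, 0, len(s.byDC[dc]))
	for name := range s.byDC[dc] {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

A caller would feed every event from the gossip layer into Handle and consult Servers when choosing an RPC target. The real handler also drives Raft membership, which this sketch leaves out.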
Lifecycle of a join
sequenceDiagram
participant New as New agent
participant Bootstrap as -retry-join target
participant LAN as LAN serf members
New->>Bootstrap: Serf join (gossip ping)
Bootstrap-->>New: alive members + cluster meta
New->>LAN: gossip its presence (UDP + TCP)
LAN-->>New: heartbeat, suspicion, refute cycles
New->>Bootstrap: server discovery (consul.LANEvent)
Note over New: agent caches server addresses<br/>via agent/router/
Bootstrap->>LAN: announce new node (Member event)
LAN-->>LAN: each node updates router.Router
Multi-datacenter routing
When an RPC targets a non-local DC:
- agent/consul/rpc.go checks the request's Datacenter field.
- If non-local, it resolves a server in that DC via the WAN serf member list.
- wanfed.Transport opens a TLS connection (using the WAN address from federation state), forwards the request, and returns the response.
- If a mesh gateway is required (when WAN federation runs over mesh gateways instead of direct WAN), gateway_locator picks one.
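The decision flow above can be sketched as a small resolver. This is an illustrative model only: Request, Router, Route, and ErrNoPath are hypothetical names, the "first candidate wins" selection is a simplification, and treating an empty Datacenter as local is an assumption of this sketch rather than a statement about agent/consul/rpc.go.

```go
package main

import "errors"

// Request carries only the field the routing decision needs.
type Request struct {
	Datacenter string
}

// Route describes where a request should go.
type Route struct {
	Local      bool   // handle in the local datacenter
	Target     string // remote server or gateway address when not local
	ViaGateway bool   // true when Target is a mesh gateway
}

// ErrNoPath is returned when no server or gateway is known for a datacenter.
var ErrNoPath = errors.New("no path to datacenter")

// Router is a hypothetical, simplified view of cross-DC routing state.
type Router struct {
	LocalDC         string
	WANServers      map[string][]string // dc -> servers seen in the WAN member list
	Gateways        map[string][]string // dc -> mesh gateways from federation state
	UseMeshGateways bool                // WAN federation runs over mesh gateways
}

// Resolve decides whether a request is local, goes to a remote server
// directly, or goes through a mesh gateway.
func (r *Router) Resolve(req Request) (Route, error) {
	if req.Datacenter == "" || req.Datacenter == r.LocalDC {
		return Route{Local: true}, nil
	}
	if r.UseMeshGateways {
		gws := r.Gateways[req.Datacenter]
		if len(gws) == 0 {
			return Route{}, ErrNoPath
		}
		return Route{Target: gws[0], ViaGateway: true}, nil
	}
	servers := r.WANServers[req.Datacenter]
	if len(servers) == 0 {
		return Route{}, ErrNoPath
	}
	return Route{Target: servers[0]}, nil
}
```

A real implementation would balance across candidates and track server health instead of always taking the first entry.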
Federation state
Federation state is a per-DC record stored as a Raft log entry containing:
- The DC's mesh gateway addresses.
- Replication tokens for cross-DC ACL replication.
- Other ops-level metadata.
The leader periodically pushes its DC's federation state to the primary DC, and pulls every other DC's state back. Code: agent/consul/leader_federation_state_ae.go. This is essentially anti-entropy at a DC granularity.
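As a rough illustration of the push/pull cycle, the sketch below models one anti-entropy pass between a secondary DC's store and the primary's. FederationState, Store, SyncOnce, and the Version counter are hypothetical simplifications; the real loop works through RPC endpoints and Raft indexes, not in-memory maps.

```go
package main

// FederationState is a simplified per-DC record. Version is a hypothetical
// monotonically increasing counter standing in for a real modify index.
type FederationState struct {
	Datacenter   string
	MeshGateways []string
	Version      int
}

// Store maps a datacenter name to its latest known federation state.
type Store map[string]FederationState

// SyncOnce performs one anti-entropy pass for localDC:
//  1. push the local DC's record to the primary if the primary's copy is
//     missing or older;
//  2. pull every other DC's record from the primary if the local copy is
//     missing or older.
//
// It reports whether a push happened and how many records were pulled.
func SyncOnce(localDC string, local, primary Store) (pushed bool, pulled int) {
	if mine, ok := local[localDC]; ok {
		if cur, exists := primary[localDC]; !exists || cur.Version < mine.Version {
			primary[localDC] = mine
			pushed = true
		}
	}
	for dc, st := range primary {
		if dc == localDC {
			continue
		}
		if cur, exists := local[dc]; !exists || cur.Version < st.Version {
			local[dc] = st
			pulled++
		}
	}
	return pushed, pulled
}
```

Running SyncOnce repeatedly is idempotent: once both sides agree, a pass pushes nothing and pulls nothing. That property is what makes periodic anti-entropy safe to retry.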
User events
agent/userevent.go and agent/consul/event_endpoint.go (plus the operator-side command/event/) use Serf's user-event broadcast:
- Events are tagged with name, payload, node filter, service filter, tag filter.
- They propagate via gossip and are de-duplicated using a leader-emitted identifier.
- Used by consul exec for fan-out remote command execution and by the now-discouraged consul event workflow.
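To show how the node, service, and tag filters might narrow the set of receivers, here is a minimal sketch of receiver-side matching. UserEvent, LocalNode, and Accepts are hypothetical names. Treating the filters as regular expressions, and rejecting a tag filter that has no service filter, are assumptions of this sketch.

```go
package main

import (
	"errors"
	"regexp"
)

// UserEvent is a simplified event carrying a name, payload, and filters.
type UserEvent struct {
	Name          string
	Payload       []byte
	NodeFilter    string
	ServiceFilter string
	TagFilter     string
}

// LocalNode describes the receiving agent: its name and the services it
// runs, mapped to each service's tags.
type LocalNode struct {
	Name     string
	Services map[string][]string
}

// Accepts reports whether the local node should act on the event.
// Empty filters match everything.
func Accepts(n LocalNode, ev UserEvent) (bool, error) {
	if ev.TagFilter != "" && ev.ServiceFilter == "" {
		return false, errors.New("tag filter requires a service filter")
	}
	if ev.NodeFilter != "" {
		re, err := regexp.Compile(ev.NodeFilter)
		if err != nil {
			return false, err
		}
		if !re.MatchString(n.Name) {
			return false, nil
		}
	}
	if ev.ServiceFilter == "" {
		return true, nil
	}
	svcRe, err := regexp.Compile(ev.ServiceFilter)
	if err != nil {
		return false, err
	}
	var tagRe *regexp.Regexp
	if ev.TagFilter != "" {
		if tagRe, err = regexp.Compile(ev.TagFilter); err != nil {
			return false, err
		}
	}
	for name, tags := range n.Services {
		if !svcRe.MatchString(name) {
			continue
		}
		if tagRe == nil {
			return true, nil
		}
		for _, t := range tags {
			if tagRe.MatchString(t) {
				return true, nil
			}
		}
	}
	return false, nil
}
```

Because every agent in the pool receives the broadcast, filtering on the receiving side keeps the gossip layer simple at the cost of delivering events to nodes that will ignore them.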
Failure detection
Serf uses a SWIM-derived gossip protocol. Operationally:
- Suspicion windows are governed by the gossip_lan and gossip_wan agent config blocks.
- Dead members are kept around for a tombstone window so that races during reconnect don't cause membership to oscillate.
- Rate limits (gossip nodes per packet, retransmit multiplier) are tunable but the defaults are usually correct.
Encryption
Optional symmetric encryption keys, generated by consul keygen, are propagated through the keyring (agent/keyring.go, command/keyring/). Multiple keys can co-exist during rotation. Encryption is per-pool: LAN, WAN, and any user-defined Enterprise areas can have their own key.
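The rotation workflow (install a new key, switch to it, remove the old one) can be modeled with a small keyring. This is a sketch under stated assumptions: keys are taken to be 32 random bytes, base64-encoded; Keyring and its methods are hypothetical and are not the API of agent/keyring.go.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
)

const keySize = 32 // assumed key length in bytes

// GenerateKey returns a fresh random key, base64-encoded.
func GenerateKey() (string, error) {
	buf := make([]byte, keySize)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf), nil
}

// Keyring holds every installed key; Keys[0] is the primary key used to
// encrypt outgoing messages. Any installed key may decrypt incoming ones.
type Keyring struct {
	Keys []string
}

func (k *Keyring) index(key string) int {
	for i, existing := range k.Keys {
		if existing == key {
			return i
		}
	}
	return -1
}

// Install adds a key without changing the primary.
func (k *Keyring) Install(key string) error {
	raw, err := base64.StdEncoding.DecodeString(key)
	if err != nil || len(raw) != keySize {
		return errors.New("key must be base64 of 32 bytes")
	}
	if k.index(key) < 0 {
		k.Keys = append(k.Keys, key)
	}
	return nil
}

// Use promotes an installed key to primary.
func (k *Keyring) Use(key string) error {
	i := k.index(key)
	if i < 0 {
		return errors.New("key is not installed")
	}
	k.Keys[0], k.Keys[i] = k.Keys[i], k.Keys[0]
	return nil
}

// Remove deletes a non-primary key.
func (k *Keyring) Remove(key string) error {
	i := k.index(key)
	if i < 0 {
		return errors.New("key is not installed")
	}
	if i == 0 {
		return errors.New("refusing to remove the primary key")
	}
	k.Keys = append(k.Keys[:i], k.Keys[i+1:]...)
	return nil
}
```

Keeping the old key installed until every member has switched is what lets old and new keys co-exist during rotation: members on either key can still decrypt each other's traffic.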
Integration points
- Autopilot (agent/consul/autopilot.go) consumes Serf events to detect dead servers and recommend removal from Raft.
- Anti-entropy uses the LAN pool to reach a server quickly for catalog reconciliation.
- xDS capacity (agent/consul/xdscapacity/) tracks how many proxies each server is feeding so the WAN forwarding layer can balance.
- Cluster peering is the alternative to WAN gossip-based federation; new deployments are encouraged to use peering for cross-cluster connectivity. See Cluster peering.
Entry points for modification
- Tune gossip parameters: see agent/config/source_default.go (gossip_lan, gossip_wan blocks).
- Add a new event type: Serf user events for ops broadcasts; otherwise prefer streaming subscribe over gossip for new features.
- Change cross-DC routing: look at wanfed/ for the transport, gateway_locator.go for the selection logic.
- Add a member metadata field: extend merge.go and ensure all servers tolerate the field via the merge function.
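For the last item, a merge check in the spirit of merge.go might look like the sketch below: it rejects members from another datacenter and ignores tags it does not recognize, which is what lets a new metadata field roll out without breaking older servers. Member and ValidateMerge are hypothetical names, and the dc tag key is an assumption of this sketch.

```go
package main

import "fmt"

// Member is a simplified gossip member with free-form metadata tags.
type Member struct {
	Name string
	Tags map[string]string
}

// ValidateMerge decides whether a set of remote members may join the local
// pool. Only the dc tag is inspected; unknown tags are deliberately ignored
// so that members advertising newer metadata fields are still accepted.
func ValidateMerge(localDC string, members []Member) error {
	for _, m := range members {
		dc, ok := m.Tags["dc"]
		if !ok {
			return fmt.Errorf("member %q has no dc tag", m.Name)
		}
		if dc != localDC {
			return fmt.Errorf("member %q is in datacenter %q, expected %q", m.Name, dc, localDC)
		}
	}
	return nil
}
```

When a new field is added, validation for it should be opt-in for the same reason: a strict check on an unknown or missing tag would cause mixed-version clusters to refuse each other's joins.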
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.