Horizontal Scaling

Horizontal Scaling of the Builder Vault TSM

Horizontal scaling is achieved by running multiple instances of the same MPC node for a given player and distributing requests across them, improving both throughput and fault tolerance. The key challenge is that all state for an MPC session is held in memory, so a session must be handled by the same node instance throughout its lifetime.

The Builder Vault TSM supports four horizontal scaling strategies:

Strategy                        Load Balancer Required      Session Affinity             Scale Granularity
Replicated TSM                  No                          N/A (SDK-managed)            Full TSM
Message Broker Communication    Standard (round-robin)      Handled by broker            Individual node
Direct Communication            Session-affinity capable    HTTP header or query param   Individual node
Intra-Node Routing              Standard (round-robin)      Handled by nodes             Individual node
📘 DKLS23

If you use the DKLS23 protocol and scale horizontally to more than five instances of the same MPC node, make sure to set DeactivatedPlayersCache = "database" in the [DKLS23] section of the MPC node configuration files, as explained here.
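For reference, a minimal sketch of the corresponding section in each MPC node's configuration file (only the setting named above is shown; merge it into your existing configuration):

[DKLS23]
# Store the deactivated-players cache in the database instead of in memory.
DeactivatedPlayersCache = "database"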

Replicated TSM

This is the simplest approach: you deploy multiple complete TSMs, and for each player, that player's node in every TSM uses the same database. Any SDK can connect to any of the TSMs and access the same keys.

How to scale: Add or remove entire TSMs. You cannot add a single node independently.

Trade-offs:

  • Simple to set up with no load balancer required.
  • The SDK is aware of multiple TSMs; your application must decide which one to use.
  • Scaling requires coordinating all players simultaneously, which can be difficult when players are controlled by different organizations.
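As an illustration of the shared-database requirement, each replicated TSM runs player 0's node with a configuration that points at the same database. The [Database] section and the DriverName/DSN keys below are assumptions for this sketch; use the section and key names from your node's configuration reference, and replace the placeholder connection string:

# Player 0's node configuration, identical in every replicated TSM.
[Database]
DriverName = "postgres"
DSN = "host=player0-db.example.com user=tsm dbname=tsm sslmode=require"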

Replicated MPC Nodes with Message Broker Communication

If your MPC nodes communicate via a message broker (such as Redis or AMQP), you can place a standard round-robin load balancer (e.g., HAProxy) in front of multiple nodes representing the same player. Because all inter-node MPC communication goes through the broker, a session always continues on the node that started it — no special session affinity configuration is needed on the load balancer.

How to scale: Add or remove individual node instances at any time, independently of other players.

Trade-offs:

  • The SDK sees a single TSM endpoint, with no awareness of the underlying instances.
  • No additional MPC node configuration is required.
  • Requires a message broker as part of your infrastructure.
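Because the broker keeps each session on the instance that started it, the load balancer in front of the instances only needs plain round-robin. A minimal HAProxy sketch, with placeholder addresses and ports:

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend player0_sdk
    bind *:8080
    default_backend player0_nodes

backend player0_nodes
    # Plain round-robin is enough: MPC messages flow through the broker,
    # so a session always continues on the instance that started it.
    balance roundrobin
    server node0a 10.0.1.10:8080 check
    server node0b 10.0.1.11:8080 check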

Replicated MPC Nodes with Direct Communication

When nodes communicate directly rather than through a message broker, session affinity must be handled by the load balancer. Since most load balancers can only apply affinity on a single port, MPC communication must be routed through the SDK port rather than a separate MPC port.

Configuration — enable MPC over the SDK port:

[SDKServer]
MPCWebSocketPath = "/mpc"

You must also configure the other players' nodes to connect to this player via WebSocket on the SDK port, rather than via the separate MPC port.

Configure the load balancer to apply session affinity based on one of the following:

  1. The MPC-RoutingID HTTP header
  2. The mpc_routing_id HTTP query parameter
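One way to realize this with HAProxy is to balance on a hash of the routing header, so every request carrying the same MPC-RoutingID value is sent to the same instance, while requests without the header fall back to round-robin. Addresses and ports are placeholders:

backend player0_nodes
    # Requests with the same MPC-RoutingID header always reach the same
    # instance; requests without the header are balanced round-robin.
    balance hdr(MPC-RoutingID)
    server node0a 10.0.1.10:8080 check
    server node0b 10.0.1.11:8080 check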

Trade-offs:

  • The SDK sees a single TSM endpoint.
  • Requires a load balancer that supports header- or query-parameter-based session affinity.
  • MPC and SDK traffic must share the same port.

Replicated MPC Nodes with Intra-Node Routing

If you cannot use a message broker and cannot route MPC traffic over the SDK port, the nodes can manage session affinity themselves. In multi-instance mode, each instance inspects incoming requests and forwards any that belong to a session owned by another instance.

How to scale: Add or remove individual node instances at any time, independently of other players.

Trade-offs:

  • The SDK sees a single TSM endpoint.
  • No special load balancer configuration required — a standard round-robin balancer works.
  • The intra-node forwarding adds overhead; do not expect the same throughput as the other approaches.

Configuration:

[MultiInstance]

# IP address used to reach this instance from other instances.
# If unset, an address is auto-detected, which may not be correct
# when multiple network interfaces are present.
Address = ""

# SDK port announced to other instances.
# Defaults to the port defined in [SDKServer].
SDKPort = 0

# MPC port announced to other instances.
# Defaults to the port defined in [MPCTCPServer].
MPCPort = 0

# How often to run the cleanup job that removes stale routing entries.
CleanupInterval = "5m"

# Probability (0–100) that the cleanup job runs on each interval.
# 0 = never, 100 = always.
CleanupProbability = 25

Contact Blockdaemon for performance benchmarks and guidance on choosing the right strategy for your deployment.


