Horizontal Scaling

Horizontal scaling can be achieved by balancing the load across multiple instances of each MPC node in the Builder Vault TSM. This allows for improved throughput and failover. We support three different types of horizontal scaling of the Builder Vault TSM:

  • Replicated TSM: Allows for the most efficient scaling, but it is visible from the outside (i.e., from the SDKs) that there are multiple TSMs, and to scale up or down you must add or remove all of the node instances of a TSM at the same time.
  • Replicated MPC Nodes with Message Broker: If your MPC nodes use a message broker, such as AMQP or Redis, to communicate with each other, horizontal scaling is easy. In this case, you can simply put a standard load balancer in front of an MPC node and let it route requests to any number of MPC node instances (each instance connecting to the same database). In contrast to the Replicated TSM approach, from the point of view of the SDK it looks as if there is only one TSM. Another benefit is that you can add or remove individual MPC node instances independently.
  • Replicated MPC Nodes with Direct Communication: If you don't have access to a message broker, it is also possible to configure the MPC nodes to scale horizontally while communicating with each other over direct TLS connections. This gives the same benefits as the message broker approach above, but requires some additional configuration of the MPC nodes (detailed below).

The three options are explained in more detail below. Contact Blockdaemon for more information about the performance of these scaling methods.

📘

DKLS23

If you use the DKLS23 protocol and do horizontal scaling with more than five instances of the same MPC node, then make sure to set DeactivatedPlayersCache = database in the [DKLS23] section of the MPC node configuration files, as explained here.
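For example, the relevant part of each MPC node's configuration file could look as follows (a minimal sketch, assuming the setting takes the quoted string value "database"; keep any other [DKLS23] settings you already have):

[DKLS23]

  # Keep the cache of deactivated players in the database so that it is shared
  # by all instances of this MPC node.
  DeactivatedPlayersCache = "database"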

Replicated TSM

In this setup you deploy several identical TSMs, each TSM consisting of one set of MPC nodes. All instances of MPC Node 1 (one in each TSM) are configured to share the same database. Similarly, all instances of MPC Node 2 share the same database, and so forth. With this setup, an SDK can now use any of the TSMs.
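In terms of configuration, this means that every instance of MPC Node 1 (one in each TSM) points at the same database, along these lines (a sketch only; the [Database] section with DriverName and DataSourceName keys and the connection string are assumptions here, so reuse whatever database settings your MPC Node 1 already has):

[Database]

  # All instances of MPC Node 1, across all replicated TSMs, use identical
  # connection settings and therefore operate on the same key shares.
  # Hypothetical driver and connection string:
  DriverName = "postgres"
  DataSourceName = "host=node1-db.example.com port=5432 user=tsm dbname=tsmnode1"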

As an example, the following figure shows a setup with two TSMs (TSM A and TSM B), each consisting of two MPC nodes. A number of SDKs are configured such that some of them connect to TSM A while the other SDKs connect to TSM B.

This is a simple way of improving throughput and/or providing failover. The application can spin up new TSMs if needed, and the SDKs can be dynamically configured to use a random TSM, the TSM with the most free resources, etc.

In this setup, however, it is up to your application to decide which TSM a given SDK should use. To increase capacity, you have to add a complete TSM to the setup, i.e., you cannot just add one MPC node at a time. This may be difficult to coordinate if the MPC nodes are operated by different organizations.

Replicated MPC Nodes with Message Broker Communication

Another way to scale up horizontally is to run multiple instances of a single MPC node behind a load balancer. One way to do this is to configure your Builder Vault MPC nodes to communicate with each other using a message broker. You can then replace a node with a setup consisting of a load balancer that routes each call to one of any number of MPC node instances running behind it (all instances of the same MPC player, and all connecting to the same database). This approach requires no additional changes in the MPC node configuration, and you can add or remove MPC node instances when needed. The load balancer can be a standard (round robin) load balancer such as HAProxy.
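As an illustration, a minimal HAProxy configuration for one MPC node could look roughly as follows (the addresses and ports are hypothetical; TCP mode is used so that TLS is still terminated by the MPC node instances themselves):

# Round-robin load balancing across three instances of the same MPC node.
frontend mpc_node
    bind *:8500
    mode tcp
    default_backend mpc_node_instances

backend mpc_node_instances
    mode tcp
    balance roundrobin
    # Each server is an instance of the same MPC player, all connecting to the same database.
    server instance-1 10.0.2.11:8500 check
    server instance-2 10.0.2.12:8500 check
    server instance-3 10.0.2.13:8500 check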


Replicated MPC Nodes with Direct Communication

If you don't want to use a message broker for node-to-node communication, you can still achieve horizontal scaling by configuring the MPC nodes to run in "multi-instance" mode. The MPC nodes then connect directly to each other using TLS (without a message broker) and perform some internal routing to make sure that each MPC session "sticks" to a particular set of MPC node instances.

The benefit of this configuration is that from the outside, i.e., seen from the SDK, there is only a single TSM. Also, the number of instances at each MPC node can be scaled up or down independently of how the other MPC nodes are scaled.

The following example shows a setup with a TSM consisting of three MPC nodes. Node 1 is configured with two node instances, Node 2 is configured with three node instances, and Node 3 runs in the standard configuration, with a single node instance.

Each MPC node instance must be configured to run in "multi-instance" mode:

# This setting enables multiple instances of the same player to be placed behind a load balancer. Each instance will
# either handle sessions itself or route the traffic to other instances.
[MultiInstance]
 
  # IP address where this instance can be reached from the other instances. If not specified, an auto-detected address is
  # used, which might not be the address you want if the system has multiple IP addresses.
  Address = ""
  
  # SDK port announced to the other nodes. If not specified it defaults to the SDK port from the [SDKServer] section.
  SDKPort = 0
  
  # MPC port announced to the other nodes. If not specified it defaults to the MPC port from the [MPCTCPServer] section.
  MPCPort = 0
  
  # How often to run the cleanup job that purges old routing entries from the database.
  CleanupInterval = "5m"
  
  # Every CleanupInterval the cleanup job will run with this probability. 0 means never and 100 means always.
  CleanupProbability = 25
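
As an example, an instance that the other instances can reach at the (hypothetical) address 10.0.1.11, and that announces the ports already configured in its [SDKServer] and [MPCTCPServer] sections, could be configured like this:

[MultiInstance]

  # Address announced to the other instances of this MPC node (hypothetical).
  Address = "10.0.1.11"

  # Leave the ports at 0 to announce the ports from the [SDKServer] and [MPCTCPServer] sections.
  SDKPort = 0
  MPCPort = 0

  # Run the routing-table cleanup job every five minutes, with a 25% probability per run.
  CleanupInterval = "5m"
  CleanupProbability = 25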

📘

Horizontal Scaling and Client Certificates

Horizontal scaling with direct TLS connections is not compatible with the use of client certificates for SDK authentication. In this case you need to either choose one of the other scaling options above (replicated TSM or broker-based scaling) or use a different method for SDK authentication.

