Inter-Node Communication Problems
These problems are typically encountered when first using the SDK to call the TSM nodes, or after making changes to an existing setup. They typically result in the SDK returning the error:
tsm operation failed ; node 0 returned 500: Internal Server Error\n sessionID=<SessionID>
To get more information on the underlying problem, extract the logs from the TSM nodes.
The communication errors covered on this page typically produce one or more of the following log messages:
endpoint error: an error occurred during key generation: timed out while creating channels sessionID=<SessionID>
endpoint error: an error occurred during key generation: EOF sessionID=<SessionID>
closing unclaimed channel for session id <SessionID>
The first means that a node was waiting for another node to connect to it, but none did before the timeout.
The second means that the connection was closed while the node was trying to read from it. This typically happens if a node has timed out and is shutting down but the connection has not yet closed completely, or if a firewall cuts the connection.
The third means that a connection was made, but the channel was never used. This happens if, for example, the session IDs differ so the operations are not linked together correctly, or if the TSM nodes are called at different times (e.g. at the other end of an EOF).
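As a rough aid when reading the extracted logs, the sketch below maps the three messages above to the causes just described and pulls out the session ID. It is only an illustration: the helper name classify and the regular expression are assumptions, and real log lines may carry extra fields such as timestamps.

```python
import re

# Message fragments taken from the three log lines listed above, each paired
# with a short summary of the explanation given for it.
KNOWN_FRAGMENTS = [
    ("timed out while creating channels", "no peer connected before the timeout"),
    (": EOF", "connection closed while reading"),
    ("closing unclaimed channel", "connection made but channel never used"),
]

# Matches both "sessionID=<id>" and "session id <id>".
SESSION_ID = re.compile(r"session\s?id[=\s]+(\S+)", re.IGNORECASE)


def classify(line):
    """Return (cause, session_id) for a known log line, or None otherwise."""
    for fragment, cause in KNOWN_FRAGMENTS:
        if fragment in line:
            match = SESSION_ID.search(line)
            session_id = match.group(1) if match else None
            return (cause, session_id)
    return None
```

Grouping the results by session ID across all nodes makes it easier to see whether the nodes were actually working on the same session.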
If running in a multi-tenant setting, i.e. where several instances of Node 0 run on multiple mobile nodes using dynamic node configuration, the following log entry should also be present in the non-mobile nodes:
tenant public key for sessionID <SessionID> registered
This means that the public key for the mobile phone was registered successfully. If this entry is missing in a multi-tenant setup, check that RegisterTenantPublicKey is called with the correct session ID and public key for every operation that is invoked.
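The check for this log entry can be sketched as a small helper that, given the log lines collected from each non-mobile node, reports which nodes never logged the registration for a session. The function name and the way the logs are passed in (a dict of node name to lines) are assumptions made for illustration.

```python
def nodes_missing_registration(session_id, logs_by_node):
    """Return the names of nodes whose logs lack the registration entry.

    `logs_by_node` maps a node name to the list of log lines extracted
    from that node (hypothetical input format).
    """
    expected = f"tenant public key for sessionID {session_id} registered"
    return sorted(
        node
        for node, lines in logs_by_node.items()
        if not any(expected in line for line in lines)
    )
```

An empty result means every node logged the registration for that session ID.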
Suggested actions
The following actions can be performed to try and fix the issues.
Configuration
The first action should be to check that the connections are configured correctly. Example (partial) configuration:
[Player]
Index = 1
PrivateKey = "<Private Key.1>"
[Players.0]
Address = "<Node.0 Address>"
PublicKey = "<Public Key.0>"
[Players.1]
Address = "<Node.1 Address>"
PublicKey = "<Public Key.1>"
[Players.2]
Address = "<Node.2 Address>"
PublicKey = "<Public Key.2>"
Note that [Players.0] is not present when using a multi-tenant setup (Node 0 running on a mobile phone), and [Players.1] is optional on node 1.
Things to check:
- The Index in the [Player] section is the correct index for the node.
- The Address in each [Players.<id>] section contains the correct host and port. The format is tsm-node0:9000.
- The public key matches the private key. This means that the PrivateKey in [Player] must match the PublicKey in [Players.<id>] on the other nodes. Here <id> should match the Index in the [Player] section that holds the private key.
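Some of the checks above can be automated. The sketch below assumes the configuration of each node has already been parsed into nested dicts mirroring the sections shown (the sample looks like TOML, so a TOML parser could produce them, but that is an assumption). The name check_configs is hypothetical. The sketch does not verify that a public key belongs to a private key; it only flags cases where two nodes list different public keys for the same player, which is a common symptom of a mismatch. The address pattern assumes a hostname or IPv4 address followed by a port.

```python
import re

# Assumes a hostname or IPv4 address followed by a port, e.g. tsm-node0:9000.
ADDRESS = re.compile(r"^[^\s:]+:\d{1,5}$")


def check_configs(configs):
    """Return a list of problems found in `configs`.

    `configs` maps the index a node is expected to have to that node's
    parsed configuration (nested dicts mirroring the sections above).
    """
    problems = []
    first_seen = {}  # player id -> (public key, node whose config listed it first)
    for node, cfg in sorted(configs.items()):
        index = cfg.get("Player", {}).get("Index")
        if index != node:
            problems.append(f"node {node}: [Player] Index is {index!r}")
        for pid, player in sorted(cfg.get("Players", {}).items()):
            address = player.get("Address")
            if address is not None and not ADDRESS.match(address):
                problems.append(
                    f"node {node}: [Players.{pid}] Address {address!r} is not host:port"
                )
            key = player.get("PublicKey")
            seen_key, seen_node = first_seen.setdefault(pid, (key, node))
            if key != seen_key:
                problems.append(
                    f"node {node}: [Players.{pid}] PublicKey differs from node {seen_node}"
                )
    return problems
```

An empty list means none of these particular checks failed; it does not prove that the nodes can reach each other on the configured addresses.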
Code
Several things must be done correctly in the calling code for the nodes to communicate.
Note: This applies only to SDK v2.
If a single SDK controls all nodes, then calling e.g. Keygen on the SDK calls all nodes, and this should work.
If running with multiple SDKs, each controlling one (or a few) node(s), there are many pitfalls. The general process that needs to be followed is:
- Generate a session ID.
- Distribute the session ID.
- If running node 0 on a mobile phone: call RegisterTenantPublicKey with the session ID and the public key of the mobile phone on all non-mobile nodes.
If this fails, there are multiple things to check:
- Try to find the log messages mentioned above, and check that the session ID is consistent across nodes.
- The default connection timeout is 10 seconds. If this is too low, it can be increased by setting ConnectionTimeout in the [MPC] section.
- Make sure that the calls are actually made in parallel. Often the main thread blocks while calling the mobile SDK, so the subsequent request that triggers the call on the server SDK is not made until the mobile SDK call has timed out.
- If running mobile nodes, make sure the log entry tenant public key for sessionID <SessionID> registered is present on the server nodes.
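The parallelism pitfall can be illustrated without the SDK. In the sketch below, make_fake_nodes returns stand-in operations (not real SDK calls) that only succeed if all of them are invoked within the timeout of each other, which mimics nodes waiting for their peers. Calling them one after another from a single thread times out, while submitting them to a thread pool with the same session ID succeeds. All function names here are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_fake_nodes(count, timeout_seconds):
    """Return `count` stand-in operations that wait for each other."""
    barrier = threading.Barrier(count)

    def operation(session_id):
        try:
            # Blocks until every node has been called, or the timeout expires.
            barrier.wait(timeout=timeout_seconds)
        except threading.BrokenBarrierError:
            return f"timed out while creating channels sessionID={session_id}"
        return f"ok sessionID={session_id}"

    return [operation] * count


def call_sequentially(operations, session_id):
    # The first call blocks until it times out, so the second starts too late.
    return [op(session_id) for op in operations]


def call_in_parallel(operations, session_id):
    # Each call gets its own thread, so all nodes are reached within the timeout.
    with ThreadPoolExecutor(max_workers=len(operations)) as pool:
        futures = [pool.submit(op, session_id) for op in operations]
        return [f.result() for f in futures]
```

In a real application the same idea applies: the blocking call on one SDK should run on its own thread (or asynchronously) so that the request triggering the other SDKs is sent right away.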