AWS Builder Vault Operations
Health System
This section helps you understand the wallet metrics displayed on the AWS CloudWatch dashboard and provides guidance on maintaining the overall health of the Builder Vault system.
CloudWatch Dashboard
The AWS Marketplace offering comes standard with AWS CloudWatch dashboards for your nodes. Here is an example of a static 2-node deployment configuration. You can access the dashboard from the CloudWatch console for your node infrastructure. Depending on whether you are looking for the core node stack or the expansion node stack, navigate to:
- Core Stack -
<namespace>-<env>-core-nodes
- Expansion Node Stack -
<namespace>-<env>-node-<index>
Note:
A nitro status of 1 means that your nodes are online and health checks are passing; a status of 0 or lower signifies that the nodes are not healthy.
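As a sketch, the same 1/0 value can be derived directly on an instance from `nitro-cli describe-enclaves` output, assuming the dashboard metric reflects whether an enclave is in the RUNNING state (an assumption; confirm the exact metric definition in your dashboard's widget source):

```shell
#!/bin/bash
# Sketch: derive a 1/0 nitro status value from `nitro-cli describe-enclaves`.
# The mapping to the dashboard metric is an assumption; verify it against
# your dashboard's widget source.

enclave_running() {
  # Reads describe-enclaves JSON on stdin; prints 1 if any enclave
  # reports State "RUNNING", otherwise 0.
  if grep -q '"State": *"RUNNING"'; then echo 1; else echo 0; fi
}

# On the instance you would pipe in the real output:
#   nitro-cli describe-enclaves | enclave_running
```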
Health Status for the Builder Vault service
To maintain smooth operations, each service within the deployment provides a /ping endpoint. This endpoint indicates the health status of the service through HTTP status codes. Follow the steps outlined below to ensure the system's reliability.
1. Enable Service Health Monitoring
For optimal service performance, consistently check the /ping endpoint, which reflects the current status of the service.
2. Decode Status Codes
The /ping endpoint's status codes reveal service health: 200 - OK indicates success; others, such as 503 - Service Unavailable, suggest possible issues.
3. Set up Instant Alerts
Arrange to receive notifications when /ping status codes deviate from 200, enabling rapid issue resolution.
4. Apply Auto-Restart
When a service repeatedly fails health checks, trigger an automatic restart. Fine-tune the number of failures needed to invoke this to ensure system stability.
5. Connect Health Checks and Logs
Link health checks with logs for additional failure cause insight, aiding in swift problem resolution.
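The monitoring steps above can be sketched as a small set of shell helpers. The endpoint URL, failure threshold, and log command are assumptions drawn from this guide; adjust them for your deployment:

```shell
#!/bin/bash
# Sketch of the /ping monitoring pieces described above.
# PING_URL and FAIL_THRESHOLD are placeholders; tune for your deployment.
PING_URL="${PING_URL:-http://localhost:8080/ping}"
FAIL_THRESHOLD="${FAIL_THRESHOLD:-3}"
consecutive_failures=0

# Step 2: translate an HTTP status code into a health verdict.
health_verdict() {
  case "$1" in
    200) echo "healthy" ;;
    503) echo "unavailable" ;;
    *)   echo "unhealthy" ;;
  esac
}

# Step 4: count consecutive failures and decide when to restart.
record_result() {
  if [ "$1" = "200" ]; then
    consecutive_failures=0
  else
    consecutive_failures=$((consecutive_failures + 1))
  fi
}
should_restart() {
  [ "$consecutive_failures" -ge "$FAIL_THRESHOLD" ]
}

# Step 5: print the journalctl command that pulls enclave.service logs
# from a failure timestamp onward, linking the alert to log context.
failure_log_cmd() {
  echo "journalctl -u enclave.service --since '$1' --no-pager"
}

# Steps 1 and 3 would tie these together in a loop, for example:
#   code=$(curl -s -o /dev/null -w '%{http_code}' "$PING_URL")
#   record_result "$code"
#   should_restart && sudo systemctl restart enclave.service
```

The live curl probe is left as a comment so the helpers can be exercised and reviewed without a running endpoint.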
Builder Vault Node Troubleshooting
Troubleshooting a confidential computing environment differs from troubleshooting a regular application because of the challenges of running an enclave container in protected mode. However, we have compiled a series of operations for various scenarios that make troubleshooting possible without compromising the underlying security.
Troubleshoot a Node
This tutorial is a step-by-step guide to troubleshooting a node that might go down for various reasons.
1. Check a node's health
- Check the status of a node by logging into the EC2 instance.
- Select an EC2 instance to troubleshoot.
- Click the Connect button to open an SSM terminal session.
- To gain the permissions needed to troubleshoot the instance, use the following sudo commands:
sudo su
journalctl -f -u enclave.service
Note:
You can also create alarms in the CloudWatch dashboard for when the node status is outside of your respective levels.
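As a sketch of such an alarm, the following builds an `aws cloudwatch put-metric-alarm` call that fires when the node status stays below 1. The metric and namespace names are assumptions; copy the exact names from your dashboard's widget source, and the SNS topic ARN is a placeholder:

```shell
#!/bin/bash
# Sketch: alarm on the nitro status metric dropping below 1.
# METRIC_NAME and NAMESPACE are assumptions; check your dashboard's widget
# source for the real names. SNS_TOPIC_ARN is a placeholder.
METRIC_NAME="${METRIC_NAME:-nitro_status}"
NAMESPACE="${NAMESPACE:-BuilderVault}"
SNS_TOPIC_ARN="${SNS_TOPIC_ARN:-arn:aws:sns:us-east-1:123456789012:ops-alerts}"

cmd=(aws cloudwatch put-metric-alarm
  --alarm-name builder-vault-node-unhealthy
  --metric-name "$METRIC_NAME"
  --namespace "$NAMESPACE"
  --statistic Minimum
  --period 60
  --evaluation-periods 3
  --threshold 1
  --comparison-operator LessThanThreshold
  --alarm-actions "$SNS_TOPIC_ARN")

# Print the command for review; set APPLY=1 to actually create the alarm.
if [ "${APPLY:-0}" = "1" ]; then "${cmd[@]}"; else echo "${cmd[*]}"; fi
```

The dry-run guard lets you review the exact call before it touches your account.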
2. Edit the Configuration Files
Due to the sensitive nature of the Builder Vault MPC configuration files and the multi-party requirement, these configurations are stored within the KMS Stack resources, and the infrastructure accesses them on startup. To troubleshoot a node, you need access to the corresponding KMS Stack in order to enter debug mode for its specific configuration. However, because the configuration files do not contain secret material, an admin can view them to identify configuration issues. To edit the files:
- Retrieve the configuration for the node you are troubleshooting by using the following script:
#!/bin/bash
# Retrieve the EC2 instance metadata token
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Get SSM parameter and Enclave name from instance metadata
SSM_PARAM=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" --silent http://169.254.169.254/latest/meta-data/tags/instance/CONFIG_SSM_PARAM)
ENCLAVE_NAME=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" --silent http://169.254.169.254/latest/meta-data/tags/instance/EnclaveName)
# Retrieve the KMS stack SSM parameter ARN and then extract the region from it
KMS_STACK_SSM_PARAM=$(aws ssm get-parameter --name "$SSM_PARAM" --query "Parameter.Value" --output text | jq -r .SHARD_SSM_PARAM_ARN)
KMS_STACK_REGION=$(echo "$KMS_STACK_SSM_PARAM" | cut -d':' -f4)
# Get the configuration bucket name from the SSM parameter
CONFIG_BUCKET=$(aws ssm get-parameter --name "$KMS_STACK_SSM_PARAM" --region "$KMS_STACK_REGION" --output text --query "Parameter.Value" | jq -r .CONFIG_BUCKET)
# Copy the config file from the S3 bucket and output to the console
aws s3 cp "s3://$CONFIG_BUCKET/$ENCLAVE_NAME/config.toml" - --region "$KMS_STACK_REGION"
- Navigate to the S3 bucket in the deployed KMS stack using the following path:
Note:
You need to have access to the deployed KMS stack to edit the instance configuration.
s3://<namespace>-<accountId>-<region>-node-<index>-bucket/node-<index>/config.toml
Note:
Node index is the player index of the Builder Vault instance. These will all be unique for each MPC player.
- Download, edit, and re-upload the custom configuration files to the specified location.
- After uploading, connect to each node via SSM and execute the restart command:
sudo systemctl restart enclave.service
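The edit cycle above can be sketched as follows. The S3 URI fills in the bucket layout shown above with example placeholder values (namespace, account, region, and node index are assumptions; substitute your own):

```shell
#!/bin/bash
# Sketch of the download/edit/re-upload cycle. CONFIG_URI uses placeholder
# values for namespace, account, region, and node index; substitute yours.
CONFIG_URI="${CONFIG_URI:-s3://mytsm-123456789012-us-east-1-node-0-bucket/node-0/config.toml}"

run() {
  # Print each step instead of executing it unless APPLY=1, so the cycle
  # can be reviewed before touching the live configuration.
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

run aws s3 cp "$CONFIG_URI" ./config.toml    # download
# ... edit ./config.toml locally ...
run aws s3 cp ./config.toml "$CONFIG_URI"    # re-upload
run sudo systemctl restart enclave.service   # restart via SSM on each node
```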
3. Enter the Nitro Debug Mode
AWS Nitro secure enclaves operate in a protected environment, isolating memory, storage, and network, preventing standard logging to stdout. However, operators can enable logging by accessing the node's KMS Stack and placing the node into Nitro debug mode, facilitating visibility into log outputs during debugging. To place a node in Nitro Debug mode, follow the steps below:
- Find the node's KMS Stack and locate the ShardKMSKeyNitro resource, which secures the Nitro envelope encryption key. You will adjust its policy to enable debug mode for the node.
- Click the link to navigate to the specific KMS key.
- Click the Edit button.
- Change the key policy value to:
"kms:RecipientAttestation:PCR8": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
- To access debug mode, log in to the Nitro EC2 instance you selected.
- Shut down and restart the selected Nitro enclave by using the following commands:
cd /app
sudo DEBUG_MODE="--debug-mode" ./restart-enclave.sh; nitro-cli console --enclave-name enclave
- This opens a Nitro console that tails the application logs.
- To start the enclave service in protected mode again, type the following command in the node EC2 instance:
sudo ./restart-enclave.sh; nitro-cli console --enclave-name enclave
Note:
When finished debugging, revert the KMS policy to its initial settings and restart the enclave service.
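For reference, the PCR8 value edited above typically sits inside a key policy statement shaped like the following. This is a sketch only: the Sid, the principal role ARN, and the absence of other PCR conditions are assumptions, and your generated policy will differ.

```json
{
  "Sid": "AllowEnclaveDecrypt",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::123456789012:role/enclave-instance-role" },
  "Action": "kms:Decrypt",
  "Resource": "*",
  "Condition": {
    "StringEqualsIgnoreCase": {
      "kms:RecipientAttestation:PCR8": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
    }
  }
}
```

The all-zero PCR8 value matches an enclave booted in debug mode, which is why setting it permits decryption while debugging.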
AWS ERS Instructions
This section provides step-by-step instructions for configuring the Emergency Recovery Service (ERS) for a TSM deployed via AWS Marketplace.
- Stop all the static and dynamic nodes currently running.
- Update the config file for each node (static 0, 1, and any dynamic) and add the ERS configuration (see below). Example s3 file path: 123456789012-dev-tsmdemo-tsm-config-bucket/tsmdemo-0/config.toml.
- Turn all the nodes back on.
- Follow the ERS process to perform the recovery.
- When finished, turn the nodes off.
- Remove the ERS configuration from each config file.
- Turn the nodes back on.