Dive into zkLogin’s Salt Server Architecture

The salt server required by zkLogin is crucial to maintaining the integrity and privacy of user identities.

zkLogin is a Sui primitive that stands as the first of its kind in Web3—a truly trustless, secure, and user-friendly authentication mechanism. With zkLogin, developers can create seamless onboarding experiences allowing users to sign on with familiar Web2 credentials, such as Google or Facebook, to create and manage Sui addresses effortlessly.

When signing a transaction for an address using zkLogin, a unique salt value must be provided to represent the OAuth, or Web2, credentials associated with the Sui address. This salt value is crucial for ensuring that onchain addresses cannot be traced back to the user’s Web2 credentials. The salt server is responsible for generating, storing, and supplying the salt predictably whenever a transaction is initiated. Developers have several options for generating and storing this salt value, whether on the client side or the server side.

At Mysten Labs, we operate a salt server that uses a master seed in combination with the user's JSON Web Token (JWT) to derive a reproducible salt value per user per app. This process includes verifying the JWT against the provider. Given the sensitivity of the master seed, hosting a salt server in a typical computing environment would be irresponsible. Protecting the master seed is crucial for maintaining the separation of Web2 identities from Sui addresses. 
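Conceptually, this derivation can be thought of as a keyed derivation over the identifying claims in the JWT. The sketch below is illustrative only: it assumes an HKDF-style construction over the iss, aud, and sub claims, and the function names, claim encoding, and output length are assumptions rather than the production scheme.

import hashlib
import hmac
import json


def derive_user_salt(master_seed: bytes, jwt_claims: dict) -> bytes:
    # Hypothetical sketch: key an HMAC-based KDF (in the spirit of HKDF) with
    # the master seed over the JWT's issuer, audience, and subject claims so
    # the same user in the same app always gets the same salt.
    info = json.dumps(
        {claim: jwt_claims[claim] for claim in ("iss", "aud", "sub")},
        sort_keys=True,
    ).encode()

    # Extract: mix the master seed into a pseudorandom key.
    prk = hmac.new(b"zklogin-salt", master_seed, hashlib.sha256).digest()
    # Expand (single block): bind the output to this user and this app.
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()[:16]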

To safeguard our customers’ and partners’ user identities, our salt server operates in a secure computing environment, ensuring protection from accidental or malicious exposure. We will outline our approach here as a reference for others looking to implement a similar solution.

zkLogin and the salt server

As discussed above, the salt server plays an important part in maintaining privacy and security for users’ Web2 credentials when using zkLogin. Using a secret master seed and the user’s JWT, the salt server produces a salt value that is unique to that user for that app but hides the connection between the user’s identity and their Sui activity, cryptographically ensuring privacy. The salt value is required before generating a zkLogin proof, and therefore before issuing transactions onchain.

When someone uses an app backed by the Mysten Labs salt server, they enter their Web2 credentials and the application requests a JWT from the auth provider. The app then sends the JWT to the salt server to get the salt value. Each time the Sui address is derived from the user’s identity, including when computing the original proof for the user’s zkLogin signature, the salt ensures that the address can always be computed deterministically from the token without revealing the binding between the two.
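As a concrete illustration of this exchange, a client might fetch the salt as follows. The endpoint URL, payload shape, and response field below are placeholders for whatever interface a given salt server exposes, not a documented Mysten Labs API.

import requests

# Placeholder URL; a real deployment would use its own salt server endpoint.
SALT_SERVER_URL = "https://salt.example.com/get_salt"


def fetch_salt(jwt: str) -> str:
    # The salt server verifies the JWT against the OAuth provider before
    # deriving the salt, so an invalid or expired token fails here.
    response = requests.post(SALT_SERVER_URL, json={"token": jwt}, timeout=10)
    response.raise_for_status()
    return response.json()["salt"]

The returned salt then feeds into zkLogin address derivation and proof generation on the client.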

Diving into the salt server

There are three main goals in protecting the master seed used by the salt server. It must be generated securely, protected from exposure to any person inside Mysten Labs, and protected from external exposure to the internet, whether through the service itself or via side-channel attacks in our cloud provider. If no person or system other than the salt server ever sees the seed, we can be confident in its secrecy, and therefore in the validity of the hashed mapping from JWTs to onchain addresses.

Trusted computing systems

A variety of options exist for hosting trusted compute infrastructure. In an ideal world, all hash computation would happen within a trusted hardware module like a Hardware Security Module (HSM) or Trusted Platform Module (TPM). 

However, binding the master seed to a single piece of hardware makes that hardware a single point of failure in the system: loss of access to the HSM or TPM would mean permanent loss of the master seed. That risk is too high for our salt server, so we needed something that traded the absolute security of trusted hardware for more flexibility. The solution we settled on was a trusted compute environment, where we can run the server in isolation with container attestation and allow access only over TCP directly to the service’s endpoints.

All three major cloud providers offer trusted compute solutions; Azure Confidential Computing, GCP Confidential VMs, and AWS Nitro Enclaves are all environments that enable isolated computing. We opted to use Nitro Enclaves on EC2. With Nitro Enclaves, we can bring our own container image and existing build tooling and add the Enclave layer on top of it.

Seed generation

Because the master seed is permanent and cannot be rotated, generation happens only once. The seed can be any byte sequence; all that matters is that it is sufficiently random. However, if, for example, I produced the master seed myself, it would be vulnerable. Even if I never revealed it to another person, the mere fact that I know the seed would make me a fundamental vulnerability for the Mysten Labs zkLogin implementation and any app built on it. Instead, we prefer to generate the key in an automated way.

Generating the seed on a machine that is not air gapped introduces additional vulnerabilities, because the machine is exposed to the rest of the system and possibly the internet. We used Nitro Enclaves to produce an isolated environment where we could generate zkLogin’s master seed securely.

Inside the enclave, we generate the master seed from randomness. We encrypt the seed with an encryption key and store the encrypted seed in a secrets store. The secrets store is configured so that only the enclave can retrieve the secret; not even an administrator has access to the plaintext. We also split the seed and store the shards as described in the “Seed recovery” section below.

The shell script that runs inside the Nitro Enclave is relatively simple. 

# Generate the master seed from the enclave's entropy source.
./generate-random-seed > seed.json
# Store the seed in the secrets store, which only the enclave identity can read back.
secrets-store put --name SEED --file seed.json

Seed usage

The master seed secret has a policy that only allows the enclave identity to access it, which means that even an administrator can’t read it incidentally. We also maintain a separate cloud provider account for the salt server secrets to separate administrator access from other Mysten Labs projects and limit the number of people with access.
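One way to express such a policy is to gate decryption of the seed on the enclave’s attestation. The Pulumi sketch below is illustrative: the account ID, role name, and PCR0 measurement are placeholders, key administration statements are omitted, and the exact condition keys depend on how the secret is encrypted and fetched.

import json

import pulumi_aws as aws

# Placeholder measurement of the salt server enclave image (PCR0 from the EIF build).
ENCLAVE_IMAGE_PCR0 = "<expected-pcr0-measurement>"

# Hypothetical sketch: a KMS key whose policy only allows Decrypt calls that
# include a Nitro Enclaves attestation document matching the expected
# enclave measurement, so the instance role alone cannot read the seed.
salt_seed_key = aws.kms.Key(
    "salt-seed-key",
    policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowDecryptOnlyFromAttestedEnclave",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::123456789012:role/salt-server-instance-role"
                },
                "Action": "kms:Decrypt",
                "Resource": "*",
                "Condition": {
                    "StringEqualsIgnoreCase": {
                        "kms:RecipientAttestation:PCR0": ENCLAVE_IMAGE_PCR0
                    }
                },
            }
            # Key administration statements omitted for brevity.
        ],
    }),
)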

When the salt server reads the seed inside the Nitro Enclave, it keeps the seed in memory in plaintext. We rely on the protections of the isolated environment to prevent access from the same host, so there would be little benefit in keeping the seed encrypted and decrypting it for each request: the seed is used on every request, and that regular traffic would undermine whatever protection encryption at rest could offer.

In order for the service to still receive traffic, we had to allow a narrow subset of network access through to the salt server enclave environment. We use a vsock proxy to limit inbound traffic to the single application port, and outbound traffic to the OAuth providers for JWT verification and to a gateway address for publishing observability data. This allows the salt server to do its job of verification and salt generation, and lets us monitor the service, while disallowing all other network access.

Here is some sample code that demonstrates how we provision the enclave using Pulumi and initialize the enclave.

from typing import List

import pulumi
import pulumi_aws as aws
from pulumi import Output, ResourceOptions


# EnclaveInstanceArgs is defined elsewhere in our infrastructure code.
def get_enclave_instances(args: EnclaveInstanceArgs) -> List[aws.ec2.Instance]:
    instances = []
    for index in range(args.instance_count):
        name = f"{pulumi.get_stack()}-salt-server-ec2-{index}"
        instance = aws.ec2.Instance(
            name,
            tags={
                "Name": name,
            },
            instance_type="m5.xlarge",
            subnet_id=args.network.subnets[0].id,
            vpc_security_group_ids=[
                args.security_group.id
            ],
            ami="ami-mysten123",
            user_data=get_startup_script(
                args.role_name, args.role_arn, args.image_name, index
            ),
            user_data_replace_on_change=True,
            iam_instance_profile=args.instance_profile.name,
            enclave_options=aws.ec2.InstanceEnclaveOptionsArgs(enabled=True),
            opts=ResourceOptions(depends_on=[args.ecr_image]),
        )

        pulumi.export(f"publicIp-{index}", instance.public_ip)
        pulumi.export(f"publicHostName-{index}", instance.public_dns)
        instances.append(instance)
    return instances


def get_startup_script(
    role_name: str, role_arn: Output[str], image_name: Output[str], index: int
):
    aws_config = pulumi.Config("aws")
    return Output.all(role_arn, image_name).apply(
        lambda args: f"""#!/bin/bash 

# print each command
set -o xtrace
yum install awscli aws-nitro-enclaves-cli-devel aws-nitro-enclaves-cli docker nano socat -y

nitro-cli terminate-enclave --all
killall socat

nitro-cli build-enclave --docker-uri '{args[1]}' --output-file salt.eif

EIF_SIZE=$(du -b --block-size=1M "salt.eif" | cut -f 1)
ENCLAVE_MEMORY_SIZE=$(((($EIF_SIZE * 4 + 1024 - 1)/1024) * 1024))

cat >> /etc/nitro_enclaves/vsock-proxy.yaml <<EOF
- {{address: accounts.google.com, port: 443}}

- {{address: secretsmanager.us-west-2.amazonaws.com, port: 443}}
- {{address: kms.us-west-2.amazonaws.com, port: 443}}
EOF


# providers
vsock-proxy 8001 accounts.google.com 443 --config /etc/nitro_enclaves/vsock-proxy.yaml &

# secrets manager and kms
vsock-proxy 8101 secretsmanager.us-west-2.amazonaws.com 443 --config /etc/nitro_enclaves/vsock-proxy.yaml &
vsock-proxy 8102 kms.us-west-2.amazonaws.com 443 --config /etc/nitro_enclaves/vsock-proxy.yaml &

nitro-cli run-enclave --cpu-count 2 --memory $ENCLAVE_MEMORY_SIZE --eif-path salt.eif
ENCLAVE_ID=$(nitro-cli describe-enclaves | jq -r ".[0].EnclaveID")
ENCLAVE_CID=$(nitro-cli describe-enclaves | jq -r ".[0].EnclaveCID")

sleep 5
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
# Sends creds from ec2 instance to enclave
TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/{role_name} > assumed-role.json
# Parse the JSON output and set environment variables
cat assumed-role.json | socat - VSOCK-CONNECT:$ENCLAVE_CID:7777

# Run after the enclave is up so traffic arriving on port 8888 can be redirected to the vsock.
# Listen on port 8888 and forward incoming connections to the enclave CID on port 3000.
socat TCP4-LISTEN:8888,reuseaddr,fork VSOCK-CONNECT:$ENCLAVE_CID:3000 &
    """
    )

This Python Pulumi code creates an EC2 instance for each replica of our service, attaches it to the network we set up, and enables Nitro Enclaves for it. The user_data we insert builds the enclave image, sets up the vsock proxies, and starts the enclave.
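The code running inside the enclave is not shown here. As a rough sketch of its bootstrap (the ports match the script above, but the structure, field names, and everything else are assumptions about one possible implementation), the enclave needs to accept the credentials forwarded over vsock port 7777 before it can talk to the secrets store through the proxies:

import json
import os
import socket

VSOCK_CREDS_PORT = 7777  # matches the socat VSOCK-CONNECT line in the startup script


def receive_instance_credentials() -> None:
    # Hypothetical in-enclave counterpart to the `cat assumed-role.json | socat ...` line above.
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as server:
        server.bind((socket.VMADDR_CID_ANY, VSOCK_CREDS_PORT))
        server.listen(1)
        conn, _ = server.accept()
        with conn:
            raw = b""
            while chunk := conn.recv(4096):
                raw += chunk
    creds = json.loads(raw)
    # Expose the instance role credentials to the AWS SDK inside the enclave.
    os.environ["AWS_ACCESS_KEY_ID"] = creds["AccessKeyId"]
    os.environ["AWS_SECRET_ACCESS_KEY"] = creds["SecretAccessKey"]
    os.environ["AWS_SESSION_TOKEN"] = creds["Token"]

The salt server itself then listens on vsock port 3000, which the socat line on the host forwards to TCP port 8888.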

We make every effort to protect the master seed in use, but leaving our secrets with a cloud provider in a single service still carries risk. Having a seed recovery plan in place ensures that we have multiple ways to recover the master seed in a disaster scenario.

Seed recovery

Part of our initial seed generation process involved splitting the seed into multiple encrypted shards to enable recovery by a group of people involved in the design of the salt server system. We used Unit 410’s Horcrux utility to accomplish this.

Horcrux uses Shamir’s Secret Sharing to split the seed into multiple encrypted shards and allows the seed to be recovered from a specified threshold of those shards. Each shard was encrypted with a hardware key owned by an individual in the group, adding a geographically distributed layer of security.
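To make the threshold property concrete, here is a minimal, illustrative split-and-recover using Shamir’s scheme over a prime field. It is a toy sketch only: Horcrux’s actual encoding, field, parameters, and per-shard hardware-key encryption all differ.

import secrets

# Toy parameters: a prime field large enough to hold a 32-byte seed.
PRIME = 2**521 - 1


def split_seed(seed: bytes, total: int = 5, threshold: int = 3):
    # Any `threshold` of the `total` shards can reconstruct the seed;
    # fewer reveal nothing about it.
    secret = int.from_bytes(seed, "big")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shards = []
    for x in range(1, total + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shards.append((x, y))
    return shards


def recover_seed(shards, seed_len: int) -> bytes:
    # Lagrange interpolation at x = 0 recovers the constant term (the seed).
    secret = 0
    for i, (xi, yi) in enumerate(shards):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shards):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret.to_bytes(seed_len, "big")


seed = secrets.token_bytes(32)
shards = split_seed(seed)
assert recover_seed(shards[:3], len(seed)) == seed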

The encrypted shards are stored redundantly on multiple remote servers. Because they are encrypted, we can store them together, and the redundancy avoids a single point of failure. If the seed were ever lost from the salt server, Horcrux keeps recovery swift while maintaining secrecy.

Trade-offs

Our salt server production system looks quite different from the many other services we run. The security constraints we placed on the service by running it in Nitro Enclaves make operationalizing the salt server its own challenge.

Every network proxy we add to the enclave is an additional potential surface for exfiltration of the master seed. As long as we continue to follow the same practices for each new OAuth provider, we expect the attack surface to remain essentially unchanged, and we don’t anticipate needing any other network traffic into or out of the enclave itself.

The salt server must by necessity remain simple; communication with other Mysten Labs services and integration with new tooling over time will likely each require individual consideration of the security trade-offs involved. We will keep the system in a severely constrained environment to protect what is ultimately a critical piece of infrastructure for applications built on our zkLogin implementation.

Security first

At Mysten Labs we focus on solving foundational problems and zkLogin is evidence of that focus in action. As we continue to build other new constructs similar to and on top of zkLogin, we will hold our systems to the highest standards of security in cryptographically provable ways. Our implementation, combined with the power of Nitro Enclaves and Horcrux, demonstrates commitment to those standards and brings the benefits of Web3 to everyone.