Skip to content

Tencent Cloud Cluster Deployment (Terraform)

This guide explains how to use the Terraform deployer shipped in the release bundle to stand up a clustered Cube Sandbox on Tencent Cloud in one shot: a managed TKE control plane running cube-master / cube-api / cube-proxy / cube-webui, backed by cloud MySQL + Redis, with one or more CVM PVM compute nodes. A jumpserver (SSH on port 443) acts as the build host and the bastion for the otherwise-private VPC.

Network Hardening

Network hardening for the cluster deployment is handled by Tencent Cloud security groups: the deployer creates 4 per-role security groups (jumpserver / compute / TKE pod / CLB), each opening only the ingress that role actually needs on a least-privilege basis (compute and TKE nodes get no public ingress at all), and assigns no public IP to compute nodes. The default is internal mode (TENCENTCLOUD_ENABLE_PUBLIC_NETWORK='false'): the three user-facing services — WebUI / cube-api / cube-proxy — are fronted by VPC-internal CLBs, reachable only from inside the VPC (via the jumpserver / VPN) with no public exposure. To allow public access, explicitly set TENCENTCLOUD_ENABLE_PUBLIC_NETWORK='true' to switch them to public CLBs. To tighten further, adjust the inbound/outbound rules of each individual group (cubesandbox-sg-jumpserver / cubesandbox-sg-compute / cubesandbox-sg-tke-pod / cubesandbox-sg-clb) as needed in the Tencent Cloud security group console. When public mode is enabled, also see Hardening the Public-Facing Services.

When to use this

This deployment uses cloud resources to quickly stand up a highly-available CubeSandbox service: all cloud resources default to pay-as-you-go billing (see Billing Mode below) and can be released in one shot with destroy.sh. For long-term use, switch to prepaid (monthly/yearly subscription) resources for better cost savings (see Billing Mode). If you only need a single-machine deployment for validation, see the earlier deployment guides: PVM Deployment or Bare-Metal Deployment.

Note: The default configuration is a POC / functional validation setup (2× SA9.MEDIUM8 compute nodes, single-replica control plane, no CFS). It can host only a limited number of sandboxes. For production or load testing, adjust compute node and TKE worker specs and counts — see Node Specifications & Capacity Planning and Default Deployment Mode.

Architecture Overview

                          Internet

               ┌────────────┴────────────────────┐
               │                                 │
      ┌────────┴────────┐             ┌──────────┴─────────┐
      │  Jumpserver CVM │             │ CLB (internal/     │
      │  (public IP)    │             │      public)       │
      │  SSH:443        │             │  cube-api :3000    │
      │  build & push   │             │  cube-proxy :80/443 │
      └────────┬────────┘             │  cube-webui :80    │
               │                      └──────────┬─────────┘
               │                                 │
  ┌────────────┼─────────────────────────────────┼──────────┐
  │            │            VPC internal          │           │
  │  ┌─────────┴────────┐       ┌───────────────┴─────┐    │
  │  │ Compute CVM ×N   │       │  TKE managed cluster │    │
  │  │  Cubelet         │       │  cube-master (×1)   │    │
  │  │  network-agent   │       │  cube-api (×1)      │    │
  │  │  CubeEgress      │       │  cube-proxy (×1)    │    │
  │  └──────────────────┘       │  cube-webui (×1)    │    │
  │                             └───┬─────────────┬───┘    │
  │                                 │             │         │
  │                    ┌────────────┴───┐ (opt.)  │         │
  │                    │  CFS (NFS)     │         │         │
  │                    │  shared store  │         │         │
  │                    └────────────────┘         │         │
  │                                               │         │
  │                         ┌─────────────────────┴───┐     │
  │                         │  Cloud DB: MySQL + Redis │     │
  │                         └─────────────────────────┘     │
  │                                                         │
  │  ┌──────────────┐       ┌──────────────┐                │
  │  │ TCR (opt.)   │       │  NAT + EIP   │→ public egress │
  │  └──────────────┘       └──────────────┘                │
  └─────────────────────────────────────────────────────────┘
ComponentFormNotes
JumpserverCVM (public IP, SSH 443)Build host (TCR mode) and bastion into the private VPC
Load balancerCLB (internal/public)Fronts cube-api / cube-proxy / cube-webui; internal mode by default, user traffic entry point
Control planeManaged TKE clusterRuns cube-master / cube-api / cube-proxy / cube-webui
Compute nodeCVM PVMRuns Cubelet / network-agent / CubeEgress; actually hosts sandboxes
DatabaseCloud MySQL 8.0 + Redis 7.0VPC-internal only, no public access
Shared storageCFS (General Standard NFS, optional)When USE_CFS=true and cubemaster has multiple replicas, ReadWriteMany share for /data/CubeMaster/storage
RegistryTCR (basic, optional)When USE_TCR=true; the jumpserver builds and pushes component images
EgressNAT gateway + EIPThe whole VPC reaches the internet through NAT

TKE workers and PVM compute nodes are separate resources

  • TENCENTCLOUD_TKE_NODE_COUNT: number of TKE workers running control-plane Pods (cubemaster / cube-api / etc.).
  • TENCENTCLOUD_COMPUTE_NODE_COUNT: number of PVM compute nodes running Cubelet; sandboxes execute here. Both default to 2 but serve completely different roles.

Default Deployment Mode

Matching env.example / variables.tf, the default is public images + single-replica control plane + no CFS — a POC configuration:

SettingDefaultNotes
TENCENTCLOUD_USE_TCRfalseNo TCR; uses public pre-built images, no build on the jumpserver
TENCENTCLOUD_USE_CFSfalseNo CFS; cubemaster uses Pod-local storage
TENCENTCLOUD_CUBEMASTER_REPLICAS etc.1Control-plane components default to single replica
TENCENTCLOUD_COMPUTE_NODE_COUNT2PVM compute nodes
TENCENTCLOUD_TKE_NODE_COUNT2TKE workers (worker_config.count)
TENCENTCLOUD_ENABLE_PUBLIC_NETWORKfalsecube-api / cube-proxy / cube-webui use VPC-internal CLBs

Advanced modes:

  • TENCENTCLOUD_USE_TCR=true: create TCR and build/push the four component images on the jumpserver.
  • TENCENTCLOUD_USE_CFS=true with TENCENTCLOUD_CUBEMASTER_REPLICAS>1: create CFS for cubemaster multi-replica shared storage.

cube-proxy defaults to single replica (TENCENTCLOUD_CUBE_PROXY_REPLICAS=1). Auto-pause / auto-resume is reliable only with one replica (current sidecar model; after PR #705 merges this becomes a standalone cube-lifecycle-manager). For multiple replicas the front-end LB must hash on SandboxID (session affinity).

Resources Created by the Default Configuration

The table below lists cloud resources created under the default configuration (region ap-guangzhou, zone ap-guangzhou-6, 2 compute nodes, 2 TKE workers, no CFS/TCR). All resources are pay-as-you-go (see Billing Mode).

ResourceCountSpec / Configuration
VPC1CIDR 10.0.0.0/16
Subnet110.0.1.0/24 (primary zone; extra /24 subnets only when a role lands in another zone)
NAT gateway + EIP1 + 1200 Mbps bandwidth, pay-by-traffic
Route table entry10.0.0.0/0 → NAT gateway
Security group4Per-role (jumpserver / compute / TKE pod / CLB), least privilege, see below
SSH key pair1Auto-generated under terraform/tencentcloud/.ssh/
Jumpserver CVM1SA9.MEDIUM4 (2C4G), 50GB general-purpose SSD system disk, 200 Mbps public bandwidth, SSH on port 443
Compute CVM2SA9.MEDIUM8 (4C8G), 50GB system disk + 200GB CBS data disk (XFS, /data/cubelet), no public IP
TKE managed cluster1Managed L5, Kubernetes 1.34.1, Pod CIDR 10.200.0.0/16, Service CIDR 192.168.0.0/20, VPC-internal API only
TKE worker nodes2SA9.LARGE8 (4C8G), created via worker_config.count (no separate node pool)
Cloud MySQL18.0 InnoDB universal, 4GB memory / 200GB storage, multi-AZ (when the region has ≥2 zones) / semi-sync, intranet 3306 only
Cloud Redis17.0 standard architecture (master/replica), 1GB memory, port 6379, intranet only
CFS file system0 (default)1 General Standard NFS when USE_CFS=true
TCR registry0 (default)Basic + namespace + VPC attachment when USE_TCR=true
OS imageOpenCloudOS Server 9 (public image, reused by all CVMs)

OS image

All CVMs (jumpserver / compute / TKE workers) default to the OpenCloudOS Server 9 public image; override with TENCENTCLOUD_IMAGE_NAME.

Security Group Ingress Ports

The deployer creates 4 per-role security groups on a least-privilege basis; each opens only the ingress that role actually needs, so compromising one role never inherits the inbound surface of the others.

1. cubesandbox-sg-jumpserver (jumpserver)

Port / RangeSourcePurpose
TCP 4430.0.0.0/0Jumpserver SSH (cloud-init moves sshd to 443)
ALL10.0.0.0/16VPC-internal traffic

2. cubesandbox-sg-compute (compute nodes) — no public ingress

Port / RangeSourcePurpose
ALLTKE Pod CIDRcube-proxy (pod) → all ports on compute nodes (sandbox dynamic ports 20000-29999)
ALL10.0.0.0/16VPC-internal traffic (jumpserver management, cube-master scheduling)

3. cubesandbox-sg-tke-pod (TKE workers) — no public ingress

Port / RangeSourcePurpose
ALLTKE Pod CIDRPod-to-pod communication
ALL10.0.0.0/16VPC-internal (CLB health checks, jumpserver management, CFS NFS)

4. cubesandbox-sg-clb (load balancers)

The source for the 80 / 443 / 3000 entrypoints below depends on TENCENTCLOUD_ENABLE_PUBLIC_NETWORK: in the default internal mode (false) the source is the VPC CIDR 10.0.0.0/16 (VPC-internal only); in public mode (true) the source is 0.0.0.0/0 (open to the internet).

Port / RangeSource (internal / public mode)Purpose
TCP 8010.0.0.0/16 / 0.0.0.0/0CLB for cube-proxy + cube-webui (HTTP)
TCP 44310.0.0.0/16 / 0.0.0.0/0CLB for cube-proxy (HTTPS)
TCP 300010.0.0.0/16 / 0.0.0.0/0CLB for cube-api
TCP 808910.0.0.0/16 (always VPC-internal only)Internal CLB for cube-master (unaffected by the flag)

Egress allows all (0.0.0.0/0 ALL) by default on every group. Databases, the TKE API server, etc. are not exposed publicly — VPC-internal access only.

Hardening the Public-Facing Services

This section applies only when public mode is enabled (TENCENTCLOUD_ENABLE_PUBLIC_NETWORK='true'). In the default internal mode none of the three services are exposed publicly, so you can skip it.

With public mode enabled, cubesandbox-sg-clb opens three public entrypoints to 0.0.0.0/0: WebUI (80), cube-proxy (80 / 443), and cube-api (3000). Their security models differ, so harden each one separately:

  • WebUI (CLB 80): the WebUI console currently ships without any authentication or access control — anyone who can reach it can operate sandboxes. Strongly consider creating a dedicated security group for the WebUI CLB and configuring a strict source-IP allowlist (only your office / management egress IPs) instead of reusing the publicly-open cubesandbox-sg-clb. Create the group in the Tencent Cloud security group console and bind it to the WebUI CLB instance.
  • cube-api (CLB 3000): cube-api lets every request through without credential checks by default. Before exposing it publicly, enable Auth Callback authentication to delegate authorization decisions to your own auth service — see Authentication.
  • cube-proxy (CLB 80 / 443): cube-proxy is the public ingress for sandbox traffic and is public-facing by design. To restrict public access to sandboxes, see Restrict Public Access to enable mechanisms such as per-sandbox inbound tokens.

Prerequisites

The machine running create.sh only needs the following — no pre-installed Terraform required:

  • Tencent Cloud API credentials: create an API key pair in the CAM console and export TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY.
  • Local tools: ssh, scp, nc, plus network access to the Tencent Cloud APIs. terraform and jq are auto-installed when missing — into /usr/local/bin when writable (e.g. running as root), otherwise into a local .bin/. terraform is fetched from the HashiCorp release site (needs curl/wget + unzip); jq comes from the system package manager or a static binary from GitHub.
  • mkcert / openssl are not needed locally — the cube-proxy certificate is produced on the jumpserver.
  • Build / deploy host: Linux recommended; create.sh / destroy.sh also support macOS (including Bash 3.2). Use WSL2 on Windows.

Pre-deployment setup (service activation & authorization)

Complete these before the first create.sh apply to avoid half-created resources mid-run.

1. Account basics

2. TKE service role authorization (required)

Before first use of TKE you must grant service roles, otherwise cluster creation, CLB-type Services, etc. will fail.

  1. Log in to the TKE console and complete service authorization when prompted.
  2. Confirm roles such as TKE_QCSRole / IPAMDofTKE_QCSRole are authorized.

Official docs: Service authorization role permissions, TKE quick start.

3. Private DNS (as needed)

ScenarioRequired?
Default USE_TCR=falseUsually not
USE_TCR=trueRecommended
E2B SDK access to *.cube.appYes (see E2B and the cube.app domain)

Without activation the API may return ResourceNotFound.ServiceNotSubscribed (Private DNS service not subscribed).

Activation: DNSPod Private DNS → accept agreement → activate. Docs: Private DNS product overview, Activate Private DNS.

4. Cloud File Storage CFS (as needed)

Only when TENCENTCLOUD_USE_CFS=true and cubemaster runs multiple replicas.

  1. Log in to the CFS console and activate CFS when prompted.
  2. Terraform creates the file system during apply (no manual creation needed).

Docs: CFS quick start. If the account is not activated the console may show no instances; prefer running destroy.sh's CFS stage on teardown.

5. Other service preflight

ServiceConsole
VPC / NATVPC
CVMCVM
TKETKE
MySQL / RedisMySQL / Redis
CLBCLB
TCR (USE_TCR=true)TCR

Quick Start

The deployer is surfaced at the top level of the extracted bundle, so you can run it directly after extracting:

bash
tar -xzf cube-sandbox-one-click-<version>.tar.gz
cd cube-sandbox-one-click-<version>

# Copy the environment template, fill in your credentials (TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY) and edit the rest as needed
cp terraform/tencentcloud/env.example terraform/tencentcloud/.env
# $EDITOR terraform/tencentcloud/.env

./terraform/tencentcloud/create.sh

create.sh auto-loads the .env next to it (filling in only variables not already set in the current shell), so credentials can either go straight into .env or be injected via export — the latter takes precedence and overrides the matching value in .env:

bash
export TENCENTCLOUD_SECRET_ID="your-secret-id"
export TENCENTCLOUD_SECRET_KEY="your-secret-key"

See Configuration below for the meaning and defaults of each .env option; anything left unset falls back to the defaults.

create.sh runs entirely from the extracted bundle and automatically:

  1. Auto-detects the local bundle (the outer cube-sandbox-one-click-<version>.tar.gz, or re-packs the extracted directory if the tarball is gone) and uses it as the offline source for component images and compute-node installation. When detected — or set via TENCENTCLOUD_LOCAL_BUNDLE=/path/to.tar.gz — no public download is required; otherwise the jumpserver falls back to an online install (needs public network).
  2. Generates an SSH key pair under terraform/tencentcloud/.ssh/ if none exists.
  3. Generates the cube-proxy TLS certificate (cube.app / *.cube.app) on the jumpserver with the bundled mkcert, downloading it to terraform/tencentcloud/cubeproxy-certs/ for the Secret mount.
  4. Default mode (USE_TCR=false): pull public pre-built images and deploy TKE addons and CVM compute nodes (2 by default).
  5. TCR mode (USE_TCR=true): create TCR, build and push the four component images on the jumpserver, then deploy TKE and compute nodes.

Configuration

Common environment variables (matching the create.sh / variables.tf defaults) are listed in terraform/tencentcloud/env.example, which you can copy to .env and fill in:

bash
export TENCENTCLOUD_REGION=ap-guangzhou
export TENCENTCLOUD_AVAILABILITY_ZONE=ap-guangzhou-6
export TENCENTCLOUD_COMPUTE_NODE_COUNT=2              # CVM PVM compute nodes
export TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE=200        # CBS data disk per compute node (GB)
export TENCENTCLOUD_TKE_NODE_COUNT=2                 # TKE worker nodes (control-plane Pods)
export TENCENTCLOUD_COMPUTE_INSTANCE_TYPE=SA9.MEDIUM8
export TENCENTCLOUD_USE_TCR=false                    # default: public pre-built images
export TENCENTCLOUD_USE_CFS=false                    # default: no CFS
export TENCENTCLOUD_CUBE_IMAGE_TAG=cubesandbox002-20260630

Common Variables

When running create.sh interactively, the "Deployment configuration" step walks you through most of the settings below (region / VPC / instance types / passwords / image tag, etc.), including a [y/N] prompt for whether to enable public-network mode (default N = internal). Set any of these variables beforehand to skip the matching prompt; a non-interactive run (no TTY) silently uses the defaults.

VariableDefaultDescription
TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEYnoneRequired. Tencent Cloud API credentials
TENCENTCLOUD_REGIONap-guangzhouRegion
TENCENTCLOUD_AVAILABILITY_ZONEap-guangzhou-6Primary zone (subnet / MySQL / Redis / TKE control plane)
TENCENTCLOUD_JUMPSERVER_INSTANCE_TYPESA9.MEDIUM4Jumpserver instance type
TENCENTCLOUD_COMPUTE_INSTANCE_TYPESA9.MEDIUM8Preferred compute-node instance type
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPESA9.LARGE8TKE worker instance type (4C8G)
TENCENTCLOUD_COMPUTE_NODE_COUNT2PVM compute nodes (run Cubelet / sandboxes)
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE200CBS data disk size per compute node (GB, XFS, mounted at /data/cubelet). Sandbox image templates, snapshots and runtime data on the compute node all live under this directory — size it to your actual needs
TENCENTCLOUD_TKE_NODE_COUNT2TKE workers (run control-plane Pods; maps to worker_config.count)
TENCENTCLOUD_USE_TCRfalseWhen true, create TCR and build/push images on the jumpserver; when false, use public pre-built images
TENCENTCLOUD_USE_CFSfalseWhen true and cubemaster has multiple replicas, create CFS NFS shared storage
TENCENTCLOUD_TKE_CLUSTER_VERSION1.34.1TKE Kubernetes version
TENCENTCLOUD_MYSQL_PASSWORDinsecure demo valueMySQL root password (change for real use)
TENCENTCLOUD_REDIS_PASSWORDinsecure demo valueRedis password (change for real use)
TENCENTCLOUD_CUBE_DB / TENCENTCLOUD_CUBE_USER / TENCENTCLOUD_CUBE_PASSWORDcube_mvp / cube / demoApplication DB name / account / password
TENCENTCLOUD_CUBEMASTER_REPLICAS1cube-master replica count
TENCENTCLOUD_CUBE_API_REPLICAS1cube-api replica count
TENCENTCLOUD_CUBE_PROXY_REPLICAS1cube-proxy replica count. Default 1: auto-pause/auto-resume only works in single-replica mode. Going >1 requires the LB to hash on SandboxID
TENCENTCLOUD_CUBE_WEBUI_REPLICAS1cube-webui replica count
TENCENTCLOUD_ENABLE_PUBLIC_NETWORKfalseNetwork exposure mode for cube-api / cube-proxy / cube-webui. Default false: VPC-internal CLBs, reachable only from inside the VPC (via jumpserver / VPN). Set to true for public CLBs reachable from the internet, with the security group opening 0.0.0.0/0 accordingly. cube-master always stays VPC-internal. Read Hardening the Public-Facing Services before enabling

Non-interactive / CI runs

Without a TTY the interactive menus fall back to defaults, so set them explicitly to stay in control. The password variables are the exception: a non-interactive run refuses to start with the built-in, publicly-known demo passwords and requires them to be set — or set TENCENTCLOUD_ALLOW_INSECURE_DEFAULTS=1 to opt into the insecure defaults for a throwaway sandbox.

bash
export TENCENTCLOUD_AVAILABILITY_ZONE=ap-guangzhou-6
export TENCENTCLOUD_COMPUTE_INSTANCE_TYPE=SA9.MEDIUM8
export TENCENTCLOUD_LOCAL_BUNDLE=/path/to/cube-sandbox-one-click-<version>.tar.gz  # auto-detected inside an extracted bundle
export TENCENTCLOUD_PVM_KERNEL_VMLINUX=/path/to/vmlinux-pvm  # only if the bundle ships no vmlinux-pvm
export TENCENTCLOUD_MYSQL_PASSWORD=...    # required for non-interactive runs (no insecure fallback)
export TENCENTCLOUD_REDIS_PASSWORD=...    # required for non-interactive runs
export TENCENTCLOUD_CUBE_PASSWORD=...     # required for non-interactive runs
export TENCENTCLOUD_BUILD_IMAGES=0        # TCR mode: reuse already-pushed images

More advanced toggles (TENCENTCLOUD_VERBOSE, TENCENTCLOUD_REINSTALL, TENCENTCLOUD_RESET_DB, SSH port/key paths, etc.) are documented in the create.sh header comments.

Node Specifications & Capacity Planning

Default Configuration

The default deployment provisions 2× 4C8G compute nodes (SA9.MEDIUM8, 200GB data disk each) and 2 TKE workers — suitable for POC / functional validation with limited sandbox capacity. Default specs per role:

RoleDefault TypeSpecsCount
Compute node (PVM)SA9.MEDIUM84C8G + 200GB data disk2
TKE WorkerSA9.LARGE84C8G2 (worker_config.count)
JumpserverSA9.MEDIUM42C4G1
Control-plane Podscubemaster / cube-api / cube-webui ×1 eachon TKE workers

WARNING

The default configuration is only suitable for functional validation and small-scale evaluation. For large-scale production usage, you must adjust the compute node and TKE node specs and counts, otherwise you will encounter CPU/memory exhaustion, Pod scheduling failures, and sandbox creation timeouts.

Adjusting Compute Nodes

Compute nodes are the machines that actually run sandbox containers. Use the following environment variables to adjust their specs, count, and disk size:

Environment VariableDescription
TENCENTCLOUD_COMPUTE_INSTANCE_TYPECompute node instance type
TENCENTCLOUD_COMPUTE_NODE_COUNTNumber of compute nodes
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZEData disk size (GB)

Example: scale up specs and count (.env configuration):

bash
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE='SA9.4XLARGE32'
TENCENTCLOUD_COMPUTE_NODE_COUNT='4'
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE='500'

Example: use higher specs (.env configuration):

bash
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE='SA9.16XLARGE128'
TENCENTCLOUD_COMPUTE_NODE_COUNT='8'
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE='1000'

Bare Metal Servers

For stronger compute isolation and performance (avoiding virtualization overhead), you can use Tencent Cloud Bare Metal Servers as compute nodes. Simply set TENCENTCLOUD_COMPUTE_INSTANCE_TYPE to a bare metal instance type (e.g. BMS5.12XLARGE192) — the rest of the deployment process remains the same.

Example: heterogeneous mix (using Terraform variables directly, per-node instance types):

bash
export TF_VAR_compute_node_count=5
export TF_VAR_compute_instance_types='["SA9.8XLARGE64","SA9.8XLARGE64","SA9.4XLARGE32","SA9.4XLARGE32","SA9.4XLARGE32"]'
export TF_VAR_compute_availability_zones='["ap-guangzhou-6","ap-guangzhou-7","ap-guangzhou-6","ap-guangzhou-7","ap-guangzhou-3"]'
export TF_VAR_compute_data_disk_size=1000

Adjusting TKE Worker Nodes

TKE Workers run cube-master, cube-api, cube-proxy, cube-webui and other control-plane Pods. At larger scale, control-plane request volume and scheduling pressure increase significantly, requiring more resources.

Environment VariableDescription
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPETKE Worker node instance type
TENCENTCLOUD_TKE_NODE_COUNTNumber of TKE Worker nodes

Example: upgrade TKE Worker specs:

bash
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPE='SA9.2XLARGE16'
TENCENTCLOUD_TKE_NODE_COUNT='4'

Also consider increasing control-plane replica counts for higher throughput and availability (multi-replica cubemaster requires TENCENTCLOUD_USE_CFS=true):

bash
TENCENTCLOUD_USE_CFS=true
TENCENTCLOUD_CUBEMASTER_REPLICAS='3'
TENCENTCLOUD_CUBE_API_REPLICAS='3'
TENCENTCLOUD_CUBE_WEBUI_REPLICAS='2'

Full Example: Production Configuration

bash
# --- Compute nodes
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE='SA9.8XLARGE64'
TENCENTCLOUD_COMPUTE_NODE_COUNT='6'
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE='1000'

# --- TKE control plane
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPE='SA9.2XLARGE16'
TENCENTCLOUD_TKE_NODE_COUNT='4'

# --- Control-plane replicas (requires CFS)
TENCENTCLOUD_USE_CFS=true
TENCENTCLOUD_CUBEMASTER_REPLICAS='3'
TENCENTCLOUD_CUBE_API_REPLICAS='3'
TENCENTCLOUD_CUBE_WEBUI_REPLICAS='2'

# --- Jumpserver can stay at default (build-only, no runtime load)
TENCENTCLOUD_JUMPSERVER_INSTANCE_TYPE='SA9.MEDIUM4'

Billing Mode

Currently pay-as-you-go (POSTPAID) only

All cloud resources created by this deployer are hard-coded to pay-as-you-go billing. There is currently no environment or Terraform variable to switch to prepaid (monthly/yearly subscription, PREPAID).

The charge type of each resource is fixed as follows:

ResourceCharge fieldValueMeaning
Jumpserver CVMinstance_charge_typePOSTPAID_BY_HOURPay by hour
Compute CVMinstance_charge_typePOSTPAID_BY_HOURPay by hour
Cloud MySQLcharge_typePOSTPAIDPay-as-you-go
Cloud Redischarge_typePOSTPAIDPay-as-you-go
NAT gateway EIPinternet_charge_typeTRAFFIC_POSTPAID_BY_HOURPay by traffic
CLB (cube-proxy)annotationTRAFFIC_POSTPAID_BY_HOURPay by traffic (only in public mode; the annotation is absent in internal mode)

Pay-as-you-go is the default because this deployment targets fast evaluation: everything can be fully released with destroy.sh (terraform destroy) when you are done, avoiding ongoing charges.

Switching to prepaid (subscription)

For confirmed long-term use you can manually edit the charge fields of the matching resources in terraform/tencentcloud/main.tf to get better pricing. For example:

hcl
# CVM → prepaid
resource "tencentcloud_instance" "compute" {
  instance_charge_type                    = "PREPAID"
  instance_charge_type_prepaid_period     = 1                       # 1 month
  instance_charge_type_prepaid_renew_flag = "NOTIFY_AND_AUTO_RENEW" # auto-renew on expiry
  # ...
}

# MySQL → prepaid
resource "tencentcloud_mysql_instance" "mysql" {
  charge_type     = "PREPAID"
  prepaid_period  = 1   # 1 month
  auto_renew_flag = 1   # auto-renew
  # ...
}

# Redis → prepaid
resource "tencentcloud_redis_instance" "redis" {
  charge_type     = "PREPAID"
  prepaid_period  = 1
  auto_renew_flag = 1
  # ...
}

Prepaid caveats

  • Prepaid resources are not auto-refunded/released by destroy.sh (terraform destroy); you must let them expire or refund them manually.
  • After switching, destroy.sh may error or leave residual resources that must be cleaned up by hand in the Tencent Cloud console.
  • Only switch when long-term use is confirmed; keep the default pay-as-you-go for evaluation/validation.

Cost Estimate

List prices vary by region, spec tier, promotions, and over time, so this guide does not embed concrete numbers. Use Tencent Cloud's official price calculators and estimate against the specs in Resources Created by the Default Configuration:

Controlling cost

All resources default to pay-as-you-go (billed by the second, settled hourly). Run destroy.sh right after evaluation to avoid idle charges; for long-term use, switch to prepaid as described in Billing Mode.

Deployment Flow (Phased, Fail-Fast)

Resources are created in this order; the run stops at the first failed stage:

Network (VPC / subnet / NAT) → (when USE_TCR=true) TCR → CVMs (jumpserver + compute) → (TCR mode) image build/push on the jumpserver → MySQL / Redis → (when USE_CFS=true) CFS shared storage → TKE cluster + Kubernetes addons → health checks → compute-node setup.

  • The Kubernetes provider is only engaged after the TKE API server exists.
  • On teardown, if CFS was created, the share is destroyed before its subnet (its NFS mount target is an ENI in that subnet).
  • Terraform state lives locally under terraform/tencentcloud/ (*.tfstate, gitignored — no remote backend). Keep that directory and the generated .env so a later destroy.sh or re-run can find and manage the same resources.
  • Resolved selections are saved to terraform/tencentcloud/.env and auto-loaded on the next run; explicit environment variables always win.

Retrying After a Partial Failure

If a stage fails part-way (e.g. an instance type / availability zone sold out in the chosen region/zone, an account quota limit, or a transient API error), you do not have to destroy everything and start over:

  • Fix the cause — most often by changing configuration: pick a different TENCENTCLOUD_AVAILABILITY_ZONE / TENCENTCLOUD_COMPUTE_INSTANCE_TYPE / TENCENTCLOUD_REGION, raise the quota, set a password, etc. — then simply re-run ./terraform/tencentcloud/create.sh.
  • On a re-run, create.sh reloads the saved selections from .env, reconciles state with what already exists in the cloud (refreshing and importing stateful resources rather than recreating them), and continues from where it left off. Existing compute nodes are kept (it never scales down).
  • Availability genuinely varies by region and availability zone — a type offered in one zone may be unavailable in another. The interactive zone / instance-type menus are queried live for your region, and the final choice is validated at apply time.
  • Only run destroy.sh when you actually want to tear the deployment down; it is not required between ordinary retries.

E2B and the cube.app Domain

Single-machine one-click install configures *.cube.app resolution via CoreDNS + split DNS on the host. The Terraform cluster deployment does not include this DNS setup by default; to create/access sandboxes through the official E2B SDK you must configure domain resolution separately (usually requires Private DNS).

Background

  • cube-proxy terminates TLS for cube.app / *.cube.app (certificates generated on the jumpserver with mkcert by create.sh, or replaced with BYO certs).
  • The E2B SDK accesses sandboxes at URLs like https://<sandbox-id>.cube.app.
  • With the default TENCENTCLOUD_ENABLE_PUBLIC_NETWORK=false, the cube-proxy CLB is a VPC-internal VIP reachable only from inside the VPC (or via jumpserver / VPN).

Getting CLB addresses

After deployment, from terraform/tencentcloud/:

bash
cd terraform/tencentcloud
terraform output tke_cube_proxy_clb_ip    # cube-proxy
terraform output tke_cube_api_clb_ip      # cube-api (E2B_API_URL)
terraform output tke_cube_webui_clb_ip    # WebUI
terraform output tke_cubemaster_clb_ip    # cube-master (VPC-internal)
terraform output jumpserver_ssh_command   # jumpserver SSH command

create.sh also prints these entrypoints when it finishes.

For clients inside the VPC or connected via jumpserver / VPN:

  1. Log in to DNSPod Private DNS and create a private zone cube.app.
  2. Add an A record: host *, value = the internal VIP from terraform output tke_cube_proxy_clb_ip; optionally add host @ pointing to the same VIP.
  3. Associate the zone with the CubeSandbox VPC (confirm the VPC ID in the VPC console).
  4. Verify from inside the VPC (jumpserver or compute node):
bash
dig +short test.cube.app
curl -k https://test.cube.app/   # self-signed cert: use -k or trust the mkcert CA

Option 2: /etc/hosts (testing only)

Manually bind a few sandbox hostnames on the machine running the E2B SDK (must be able to route to the cube-proxy CLB internal VIP). Wildcard coverage is incomplete — not suitable for concurrent multi-sandbox tests.

Option 3: Public CLB + public DNS

  1. Set TENCENTCLOUD_ENABLE_PUBLIC_NETWORK=true and redeploy (rebuilds CLBs; VIPs change).
  2. Point cube.app / *.cube.app in public DNS to the public CLB VIP.
  3. Replace certificates in terraform/tencentcloud/cubeproxy-certs/ with a public CA-signed cert (see Advanced: Bring Your Own cube-proxy TLS Certificate).
  4. Read Hardening the Public-Facing Services before enabling.

E2B SDK environment variables

bash
export E2B_API_URL=http://<cube-api-clb-vip>:3000
export E2B_API_KEY=e2b_000000

Note: sandbox traffic itself goes through *.cube.app → cube-proxy, so cube.app DNS resolution and TLS certificates are required — configuring E2B_API_URL alone is not enough. See also Connecting to an Existing Cluster.

CapabilitySingle-machine one-clickTerraform cluster
*.cube.app resolutionAutomatic (CoreDNS + split DNS)Manual Private DNS / hosts / public DNS
TLS certificatesHost mkcertJumpserver mkcert → K8s Secret mount
cube-api addresshttp://<host>:3000cube-api CLB internal / public VIP

Verifying the Deployment

After it finishes, create.sh runs health checks and prints the access entrypoints (CLB addresses for cube-api, cube-webui, etc.). You can also enter the VPC through the jumpserver to troubleshoot:

bash
# create.sh prints a command like the following
ssh -i terraform/tencentcloud/.ssh/id_rsa -p 443 -o StrictHostKeyChecking=no root@<jumpserver_public_ip>

Tearing Everything Down

bash
./terraform/tencentcloud/destroy.sh

destroy.sh also needs TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY and reuses the selections create.sh saved to terraform/tencentcloud/.env. It runs without extra prompting — running destroy.sh itself confirms the teardown.

Avoid unexpected billing

If destroy.sh cannot remove every resource (for example MySQL/Redis stuck in the recycle bin / isolated state, or leftovers Terraform can no longer see), log in to the Tencent Cloud console and delete the remaining resources by hand so you are not billed for orphans:

destroy.sh also prints these same links when a teardown step fails or a recycle-bin cleanup is not confirmed.

Advanced: Bring Your Own cube-proxy TLS Certificate

cube-proxy terminates TLS for cube.app / *.cube.app, and its bundled nginx config hard-codes the certificate paths …/certs/cube.app+3.pem and …/certs/cube.app+3-key.pem:

  • By default, create.sh generates a self-signed pair on the jumpserver with the bundled mkcert (SANs: cube.app, *.cube.app, localhost, 127.0.0.1), downloads it to terraform/tencentcloud/cubeproxy-certs/, and Terraform packs every file in that directory into the cubeproxy-certs Secret (a Secret, not a ConfigMap, because it holds the TLS private key), mounted read-only into the cube-proxy pod at /usr/local/openresty/nginx/certs/.
  • Bring your own certificate: before running create.sh, drop your PEM cert + key into terraform/tencentcloud/cubeproxy-certs/, named exactly cube.app+3.pem and cube.app+3-key.pem (the names nginx expects) and covering the cube.app and *.cube.app SANs. create.sh reuses existing files instead of generating new ones, so a CA-signed certificate is used as-is, with no self-signed warning.
  • Rotate a certificate: replace the two files and re-run create.sh; the deploy stage refreshes the cubeproxy-certs Secret and restarts cube-proxy to pick up the new material. The self-signed default trips browsers/clients with an "untrusted CA" warning, so replace it for any non-throwaway use.

Troubleshooting

For common deployment issues (Docker, KVM, DNS, quotas, etc.), see Troubleshooting — Deployment.