Tencent Cloud Cluster Deployment (Terraform)
This guide explains how to use the Terraform deployer shipped in the release bundle to stand up a clustered Cube Sandbox on Tencent Cloud in one shot: a managed TKE control plane running cube-master / cube-api / cube-proxy / cube-webui, backed by cloud MySQL + Redis, with one or more CVM PVM compute nodes. A jumpserver (SSH on port 443) acts as the build host and the bastion for the otherwise-private VPC.
Network Hardening
Network hardening for the cluster deployment is handled by Tencent Cloud security groups: the deployer creates 4 per-role security groups (jumpserver / compute / TKE pod / CLB), each opening only the ingress that role actually needs on a least-privilege basis (compute and TKE nodes get no public ingress at all), and assigns no public IP to compute nodes. The default is internal mode (TENCENTCLOUD_ENABLE_PUBLIC_NETWORK='false'): the three user-facing services — WebUI / cube-api / cube-proxy — are fronted by VPC-internal CLBs, reachable only from inside the VPC (via the jumpserver / VPN) with no public exposure. To allow public access, explicitly set TENCENTCLOUD_ENABLE_PUBLIC_NETWORK='true' to switch them to public CLBs. To tighten further, adjust the inbound/outbound rules of each individual group (cubesandbox-sg-jumpserver / cubesandbox-sg-compute / cubesandbox-sg-tke-pod / cubesandbox-sg-clb) as needed in the Tencent Cloud security group console. When public mode is enabled, also see Hardening the Public-Facing Services.
When to use this
This deployment uses cloud resources to quickly stand up a highly-available CubeSandbox service: all cloud resources default to pay-as-you-go billing (see Billing Mode below) and can be released in one shot with destroy.sh. For long-term use, switch to prepaid (monthly/yearly subscription) resources for better cost savings (see Billing Mode). If you only need a single-machine deployment for validation, see the earlier deployment guides: PVM Deployment or Bare-Metal Deployment.
Note: The default configuration is a POC / functional validation setup (2× SA9.MEDIUM8 compute nodes, single-replica control plane, no CFS). It can host only a limited number of sandboxes. For production or load testing, adjust compute node and TKE worker specs and counts — see Node Specifications & Capacity Planning and Default Deployment Mode.
Architecture Overview
Internet
│
┌────────────┴────────────────────┐
│ │
┌────────┴────────┐ ┌──────────┴─────────┐
│ Jumpserver CVM │ │ CLB (internal/ │
│ (public IP) │ │ public) │
│ SSH:443 │ │ cube-api :3000 │
│ build & push │ │ cube-proxy :80/443 │
└────────┬────────┘ │ cube-webui :80 │
│ └──────────┬─────────┘
│ │
┌────────────┼─────────────────────────────────┼──────────┐
│ │ VPC internal │ │
│ ┌─────────┴────────┐ ┌───────────────┴─────┐ │
│ │ Compute CVM ×N │ │ TKE managed cluster │ │
│ │ Cubelet │ │ cube-master (×1) │ │
│ │ network-agent │ │ cube-api (×1) │ │
│ │ CubeEgress │ │ cube-proxy (×1) │ │
│ └──────────────────┘ │ cube-webui (×1) │ │
│ └───┬─────────────┬───┘ │
│ │ │ │
│ ┌────────────┴───┐ (opt.) │ │
│ │ CFS (NFS) │ │ │
│ │ shared store │ │ │
│ └────────────────┘ │ │
│ │ │
│ ┌─────────────────────┴───┐ │
│ │ Cloud DB: MySQL + Redis │ │
│ └─────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ TCR (opt.) │ │ NAT + EIP │→ public egress │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘| Component | Form | Notes |
|---|---|---|
| Jumpserver | CVM (public IP, SSH 443) | Build host (TCR mode) and bastion into the private VPC |
| Load balancer | CLB (internal/public) | Fronts cube-api / cube-proxy / cube-webui; internal mode by default, user traffic entry point |
| Control plane | Managed TKE cluster | Runs cube-master / cube-api / cube-proxy / cube-webui |
| Compute node | CVM PVM | Runs Cubelet / network-agent / CubeEgress; actually hosts sandboxes |
| Database | Cloud MySQL 8.0 + Redis 7.0 | VPC-internal only, no public access |
| Shared storage | CFS (General Standard NFS, optional) | When USE_CFS=true and cubemaster has multiple replicas, ReadWriteMany share for /data/CubeMaster/storage |
| Registry | TCR (basic, optional) | When USE_TCR=true; the jumpserver builds and pushes component images |
| Egress | NAT gateway + EIP | The whole VPC reaches the internet through NAT |
TKE workers and PVM compute nodes are separate resources
TENCENTCLOUD_TKE_NODE_COUNT: number of TKE workers running control-plane Pods (cubemaster / cube-api / etc.).TENCENTCLOUD_COMPUTE_NODE_COUNT: number of PVM compute nodes running Cubelet; sandboxes execute here. Both default to2but serve completely different roles.
Default Deployment Mode
Matching env.example / variables.tf, the default is public images + single-replica control plane + no CFS — a POC configuration:
| Setting | Default | Notes |
|---|---|---|
TENCENTCLOUD_USE_TCR | false | No TCR; uses public pre-built images, no build on the jumpserver |
TENCENTCLOUD_USE_CFS | false | No CFS; cubemaster uses Pod-local storage |
TENCENTCLOUD_CUBEMASTER_REPLICAS etc. | 1 | Control-plane components default to single replica |
TENCENTCLOUD_COMPUTE_NODE_COUNT | 2 | PVM compute nodes |
TENCENTCLOUD_TKE_NODE_COUNT | 2 | TKE workers (worker_config.count) |
TENCENTCLOUD_ENABLE_PUBLIC_NETWORK | false | cube-api / cube-proxy / cube-webui use VPC-internal CLBs |
Advanced modes:
TENCENTCLOUD_USE_TCR=true: create TCR and build/push the four component images on the jumpserver.TENCENTCLOUD_USE_CFS=truewithTENCENTCLOUD_CUBEMASTER_REPLICAS>1: create CFS for cubemaster multi-replica shared storage.
cube-proxy defaults to single replica (TENCENTCLOUD_CUBE_PROXY_REPLICAS=1). Auto-pause / auto-resume is reliable only with one replica (current sidecar model; after PR #705 merges this becomes a standalone cube-lifecycle-manager). For multiple replicas the front-end LB must hash on SandboxID (session affinity).
Resources Created by the Default Configuration
The table below lists cloud resources created under the default configuration (region ap-guangzhou, zone ap-guangzhou-6, 2 compute nodes, 2 TKE workers, no CFS/TCR). All resources are pay-as-you-go (see Billing Mode).
| Resource | Count | Spec / Configuration |
|---|---|---|
| VPC | 1 | CIDR 10.0.0.0/16 |
| Subnet | 1 | 10.0.1.0/24 (primary zone; extra /24 subnets only when a role lands in another zone) |
| NAT gateway + EIP | 1 + 1 | 200 Mbps bandwidth, pay-by-traffic |
| Route table entry | 1 | 0.0.0.0/0 → NAT gateway |
| Security group | 4 | Per-role (jumpserver / compute / TKE pod / CLB), least privilege, see below |
| SSH key pair | 1 | Auto-generated under terraform/tencentcloud/.ssh/ |
| Jumpserver CVM | 1 | SA9.MEDIUM4 (2C4G), 50GB general-purpose SSD system disk, 200 Mbps public bandwidth, SSH on port 443 |
| Compute CVM | 2 | SA9.MEDIUM8 (4C8G), 50GB system disk + 200GB CBS data disk (XFS, /data/cubelet), no public IP |
| TKE managed cluster | 1 | Managed L5, Kubernetes 1.34.1, Pod CIDR 10.200.0.0/16, Service CIDR 192.168.0.0/20, VPC-internal API only |
| TKE worker nodes | 2 | SA9.LARGE8 (4C8G), created via worker_config.count (no separate node pool) |
| Cloud MySQL | 1 | 8.0 InnoDB universal, 4GB memory / 200GB storage, multi-AZ (when the region has ≥2 zones) / semi-sync, intranet 3306 only |
| Cloud Redis | 1 | 7.0 standard architecture (master/replica), 1GB memory, port 6379, intranet only |
| CFS file system | 0 (default) | 1 General Standard NFS when USE_CFS=true |
| TCR registry | 0 (default) | Basic + namespace + VPC attachment when USE_TCR=true |
| OS image | — | OpenCloudOS Server 9 (public image, reused by all CVMs) |
OS image
All CVMs (jumpserver / compute / TKE workers) default to the OpenCloudOS Server 9 public image; override with TENCENTCLOUD_IMAGE_NAME.
Security Group Ingress Ports
The deployer creates 4 per-role security groups on a least-privilege basis; each opens only the ingress that role actually needs, so compromising one role never inherits the inbound surface of the others.
1. cubesandbox-sg-jumpserver (jumpserver)
| Port / Range | Source | Purpose |
|---|---|---|
| TCP 443 | 0.0.0.0/0 | Jumpserver SSH (cloud-init moves sshd to 443) |
| ALL | 10.0.0.0/16 | VPC-internal traffic |
2. cubesandbox-sg-compute (compute nodes) — no public ingress
| Port / Range | Source | Purpose |
|---|---|---|
| ALL | TKE Pod CIDR | cube-proxy (pod) → all ports on compute nodes (sandbox dynamic ports 20000-29999) |
| ALL | 10.0.0.0/16 | VPC-internal traffic (jumpserver management, cube-master scheduling) |
3. cubesandbox-sg-tke-pod (TKE workers) — no public ingress
| Port / Range | Source | Purpose |
|---|---|---|
| ALL | TKE Pod CIDR | Pod-to-pod communication |
| ALL | 10.0.0.0/16 | VPC-internal (CLB health checks, jumpserver management, CFS NFS) |
4. cubesandbox-sg-clb (load balancers)
The source for the 80 / 443 / 3000 entrypoints below depends on TENCENTCLOUD_ENABLE_PUBLIC_NETWORK: in the default internal mode (false) the source is the VPC CIDR 10.0.0.0/16 (VPC-internal only); in public mode (true) the source is 0.0.0.0/0 (open to the internet).
| Port / Range | Source (internal / public mode) | Purpose |
|---|---|---|
| TCP 80 | 10.0.0.0/16 / 0.0.0.0/0 | CLB for cube-proxy + cube-webui (HTTP) |
| TCP 443 | 10.0.0.0/16 / 0.0.0.0/0 | CLB for cube-proxy (HTTPS) |
| TCP 3000 | 10.0.0.0/16 / 0.0.0.0/0 | CLB for cube-api |
| TCP 8089 | 10.0.0.0/16 (always VPC-internal only) | Internal CLB for cube-master (unaffected by the flag) |
Egress allows all (0.0.0.0/0 ALL) by default on every group. Databases, the TKE API server, etc. are not exposed publicly — VPC-internal access only.
Hardening the Public-Facing Services
This section applies only when public mode is enabled (
TENCENTCLOUD_ENABLE_PUBLIC_NETWORK='true'). In the default internal mode none of the three services are exposed publicly, so you can skip it.
With public mode enabled, cubesandbox-sg-clb opens three public entrypoints to 0.0.0.0/0: WebUI (80), cube-proxy (80 / 443), and cube-api (3000). Their security models differ, so harden each one separately:
- WebUI (CLB 80): the WebUI console currently ships without any authentication or access control — anyone who can reach it can operate sandboxes. Strongly consider creating a dedicated security group for the WebUI CLB and configuring a strict source-IP allowlist (only your office / management egress IPs) instead of reusing the publicly-open
cubesandbox-sg-clb. Create the group in the Tencent Cloud security group console and bind it to the WebUI CLB instance. - cube-api (CLB 3000): cube-api lets every request through without credential checks by default. Before exposing it publicly, enable Auth Callback authentication to delegate authorization decisions to your own auth service — see Authentication.
- cube-proxy (CLB 80 / 443): cube-proxy is the public ingress for sandbox traffic and is public-facing by design. To restrict public access to sandboxes, see Restrict Public Access to enable mechanisms such as per-sandbox inbound tokens.
Prerequisites
The machine running create.sh only needs the following — no pre-installed Terraform required:
- Tencent Cloud API credentials: create an API key pair in the CAM console and export
TENCENTCLOUD_SECRET_ID/TENCENTCLOUD_SECRET_KEY. - Local tools:
ssh,scp,nc, plus network access to the Tencent Cloud APIs.terraformandjqare auto-installed when missing — into/usr/local/binwhen writable (e.g. running as root), otherwise into a local.bin/.terraformis fetched from the HashiCorp release site (needscurl/wget+unzip);jqcomes from the system package manager or a static binary from GitHub. mkcert/opensslare not needed locally — the cube-proxy certificate is produced on the jumpserver.- Build / deploy host: Linux recommended;
create.sh/destroy.shalso support macOS (including Bash 3.2). Use WSL2 on Windows.
Pre-deployment setup (service activation & authorization)
Complete these before the first create.sh apply to avoid half-created resources mid-run.
1. Account basics
- Register a Tencent Cloud account and complete real-name verification.
- For sub-accounts, grant permissions to create VPC / CVM / TKE / CDB / Redis / CLB resources. See Authorize with TKE preset policies to attach policies such as
QcloudTKEFullAccess.
2. TKE service role authorization (required)
Before first use of TKE you must grant service roles, otherwise cluster creation, CLB-type Services, etc. will fail.
- Log in to the TKE console and complete service authorization when prompted.
- Confirm roles such as
TKE_QCSRole/IPAMDofTKE_QCSRoleare authorized.
Official docs: Service authorization role permissions, TKE quick start.
3. Private DNS (as needed)
| Scenario | Required? |
|---|---|
Default USE_TCR=false | Usually not |
USE_TCR=true | Recommended |
E2B SDK access to *.cube.app | Yes (see E2B and the cube.app domain) |
Without activation the API may return ResourceNotFound.ServiceNotSubscribed (Private DNS service not subscribed).
Activation: DNSPod Private DNS → accept agreement → activate. Docs: Private DNS product overview, Activate Private DNS.
4. Cloud File Storage CFS (as needed)
Only when TENCENTCLOUD_USE_CFS=true and cubemaster runs multiple replicas.
- Log in to the CFS console and activate CFS when prompted.
- Terraform creates the file system during apply (no manual creation needed).
Docs: CFS quick start. If the account is not activated the console may show no instances; prefer running destroy.sh's CFS stage on teardown.
5. Other service preflight
| Service | Console |
|---|---|
| VPC / NAT | VPC |
| CVM | CVM |
| TKE | TKE |
| MySQL / Redis | MySQL / Redis |
| CLB | CLB |
TCR (USE_TCR=true) | TCR |
Quick Start
The deployer is surfaced at the top level of the extracted bundle, so you can run it directly after extracting:
tar -xzf cube-sandbox-one-click-<version>.tar.gz
cd cube-sandbox-one-click-<version>
# Copy the environment template, fill in your credentials (TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY) and edit the rest as needed
cp terraform/tencentcloud/env.example terraform/tencentcloud/.env
# $EDITOR terraform/tencentcloud/.env
./terraform/tencentcloud/create.shcreate.sh auto-loads the .env next to it (filling in only variables not already set in the current shell), so credentials can either go straight into .env or be injected via export — the latter takes precedence and overrides the matching value in .env:
export TENCENTCLOUD_SECRET_ID="your-secret-id"
export TENCENTCLOUD_SECRET_KEY="your-secret-key"See Configuration below for the meaning and defaults of each .env option; anything left unset falls back to the defaults.
create.sh runs entirely from the extracted bundle and automatically:
- Auto-detects the local bundle (the outer
cube-sandbox-one-click-<version>.tar.gz, or re-packs the extracted directory if the tarball is gone) and uses it as the offline source for component images and compute-node installation. When detected — or set viaTENCENTCLOUD_LOCAL_BUNDLE=/path/to.tar.gz— no public download is required; otherwise the jumpserver falls back to an online install (needs public network). - Generates an SSH key pair under
terraform/tencentcloud/.ssh/if none exists. - Generates the cube-proxy TLS certificate (
cube.app/*.cube.app) on the jumpserver with the bundledmkcert, downloading it toterraform/tencentcloud/cubeproxy-certs/for the Secret mount. - Default mode (
USE_TCR=false): pull public pre-built images and deploy TKE addons and CVM compute nodes (2 by default). - TCR mode (
USE_TCR=true): create TCR, build and push the four component images on the jumpserver, then deploy TKE and compute nodes.
Configuration
Common environment variables (matching the create.sh / variables.tf defaults) are listed in terraform/tencentcloud/env.example, which you can copy to .env and fill in:
export TENCENTCLOUD_REGION=ap-guangzhou
export TENCENTCLOUD_AVAILABILITY_ZONE=ap-guangzhou-6
export TENCENTCLOUD_COMPUTE_NODE_COUNT=2 # CVM PVM compute nodes
export TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE=200 # CBS data disk per compute node (GB)
export TENCENTCLOUD_TKE_NODE_COUNT=2 # TKE worker nodes (control-plane Pods)
export TENCENTCLOUD_COMPUTE_INSTANCE_TYPE=SA9.MEDIUM8
export TENCENTCLOUD_USE_TCR=false # default: public pre-built images
export TENCENTCLOUD_USE_CFS=false # default: no CFS
export TENCENTCLOUD_CUBE_IMAGE_TAG=cubesandbox002-20260630Common Variables
When running
create.shinteractively, the "Deployment configuration" step walks you through most of the settings below (region / VPC / instance types / passwords / image tag, etc.), including a[y/N]prompt for whether to enable public-network mode (default N = internal). Set any of these variables beforehand to skip the matching prompt; a non-interactive run (no TTY) silently uses the defaults.
| Variable | Default | Description |
|---|---|---|
TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY | none | Required. Tencent Cloud API credentials |
TENCENTCLOUD_REGION | ap-guangzhou | Region |
TENCENTCLOUD_AVAILABILITY_ZONE | ap-guangzhou-6 | Primary zone (subnet / MySQL / Redis / TKE control plane) |
TENCENTCLOUD_JUMPSERVER_INSTANCE_TYPE | SA9.MEDIUM4 | Jumpserver instance type |
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE | SA9.MEDIUM8 | Preferred compute-node instance type |
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPE | SA9.LARGE8 | TKE worker instance type (4C8G) |
TENCENTCLOUD_COMPUTE_NODE_COUNT | 2 | PVM compute nodes (run Cubelet / sandboxes) |
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE | 200 | CBS data disk size per compute node (GB, XFS, mounted at /data/cubelet). Sandbox image templates, snapshots and runtime data on the compute node all live under this directory — size it to your actual needs |
TENCENTCLOUD_TKE_NODE_COUNT | 2 | TKE workers (run control-plane Pods; maps to worker_config.count) |
TENCENTCLOUD_USE_TCR | false | When true, create TCR and build/push images on the jumpserver; when false, use public pre-built images |
TENCENTCLOUD_USE_CFS | false | When true and cubemaster has multiple replicas, create CFS NFS shared storage |
TENCENTCLOUD_TKE_CLUSTER_VERSION | 1.34.1 | TKE Kubernetes version |
TENCENTCLOUD_MYSQL_PASSWORD | insecure demo value | MySQL root password (change for real use) |
TENCENTCLOUD_REDIS_PASSWORD | insecure demo value | Redis password (change for real use) |
TENCENTCLOUD_CUBE_DB / TENCENTCLOUD_CUBE_USER / TENCENTCLOUD_CUBE_PASSWORD | cube_mvp / cube / demo | Application DB name / account / password |
TENCENTCLOUD_CUBEMASTER_REPLICAS | 1 | cube-master replica count |
TENCENTCLOUD_CUBE_API_REPLICAS | 1 | cube-api replica count |
TENCENTCLOUD_CUBE_PROXY_REPLICAS | 1 | cube-proxy replica count. Default 1: auto-pause/auto-resume only works in single-replica mode. Going >1 requires the LB to hash on SandboxID |
TENCENTCLOUD_CUBE_WEBUI_REPLICAS | 1 | cube-webui replica count |
TENCENTCLOUD_ENABLE_PUBLIC_NETWORK | false | Network exposure mode for cube-api / cube-proxy / cube-webui. Default false: VPC-internal CLBs, reachable only from inside the VPC (via jumpserver / VPN). Set to true for public CLBs reachable from the internet, with the security group opening 0.0.0.0/0 accordingly. cube-master always stays VPC-internal. Read Hardening the Public-Facing Services before enabling |
Non-interactive / CI runs
Without a TTY the interactive menus fall back to defaults, so set them explicitly to stay in control. The password variables are the exception: a non-interactive run refuses to start with the built-in, publicly-known demo passwords and requires them to be set — or set TENCENTCLOUD_ALLOW_INSECURE_DEFAULTS=1 to opt into the insecure defaults for a throwaway sandbox.
export TENCENTCLOUD_AVAILABILITY_ZONE=ap-guangzhou-6
export TENCENTCLOUD_COMPUTE_INSTANCE_TYPE=SA9.MEDIUM8
export TENCENTCLOUD_LOCAL_BUNDLE=/path/to/cube-sandbox-one-click-<version>.tar.gz # auto-detected inside an extracted bundle
export TENCENTCLOUD_PVM_KERNEL_VMLINUX=/path/to/vmlinux-pvm # only if the bundle ships no vmlinux-pvm
export TENCENTCLOUD_MYSQL_PASSWORD=... # required for non-interactive runs (no insecure fallback)
export TENCENTCLOUD_REDIS_PASSWORD=... # required for non-interactive runs
export TENCENTCLOUD_CUBE_PASSWORD=... # required for non-interactive runs
export TENCENTCLOUD_BUILD_IMAGES=0 # TCR mode: reuse already-pushed imagesMore advanced toggles (TENCENTCLOUD_VERBOSE, TENCENTCLOUD_REINSTALL, TENCENTCLOUD_RESET_DB, SSH port/key paths, etc.) are documented in the create.sh header comments.
Node Specifications & Capacity Planning
Default Configuration
The default deployment provisions 2× 4C8G compute nodes (SA9.MEDIUM8, 200GB data disk each) and 2 TKE workers — suitable for POC / functional validation with limited sandbox capacity. Default specs per role:
| Role | Default Type | Specs | Count |
|---|---|---|---|
| Compute node (PVM) | SA9.MEDIUM8 | 4C8G + 200GB data disk | 2 |
| TKE Worker | SA9.LARGE8 | 4C8G | 2 (worker_config.count) |
| Jumpserver | SA9.MEDIUM4 | 2C4G | 1 |
| Control-plane Pods | — | cubemaster / cube-api / cube-webui ×1 each | on TKE workers |
WARNING
The default configuration is only suitable for functional validation and small-scale evaluation. For large-scale production usage, you must adjust the compute node and TKE node specs and counts, otherwise you will encounter CPU/memory exhaustion, Pod scheduling failures, and sandbox creation timeouts.
Adjusting Compute Nodes
Compute nodes are the machines that actually run sandbox containers. Use the following environment variables to adjust their specs, count, and disk size:
| Environment Variable | Description |
|---|---|
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE | Compute node instance type |
TENCENTCLOUD_COMPUTE_NODE_COUNT | Number of compute nodes |
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE | Data disk size (GB) |
Example: scale up specs and count (.env configuration):
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE='SA9.4XLARGE32'
TENCENTCLOUD_COMPUTE_NODE_COUNT='4'
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE='500'Example: use higher specs (.env configuration):
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE='SA9.16XLARGE128'
TENCENTCLOUD_COMPUTE_NODE_COUNT='8'
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE='1000'Bare Metal Servers
For stronger compute isolation and performance (avoiding virtualization overhead), you can use Tencent Cloud Bare Metal Servers as compute nodes. Simply set TENCENTCLOUD_COMPUTE_INSTANCE_TYPE to a bare metal instance type (e.g. BMS5.12XLARGE192) — the rest of the deployment process remains the same.
Example: heterogeneous mix (using Terraform variables directly, per-node instance types):
export TF_VAR_compute_node_count=5
export TF_VAR_compute_instance_types='["SA9.8XLARGE64","SA9.8XLARGE64","SA9.4XLARGE32","SA9.4XLARGE32","SA9.4XLARGE32"]'
export TF_VAR_compute_availability_zones='["ap-guangzhou-6","ap-guangzhou-7","ap-guangzhou-6","ap-guangzhou-7","ap-guangzhou-3"]'
export TF_VAR_compute_data_disk_size=1000Adjusting TKE Worker Nodes
TKE Workers run cube-master, cube-api, cube-proxy, cube-webui and other control-plane Pods. At larger scale, control-plane request volume and scheduling pressure increase significantly, requiring more resources.
| Environment Variable | Description |
|---|---|
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPE | TKE Worker node instance type |
TENCENTCLOUD_TKE_NODE_COUNT | Number of TKE Worker nodes |
Example: upgrade TKE Worker specs:
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPE='SA9.2XLARGE16'
TENCENTCLOUD_TKE_NODE_COUNT='4'Also consider increasing control-plane replica counts for higher throughput and availability (multi-replica cubemaster requires TENCENTCLOUD_USE_CFS=true):
TENCENTCLOUD_USE_CFS=true
TENCENTCLOUD_CUBEMASTER_REPLICAS='3'
TENCENTCLOUD_CUBE_API_REPLICAS='3'
TENCENTCLOUD_CUBE_WEBUI_REPLICAS='2'Full Example: Production Configuration
# --- Compute nodes
TENCENTCLOUD_COMPUTE_INSTANCE_TYPE='SA9.8XLARGE64'
TENCENTCLOUD_COMPUTE_NODE_COUNT='6'
TENCENTCLOUD_COMPUTE_DATA_DISK_SIZE='1000'
# --- TKE control plane
TENCENTCLOUD_TKE_WORKER_INSTANCE_TYPE='SA9.2XLARGE16'
TENCENTCLOUD_TKE_NODE_COUNT='4'
# --- Control-plane replicas (requires CFS)
TENCENTCLOUD_USE_CFS=true
TENCENTCLOUD_CUBEMASTER_REPLICAS='3'
TENCENTCLOUD_CUBE_API_REPLICAS='3'
TENCENTCLOUD_CUBE_WEBUI_REPLICAS='2'
# --- Jumpserver can stay at default (build-only, no runtime load)
TENCENTCLOUD_JUMPSERVER_INSTANCE_TYPE='SA9.MEDIUM4'Billing Mode
Currently pay-as-you-go (POSTPAID) only
All cloud resources created by this deployer are hard-coded to pay-as-you-go billing. There is currently no environment or Terraform variable to switch to prepaid (monthly/yearly subscription, PREPAID).
The charge type of each resource is fixed as follows:
| Resource | Charge field | Value | Meaning |
|---|---|---|---|
| Jumpserver CVM | instance_charge_type | POSTPAID_BY_HOUR | Pay by hour |
| Compute CVM | instance_charge_type | POSTPAID_BY_HOUR | Pay by hour |
| Cloud MySQL | charge_type | POSTPAID | Pay-as-you-go |
| Cloud Redis | charge_type | POSTPAID | Pay-as-you-go |
| NAT gateway EIP | internet_charge_type | TRAFFIC_POSTPAID_BY_HOUR | Pay by traffic |
| CLB (cube-proxy) | annotation | TRAFFIC_POSTPAID_BY_HOUR | Pay by traffic (only in public mode; the annotation is absent in internal mode) |
Pay-as-you-go is the default because this deployment targets fast evaluation: everything can be fully released with destroy.sh (terraform destroy) when you are done, avoiding ongoing charges.
Switching to prepaid (subscription)
For confirmed long-term use you can manually edit the charge fields of the matching resources in terraform/tencentcloud/main.tf to get better pricing. For example:
# CVM → prepaid
resource "tencentcloud_instance" "compute" {
instance_charge_type = "PREPAID"
instance_charge_type_prepaid_period = 1 # 1 month
instance_charge_type_prepaid_renew_flag = "NOTIFY_AND_AUTO_RENEW" # auto-renew on expiry
# ...
}
# MySQL → prepaid
resource "tencentcloud_mysql_instance" "mysql" {
charge_type = "PREPAID"
prepaid_period = 1 # 1 month
auto_renew_flag = 1 # auto-renew
# ...
}
# Redis → prepaid
resource "tencentcloud_redis_instance" "redis" {
charge_type = "PREPAID"
prepaid_period = 1
auto_renew_flag = 1
# ...
}Prepaid caveats
- Prepaid resources are not auto-refunded/released by
destroy.sh(terraform destroy); you must let them expire or refund them manually. - After switching,
destroy.shmay error or leave residual resources that must be cleaned up by hand in the Tencent Cloud console. - Only switch when long-term use is confirmed; keep the default pay-as-you-go for evaluation/validation.
Cost Estimate
List prices vary by region, spec tier, promotions, and over time, so this guide does not embed concrete numbers. Use Tencent Cloud's official price calculators and estimate against the specs in Resources Created by the Default Configuration:
- Tencent Cloud price calculator (main entry)
- CVM pricing — jumpserver, compute node, TKE workers
- TKE pricing overview — managed cluster management fee
- Cloud MySQL pricing / Cloud Redis pricing
- NAT gateway / CLB / EIP pricing
- CFS pricing / TCR pricing
Controlling cost
All resources default to pay-as-you-go (billed by the second, settled hourly). Run destroy.sh right after evaluation to avoid idle charges; for long-term use, switch to prepaid as described in Billing Mode.
Deployment Flow (Phased, Fail-Fast)
Resources are created in this order; the run stops at the first failed stage:
Network (VPC / subnet / NAT) → (when
USE_TCR=true) TCR → CVMs (jumpserver + compute) → (TCR mode) image build/push on the jumpserver → MySQL / Redis → (whenUSE_CFS=true) CFS shared storage → TKE cluster + Kubernetes addons → health checks → compute-node setup.
- The Kubernetes provider is only engaged after the TKE API server exists.
- On teardown, if CFS was created, the share is destroyed before its subnet (its NFS mount target is an ENI in that subnet).
- Terraform state lives locally under
terraform/tencentcloud/(*.tfstate, gitignored — no remote backend). Keep that directory and the generated.envso a laterdestroy.shor re-run can find and manage the same resources. - Resolved selections are saved to
terraform/tencentcloud/.envand auto-loaded on the next run; explicit environment variables always win.
Retrying After a Partial Failure
If a stage fails part-way (e.g. an instance type / availability zone sold out in the chosen region/zone, an account quota limit, or a transient API error), you do not have to destroy everything and start over:
- Fix the cause — most often by changing configuration: pick a different
TENCENTCLOUD_AVAILABILITY_ZONE/TENCENTCLOUD_COMPUTE_INSTANCE_TYPE/TENCENTCLOUD_REGION, raise the quota, set a password, etc. — then simply re-run./terraform/tencentcloud/create.sh. - On a re-run,
create.shreloads the saved selections from.env, reconciles state with what already exists in the cloud (refreshing and importing stateful resources rather than recreating them), and continues from where it left off. Existing compute nodes are kept (it never scales down). - Availability genuinely varies by region and availability zone — a type offered in one zone may be unavailable in another. The interactive zone / instance-type menus are queried live for your region, and the final choice is validated at apply time.
- Only run
destroy.shwhen you actually want to tear the deployment down; it is not required between ordinary retries.
E2B and the cube.app Domain
Single-machine one-click install configures *.cube.app resolution via CoreDNS + split DNS on the host. The Terraform cluster deployment does not include this DNS setup by default; to create/access sandboxes through the official E2B SDK you must configure domain resolution separately (usually requires Private DNS).
Background
- cube-proxy terminates TLS for
cube.app/*.cube.app(certificates generated on the jumpserver with mkcert bycreate.sh, or replaced with BYO certs). - The E2B SDK accesses sandboxes at URLs like
https://<sandbox-id>.cube.app. - With the default
TENCENTCLOUD_ENABLE_PUBLIC_NETWORK=false, the cube-proxy CLB is a VPC-internal VIP reachable only from inside the VPC (or via jumpserver / VPN).
Getting CLB addresses
After deployment, from terraform/tencentcloud/:
cd terraform/tencentcloud
terraform output tke_cube_proxy_clb_ip # cube-proxy
terraform output tke_cube_api_clb_ip # cube-api (E2B_API_URL)
terraform output tke_cube_webui_clb_ip # WebUI
terraform output tke_cubemaster_clb_ip # cube-master (VPC-internal)
terraform output jumpserver_ssh_command # jumpserver SSH commandcreate.sh also prints these entrypoints when it finishes.
Option 1: Private DNS (recommended, VPC-internal access)
For clients inside the VPC or connected via jumpserver / VPN:
- Log in to DNSPod Private DNS and create a private zone
cube.app. - Add an A record: host
*, value = the internal VIP fromterraform output tke_cube_proxy_clb_ip; optionally add host@pointing to the same VIP. - Associate the zone with the CubeSandbox VPC (confirm the VPC ID in the VPC console).
- Verify from inside the VPC (jumpserver or compute node):
dig +short test.cube.app
curl -k https://test.cube.app/ # self-signed cert: use -k or trust the mkcert CAOption 2: /etc/hosts (testing only)
Manually bind a few sandbox hostnames on the machine running the E2B SDK (must be able to route to the cube-proxy CLB internal VIP). Wildcard coverage is incomplete — not suitable for concurrent multi-sandbox tests.
Option 3: Public CLB + public DNS
- Set
TENCENTCLOUD_ENABLE_PUBLIC_NETWORK=trueand redeploy (rebuilds CLBs; VIPs change). - Point
cube.app/*.cube.appin public DNS to the public CLB VIP. - Replace certificates in
terraform/tencentcloud/cubeproxy-certs/with a public CA-signed cert (see Advanced: Bring Your Own cube-proxy TLS Certificate). - Read Hardening the Public-Facing Services before enabling.
E2B SDK environment variables
export E2B_API_URL=http://<cube-api-clb-vip>:3000
export E2B_API_KEY=e2b_000000Note: sandbox traffic itself goes through *.cube.app → cube-proxy, so cube.app DNS resolution and TLS certificates are required — configuring E2B_API_URL alone is not enough. See also Connecting to an Existing Cluster.
| Capability | Single-machine one-click | Terraform cluster |
|---|---|---|
*.cube.app resolution | Automatic (CoreDNS + split DNS) | Manual Private DNS / hosts / public DNS |
| TLS certificates | Host mkcert | Jumpserver mkcert → K8s Secret mount |
| cube-api address | http://<host>:3000 | cube-api CLB internal / public VIP |
Verifying the Deployment
After it finishes, create.sh runs health checks and prints the access entrypoints (CLB addresses for cube-api, cube-webui, etc.). You can also enter the VPC through the jumpserver to troubleshoot:
# create.sh prints a command like the following
ssh -i terraform/tencentcloud/.ssh/id_rsa -p 443 -o StrictHostKeyChecking=no root@<jumpserver_public_ip>Tearing Everything Down
./terraform/tencentcloud/destroy.shdestroy.sh also needs TENCENTCLOUD_SECRET_ID / TENCENTCLOUD_SECRET_KEY and reuses the selections create.sh saved to terraform/tencentcloud/.env. It runs without extra prompting — running destroy.sh itself confirms the teardown.
Avoid unexpected billing
If destroy.sh cannot remove every resource (for example MySQL/Redis stuck in the recycle bin / isolated state, or leftovers Terraform can no longer see), log in to the Tencent Cloud console and delete the remaining resources by hand so you are not billed for orphans:
- VPC / network
- MySQL recycle bin
- Redis recycle bin
- CFS file systems (if
USE_CFS=truewas enabled)
destroy.sh also prints these same links when a teardown step fails or a recycle-bin cleanup is not confirmed.
Advanced: Bring Your Own cube-proxy TLS Certificate
cube-proxy terminates TLS for cube.app / *.cube.app, and its bundled nginx config hard-codes the certificate paths …/certs/cube.app+3.pem and …/certs/cube.app+3-key.pem:
- By default,
create.shgenerates a self-signed pair on the jumpserver with the bundledmkcert(SANs:cube.app,*.cube.app,localhost,127.0.0.1), downloads it toterraform/tencentcloud/cubeproxy-certs/, and Terraform packs every file in that directory into thecubeproxy-certsSecret (a Secret, not a ConfigMap, because it holds the TLS private key), mounted read-only into the cube-proxy pod at/usr/local/openresty/nginx/certs/. - Bring your own certificate: before running
create.sh, drop your PEM cert + key intoterraform/tencentcloud/cubeproxy-certs/, named exactlycube.app+3.pemandcube.app+3-key.pem(the names nginx expects) and covering thecube.appand*.cube.appSANs.create.shreuses existing files instead of generating new ones, so a CA-signed certificate is used as-is, with no self-signed warning. - Rotate a certificate: replace the two files and re-run
create.sh; the deploy stage refreshes thecubeproxy-certsSecret and restarts cube-proxy to pick up the new material. The self-signed default trips browsers/clients with an "untrusted CA" warning, so replace it for any non-throwaway use.
Troubleshooting
For common deployment issues (Docker, KVM, DNS, quotas, etc.), see Troubleshooting — Deployment.