04 · GPUs

If we owned the silicon, what would it cost?

2 servers · 5-year horizon

Buy vs Rent · Rent gives 3.0× the capacity at 4.2× lower $ / slot

→ Rent saves $474 k over 5 yr

BUY · 2× H20

HK-importable

$300 k capex each, $50 k / yr opex each, replace at year 3.

3-yr cash: $900 k
5-yr cash: $1.7 M
Concurrent slots: ~100
$ / slot / yr: $3.4 k

capex × 2 + opex × 5 yr

RENT · 2× H100

SG / TYO · BIS-blocked from HK

Nebius 1-yr commit · $122,640 / server / yr · no upfront capex.

3-yr cash: $736 k
5-yr cash: $1.23 M
Concurrent slots: ~300
$ / slot / yr: $818

rent · 5 yr (vs Buy 5-yr cash)
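The cash figures on both cards can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the class and field names are ours, and it assumes one hardware purchase at year 0 plus one more each time the horizon passes a replacement year, with slot counts held constant over the horizon.

```python
from dataclasses import dataclass


@dataclass
class BuyPlan:
    servers: int = 2
    capex_per_server: float = 300_000    # $ per server, from the BUY card
    opex_per_server_yr: float = 50_000   # $ per server per year
    replace_after_yr: int = 3            # refresh interval in years
    slots: int = 100                     # concurrent slots across the fleet

    def cash(self, years: int) -> float:
        # Assumption: buy at year 0, then again every `replace_after_yr` years.
        purchases = 1 + (years - 1) // self.replace_after_yr
        capex = self.servers * self.capex_per_server * purchases
        opex = self.servers * self.opex_per_server_yr * years
        return capex + opex


@dataclass
class RentPlan:
    servers: int = 2
    rent_per_server_yr: float = 122_640  # $ per server per year, from the RENT card
    slots: int = 300                     # concurrent slots across the fleet

    def cash(self, years: int) -> float:
        return self.servers * self.rent_per_server_yr * years


def per_slot_per_year(plan, years: int) -> float:
    """Total cash over the horizon, spread over years and concurrent slots."""
    return plan.cash(years) / years / plan.slots


buy, rent = BuyPlan(), RentPlan()
horizon = 5

buy_slot = per_slot_per_year(buy, horizon)
rent_slot = per_slot_per_year(rent, horizon)

print(f"Buy  3-yr ${buy.cash(3):,.0f} · 5-yr ${buy.cash(5):,.0f}")
print(f"Rent 3-yr ${rent.cash(3):,.0f} · 5-yr ${rent.cash(5):,.0f}")
print(f"$ / slot / yr: buy ${buy_slot:,.0f} · rent ${rent_slot:,.0f}")
print(f"Capacity ratio: {rent.slots / buy.slots:.1f}×")
print(f"Cost-per-slot ratio: {buy_slot / rent_slot:.1f}×")
print(f"5-yr saving from renting: ${buy.cash(horizon) - rent.cash(horizon):,.0f}")
```

Changing `horizon`, `replace_after_yr`, or the slot counts shows how sensitive the headline ratio is to the refresh cycle and to the concurrency estimates.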

BUY is the only HK-on-prem path (only H20 ships there). RENT routes via SG / TYO — faster per dollar, no capex, no replacement cycle. Cloud APIs run on H200 / B200 fleets that aren't legally importable at any price.

Tradeoffs · beyond the dollar

GPUs give you control. Cloud gives you speed.

Why own / rent silicon

GPU

Any model · day-zero

DeepSeek V5 drops Friday: pull the weights, spin up vLLM, serve it Monday. No vendor gate, no waiting for the hyperscaler to host it.

Fine-tune on your data

Continue-pretrain on the research archive. Build the CLSA-flavoured model that turns generic LLM output into something only we can produce.

Data sovereignty

Nothing leaves the HK datacentre. Compliance signs off without auditing every API egress, and stays signed off if regulators tighten.

Predictable monthly bill

Fixed capex + fixed opex. No surprise $80 k week if an agent loop runs hot. Finance gets a number that doesn't move.

Why ride the cloud APIs

Cloud

Frontier silicon · today

Cloud APIs run on H200 / B200 fleets we cannot legally import to HK. The fastest tokens in the world arrive over HTTPS, not from our own datacentre.

Ship this week

No racks, no drivers, no CUDA upgrades, no on-call for the 3 a.m. OOM. A small team can ship Foundry without standing up a GPU ops function.

Scale to zero

Idle nights, weekends, public holidays: bought GPUs depreciate whether they run or not. Cloud only charges for the tokens you actually serve.

Elastic earnings-week

10× peak at earnings week, 1× the rest of the year. Cloud absorbs the burst, and nobody stares at a row of idle servers in the trough.
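To put a number on the earnings-week trough: if an owned fleet is sized for the 10× peak, its average utilisation over the year is low. The sketch below is a rough illustration, not a capacity plan. The function name is ours, the number of peak weeks per year is an assumption (the slide does not say), and it ignores nights and weekends within each week.

```python
def peak_sized_utilisation(peak_multiple: float = 10.0,
                           peak_weeks: int = 1,
                           weeks_per_year: int = 52) -> float:
    """Average utilisation of a fleet provisioned for the peak load.

    Load is `peak_multiple` during peak weeks and 1.0 otherwise.
    `peak_weeks` is an assumed input; try 1 or 4 (quarterly earnings).
    """
    baseline_weeks = weeks_per_year - peak_weeks
    avg_load = (peak_weeks * peak_multiple + baseline_weeks * 1.0) / weeks_per_year
    return avg_load / peak_multiple


for weeks in (1, 4):
    share = peak_sized_utilisation(peak_weeks=weeks)
    print(f"{weeks} peak week(s) per year: {share:.1%} average utilisation")
```

Under either assumption most of a peak-sized fleet sits idle most of the year, which is the cost the cloud's per-token billing avoids.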

The honest version

On cloud we're at the mercy of Azure · Alibaba · Bedrock — price moves, deprecations, region outages. Owning silicon makes us our own provider for the models we serve. But the MCP gateway means switching between the two is a YAML edit — start on cloud, watch the bill, buy later if it tells you to.
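As a rough picture of what "a YAML edit" could mean, the sketch below resolves the active backend from a single config value. Everything in it is hypothetical: the config keys, URLs, and model names are placeholders, not the gateway's actual schema, and the config is shown as a Python dict so the example needs no YAML parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str
    model: str


# Hypothetical stand-in for the gateway's config file.
# Keys, URLs, and model names are made up for illustration.
CONFIG = {
    "active": "cloud",
    "backends": {
        "cloud": {
            "base_url": "https://cloud.example.com/v1",
            "model": "hosted-model",
        },
        "owned": {
            "base_url": "http://gpu.internal.example:8000/v1",
            "model": "self-hosted-model",
        },
    },
}


def resolve_backend(config: dict) -> Backend:
    """Return the backend named by config['active'], or raise ValueError."""
    name = config["active"]
    try:
        entry = config["backends"][name]
    except KeyError:
        raise ValueError(f"unknown backend {name!r}") from None
    return Backend(name=name, **entry)


# Switching providers is a one-value change; callers keep using resolve_backend().
print(resolve_backend(CONFIG))
print(resolve_backend({**CONFIG, "active": "owned"}))
```

The point of the design is that application code only ever sees a `Backend`, so moving from cloud to owned silicon does not touch the callers.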

Buy · 8-GPU server

May 2026 list · NVIDIA OEM

GPU                 Memory    Compute     8-GPU server    HK-importable
H20                 96 GB     148 TF      $170–$196 k     yes
H100 SXM5           80 GB     989 TF      $300–$400 k     no
H200 SXM5           141 GB    989 TF      $308–$366 k     no
B200 (Blackwell)    192 GB    2,250 TF    $500–$550 k     no

Rent · $ / GPU · hr

Nebius leads · hyperscalers 3 – 4 ×

Provider              H100      H200      B200
Vast.ai (spot)        $1.75     $2.80     $3.50
RunPod Community      $2.69     $3.59     $4.99
Nebius                $2.95     $3.50     $5.50
Crusoe Cloud          $3.90     $4.29     —
Lambda Labs           $3.99     $4.49     $5.49
Modal                 $3.95     $4.54     $6.25
CoreWeave             $6.16     $6.31     $8.60
AWS p5 / p5e / p6     $6.88     $4.97     $14.24
Azure ND v5 / v6      $12.29    $10.60    —
Alibaba Cloud         —         —         —
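Hourly rates convert to an annual per-server bill by multiplying by 8 GPUs and 8,760 hours. The sketch below is illustrative (function names are ours) and assumes an always-on 8-GPU server with leap days ignored. It also shows that the $122,640 / server / yr rental figure corresponds to $1.75 per GPU-hour; which price sheet that rate comes from is not stated here.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap days ignored


def annual_server_cost(rate_per_gpu_hr: float,
                       gpus: int = 8,
                       utilisation: float = 1.0) -> float:
    """Yearly cost of one server billed per GPU-hour.

    `utilisation` is the fraction of hours actually billed (1.0 = always on).
    """
    return rate_per_gpu_hr * gpus * HOURS_PER_YEAR * utilisation


def implied_rate(annual_cost: float, gpus: int = 8) -> float:
    """Per-GPU hourly rate implied by an always-on annual server cost."""
    return annual_cost / (gpus * HOURS_PER_YEAR)


# The first Nebius price in the table, run around the clock for a year.
print(f"$2.95 / GPU · hr -> ${annual_server_cost(2.95):,.0f} / server / yr")

# The rate implied by the $122,640 / server / yr rental figure.
print(f"$122,640 / server / yr -> ${implied_rate(122_640):.2f} / GPU · hr")
```

Passing a lower `utilisation` approximates on-demand use where servers are released during idle hours.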

~50 concurrent · per 8-GPU H20 · 80 tok/s

~150 concurrent · per 8-GPU H100

H200 / B200 · what cloud actually runs on

BIS export controls bar H100 / H200 / B200 shipments to HK and mainland China. Only the H20 ships to HK; everything else routes via SG or TYO.