04 · GPUs

If we owned the silicon, what would it cost?

2 servers · 5-year horizon

Buy vs Rent · Rent gives 3.0× the capacity at 4.2× lower $ / slot

→ Rent saves $474 k over 5 yr
BUY · 2× H20
HK-importable

$300 k capex each, $50 k / yr opex each, replace at year 3.

  • 3-yr cash: $900 k
  • 5-yr cash: $1.7 M
  • Concurrent slots: ~100
  • $ / slot / yr: $3.4 k
5-yr cash = capex × 2 (year-3 replacement) + opex × 5 yr
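The Buy card's totals are plain arithmetic on the inputs above. A minimal Python sketch, assuming the year-3 replacement is a full second purchase at the start of year 4, so it appears in the 5-yr total but not in the 3-yr one:

```python
# Buy card inputs: 2 servers, $300 k capex and $50 k / yr opex per server,
# full replacement at year 3.
SERVERS = 2
CAPEX_PER_SERVER = 300_000
OPEX_PER_SERVER_YR = 50_000
REPLACE_EVERY_YRS = 3
SLOTS = 100  # ~50 concurrent per 8-GPU H20 server x 2 servers


def buy_cash(years: int) -> int:
    """Cumulative cash out over `years` (years >= 1).

    A purchase happens at t = 0, 3, 6, ... for every t < years, so the
    year-3 replacement shows up in a 5-yr horizon but not a 3-yr one.
    """
    purchases = 1 + (years - 1) // REPLACE_EVERY_YRS
    return SERVERS * (purchases * CAPEX_PER_SERVER + years * OPEX_PER_SERVER_YR)


for horizon in (3, 5):
    print(f"{horizon}-yr cash: ${buy_cash(horizon):,}")
print(f"$ / slot / yr: ${buy_cash(5) / SLOTS / 5:,.0f}")
```

The unrounded $ / slot / yr is $3,400; that is the figure the 4.2× headline ratio is built on.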
RENT · 2× H100
SG / TYO · BIS-blocked from HK

Nebius 1-yr commit · $122,640 / server / yr · no upfront capex.

  • 3-yr cash: $736 k
  • 5-yr cash: $1.23 M
  • Concurrent slots: ~300
  • $ / slot / yr: $818
5-yr cash = rent × 5 yr, no capex (compare with Buy 5-yr cash)
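The same arithmetic for the Rent card, plus the headline comparison against Buy. A minimal Python sketch using only figures from the two cards; the last line backs out the hourly rate the commit price implies, assuming 8 GPUs per server billed 24 × 365 hours a year:

```python
# Rent card inputs: Nebius 1-yr commit, $122,640 per server per year, no capex.
SERVERS = 2
RENT_PER_SERVER_YR = 122_640
RENT_SLOTS = 300          # ~150 concurrent per 8-GPU H100 server x 2
BUY_5YR_CASH = 1_700_000  # from the Buy card
BUY_SLOTS = 100
GPUS_PER_SERVER = 8
HOURS_PER_YR = 24 * 365


def rent_cash(years: int) -> int:
    """Cumulative rent over `years`; there is no upfront capex."""
    return SERVERS * RENT_PER_SERVER_YR * years


rent_5yr = rent_cash(5)
rent_slot_yr = rent_5yr / RENT_SLOTS / 5
buy_slot_yr = BUY_5YR_CASH / BUY_SLOTS / 5
implied_gpu_hr = RENT_PER_SERVER_YR / (GPUS_PER_SERVER * HOURS_PER_YR)

print(f"3-yr cash:        ${rent_cash(3):,}")
print(f"5-yr cash:        ${rent_5yr:,}")
print(f"$ / slot / yr:    ${rent_slot_yr:,.0f}")
print(f"capacity ratio:   {RENT_SLOTS / BUY_SLOTS:.1f}x")
print(f"$ / slot ratio:   {buy_slot_yr / rent_slot_yr:.1f}x")
print(f"5-yr saving:      ${BUY_5YR_CASH - rent_5yr:,}")
print(f"implied $/GPU-hr: ${implied_gpu_hr:.2f}")
```

The commit works out to $1.75 / GPU-hr, below every non-spot on-demand H100 rate in the rent table.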

BUY is the only HK on-prem path (only H20 ships there). RENT routes via SG / TYO: faster per dollar, no capex, no replacement cycle. Cloud APIs run on H200 / B200 fleets that can't legally be imported into HK at any price.

Tradeoffs · beyond the dollar

GPUs give you control. Cloud gives you speed.

Why own / rent silicon

GPU

  • Any model · day-zero

    DeepSeek V5 drops Friday — pull the weights, spin up vLLM, serve it Monday. No vendor gate, no waiting for the hyperscaler to host it.

  • Fine-tune on your data

    Continue pre-training on the research archive. Build the CLSA-flavoured model that turns generic LLM output into something only we can produce.

  • Data sovereignty

    Nothing leaves the HK datacentre. Compliance signs off without auditing every API egress — and stays signed off if regulators tighten.

  • Predictable monthly bill

    Fixed capex + fixed opex. No surprise $80 k week if an agent loop runs hot. Finance gets a number that doesn't move.

Why ride the cloud APIs

Cloud

  • Frontier silicon · today

    Cloud APIs run on H200 / B200 fleets we cannot legally import into HK. The fastest tokens in the world arrive over HTTPS, not from our own racks.

  • Ship this week

    No racks, no drivers, no CUDA upgrades, no on-call for the 3 a.m. OOM. A small team can ship Foundry without standing up a GPU ops function.

  • Scale to zero

    Idle nights, weekends, public holidays — bought GPUs depreciate whether they run or not. Cloud only charges for the tokens you actually serve.

  • Elastic earnings-week

    10× peak at earnings week, 1× the rest of the year. Cloud absorbs the burst, and we aren't left staring at a row of idle servers in the trough.
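To make the idle-trough point concrete: if owned capacity is sized for the 10× earnings-week peak, average utilisation is low. A minimal Python sketch, assuming one earnings week per quarter and flat 1× load otherwise; both are illustrative assumptions, not measured traffic:

```python
# Average utilisation of an owned fleet sized for the earnings-week peak.
# Load is measured in units of a normal week's demand.
PEAK_MULTIPLIER = 10        # 10x load during earnings week (from the text)
WEEKS_PER_YR = 52
EARNINGS_WEEKS_PER_YR = 4   # ASSUMPTION: one earnings week per quarter


def avg_utilisation(peak: float, peak_weeks: int, weeks: int = WEEKS_PER_YR) -> float:
    """Mean load divided by capacity, with capacity sized to the peak."""
    total_load = peak * peak_weeks + 1.0 * (weeks - peak_weeks)
    return total_load / weeks / peak


u = avg_utilisation(PEAK_MULTIPLIER, EARNINGS_WEEKS_PER_YR)
print(f"average utilisation of a peak-sized fleet: {u:.0%}")
```

Under these assumptions the fleet is busy about 17% of the time on average; with a single earnings week a year it drops to about 12%.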

The honest version

On cloud we're at the mercy of Azure · Alibaba · Bedrock — price moves, deprecations, region outages. Owning silicon makes us our own provider for the models we serve. But the MCP gateway means switching between the two is a YAML edit — start on cloud, watch the bill, buy later if it tells you to.
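The "YAML edit" claim rests on callers addressing a model alias rather than a provider. A minimal Python sketch of that indirection, with everything hypothetical: the alias `foundry-chat`, the backend names and the URLs are placeholders, not the actual MCP gateway's config, and JSON stands in for YAML so the example needs only the standard library:

```python
import json

# HYPOTHETICAL routing config: alias, backend names and URLs are placeholders.
CONFIG = """
{
  "routes": {"foundry-chat": "cloud"},
  "backends": {
    "cloud":  {"base_url": "https://cloud-provider.example/v1"},
    "onprem": {"base_url": "http://gpu-rack.example.internal:8000/v1"}
  }
}
"""


def resolve(alias: str, config_text: str = CONFIG) -> str:
    """Return the base URL the gateway would forward `alias` to."""
    cfg = json.loads(config_text)
    backend = cfg["routes"][alias]
    return cfg["backends"][backend]["base_url"]


print(resolve("foundry-chat"))

# Moving the model on-prem changes one value; callers keep using the alias.
onprem_cfg = CONFIG.replace('"foundry-chat": "cloud"', '"foundry-chat": "onprem"')
print(resolve("foundry-chat", onprem_cfg))
```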

Buy · 8-GPU server

May 2026 list · NVIDIA OEM

Columns: GPU · VRAM · peak TF / GPU · 8-GPU server price · HK-importable

  • H20 · 96 GB · 148 TF · $170–$196 k · yes
  • H100 SXM5 · 80 GB · 989 TF · $300–$400 k · no
  • H200 SXM5 · 141 GB · 989 TF · $308–$366 k · no
  • B200 (Blackwell) · 192 GB · 2,250 TF · $500–$550 k · no
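Normalising the list prices by peak compute shows why the HK-importable part is the expensive path. A minimal Python sketch, assuming the midpoint of each price range and treating the TF figure as per GPU:

```python
# Capex per peak TF for an 8-GPU server, from the list above.
SERVERS = {
    # name: (price_low, price_high, tf_per_gpu, hk_importable)
    "H20":       (170_000, 196_000, 148,   True),
    "H100 SXM5": (300_000, 400_000, 989,   False),
    "H200 SXM5": (308_000, 366_000, 989,   False),
    "B200":      (500_000, 550_000, 2_250, False),
}
GPUS_PER_SERVER = 8


def dollars_per_tf(name: str) -> float:
    """Midpoint server price divided by total peak TF across its 8 GPUs."""
    low, high, tf, _ = SERVERS[name]
    midpoint = (low + high) / 2
    return midpoint / (tf * GPUS_PER_SERVER)


for name, spec in SERVERS.items():
    hk = spec[3]
    print(f"{name:10s} ${dollars_per_tf(name):6.1f} / TF   HK-importable: {hk}")
```

On these prices an H20 server costs roughly 3.5× more per peak TF than an H100 server. Peak TF is a crude proxy: LLM decoding is often limited by memory bandwidth rather than compute, so this ratio is not a direct measure of serving throughput.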

Rent · $ / GPU · hr

Nebius is the cheapest managed cloud (only spot / community marketplaces undercut it) · hyperscalers 3–4×
Columns: H100 · H200 · B200

  • Vast.ai (spot) · $1.75 · $2.80 · $3.50
  • RunPod Community · $2.69 · $3.59 · $4.99
  • Nebius · $2.95 · $3.50 · $5.50
  • Crusoe Cloud · $3.90 · $4.29 · –
  • Lambda Labs · $3.99 · $4.49 · $5.49
  • Modal · $3.95 · $4.54 · $6.25
  • CoreWeave · $6.16 · $6.31 · $8.60
  • AWS p5 / p5e / p6 · $6.88 · $4.97 · $14.24
  • Azure ND v5 / v6 · $12.29 · $10.60 · –
  • Alibaba Cloud
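Hourly rates are easier to compare with the Rent card once converted to a server-year. A minimal Python sketch, assuming the first price column is H100 (the AWS p5 / p5e / p6 row suggests an H100 · H200 · B200 ordering) and a server running 24/7 for 365 days:

```python
# Convert $ / GPU-hr into $ / 8-GPU server / yr and compare with the
# Nebius 1-yr commit quoted on the Rent card.
GPUS_PER_SERVER = 8
HOURS_PER_YR = 24 * 365
NEBIUS_COMMIT_PER_SERVER_YR = 122_640

# ASSUMPTION: first price column in the rent list is H100.
H100_RATES = {
    "Vast.ai (spot)": 1.75,
    "Nebius": 2.95,
    "CoreWeave": 6.16,
    "AWS p5": 6.88,
    "Azure ND v5": 12.29,
}


def server_year(rate_per_gpu_hr: float) -> float:
    """Annual cost of one 8-GPU server billed every hour of the year."""
    return rate_per_gpu_hr * GPUS_PER_SERVER * HOURS_PER_YR


for provider, rate in H100_RATES.items():
    print(f"{provider:15s} ${server_year(rate):>10,.0f} / server / yr")

discount = 1 - NEBIUS_COMMIT_PER_SERVER_YR / server_year(H100_RATES["Nebius"])
print(f"Nebius 1-yr commit vs on-demand: {discount:.0%} cheaper")
```

Under that reading the commit comes in about 41% below Nebius's on-demand H100 rate and lands at the same $1.75 / GPU-hr as the Vast.ai spot row.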
~50 concurrent · per 8-GPU H20 server · 80 tok/s
~150 concurrent · per 8-GPU H100 server
H200 / B200: what cloud APIs actually run on

BIS export controls block H100 / H200 / B200 shipments to HK and mainland China. Only H20 ships to HK; everything else routes via SG or TW.