---
title: "Should you buy GPUs for local inference while memory prices spike?"
slug: build-vs-buy-gpus-memory-spike
date: 2026-06-15
excerpt: The memory shortage made GPUs more expensive to own, but a live tracker on June 15 shows cloud rental prices barely moved. For a board weighing local inference, the real question this quarter is not local versus cloud, it is rent versus own.
featured_image: "https://bbtxujdxvidaghmhxkqs.supabase.co/storage/v1/object/public/generated-images/blog-1781509424204-build-vs-buy-gpus-memory-spike.webp"
featured_image_alt: "A boardroom whiteboard split into two columns, one showing an owned server rack of GPUs with rising memory-chip price tags, the other showing rented cloud GPU hours at a flat hourly rate."
canonical_url: https://cerevisor.com/blog/build-vs-buy-gpus-memory-spike
updated_at: 2026-06-15T07:43:45.65368+00:00
---

# Should you buy GPUs for local inference while memory prices spike?

**TL;DR**

The memory shortage is making GPUs more expensive to own, but the price of renting one has barely moved. A live pricing tracker on June 15 put H100 cloud rentals up only about 4 percent year over year. For a board weighing local inference, the real question this quarter is not local versus cloud. It is whether to rent or to own a suddenly expensive, depreciating asset.

Most boards saw the scary version first: RAM prices up, GPUs getting pricier, and a memory shortage the trade press expects to run into 2027. A 32GB stick of DDR5 that sold for about $100 a year ago now runs close to $375, according to Tom’s Hardware reporting from early June. The hardware press has been calling it a memory supercycle, and the numbers hold up: TrendForce data cited across that coverage has DRAM contract prices rising roughly 90 percent or more quarter over quarter in early 2026, with memory now making up as much as 80 percent of the bill of materials on a modern GPU.

That is the slide that lands on a [board agenda](/blog/three-ai-signals-next-board-agenda) with a question attached. Are we about to overpay to run our own models?

Here is the part that did not make the scary slide.

---

## Why renting a GPU barely got more expensive

I pulled a live GPU pricing tracker on June 15, and the rental side of the market has barely moved. An H100 averages $3.25 an hour across 45 providers, with the cheapest spot instance at $0.54. On-demand pricing is up only about 4 percent from a year ago. A B200, the newer and faster card, averages $4.93 an hour and is up about 7 percent year over year. The full live range runs from pennies an hour for a consumer card to $27 an hour for a GB200 rack part.

> "On-demand pricing has increased by about 4% since June 2025, from $3.48 to $3.64/hr per GPU."
>
> GetDeploying H100 pricing tracker, June 2026

So two things are true at once, and they only sound contradictory. Owning the hardware got more expensive, because owning means buying memory into a shortage. Renting the same compute did not, because the providers already bought their fleets, they compete hard on the hourly rate, and a rented card costs them the same whether it serves zero tokens or four billion. The memory spike hits the buyer of the asset, not the renter of the hour.

**+4%**: H100 on-demand cloud rental change over the past year, through the memory spike (GetDeploying, June 15, 2026)

> The memory spike is not an argument for or against local models. It is an argument about who should own the memory.

---

## Does the memory spike change self-hosting? Mostly no

**Does this change whether we self-host at all?** Mostly no. The thing that decides self-hosting was always volume, not the memory headline. Below roughly 20 to 50 million tokens a day, a pay-per-token API is still cheaper than standing up an inference server, because the raw GPU is only 30 to 50 percent of the real cost once an engineer’s salary and idle time get counted. Self-hosting flips to cheaper at sustained high volume. The memory spike does not move that line much. It mostly makes the “buy a pile of GPUs” version of self-hosting look worse than the “rent reserved capacity” version.
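For a team that wants to locate its own line, here is a minimal sketch of the break-even arithmetic in Python. The $3.25 H100 rate and the 30 to 50 percent GPU share come from the figures above. The API price of $5 per million tokens is a hypothetical placeholder, and the sketch assumes a single rented GPU running around the clock can actually serve the volume, which it does not check.

```python
HOURS_PER_DAY = 24


def breakeven_tokens_per_day(gpu_rate_per_hour, gpu_share_of_cost, api_price_per_million):
    """Daily token volume at which self-hosting costs the same as a per-token API.

    gpu_share_of_cost is the fraction of the real self-hosting bill that is raw
    GPU rental; the rest is engineering time, idle capacity, and so on.
    """
    if not 0 < gpu_share_of_cost <= 1:
        raise ValueError("gpu_share_of_cost must be in (0, 1]")
    if api_price_per_million <= 0:
        raise ValueError("api_price_per_million must be positive")
    gpu_cost_per_day = gpu_rate_per_hour * HOURS_PER_DAY
    # Gross the raw GPU cost up to the full cost of running the server.
    full_cost_per_day = gpu_cost_per_day / gpu_share_of_cost
    return full_cost_per_day / api_price_per_million * 1_000_000


H100_RATE = 3.25   # average H100 rental per hour, from the tracker figures above
API_PRICE = 5.00   # HYPOTHETICAL blended API price per million tokens

for share in (0.30, 0.50):
    tokens = breakeven_tokens_per_day(H100_RATE, share, API_PRICE)
    print(f"GPU share {share:.0%}: break-even near {tokens / 1e6:.0f}M tokens/day")
```

With those placeholder inputs the break-even lands at roughly 31 to 52 million tokens a day, in the same neighborhood as the range above. Swap in your own API bill and the line moves accordingly; the memory spike does not appear anywhere in the formula when the GPU is rented.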

**If we go local, do we buy or rent?** This is the question worth the meeting. Renting hands the repriced, depreciating asset to someone else. A reserved B200 at $2.25 an hour locks in a rate without buying memory at the peak of a shortage. Buying only makes sense when the workload is large, steady, and long-lived enough to amortize hardware purchased into that shortage, and when the data residency or latency story genuinely requires the box to be in the building.

**What did the memory spike actually cost us?** If the plan is to rent, close to nothing this quarter. If the plan was to buy a rack, the same rack costs meaningfully more than it would have a year ago, and the payback period stretched with it. That gap is a [real number](/blog/harness-adoption-rate-stopped-being-real), and it belongs next to the API bill before anyone signs a purchase order.
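The stretched payback period can be sketched in a few lines of Python. The $2.25 reserved B200 rate is the figure quoted above. Both purchase prices, the utilization, and the monthly running cost are hypothetical placeholders chosen only to show the shape of the math, not quotes from any vendor.

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month


def payback_months(purchase_price, rent_per_hour, utilization=1.0, owner_opex_per_month=0.0):
    """Months until an owned GPU has paid for itself in rental fees avoided.

    utilization is the share of hours you would otherwise have rented.
    owner_opex_per_month covers power, hosting, and upkeep on the owned card.
    """
    rent_avoided = rent_per_hour * HOURS_PER_MONTH * utilization
    net_saving = rent_avoided - owner_opex_per_month
    if net_saving <= 0:
        return float("inf")  # owning never pays back at this usage level
    return purchase_price / net_saving


RESERVED_B200_RATE = 2.25   # reserved B200 rate per hour, quoted above
PRICE_LAST_YEAR = 40_000.0  # HYPOTHETICAL per-GPU purchase price a year ago
PRICE_NOW = 50_000.0        # HYPOTHETICAL per-GPU purchase price today
UTILIZATION = 0.7           # HYPOTHETICAL share of hours in use
OPEX = 400.0                # HYPOTHETICAL monthly running cost of an owned card

before = payback_months(PRICE_LAST_YEAR, RESERVED_B200_RATE, UTILIZATION, OPEX)
after = payback_months(PRICE_NOW, RESERVED_B200_RATE, UTILIZATION, OPEX)
print(f"Payback a year ago: {before:.1f} months")
print(f"Payback today:      {after:.1f} months")
print(f"Stretch:            {after / before - 1:.0%}")
```

Because the rental rate in the denominator stayed roughly flat, the payback period stretches by the same percentage as the purchase price. That is the number to put next to the API bill.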

The rent side, as of June 15, 2026 (per GPU-hour)

| GPU | Average rent | Change vs June 2025 |
| --- | --- | --- |
| H100 | **$3.25** | +4% |
| B200 | **$4.93** | +7% |
| A100 (range) | $0.14 to $5.04 | broadly flat |
| RTX 5090 (range) | $0.10 to $2.00 | broadly flat |

---

## Separate the self-host decision from the buy decision

The memory shortage is real, and the buy side of local inference got more expensive. The rent side barely moved. So the spike is not a reason to abandon local models, and it is not a reason to rush out and buy GPUs either. It is a reason to separate two decisions a board usually blurs together. Whether to self-host is still a volume question. Whether to own the hardware is now a worse deal than it was a year ago for everyone except the highest, steadiest workloads. The calm move for most teams is to rent reserved capacity, prove the volume, and let someone else hold the depreciating asset until the math clearly says buy.

**Key Insight**

Self-hosting is a volume decision. Owning the GPU is a separate decision, and the memory spike only changed the second one.

---

## Watch rental rates and your GPU utilization

Two numbers are worth watching. The first is whether cloud rental rates start tracking the memory spike upward. A year of roughly flat hourly pricing held through June, but reserved-capacity deals are where any squeeze shows up first. The second is utilization on any hardware already owned. A spike makes an idle GPU more embarrassing, not less. The team that wins this year is not the one with the most cards. It is the one that knows its break-even volume and rents until it gets there.
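One way to put a number on the utilization point is to compare what an owned card costs per busy hour with the going rental rate. The sketch below is a rough illustration only: the purchase price and three-year lifetime are hypothetical placeholders, and it ignores power, hosting, and staff, all of which make idle hardware look worse still.

```python
HOURS_PER_YEAR = 365 * 24


def owned_cost_per_busy_hour(purchase_price, lifetime_years, utilization):
    """Straight-line hardware cost spread over only the hours the GPU is working."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    amortized_per_hour = purchase_price / (lifetime_years * HOURS_PER_YEAR)
    return amortized_per_hour / utilization


def break_even_utilization(purchase_price, lifetime_years, rent_per_hour):
    """Utilization at which an owned GPU's busy-hour cost equals the rental rate."""
    amortized_per_hour = purchase_price / (lifetime_years * HOURS_PER_YEAR)
    return amortized_per_hour / rent_per_hour


H100_RENT = 3.25           # average H100 rental per hour, from the table above
PURCHASE_PRICE = 35_000.0  # HYPOTHETICAL per-GPU purchase price
LIFETIME_YEARS = 3         # HYPOTHETICAL useful life before the card is written off

for utilization in (0.2, 0.5, 0.9):
    cost = owned_cost_per_busy_hour(PURCHASE_PRICE, LIFETIME_YEARS, utilization)
    print(f"{utilization:.0%} busy: ${cost:.2f} per useful hour vs ${H100_RENT:.2f} rented")

threshold = break_even_utilization(PURCHASE_PRICE, LIFETIME_YEARS, H100_RENT)
print(f"Owning beats renting above about {threshold:.0%} utilization")
```

With these placeholder inputs, the owned card only beats the rental rate once it is busy about 41 percent of the time. A higher purchase price raises that threshold in direct proportion, which is why the spike makes an idle card harder to justify.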

#### Sources

- [Cloud GPU Pricing: Compare 64 Providers (2026)](https://getdeploying.com/gpus) - GetDeploying, 2026-06-15

- [H100 Cloud Pricing: Compare 45+ Providers (2026)](https://getdeploying.com/gpus/nvidia-h100) - GetDeploying, 2026-06-15

- [B200 Cloud Pricing: Compare 26+ Providers (2026)](https://getdeploying.com/gpus/nvidia-b200) - GetDeploying, 2026-06-15

- [32GB of DDR5 now costs $375 minimum as AI shortage squeezes PC building](https://www.tomshardware.com/pc-components/ddr5/32gb-of-ddr5-now-costs-usd375-minimum-ai-shortage-continues-to-squeeze-pc-building) - Tom's Hardware, 2026-06-03
