What is OpenAI's Jalapeno chip?

Jalapeno is OpenAI's first custom-designed AI chip, built with Broadcom for silicon implementation and Celestica for rack integration. It's an ASIC purpose-built for LLM inference — serving trained models to users — rather than a general-purpose GPU.

How much cheaper is Jalapeno than existing AI chips?

OpenAI says early testing shows performance-per-watt substantially better than current state-of-the-art hardware, and Bloomberg reports the chip could cut inference costs by roughly 50%. A detailed, independently verifiable technical report is expected in the coming months.

When will Jalapeno be deployed?

First deployment is planned at gigawatt scale before the end of 2026, with Microsoft confirmed as the primary partner. Reports indicate Broadcom required Microsoft to commit to purchasing 40% of the first production run.

Can developers buy or rent Jalapeno chips?

No. Jalapeno is built into finished server racks as part of OpenAI's own infrastructure, similar to how Google uses TPUs internally. It's not a chip developers can buy or rent directly.

OpenAI Built Its Own AI Chip: Meet Jalapeno

OpenAI just became a chip company. It unveiled Jalapeno, its first custom silicon, built with Broadcom specifically to run inference — the part of the AI pipeline that actually serves your ChatGPT requests and API calls.

If it works, it's a real shot at cheaper API pricing down the line. Here's what's actually in it.

What Jalapeno Is

It's an ASIC — application-specific silicon, not a general-purpose GPU. OpenAI stripped out training logic entirely and built the chip around one job: serving already-trained models to users. That's the tradeoff every ASIC makes — give up flexibility, gain efficiency on the workload you actually run most.

Broadcom handled silicon implementation and networking (its Tomahawk chips), Celestica built the boards and racks, and OpenAI designed the architecture around its own model roadmap, kernels, and serving patterns. Engineering samples are already running real workloads in the lab, including GPT-5.3-Codex-Spark.

The Numbers

OpenAI claims performance-per-watt "substantially better" than current state-of-the-art hardware, though it hasn't published independently verified benchmarks yet — a detailed technical report is coming in the next few months. Reporting from Bloomberg puts the inference cost cut at roughly 50%.

The development timeline is the wildest part: nine months from schematics to tape-out. First-gen ASIC design usually runs years. OpenAI says it used its own models to accelerate parts of the chip design process itself — EDA work and hardware co-optimization.

Rollout

First deployment is gigawatt-scale, with Microsoft confirmed as the primary partner. Reports say Broadcom required Microsoft to commit to buying 40% of the first production run to lock in the initial deployment. Servers are expected online before the end of 2026.

This isn't a chip you'll ever buy or rent directly — it's baked into finished server racks and OpenAI's own infrastructure, same as Google's TPUs and Meta's in-house accelerators.

Why Developers Should Care

Inference is the cost center for anyone running production AI workloads at volume. If Jalapeno's efficiency claims hold up, that pressure flows downstream into API pricing — OpenAI isn't hiding the motive, it's trying to dig out from training costs ahead of an anticipated 2026 IPO.

It also confirms where the industry is heading: every major lab wants off Nvidia's meter. Google has TPUs, Amazon has Trainium, Meta has its own silicon, and now OpenAI has Jalapeno. Nvidia isn't going anywhere — training still leans hard on its hardware — but the inference layer is fragmenting fast.

It's the same diversification instinct behind Qualcomm's acquisition of Modular to build a CUDA-agnostic compute layer — nobody building serious AI infrastructure wants to stay locked into a single vendor's stack anymore.

What to Watch

The real test is the technical report OpenAI promised for "coming months" — that's where the performance-per-watt claims get checked against reality. Until then, treat the 50% cost-reduction figure as a target, not a guarantee.

Sources: OpenAI, TechCrunch