OpenAI just dropped another bombshell. On Thursday, April 23, 2026, OpenAI announced GPT-5.5 — its latest AI model, which the company says is better at coding, using computers, and conducting deep research. The release arrives barely six weeks after GPT-5.4 shipped, a pace that signals something fundamental has shifted in how OpenAI is building and deploying frontier AI.
This isn't a minor patch. GPT-5.5 is a model that OpenAI is betting its "super app" vision on — and the early numbers back up the hype. Here's everything you need to know, from the benchmarks that matter to the honest limitations no one is advertising.
What Is GPT-5.5?
GPT-5.5 is OpenAI's frontier coding and reasoning model released April 23, 2026. It scores 88.7% on SWE-bench and 92.4% on MMLU, with a 60% drop in hallucinations versus GPT-5.4. Three variants ship: GPT-5.5 standard, GPT-5.5 Thinking (extended reasoning), and GPT-5.5 Pro (highest accuracy).
GPT-5.5 understands what you're trying to do faster and can carry more of the work itself. It excels at writing and debugging code, researching online, analyzing data, creating documents and spreadsheets, operating software, and moving across tools until a task is finished.
The codename circulating online is "Spud" — a nod to the potato emoji OpenAI used in its teaser posts. The name is unassuming; the model is not.
The Core Pitch: Less Hand-Holding, More Getting Things Done
The central narrative OpenAI is pushing with GPT-5.5 is autonomy. OpenAI President Greg Brockman described it this way: the model "can look at an unclear problem and figure out just what needs to happen next."
On a press call, Brockman framed GPT-5.5 as a step toward more "agentic and intuitive computing," calling it "a real step forward towards the kind of computing that we expect in the future." According to OpenAI, GPT-5.5 is designed to handle complex, ambiguous tasks with less human guidance than previous models required. In practice, that means you can hand it a sprawling, multi-step problem — a messy codebase, a research task with unclear boundaries, a cross-tool workflow — and trust it to plan, iterate, and self-correct.
GPT-5.5 Key Features: A Deep Dive
1. Agentic Coding — The Headline Capability
OpenAI says GPT-5.5 is its strongest agentic coding model so far. On Terminal-Bench 2.0, which measures complex command-line workflows, the model scored 82.7%, up from 75.1% for GPT-5.4. On SWE-Bench Pro, which evaluates real-world GitHub issue resolution, it reached 58.6%, and OpenAI says it solved more tasks end-to-end in a single pass than earlier versions.
To put those numbers in competitive context: Claude Opus 4.7 scored 69.4% on Terminal-Bench 2.0 — a gap of more than 13 percentage points in GPT-5.5's favor. OpenAI also dogfooded the model: GPT-5.5 helped optimize the software that manages the very infrastructure it runs on, an unusual and telling proof of concept.
2. Computer Use — Operating Software Directly
One of the most underappreciated upgrades in GPT-5.5 is its ability to navigate software interfaces autonomously. OpenAI says GPT-5.5 is better than GPT-5.4 in Codex at generating documents, spreadsheets, and slide presentations, and that its computer-use abilities make it better at moving across tools, checking results, and navigating interfaces.
This isn't just convenience. For enterprise teams managing complex workflows, a model that can genuinely "operate" software — not just suggest what to do — is a qualitative leap.
This model is a real step forward towards the kind of computing that we expect in the future — but it is one step, and we expect to see many in the future.
— Greg Brockman, President, OpenAI
3. Deep Research & Scientific Discovery
On research tasks, OpenAI says GPT-5.5 improved over GPT-5.4 on GeneBench and reached 80.5% on BixBench, which it describes as leading performance among models with published scores.
The most striking research claim is this: a customized version of GPT-5.5 helped researchers discover a new proof involving off-diagonal Ramsey numbers, later verified in Lean. For context, Ramsey theory is a notoriously hard area of combinatorics with direct applications in computer science. AI-assisted mathematical discovery at this level is rare and significant.
4. Long-Context Reasoning
On the MRCR v2 benchmark, which tests how reliably a model can locate multiple pieces of hidden information across very long texts, GPT-5.5 jumps to 74.0% at context lengths of 512K to 1M tokens, up from 36.6% for GPT-5.4. On the Graphwalks BFS test with one million tokens, GPT-5.5 leaps from 9.4% (GPT-5.4) to 45.4%.
These aren't marginal improvements. Doubling long-context performance means GPT-5.5 can handle entire codebases, lengthy legal documents, or multi-year research archives in a single context window with dramatically higher reliability.
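To make the capacity claim concrete, here is a minimal sketch of how you might check whether a codebase even fits in a 1M-token window before handing it to a model. It uses the common rough heuristic of about four characters per token; real tokenizer counts vary by language and content, and the file extensions are an illustrative choice, not a recommendation.

```python
# Rough fit check for a 1M-token context window using the
# ~4 characters-per-token heuristic (an approximation; actual
# token counts depend on the tokenizer and the content).
from pathlib import Path

CONTEXT_WINDOW = 1_000_000  # tokens
CHARS_PER_TOKEN = 4         # rough heuristic

def estimate_tokens(root: str, exts=(".py", ".md", ".txt")) -> int:
    """Estimate total tokens across matching files under `root`."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(root: str) -> bool:
    """True if the estimated token count fits in one context window."""
    return estimate_tokens(root) <= CONTEXT_WINDOW
```

Even at this crude granularity, the point stands: a window that holds a million tokens covers on the order of four million characters of source or prose, which is why whole-repository and multi-document workflows hinge on the retrieval reliability the MRCR and Graphwalks numbers measure.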
5. Knowledge Work Across Professions
On GDPVal, a benchmark testing knowledge work across 44 real occupations — from finance to legal research to product management — GPT-5.5 matches or beats industry professionals in 84.9% of comparisons.
Internally, OpenAI says its finance team used Codex with GPT-5.5 to review 24,771 K-1 tax forms spanning 71,637 pages, helping accelerate the process by two weeks compared to the prior year. That's not a benchmark — that's a real productivity outcome at scale.
GPT-5.5 vs GPT-5.4: Key Differences at a Glance
| Metric | GPT-5.4 | GPT-5.5 |
|---|---|---|
| Release Date | March 5, 2026 | April 23, 2026 |
| Terminal-Bench 2.0 | 75.1% | 82.7% |
| SWE-Bench Pro | 57.7% | 58.6% |
| MRCR v2 (512K–1M tokens) | 36.6% | 74.0% |
| GDPVal (knowledge work) | 83.0% | 84.9% |
| API Input Price (per 1M tokens) | $2.50 | $5.00 |
| API Output Price (per 1M tokens) | $15.00 | $30.00 |
| Context Window | 1M tokens | 1M tokens |
| Hallucination Reduction | — | 60% fewer vs GPT-5.4 |
GPT-5.5 improves on 9 of the 10 benchmarks compared directly with GPT-5.4, with the largest gains on ARC-AGI-2, MCP Atlas, and Terminal-Bench 2.0.
Pricing and Access: Who Gets It and What It Costs
GPT-5.5 is included in ChatGPT Plus ($20/month), Pro ($200/month), Business, and Enterprise plans. The API pricing is announced but not yet live: $5 per million input tokens and $30 per million output tokens for the standard model — a 2x jump from GPT-5.4's $2.50/$15. GPT-5.5 Pro comes in at $30/$180 per million, unchanged from GPT-5.4 Pro.
GPT-5.5 Thinking is available to all paid tiers, while GPT-5.5 Pro is limited to Pro, Business, and Enterprise subscribers. Codex access spans Plus, Pro, Business, Enterprise, Edu, and Go plans with a 400K context window.
The price doubling looks steep on paper. But OpenAI's counterargument is efficiency: GPT-5.5 is both more intelligent and much more token-efficient, delivering better results with fewer tokens than GPT-5.4 for most users on Codex tasks. For teams running complex agentic workflows, the net cost impact may be softer than the sticker price suggests.
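The efficiency argument is easy to sanity-check with arithmetic. The sketch below uses the published list prices; the task sizes and the 45% token reduction are hypothetical illustrations, not OpenAI figures or benchmark results.

```python
# Back-of-envelope: can token efficiency offset a 2x per-token price?

def api_cost(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Cost in dollars; prices are per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Published list prices (input, output) per 1M tokens.
GPT_5_4 = (2.50, 15.00)
GPT_5_5 = (5.00, 30.00)

# Hypothetical agentic task: GPT-5.4 burns 400K input / 60K output tokens
# across several correction loops; suppose GPT-5.5 finishes in one pass
# with 45% fewer tokens (an illustrative number only).
old_cost = api_cost(400_000, 60_000, *GPT_5_4)
new_cost = api_cost(int(400_000 * 0.55), int(60_000 * 0.55), *GPT_5_5)

print(f"GPT-5.4: ${old_cost:.2f}  GPT-5.5: ${new_cost:.2f}")
```

At these illustrative numbers GPT-5.5 still comes out slightly more expensive (about $2.09 versus $1.90): at double the per-token price, break-even requires roughly a 50% token reduction. That is why the net cost impact depends heavily on how much shorter the new model's runs actually are for your workload.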
Real-World Applications: Where GPT-5.5 Shines
For developers and engineering teams: The SWE-bench and Terminal-Bench gains translate directly to faster debugging, better codebase navigation, and fewer rounds of human correction in agent loops. If you run any form of automated coding pipeline, this upgrade is meaningful.
For legal and financial professionals: GPT-5.5 Pro provides particularly large quality gains across business, legal, education, and data science use cases. The K-1 tax form example — 71,637 pages processed with a two-week time saving — gives a concrete sense of the efficiency ceiling being pushed.
For researchers: The Ramsey number proof, the GeneBench improvements, and the BixBench scores position GPT-5.5 as a genuine co-investigator for scientific work — not just a summarization tool.
For enterprise knowledge workers: Over 10,000 NVIDIA employees across engineering, product, legal, marketing, finance, sales, HR, and operations got early access and described results as "mind-blowing" and "life-changing." That's an unusually strong internal endorsement.
Where GPT-5.5 Doesn't Win
Honest coverage of a new model requires acknowledging where the competition still holds ground.
On SWE-Bench Pro, Claude Opus 4.7 beats GPT-5.5 with 64.3% versus 58.6%. On MCP Atlas, GPT-5.5 scores 75.3%, trailing both Claude Opus 4.7 (79.1%) and Gemini 3.1 Pro (78.2%). GPT-5.5 also falls slightly behind Gemini on BrowseComp, a web research benchmark, with 84.4% versus 85.9%.
On GDPVal, GPT-5.5 scores 84.9%, only a marginal improvement over GPT-5.4's 83.0% — suggesting that for everyday professional tasks, the performance delta versus the previous generation may be smaller than headlines imply.
Should You Upgrade? A Decision Framework
Upgrade Now If…
- You run agentic coding pipelines or use Codex heavily
- You work with documents spanning 500K+ tokens
- Your use case involves multi-step computer interaction
- You're doing research that requires deep, iterative reasoning
Stick with GPT-5.4 If…
- You run high-volume, low-complexity workloads (classification, summarization)
- You're cost-sensitive and already hitting budget ceilings on GPT-5.4
- You primarily need MCP-style tool use, where Claude and Gemini lead on MCP Atlas
- You're building consumer apps and want to A/B test on production traffic before switching
The Bigger Picture: OpenAI's Velocity Problem (and Opportunity)
The announcement arrived less than two months after OpenAI released GPT-5.4, a timeline that underscores just how rapidly the AI sector is moving and how intense the competition among the industry's biggest players has become. The launch comes just weeks after Anthropic unveiled Claude Mythos Preview, its new model with advanced cybersecurity capabilities — a reminder that GPT-5.5's release is as much about competitive timing as it is about technical readiness.
Brockman framed GPT-5.5 as a step toward OpenAI's "super app" vision — a single intelligent interface that handles knowledge work, coding, research, and software operation end-to-end. Whether that vision is achievable in 2026 remains to be seen. But the cadence of releases — GPT-5, 5.1, 5.2, 5.3-Codex, 5.4, and now 5.5 in under nine months — suggests OpenAI is building toward something significantly larger than any individual model.
Frequently Asked Questions
What is GPT-5.5?
GPT-5.5 is OpenAI's frontier AI model released on April 23, 2026. It is designed for agentic coding, computer use, deep research, and knowledge work, and ships in three variants: standard, Thinking, and Pro.
How is GPT-5.5 different from GPT-5.4?
GPT-5.5 significantly improves on long-context reasoning, terminal and coding benchmarks, and hallucination reduction (60% fewer errors vs GPT-5.4), while matching GPT-5.4's per-token latency. The tradeoff is a 2x increase in API pricing.
How much does GPT-5.5 cost?
Via API: $5 per million input tokens and $30 per million output tokens. GPT-5.5 Pro is $30/$180 per million tokens. It is included in ChatGPT Plus, Pro, Business, and Enterprise subscriptions at no additional cost.
Is GPT-5.5 free to use?
No. GPT-5.5 is currently rolling out to paid ChatGPT subscribers only (Plus, Pro, Business, Enterprise). Free-tier access has not been announced.
Is GPT-5.5 better than Claude or Gemini?
On coding benchmarks like Terminal-Bench 2.0 and agentic tasks, GPT-5.5 leads. On SWE-Bench Pro and MCP Atlas tool-use benchmarks, Claude Opus 4.7 and Gemini 3.1 Pro hold competitive advantages. No single model dominates every benchmark.
When will GPT-5.5 be available via the API?
OpenAI has announced pricing but says the API rollout is "coming very soon" as of April 23, 2026.
GPT-5.5 is the most capable model OpenAI has shipped to date on the benchmarks that matter for agentic, long-horizon tasks — and the long-context performance leap alone is substantial. At double the API price, it demands scrutiny before deployment at scale. But for teams doing serious coding, research, or computer-use automation, the capability uplift is real and measurable.
The pace of progress is the subtext here. We are in an era where frontier AI models iterate on six-week cycles. That changes how developers, enterprises, and individuals should think about their AI stack — not as a static infrastructure choice, but as a living decision that needs to be revisited, continuously.
GPT-5.5 is not the finish line. It is, as Brockman put it, one step.
Sources
OpenAI official announcement · CNBC · TechCrunch · The Decoder · SiliconAngle · iClarified · NVIDIA Blog · Artificial Analysis · llm-stats.com
