OpenAI has released GPT-5.4, its newest flagship AI model, bringing major improvements across reasoning, coding, scientific tasks, mathematics, and real-world desktop interactions. According to OpenAI VP of Science Kevin Weil, the new release represents “our best model ever.”
The launch comes just two days after the release of GPT-5.3 Instant, which was introduced as the default chat model. GPT-5.4 is currently available as GPT-5.4 Thinking for Plus, Team, and Pro users.
Strong Performance on Real-World Tasks
One of the most notable results for GPT-5.4 is its performance on OSWorld-V, a benchmark designed to evaluate how effectively AI agents can navigate and complete tasks in a desktop environment.
GPT-5.4 scored 75%, outperforming the human baseline of 72.4% and delivering double the performance of GPT-5.2 on the same benchmark.
This improvement signals a major step forward in AI systems capable of interacting with real software environments rather than just generating text.
Larger Context and Deeper Reasoning
The new model introduces several technical upgrades designed for more complex workflows:
- Up to 1 million tokens of context
- A new “x-high reasoning effort” mode
- Improved planning and long-running task execution
These capabilities allow GPT-5.4-based agents to plan and execute multi-step tasks that may run for hours, opening the door for more sophisticated automation across research, software development, and knowledge work.
Knowledge-Work Benchmark Gains
GPT-5.4 also demonstrated strong results on GDPval, a benchmark designed to measure AI performance across 44 real-world knowledge-worker roles.
The model matched or outperformed professionals 83% of the time, a gain of 12 percentage points over the 71% achieved by GPT-5.2.
This jump highlights continued progress toward AI systems capable of assisting, and in some cases competing with, human expertise across a wide range of professional tasks.
Why This Release Matters
The release comes at an important moment for OpenAI following a week of mixed sentiment around the AI industry. GPT-5.4 appears to represent a strong response, delivering meaningful gains across reasoning, automation, and real-world task execution.
Perhaps the most striking signal of confidence came from OpenAI researcher Noam Brown, who stated:
“We see no wall.”
If that assessment holds true, GPT-5.4 may mark another step toward increasingly capable agentic AI systems: models that do more than generate answers and instead actively plan, navigate software, and execute complex workflows.
As AI systems continue expanding into real desktop environments, the line between tool and autonomous digital worker may become increasingly thin.


