On Wednesday, OpenAI published a blog post outlining the “philosophy and mechanics” behind its Model Spec—the document that shapes the behavior of ChatGPT, the world’s most widely used chatbot.

The Model Spec is meant to be an “interface,” according to the blog post. It makes the chatbot’s intended behavior explicit, so that researchers, policymakers, and the public can “read, inspect, and debate” it. The result is a 100-page document that defines how OpenAI’s models should adjudicate between potentially conflicting obligations to society, OpenAI, developers who build on top of OpenAI’s products, and end users. To this end, the Spec is built around a chain of command, in which a prohibition on “high severity harms” takes precedence over developers’ or users’ instructions. The document is updated regularly; the most recent version dates to December.

“I see myself as a maintainer—in the spirit of an open-source side project,” says Jason Wolfe, who manages input into the Model Spec from different parts of OpenAI. Wolfe says that researchers had begun reaching out to OpenAI for guidance on how to design similar documents, which prompted the blog post outlining the company’s thinking. “There’s a bunch of stuff here that would hopefully be useful to people.”

However, the document is “not an implementation,” according to the blog post, which raises the question of how faithfully it reflects what actually shapes ChatGPT during training. The process by which the Spec influences model behavior is “complicated,” according to Wolfe. In some cases, text from the Spec is used directly in “alignment” training, in which AI models are trained toward desired behavior. But just as often, the principles that appear in the Spec are summaries of detailed work done by safety teams inside the organization. “In many cases, the Spec and the training are actually … parallel processes that we are keeping in sync,” says Wolfe.

ChatGPT’s growing user base means that the principles in the Spec have the potential to affect hundreds of millions of people around the world. As of February, the chatbot counted about 10% of the global population among its weekly active users. The challenge of maintaining the document is compounded by OpenAI’s rapid internal growth: the company plans to nearly double its current headcount of 4,500 people by the end of 2026. Dozens of people throughout the organization have directly contributed text to the Spec, according to the blog post, with input from research, product, and legal teams, among others. Wolfe works particularly closely with the heads of model behavior and policy.

OpenAI is not the only AI company grappling with how to shape the behavior of its models, or with how to solicit input as AI’s impact on society grows. In January, Anthropic published the “Claude Constitution,” an 80-page document describing the type of entity that it wants Claude, its flagship model, to be. Anthropic’s Constitution and OpenAI’s Model Spec read very differently: the former feels akin to a moral philosophy essay, while the latter is closer to a compendium of case law, with examples of desired behavior. “The Anthropic Constitution is more philosophical, and the OpenAI Spec is more behavioral,” says Sharan Maiya, a PhD researcher in AI alignment at the University of Cambridge.

The two companies also use the documents in different ways. OpenAI’s Spec is “first and foremost, a document for people,” according to Wolfe: useful for building consensus on the model’s desired behavior, but further removed from what the model concretely learns. By contrast, Amanda Askell, the philosopher responsible for Anthropic’s Constitution, says that Anthropic gives its Constitution to Claude “to create training material itself that allows it to understand the document.” Before its official publication, users found an early version of the Constitution, in its entirety, in Claude’s responses, demonstrating that the model had learned its text. “I do want the document to feel good to Claude models,” Askell told TIME in January.

Google DeepMind, xAI, and Meta have not published similar documents describing the intended behavior of their models.

Red lines for AI models have been particularly salient in recent weeks. Last month, OpenAI signed a deal with the Department of War after Anthropic refused to remove red lines around domestic mass surveillance and autonomous weapons. OpenAI then backtracked, with Sam Altman admitting that the deal looked “opportunistic and sloppy.” An updated version of the deal states that “the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals,” although legal experts are divided on the strength of these guarantees.

The Model Spec states that OpenAI’s models should “never” be used to facilitate mass surveillance.

“I hope that, to the extent that it’s possible given the nature of the work, if we do adapt our policies for [classified deployments], we find a way to make those adaptations transparent,” says Wolfe.
