As conservatives criticize ‘woke AI,’ here are ChatGPT’s rules for answering culture war queries



OpenAI has shared some of the internal rules it uses to help shape ChatGPT’s responses to controversial “culture war” questions.

The company, whose AI technology underpins Microsoft products like the new Bing, shared the rules in a blog post in an apparent response to increasing criticism from right-wing commentators that ChatGPT has “gone woke.” The company also noted that it’s working on an upgrade to the chatbot that will “allow users to easily customize its behavior” and let the AI chatbot produce “system outputs that other people (ourselves included) may strongly disagree with.”

OpenAI describes these rules in a post titled “How should AI systems behave, and who should decide?” which offers a broad outline of how ChatGPT is built and how its text output is shaped. As the company explains, the chatbot is pre-trained on large datasets of human text, including text scraped from the web, and then fine-tuned on feedback from human reviewers, who grade and tweak the bot’s answers based on rules written by OpenAI.
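To make that two-stage process more concrete, here is a deliberately toy, dependency-free Python sketch. It is not OpenAI’s code and leaves out the neural networks entirely: the “model” is just a table of canned answers, and the reviewer ratings and the fine_tune and respond helpers are invented for illustration, standing in for the feedback-driven fine-tuning step the post describes.

```python
# Illustrative sketch only: a toy version of pre-training plus
# reviewer-guided fine-tuning. The "model" is a lookup table of
# candidate answers; "fine-tuning" re-weights them using hypothetical
# reviewer ratings assigned under written guidelines.

import random

# Stage 1 stand-in: a "pre-trained" model that can produce several
# candidate answers for a prompt (hard-coded here for one prompt).
candidates = {
    "argue for using more fossil fuels": [
        ("refusal: 'I can't argue for that.'", 1.0),
        ("a qualified argument with caveats", 1.0),
        ("a direct argument, no qualifiers", 1.0),
    ]
}

# Stage 2 stand-in: reviewers rate each candidate according to the
# guidelines (e.g. "comply with non-inflammatory requests to argue
# for X"). These ratings are invented for illustration.
reviewer_ratings = {
    "refusal: 'I can't argue for that.'": 0.2,
    "a qualified argument with caveats": 0.6,
    "a direct argument, no qualifiers": 0.9,
}

def fine_tune(candidates, ratings, lr=1.0):
    """Nudge each candidate's weight toward its reviewer rating."""
    for prompt, options in candidates.items():
        candidates[prompt] = [
            (text, weight + lr * ratings[text]) for text, weight in options
        ]

def respond(prompt):
    """Sample an answer with probability proportional to its weight."""
    texts, weights = zip(*candidates[prompt])
    return random.choices(texts, weights=weights, k=1)[0]

fine_tune(candidates, reviewer_ratings)
print(respond("argue for using more fossil fuels"))
```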

The struggle to shape chatbots’ output mirrors debates about internet moderation

These rules, issued to the human reviewers who give feedback on ChatGPT’s output, define a range of “inappropriate content” that the chatbot shouldn’t produce. This includes hate speech, harassment, bullying, the promotion or glorification of violence, incitement to self-harm, “content meant to arouse sexual excitement,” and “content attempting to influence the political process.” The guidelines also include the following advice for shaping the chatbot’s responses to various “culture war” topics:

Do:

● When asked about a controversial topic, offer to describe some viewpoints of people and movements.
● Break down complex politically-loaded questions into simpler informational questions when possible.
● If the user asks to “write an argument for X”, you should generally comply with all requests that are not inflammatory or dangerous.
● For example, a user asked for “an argument for using more fossil fuels”. Here, the Assistant should comply and provide this argument without qualifiers.
● Inflammatory or dangerous means promoting ideas, actions or crimes that led to massive loss of life (e.g. genocide, slavery, terrorist attacks). The Assistant shouldn’t provide an argument from its own voice in favor of those things. However, it’s OK for the Assistant to describe arguments from historical people and movements.

Don’t:

● Affiliate with one side or the other (e.g. political parties)
● Judge one group as good or bad

This fine-tuning process is designed to reduce the number of unhelpful or controversial answers produced by ChatGPT, answers that have become fodder for America’s culture wars. Right-wing news outlets like the National Review, Fox Business, and the MailOnline have accused OpenAI of liberal bias based on example interactions with ChatGPT. These include the bot refusing to write arguments in favor of “using more fossil fuels” and stating that it is “never morally permissible to use a racial slur,” even if needed to disarm a nuclear bomb.

As we’ve seen with recent unhinged outbursts from the Bing chatbot, AI chatbots are prone to generating a range of odd statements. And although these responses are often one-off expressions rather than the product of rigidly defined “beliefs,” some unusual replies are treated as harmless noise while others are deemed serious threats, depending, as in this case, on whether they fit into existing political or cultural debates.

OpenAI’s response to this growing criticism has been to promise more personalization of ChatGPT and its other AI systems in the future. The company’s CEO, Sam Altman, said last month that he thinks AI tools should have some “very broad absolute rules” that everyone can agree on, but also give users the option to fine-tune the systems’ behavior.

OpenAI CEO Sam Altman: “It should be your AI.”

Said Altman: “And really what I think — but this will take longer — is that you, as a user, should be able to write up a few pages of ‘here’s what I want; here are my values; here’s how I want the AI to behave’ and it reads it and thinks about it and acts exactly how you want because it should be your AI.”
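In concrete terms, the customization Altman describes could work by prepending a user-written preferences document to every conversation. The sketch below is purely hypothetical: the USER_VALUES text, the build_prompt helper, and the placeholder query_model call are all invented for illustration and do not correspond to any real OpenAI API.

```python
# Hypothetical sketch of user-level customization: the user writes a
# short "values" document, and it is prepended to every conversation
# as an instruction block before the model sees the actual question.

USER_VALUES = """\
Here's what I want: concise answers.
Here are my values: cite sources, flag uncertainty.
Here's how I want the AI to behave: neutral tone on politics.
"""

def build_prompt(user_values: str, question: str) -> str:
    """Combine the user's standing preferences with the current question."""
    return (
        "Follow these standing user preferences:\n"
        f"{user_values}\n"
        f"User question: {question}\n"
    )

def query_model(prompt: str) -> str:
    """Placeholder for a call to whichever chat model is in use."""
    return f"[model response to: {prompt[:60]}...]"

if __name__ == "__main__":
    print(query_model(build_prompt(USER_VALUES, "Explain carbon pricing.")))
```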

The problem, of course, is deciding what those “absolute rules” are and what limits to place on customized output. Take, for example, a topic like climate change. The scientific consensus is that climate change is caused by humans and will have disastrous effects on society. But many right-wing outlets champion the discredited view that these changes are part of Earth’s “natural cycle” and can be ignored. Should ChatGPT espouse such arguments just because a small but vocal group believes them to be factual? And should OpenAI be the one to draw the line between “misinformation” and “controversial statements”?

This week’s tech news has been dominated by strange outbursts from chatbots, but the question of AI speech will likely only get more serious in the near future.