OpenAI’s U-Turn: Not Training GPT-4 On API Customer Data via @sejournal, @MattGSouthern
OpenAI stops training language models such as GPT-4 with customer API data. The post OpenAI’s U-Turn: Not Training GPT-4 On API Customer Data appeared first on Search Engine Journal.
In a significant departure from its previous practices, OpenAI has announced that it will no longer utilize customer data sent via its APIs to train its expansive language models, such as GPT-4.
The change was confirmed by Sam Altman, the CEO of OpenAI, in a recent interview with CNBC.
OpenAI’s New Approach to User Data
OpenAI’s change in policy was implemented on March 1, 2023, when the company quietly updated its terms of service to reflect this new commitment to user privacy.
Altman clarified, “Customers clearly want us not to train on their data, so we’ve changed our plans: We will not do that.”
APIs, or application programming interfaces, are technological frameworks that allow customers to connect directly to OpenAI’s software.
Altman stated that OpenAI has not been using API data for model training “for a while,” suggesting that this official announcement formalizes an existing practice.
Implications For Business Customers
OpenAI’s move has far-reaching implications, particularly for its business customers, which include giants like Microsoft, Salesforce, and Snapchat.
These companies are more likely to utilize OpenAI’s API capabilities for their operations, so the privacy and data protection shift is particularly relevant to them.
However, the new data protection measures apply solely to customers utilizing the company’s API services. OpenAI’s updated terms of service note, “We may use Content from Services other than our API.”
As such, other forms of data input, like text entered into the popular chatbot ChatGPT, may still be utilized by OpenAI unless the data is shared through the API.
Broader Industry Impact
OpenAI’s policy shift comes when industries grapple with the potential impacts of large language models, such as OpenAI’s ChatGPT, replacing material traditionally created by humans.
For example, the Writers Guild of America recently began striking after negotiations between the Guild and movie studios broke down. The Guild had been advocating for restrictions on using OpenAI’s ChatGPT for script generation or rewriting.
OpenAI’s decision not to use customer data for training marks a pivotal moment in the ongoing conversation about data privacy and AI. As companies continue to explore and push the boundaries of AI technology, ensuring user privacy and maintaining trust will likely remain central to these discussions.
The Evolution of ChatGPT: GPT-3 To GPT-4
It is important to note that OpenAI’s commitment to not using customer data for training applies to its latest language model, GPT-4, released on March 14, 2023.
GPT-4 introduced several improvements over its predecessor, GPT-3, including a significant increase in word limit size (25,000 compared to the 3,000-word limit of ChatGPT), greater context window size, and improved reasoning and understanding capabilities.
Another notable feature of GPT-4 is its multi-modality, or the ability to understand and infer information from images in addition to text. This latest model generates more human-like texts, using features like emojis for a more personalized feel.
However, the exact size and architecture of GPT-4 remain undisclosed, leading to speculation about the details of the model.
Despite these rumors, OpenAI’s CEO has denied specific claims about the model’s size.
As for performance, GPT-4 has demonstrated strengths in text generation but also some limitations. For instance, it scored in the 54th percentile on the Graduate Record Examination (GRE) Writing and performed in the 43rd – 59th percentile on the AP Calculus BC exam.
Additionally, it performed well on easy Leetcode coding tasks, but its performance declined with increased task difficulty.
While the specifics of GPT-4’s training process are not officially documented, it’s known that GPT models generally involve large-scale machine learning with a diverse range of internet text.
Looking Forward
As a result of changes to OpenAI’s data usage policy, the data used for training its language models doesn’t include information shared via the API unless users explicitly agree to contribute it for this purpose.
While this technology improves and plays a more significant part in our lives, it’s interesting how companies pivot and respond to concerns about keeping data private and earning people’s trust.
Featured image generated by the author using Midjourney.