OpenAI managed to appease Italian data authorities and lift the country’s effective ban on ChatGPT last week, but its fight against European regulators is far from over.
Earlier this year, OpenAI’s popular and controversial ChatGPT chatbot hit a big legal snag: an effective ban in Italy. The Italian Data Protection Authority (GPDP) accused OpenAI of violating EU data protection rules, and the company restricted access to the service in Italy while it attempted to fix the problem. On April 28th, ChatGPT returned to the country, with OpenAI lightly addressing GPDP’s concerns without making major changes to its service, an apparent victory.
The GPDP has said it “welcomes” the changes ChatGPT made. However, the firm’s legal issues, and those of companies building similar chatbots, are likely just beginning. Regulators in several countries are investigating how these AI tools collect and produce information, citing a range of concerns from companies’ collection of unlicensed training data to chatbots’ tendency to spew misinformation. In the EU, they’re applying the General Data Protection Regulation (GDPR), one of the world’s strongest legal privacy frameworks, the effects of which will likely reach far outside Europe. Meanwhile, lawmakers in the bloc are putting together a law that will address AI specifically, likely ushering in a new era of regulation for systems like ChatGPT.
ChatGPT’s various issues with misinformation, copyright, and data protection have placed a target on its back
ChatGPT is one of the most popular examples of generative AI, a blanket term covering tools that produce text, image, video, and audio based on user prompts. The service reportedly became one of the fastest-growing consumer applications in history after reaching 100 million monthly active users just two months after launching in November 2022 (OpenAI has never confirmed these figures). People use it for tasks such as translating text into different languages. But critics, including regulators, have highlighted ChatGPT’s unreliable output, copyright concerns, and murky data protection practices.
Italy was the first country to make a move. On March 31st, it highlighted four ways it believed OpenAI was breaking GDPR: allowing ChatGPT to provide inaccurate or misleading information, failing to notify users of its data collection practices, failing to meet any of the legal bases for processing personal data, and failing to adequately prevent children under 13 years old from using the service. It ordered OpenAI to immediately stop using personal information collected from Italian citizens in its training data for ChatGPT.
No other country has taken such action. But since March, at least three EU nations have launched their own investigations into ChatGPT. Meanwhile, across the Atlantic, Canada is evaluating privacy concerns under its Personal Information Protection and Electronic Documents Act, or PIPEDA. The European Data Protection Board (EDPB) has even established a task force to help coordinate investigations. And if these agencies demand changes from OpenAI, they could affect how the service runs for users across the globe.
Regulators’ concerns can be broadly split into two categories: where ChatGPT’s training data comes from and how OpenAI is delivering information to its users.
ChatGPT uses either OpenAI’s GPT-3.5 or GPT-4 large language models (LLMs), which are trained on vast quantities of human-produced text. OpenAI does not disclose exactly what training text is used but says it draws on “a variety of licensed, created, and publicly available data sources, which may include publicly available personal information.”
This potentially poses huge problems under GDPR. The law was enacted in 2018 and covers every service that collects or processes data from EU citizens — no matter where the organization responsible is based. GDPR rules require companies to have explicit consent before collecting personal data, to have legal justification for why it’s being collected, and to be transparent about how it’s being used and stored.
European regulators claim that the secrecy around OpenAI’s training data means there’s no way to confirm if the personal information swept into it was initially given with user consent, and the GPDP specifically argued that OpenAI had “no legal basis” for collecting it in the first place. OpenAI and others have gotten away with little scrutiny so far, but this claim adds a big question mark to future data scraping efforts.
Then there’s GDPR’s “right to be forgotten,” which lets users demand that companies correct their personal information or remove it entirely. OpenAI preemptively took steps to facilitate those requests, but there’s been debate about whether it’s technically possible to handle them, given how complex it can be to separate out specific data once it’s churned into these large language models.
OpenAI also gathers information directly from users. Like any internet platform, it collects a range of standard user data (e.g., name, contact info, card details, etc.). But, more significantly, it records interactions users have with ChatGPT. This data can be reviewed by OpenAI’s employees and is used to train future versions of its model. Given the intimate questions people ask ChatGPT, using the bot as a therapist or a doctor, this means the company is scooping up all sorts of sensitive data.
At least some of this data may have been collected from minors. While OpenAI’s policy states that it “does not knowingly collect personal information from children under the age of 13,” there’s no strict age verification gate. That doesn’t play well with EU rules, which ban collecting data from people under 13 and (in some countries) require parental consent for minors under 16. On the output side, the GPDP claimed that ChatGPT’s lack of age filters exposes minors to “absolutely unsuitable responses with respect to their degree of development and self-awareness.”
OpenAI maintains broad latitude to use that data, which has worried some regulators, and storing it presents a security risk. Some companies have banned employees from using generative AI tools over fears they’ll upload sensitive data. And, in fact, Italy announced its ban soon after ChatGPT suffered a data leak, exposing users’ chat history and email addresses.
ChatGPT’s propensity for providing false information may also pose a problem. GDPR regulations stipulate that all personal data must be accurate, something the GPDP highlighted in its announcement. Depending on how that’s defined, it could spell trouble for most AI text generators, which are prone to “hallucinations”: a cutesy industry term for factually incorrect or irrelevant responses to a query. This has already seen some real-world repercussions elsewhere, as a regional Australian mayor has threatened to sue OpenAI for defamation after ChatGPT falsely claimed he had served time in prison for bribery.
ChatGPT’s popularity and current dominance over the AI market make it a particularly attractive target, but there’s no reason why its competitors and collaborators, like Microsoft with its OpenAI-powered Azure AI, won’t face scrutiny, too. Before ChatGPT, Italy banned another chatbot platform for collecting information on minors, and so far, it’s stayed banned.
While GDPR is a powerful set of laws, it wasn’t made to address AI-specific issues. Rules that do, however, may be on the horizon.
In 2021, the EU submitted its first draft of the Artificial Intelligence Act (AIA), legislation that will work alongside GDPR. The act governs AI tools according to their perceived risk, from “minimal” (things like spam filters) to “high” (AI tools for law enforcement or education) or “unacceptable” and therefore banned (like a social credit system). After the explosion of large language models like ChatGPT last year, lawmakers are now racing to add rules for “foundation models” and “General Purpose AI Systems (GPAIs)”, two terms for large-scale AI systems that include LLMs, and potentially classify them as “high risk” services.
The AIA’s provisions go beyond data protection. A proposed amendment would force companies to disclose any copyrighted material used to develop generative AI tools. That could expose once-secret datasets and leave more companies vulnerable to infringement lawsuits.
Laws specifically designed to regulate AI may not come into effect in Europe until late 2024
But passing it may take a while. EU lawmakers reached a provisional agreement on the act on April 27th. A committee will vote on the draft on May 11th, and the final proposal is expected by mid-June. Then, the European Council, Parliament, and Commission will have to agree on a final text before implementing the law. If everything goes smoothly, it could be adopted by the second half of 2024, a little behind the target of Europe’s May 2024 elections.
For now, Italy and OpenAI’s spat offers an early look at how regulators and AI companies might negotiate. The GPDP offered to lift its ban if OpenAI met several conditions by April 30th. That included informing users how ChatGPT stores and processes their data, asking for explicit consent to use said data, facilitating requests to correct or remove false personal information generated by ChatGPT, and requiring Italian users to confirm they’re over 18 when registering for an account. OpenAI didn’t hit all of those stipulations, but it met enough to appease Italian regulators and get access restored in Italy.
OpenAI still has targets to meet. It has until September 30th to create a harder age-gate to keep out minors under 13 and require parental consent for older underage teens. If it fails, it could see itself blocked again. But it’s provided an example of what Europe considers acceptable behavior for an AI company — at least until new laws are on the books.