Think of any topic imaginable that is vaguely related to raising kids, and there’s probably a post about it on Mumsnet, the long-running, enormously popular, controversy-spurring UK-based parenting forum for mothers. Over its more than two-decade history, Mumsnet has amassed an archive of more than six billion words written by its highly engaged user base, on topics such as dirty diapers and lazy husbands. (Not to mention a bonkers rant about dolphins.)
This spring, after Mumsnet discovered that AI companies were scraping its data, the company says it decided to try to strike licensing deals with some of the major players in the space, including OpenAI, which initially expressed willingness to explore an arrangement after Mumsnet first reached out. After talks with OpenAI fell apart, Mumsnet in July announced its intention to pursue legal action.
According to Mumsnet, during those early conversations, an OpenAI strategic partnership lead told the company that datasets over 1 billion words were of interest to the AI giant. Mumsnet’s leadership was excited. “We spent quite some time in a back-and-forth with them,” Mumsnet founder and CEO Justine Roberts tells WIRED. “We had to sign some NDAs, and they wanted a lot of information from us.”
However, over a month later, OpenAI told Mumsnet that the company was no longer interested in partnering at that time, according to an email exchange reviewed by WIRED. When asked why, the OpenAI staffer characterized Mumsnet’s 6-billion-word dataset as too small to warrant a licensing arrangement, Roberts says. They also noted that OpenAI is primarily interested in large datasets that the public cannot already access online, and that it wanted datasets that captured broad human experience.
OpenAI echoed this sentiment when WIRED asked for comment. “We pursue partnerships for large-scale datasets that reflect human society and do not pursue partnerships solely for publicly available information,” says OpenAI spokesperson Kayla Wood. “We support publisher and creator choice, offering them ways to express their preferences about how their sites and content work with AI in search results and training generative AI foundation models.”
Roberts says she was “irritated” by this development. She recalls that OpenAI at first had seemed especially interested in Mumsnet because of the platform’s heavily female-written content. “It’s very high-quality conversational data,” she says. “It’s 90 percent female conversation, which is quite unusual.”
OpenAI has struck a variety of data-licensing deals with media outlets and platforms in the past year, entering into agreements with Vox Media, the Atlantic, Axel Springer, Time, and WIRED parent company Condé Nast, as well as platforms filled with user-generated content like Reddit. (Automattic, the owner of WordPress.com and Tumblr, was also said to be in licensing talks earlier this year.) As the particulars of those deals haven’t been revealed, it’s not clear how large their respective corpuses are.
When WIRED asked OpenAI about the size of datasets it will consider for commercial licensing, the company declined to share that information. But Wood emphasizes that the company’s partnerships with publishers are “focused on displaying their content in our products and driving traffic to them.”
Alex Bestall, CEO of music copyright management firm Rightsify, says he wouldn’t be surprised if OpenAI wanted to focus on bigger fish. “Startups are much more flexible, but big labs have minimum data volumes to consider any deals,” he says.
Now, OpenAI is facing the prospect of its first copyright infringement litigation in the UK. In addition to its copyright claims, Mumsnet is claiming a breach of its terms of use and alleging database right infringement, meaning the extraction of all or a large part of a database without the owner’s consent.
In July, Mumsnet sent its initial letter announcing it was considering legal action. More recently, it received a response from OpenAI with a list of questions. “They did not deny the fact that they had scraped,” Roberts says. As of now, Mumsnet plans to continue on the litigation track; it has not yet determined whether it will file suit in the UK’s High Court or a specialized intellectual property court. (OpenAI acknowledged to WIRED that it had received and responded to the Mumsnet complaint, but did not offer comment on Mumsnet’s legal claims.)
In the meantime, Mumsnet is actively pursuing licensing arrangements with other AI companies. Roberts says that it is speaking with Google, as well as intermediary startups that have cropped up to facilitate data licensing. (Google did not respond to WIRED’s request to confirm these talks.)
“I’m quite worried about the ecosystem, where these big LLMs are allowed to march all over small publishers to build their models, and then people have less reasons to go and visit the websites,” Roberts says. “We need to come to some sort of satisfactory arrangement where people are compensated for their work.”
As Mumsnet’s content is largely user-generated, WIRED asked whether it was considering any sort of payment system for users when it does strike deals. Roberts says there is no plan at the moment, but that she would consider it if data licensing for AI became incredibly lucrative down the road.
She says that, based on comments she received after the announcement Mumsnet was looking into legal action, users by and large understand the company’s aims in licensing their data. “We’re quite concerned about AI being gender-biased,” she says. “There’s something to be said for it being trained on verified female voices.”
Roberts is optimistic about how Mumsnet’s potential legal action will unfold. “We think we have a good chance,” she says. In the US, there have already been dozens of copyright-infringement cases brought against AI companies. In many of the ongoing cases, AI companies are defending themselves by arguing that their actions are shielded by the “fair use” doctrine, which permits the unlicensed use of copyrighted material in certain circumstances. The UK has a similar concept, which it calls “fair dealing,” but it’s significantly more limited in scope.
Regardless of the outcome, Roberts is glad her platform is taking a stance. “This is probably more about the principle of the thing than anything else,” she says.
Source: Wired