An AI expert at Amazon-backed firm Anthropic has been accused of citing a fabricated academic article in a court filing meant to defend the company against claims that it trained its AI model on copyrighted song lyrics without permission.

The filing, submitted by Anthropic data scientist Olivia Chen, was part of the company’s legal response to a $75 million lawsuit filed by Universal Music Group, Concord, ABKCO, and other major publishers.

The publishers alleged in the 2023 lawsuit that Anthropic unlawfully used lyrics from hundreds of songs, including those by Beyoncé, The Rolling Stones, and The Beach Boys, to train its Claude language model.

Chen’s declaration included a citation to an article from The American Statistician, intended to support Anthropic’s argument that Claude only reproduces copyrighted lyrics under rare and specific conditions, according to a Reuters report.

During a hearing Tuesday in San Jose, the plaintiffs’ attorney Matt Oppenheim called the citation a “complete fabrication,” but said he did not believe Chen had made it up intentionally. Instead, he said, she had likely used Claude itself to generate the source.

Anthropic’s attorney, Sy Damle, told the court Chen’s error appeared to be a mis-citation, not a fabrication, while criticizing the plaintiffs for raising the issue late in the proceedings.

Per Reuters, U.S. Magistrate Judge Susan van Keulen said the issue posed “a very serious and grave” concern, noting that “there’s a world of difference between a missed citation and a hallucination generated by AI.”

She declined a request to immediately question Chen, but ordered Anthropic to formally respond to the allegation by Thursday.

Anthropic did not immediately respond to Decrypt’s request for comment.

Anthropic in court

The lawsuit against Anthropic was filed in October 2023, with the plaintiffs alleging that the company trained its Claude model on a massive volume of copyrighted lyrics and that the model reproduces them on demand.

They demanded damages, disclosure of the training set, and the destruction of infringing content.

Anthropic responded in January 2024, denying that its systems were designed to output copyrighted lyrics.

It called any such reproduction a “rare bug” and accused the publishers of offering no evidence that typical users encountered infringing content.

In August 2024, the company was hit with another lawsuit, this time from authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who accused Anthropic of training Claude on pirated versions of their books.

GenAI and copyright

The case is part of a growing backlash against generative AI companies accused of feeding copyrighted material into training datasets without consent.

OpenAI is facing multiple lawsuits, including from comedian Sarah Silverman, the Authors Guild, and The New York Times, that accuse the company of using copyrighted books and articles to train its GPT models without permission or licenses.

Meta is named in similar suits, with plaintiffs alleging that its LLaMA models were trained on unlicensed literary works sourced from pirated datasets.

Meanwhile, in March, OpenAI and Google urged the Trump administration to ease copyright restrictions around AI training, calling them a barrier to innovation in their formal proposals for the upcoming U.S. “AI Action Plan.”

In the UK, a government bill that would enable artificial intelligence firms to use copyright-protected work without permission hit a roadblock this week, after the House of Lords backed an amendment requiring AI firms to reveal what copyrighted material they have used in their models.
