The New York Times Is Suing Perplexity For Copyright Infringement (techcrunch.com) 68
The New York Times is suing Perplexity for copyright infringement, accusing the AI startup of repackaging its paywalled reporting without permission. TechCrunch reports: The Times joins several media outlets suing Perplexity, including the Chicago Tribune, which also filed suit this week. The Times' suit claims that "Perplexity provides commercial products to its own users that substitute" for the outlet, "without permission or remuneration." [...] "While we believe in the ethical and responsible use and development of AI, we firmly object to Perplexity's unlicensed use of our content to develop and promote their products," Graham James, a spokesperson for The Times, said in a statement. "We will continue to work to hold companies accountable that refuse to recognize the value of our work."
Similar to the Tribune's suit, the Times takes issue with Perplexity's method for answering user queries by gathering information from websites and databases to generate responses via its retrieval-augmented generation (RAG) products, like its chatbots and Comet browser AI assistant. "Perplexity then repackages the original content in written responses to users," the suit reads. "Those responses, or outputs, often are verbatim or near-verbatim reproductions, summaries, or abridgments of the original content, including The Times's copyrighted works."
Or, as James put it in his statement, "RAG allows Perplexity to crawl the internet and steal content from behind our paywall and deliver it to its customers in real time. That content should only be accessible to our paying subscribers." The Times also claims Perplexity's search engine has hallucinated information and falsely attributed it to the outlet, which damages its brand. "Publishers have been suing new tech companies for a hundred years, starting with radio, TV, the internet, social media, and now AI," Jesse Dwyer, Perplexity's head of communications, told TechCrunch. "Fortunately it's never worked, or we'd all be talking about this by telegraph."
This might be a tangent... (Score:3)
Not to mention the question of how many articles from the NY Times are claimed to be paywalled that are of older content - as in, stuff that would be public domain being pre-1929 (and available elsewhere as well).
For me, it just seems like a lot that hinges on what actually is truly paywalled, if soft paywalls count (like the one I mentioned where you can copy the text before a paywall pops up), and the like.
Re: (Score:2)
Re: (Score:1)
News articles receive copyright protection the moment they are created. This protection is automatic, beginning as soon as the work is recorded in a tangible form like being written or published online.
https://legalclarity.org/are-n... [legalclarity.org]
Re: This might be a tangent... (Score:2)
Re: This might be a tangent... (Score:2)
Re: This might be a tangent... (Score:1)
Why not ask ChatGPT to plain-ASCIIfy your posts for you?
Re: This might be a tangent... (Score:2)
It's worse than that. Slashdot's source code has had support for Unicode for more than 20 years. They even turned it on once. Then, like a week later, they turned it back off again and turned it back on. The reason was that they forgot to properly deal with direction-of-text markers, people figured that out, and the pages of comments got really messy and unreadable. So, instead of fixing their mistake, they said, "Fuck it," and went back to straight ASCII.
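For illustration, the kind of sanitizing step described as missing could look roughly like the sketch below. It is not Slashdot's code (the name `strip_bidi_controls` and the choice to drop the characters outright are assumptions); it simply removes the Unicode direction-of-text control characters from a comment before display, while leaving ordinary right-to-left letters alone.

```python
# Direction-of-text control characters defined by the Unicode
# Bidirectional Algorithm. Dropping them is one simple policy (an
# assumption here); a real site might instead balance or escape them.
BIDI_CONTROLS = frozenset({
    "\u061c",  # ARABIC LETTER MARK
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
})


def strip_bidi_controls(text: str) -> str:
    """Return text with every direction-of-text control character removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


# A comment containing a RIGHT-TO-LEFT OVERRIDE that would otherwise
# flip the display order of everything after it.
comment = "nice post \u202e and the rest of the page"
print(strip_bidi_controls(comment))
```

Hebrew or Arabic letters pass through untouched, because only the invisible control characters are in the set.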
Re: This might be a tangent... (Score:2)
*never turned it back on. Oops.
Re: (Score:2)
What Perplexity does is parse the article and, for example, create bullet points or a table from the core facts. That output doesn't contain the article's editorializing, just the facts. That's also why the publishers don't like it: it discards their framing and their (native) ads, extracting just the things the user wants.
If they would provide additional value people would click the source link, but in many cases the summary on the Perplexity page is what the user truly wanted (UX vs. UI
Re: (Score:2)
Re: (Score:2)
Perplexity is mostly:
- Ask the LLM if it can write an introduction
- Do a bing search
- Ask the LLM what webpages are relevant
- Fetch relevant pages
- Ask the LLM to write an answer
- Ask the LLM to attach sources to its claims
- Ask the LLM to write a conclusion
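The steps above can be sketched roughly as follows. This is only a guess at the shape of such a pipeline based on the list, not Perplexity's actual code: `answer_query`, `llm`, `search` and `fetch` are hypothetical names, and the three callables are passed in so that no real search or model API is implied.

```python
from typing import Callable, List


def answer_query(
    query: str,
    llm: Callable[[str], str],           # hypothetical: prompt in, text out
    search: Callable[[str], List[str]],  # hypothetical: query in, URLs out
    fetch: Callable[[str], str],         # hypothetical: URL in, page text out
) -> str:
    # 1. Ask the LLM for an introduction.
    intro = llm(f"Write a short introduction for the question: {query}")

    # 2. Run a web search.
    urls = search(query)

    # 3. Ask the LLM which result pages are relevant.
    relevant = [
        url
        for url in urls
        if llm(f"Is this page relevant to {query!r}? Answer yes or no: {url}")
        .strip()
        .lower()
        .startswith("yes")
    ]

    # 4. Fetch the relevant pages and number them for citation.
    pages = [(url, fetch(url)) for url in relevant]
    sources = "\n\n".join(
        f"[{n}] {url}\n{text}" for n, (url, text) in enumerate(pages, start=1)
    )

    # 5. Ask the LLM to write the answer from the fetched text.
    draft = llm(
        f"Using only the sources below, answer {query!r} in your own words.\n\n"
        f"{sources}"
    )

    # 6. Ask the LLM to attach sources to its claims.
    cited = llm(
        "Add [n] source markers to each claim in this answer.\n\n"
        f"Answer:\n{draft}\n\nSources:\n{sources}"
    )

    # 7. Ask the LLM to write a conclusion.
    conclusion = llm(f"Write a brief conclusion for this answer:\n{cited}")

    return "\n\n".join([intro, cited, conclusion])
```

Every fetched page lands in the prompt for step 5, so how close the output stays to the original wording depends entirely on what that prompt asks for.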
The answer contains a bit of information from every source but usually not verbatim. I'm also pretty sure that it is instructed not to use longer verbatim quotes, so it does not infringe copyright. On the other hand, it attributes the information to the
Re: (Score:2)
Bypassing a technological measure to violate copyright is a crime, even if those technical measures are easily bypassed. See [DMCA] 17 U.S.C. § 1201(a)
Re: (Score:3)
Bypassing a technological measure to violate copyright is a crime, even if those technical measures are easily bypassed. See [DMCA] 17 U.S.C. § 1201(a)
But if the text is already loaded and just hidden after the fact, which absolutely is the case at least some percentage of the time, you don't bypass anything in any reasonable sense of the term; the information is already there. Not to mention the free articles without a paywall that one can, and sometimes will, get to access.
Re: (Score:2)
Just because the NYT allowed someone to see an article for free does not mean, ipso facto, that *you* are entitled to see that article for free yourself. In particular copyright is not transitive.
Re: (Score:2)
Well I'm sure the fine article would have those details, except the entire article appears to be one line:
"This content is not available in your region"
Re: (Score:1)
Because it "damages its [NYT's] brand." I'm still trying to figure out how that's possible.
Re: (Score:2)
Easy. Let's say the NYT article says Trump is a thief. The AI reads the article, regurgitates it to one of its users with hallucinations, like Trump is the antichrist. Now the user thinks the NYT article said Trump is the antichrist because the AI said so, and goes postal with a shotgun.
Re: Is it just my Alzheimer's? (Score:1)
Did you just describe Fox News?
Re: Is it just my Alzheimer's? (Score:1)
Are you saying that if I ask AI what Tesla would have thought about new tsunami wave observations, it will hallucinate what the NY Times would say?
Re: (Score:2)
It's monkeys with typewriters,
All the way down.
Re: Is it just my Alzheimer's? (Score:1)
Is it inappropriate to suggest that so are stock market prices?
Re: (Score:2)
Re: Is it just my Alzheimer's? (Score:1)
Thus if stock markets, the canonical example of capitalism, have noisy prices, why shouldn't we treat inflation as noise and index it away?
Re: (Score:2)
However, the purpose of indexing it away is to normalize, due to the recognition that it's not the absolute prices that matter, only their relative relationships
Inflation is a flaw in the measuring instrument we're using (fiat currency), kind of like if you have a metallic ruler and you increase the ambient temperat
Re: (Score:2)
Currently they are still in the process of proving the IF. The WHY comes after they have proven that it actually does.
Re: (Score:2)
If AI is so unusable because all it does is hallucinate, as I've learned from slashdot, why is the NY Times concerned?
Because the idiot AI is attributing the hallucinations to NYT, damaging its reputation (regardless of what you think of it - they have a point).
Re: Is it just my Alzheimer's? (Score:1)
Can you think of human organizations that say the New York Times said things they didn't actually say?
"How the New York Times has published lies to serve a biased narrative" - NY Post headline
Re: Is it just my Alzheimer's? (Score:2)
These AI companies believe they're above the law, and they need to be put in their place, badly and devastatingly. If the same standards were applied to them that are applied to regular people who commit copyright infringement or slander, Perplexity & Co. should be some trillion bazillion dollars in debt by now.
Re: (Score:1)
You ask a stupid question. Try harder.
Re: Is it just my Alzheimer's? (Score:1)
Why not legalize suicide so I don't have to?
Re: (Score:2)
In civilized countries, suicide is quite legal. I am not surprised you apparently do not know that. Seriously, dude, think about maybe being less cringe and stupid?
Re: Is it just my Alzheimer's? (Score:1)
How come I was handcuffed to a hospital bed and involuntarily institutionalized after one attempt? How come my brother had to use the dark web to get the horse tranquilizer he took to end his mental suffering? Why not remove the police's authority to forcefully restrain me as a threat to myself, and let me legally purchase the same pills my brother had to act criminally to get?
Re: (Score:2)
How come I was handcuffed to a hospital bed and involuntarily institutionalized after one attempt?
Wrong question. The right one is "How come you did not go to prison?" because that is what happens when suicide is illegal.
Maybe stop lumping things together that are different?
Re: Is it just my Alzheimer's? (Score:1)
Why do you cringe? Perhaps your cringe reaction to me is as wrong as those who cringed so hard at Turing's homosexuality, they drove him to suicide?
If it's behind paywall how do they get it? (Score:1)
If the nytimes articles are truly hidden behind a paywall, how does perplexity get access to it?
Re: If it's behind paywall how do they get it? (Score:1)
They probably have multiple accounts with paid subscriptions connecting from IPs around the world scraping data 24/7. Setting that up would be trivial with the budgets AI companies have.
Re: (Score:1)
That may not be fair ball.
If they were just scraping whatever they find on the web, more-or-less at random, then that's one thing.
If they're taking positive steps to buy subscriptions for the sole purpose of scraping that content, that's something else.
Re: (Score:3)
I've often wondered this, until I came across this story on slashdot [slashdot.org] whereby authors of browser addons/extensions are approached to generate money from their hard work if they covertly add a javascript file similar to Mellowtel [github.com] which essentially spies on browser traffic/scrapes webpages [arstechnica.com] so each user unknowingly becomes a bot.
Sadly most addons are never monitored, and even if they have approval [mozilla.org], they can just as easily slip in the few lines of code to import the js script in the next minor release.
... critics say the monetization works by using the browser extensions to scrape websites on behalf of paying customers, which include AI startups, according to MellowTel founder Arsian Ali. Tuckner (security researcher) reached this conclusion after uncovering close ties between MellowTel and Olostep, a company that bills itself as "the world's most reliable and cost-effective Web scraping API." Olostep says its service "avoids all bot detection and can parallelize up to 100K requests in minutes." Paying customers submit the locations of browsers they want to access specific webpages. Olostep then uses its installed base of extension users to fulfill the request.
Re: (Score:2)
The paywall does not actually block access to the information: it just uses JavaScript to stop your browser from displaying the text, which it has already sent to your browser. To Perplexity, the JavaScript is just another block of text sent as a response to the wget request.
The NY Times could actually require a successful login to access data on their website -but that would prevent search index spiders from cataloging what they are offering. Then only their subscribers would see their content... and no
Re: (Score:1)
Some paywalls do indeed work this way, but nytimes is not one of them. It cuts the text off in the HTML, so the article text is not there to be read unless you have a subscription.
Re: (Score:2)
It cuts the text off in the html so the article text is not there to be read unless you have a subscription.
Yeah, cuts it off ... after all the text is already loaded. Been able to copy and paste article text to read at my leisure many times.
Re: (Score:2)
Nope. You can verify this yourself by looking at what gets loaded in the browser. The first page is loaded, then the bottom half of that page is cut off. The entire article itself is never sent to you unless you're logged in.
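One way to settle this kind of disagreement is to count how much article text is actually present in the HTML that was delivered, ignoring CSS and scripts the way a non-browser client would. The sketch below uses only Python's standard `html.parser`; the two sample documents are made up to illustrate the two paywall styles being argued about, and `delivered_word_count` is a hypothetical helper name, not anything from a real site.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text inside <p> elements, whether or not CSS hides them."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def delivered_word_count(html: str) -> int:
    """Count words that arrived inside <p> tags in the raw HTML."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


# Made-up sample: full text is sent, then hidden on the client side.
SOFT_PAYWALL = (
    "<article><p>First paragraph here.</p>"
    '<div class="gate" style="display:none">'
    "<p>Second paragraph hidden by script.</p></div></article>"
)

# Made-up sample: the server cuts the article off before sending it.
HARD_PAYWALL = (
    "<article><p>First paragraph here.</p>"
    '<div class="gate">Subscribe to continue.</div></article>'
)

print(delivered_word_count(SOFT_PAYWALL), delivered_word_count(HARD_PAYWALL))
```

If the count for a logged-out request is close to the full article length, the text was sent and merely hidden; if it stops after the teaser, the cut happened on the server.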
Re: (Score:2)
Re: (Score:2)
Re: If it's behind paywall how do they get it? (Score:1)
They want to drink a glass of water without getting their mouth wet.
Re: (Score:2)
Because NYT is not playing fair. They open the paywall to bots to get listed so you have unusable Google results (and a small percentage may sign up instead of clicking the next result) and AI scrapers are run from the same IP ranges as search engine bots. If they wanted a secure paywall every serious web developer would know how to build a secure login system. They actively invest work to build a semi-open paywall page.
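To make the "semi-open paywall" idea concrete, here is a minimal sketch of a server-side check that lets crawlers through based on source network and user agent. It is an assumption about how such a gate could be built, not a description of what the NYT runs: `serves_full_text`, the `examplebot` token, and the allowlisted range (a reserved documentation network) are all hypothetical.

```python
import ipaddress

# Placeholder allowlist: 192.0.2.0/24 is a reserved documentation range,
# standing in for whatever networks a site treats as search crawlers.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Placeholder user-agent token for an allowlisted crawler.
CRAWLER_AGENT_TOKENS = ("examplebot",)


def serves_full_text(remote_ip: str, user_agent: str, is_subscriber: bool) -> bool:
    """Decide whether a request gets the whole article or only the teaser."""
    if is_subscriber:
        return True
    addr = ipaddress.ip_address(remote_ip)
    from_crawler_range = any(addr in net for net in CRAWLER_NETWORKS)
    claims_crawler = any(tok in user_agent.lower() for tok in CRAWLER_AGENT_TOKENS)
    # Anything that arrives from an allowlisted range with a crawler-like
    # user agent is treated as a search bot, whatever it actually is.
    return from_crawler_range and claims_crawler
```

A check like this cannot tell a search indexer from any other client running in the same network range, which is the gap the comment is pointing at.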
The AI companies are going to kill themselves. (Score:1)
Where do the AI companies plan to get their data after they put all the news outlets and publishers of nonfiction out of business? Will there just be nothing written after 2030 in their results? Or will the AIs just hallucinate everything?
Re: The AI companies are going to kill themselves. (Score:1)
They can just recycle Twitter the way mainstream news does.
Re: (Score:2)
Why do you think they put them out of business? That was never the plan.
I must have missed those lawsuits (Score:2)
When did publishers ever sue TV or Radio makers?
I'm kind of surprised that MLB hasn't sued AI companies yet for reproducing descriptions of baseball games. Maybe they just aren't paying attention.
Re: (Score:2)
"Publishers have been suing new tech companies for a hundred years, starting with radio, TV, the internet, social media, and now AI," Jesse Dwyer, Perplexity's head of communications, told TechCrunch. "Fortunately it's never worked, or we'd all be talking about this by telegraph."
I also wondered about this, came across as a fairly good argument, so I checked with AI ;)
Initially it summarised each point, but when I probed it further, including checking its own sources, it came back with: -
* AP v. KVOS (1936): [historylink.org] The Associated Press sued KVOS, a Washington radio station, alleging unfair competition for reading newspaper stories verbatim on air; the U.S. Supreme Court ruled in the station's favor on a technicality, but the case catalyzed licensing arrangements between news services and r
Re: I must have missed those lawsuits (Score:2)
Did you confirm these cases, since AI is known to make up legal precedents and cite non-existent rulings?
Re: (Score:2)
err no, I verified all links and the reason for each case cited in each article before posting.
They're valid and AI accurately pointed them out - but it rightly added a caveat to the "Westmoreland v. CBS (1982–1985)" case as it's not a direct news publisher vs TV station, but rather the fact that the TV documentary made use of investigative reporting which resulted in a defamation case.
Someone doesn't have their brain engaged, but I'm fairly sure it's not me.
Re: (Score:2)
Did you confirm these cases
They are literally links to the cases. Jesus Christ I've met single celled organisms with more brains than you.
plagiarism anyone? (Score:2)
The irony of the NYTimes complaining about copyright, given its history of reporters found guilty of or complicit in plagiarism, is not lost on me.
Re: plagiarism anyone? (Score:1)
To be fair, their complaint is probably best translated as "Look at meeeeee!"
\o/ (Score:1)
We need to sue them too - the NYT have been kind enough to put up a wall to protect us from their recycled drivel and now Perplexity have covertly injected some of it into our minds. Wtf!?!
Shut up and DIE already! (Score:2)
I'm ready for New York Times to just go out of business, like so many other legacy platforms.
Bloody greedy fools abound (Score:2)
The NY Times seems to be unaware that the activity they cite is very much what they do to provide the articles they claim are in some way being stolen. It would be fun to see their reaction when their unpaid sources sue their sorry asterisks.
{^_^}
Never saw paywalled content (Score:2)
Fortunately Perplexity never used a paywalled source in my queries, unlike Google, which sometimes had three inaccessible links on the first page.
we are the only true font of truth (Score:2)
NYT doesn't like competition.
A message to publishers and innovators (Score:2)
This will slow down the progress but will give the atmosphere, the oceans and all the species ample time to recover. Agreed?
This will slow down the progress but will give social imbalance, political imbalance and racial imbalance enough time to adjust against concentrated wealth and power accumulation. Now agreed?