Protecting Your Content from AI: Webinar Takeaways
Artificial intelligence (AI) is advancing at breakneck speed, raising alarms among publishers about how their content could be used without permission. In a recent webinar, experts dug into the risks publishers face in the AI era, and what they can do to protect themselves.
The Problem: AI Scraping Content to Train Models
A major concern is that the companies behind AI models like ChatGPT are scraping online content without permission to train their systems. As one speaker explained, this could lead to lost traffic and revenue if people start getting their news directly from chatbots instead of going to the original publishers.
One webinar attendee, Teri from the OHT team, worried about “loss of revenue as it relates to a publisher’s content.” If AI models can absorb news articles and spit back answers to users’ questions, fewer people may subscribe to access that original reporting.
Technical Solutions: Paywalls, Robots.txt and More
So how can publishers restrict access to their full articles? The team discussed a multi-layered security approach:
Paywalls may seem an obvious solution, but the experts explained that they are not foolproof on their own.
Many paywalls still deliver the full article HTML to the user's browser and only hide it behind a login screen. Bots can scrape the content from the page source before the paywall kicks in.
Other paywalls deliver only an article excerpt at first. The full content lives "server side" and is not sent until the reader is authorized. This is much more secure against scraping.
The speakers showed live demos of testing different sites by inspecting page HTML. One robust paywall completely blocked full article text from appearing in the code.
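The kind of check shown in those demos can be roughly automated. The sketch below is illustrative only and is not the tool the speakers used: it assumes you already have the raw HTML a logged-out visitor receives and the full article text from your CMS, and the function name `exposed_fraction` is hypothetical. It reports what share of the article's sentences appear anywhere in the delivered page, including inside script tags, where some sites embed the full text as JSON.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every text node in the page, including script contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore layout."""
    return re.sub(r"\s+", " ", text).strip().lower()


def exposed_fraction(page_html, full_article_text):
    """Return the share of article sentences found in the delivered HTML.

    A value near 1.0 suggests the full article is sent to the browser
    before any paywall check; a low value suggests only an excerpt is.
    """
    parser = TextCollector()
    parser.feed(page_html)
    parser.close()
    page_text = normalize(" ".join(parser.chunks))

    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [
        normalize(s)
        for s in re.split(r"(?<=[.!?])\s+", full_article_text)
        if s.strip()
    ]
    if not sentences:
        return 0.0
    found = sum(1 for s in sentences if s in page_text)
    return found / len(sentences)
```

A sentence-level match is a deliberately crude heuristic. It can miss text that is split mid-word by markup or encoded in an unusual way, so a low score is a hint to look closer, not proof that a page is safe.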
A robots.txt file tells bots which pages they can and can't access. Publishers can use it to restrict scraper bots from crawling certain content while still allowing helpful bots like Google's.
OpenAI recently said its crawler will respect robots.txt. One big caveat is that compliance is voluntary, not legally enforceable.
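As a minimal sketch of what such rules look like and how a well-behaved crawler would read them, the example below parses a sample robots.txt with Python's standard `urllib.robotparser`. `GPTBot` is the user-agent token OpenAI has published for its crawler. The rules themselves and the `example.com` URL are placeholders, not a recommended policy for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Sample policy: block OpenAI's crawler everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network call."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = load_rules(ROBOTS_TXT)
article_url = "https://example.com/news/story.html"  # placeholder URL

print(rules.can_fetch("GPTBot", article_url))     # False
print(rules.can_fetch("Googlebot", article_url))  # True
```

This only shows what a compliant bot would conclude from the file. A scraper that ignores robots.txt is unaffected, which is why the speakers treated it as one layer among several.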
Additional technical layers can supplement paywalls and robots.txt, like requiring email registration for metered access.
Calls for Ethics Principles and Industry Advocacy
But technical fixes only go so far. The core problem remains a lack of standards around how publishers should be compensated and credited when their work is used to develop AI.
Panelists said press associations may need to get involved, lobbying for the industry and establishing ethical principles around AI development. Groups like the News Media Alliance are starting to put forward guidelines.
Government leaders are also scrutinizing these issues, but legislation tends to lag behind the pace of tech innovation.
Emerging Models for Content Protection
Adobe’s new AI image generator Firefly offers one potential model for making AI work for content creators.
Firefly’s system only draws on images that artists have already licensed through Adobe Stock. And it tracks each image used so artists earn royalties when their work trains the AI.
This shows a pathway where content creators get paid for their vital contributions to AI, sharing in the value it generates.
What Publishers Can Do Now
So amid all the uncertainty, what should publishers be doing in the near term? Here are some key takeaways:
- Audit paywalls and website code for vulnerabilities where scrapers could access full articles.
- Implement a restrictive robots.txt file and opt out of data collection where possible.
- Explore emerging protections like digital watermarking of articles.
- Develop your own AI tools tailored for news organizations. Fight fire with fire.
- Get involved with industry groups and advocate for standards that protect publishers.
The team emphasized that publishers need to be proactive in this new landscape. While risks exist, AI ultimately presents new opportunities to engage users and maintain the vital role of original reporting.
Balancing these competing priorities will only grow more complex. As one panelist concluded, paying attention and taking action today helps ensure publishers don’t get left behind tomorrow.