Our Hometown Blog

Protecting Your Content from AI: Webinar Takeaways

September 22, 2023
/Christopher Winders
/AI, Latest from Our Hometown, Paywalls, Virtual Conference Replays

Artificial intelligence (AI) is advancing at breakneck speed, raising alarms among publishers about how their content could be used without permission. In a recent webinar, experts dug into the risks publishers face in the AI era, and what they can do to protect themselves.

The Problem: AI Scraping Content to Train Models

A major concern is that AI models like ChatGPT are scraping online content without permission to train their systems. As one speaker explained, this could lead to lost traffic and revenue if people start getting their news directly from chatbots instead of going to the original publishers.

One webinar attendee, Teri from the OHT team, worried about “loss of revenue as it relates to a publisher’s content.” If AI models can absorb news articles and spit back answers to users’ questions, fewer people may subscribe to access that original reporting.

Technical Solutions: Paywalls, Robots.txt and More

So how can publishers restrict access to their full articles? The team discussed a multi-layered security approach:

Paywalls

Paywalls may seem like an obvious solution, but the experts explained that they are not foolproof on their own.

Many paywalls still deliver the full article HTML to the user’s browser and merely hide it behind a login prompt or overlay. Bots read the page source rather than the rendered screen, so they can scrape that content regardless of what a visitor sees.

But some paywalls only deliver article excerpts to users initially. The full content lives “server side” and is never exposed. This is much more secure against scraping.
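
As a rough illustration of the server-side approach, the sketch below decides on the server what text goes into the page. The names Article, make_excerpt and render_article, and the 40-word teaser length, are hypothetical and not taken from any particular CMS or paywall plugin.

```python
import html
from dataclasses import dataclass


@dataclass
class Article:
    headline: str
    body: str  # full text, stored only on the server


def make_excerpt(body: str, max_words: int = 40) -> str:
    """Return the first max_words words of the body as a teaser."""
    words = body.split()
    if len(words) <= max_words:
        return body
    return " ".join(words[:max_words]) + "..."


def render_article(article: Article, is_subscriber: bool) -> str:
    """Build the HTML that is sent to the browser.

    Visitors who are not logged-in subscribers (which includes bots)
    receive only the teaser, so the full body never leaves the server.
    """
    content = article.body if is_subscriber else make_excerpt(article.body)
    return (
        "<article>"
        f"<h1>{html.escape(article.headline)}</h1>"
        f"<p>{html.escape(content)}</p>"
        "</article>"
    )
```

Because the decision is made before any HTML is produced, there is nothing hidden in the page for a scraper to recover.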

The speakers showed live demos of testing different sites by inspecting page HTML. One robust paywall completely blocked full article text from appearing in the code.
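
The same check can be approximated in a few lines of code. This is a minimal sketch, assuming you have saved the page source as seen by a logged-out visitor and have a sentence that should only be visible to subscribers; the function name leaks_paywalled_text is made up for this example.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect every text node, ignoring tags and styling, much as a bot would."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def leaks_paywalled_text(page_html: str, paywalled_sentence: str) -> bool:
    """Return True if the sentence appears anywhere in the delivered HTML."""
    parser = _TextExtractor()
    parser.feed(page_html)
    parser.close()
    page_text = _normalize("".join(parser.chunks))
    return _normalize(paywalled_sentence) in page_text
```

A result of True means the full article reached the browser even though the paywall covered it on screen. Using the last sentence of the article, as suggested in the webinar, is a good test because a teaser will almost never include it.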

Robots.txt

This file tells bots which pages they can and can’t access. Publishers can use it to restrict scraper bots from crawling certain content while still allowing helpful bots like Google.

OpenAI recently said its crawler, GPTBot, will respect robots.txt. The big caveat is that compliance is voluntary, not legally enforceable.
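
To make this concrete, here is a small sketch that uses Python’s standard urllib.robotparser module to check what a given set of rules would permit. The rules and the example.com URL are placeholders for illustration, not a recommended configuration; GPTBot and CCBot are the user-agent names mentioned in the webinar.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block two AI-related crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, allowed(ROBOTS_TXT, "GPTBot", "https://example.com/news/story.html") is False while the same call for "Googlebot" is True. Keep in mind that this only describes what a well-behaved bot will do; nothing in the file technically stops a crawler that ignores it.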

Other Layers

Additional technical layers can supplement paywalls and robots.txt, like requiring email registration for metered access.
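
One way to picture metered access tied to registration is the sketch below. The Meter class, the allowance of three articles, and keying readers by email address are all assumptions made for illustration; a real system would also need persistence, a reset period, and email verification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Meter:
    free_articles: int = 3  # hypothetical allowance per registered reader
    views: Dict[str, Set[str]] = field(default_factory=dict)

    def may_read(self, email: Optional[str], article_id: str) -> bool:
        """Decide whether to serve the full article for this request."""
        if not email:
            return False  # anonymous visitors and bots get the teaser only
        seen = self.views.setdefault(email.strip().lower(), set())
        if article_id in seen:
            return True  # re-reading does not use up the allowance
        if len(seen) >= self.free_articles:
            return False  # allowance used up, prompt for a subscription
        seen.add(article_id)
        return True
```

The point of the design is that an unregistered request never receives the full text, so a scraper that simply shows up at the URL sees the same teaser as any other logged-out visitor.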

Calls for Ethics Principles and Industry Advocacy

But technical fixes only go so far. The core problem remains a lack of standards around how publishers should be compensated and credited when their work is used to develop AI.

Panelists said press associations may need to get involved, lobbying for the industry and establishing ethical principles around AI development. Groups like the News Media Alliance are starting to put forward guidelines.

Government leaders are also scrutinizing these issues, but legislation tends to lag behind the pace of tech innovation.

Emerging Models for Content Protection

Adobe’s new AI image generator Firefly offers one potential model for making AI work for content creators.

Adobe says Firefly is trained on content it has the rights to use, such as images licensed through Adobe Stock, and it has announced a compensation plan for Stock contributors whose work is used to train the AI.

This shows a pathway where content creators get paid for their vital contributions to AI, sharing in the value it generates.

What Publishers Can Do Now

So amid all the uncertainty, what should publishers be doing in the near-term? Here are some key takeaways:

Audit paywalls and website code for vulnerabilities where scrapers could access full articles.
Implement a restrictive robots.txt file and opt out of data collection where possible.
Explore emerging protections like digital watermarking of articles.
Develop your own AI tools tailored for news organizations. Fight fire with fire.
Get involved with industry groups and advocate for standards that protect publishers.

The team emphasized publishers need to get proactive in this new landscape. While risks exist, AI ultimately presents new opportunities to engage users and maintain the vital role of original reporting.

Balancing these competing priorities will only grow more complex. As one panelist concluded, paying attention and taking action today helps ensure publishers don’t get left behind tomorrow.

Full Transcript

00:00 –> 00:10
[MUSIC]

00:10 –> 00:12
>> Hey, it’s a high five.

00:12 –> 00:13
How’s it going, Cliff?

00:13 –> 00:17
Welcome, and Sabrina from Main Street,

00:17 –> 00:23
Tennessee, relatively new Our-Hometown customer, welcome.

00:25 –> 00:30
We’re going to probably just start right on time here.

00:30 –> 00:33
We’re expecting a pretty intimate group.

00:33 –> 00:37
This is like a niche subject,

00:37 –> 00:39
more so than I expected.

00:39 –> 00:44
But I’ve got Christopher on the call with me.

00:44 –> 00:48
Anyone at Our-Hometown probably knows Christopher.

00:48 –> 00:51
He’s been with the company how many years now?

00:51 –> 00:54
>> I think it’s pushing a decade.

00:54 –> 00:56
>> Yeah, a decade.

00:56 –> 00:59
Just pretty much like we started around the same time.

00:59 –> 01:03
You were definitely here before me even.

01:03 –> 01:09
Christopher just does a little bit of everything,

01:09 –> 01:17
and lately he’s been spending a lot of time researching the industry impacts of AI.

01:17 –> 01:22
Anyone on our newsletter list probably sees us talking about this a lot.

01:22 –> 01:26
We really felt like this topic today was an important one.

01:26 –> 01:31
We’ve been talking about all the benefits of AI,

01:31 –> 01:35
but this is like the downside that you got to watch out for.

01:35 –> 01:40
This is all stuff that we’re learning about all the time,

01:40 –> 01:46
and we’re open to ideas on how to approach

01:46 –> 01:50
these issues and questions at any time.

01:50 –> 01:54
I think we’ve got definitely a good group to get started with.

01:54 –> 01:57
If everyone can locate the chat and just give us a quick hello,

01:57 –> 02:01
we’ll use the webinar chat to go back and forth,

02:01 –> 02:06
just so that I know that you can all hear me and you know where the chat is.

02:06 –> 02:09
Just quick hello, your name,

02:09 –> 02:11
maybe in the paper you’re with.

02:11 –> 02:14
That would be awesome.

02:15 –> 02:21
Christopher is from Our-Hometown, Rochester.

02:21 –> 02:25
I grew up there, but I’m out in Salt Lake City now,

02:25 –> 02:31
so we got pretty wide ranging attendance.

02:31 –> 02:36
Terry, welcome from Brockport, New York. Awesome.

02:36 –> 02:42
Right. I didn’t see you on the registration list, Cliff.

02:42 –> 02:43
Great to have you.

02:43 –> 02:48
Cliff from Ohio, the Clintonville Spotlight, been with us.

02:48 –> 02:51
Probably six or seven years now, I think.

02:51 –> 02:53
I remember bringing you on board.

02:53 –> 02:56
Great. Hello, everyone.

02:56 –> 03:01
If you’d like to ask a question, 2017.

03:01 –> 03:03
Yeah, right about six years.

03:03 –> 03:07
If you’d like to ask a question or join the conversation at any point,

03:07 –> 03:11
don’t hesitate to just say so in the chat.

03:11 –> 03:16
We can invite people to talk if you want to just talk out your question.

03:16 –> 03:19
This is a casual format.

03:19 –> 03:24
We’ve got Andrea from the Chicago Daily Law Bulletin.

03:24 –> 03:28
She typed over in Q&A, but I wanted to shout her out.

03:28 –> 03:32
Awesome. Welcome from Chicago. Very cool.

03:32 –> 03:36
We’d love to hear your interest in this specifically.

03:36 –> 03:40
We’ll get to that. We’ll have some poll questions in a minute.

03:40 –> 03:44
All right. Chris, you’ve got your screen shared.

03:44 –> 03:47
Could you jump over to?

03:47 –> 03:50
Whoa. Something just went wacky with my screen there.

03:50 –> 03:52
Oh, that was me.

03:52 –> 03:54
Oh, okay.

03:54 –> 04:01
Because I’ve got the Zoom on my vertical screen, so it’s kind of wacky.

04:01 –> 04:06
But anyway, just some quick background for folks that don’t know about us

04:06 –> 04:11
and just how we’re, you know, our interest in this topic is, you know,

04:11 –> 04:16
because we work with local newspapers, weeklies, a few dailies,

04:16 –> 04:19
but, you know, just small community newspapers.

04:19 –> 04:21
We help them with their websites.

04:21 –> 04:25
We’ve been doing this for 27 years now.

04:25 –> 04:27
It’s all based on WordPress.

04:27 –> 04:32
And we’ll talk a little bit about, you know, open source WordPress

04:32 –> 04:36
and the way that things work on that system.

04:36 –> 04:42
But we also really want to look at everyone else’s CMS and help you,

04:42 –> 04:46
you know, kind of work through some of these questions

04:46 –> 04:50
that we’re going to bring up just in terms of how secure your content is.

04:50 –> 04:54
We can talk very extensively about WordPress, though.

04:54 –> 04:57
And then, yeah, just in general, just so you all know,

04:57 –> 04:59
we’ve kind of got an evolving feature set.

04:59 –> 05:02
So, you know, if anyone’s ever looking for help with your website,

05:02 –> 05:05
we’d love to talk to you.

05:05 –> 05:08
But we can talk about that more later.

05:08 –> 05:11
For the too long didn’t read.

05:11 –> 05:14
Let’s just kind of set the stage again.

05:14 –> 05:19
Like you’ve all been hearing about the AI opportunities.

05:19 –> 05:21
But, you know, there is this issue of AI

05:21 –> 05:24
scraping your content without permission.

05:24 –> 05:27
They’re doing this to train their models.

05:27 –> 05:30
They’ve been doing it for years.

05:30 –> 05:35
But now, you know, that they’re public and we kind of know what they’ve been doing.

05:35 –> 05:41
We can kind of adjust our strategy and we’ll talk about some of the technicalities behind that.

05:41 –> 05:48
But, you know, it’s just at a high level, they are kind of stealing this information.

05:48 –> 05:55
They’re not necessarily directly serving up the articles as they appeared on your site.

05:55 –> 06:01
But they are ingesting the knowledge and then they can be queried, you know,

06:01 –> 06:06
to basically tell users about your news in theory.

06:06 –> 06:11
So like they could get a news report based on your articles, even though it’s not,

06:11 –> 06:15
you know, copying the article content exactly.

06:15 –> 06:16
That’s the issue.

06:16 –> 06:24
So I guess the big question for everyone on the call is how concerned are you that AI

06:24 –> 06:28
models are scraping your content without permission?

06:28 –> 06:31
And what about it concerns you?

06:31 –> 06:38
Is it, you know, the fact that, you know, you’d be maybe losing traffic to these things,

06:38 –> 06:45
these AI engines, if people start asking them questions, what is it about it that concerns you?

06:45 –> 06:52
And if we could get answers in the chat box to everyone, if you could send it to everyone,

06:52 –> 07:01
then we could all just see, you know, what brought you to this webinar, I think is the core to this question.

07:01 –> 07:07
And I know folks are probably typing their answers and they can, you know,

07:07 –> 07:10
we can definitely come back to this throughout.

07:10 –> 07:18
But, you know, I think it’s just becoming a bigger, bigger question that, okay,

07:18 –> 07:23
so Terry says loss of revenue as it relates to my content.

07:23 –> 07:31
So you’re you’re concerned about, yeah, basically less traffic, less subscribers.

07:31 –> 07:41
If, you know, these models are somehow absorbing all the news and then able to feed it back to their users, then you’ll get less users.

07:41 –> 07:51
Yeah. I mean, this is kind of like the idea that we always had behind Google and the way that we’ve always looked at and kind of dealt with Google bots,

07:51 –> 07:59
which is to use a robots.txt file to tell it where, you know, it can get the teaser for the article and the headline,

07:59 –> 08:04
but we don’t give Google the whole article for the exact same reason.

08:04 –> 08:10
So really, if your site is safe from Google and you may not know if it is, we can talk about how to find out.

08:10 –> 08:17
But if it’s safe from Google bots and you’re just protecting, you’re just giving away the teaser to them,

08:17 –> 08:25
then that’s all that you’re giving to the AI models because they have essentially the same level of access.

08:25 –> 08:28
Correct. Yeah. Just am I saying that right?

08:28 –> 08:34
Yes, I was actually going to point out I came across another concern that that I know publishers might have.

08:34 –> 08:43
And that is your ranking because Google recently made an update where they call it the helpful content update.

08:43 –> 08:48
And one of the things they changed is it’s just helpful content for people.

08:48 –> 08:52
It used to be helpful content by people, for people.

08:52 –> 09:00
So what that means now is that AI generated content does not get penalized as long as it’s curated and all that,

09:00 –> 09:08
all that kind of stuff, and it’s not duplicated. But so you might end up with AI generated stories ranking higher than your content.

09:08 –> 09:13
Right. On the search engine results. Right.

09:13 –> 09:22
Right. Yeah, exactly. And it’s like extra competition, you know, for for the for that spot for that search results.

09:22 –> 09:26
Right. Yeah, that’s kind of like a really important point.

09:26 –> 09:34
The fact that Google doesn’t discriminate AI written content because they know it’s going to just become the norm.

09:34 –> 09:41
But, you know, what it boils down to is getting credit for your work and being paid for your content.

09:41 –> 09:49
Just like Terry’s saying here. So, yeah, let’s continue to look at the problem here.

09:49 –> 09:54
Let’s continue to unfold this a little bit. So true or false question for the audience.

09:54 –> 10:01
And just if you can drop it in the chat or the Q&A, either box is fine.

10:01 –> 10:06
A paywall protects my content from AI scraping. That’s the statement.

10:06 –> 10:13
Any type of paywall, metered paywall, as long as you have some type of paywall, you’re not going to get scraped.

10:13 –> 10:20
What do you all think? Unfortunately, I don’t have the official poll ready to go.

10:20 –> 10:29
So all we have to work with is the chat box. Terry, can you I think you were able to.

10:29 –> 10:31
OK, good. We got some feedback from Terry.

10:31 –> 10:42
Just want to make sure everyone’s still got audio. False. OK.

10:42 –> 10:48
Well, Cliff’s got a false one, too. Yes. So we’ve got some savvy customers here. Right.

10:48 –> 10:52
I don’t want to give it away just yet, but yeah.

10:52 –> 11:02
So basically, the way I said it, maybe I gave it away a little bit. The answer is false.

11:02 –> 11:07
Any type of paywall will not necessarily protect your content.

11:07 –> 11:11
You really have to. It comes down to the source code, really.

11:11 –> 11:22
Like the user interface is not a good predictor because even if the story, even if it’s a hard paywall without a meter,

11:22 –> 11:31
and the story’s blurred out, it all depends on how your CMS puts together that source code for the page.

11:31 –> 11:38
Because we’ve come across many examples where the full article is in the source code,

11:38 –> 11:45
even though it’s hidden by some CSS on the actual screen.

11:45 –> 11:49
The full article is there, which means the bots can get it. And that’s the bottom line.

11:49 –> 11:55
So this is a perfect example of what Chris has shown. These are some screenshots I just took from one of our customers.

11:55 –> 12:01
It’s probably you probably can’t read that, but that is a picture of the source code.

12:01 –> 12:09
It’s a section from our WordPress CMS. It’s called All in One SEO.

12:09 –> 12:18
It manages what the search engines see, basically, and it gives them what they need to index the site very directly.

12:18 –> 12:24
So if you look, basically, this is what the user sees, right? And it’s the teaser.

12:24 –> 12:29
And a lot of people are familiar with this interface. This is the standard now for paywalls, right?

12:29 –> 12:39
To have a teaser on the front end. But if I right click and inspect the HTML for this page,

12:39 –> 12:47
that’s all that you got from the article. You can search this whole source code and there’s no more strings from the article.

12:47 –> 12:56
So you take and a good way to test this for yourself is to maybe take the last sentence of the article that’s behind the paywall

12:56 –> 13:02
and search for it in the source code of the page that’s behind the paywall.

13:02 –> 13:07
OK, so you can’t be logged in as a user because that’s what the bots are seeing.

13:07 –> 13:13
They’re just showing up at the site. They’re hitting the paywall. Yes, but they’re really looking at the source code.

13:13 –> 13:18
They don’t see the paywall like humans do. You know, they’re they’re just all about the code.

13:18 –> 13:28
So that’s just something to recognize. And I thought it might be interesting to go through some of our attendees sites,

13:28 –> 13:32
anyone that wants to have their site tested. We can do this right now.

13:32 –> 13:43
We can just do a quick audit and show you basically what information you’re giving to the bots that you may not be aware of.

13:43 –> 13:47
Cliff, I can tell you your site’s on WordPress with Our-Hometown.

13:47 –> 13:53
So all of our sites work the same. Sabrina, same thing for you.

13:53 –> 13:59
But let’s see this one here, this example on the left, we were looking at earlier.

13:59 –> 14:06
This is from a paper, I think, in Illinois.

14:06 –> 14:10
I don’t think they’re on the call with us as an attendee. They did register.

14:10 –> 14:14
So we kind of looked at it ahead of time. But tell us what we’re looking at here, Chris.

14:14 –> 14:19
This is basically two different types of paywalls. Right.

14:19 –> 14:28
Yeah, the one on the left is a simple overlay. And it may be a little hard to tell, but you can see there’s kind of a square behind the overlay.

14:28 –> 14:37
And that’s the slideshow for a sports story. And what this is telling us is even though we’re being prompted to log in and or pay,

14:37 –> 14:45
you can tell that the entire story has already been loaded into the user’s browser in the background,

14:45 –> 14:50
which basically means the paywall hasn’t blocked anything.

14:50 –> 14:59
Everything’s already been delivered. And now you just have this little thing over here that that is not too hard to to get around.

14:59 –> 15:06
I’m going to do a quick swap here so I can demonstrate the live demo on this site.

15:06 –> 15:10
So, yeah, this is I’m going to open up the site in Google Chrome.

15:10 –> 15:16
Yeah. OK. And actually, I’ll even go back so you can see what it looks like.

15:16 –> 15:20
So this was just this is the Iroquois County Times Republic.

15:20 –> 15:25
Go to their sports wrap. Actually, there was one that had more photos in here.

15:25 –> 15:29
Yeah, this is the one has 14 photos in it. OK.

15:29 –> 15:35
And you see you could even see the content pop up before the overlay did.

15:35 –> 15:38
Right. The contents already been there. Yeah. Yeah.

15:38 –> 15:43
And so I found this simple extension that turns on the reader view,

15:43 –> 15:46
which basically gets rid of all the extra styling and just gives you the content.

15:46 –> 15:51
And since it’s already been delivered to the browser with one click. Yeah.

15:51 –> 15:59
Bye-bye, paywall. Right. Here’s every picture and everything that was behind the paywall.

15:59 –> 16:05
Yeah, basically that reader, whatever that extension you you got there,

16:05 –> 16:09
that’s allowing you to see it like the bots do in a way.

16:09 –> 16:15
It’s like I can ignore all the styling because, you know, they can just put that out.

16:15 –> 16:23
So, yeah, that’s that’s a very good demonstration of not only how a reader could get around

16:23 –> 16:29
the paywall very easily, just with some HTML knowledge or knowledge of this extension,

16:29 –> 16:32
but definitely how all these bots are getting around it.

16:32 –> 16:41
So that that would contrast to the Herald, which is it’s displaying exactly what is in the code.

16:41 –> 16:47
There’s there’s no extra content in the, as you said, loaded into the user’s browser.

16:47 –> 16:50
The only on the Herald exactly what’s there.

16:50 –> 16:56
If you if you save it or download it or go into the source code and copy out what’s been delivered.

16:56 –> 17:02
It’s exactly what you see. There’s no no background download of the full content or anything.

17:02 –> 17:08
It’s only getting just the excerpt and then it’s prompting for the subscription or login.

17:08 –> 17:16
Right. Exactly. OK, so we can move on to the robot. Oh, yeah, sorry. Sure.

17:16 –> 17:22
Oh, yeah. No, no, no problem. Just if you have any questions on that

17:22 –> 17:28
or if you want us to look at the source code and just do a quick analysis on your site,

17:28 –> 17:31
we could also do that at the end of the webinar.

17:31 –> 17:35
You’ll just post a link in chat. We can we can take a peek. Exactly.

17:35 –> 17:38
Yeah, we’ll show you our process. It’s pretty straightforward.

17:38 –> 17:42
You could do it yourself, but we’re happy to help.

17:42 –> 17:48
So now, yeah, we jumped ahead. OK, so what would be the next thing?

17:48 –> 17:53
I guess the true or false. You want to do the next poll here? Yeah, why don’t we?

17:53 –> 17:58
Well, actually, how about this? How about we go back? Maybe go back to that news.

17:58 –> 18:09
This guy here, right? Right. Yeah. So we so people have been.

18:09 –> 18:16
Basically, OpenAI is one of the big GPT founders,

18:16 –> 18:21
and they recently released information on their own bot

18:21 –> 18:29
and also gave instructions on how you can opt out of being scraped by it by using by

18:29 –> 18:35
calling out the user agent, which is it’s called GPTBot, I think is what it’s called.

18:35 –> 18:39
And and then you can just block it.

18:39 –> 18:42
And so the top, I don’t know, 500 or so sites in the world,

18:42 –> 18:45
I think about 20 percent of them by now have already started doing that.

18:46 –> 18:48
Well, let’s ask the audience. That’s a great question.

18:48 –> 18:52
I didn’t think of that. Is anyone on the call that as far as you know,

18:52 –> 18:57
have a robots.txt file that blocks GPTBot?

18:57 –> 19:02
I’d love to hear

19:02 –> 19:05
because this is like the next level.

19:05 –> 19:09
I mean, you got paywalls to think about.

19:09 –> 19:14
But we’ve already pointed out some of the weaknesses of certain types of paywalls.

19:14 –> 19:18
So the next right step in defense is the robots.txt.

19:18 –> 19:24
Exactly. And the news here is that, like you said,

19:24 –> 19:30
OpenAI has kind of agreed to respect this.

19:30 –> 19:37
So any site that has it on there will not be incorporated into their language model.

19:37 –> 19:38
Is that the bottom line?

19:38 –> 19:42
Their content will not be part of the training.

19:43 –> 19:46
Right. Exactly. Exactly.

19:46 –> 19:50
And to go to Cliff’s point, even if you don’t have a subscription model,

19:50 –> 19:53
it could be a good idea to have the robots.txt

19:53 –> 19:57
because then, you know, normal people will interact with your content as normal.

19:57 –> 20:02
But then you can also just exclude these scraper guys from either

20:02 –> 20:06
making too many requests and slowing your site down or utilizing your content

20:06 –> 20:09
in a way that you wouldn’t want.

20:09 –> 20:13
Yeah, I’m looking at this comment, too.

20:14 –> 20:16
And it looks like everyone can read that.

20:16 –> 20:17
So that’s good.

20:17 –> 20:23
Now, you’re not worried about the content being stolen,

20:23 –> 20:28
although it feels as if there should be standardized rules about including sources.

20:28 –> 20:31
Right. Yeah. That’s the Terry’s point.

20:31 –> 20:37
It’s kind of like, yeah, as long as I get some of that traffic back,

20:37 –> 20:42
you know, you know, for them to read more about the article that

20:42 –> 20:46
is being referenced in the chat with the bar or whatever.

20:46 –> 20:49
Yeah, that’s that’s the Bing’s model.

20:49 –> 20:53
So they are kind of leading the way there.

20:53 –> 20:59
So now this text is this is an example of the robots.txt,

20:59 –> 21:01
just the code.

21:01 –> 21:05
Yeah, this is a file just and it’s called robots.txt.

21:05 –> 21:08
It’s just a simple text file that sits in the very bottom root

21:08 –> 21:11
or the very top root, I guess you’d say of your website.

21:11 –> 21:16
You can submit it to Google and then also it’ll just live on your website.

21:16 –> 21:23
And any bots that are upstanding will respect these rules.

21:23 –> 21:27
And you can even see how granular you can get where you can allow certain

21:27 –> 21:32
portions of your content, i.e. categories, things like that,

21:32 –> 21:36
or which is what the first one is doing, saying that any user agent

21:36 –> 21:38
can go to see our crosswords.

21:38 –> 21:43
But as you see the two below that, CCBot, that one’s the Common Crawl bot.

21:43 –> 21:46
Right. And that crawls everybody.

21:46 –> 21:49
And that was definitely has been used a lot.

21:49 –> 21:51
And then the GPTBot.

21:51 –> 21:56
And so basically disallow slash means you have no you we don’t want you

21:56 –> 21:58
to look at anything on the site at all.

21:58 –> 22:02
Same thing with the Internet Archive archiver.

22:02 –> 22:03
I think that’s the next one.

22:03 –> 22:07
And then with Twitterbot, you can you can allow certain things through.

22:07 –> 22:14
So what the robots.txt allows you to do is is kind of drive the bots

22:14 –> 22:18
to where you want them to have access to stuff and then keep their hands

22:18 –> 22:21
off of stuff that is more proprietary or shouldn’t be out there.

22:21 –> 22:23
You know, right. Right. Right.

22:23 –> 22:27
So you can keep the SEO things going and all that kind of stuff by not blocking

22:27 –> 22:31
stuff that is actually good for the bots to ingest.

22:31 –> 22:34
Right. Right. Interesting.

22:34 –> 22:38
OK, yeah. So I think

22:38 –> 22:41
there doesn’t seem to be

22:41 –> 22:47
much concern from the audience on this.

22:47 –> 22:55
But yeah, I think this is just like I said, it’s it’s similar to just blocking Google.

22:55 –> 22:58
I mean, really, like years ago, even a year ago,

22:58 –> 23:01
I would have thought that Google was going to be the first one to do this

23:01 –> 23:06
because they’ve been for years taking snippets from content.

23:06 –> 23:10
You know, like you Google the definition of a word.

23:10 –> 23:14
They’re no longer sending traffic to Dictionary.com or Webster’s.

23:14 –> 23:20
It’s all just being answered right in the Google search results page.

23:20 –> 23:21
It’s just right there.

23:21 –> 23:24
So like that’s where I thought they were going.

23:24 –> 23:29
And that was kind of the logic behind blocking all but teasers.

23:29 –> 23:34
You know, all the all everything except what they absolutely need to,

23:34 –> 23:38
you know, index our content as news.

23:38 –> 23:42
You know, so this brings us to the true or false question.

23:42 –> 23:47
A robots.txt file legally protects your content from bot scraping

23:47 –> 23:50
and incorporation into LLMs.

23:50 –> 23:58
So based on this is kind of kind of hinted at this with my language a little bit.

23:58 –> 24:00
I may have given away a little bit.

24:00 –> 24:00
Right.

24:00 –> 24:05
Yeah, specifically referring to like the recent news with Sam Altman

24:05 –> 24:11
and saying that, you know, their bots will respect robots.txt.

24:11 –> 24:16
We’re getting false, mostly falses, which is correct.

24:16 –> 24:18
It’s not a legal thing yet.

24:18 –> 24:20
We are right on the bleeding edge of this stuff.

24:20 –> 24:27
This is just, I mean, them kind of saying out of respect for

24:27 –> 24:31
you know, content creators will do this.

24:31 –> 24:36
But this is so far from being the law, right, Chris?

24:36 –> 24:40
I mean, oh, yeah, they don’t really know what to do with a lot of this stuff.

24:40 –> 24:45
They’re they’re having Senate hearings where they’re asking tech leaders what to do.

24:45 –> 24:51
But it’s it’s a weird situation like the tech is kind of leading the way.

24:51 –> 24:57
As we can see here, they they they sort of volunteered to do this.

24:57 –> 25:04
And this is, of course, after OpenAI was founded like 2015.

25:04 –> 25:07
And we’re at GPT-4.

25:07 –> 25:10
They’ve already gone through 2, 3 and 3.5.

25:10 –> 25:14
And they said nothing.

25:14 –> 25:19
So nobody knew what I mean, these data sets were out there, but we didn’t.

25:19 –> 25:19
Nobody’s known.

25:19 –> 25:23
And now we’ve gotten all the way through GPT-4 eight years later.

25:23 –> 25:25
And now they’re like, oh, yeah, here you go.

25:25 –> 25:28
Maybe now we’ll give you a choice.

25:28 –> 25:31
Right. Yeah. Yeah.

25:31 –> 25:35
I mean, the way I look at it and, you know, this isn’t really

25:35 –> 25:39
an official.

25:39 –> 25:43
Position yet, but I mean, just as a company,

25:43 –> 25:50
we look at it like you should do everything you can to lock this stuff down,

25:50 –> 25:57
because, you know, we want to be able to create our own AI interface.

25:57 –> 26:01
We’re this is something that we’re working on at Our-Hometown, our own chatbots,

26:01 –> 26:05
because AI is just going to sweep through everything, it seems.

26:05 –> 26:09
It’s like we’ve known that it’s going to do that for years.

26:09 –> 26:11
And now that it’s here,

26:11 –> 26:16
you know, it’s kind of like the way I see it is we want to fight fire with fire.

26:16 –> 26:19
I mean, they’re they’re going to get a lot of stuff.

26:19 –> 26:22
Like they could get stuff about your town from other publications

26:22 –> 26:24
that aren’t following these standards.

26:24 –> 26:29
But if you are doing good journalism

26:29 –> 26:33
and then you have some, you know, AI assistance on the site,

26:33 –> 26:39
then people aren’t going to query just ChatGPT for local news.

26:39 –> 26:40
They’ll go to your chatbot.

26:40 –> 26:45
It’s a little bit of a tangent, but like that’s that’s just kind of at a high level

26:45 –> 26:50
what I’m trying to position our publishers as.

26:50 –> 26:54
So should we talk about SEO?

26:54 –> 27:01
Yes, because this is a big question from actually the Wilson County News.

27:01 –> 27:08
We had really just on the nose question, like in very timely, because

27:08 –> 27:11
basically Google was announcing that they’re going to.

27:12 –> 27:19
I think the big announcement was they’re going to start incorporating AI into their

27:19 –> 27:24
basically content or results.

27:24 –> 27:29
But yes, the enhanced search results are going to have like some AI blurb stuff

27:29 –> 27:33
alongside a couple of links to sources.

27:33 –> 27:39
And then the normal search engine order starts below that header.

27:39 –> 27:42
Right, right.

27:42 –> 27:46
Yeah, so I guess the point I wanted to just start off with here is that

27:46 –> 27:52
Google has basically been stealing content for years from many publishers

27:52 –> 27:57
unless you have a server-side paywall like the one we demonstrated earlier

27:57 –> 28:00
that only gives the teaser away.

28:00 –> 28:05
So they’ve been doing this, but from.

28:05 –> 28:09
From everything we can tell, they don’t seem to discriminate

28:10 –> 28:14
between full articles and those that are just providing the headline and teaser.

28:14 –> 28:21
All that they really care about is, is this URL a news article or not?

28:21 –> 28:23
You know, is the site map

28:23 –> 28:30
all filled out so I know where the header is and where the article begins?

28:30 –> 28:32
I don’t need to know what’s actually in the article.

28:32 –> 28:35
I just need to know where it starts, that kind of stuff.

28:35 –> 28:39
And those are the things that All in One SEO provides.

28:40 –> 28:44
So that’s really been our policy with Google is just to provide

28:44 –> 28:49
the minimum information so they’re not ingesting the content into their own AI

28:49 –> 28:53
and feeding it back in their own way.
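
As a rough sketch of what "the minimum information" could look like, the Python below builds a small block of schema.org `NewsArticle` metadata with a headline, a teaser, and a date, but no article body. The function name and the sample values are made up for illustration, and the webinar does not say that All in One SEO emits exactly these fields; treat it as an assumption about the general shape, not a description of the plugin.

```python
import json


def minimal_news_metadata(url: str, headline: str, teaser: str, published: str) -> str:
    """Return JSON-LD describing an article without including its body."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "url": url,
        "headline": headline,
        "description": teaser,          # teaser only, never the full text
        "datePublished": published,
        "isAccessibleForFree": False,   # signals that the full story is paywalled
    }
    return json.dumps(data, indent=2)


markup = minimal_news_metadata(
    url="https://example.com/news/council-budget",
    headline="Council approves new budget",
    teaser="The city council voted Tuesday to approve the budget.",
    published="2023-09-19",
)
```

A search engine reading this knows the URL is a news article and what it is about, without ever receiving the story itself.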

28:53 –> 28:59
And the bottom line is the AI bots have the same level of access as Googlebot.

28:59 –> 29:03
So if you are restricting Googlebot’s

29:03 –> 29:07
access to full articles, then you’re already restricting the AI bots.

29:07 –> 29:12
So I think, really, for SEO,

29:12 –> 29:15
there’s a lot of other things that are more important to SEO

29:15 –> 29:17
than giving the full article.

29:17 –> 29:19
So I mean, that’s that’s the best we can tell right now.

29:19 –> 29:22
Google is a big black box.

29:22 –> 29:26
We don’t really know how the engine works, but, you know,

29:26 –> 29:29
just based on years of following this policy,

29:29 –> 29:32
you know, our customers get great traffic from Google.

29:32 –> 29:35
I mean, there’s not much

29:36 –> 29:38
more that we can do to improve that.

29:38 –> 29:43
So I think that’s that’s basically our position.

29:43 –> 29:48
If anyone else has any experience with this

29:48 –> 29:52
or perspective on it, I’d be interested.

29:52 –> 29:56
You know, if there’s been any if you’ve changed your settings

29:56 –> 29:59
and seen any difference in traffic from Google, but.

29:59 –> 30:05
Yeah, that’s kind of like

30:05 –> 30:07
the opportunity that we’ve seen.

30:07 –> 30:10
You can just sneak right in here if you follow,

30:10 –> 30:15
you know, these steps, you can kind of strike that balance

30:15 –> 30:19
you know, between protecting, you know, because you can go all the way

30:19 –> 30:22
to the extreme and just totally lock down everything.

30:22 –> 30:27
And, you know, you’ll be safe from AI, but no one will ever see your stories.

30:27 –> 30:32
So it’s it’s all just kind of like finding where to be on that continuum.

30:34 –> 30:37
I don’t know. I guess like this kind of does change the perspective

30:37 –> 30:40
a little bit on metered paywalls.

30:40 –> 30:43
I, you know, for years I’ve been saying

30:43 –> 30:46
and we’ve been saying as a company, that a metered paywall,

30:46 –> 30:49
you know, at a minimum, you should have that.

30:49 –> 30:54
This kind of suggests that maybe, with the hard paywall,

30:54 –> 30:59
we want to go back in that direction, which was the standard for years.

30:59 –> 31:03
Do you have any thoughts on that, Chris?

31:03 –> 31:08
Well, basically, let’s say you allow

31:08 –> 31:14
Google to do its normal indexing and it gets the snippets and the excerpts.

31:14 –> 31:19
But if you put a blanket ban on the ChatGPT

31:19 –> 31:24
bot and stuff like that, as long as they’re respecting that robots.txt,

31:24 –> 31:28
it doesn’t matter about the paywall.

31:28 –> 31:32
Right. So you can keep your relationship

31:32 –> 31:37
to the SEO side of things and then still block at least OpenAI,

31:37 –> 31:40
or like Twitter’s got their bots,

31:40 –> 31:43
X as it is now, or anybody else who’s going to have their bots out there.
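
A minimal sketch of that setup, using Python's standard `urllib.robotparser` to check a robots.txt the way a well-behaved crawler would. The robots.txt content and the example.com URL are made up for illustration; `GPTBot` is the user-agent token OpenAI publishes for its crawler. As the speakers note, this only works for bots that choose to respect the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out OpenAI's crawler, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

story_url = "https://example.com/news/council-budget"
gptbot_allowed = parser.can_fetch("GPTBot", story_url)        # blocked by the first group
googlebot_allowed = parser.can_fetch("Googlebot", story_url)  # falls through to the * group
```

Each bot gets its own `User-agent` group, so more crawlers can be blocked by name without touching the rules for search engines.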

31:43 –> 31:46
And the thing about let me go.

31:46 –> 31:49
I’ll go back to the the New York Times one

31:49 –> 31:54
because because of the amount of granularity you can put in there.

31:54 –> 31:58
You can kind of tailor it so that you’re getting the best of both.

31:58 –> 32:01
You’re you’re keeping out what you don’t want.

32:01 –> 32:03
And you’re only letting in the sliver

32:03 –> 32:06
so that you get your

32:06 –> 32:13
your benefits, because I mean, I described this whole robots.txt thing.

32:13 –> 32:14
It’s kind of market driven.

32:14 –> 32:18
It’s a benefit for Google and it’s a benefit for the publishers.

32:18 –> 32:22
And what’s happened is this AI kind of thing has really upended the balance.

32:22 –> 32:26
And so right. So now we’re kind of going in and you can now

32:27 –> 32:31
kind of reimagine how to handle just the gateways

32:31 –> 32:34
and your content through the robots.txt.

32:34 –> 32:39
And then on top of that, the paywall will take care of the human aspect

32:39 –> 32:43
of things, mostly. Exactly. Yeah, that’s a good way to put it.

32:43 –> 32:47
Yeah, like technically speaking,

32:47 –> 32:51
with a metered paywall.

32:51 –> 32:55
A bot, a rogue bot could still get your your content, but

32:56 –> 33:00
practically speaking, they are, you know, going out there

33:00 –> 33:02
and making these statements that they’re not going to do that.

33:02 –> 33:04
So you should. So it shouldn’t.

33:04 –> 33:08
Yeah, it should leave you alone completely, regardless of your paywall.

33:08 –> 33:12
Because because actually the first thing that the bot looks at

33:12 –> 33:16
is the robots.txt. That’s the first line of defense.

33:16 –> 33:19
So any time a robot’s going in and it’s trying to get permission.

33:19 –> 33:20
That makes sense.

33:20 –> 33:23
Or is doing is doing a request from the site. Right.

33:23 –> 33:25
It immediately goes to that one first.

33:25 –> 33:29
So, yeah, that’s first line and then and then paywall for the other parts.

33:29 –> 33:35
You made a really interesting point earlier, and I want to just tease it apart

33:35 –> 33:39
a little bit. You said Google used to benefit from robots.txt.

33:39 –> 33:43
To me, I would think I don’t see how they would benefit

33:43 –> 33:45
because it’s it’s just restricting them.

33:45 –> 33:50
But how how would they benefit by having this standard?

33:50 –> 33:54
Just because they know where to look then. Yeah.

33:55 –> 33:59
It just yeah. OK, so like if you OK.

33:59 –> 34:02
Yeah, because I just think in Google’s mind, they want everything open.

34:02 –> 34:08
And why would they make it easier for people to to block stuff?

34:08 –> 34:11
And I guess it’s just the market pushing back, like you said.

34:11 –> 34:15
So yeah. Yeah. Interesting. OK. OK.

34:15 –> 34:18
Yeah, that’s pretty crazy.

34:18 –> 34:22
And it looks like we have Kristin now on with us.

34:22 –> 34:23
Thank you for joining.

34:23 –> 34:27
She was the one with the question about balancing SEO.

34:27 –> 34:32
So hopefully that was all clear.

34:32 –> 34:35
And, you know, we can definitely answer any other questions you have on it.

34:35 –> 34:40
Teri says, if a bot doesn’t honor the robots.txt boundary

34:40 –> 34:45
with a metered paywall, would they then act as a consumer

34:45 –> 34:51
where it gets free articles, enabling it to scrape a certain number of full articles?

34:52 –> 34:55
That’s actually what I thought in the beginning.

34:55 –> 35:00
But what we were just saying is actually that the robots.txt file

35:00 –> 35:04
is the first thing that they see. So they can still ignore it, though.

35:04 –> 35:06
They can still ignore it.

35:06 –> 35:10
So to get to Teri’s point, let’s just say, let’s just call them hackers.

35:10 –> 35:12
Let’s just say you’ve got an actual hacker.

35:12 –> 35:16
Right. They can they can ignore the robots.txt.

35:16 –> 35:18
They can spoof their IP addresses.

35:18 –> 35:20
They can do all kinds of stuff.

35:20 –> 35:24
And so that’s why we have something like Cloudflare or Akismet,

35:24 –> 35:26
some of these bot monitoring things.

35:26 –> 35:29
And we weren’t we weren’t going to get into too much of that

35:29 –> 35:31
because that’s basically server side infrastructure that we use

35:31 –> 35:33
to protect all of our websites.

35:33 –> 35:38
So there’s nothing specific for the customers to deal with there.

35:38 –> 35:42
But but yeah, that I mean, if you have a nefarious actor,

35:42 –> 35:45
they can get around it. Absolutely.
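
Because a rogue bot can lie about who it is, bot monitoring generally has to judge clients by what they do rather than what they claim. The sketch below shows one simple behavioral signal, a sliding-window request counter, in Python. It is not how Cloudflare or Akismet work internally, which the webinar does not describe; the class name, the thresholds, and the `client_id` key are hypothetical, and a real service would combine many more signals.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flag clients that request pages faster than a human plausibly would."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id: str, now: float) -> bool:
        hits = self._hits[client_id]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


monitor = RequestRateMonitor(max_requests=3, window_seconds=10.0)
# Four requests within four seconds from the same client.
decisions = [monitor.allow("client-a", now=float(t)) for t in range(4)]
# The same client again after the window has passed.
later_decision = monitor.allow("client-a", now=20.0)
```

A counter keyed on a single identifier is easy to dodge by rotating addresses, which is one reason this kind of protection is handled by dedicated infrastructure rather than by each site.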

35:45 –> 35:48
Right.

35:48 –> 35:51
Yeah, it’s interesting. I feel like

35:51 –> 35:57
I’m thinking a lot like my father did back in the ’90s,

35:57 –> 36:00
in ’99, when Google was coming out, like

36:00 –> 36:04
he was super anti Google, even though everyone was super pro.

36:04 –> 36:07
They could only see the positives now.

36:07 –> 36:10
And he always used to talk about the hard paywall.

36:10 –> 36:12
Like he was really pushing for that.

36:12 –> 36:16
I don’t think we need to, you know, go back to that.

36:16 –> 36:20
But if you really are, you know, paranoid about stuff being stolen,

36:20 –> 36:24
if you really fear what AI could do with your content,

36:24 –> 36:30
the only true lockdown is the robots.txt and a hard paywall.

36:30 –> 36:33
And another thing we could think of with

36:33 –> 36:38
with what Teri was pointing out, if you change the metered paywall

36:38 –> 36:42
so that you have to provide an email address

36:43 –> 36:47
to even get the those limited number of free articles.

36:47 –> 36:50
That’s another deterrent against the bots.

36:50 –> 36:53
So instead of having to give up the metered paywall completely,

36:53 –> 36:57
you could add just one layer of identification

36:57 –> 37:00
that would deter the bots.
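
The registration idea can be sketched as a small Python class: free articles are only counted against a registered email, so an anonymous visitor, human or bot, gets none. The `RegisteredMeter` name, its methods, and the limit of two free articles are hypothetical choices for illustration, not how any specific paywall product is built.

```python
from typing import Optional


class RegisteredMeter:
    """Metered paywall that only counts free reads for registered emails."""

    def __init__(self, free_articles: int) -> None:
        self.free_articles = free_articles
        self._reads = {}  # email -> set of article ids already opened

    def register(self, email: str) -> None:
        self._reads.setdefault(email.lower(), set())

    def can_read(self, email: Optional[str], article_id: str) -> bool:
        # Anonymous visitors, including bots, get no free full articles.
        if email is None or email.lower() not in self._reads:
            return False
        seen = self._reads[email.lower()]
        if article_id in seen:
            return True  # re-reading does not use up the meter
        if len(seen) >= self.free_articles:
            return False
        seen.add(article_id)
        return True


meter = RegisteredMeter(free_articles=2)
meter.register("reader@example.com")

anonymous = meter.can_read(None, "story-1")
first = meter.can_read("reader@example.com", "story-1")
second = meter.can_read("reader@example.com", "story-2")
third = meter.can_read("reader@example.com", "story-3")
```

A determined scraper could still register an address, so this is a deterrent that raises the cost per article rather than a lock.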

37:00 –> 37:05
Yeah, I mean, it’s just I just got to say it’s the world we live in.

37:05 –> 37:10
You know, it’s like the problem of our day, like bots.

37:11 –> 37:13
It’s crazy how this has crept up on us.

37:13 –> 37:15
It’s in the news because of Twitter.

37:15 –> 37:18
You know, he’s he’s making a popular subject.

37:18 –> 37:22
But yeah, I mean, that’s kind of where we’re at.

37:22 –> 37:25
We’re like almost like in this matrix kind of.

37:25 –> 37:28
Well, in one of our earlier webinars,

37:28 –> 37:33
we came across the factoid that over half of all of the content

37:33 –> 37:36
on the Internet is bot-generated or bot-operated.

37:36 –> 37:38
Yep. Yep. Exactly.

37:38 –> 37:43
So then it comes down to just like human or bot.

37:43 –> 37:49
And so, you know, it’s just it’s like the Swiss cheese approach to security.

37:49 –> 37:54
Like no one level of security is going to protect against everything.

37:54 –> 37:58
So you’ve got to have multiple layers like stacking Swiss cheese

37:58 –> 38:01
on top of each other. So the holes don’t line up.

38:01 –> 38:05
So, yeah, you got the robots file, you have the metered paywall.

38:05 –> 38:10
And then maybe on top of that, you have a registration for the meter,

38:10 –> 38:14
which I believe The Wave does a version of,

38:14 –> 38:18
rockawave.com, if people wanted to see an example of that.

38:18 –> 38:24
But just to to what Elon is doing with X, he’s taking it one step further.

38:24 –> 38:28
So instead of having to just provide a registration, right?

38:28 –> 38:33
Right. He’s actually going to charge a small like micro fee.

38:33 –> 38:35
I mean, maybe two or three dollars.

38:35 –> 38:38
But then that adds the extra layer of a payment method

38:38 –> 38:41
to also keep the bots out.

38:41 –> 38:43
So those are those are kind of like the

38:43 –> 38:47
yeah, as you’re describing the different layers that you apply

38:47 –> 38:51
to kind of cover all the holes that maybe one layer doesn’t. Right.

38:51 –> 38:57
Yeah, it’s interesting how like paying money is like the one thing that

38:57 –> 39:01
AI is going to have a hard time getting around like it can’t just print money

39:01 –> 39:05
because you prevent the scale; you prevent them from scaling.

39:05 –> 39:07
Right, right.

39:07 –> 39:12
Yeah, so he’s dealing with like a whole nother level like it like that.

39:12 –> 39:16
We we are more, you know, I don’t think

39:16 –> 39:22
we need to I mean, we have, you know, obviously paying subscribers already.

39:22 –> 39:26
So like he’s kind of backing into this with this free service,

39:26 –> 39:28
which is not the situation we’re in. So

39:30 –> 39:32
this is.

39:32 –> 39:36
Basically, a summary of what publishers should do.

39:36 –> 39:39
We’ve talked about both of these things

39:39 –> 39:43
at the bottom, the technical things, but tell us a little bit about

39:43 –> 39:47
the Press Association

39:47 –> 39:51
representation there, Chris, what have you read about that?

39:51 –> 39:55
Oh, let’s see, there was.

39:55 –> 39:57
Bring this slide up.

39:57 –> 40:02
So there have been several different advocacy groups that are advocating for

40:02 –> 40:06
sorry for the redundancy there groups that are advocating for different.

40:06 –> 40:14
Slots in the market, so there’s journalists, there’s obviously

40:14 –> 40:18
we’ve all heard about the the Actors Guild, the Writers Guild

40:18 –> 40:19
and everything like that.

40:19 –> 40:22
So as far as I know, there’s not a specific

40:24 –> 40:29
push by the press associations.

40:29 –> 40:33
But there are other organizations that are moving in this direction.

40:33 –> 40:37
And I think maybe kind of setting a path for where

40:37 –> 40:42
press associations could go with these things. Right.

40:42 –> 40:45
And actually, I’ll come back to.

40:45 –> 40:50
Let’s see. I think we have Cindy from Illinois Press Association.

40:50 –> 40:52
So maybe this would.

40:54 –> 40:57
So like you’re saying

40:57 –> 41:02
if Press Association’s could do where that one was contact.

41:02 –> 41:08
Well, I guess it’s just like what what can be done like specifically?

41:08 –> 41:13
Well, the thing of going through the Press Association is for lobbying

41:13 –> 41:16
because it’s for the legislative side of things.

41:16 –> 41:20
So, yeah, I mean, I don’t know, Cindy, if you have any comment on that.

41:20 –> 41:22
Is that something that

41:23 –> 41:27
IPA has access to like these

41:27 –> 41:30
I don’t know, lobbying channels?

41:30 –> 41:32
It’s not really my.

41:32 –> 41:35
Well, I mean, because all these press associations were lobbying

41:35 –> 41:37
for the journalism protection, local journalism protection.

41:37 –> 41:39
Exactly. That was going around.

41:39 –> 41:42
So I’m thinking there’s going to be stuff like that going on.

41:42 –> 41:48
That just that that level of advocacy.

41:48 –> 41:50
It was another. Right.

41:50 –> 41:52
So, yeah, I mean, this.

41:53 –> 41:55
OK, I definitely want to talk about

41:55 –> 42:00
just more in that direction, but.

42:00 –> 42:05
And here’s here’s a group of media.

42:05 –> 42:08
This is a group of media associations. Right.

42:08 –> 42:09
And they release.

42:09 –> 42:14
This is just an outline I pulled, but they released these ethical principles.

42:14 –> 42:20
So this is kind of where I think the publishers going into the press associations.

42:20 –> 42:23
This is kind of the real deal.

42:23 –> 42:27
So they’re negotiating for compensation when your content is used.

42:27 –> 42:30
Regulations and corporate responsibility,

42:30 –> 42:35
trustworthiness, actually having transparency with the developers.

42:35 –> 42:38
So we know what’s in the training data

42:38 –> 42:42
and actually use high quality data instead of I don’t know.

42:42 –> 42:47
I mean, should we really be scraping Reddit for quality AI data?

42:47 –> 42:49
I’m not sure. Right.

42:49 –> 42:53
And then also be able to monitor them for how well they’re they’re behaving.

42:53 –> 42:55
You know, right. Right.

42:55 –> 42:57
So Cindy says our board is looking into this

42:57 –> 42:59
and trying to wrap their heads around it.

42:59 –> 43:03
That’s pretty much what we’re doing. Yeah. Right. Right. Yeah.

43:03 –> 43:09
So I mean, I just got to read some of Andrea’s comment.

43:09 –> 43:13
Yeah. Yeah. Take a second on that one. Yeah. This is fantastic.

43:13 –> 43:15
This like sets it up because.

43:15 –> 43:18
Well, let me just read it just in case everyone hasn’t seen it.

43:18 –> 43:24
The nefarious actor thing interests me because for a lot of local news orgs,

43:24 –> 43:26
the competition isn’t necessarily global.

43:26 –> 43:30
There are tools that are not necessarily writ large, but writ small.

43:30 –> 43:35
They would allow local competitors, say a bad actor

43:35 –> 43:38
crops up, to scrape and resell your content. Exactly.

43:38 –> 43:43
Like imagine it could be just even like a Twitter handle.

43:43 –> 43:45
You know, they just start.

43:45 –> 43:47
It doesn’t have to be a website. It could be a newsletter.

43:47 –> 43:51
They start just scraping your articles, rewriting it.

43:51 –> 43:54
And then, you know, they say, hey, this is enough.

43:54 –> 43:56
You know, this is fresh copy. It’s the same story.

43:56 –> 43:59
Start to see that trending higher in search results.

43:59 –> 44:03
And you’re really like, what is going on? Right. Right. So.

44:03 –> 44:08
Yeah. So she goes on to say, you know, there are now ways

44:08 –> 44:14
it would be very easy for bad actors to build ways to drain other pubs.

44:14 –> 44:16
Yeah, that’s the I think another good point.

44:16 –> 44:19
It’s just a lot easier. Like this was always possible.

44:19 –> 44:23
I mean, I could buy a subscription to the Salt Lake Tribune

44:23 –> 44:27
and turn it into a newsletter, but that would be like a full time job.

44:27 –> 44:32
So yeah, the fact that you can just plug in an article and say rewrite it

44:32 –> 44:34
in your own words makes it easier.

44:34 –> 44:38
So it’s like Cliff’s actually got an example of one of those.

44:38 –> 44:43
Yeah. Yeah. It’s not ultra paranoid, says Cliff. I agree.

44:43 –> 44:46
I feel like we’ve seen this for years.

44:46 –> 44:48
There’s a site that does this around Columbus, Ohio,

44:48 –> 44:50
and presumably makes money off advertising.

44:50 –> 44:53
So Newsbreak. Yeah.

44:53 –> 44:58
Newsbreak has been doing this for a while.

44:58 –> 45:01
They. Yeah, we see a lot of

45:01 –> 45:07
issues with Newsbreak, a lot of tickets coming in related to that.

45:07 –> 45:10
How is content protected now in InDesign?

45:10 –> 45:13
Well, we can talk about InDesign, actually. I want to get to that.

45:14 –> 45:18
So let’s talk about now, like we’ve talked about the problem

45:18 –> 45:22
and some some things you can do to mitigate and protect yourself.

45:22 –> 45:26
But like the real solution is these ethical principles,

45:26 –> 45:29
like standards in the industry.

45:29 –> 45:34
Maybe it can be led by Cindy and IPA and, you know, other press associations.

45:34 –> 45:39
But like let’s talk now, Chris, you have some great slides about like

45:39 –> 45:43
some of the solutions that are out there already with Adobe

45:43 –> 45:47
and let’s let’s talk about that. OK, perfect.

45:47 –> 45:51
So this is this is specifically for images.

45:51 –> 45:56
But as the transparency opens up and

45:56 –> 46:01
and some of these guidelines and things may be regulated into action,

46:01 –> 46:05
more people will follow along in this. But so

46:05 –> 46:10
Adobe has now released their AI image generation product,

46:10 –> 46:12
Firefly, into commercial release.

46:13 –> 46:16
And that means you can use them for anything.

46:16 –> 46:19
And one of the things that they were

46:19 –> 46:24
emphasizing through the entire beta period was that the image generation

46:24 –> 46:31
model was only trained on their pre-existing Adobe stock collection

46:31 –> 46:34
and then some other licensed stock collections.

46:34 –> 46:40
So you have no worry about about any kind of litigation using these tools.

46:40 –> 46:45
The fun part, though, is when you think, oh, so what about all those artists

46:45 –> 46:50
who submitted their their photographs and illustrations to stock in the first place?

46:50 –> 46:52
Then Adobe scrapes that and makes this new tool.

46:52 –> 46:56
Well, what Adobe is going to do is pay them

46:56 –> 47:01
because they’ve been keeping track of every piece of art that was submitted

47:01 –> 47:03
and then used to train Firefly.

47:03 –> 47:08
And they’re going to do a yearly disbursement of funds.

47:09 –> 47:10
And I even looked at it.

47:10 –> 47:15
I don’t have the stats right with me, but it’s like a few cents per however many

47:15 –> 47:17
hundred licensees or whatever.

47:17 –> 47:20
So it’s not huge, but it’s basically the same

47:20 –> 47:25
royalty share that was happening with the original stock program.

47:25 –> 47:30
But now that AI has come into it, if your art is

47:30 –> 47:32
being used to train the AI, you also get paid for that.

47:32 –> 47:39
And then what’s even more fun is anyone who generates art

47:39 –> 47:44
with Firefly and then goes on to create something with it using

47:44 –> 47:49
further generative tools can then as long as it meets and they’ve

47:49 –> 47:53
published a new set of guidelines for AI submissions.

47:53 –> 47:57
But now artists can then submit that new

47:57 –> 48:03
AI generated derivative work and get it posted to Adobe Stock

48:03 –> 48:05
and start getting royalties off it.

48:05 –> 48:08
So they’re basically saying, you create the content.

48:08 –> 48:11
We trained our model on your content.

48:11 –> 48:12
We’re going to pay you for that.

48:12 –> 48:15
Now, if you make something with that model

48:15 –> 48:21
and then add your artistry to it, we’re going to accept that back again

48:21 –> 48:22
and keep the circle running.

48:22 –> 48:30
Yeah, so I thought to put this in terms for newspapers, it would seem like

48:30 –> 48:35
a piece of artwork would basically be a piece of news content.

48:36 –> 48:41
Now, there’s definitely a huge technical difference in how you watermark

48:41 –> 48:46
and find provenance for these types of things.

48:46 –> 48:51
But they can still have a record in the database

48:51 –> 48:57
saying that this URL was used to train this model, or that this data

48:57 –> 48:58
set was used to train this model.

48:58 –> 49:01
And if they keep track of that, there’s absolutely no reason.

49:01 –> 49:03
And this goes to the transparency thing.

49:03 –> 49:06
There’s absolutely no reason why they couldn’t keep track of that.

49:06 –> 49:10
And actually compensate people and have some kind of a negotiated thing.

49:10 –> 49:15
Kind of I mentioned ASCAP doing performance royalties for songwriters.

49:15 –> 49:21
It would be something along those lines so that you wouldn’t necessarily want

49:21 –> 49:25
to block everything once they start putting a system in place where they’re

49:25 –> 49:28
not just taking your stuff and doing whatever they want with it

49:28 –> 49:29
without you having any say.
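
The record-keeping described here can be sketched as a simple log of which URLs went into which model, with a royalty pool split in proportion to each publisher's share of the log. This is only an illustration in Python: the function name, the publisher names, the pool size, and the pro-rata rule are all assumptions, and no AI developer's actual compensation scheme is described in the webinar.

```python
from collections import Counter


def split_royalty_pool(training_log, pool_cents):
    """Split a royalty pool by how many logged items each publisher contributed.

    training_log: iterable of (publisher, url, model_name) records.
    """
    counts = Counter(publisher for publisher, _url, _model in training_log)
    total = sum(counts.values())
    # Integer division keeps the arithmetic exact; any remainder stays unallocated.
    return {publisher: pool_cents * n // total for publisher, n in counts.items()}


log = [
    ("Example Gazette", "https://gazette.example/story-1", "model-a"),
    ("Example Gazette", "https://gazette.example/story-2", "model-a"),
    ("Example Gazette", "https://gazette.example/story-3", "model-a"),
    ("Sample Tribune", "https://tribune.example/story-9", "model-a"),
]
payouts = split_royalty_pool(log, pool_cents=1000)
```

The hard part in practice is getting the log kept at all, which is why the speakers tie this back to transparency about training data.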

49:29 –> 49:33
So I’m kind of seeing a lot of this protection as just.

49:34 –> 49:38
We need to put something in place now because it’s going to be a while

49:38 –> 49:42
for the Senate and all the meetings and the hearings and all these kind of things

49:42 –> 49:47
to develop actual legislation that will then come back and give us more

49:47 –> 49:51
insight into how we need to to relate to these tools.

49:51 –> 49:52
And maybe it’ll be easier.

49:52 –> 49:55
So if you don’t block this, or maybe there’s something that goes in the robots.txt,

49:55 –> 49:59
it gives you a little signature that then goes in and

50:00 –> 50:04
comes back around so you can get credited and perhaps paid for your content.

50:04 –> 50:07
So I’m excited about this.

50:07 –> 50:11
This just they just started rolling this out over the past two weeks.

50:11 –> 50:13
Right. Yeah.

50:13 –> 50:17
I mean, if anyone has experience using Firefly,

50:17 –> 50:23
some of the new Adobe AI products, I’d love to hear about your experience there.

50:23 –> 50:27
But so this this is kind of, I guess,

50:27 –> 50:32
what we’re talking about earlier, just like, well, as you said, locking it down.

50:32 –> 50:36
And I mean, anyone can can build their own

50:36 –> 50:40
AI now, like there’s these, you know, models out there.

50:40 –> 50:42
So that’s what Adobe’s done.

50:42 –> 50:44
They’ve taken.

50:44 –> 50:49
They’re creating a proprietary AI, kind of like what we’ve done with WordPress.

50:49 –> 50:52
It’s open source WordPress, but we’ve specialized it for newspapers.

50:54 –> 50:58
And then there’s this walled garden kind of environment

50:58 –> 51:00
where everything lives in there.

51:00 –> 51:06
The payments are the circular analogy that you kind of made there was good.

51:06 –> 51:08
Like it really is like a.

51:08 –> 51:12
So it’s going to get it’s going to like just accelerate creativity to get.

51:12 –> 51:14
I mean, so they do have some guidelines.

51:14 –> 51:16
I’ll go across a couple.

51:16 –> 51:19
You can’t submit anything that’s a caricature of a real person.

51:19 –> 51:22
You’re not supposed to do stuff with real places.

51:22 –> 51:25
It needs to be imaginary kind of thing.

51:25 –> 51:29
So, yeah, you can’t do the Trump in a Superman costume or something.

51:29 –> 51:31
That’s not going to fly.

51:31 –> 51:34
So it has to be more more traditional style stock stuff.

51:34 –> 51:36
Like if you. Yeah.

51:36 –> 51:39
Like if you just kind of built some railroad going through a forest

51:39 –> 51:42
or something like that and it looks really cool with nice lighting.

51:42 –> 51:45
That would be a good a good candidate for submission.

51:45 –> 51:51
I do want to address a couple of points that got us started on Adobe.

51:51 –> 51:56
And that is how is Adobe protecting or allowing you to to sign your content?

51:56 –> 52:00
Right. And I don’t have a slide for this, but I’ll give you just a quick breakdown.

52:00 –> 52:06
Any image generated by Firefly has something embedded called content credentials.

52:06 –> 52:11
And it’s using blockchain technology so that it can’t be tampered with.

52:11 –> 52:16
And it’s and it’s saved into a verification cloud partnership

52:16 –> 52:19
that they’ve they’ve built with some other

52:20 –> 52:25
companies, even like camera companies, Canon and Leica and stuff.

52:25 –> 52:29
They’re going to start building these these provenance blockchain based things

52:29 –> 52:33
so that these content credentials will start from the camera

52:33 –> 52:37
and end at your export of the photo.

52:37 –> 52:41
And your name is on it.

52:41 –> 52:44
And basically, let’s let’s go back to the idea.

52:44 –> 52:47
So if you get something from Adobe Firefly, it’s going to say

52:47 –> 52:49
Adobe Firefly generated this.

52:49 –> 52:53
Now you open it up on Photoshop, you use all the tools, even the generative ones,

52:53 –> 52:56
make some cool things, add some ponies, do whatever sky replacement,

52:56 –> 52:58
all that kind of stuff.

52:58 –> 53:03
Adobe will keep track of the high level changes that you’re making.

53:03 –> 53:06
It’ll say, oh, yeah, you generated some new stuff or you brought in another file

53:06 –> 53:11
to composite and all of that will be saved in this content credentials list

53:11 –> 53:15
so that then when you export that that credentials is now

53:15 –> 53:20
permanently assigned to that image and you can use image look up

53:20 –> 53:23
all kinds of different things.

53:23 –> 53:26
But if you use the verify tool that Adobe puts out there,

53:26 –> 53:30
you can actually even open up one of these things with the content credentials

53:30 –> 53:32
and you can actually see the different versions

53:32 –> 53:38
of what what was created and what was added and go back and see the original one

53:38 –> 53:42
that they got from Firefly versus what they did to resubmit it to stock.
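
To give a feel for how an edit history can be made tamper-evident, here is a toy Python sketch in which each record's hash covers the previous record's hash, so changing any earlier step breaks everything after it. This is not Adobe's format or verification tool, whose internals are not described in the webinar; the function names and record fields are hypothetical, and a real system would also sign the records so the author can be verified.

```python
import hashlib
import json


def add_step(history, action, author):
    """Append an edit record whose hash covers the previous record's hash."""
    prev_hash = history[-1]["hash"] if history else ""
    record = {"action": action, "author": author, "prev": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["hash"] = hashlib.sha256(payload).hexdigest()
    history.append(record)


def verify(history):
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = ""
    for record in history:
        body = {k: record[k] for k in ("action", "author", "prev")}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        expected = hashlib.sha256(payload).hexdigest()
        if record["prev"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True


history = []
add_step(history, "generated image", "Firefly")
add_step(history, "sky replacement", "editor")
add_step(history, "exported", "editor")
intact = verify(history)

history[1]["action"] = "no changes made"  # simulate tampering
tampered_ok = verify(history)
```

Walking the list from the first record to the last is the same idea as stepping back through the versions in a verification tool: the original generated image, then each change layered on top.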

53:43 –> 53:46
And yes, so that was speaking of images.

53:46 –> 53:50
But to get back to it, Adobe is going to be rolling out

53:50 –> 53:56
content credentials for, I believe, all of their creation apps.

53:56 –> 54:00
So it’s I think the next one is going to be video.

54:00 –> 54:03
But I would I would definitely think that

54:03 –> 54:06
that InDesign would be a candidate.

54:06 –> 54:10
But the thing is with InDesign is remember, if you’re if you’re talking about InDesign,

54:10 –> 54:13
are you what are you actually trying to protect?

54:14 –> 54:16
If you’re trying to protect the photography that’s in there,

54:16 –> 54:19
that’s taken care of by content credentials.

54:19 –> 54:22
If it’s if it’s the PDF that gets exported,

54:22 –> 54:25
I’m hoping that they’ll do something with content credentials there

54:25 –> 54:30
because, you know, there’s already so much metadata in a PDF anyway.

54:30 –> 54:34
But I’m just not sure how the technology is going to work on that yet.

54:34 –> 54:37
I think it’s a little easier for images and they’re still going to be working

54:37 –> 54:39
on the other ones, but yeah.

54:39 –> 54:42
Interesting. Yeah.

54:42 –> 54:48
I mean, there’s we’ve got a couple other examples in here, but

54:48 –> 54:50
I mean, we’re coming up on an hour.

54:50 –> 54:54
So I just want to make sure that we have all our questions answered.

54:54 –> 55:00
And if anyone wants to see, you know, have us take a look at the source code

55:00 –> 55:04
to your website and just make sure that the paywall isn’t giving away

55:04 –> 55:08
more than you think to the bots, we can definitely stick around.

55:08 –> 55:12
Just drop us a message in the chat that you’d like to do that.

55:13 –> 55:16
But I mean, there was was there anything else, Chris,

55:16 –> 55:19
that you wanted to show from the slides or?

55:19 –> 55:24
Well, let me bring let me bring one up because this this kind of came up

55:24 –> 55:30
because a lot of publishers, I think, are nervous about using AI

55:30 –> 55:35
as part of their creative process because they’re worried about copyright

55:35 –> 55:37
and things like that.

55:37 –> 55:42
So Microsoft, in addition, because actually Microsoft had their meeting

55:42 –> 55:46
this morning, and so the next version of Windows 11 is going to have their

55:46 –> 55:49
Copilot sitting right there in the taskbar.

55:49 –> 55:55
So before they released that, they released this announcement saying that for

55:55 –> 55:58
any commercial usage you put their Copilot

55:58 –> 56:01
models to, the AI stuff, and anything that’s generated,

56:01 –> 56:05
if anybody comes after you for copyright,

56:06 –> 56:09
they’ll take care of you.

56:09 –> 56:12
And that’s that’s a crazy.

56:12 –> 56:15
It’s so odd. Yeah, it’s really weird.

56:15 –> 56:19
Yeah. So again, it’s just kind of like

56:19 –> 56:23
the tech companies leading the way and just like

56:23 –> 56:29
given, I don’t know, being proactive, I guess you could say.

56:29 –> 56:31
But it’s like

56:32 –> 56:36
they’re obviously benefiting so much from this

56:36 –> 56:40
to be able to make that claim.

56:40 –> 56:43
And I hadn’t really brought in any of the government stuff,

56:43 –> 56:45
but I did put this slide together.

56:45 –> 56:47
This is just from last week.

56:47 –> 56:51
So Schumer was the one that was leading the the national security

56:51 –> 56:56
Senate hearing, which is why it was closed, the one that had Elon and Zuck

56:56 –> 56:58
and Sam and all those guys.

56:58 –> 57:01
But he did a presentation the week before.

57:01 –> 57:04
And you can see there’s some very similar

57:04 –> 57:09
guidelines to what the the news association plan looks like.

57:09 –> 57:13
They have a little bit more on national security in there, but,

57:13 –> 57:17
you know, they’re really concerned from just it’s top down.

57:17 –> 57:20
It’s governmental level stuff.

57:20 –> 57:23
But that’s why from the bottom up, the grassroots,

57:23 –> 57:26
the press associations and stuff, you know, if they’re just going to be

57:26 –> 57:30
focused on on on AI powered drones and stuff,

57:30 –> 57:32
we want to make sure that they are aware of us

57:32 –> 57:36
and how AI is going to be affecting the publishing industry as well.

57:36 –> 57:39
So while they’re they’re making all these discussions and,

57:39 –> 57:42
you know, accountability, all these kind of things,

57:42 –> 57:46
I think it’s important that we also.

57:46 –> 57:50
And this is where going through the Copyright Office comes in.

57:50 –> 57:52
That was the other one I wanted to show. Where is that?

57:52 –> 57:57
Yeah, here it is. So this just started

57:57 –> 57:59
this month, and this is another place where

58:00 –> 58:03
people who are interested can actually get their voices heard.

58:03 –> 58:08
This is going to be open until October 18th.

58:08 –> 58:11
It looks like no, I’m sorry, November 15th.

58:11 –> 58:16
And so you can submit basically what your concerns are

58:16 –> 58:19
with regards to AI and copyright.

58:19 –> 58:23
I mean, we haven’t talked about it too much, but I will touch on it very briefly

58:23 –> 58:26
is that if you use AI to generate content.

58:27 –> 58:31
Right. And you don’t do all that much with it.

58:31 –> 58:35
Then you can’t copyright it.

58:35 –> 58:38
Or the only thing you can copyright is whatever you actually did.

58:38 –> 58:42
Like if you had it write a bunch of poems and you collated the poems into a book,

58:42 –> 58:47
you could copyright the order of the poems, but not the content of the poem.

58:47 –> 58:49
You know what I mean? Right.

58:49 –> 58:52
It’s the same thing with the comic book, where someone wrote

58:52 –> 58:55
the text for the comic book, but all the images were made

58:55 –> 58:59
by an image generator: you can’t copyright any of the images,

58:59 –> 59:02
but you can copyright the storyline and the layout of the images in the book.

59:02 –> 59:07
So that’s just kind of in its very early stages; they’re just starting to do this.

59:07 –> 59:11
So that’s why they’ve opened this up to public comment

59:11 –> 59:17
for artists, authors, news publishers, journalists to put in

59:17 –> 59:20
their concerns. Right. Right.

59:20 –> 59:24
Yeah. Do you have a link to that page handy,

59:24 –> 59:28
or maybe we could send it out after in the follow up email also.

59:28 –> 59:31
I’ll type it into chat real quick.

59:31 –> 59:33
Just in the. Yeah, well, we’ll get to you.

59:33 –> 59:39
But I want to address Andrea’s chat here.

59:39 –> 59:42
I really appreciate your questions and comments, Andrea.

59:42 –> 59:46
So we actually looked at your site already

59:46 –> 59:49
because we were expecting you.

59:49 –> 59:53
And it looks like it is.

59:53 –> 59:57
As I recall, Chris, they are actually locked down

59:57 –> 01:00:01
in terms of the source code.

01:00:01 –> 01:00:05
I think you did a quick check on Chicago Daily Law Bulletin.

01:00:05 –> 01:00:10
Could you maybe just bring up their site really quick

01:00:10 –> 01:00:14
and we’ll just show how we test this?

01:00:14 –> 01:00:18
Yeah, let me do a reshare on that after I get it loaded.

01:00:18 –> 01:00:20
Yeah, no problem.

01:00:21 –> 01:00:24
I’m also going to show you a little bit about the technical side

01:00:24 –> 01:00:26
because I’m going to have Chrome here.

01:00:26 –> 01:00:28
Yeah, and take a look.

01:00:28 –> 01:00:30
We’ll dig a little bit more in here.

01:00:30 –> 01:00:33
This would also probably be like a really good

01:00:33 –> 01:00:37
blog post, just like the steps to checking.

01:00:37 –> 01:00:41
All right. So let’s see what we’ve got.

01:00:41 –> 01:00:43
Oh, my gosh, we have a paywall.

01:00:43 –> 01:00:45
OK, we definitely have a paywall.

01:00:45 –> 01:00:47
Let’s try the reader thing.

01:00:47 –> 01:00:50
Right. OK, so this is good.

01:00:50 –> 01:00:52
The reader only gets the excerpt.

01:00:52 –> 01:00:57
Right. So now let’s take a look.

01:00:57 –> 01:01:00
At this.

01:01:00 –> 01:01:09
I’m curious, Andrea, what is the CMS that your site’s built on?

01:01:09 –> 01:01:12
I couldn’t tell.

01:01:12 –> 01:01:14
Just looking at it.

01:01:14 –> 01:01:18
Briefly, before we jumped on.

01:01:18 –> 01:01:22
Yeah, no, I’m pretty sure this is not

01:01:22 –> 01:01:26
that one. No, that’s a picture.

01:01:26 –> 01:01:30
I know. I think actually let me take it out of reader mode.

01:01:30 –> 01:01:33
So in reader mode. No. Yeah, I don’t see it.

01:01:33 –> 01:01:36
I do not see it.

01:01:36 –> 01:01:40
GN4, Miles 33.

01:01:40 –> 01:01:42
OK, I haven’t heard of that.

01:01:42 –> 01:01:43
I’ll have to look into them. Thank you. Wow.

01:01:43 –> 01:01:45
GN4.

01:01:47 –> 01:01:49
Yeah, because I’m basically OK.

01:01:49 –> 01:01:51
So here we go. Here’s the HTML.

01:01:51 –> 01:01:55
It’s in there. OK, so what I’m going to do,

01:01:55 –> 01:01:58
this will be a little silly, but bear with me for just a moment.

01:01:58 –> 01:02:02
I’m going to basically create a local copy of the HTML

01:02:02 –> 01:02:04
that I just quote unquote scraped.

01:02:04 –> 01:02:08
Right from your website.

01:02:08 –> 01:02:11
Right. You’re pretending you’re the bot. Yeah.

01:02:11 –> 01:02:14
Yep. OK. OK.

01:02:14 –> 01:02:20
Chicago to OK.

01:02:20 –> 01:02:24
Now I’m going to try to find that.

01:02:24 –> 01:02:27
OK, you get to go to.

01:02:27 –> 01:02:32
So now I’m going to reopen that same HTML file in Chrome again.

01:02:32 –> 01:02:35
Right. And this is what it looks like.

01:02:35 –> 01:02:42
Right. So as you see, all you get is the excerpt.

01:02:43 –> 01:02:45
So, yeah, this is nice and solid.

01:02:45 –> 01:02:47
This is the way you want it. Right. That’s a.

01:02:47 –> 01:02:50
So that would be like a server side paywall.

01:02:50 –> 01:02:54
That’s essentially what we demoed on our Our-Hometown WordPress sites.

01:02:54 –> 01:02:57
It’s been our standard for years.
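
For anyone who wants to repeat this check on their own site, here is a minimal Python sketch of the same idea: take the HTML that a logged-out visitor (or a bot) receives and see whether sentences from deep in the article are in it. This is not the tool used in the demo, which was done by hand in Chrome. The function names, the `paywall-check/0.1` User-Agent string, and the probe phrases are all assumptions made up for illustration.

```python
from html.parser import HTMLParser
import urllib.request


class _TextCollector(HTMLParser):
    """Collects every text node, including <script> contents, because a
    scraper can read article text embedded there (for example in JSON)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def all_text(html):
    """Return the page's text with tags removed and whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.chunks).split())


def leaked_phrases(html, probe_phrases):
    """Return the probe phrases found in the delivered HTML.

    probe_phrases should be sentences copied from late in the article,
    past the point where the excerpt is supposed to end.
    """
    text = all_text(html).lower()
    return [p for p in probe_phrases if p.lower() in text]


def fetch_html(url):
    """Fetch a page as an anonymous client (hypothetical User-Agent)."""
    req = urllib.request.Request(url, headers={"User-Agent": "paywall-check/0.1"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

An empty list from `leaked_phrases(fetch_html(article_url), phrases)` suggests the server only sent the excerpt, as in the demo. Any phrase that comes back means the full text is in the page source even if a login screen hides it. A plain request like this does not run JavaScript, so it only shows what is in the initial HTML.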

01:02:57 –> 01:03:00
You know, like I said, many people wanted to just kind of give

01:03:00 –> 01:03:01
everything away to Google.

01:03:01 –> 01:03:06
It’s like anything to get me traffic, but they really don’t need the full story.

01:03:06 –> 01:03:10
So it looks like your CMS provider set it up right.

01:03:10 –> 01:03:13
So that’s good for you.

01:03:13 –> 01:03:18
Let me think here.

01:03:18 –> 01:03:23
I think we’re just about at an hour, and I usually like to kind of wrap it up now,

01:03:23 –> 01:03:26
unless there’s any other questions.

01:03:26 –> 01:03:27
I think we showed everything.

01:03:27 –> 01:03:30
I know, Chris, you prepared a lot of stuff,

01:03:30 –> 01:03:33
but did we hit all the main points that you wanted to?

01:03:33 –> 01:03:37
Yeah, I mean, I had some stuff, more, you know, stuff from the tech guys

01:03:37 –> 01:03:40
that are running things and comments, but it’s all kind

01:03:40 –> 01:03:43
of fairly democratic language.

01:03:43 –> 01:03:44
It’s not all that right.

01:03:44 –> 01:03:45
I’ll take or anything.

01:03:45 –> 01:03:48
You know, the high. Oh, I got you.

01:03:48 –> 01:03:48
Yeah. Yeah.

01:03:48 –> 01:03:54
We’ll probably want to do a follow up on this because it’s, you know, it’s an evolving thing.

01:03:54 –> 01:03:58
Yeah. There’s just a quick couple of comments from Elon right after the Senate meeting

01:03:58 –> 01:04:01
and Bill did a little thing there as well.

01:04:01 –> 01:04:05
But that’s not the most exciting thing there.

01:04:05 –> 01:04:07
Oh, is it? Yeah. Yeah. No problem.

01:04:07 –> 01:04:10
Well, what we’ll do is we’ll send out the whole presentation.

01:04:10 –> 01:04:16
It’s got a lot more detail on like this, you know, government level stuff that’s happening.

01:04:16 –> 01:04:21
I really wanted to, you know, focus the time that we had together today on like

01:04:21 –> 01:04:26
just the stuff you can look at with your own site, like really close to home.

01:04:26 –> 01:04:29
But yeah, there’s a lot going on.

01:04:29 –> 01:04:33
And, you know, we’re trying to keep our finger on the pulse and keep you informed

01:04:33 –> 01:04:36
on how it impacts newspapers as best we can.

01:04:36 –> 01:04:38
So we’ll continue doing that.

01:04:38 –> 01:04:40
Stay tuned.

01:04:40 –> 01:04:43
And for more, you know, webinars in the future.

01:04:43 –> 01:04:46
Cliff has a comment here.

01:04:46 –> 01:04:50
No questions. Really appreciate Our-Hometown offering these webinars

01:04:50 –> 01:04:54
for some great insight into issues in our industry.

01:04:54 –> 01:04:58
Very helpful, especially for small shops such as mine.

01:04:58 –> 01:05:00
I appreciate that, Cliff, very much.

01:05:00 –> 01:05:03
Exactly my goal.

01:05:03 –> 01:05:07
So thank you and Andrea, thank you.

01:05:07 –> 01:05:11
Very, very kind for you to join us and give us your time.

01:05:11 –> 01:05:15
So it’s nice talking with you all on the chat.

01:05:15 –> 01:05:17
And yeah, just a real quick shout out.

01:05:17 –> 01:05:20
Chris has got our wrap-up slide here.

01:05:20 –> 01:05:26
If you’re not with us yet and you want to see what your newspaper could look like on

01:05:26 –> 01:05:30
WordPress, just scan that app or go to our website.

01:05:31 –> 01:05:34
You know, you can fill out a contact us form, give us a call.

01:05:34 –> 01:05:40
What we usually like to do is have you send us a copy of your print PDF

01:05:40 –> 01:05:43
and we’ll just turn it into a website, which is what we basically do

01:05:43 –> 01:05:48
every week for all of our customers, for the majority of our customers,

01:05:48 –> 01:05:51
is the full service management of the site.

01:05:51 –> 01:05:56
So just a quick plug for Our-Hometown there and the rapid prototyping.

01:05:56 –> 01:06:00
But I think that’s all our content for today.

01:06:00 –> 01:06:05
So again, we’ll follow up with an email and please share the recording

01:06:05 –> 01:06:06
with anyone that you’d like.

01:06:06 –> 01:06:12
It’s just going to be open on YouTube and Christopher, thank you so much, sir.

01:06:12 –> 01:06:13
A great job.

01:06:13 –> 01:06:13
Thank you, Matt.

01:06:13 –> 01:06:16
Appreciate your expertise as always.

01:06:16 –> 01:06:20
And Teri, thank you for jumping on, and everyone, appreciate your time.

01:06:20 –> 01:06:22
We’ll hopefully see you next time.
