Add reason why the archive bot is joining the room #262

Merged
merged 9 commits into from
Jun 9, 2023
49 changes: 35 additions & 14 deletions docs/faq.md
@@ -17,31 +17,52 @@ And with the introduction of the jump to date API via
[MSC3030](https://github.com/matrix-org/matrix-spec-proposals/pull/3030), we could show
messages from any given date and day-by-day navigation.

## How do I opt out and keep my room from being indexed by search engines?
## Why did the archive bot join my room?

All public Matrix rooms are accessible to view in the Matrix Public Archive. But only
rooms with history visibility set to `world_readable` are indexable by search engines.
Only public Matrix rooms with `shared` or `world_readable` [history
visibility](https://spec.matrix.org/v1.6/client-server-api/#room-history-visibility) are
accessible in the Matrix Public Archive.

Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
opt out controls.
But the archive bot (`@archive:matrix.org`) will join any public room because it doesn't
know the history visibility without first joining. Any room without `world_readable` or
`shared` history visibility will lead to a `403 Forbidden`. And if the public room is in
the room directory, it will be listed in the archive but will still lead to a `403
Forbidden` in that case.

For [archive.matrix.org](https://archive.matrix.org/), you can ban the
`@archive:matrix.org` user if you don't want your room content to be shown in the
archive at all.
The Matrix Public Archive doesn't hold onto any data (it's
stateless) and requests the messages from the homeserver every time. The
[archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
minutes for the current day, and 2 days for past content.

The Matrix Public Archive only allows rooms with `world_readable` history visibility to
be indexed by search engines. See the [opt
out](#how-do-i-opt-out-and-keep-my-room-from-being-indexed-by-search-engines) topic
below for more details.

## Why does the archive user join rooms instead of browsing them as a guest?
### Why does the archive user join rooms instead of browsing them as a guest?

Guests require `m.room.guest_access` to access a room. Most public rooms do not allow
guests because even the `public_chat` preset when creating a room does not allow guest
access. Not being able to view most public rooms is the major blocker on being able to
use guest access. The idea is if I can view the messages from a Matrix client as a
random user, I should also be able to see the messages in the archive.

Keep in mind that only rooms with history visibility set to `world_readable` are
indexable by search engines. The Matrix Public Archive doesn't hold onto any data (it's
stateless) and requests the messages from the homeserver every time. The
[archive.matrix.org](https://archive.matrix.org/) instance has some caching in place, 5
minutes for the current day, and 2 days for past content.
Guest access is also a much different ask than read-only access since guests can also
send messages in the room which isn't always desirable. The archive bot is read-only and
does not send messages.

## How do I opt out and keep my room from being indexed by search engines?

Only public Matrix rooms with `shared` or `world_readable` history visibility are
accessible to view in the Matrix Public Archive. But only rooms with history visibility
set to `world_readable` are indexable by search engines.

Also see https://github.com/matrix-org/matrix-public-archive/issues/47 to track better
opt out controls.

As a workaround for [archive.matrix.org](https://archive.matrix.org/) today, you can ban
the `@archive:matrix.org` user if you don't want your room content to be shown in the
archive at all.

## Technical details

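The rules spelled out in the FAQ changes above can be summarized in a few lines of code. This is only an illustrative sketch, not code from the matrix-public-archive repository: `describeArchiveAccess` and the shape of its return value are hypothetical names chosen here, and the only facts taken from the FAQ are that `shared` and `world_readable` rooms are viewable, that only `world_readable` rooms are indexable, and that anything else results in a `403 Forbidden`.

```javascript
// Illustrative sketch only: the helper name and return shape are assumptions,
// not part of the matrix-public-archive codebase.
const ACCESSIBLE_VISIBILITIES = new Set(['shared', 'world_readable']);

function describeArchiveAccess(historyVisibility) {
  // Per the FAQ: only `shared` or `world_readable` rooms can be viewed in the archive.
  const accessible = ACCESSIBLE_VISIBILITIES.has(historyVisibility);
  return {
    accessible,
    // Per the FAQ: only `world_readable` rooms may be indexed by search engines.
    indexable: historyVisibility === 'world_readable',
    // Per the FAQ: any other history visibility leads to a 403 Forbidden.
    httpStatus: accessible ? 200 : 403,
  };
}
```

For example, `describeArchiveAccess('shared')` reports a room that is viewable but not indexable, while `describeArchiveAccess('joined')` reports a `403`.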
8 changes: 8 additions & 0 deletions server/lib/matrix-utils/ensure-room-joined.js
@@ -43,6 +43,14 @@ async function ensureRoomJoined(
method: 'POST',
accessToken,
abortSignal,
body: {
reason:
`Joining room to check history visibility. ` +
`If your room is public with shared or world readable history visibility, ` +

I think this is hostile wording towards the users as the term shared doesn't appear outside of the spec. Please consider using client terminology such as Element Web members-only?

Contributor Author

I've updated the FAQ with the equivalents you might see in the UI but it's not possible to be exhaustive for how every client might expose these options. We're also trying to be brief and to the point with this join reason so I'm going to just let the pointer to the FAQ take the lead for people trying to understand more.

`it will be accessible at archive.matrix.org. ` +
`See the FAQ for more details: ` +
`https://github.com/matrix-org/matrix-public-archive/blob/main/docs/faq.md#why-did-the-archive-bot-join-my-room`,
},
});
assert(
joinData.room_id,
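For readers unfamiliar with how a join reason reaches the homeserver, here is a minimal standalone sketch of the same idea using the Matrix Client-Server API's `POST /_matrix/client/v3/join/{roomIdOrAlias}` endpoint, which accepts an optional `reason` in the request body. The helper names (`buildJoinRequest`, `joinRoomWithReason`) and the parameters are assumptions made for illustration; the project's own `ensureRoomJoined` uses its own fetch helper, which is not reproduced here.

```javascript
// Illustrative sketch only: helper names and parameters are assumptions,
// not the project's actual implementation.
function buildJoinRequest(homeserverUrl, roomIdOrAlias, accessToken, reason) {
  // Room IDs and aliases contain characters like `#` and `:` that must be URL-encoded.
  const url = `${homeserverUrl}/_matrix/client/v3/join/${encodeURIComponent(roomIdOrAlias)}`;
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      // The `reason` is attached to the membership event so room members can see it.
      body: JSON.stringify({ reason }),
    },
  };
}

async function joinRoomWithReason(homeserverUrl, roomIdOrAlias, accessToken, reason) {
  const { url, options } = buildJoinRequest(homeserverUrl, roomIdOrAlias, accessToken, reason);
  // Uses the global `fetch` available in Node.js 18+ and browsers.
  const response = await fetch(url, options);
  const data = await response.json();
  if (!response.ok || !data.room_id) {
    throw new Error(`Join failed with HTTP ${response.status}`);
  }
  return data.room_id;
}
```

Splitting request construction from the network call keeps the URL encoding and body shape easy to check without a live homeserver.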