add basic permissive robots.txt #1777
Conversation
I don't think it's a good idea to just allow every bot to index every part of our site. I think we want to consider which parts to disallow from indexing, e.g. Disallow: /projects/ and several other URLs (see the sketch below).
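Something along these lines, where /projects/ is the one path I suggested above and the other paths are purely hypothetical placeholders for URLs we'd identify in a real audit:

```
# Sketch only: /projects/ is the suggested path;
# the others are hypothetical examples pending an audit of our routes
User-agent: *
Disallow: /projects/
Disallow: /api/
Disallow: /admin/
```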
I also think we might want to disallow certain bots specifically, copying what other people have done. E.g., https://en.wikipedia.org/robots.txt blocks various bots, with comments about specifically why each is blocked. We don't really have time to do extensive research, so copying which bots other sites have chosen to block might be a good plan.
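For the per-bot approach, a minimal sketch of that Wikipedia-style pattern; the bot names below are just common examples of crawlers other sites block, not a researched list for us:

```
# Illustrative only: bots frequently blocked elsewhere, with reasons noted

# OpenAI's crawler; some sites opt out of AI training scrapes
User-agent: GPTBot
Disallow: /

# SEO crawler often cited as aggressive
User-agent: AhrefsBot
Disallow: /
```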
However, if you want to just allow all bots now and edit the robots.txt later on to disallow certain ones, I won't be offended if you go ahead and dismiss this review and merge this as-is.
Our implicit policy has historically been "index what you want", since we had no robots.txt file with instructions otherwise. The fact that Google changed their stance to "we won't index you without an explicit allowance defined in robots.txt" doesn't mean that we have changed our policy. There are only a few pages that are publicly accessible (and that we want indexed). All of the LF content is behind a login and cannot be indexed anyway. When we arrive at a point where we have public projects with public data, we will surely want that indexed as well. So I'd like to move forward with the simple robots.txt as proposed, for expediency's sake, and address other concerns in a separate PR.

I do have some concerns about simply copying a robots.txt from a large org: we may have different goals, and their rules are all based on the observed behavior of various bots, which inevitably changes over time. So I can't say that blocking bots that have acted badly in the past with respect to one org is necessarily a good move on our part - just some more thought on that.
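For reference, a fully permissive robots.txt is tiny; presumably the file in this PR is something close to the following (sketching from the PR title, not the actual diff):

```
# Permit all crawlers to index everything
User-agent: *
Disallow:
```

An empty Disallow value means nothing is off-limits, which matches the "index what you want" policy described above.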
It's strange that there is a code-formatting failure for files that weren't touched by this PR. I will ignore that for this PR and address it in a separate one.
With permission, moving forward with this PR for expediency.
Fixes #1776
Description
It appears that one of the reasons languageforge.org does not show up in Google search results is the absence of a robots.txt file (who knew?).
Screenshots
From Google Search Console (screenshot omitted).
Checklist
Testing
This PR needs to be merged first; testing can really only be done once the robots.txt has shipped to production (see the quick check below).
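Once it has shipped, one quick way to verify (assuming the file is served at the site root on the production host mentioned above) would be:

```
# Check that the file is reachable and returns 200 after deploy
curl -i https://languageforge.org/robots.txt
```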