add basic permissive robots.txt #1777

Merged: 1 commit, Sep 21, 2023

@megahirt (Collaborator) commented Sep 21, 2023

Fixes #1776

Description

It appears that one of the reasons languageforge.org does not show up in Google search results is the absence of a robots.txt file (who knew?).
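For reference, here is a minimal sketch of what a "basic permissive" robots.txt typically contains, checked with Python's standard-library urllib.robotparser. The file contents are an assumption, since this page does not show the committed file, and build_parser is a hypothetical helper name.

```python
# Sketch: confirm that a basic permissive robots.txt lets any crawler fetch
# any path. The file contents below are an ASSUMPTION; the robots.txt
# actually committed in this PR may differ.
from urllib.robotparser import RobotFileParser

PERMISSIVE_ROBOTS_TXT = """\
User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(PERMISSIVE_ROBOTS_TXT)
    for path in ("/", "/about"):
        url = "https://languageforge.org" + path
        # "Allow: /" under "User-agent: *" matches every path for every agent.
        print(path, rules.can_fetch("Googlebot", url))
```

Running it prints True for both paths. To check the deployed file instead of a local string, RobotFileParser("https://languageforge.org/robots.txt") followed by .read() fetches and parses it over the network.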

Screenshots

[Screenshot from Google Search Console]

Checklist

  • I have labeled my PR with: bug, feature, engineering, security fix or testing
  • I have performed a self-review of my own code
  • I have reviewed the title & description of this PR which I will use as the squashed PR commit message
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have enabled auto-merge (optional)

Testing

This PR needs to be merged first; testing can really only be done once the robots.txt has shipped to production.

@megahirt added the bug label (An existing problem with our app in production) Sep 21, 2023
@megahirt requested a review from rmunn September 21, 2023 01:15
@megahirt self-assigned this Sep 21, 2023
@megahirt enabled auto-merge (squash) September 21, 2023 01:17
@github-actions commented

Unit Test Results

362 tests   362 ✔️ passed   18s ⏱️
 37 suites     0 💤 skipped
  1 file       0 ❌ failed

Results for commit 7c18c3a.

@rmunn (Collaborator) previously requested changes Sep 21, 2023

I don't think it's a good idea to just allow every bot to index every part of our site. I think we want to consider what parts to disallow indexing, e.g. Disallow: /projects/ and several other URLs.

I also think we might want to disallow certain bots specifically, copying what other people have done. E.g., https://en.wikipedia.org/robots.txt blocks various bots, with comments explaining specifically why each one is blocked. We don't really have time to do extensive research, so copying the bots that other sites have chosen to block might be a good plan.

However, if you want to just allow all bots now and edit the robots.txt later on to disallow certain ones, I won't be offended if you go ahead and dismiss this review and merge this as-is.
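A sketch of the stricter policy suggested in this review, again checked with Python's urllib.robotparser: the site stays crawlable except under /projects/, and one named crawler is blocked entirely. The bot name ExampleBadBot and the helper allowed are hypothetical, and /projects/ is only the example path from the comment above, not a vetted list of LF URLs.

```python
# Sketch of the stricter robots.txt suggested in review. "ExampleBadBot" is a
# HYPOTHETICAL user agent, and /projects/ is just the example path from the
# review comment, not a researched list of URLs or bots.
from urllib.robotparser import RobotFileParser

STRICTER_ROBOTS_TXT = """\
# Block one specific crawler everywhere (hypothetical name).
User-agent: ExampleBadBot
Disallow: /

# Everyone else may crawl the site except project pages.
User-agent: *
Disallow: /projects/
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "https://languageforge.org"
    for agent, path in [
        ("Googlebot", "/"),               # allowed: no rule matches "/"
        ("Googlebot", "/projects/demo"),  # blocked by the "*" group
        ("ExampleBadBot", "/"),           # blocked by its own group
    ]:
        print(agent, path, allowed(STRICTER_ROBOTS_TXT, agent, base + path))
```

One design point worth knowing: a compliant crawler follows only the most specific User-agent group that matches it, so ExampleBadBot ignores the * group entirely. A path rule meant for every bot, including the named ones, has to be repeated in each named group.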

@megahirt (Collaborator, Author) commented

> I don't think it's a good idea to just allow every bot to index every part of our site. I think we want to consider what parts to disallow indexing, e.g. Disallow: /projects/ and several other URLs.

Our implicit policy has historically been "index what you want," since we had no robots.txt file with instructions otherwise. The fact that Google changed their stance to "we won't index you without explicit allowance defined in robots.txt" doesn't mean that we have changed our policy. There are only a few pages that are publicly accessible (and that we want indexed). All of the LF content is behind a login and cannot be indexed anyway. When we reach a point where we have public projects with public data, we will surely want that indexed as well.

So, I'd like to move forward with the simple robots.txt as proposed for expediency's sake, and address other concerns in a separate PR. I do have some concerns about simply copying a robots.txt from a large org. We may have different goals, and those rules and policies are all based on the observed behavior of various bots, which inevitably changes over time. So I cannot say that blocking bots that have behaved badly toward one org is necessarily a good move on our part. Just some more thought on that.

@megahirt (Collaborator, Author) commented

It's strange that there is a code formatting failure for files that weren't touched by this PR. I will ignore that for this PR and address it in a separate PR.

@megahirt dismissed rmunn's stale review September 21, 2023 19:12

with permission, moving forward with this PR for expediency

@megahirt merged commit 68d75e5 into develop Sep 21, 2023
16 of 17 checks passed
@megahirt deleted the bug/add-robots-txt branch September 21, 2023 19:14
Labels
bug An existing problem with our app in production
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: languageforge.org does not show up in Google search results when searching for "language forge"
2 participants