Block internet search engines from indexing the mirror #48

Closed
bardiharborow opened this issue Jan 30, 2018 · 9 comments

@bardiharborow

If possible, could you stop internet search engines from indexing your mirror? There is very minimal benefit for clearnet users to run across three (WMF, WikiVisually and ipfs) different copies of the Wikipedia article every time they search for something.

@Stebalien
Member

> There is very minimal benefit for clearnet users to run across three (WMF, WikiVisually and ipfs) different copies of the Wikipedia article every time they search for something.

The benefit is entirely for clearnet users. Tor users, for example, will (almost) always be able to access Wikipedia over Tor, so they'll see little benefit.

We should probably add a rel=canonical link pointing to Wikipedia to the head of each page but I haven't thought through the possible ramifications/downsides of this approach.
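As a rough illustration of what adding such a link could involve, here is a minimal Python sketch that inserts a `<link rel="canonical">` element before `</head>`. The `WIKIPEDIA_BASE` value, the function names, and the idea of deriving the URL from an article title are assumptions made for this example; the mirror's actual build pipeline may work quite differently.

```python
import html
import re
from urllib.parse import quote

# Assumption: every mirrored page maps to an English Wikipedia article title.
WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"


def canonical_tag(title: str) -> str:
    """Build a <link rel="canonical"> element for a given article title."""
    url = WIKIPEDIA_BASE + quote(title.replace(" ", "_"), safe="/()_,:")
    return f'<link rel="canonical" href="{html.escape(url, quote=True)}">'


def add_canonical(page: str, title: str) -> str:
    """Insert the canonical link before </head> unless one already exists."""
    if re.search(r'<link[^>]+rel=["\']?canonical', page, flags=re.IGNORECASE):
        return page  # leave pages that already declare a canonical URL alone
    tag = canonical_tag(title)
    # A callable replacement avoids backslash-escape surprises in re.sub.
    return re.sub(r"</head>", lambda m: tag + m.group(0), page,
                  count=1, flags=re.IGNORECASE)
```

Calling `add_canonical(page, "Alan Turing")` on a page that has a `<head>` would add a link to `https://en.wikipedia.org/wiki/Alan_Turing`; running it a second time leaves the page unchanged.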

@bardiharborow
Author

bardiharborow commented Jan 30, 2018

> The benefit is entirely for clearnet users. Tor users, for example, will (almost) always be able to access Wikipedia over Tor, so they'll see little benefit.

If anonymity is the concern, then accessing IPFS through the ipfs.io endpoint is no more anonymous to large-scale surveillance than accessing Wikipedia directly, and if anything I trust the Wikimedia Foundation to handle server logs better than ipfs.io. Users of the actual IPFS software will presumably discover the mirror through different means than Google, and will not be impacted by this change.

> We should probably add a rel=canonical link pointing to Wikipedia to the head of each page but I haven't thought through the possible ramifications/downsides of this approach.

Doing so would have the intended effect of removing the mirror from Google search results, and it is actually the preferred way to implement this.

@Stebalien
Member

@bardiharborow

> If anonymity is the concern, then accessing IPFS through the ipfs.io endpoint is no more anonymous to large-scale surveillance than accessing Wikipedia directly, and if anything I trust the Wikimedia Foundation to handle server logs better than ipfs.io. Users of the actual IPFS software will presumably discover the mirror through different means than Google, and will not be impacted by this change.

Ah, I think the confusion may be around the definition of "clearnet". IPFS is a clearnet. That is, it's not a darknet (it provides no anonymity at the moment). Darknets get no benefit because the exit nodes tend to be in countries with strong free speech laws.

> Users of the actual IPFS software will presumably discover the mirror through different means than Google, and will not be impacted by this change.

Unlikely. We don't have any IPFS search mechanisms and rely entirely on web search engines. That's probably one of the reasons we don't use rel=canonical links.

@rameshvarun

+1 for setting rel='canonical' links. I'm starting to see the mirror pop up frequently on the first page of Google results just from normal everyday use. Canonical links should avoid this duplication and make the mirror a good web citizen.

@nemobis

nemobis commented Oct 22, 2018

I agree with adding the rel="canonical": it's annoying to see search duplicates. By not indexing outdated content, you'll also alleviate the concerns raised in other issues such as #55 and #49.

Actually, what's the purpose of indexing all the pages at all? A noindex meta tag may be appropriate.
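To make the two options discussed so far concrete, the following Python sketch reads a page and reports which indexing hints it declares: a canonical URL, a robots meta tag such as `<meta name="robots" content="noindex">`, or neither. The class and function names are made up for this example, and it only inspects static HTML, not HTTP headers.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Collect the canonical URL and robots directives declared in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.robots.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def indexing_hints(page: str):
    """Return (canonical_url_or_None, set_of_robots_directives)."""
    parser = IndexingHints()
    parser.feed(page)
    parser.close()
    return parser.canonical, parser.robots
```

A page is asking not to be indexed when `"noindex"` appears in the returned set; a page that only sets a canonical URL is asking search engines to treat the other URL as the preferred copy.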

@wesleylima

The lack of a canonical tag comes from the HTML generated by Kiwix's mwoffliner. I opened an issue: openzim/mwoffliner#564

@nemobis

nemobis commented Apr 30, 2019

> The lack of a canonical tag comes from the HTML generated by Kiwix's mwoffliner.

I understand, but you can also add a canonical link in the webserver response headers.
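For reference, the header form is `Link: <URL>; rel="canonical"`. A minimal Python sketch of how a server-side layer might attach it is shown below, written as a small WSGI wrapper. The `/wiki/` path prefix, the `.html` suffix handling, and all names here are hypothetical; how the real gateway serves the mirror is not described in this thread.

```python
from urllib.parse import quote, unquote

WIKIPEDIA_BASE = "https://en.wikipedia.org/wiki/"  # assumption
MIRROR_PREFIX = "/wiki/"  # hypothetical path layout of the mirror


def canonical_link_header(path):
    """Return a ("Link", value) header pair for an article path, or None."""
    if not path.startswith(MIRROR_PREFIX):
        return None
    title = unquote(path[len(MIRROR_PREFIX):])
    if title.endswith(".html"):
        title = title[: -len(".html")]
    if not title:
        return None
    url = WIKIPEDIA_BASE + quote(title, safe="/()_,:")
    return ("Link", f'<{url}>; rel="canonical"')


def with_canonical(app):
    """Wrap a WSGI app so article responses carry a canonical Link header."""
    def wrapped(environ, start_response):
        header = canonical_link_header(environ.get("PATH_INFO", ""))

        def start(status, headers, exc_info=None):
            if header:
                headers = list(headers) + [header]
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped
```

The advantage of this route is that it needs no change to the generated HTML, so it could be applied to existing snapshots.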

@lidel
Member

lidel commented Sep 2, 2019

I fixed this upstream (openzim/mwoffliner#963) 👌
Old snapshots are about to be excluded via /robots.txt (ipfs-inactive/website#334)

Remaining steps before this issue can be closed:

OR:

I will be checking on the mwoffliner/kiwix situation, but if someone has spare bandwidth and can help speed things up, please contribute upstream and post updates here.
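As an illustration of the /robots.txt exclusion mentioned above, this Python sketch uses the standard library's `urllib.robotparser` to check whether a given URL would be crawlable under a set of rules. The rule shown and the snapshot path `/ipfs/QmOldSnapshotExample/` are placeholders invented for the example, not the rules from ipfs-inactive/website#334.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one old snapshot path for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ipfs/QmOldSnapshotExample/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

One design point to keep in mind: a page blocked by robots.txt is never fetched by a compliant crawler, so any canonical link or noindex tag on that page will not be seen. Blocking old snapshots and adding canonical links to new ones are therefore complementary rather than interchangeable.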

@lidel
Member

lidel commented Feb 15, 2021

This has been fixed by #65 and will be solved upstream when new snapshots are published as part of #60 and #61.

lidel closed this as completed Feb 15, 2021

6 participants