This repository has been archived by the owner on Feb 9, 2022. It is now read-only.

Tweak wording for duplicated content #359

Merged: 1 commit, Apr 5, 2018

Conversation

hadley
Contributor

@hadley hadley commented Mar 28, 2018

No description provided.

@s-pace
Contributor

s-pace commented Mar 29, 2018

@hadley thank you for spotting it.

I would even enhance it:

Crawling can populate duplicated data from your website. This mostly happens because we crawled the same page several times (e.g. from different URLs). Given URLs like `http://website.com/page` and `http://website.com/page/` (notice the trailing `/` in the second one), the scraper will consider them different pages. This can be fixed by adding a regex to the `stop_urls` in your `config.json`:
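To make the idea concrete, here is a minimal Python sketch of how such a pattern could filter out the trailing-slash variant. It assumes each `stop_urls` entry is applied as a regex search against the full URL; the `/$` pattern and the `is_stopped` helper are illustrative assumptions, not taken from the scraper itself.

```python
import re

# Hypothetical excerpt of config.json; the "/$" pattern is an illustrative assumption.
config = {"stop_urls": [r"/$"]}


def is_stopped(url, stop_urls):
    """Return True if any stop pattern matches the URL (assumed regex-search semantics)."""
    return any(re.search(pattern, url) for pattern in stop_urls)


urls = ["http://website.com/page", "http://website.com/page/"]

# Keep only the URLs that no stop pattern matches.
kept = [u for u in urls if not is_stopped(u, config["stop_urls"])]
```

Under these assumptions only `http://website.com/page` is kept. Note that a pattern like `/$` would also match a bare root URL such as `http://website.com/`, so it may need adjusting for a given site.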

WDYT? If you like it, please commit it.

@s-pace s-pace merged commit 108caaf into algolia:master Apr 5, 2018
s-pace pushed a commit that referenced this pull request Apr 5, 2018