Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instagram fix #593

Closed
wants to merge 2 commits into from
Closed

Instagram fix #593

wants to merge 2 commits into from

Conversation

richardgetz
Copy link
Contributor

Instagram doesn't allow HEAD requests. These updates fix that.

richardgetz and others added 2 commits May 6, 2020 15:19
Updated Instagram error typ to avid HEAD requests. Should work as nor…
@sdushantha
Copy link
Member

sdushantha commented May 6, 2020

@richardgetz Awesome! We really appreciate you helping out!

Just a quick question, why did you edit some of the lines and make them likes this?

image

EDIT: I think I understand why you did that, you want to follow the 79 char limit, right? Thats seems fine to me. 👍

@richardgetz
Copy link
Contributor Author

I have a beautifier extension on Atom so it must have done that automatically. I also realized this isn't the best fix since you are programmatically updating the data list. Let me see if there is a smarter way to check for the 405 error on the HEAD request to force a GET request.

@richardgetz
Copy link
Contributor Author

Actually, are you programmatically pulling the sites or only their rankings? If it is only the rankings then what I did is fine.

@sdushantha
Copy link
Member

sdushantha commented May 6, 2020

We are programmatically pulling the rankings and then sorting them alphabetically.

Since you are using new error message, method_not_allowed, we need to add it to the tests as well and @hoadlck will probably want to know about this as he is restructuring a lot of the code #590

Edit: ignore what said about the tests, I dont think it needs any changes (I actually know nothing about tests, so I could be VERY wrong about it)

@richardgetz
Copy link
Contributor Author

@hoadlck just a suggestion while you are restructuring. The module cfscrape is able to handle any cloudflare throttling issues you may run into. When I swapped it out locally my search seemed to complete faster.

@sdushantha sdushantha requested a review from hoadlck May 6, 2020 20:00
@hoadlck
Copy link
Contributor

hoadlck commented May 7, 2020

I see the addition of the "method_not_allowed" detection type, but it appears to be exactly the same as "status_code" detection type.

There is a bunch of churn because of cosmetic changes, which makes it hard to review the content. Style changes should be a separate PR. Perhaps I am missing the substantial content.

Adding a new detection type definitely will drive changes to the tests.

@sdushantha I would prefer to have the existing restructuring code merged so future PRs will be based off it. Most of the cosmetic changes in this PR effect code that have been moved around or re-written, so it will cause non-trivial merges with no value. If you would rather push out the restructure merge until later, I can do that.

As far as using cfscrape goes, yes that is the type of direction that we will have to go (Sherlock will have to be Javascript aware). But, this module has a dependancy on Node.js, so it would take some research on how to package the dependency along with Python content. So, I would want to defer that for now.

Copy link
Contributor

@hoadlck hoadlck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the only content change is to change

elif error_type == "status_code":

to

elif error_type == "status_code" or error_type == "method_not_allowed":

This just makes method_not_allowed an alias of status_code. Which is not going to help anything.

Am I missing something?

@richardgetz
Copy link
Contributor Author

richardgetz commented May 7, 2020

If you don't want to change the errorType then you'll have to add a special check or loop back for sites that give a 405 error on head requests. Probably not a bad idea for the rework.

@hoadlck
Copy link
Contributor

hoadlck commented May 9, 2020

OK, I understand. The other formatting changes were adding too much noise.

Adding a new detection type would add too much churn to the code. It also would make it harder to understand. The current detection logic still needs major revisions to allow more flexibility in supporting new types without revamping.

While the restructuring is not done, the direction I am planning is to have a set number of attributes for all sites, but then optional custom attributes for a specific detection method. So, I restructured this as an additional attribute to the existing HTTP Status detection method.

This code has been included in #599

Thanks!

@hoadlck hoadlck closed this May 9, 2020
@richardgetz
Copy link
Contributor Author

richardgetz commented May 9, 2020

I mean your fix was essentially the same change you just added an entirely new value in the json and then changed an if statement earlier on. What would've made more sense was not adding the field in the json and instead checking the error code type prior to defining a HEAD request. This would automatically fix any future issues should another provider longer allow HEAD requests. I would've implemented this to start but was under the impression the rework would address this issue.

@hoadlck
Copy link
Contributor

hoadlck commented May 9, 2020

If you want to do that change, then open a new PR. I was trying to get all development pre-restructure merged into the main line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants