[FEATURE REQUEST] Spider mode for feroxbuster #407
Does a single word in the word list and extract links do what you're looking for?
It does; that was essentially the alternative method I mentioned. An option like this would really just be syntactic sugar to make it easier to express that type of behavior, but I can also completely understand not wanting to pollute the interface with extra options that aren't really necessary (especially if it's non-trivial to implement).
That's what I thought you meant, just wanted to be sure. I'm not against adding it. I think the tool has grown a bit beyond the original "simple" tagline, lol. I don't know of anyone who used it in this particular way, but that may be because it wasn't intuitive to invoke the behavior. Implementation-wise, I don't think it would be much beyond adding the flag. We'd need to handle the check for an empty wordlist, and after that just kinda see what the code does... lol. If it handles it mostly gracefully, that'd be basically it. I've been thinking about adding a 'bag of observed words' kind of thing. Similar to
For sure. Maybe it makes sense to stick to the "brute force" approach, just wanted to throw this out there as a possible enhancement. I've had some trouble finding a good spidering tool, and feroxbuster may be better than existing tools I've looked at even for that.
That's a neat idea. Would there be any filtering logic, or would it simply fetch everything that looks like a word and add it (I wonder if doing things like splitting existing links into words might generate good results)? It's also interesting to think about how a dynamically changing word list might work in practice. I imagine you could do an initial fetch and then have all requests share the same list for the duration of the scan, or are you thinking words would incrementally get added during scanning?
One idea would be to borrow from NLP. Each word would be filtered first against a set of stop words (the, is, am, was, etc.). After that, it'd be added to a structure that keeps track of its frequency in the page. We could then filter based on some pre-chosen TF-IDF (how important a word is in relation to the document) cutoff value. The other approach would be, just like you said, to simply add any not-previously-seen word to the wordlist.
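To make the idea concrete, here is a minimal sketch of the stop-word-plus-frequency-cutoff filtering described above. It is not feroxbuster code; the function name, the stop-word set, and the cutoff value are all illustrative. For simplicity it uses a plain term-frequency threshold within one page rather than a full TF-IDF score across a corpus.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical "bag of observed words" filter: drop stop words, count term
/// frequency within the page, and keep only words whose relative frequency
/// clears a cutoff. (A real TF-IDF score would also weight by how rare the
/// word is across all pages seen so far.)
fn extract_candidate_words(page_text: &str, cutoff: f64) -> Vec<String> {
    let stop_words: HashSet<&str> =
        ["the", "is", "am", "was", "a", "an", "and", "of", "to"].into();

    let mut counts: HashMap<String, usize> = HashMap::new();
    let mut total = 0usize;

    for word in page_text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        let w = word.to_lowercase();
        if stop_words.contains(w.as_str()) {
            continue; // filtered before it ever reaches the frequency table
        }
        total += 1;
        *counts.entry(w).or_insert(0) += 1;
    }

    let mut kept: Vec<String> = counts
        .into_iter()
        .filter(|(_, n)| *n as f64 / total as f64 >= cutoff)
        .map(|(w, _)| w)
        .collect();
    kept.sort();
    kept
}

fn main() {
    // "admin" dominates this toy page, so it survives the 0.5 cutoff while
    // the one-off words ("panel", "login", "for") are discarded.
    let page = "the admin panel is the admin login for the admin";
    println!("{:?}", extract_candidate_words(page, 0.5));
}
```

Splitting extracted links on non-alphanumeric characters, as discussed earlier in the thread, would feed path segments through the same filter for free.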
Very similar logic is already in extract links. It seems to work out pretty well in the way it's used now. I suspect it would still be useful.
I fear it'd be a non-trivial amount of work compared to how the wordlist works now.
I'm not entirely sure how you meant this, but the way I interpret it makes it sound like it'd be limited. Ideally, extracted words would be tried on every new directory and extracted from every new page, updating any future directory scans (basically, "words would incrementally get added during scanning" is how I think it ought to behave).
I think you're right that it would be pretty limited to populate it only once at the start of the scan. I guess I was mostly wondering how complex it would be to have a mutable word list that gets shared / updated by several concurrent processes that are all making requests, but maybe that's not a big deal depending on how it's implemented.
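The shared, growable wordlist is a fairly standard shared-state pattern in Rust. Here is a small sketch, assuming nothing about feroxbuster's internals: workers hold an `Arc<Mutex<HashSet<String>>>`, and any word a worker discovers becomes visible to every later directory scan. The function and word names are made up for illustration.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawn a few mock "scan workers" that each discover one new word and add
/// it to a wordlist shared behind Arc<Mutex<..>>. Returns the final size of
/// the list (2 seed words + 4 discovered words).
fn run_workers() -> usize {
    let wordlist = Arc::new(Mutex::new(HashSet::from([
        "admin".to_string(),
        "login".to_string(),
    ])));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let words = Arc::clone(&wordlist);
            thread::spawn(move || {
                // Stand-in for a word extracted from a fetched page.
                let discovered = format!("page{}", i);
                words.lock().unwrap().insert(discovered);
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let n = wordlist.lock().unwrap().len();
    n
}

fn main() {
    println!("wordlist now has {} entries", run_workers());
}
```

In an async runtime the shape is the same with `tokio`'s lock types; the main design question is lock contention, which a read-heavy workload could reduce with `RwLock` instead of `Mutex`.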
Sorry, I got a bit sidetracked. I looked at some other tools I've used in the past: I haven't looked at either one any time recently. Did you try either of these? It looks like hakrawler stripped out a lot of its initial functionality.
Thanks! I wasn't aware of these and will take a look.
I was mostly asking to see how they compared to you running feroxbuster for wordlist-less crawling.
Looks like feroxbuster gives about the same number of results as Photon and hakrawler based on a quick check. I did notice, though, that hakrawler was a lot faster than both feroxbuster and Photon, so maybe there are some opportunities to optimize the scan for feroxbuster. Here was the ferox command (single-slash.txt contains only the line "/"); it took 10-15 seconds to run on my computer:

~ $ feroxbuster -u https://www.yahoo.com -w single-slash.txt --extract-links
___ ___ __ __ __ __ __ ___
|__ |__ |__) |__) | / ` / \ \_/ | | \ |__
| |___ | \ | \ | \__, \__/ / \ | |__/ |___
by Ben "epi" Risher 🤓 ver: 2.4.0
───────────────────────────┬──────────────────────
🎯 Target Url │ https://www.yahoo.com
🚀 Threads │ 50
📖 Wordlist │ single-slash.txt
👌 Status Codes │ [200, 204, 301, 302, 307, 308, 401, 403, 405, 500]
💥 Timeout (secs) │ 7
🦡 User-Agent │ feroxbuster/2.4.0
💉 Config File │ /home/dsaxton/.config/feroxbuster/ferox-config.toml
🔎 Extract Links │ true
🔃 Recursion Depth │ 4
───────────────────────────┴──────────────────────
🏁 Press [ENTER] to use the Scan Cancel Menu™
──────────────────────────────────────────────────
200 864l 15813w 0c https://www.yahoo.com/news/m-frosted-flakes-man-kevin-143000649.html
200 1l 171w 16605c https://www.yahoo.com/lib/metro/g/myy/rapidworker_1_2_0.0.40.js
500 1l 2w 28c https://www.yahoo.com/tdv2_fp/api/resource/NotificationHistory.getHistory
WLD 2l 5w 0c Got 403 for https://www.yahoo.com/lib/metro/21cd0d935f464b8aaece4f992787fcd0 (url length: 32)
200 1l 1w 42c https://www.yahoo.com/px.gif
302 1l 14w 260c https://www.yahoo.com/photo?psize=24X24&fallback_url=https%3A%2F%2Fs.yimg.com%2Fdh%2Fap%2Fsocial%2Fprofile%2Fprofile_a24.png&alphatar_photo=true&format=image
200 6l 19w 153c https://www.yahoo.com/p.gif?beaconType=darlaFetcherBeacon&
302 1l 14w 201c https://www.yahoo.com/finance/news/kyle-rittenhouse-ipad-pinch-to-zoom-lawyers-claim-142110207.html
200 1l 13w 158c https://www.yahoo.com/lib/metro/g/myy/advertisement_0.0.20.js
302 1l 14w 215c https://www.yahoo.com/sports/mike-zimmer-says-vikings-player-hospitalized-due-to-covid-19-symptoms-211029008.html
200 68l 110w 1856c https://www.yahoo.com/manifest_desktop_us.json
WLD 143l 380w 4471c Got 403 for https://www.yahoo.com/ws/v3/mailboxes/6c24c0c9766e4978911daf4dc0efde85 (url length: 32)
WLD - - - Wildcard response is dynamic; auto-filtering (4462 + url length) responses; toggle this behavior by using --dont-filter
WLD 143l 380w 4535c Got 403 for https://www.yahoo.com/ws/v3/mailboxes/5d7e967730c54f48b0866fb6591a9fa0c90183519e2d4fb099591cda9de10910cbe18806988d414b9a3f54d69677e177 (url length: 96)
200 1813l 20212w 0c https://www.yahoo.com/
[####################] - 14s 704/704 0s found:14 errors:0
[####################] - 13s 1/1 0/s https://www.yahoo.com
[####################] - 12s 1/1 0/s https://www.yahoo.com/fpjs/
[####################] - 12s 1/1 0/s https://www.yahoo.com/myjs/
[####################] - 12s 1/1 0/s https://www.yahoo.com/
[####################] - 0s 1/1 1/s https://www.yahoo.com/lifestyle/
[####################] - 0s 2/1 6/s https://www.yahoo.com/lib/metro/
[####################] - 0s 1/1 1/s https://www.yahoo.com/lib/metro/g/
[####################] - 0s 1/1 1/s https://www.yahoo.com/plus/mail/
[####################] - 0s 3/1 8/s https://www.yahoo.com/ws/v3/mailboxes/

This command finishes in a couple of seconds:

echo https://www.yahoo.com | hakrawler
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@epi052 Maybe this is out of scope for feroxbuster, so we can close. FWIW, I played around with creating a web crawler in Rust that seems to give some reasonable results: https://github.com/dsaxton/wrake. It could definitely be improved a lot, though; e.g., I think it's throwing out a lot of valid links, and it could possibly benefit from using async/await.
@dsaxton So, I've been going back and forth on this one. I'm of the mind that I don't want ferox creeping off into other realms of related stuff. One of the original reasons I wanted to write it was to have a single tool that did a single thing really well. As far as using ferox as a crawler goes, we could document that it's possible using your workaround, but that it's not necessarily intended to act as a crawler, and there are likely better options (maybe summarizing what you've found in your testing of other tools and what you've learned writing wrake?). That's ultimately where I've landed on this: I'd like to keep ferox solely as a directory brute-forcing tool. If you don't want to add any related documentation, I'm absolutely OK with that; just re-close this ticket and folks can find it via search if it ever makes sense. If you do feel like writing it up, the docs live at https://github.com/epi052/feroxbuster-docs now. Thanks again!
I took a look at wrake, and yes, at a quick glance, async/await would still give you a lot more perf than what you're currently getting with just rayon.
@epi052 Thanks, I'll look into adding something to the docs soonish. I agree, though, after thinking a bit more, that it's good to keep the features focused on brute forcing, so we could say it's possible to use feroxbuster for crawling, but probably not optimal if that's the user's primary goal.
Put up a PR in the docs repo, so closing this.
Is your feature request related to a problem? Please describe.
It would be interesting if feroxbuster had a "spider mode," which would really just be the --extract-links behavior without using a word list. This would make for a nice option whenever a user wants to get a quick map of a site without also spraying the server with a lot of requests that are likely to fail.

Describe the solution you'd like
One approach could be something like feroxbuster -u https://example.com --spider, which only requests the root path and then recursively fetches based on links that are found. This would pretty much just be an alias that activates functionality feroxbuster already has, but in a more expressive and user-friendly way.

Describe alternatives you've considered
I've only tried using a very small dummy word list along with --extract-links, but maybe there is a simpler way I haven't thought of.