oXis/urlgrab (forked from IAmStoxe/urlgrab)
# Welcome to urlgrab 👋

Twitter: @DevinStokes

A Golang utility to spider through a website searching for additional links, with support for JavaScript rendering.

## Install

```shell
go get -u github.com/iamstoxe/urlgrab
```
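With Go 1.17 and later, `go get` no longer installs binaries; the `go install` equivalent (assuming the same module path as above) is:

```shell
# Installs the urlgrab binary into $GOPATH/bin (or $HOME/go/bin by default).
go install github.com/iamstoxe/urlgrab@latest
```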

## Features

- Customizable parallelism
- Ability to render JavaScript (including single-page applications such as Angular and React)
## Usage

```
Usage of urlgrab.exe:
  -debug
        Extremely verbose debugging output. Useful mainly for development.
  -delay int
        Milliseconds to randomly apply as a delay between requests. (default 2000)
  -depth int
        The maximum limit on the recursion depth of visited URLs. (default 2)
  -ignore-query
        Strip the query portion of the URL before determining if we've visited it yet.
  -ignore-ssl
        Scrape pages with invalid SSL certificates.
  -js-timeout int
        The number of seconds before a request to render JavaScript should time out. (default 10)
  -json string
        The filename where we should store the output JSON file.
  -max-body int
        The limit of the retrieved response body in kilobytes.
        0 means unlimited.
        Supply this value in kilobytes. (i.e. 10 * 1024 kb = 10 MB) (default 10240)
  -no-head
        Do not send HEAD requests prior to GET for pre-validation.
  -output-all string
        The directory where we should store the output files.
  -proxy string
        The proxy to utilize (format: socks5://127.0.0.1:8080 OR http://127.0.0.1:8080).
        Supply multiple proxies by separating them with a comma.
  -random-agent
        Utilize a random user-agent string.
  -render-js
        Determines if we utilize a headless Chrome instance to render JavaScript.
  -root-domain string
        The root domain we should match links against.
        If not specified it will default to the host of --url.
        Example: --root-domain google.com
  -threads int
        The number of threads to utilize. (default 5)
  -timeout int
        The number of seconds before a request should time out. (default 10)
  -url string
        The URL where we should start crawling.
  -user-agent string
        A user-agent string such as "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0".
  -verbose
        Verbose output.
```
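A typical crawl combining the flags above might look like this (the target URL and output filename are placeholder values):

```shell
# Crawl example.com two levels deep with 10 parallel threads,
# rendering JavaScript and writing the discovered links to JSON.
urlgrab -url https://example.com -depth 2 -threads 10 -render-js -json results.json
```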

## Author

👤 Devin Stokes

## 🤝 Contributing

Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.

## Show your support

Give a ⭐ if this project helped you!

Buy Me A Coffee
