oXis/urlgrab (forked from IAmStoxe/urlgrab)
# Welcome to urlgrab 👋

Twitter: @DevinStokes

A Golang utility to spider through a website searching for additional links, with support for JavaScript rendering.

## Install

```shell
go get -u github.com/iamstoxe/urlgrab
```
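With Go 1.17 and later, `go get` no longer installs binaries; the `go install` equivalent (assuming the same module path as above) is:

```shell
# Installs the urlgrab binary into $GOPATH/bin (or $HOME/go/bin by default).
go install github.com/iamstoxe/urlgrab@latest
```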

## Features

- Customizable parallelism
- Ability to render JavaScript (including single-page applications such as Angular and React)
## Usage

```
Usage of urlgrab.exe:
  -debug
        Extremely verbose debugging output. Useful mainly for development.
  -delay int
        Milliseconds to randomly apply as a delay between requests. (default 2000)
  -depth int
        The maximum limit on the recursion depth of visited URLs. (default 2)
  -ignore-query
        Strip the query portion of the URL before determining if we've visited it yet.
  -ignore-ssl
        Scrape pages with invalid SSL certificates.
  -js-timeout int
        The number of seconds before a request to render JavaScript should time out. (default 10)
  -json string
        The filename where we should store the output JSON file.
  -max-body int
        The limit of the retrieved response body in kilobytes.
        0 means unlimited.
        Supply this value in kilobytes. (i.e. 10 * 1024 kb = 10 MB) (default 10240)
  -no-head
        Do not send HEAD requests prior to GET for pre-validation.
  -output-all string
        The directory where we should store the output files.
  -proxy string
        The proxy to utilize (format: socks5://127.0.0.1:8080 OR http://127.0.0.1:8080).
        Supply multiple proxies by separating them with a comma.
  -random-agent
        Utilize a random user-agent string.
  -render-js
        Determines if we utilize a headless Chrome instance to render JavaScript.
  -root-domain string
        The root domain we should match links against.
        If not specified it will default to the host of --url.
        Example: --root-domain google.com
  -threads int
        The number of threads to utilize. (default 5)
  -timeout int
        The number of seconds before a request should time out. (default 10)
  -url string
        The URL where we should start crawling.
  -user-agent string
        A user-agent string such as "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0".
  -verbose
        Verbose output.
```
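A typical crawl combining the flags above might look like this (the target URL and output filename are placeholder values):

```shell
# Crawl example.com two levels deep with 10 parallel threads,
# rendering JavaScript and writing the discovered links to JSON.
urlgrab -url https://example.com -depth 2 -threads 10 -render-js -json results.json
```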

## Author

👤 Devin Stokes

## 🤝 Contributing

Contributions, issues, and feature requests are welcome!
Feel free to check the issues page.

## Show your support

Give a ⭐ if this project helped you!

Buy Me A Coffee
