Anemone

Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

See anemone.rubyforge.org for more information.

Features

Multi-threaded design for high performance
Tracks 301 HTTP redirects to understand a page’s aliases
Built-in BFS algorithm for determining page depth
Allows exclusion of URLs based on regular expressions
Choose the links to follow on each page with focus_crawl()
HTTPS support
Records response time for each page
CLI program can list all pages in a domain, calculate page depths, and more
Obeys robots.txt
In-memory or persistent storage of pages during crawl, using TokyoCabinet or PStore
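
To illustrate what breadth-first depth calculation combined with regular-expression exclusion looks like, here is a minimal sketch in plain Ruby. It is not Anemone's implementation and does not use the Anemone API: the LINKS graph, the page_depths method, and its skip: parameter are hypothetical names invented for this example, and the "site" is an in-memory hash rather than pages fetched over HTTP.

```ruby
# Hypothetical in-memory link graph standing in for a real site:
# each key is a page URL, each value lists the URLs that page links to.
LINKS = {
  '/'            => ['/about', '/blog', '/login'],
  '/about'       => ['/', '/team'],
  '/blog'        => ['/blog/post-1', '/about'],
  '/blog/post-1' => ['/blog', '/team'],
  '/team'        => [],
  '/login'       => ['/admin']
}.freeze

# Breadth-first traversal from +start+. Returns a Hash of url => depth,
# where depth is the minimum number of links followed to reach the page.
# URLs matching any regular expression in +skip+ are never visited.
def page_depths(links, start, skip: [])
  depths = { start => 0 }
  queue = [start]
  until queue.empty?
    url = queue.shift
    links.fetch(url, []).each do |link|
      next if depths.key?(link)                        # already seen at a shallower or equal depth
      next if skip.any? { |pattern| link =~ pattern }  # excluded by pattern
      depths[link] = depths[url] + 1
      queue << link
    end
  end
  depths
end

depths = page_depths(LINKS, '/', skip: [/login/, /admin/])
depths.each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is processed in first-in, first-out order, each page is recorded the first time it is reached, which is also the shortest path to it from the start page. Excluded URLs are dropped before they are queued, so pages reachable only through them are never visited either.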

Examples

See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.

Requirements

nokogiri
robots

Development

To test and develop this gem, you also need:

rspec
fakeweb
tokyocabinet

The tokyocabinet gem requires the Tokyo Cabinet library to be installed on your system.
