
Woedenaz/acs-database-rs


ACS DATABASE SCRAPER

A tool written in Rust that can do the following:

  • Scrapes the name of every SCP and writes them to a JSON file
  • Scrapes every SCP page to find uses of the Anomaly Classification System:
    1. First, it looks for any page using the Anomaly Classification Bar (also called the ACS Bar) and pulls specific text from the known structure of that component.
    2. If the ACS Bar is not found, it searches for specific strings and text unique to ACS and adds the SCP to the database if they are found.
    3. Writes the SCPs using ACS to a JSON file.
  • Pulls the backlinks for three different components utilizing ACS and writes them to a JSON file
  • Cross-compares the current database JSON with the backlinks JSON and adds any missing pages.
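The two-stage detection above can be sketched as a pair of substring checks. The marker strings below are placeholders for illustration, not the actual class names or text the tool searches for:

```rust
// Hypothetical sketch of the two-stage ACS detection described above.
// The marker constants are assumptions; the real tool's markers may differ.

/// A marker assumed to identify the ACS Bar component in page HTML.
const ACS_BAR_MARKER: &str = "anomaly-class-bar";

/// Fallback strings assumed to be unique to ACS-formatted pages.
const ACS_TEXT_MARKERS: &[&str] = &["Containment Class:", "Disruption Class:", "Risk Class:"];

/// Returns true if the page appears to use ACS: first via the
/// structured ACS Bar, then via a fallback text search.
fn uses_acs(html: &str) -> bool {
    if html.contains(ACS_BAR_MARKER) {
        return true; // stage 1: structured ACS Bar found
    }
    // stage 2: fall back to searching for ACS-specific strings
    ACS_TEXT_MARKERS.iter().any(|m| html.contains(m))
}

fn main() {
    assert!(uses_acs("<div class=\"anomaly-class-bar\"></div>"));
    assert!(uses_acs("Disruption Class: Vlam"));
    assert!(!uses_acs("<p>no anomaly info here</p>"));
}
```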

How To Use

To run, use the following command:

cargo run

Run without any flags, the tool does nothing; use the flags and arguments below to choose what it does:

cargo run -- -s -g -b -c --start <number> --end <number> -l <number> -r <number>

The command line accepts four flags and four arguments:

Flags

  • --scraper or -s: Enables the base function of scraping the SCP-Wiki for pages using ACS.
  • --getnames or -g: Enables the scraping of the SCP names from the series pages.
  • --backlinks or -b: Enables the scraping of backlinks from the ACS component pages.
  • --cross or -c: Enables the cross-comparison of the current acs_database.json with the acs_backlinks.json created by the --backlinks flag. Any missing SCPs are added to the database.
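The cross-comparison behind --cross amounts to a set difference: anything in the backlinks file that is absent from the database gets added. A minimal sketch over in-memory sets (the real tool reads and writes the JSON files):

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch of the --cross step: any SCP listed in the
/// backlinks set but missing from the database set is added to it.
/// Returns the number of SCPs that were added.
fn merge_missing(database: &mut BTreeSet<String>, backlinks: &BTreeSet<String>) -> usize {
    let missing: Vec<String> = backlinks.difference(database).cloned().collect();
    for scp in &missing {
        database.insert(scp.clone());
    }
    missing.len()
}

fn main() {
    let mut db: BTreeSet<String> =
        ["SCP-173", "SCP-682"].iter().map(|s| s.to_string()).collect();
    let bl: BTreeSet<String> =
        ["SCP-682", "SCP-3001"].iter().map(|s| s.to_string()).collect();
    let added = merge_missing(&mut db, &bl);
    assert_eq!(added, 1);             // only SCP-3001 was missing
    assert!(db.contains("SCP-3001")); // now present in the database
}
```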

Arguments

  • --start #: The start number used for scraping. The default is 1.
  • --end #: The end number used for scraping. The default is 7999.
  • --limit # or -l #: The number of concurrent threads allowed when scraping the SCP-Wiki. The default is 10.
  • --retries # or -r #: When a page initially fails to load, the number of times the tool retries before continuing. The default is 5.
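The flags and defaults above could be parsed as shown below. This is a hand-rolled sketch using only the standard library; the struct and function names are illustrative, and the real tool may use an argument-parsing crate instead:

```rust
/// Hypothetical configuration mirroring the flags and arguments above.
#[derive(Debug, PartialEq)]
struct Config {
    scraper: bool,
    getnames: bool,
    backlinks: bool,
    cross: bool,
    start: u32,
    end: u32,
    limit: usize,
    retries: u32,
}

impl Default for Config {
    fn default() -> Self {
        // Defaults as documented: start 1, end 7999, limit 10, retries 5.
        Config {
            scraper: false, getnames: false, backlinks: false, cross: false,
            start: 1, end: 7999, limit: 10, retries: 5,
        }
    }
}

/// Parses a slice of command-line tokens into a Config,
/// leaving unrecognized tokens ignored for brevity.
fn parse(args: &[&str]) -> Config {
    let mut cfg = Config::default();
    let mut it = args.iter();
    while let Some(a) = it.next() {
        match *a {
            "--scraper" | "-s" => cfg.scraper = true,
            "--getnames" | "-g" => cfg.getnames = true,
            "--backlinks" | "-b" => cfg.backlinks = true,
            "--cross" | "-c" => cfg.cross = true,
            "--start" => cfg.start = it.next().unwrap().parse().unwrap(),
            "--end" => cfg.end = it.next().unwrap().parse().unwrap(),
            "--limit" | "-l" => cfg.limit = it.next().unwrap().parse().unwrap(),
            "--retries" | "-r" => cfg.retries = it.next().unwrap().parse().unwrap(),
            _ => {}
        }
    }
    cfg
}

fn main() {
    let cfg = parse(&["-s", "--start", "100", "--end", "200"]);
    assert!(cfg.scraper);
    assert_eq!(cfg.start, 100);
    assert_eq!(cfg.end, 200);
    assert_eq!(cfg.retries, 5); // unset arguments keep their defaults
}
```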
