allgithub

Crawls GitHub data for https://github.com/anvaka/pm/

usage

Prerequisites:

Make sure Redis is installed and running on the default port (6379).
Register a GitHub token and set it in the GH_TOKEN environment variable.
Install the crawler:

git clone https://github.com/anvaka/ghcrawl
cd ghcrawl
npm i
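Before starting a crawl that runs for hours, it can help to confirm that the token is actually visible to Node. The snippet below is a minimal sketch of such a check. The `checkToken` helper is hypothetical and not part of this repository; it only verifies that `GH_TOKEN` is non-empty, not that GitHub accepts it.

```javascript
// Hypothetical pre-flight check; not part of the crawler itself.
// It only tests that GH_TOKEN is present, not that the token is valid.
function checkToken(env) {
  const token = env.GH_TOKEN;
  if (!token || token.trim() === '') {
    return { ok: false, message: 'GH_TOKEN is not set. Export it before running the crawler.' };
  }
  return { ok: true, message: 'GH_TOKEN is set.' };
}

console.log(checkToken(process.env).message);
```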

Now we are ready to index.

Find all users with more than 2 followers

This step uses the GitHub search API to go through all users who have more than two followers. At the moment there are more than 400k such users.

Each search request can return up to 100 records per page, which gives us 400,000 / 100 = 4,000 requests to make. The search API is rate limited at 30 requests per minute, which means the indexing will take 4,000 / 30 ≈ 133 minutes, or a bit more than two hours:

node findUsersWithFollowers.js
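The estimate above can be reproduced with a few lines of JavaScript. This is only a sketch of the arithmetic; `estimateMinutes` is a hypothetical helper, and the inputs are the figures quoted in this README, not values read from the GitHub API.

```javascript
// Hypothetical helper that repeats the back-of-the-envelope estimate.
function estimateMinutes(totalUsers, perPage, requestsPerMinute) {
  const requests = Math.ceil(totalUsers / perPage);
  return { requests, minutes: requests / requestsPerMinute };
}

const searchEstimate = estimateMinutes(400000, 100, 30);
console.log(searchEstimate.requests);            // 4000
console.log(Math.round(searchEstimate.minutes)); // 133
```

Changing the first argument shows how the crawl time grows as the number of matching users grows.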

Find all followers

Now that we have all users who have more than two followers, let's index those followers. The bad news is that we have to make at least one request per user. The good news is that the rate limit here is 5,000 requests per hour, which gives an estimated amount of work of 400,000 / 5,000 = 80, so expect more than 80 hours of crawling:

node indexUserFollowers.js
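To see what a 5,000 requests per hour budget means in practice, here is a minimal sketch of evenly paced requests. It is not how indexUserFollowers.js is implemented; `delayBetweenRequestsMs` and `runPaced` are hypothetical names, and `tasks` is assumed to be an array of functions that each return a promise.

```javascript
// Hypothetical pacing helpers; the real crawler may throttle differently.

// Milliseconds to wait between requests to stay within an hourly budget.
function delayBetweenRequestsMs(requestsPerHour) {
  return Math.ceil((60 * 60 * 1000) / requestsPerHour);
}

// Runs tasks one at a time, sleeping after each one.
async function runPaced(tasks, requestsPerHour) {
  const delayMs = delayBetweenRequestsMs(requestsPerHour);
  const results = [];
  for (const task of tasks) {
    results.push(await task());
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return results;
}

console.log(delayBetweenRequestsMs(5000)); // 720
```

At 720 ms per request, 400,000 requests take 288,000,000 ms, which is the 80 hours estimated above.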

Time to get the graph

Now that we have all users indexed, we can construct the graph:

node makeFollowersGraph.js > github.dot
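The command above writes the graph in DOT format. As an illustration of what a follower graph can look like in DOT, here is a small sketch. The `toDot` function, its input shape, and the edge direction (follower points to the followed user) are assumptions made for this example; the actual output of makeFollowersGraph.js may be laid out differently.

```javascript
// Hypothetical converter from { user: [followers] } to DOT text.
// Edge direction (follower -> user) is an assumption for illustration.
function toDot(followersByUser) {
  const lines = ['digraph G {'];
  for (const [user, followers] of Object.entries(followersByUser)) {
    for (const follower of followers) {
      lines.push(`  "${follower}" -> "${user}";`);
    }
  }
  lines.push('}');
  return lines.join('\n');
}

console.log(toDot({ alice: ['bob', 'carol'], bob: ['alice'] }));
```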

Layout

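Convert the graph to binary format. The --max-old-space-size=4096 flag raises the V8 old-space heap limit to 4096 MB, which a graph of this size likely needs: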

node --max-old-space-size=4096 ./toBinary.js

Then use ngraph.native for faster graph layout.

license

MIT
