
The-Sad-Zewalian/Reddit-Analysis-Hadoop


Mini-Project-1

First mini project for CIE 427, analyzing the Reddit dataset.

We run several analyses on 10% of the data using the Hadoop paradigm. Modify the Hadoop/Java variables in the bash files to match your installation. To run any of the following tasks, download the dataset file, take a sample, name it "test.txt", put it in Hadoop, and run the bash scripts.
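The sampling step could look roughly like the sketch below. It assumes the dataset file holds one record per line, which the README does not state; the function name `sample_lines` and the fixed seed are illustrative choices, not part of the repository.

```python
import random


def sample_lines(lines, fraction=0.10, seed=0):
    """Keep each line independently with probability `fraction`.

    Assumption: the dataset stores one record per line, so sampling
    lines is the same as sampling records.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]
```

To produce "test.txt", the input file can be opened and iterated line by line, passed to `sample_lines`, and the result written out with `writelines`.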

Requirement-1

The most discussed topics for each subreddit and username, with a focus on the top subreddits.
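One way to approximate this requirement is a word count keyed by subreddit, written here as a map step and a reduce step in plain Python. This is a sketch and not the repository's implementation: it assumes each line is a JSON object with `subreddit` and `body` fields, it treats frequent words as a stand-in for "topics", and the stop-word list is a small illustrative set.

```python
import json
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption, not taken from the project).
STOP_WORDS = {"the", "and", "that", "this", "with", "for", "you", "are", "was"}


def map_comment(line):
    """Map step: emit ((subreddit, word), 1) for every kept word in a comment."""
    record = json.loads(line)
    subreddit = record.get("subreddit", "")
    body = record.get("body", "")
    for word in re.findall(r"[a-z']+", body.lower()):
        if len(word) > 2 and word not in STOP_WORDS:
            yield (subreddit, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (subreddit, word) key."""
    per_subreddit = defaultdict(Counter)
    for (subreddit, word), count in pairs:
        per_subreddit[subreddit][word] += count
    return per_subreddit


def top_words(lines, n=3):
    """Return the n most frequent words for each subreddit."""
    pairs = (pair for line in lines for pair in map_comment(line))
    return {sub: counts.most_common(n) for sub, counts in reduce_counts(pairs).items()}
```

Keying on `(author, word)` instead of `(subreddit, word)` would give the per-username variant, again assuming an `author` field exists.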


Requirement-2

Rate of replies compared to the controversiality of a comment/post.
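A minimal in-memory sketch of this comparison follows. It assumes each record is a JSON line carrying `name` (the comment's own id), `parent_id` (the id it replies to), and a 0/1 `controversiality` flag; these field names are assumptions about the dataset, and a Hadoop job would perform the same join across a map and a reduce phase rather than in one process.

```python
import json
from collections import Counter, defaultdict


def reply_rate_by_controversiality(lines):
    """Average number of direct replies per comment, grouped by controversiality."""
    records = [json.loads(line) for line in lines]

    # Count how many comments name each id as their parent.
    replies = Counter(r.get("parent_id") for r in records)

    # flag -> [sum of reply counts, number of comments]
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        flag = r.get("controversiality", 0)
        totals[flag][0] += replies.get(r.get("name"), 0)
        totals[flag][1] += 1

    return {flag: reply_sum / count for flag, (reply_sum, count) in totals.items()}
```

Only replies present in the sample are counted, so on a 10% sample the averages underestimate the true reply rates.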


Requirement-3

Topics that yield the highest number of upvotes and/or the lowest number of downvotes.


Creative Ideas:

B1: Rate of deleted users, deleted comments, and edited comments compared to controversiality

B2: Percentage of deleted users/comments and edited comments

B3: Percentage of negative/positive attitude in comments per subreddit/user
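Idea B2 reduces to three counters over the records, as in the sketch below. It assumes JSON lines in which a deleted user or comment appears as the literal string `[deleted]` in `author` or `body`, and in which `edited` is false for unedited comments and a truthy value otherwise; the README does not confirm any of these conventions.

```python
import json


def deletion_and_edit_percentages(lines):
    """Percentage of records with a deleted author, a deleted body, or an edit."""
    total = deleted_users = deleted_comments = edited = 0
    for line in lines:
        record = json.loads(line)
        total += 1
        deleted_users += record.get("author") == "[deleted]"
        deleted_comments += record.get("body") == "[deleted]"
        edited += bool(record.get("edited"))

    if total == 0:
        return {}  # avoid dividing by zero on an empty sample
    return {
        "deleted_users": 100.0 * deleted_users / total,
        "deleted_comments": 100.0 * deleted_comments / total,
        "edited_comments": 100.0 * edited / total,
    }
```

Grouping the same counters by the `controversiality` flag would give the comparison described in B1.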

