- This manual is tested on Mac OS X.
- This manual is tested on the course data of EX101x-3T2015 and FP101x-3T2015.
- Python 2.7 should be installed on the machine.
- Downloading MySQL Community Server
- Installing it and recording the initial root password
- Checking that MySQL is installed in `/usr/local/mysql/bin`
- Setting PATH
  - Editing the bash profile: `vim ~/.bash_profile`
  - Setting the path: `PATH=$PATH:/usr/local/mysql/bin`
  - Adding these aliases to the bash profile:
    `alias mysql=/usr/local/mysql/bin/mysql`
    `alias mysqladmin=/usr/local/mysql/bin/mysqladmin`
  - Saving the file: press `esc`, then type `:wq`
  - Sourcing the bash profile: `source ~/.bash_profile`
- Running MySQL Server
  - Mac OS X users can start it from System Preferences.
- Setting the password of root
  - Logging in to MySQL with the initial password: `mysql -uroot -p`
  - Inputting `SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpass');`
- Installing MySQL Workbench for connecting to the MySQL Server
- Installing Connector/Python
  - Please check the MySQL Connector/Python Developer Guide.
  - Mac OS X users can install it with `sudo pip install mysql-connector-repackaged`.
  - Checking that the package is installed by running `import mysql.connector` in Python
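The install check above can be scripted. The sketch below only verifies that the package imports; the commented-out connection settings (`host`, `user`, `password`) are placeholders, not values prescribed by this manual:

```python
def connector_available():
    """Return True if the mysql.connector package can be imported."""
    try:
        import mysql.connector  # noqa: F401
        return True
    except ImportError:
        return False

# If the import succeeds, a live check against the server would look like
# (placeholder credentials -- use the root password you set earlier):
#   import mysql.connector
#   conn = mysql.connector.connect(host="127.0.0.1", user="root",
#                                  password="newpass")
#   print(conn.is_connected())
#   conn.close()
print(connector_available())
```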
- Editing the self-contained file `moocdb.sql` by changing the database name to `<database name>`
- Opening the file `moocdb.sql` in MySQL Workbench and running the SQL script
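If you prefer the command line to Workbench, the script can also be piped into the `mysql` client. This is a sketch: `mysql_command` only assembles the argument list, and `run_schema` assumes `moocdb.sql` sits in the current directory (`-p` makes the client prompt for the root password):

```python
import subprocess

def mysql_command(user, database=None):
    """Assemble the mysql CLI argument list for executing a script via stdin."""
    cmd = ["mysql", "-u" + user, "-p"]
    if database is not None:
        cmd.append(database)
    return cmd

def run_schema(script_path="moocdb.sql", user="root"):
    """Pipe the schema file into the mysql client (prompts for the password)."""
    with open(script_path) as script:
        subprocess.check_call(mysql_command(user), stdin=script)
```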
- Preprocessing daily log files:
  - Editing the `path` of the uncompressed course daily files in `uncompressAndFilter.py`
  - Editing the `starttime` and `endtime` of the target course for processing daily log files in `uncompressAndFilter.py`
  - Editing the `course name` in `uncompressAndFilter.py` for filtering the daily log files of a specific course
  - Running `python uncompressAndFilter.py` to generate the preprocessed daily log files
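The filtering these settings control can be sketched as below. This is a hypothetical reimplementation, not the code of `uncompressAndFilter.py`; the field names (`time`, `context.course_id`) assume the edX tracking-log JSON format:

```python
# Sketch: keep only events of the target course whose timestamp falls
# inside [starttime, endtime] (dates given as YYYY-MM-DD strings).
import json

def keep_event(line, course_name, starttime, endtime):
    """Return True if this raw log line belongs to the course and window."""
    try:
        event = json.loads(line)
    except ValueError:
        return False                            # skip malformed lines
    date = event.get("time", "")[:10]           # ISO date prefix, YYYY-MM-DD
    course_id = event.get("context", {}).get("course_id", "")
    return starttime <= date <= endtime and course_name in course_id
```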
- All the daily logs and metadata of the course should be put into the same folder.
  - Making a folder for the course; for example, you can name it `FP101x_3T2015`
  - Uncompressing all the metadata of the course (the file `<course name>.zip`)
  - Putting all the metadata of the course into the folder
  - Putting all the preprocessed daily log files into the folder
- Editing the course folder path in `main.py`
- Editing the database info in each `.py` file
- Running the code with `python main.py`
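The "database info" in each `.py` file usually amounts to a single dict of connection settings. The key names below match what `mysql.connector.connect()` accepts, but the values are placeholders to replace with your own:

```python
# Placeholder connection settings -- substitute your own values.
DB_CONFIG = {
    "host": "127.0.0.1",
    "user": "root",
    "password": "newpass",        # the root password set earlier
    "database": "FP101x_3T2015",  # the <database name> used in moocdb.sql
}

def connection_kwargs(config):
    """Check that all required settings are present before connecting."""
    required = ("host", "user", "password", "database")
    missing = [k for k in required if k not in config]
    if missing:
        raise KeyError("missing database settings: " + ", ".join(missing))
    return dict(config)
```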
- If `null` cannot be added into integer columns in some tables, changing the mode of MySQL to non-strict:
  - Running `mkdir mysql` in the folder `/etc`
  - Running `sudo vi /etc/mysql/my.cnf`
  - Adding this content to `my.cnf`:
    `[mysqld]`
    `sql_mode=NO_ENGINE_SUBSTITUTION`
  - Restarting MySQL Server
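If editing `my.cnf` is not an option, the same relaxation can be applied per session from Python. This is a sketch that works with any DB-API cursor (e.g. one from Connector/Python); note it only affects the connection it runs on, so each loading script would need to call it:

```python
def relax_sql_mode(cursor):
    """Switch the current session out of strict mode so NULLs are accepted."""
    cursor.execute("SET SESSION sql_mode = 'NO_ENGINE_SUBSTITUTION'")
```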
The MOOCdb Project is an open-source framework that sets a shared data model standard for organizing data generated by MOOCs.
The initial MOOCdb schema consists of four modules: Observing, Submitting, Collaborating, and Feedback.
Our current schema is mainly based on the MOOCdb project. It consists of four modules, named Observations, Submissions, Collaborations, and UserModes.
As shown in Figure 1, each module in our schema has several tables of information. The differences between our current schema and the initial MOOCdb schema are discussed in the following sections.
In the original MOOCdb schema, the Observing module has five tables: observed_event, resources, resources_urls, resources_types, and urls.
In our current schema, we merge them into two tables, named observations and resources. These two tables represent the observed events of students and the resources relevant to those events.
In the original MOOCdb schema, the Submitting module has four tables: problem_type, problems, submissions, and assessments.
In our current schema, we merge the problem_type table into the problems table. In addition, a table named quiz_sessions is added, which represents how users answer quiz sessions.
In the original MOOCdb schema, the Collaborating module has two tables: collaborations and collaboration_types.
In our current schema, the two original tables collaborations and collaboration_types are combined into one table, collaborations. forum_sessions is added as a new table, which represents users' activities on the forum.
Our schema also has another part, named UserModes, which contains four tables: courses, global_user, course_user, and user_pii.
The courses table contains the meta-information of courses. The global_user table represents the relations between users and courses. The course_user table represents users' status and grades in courses. The user_pii table represents course users' demographic data.
- Manually checking the weight of each problem in the metadata file "course_structure" and fixing it based on the course settings on edX
- Building a new table named `problem_structure` by running the file `ProblemStructure.py`
- Running the SQL query to join the three tables `submissions`, `assessments`, and `problem_structure`, and select the student submission and grade information we need:

  ```sql
  SELECT assessments.course_user_id AS course_user_id,
         assessments.grade AS grade,
         assessments.max_grade AS max_grade,
         ps.weight AS weight,
         submissions.problem_id AS problem_id,
         ps.relevant_week AS relevant_week,
         submissions.submission_timestamp AS submission_timestamp
  FROM FP101x_3T2015.assessments AS assessments
  JOIN FP101x_3T2015.submissions AS submissions
    ON assessments.assessment_id = submissions.submission_id
  JOIN FP101x_3T2015.problem_structure AS ps
    ON submissions.problem_id = ps.problem_id
  WHERE
    # select all the pass users
    assessments.course_user_id IN (
      SELECT FP101x_3T2015.course_user.course_user_id
      FROM FP101x_3T2015.course_user
      WHERE FP101x_3T2015.course_user.certificate_status <> "notpassing")
    AND ps.weight > 0
    AND assessments.grade > 0
    AND assessments.max_grade > 0
  ```
- Exporting the new tables generated in Step 3 into CSV files
  - If you run the SQL script in MySQL Workbench in Step 3, you also need to set the limit on the number of rows in the results.
  - If you run Step 3 in your own code or in an R/Python environment, you may skip this step.
- Running the R script `ProgressAggregation.r` to aggregate student grades by their id and relevant weeks (can be rewritten in Python later)
- Running the R script `ProgressPlot.r` to plot the number of passing students in each week, based on the aggregation from Step 5 (can be rewritten in Python later)
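In the spirit of the "can be rewritten in Python later" notes, the aggregation can be sketched in pure Python. Each row carries the columns selected in Step 3; the 0.6 passing threshold is an assumption for illustration, not a value from this manual:

```python
# Sketch of the ProgressAggregation.r computation: sum each student's
# weighted grade per week, then count how many students have reached a
# passing threshold by each week (cumulatively).
from collections import defaultdict

def passing_per_week(rows, threshold=0.6):
    """rows: (course_user_id, grade, max_grade, weight, relevant_week).

    Returns {week: number of students whose cumulative weighted grade
    up to and including that week reaches the threshold}.
    """
    score = defaultdict(float)          # (user, week) -> weighted grade
    for user, grade, max_grade, weight, week in rows:
        score[(user, week)] += (grade / max_grade) * weight
    weeks = sorted({week for _, week in score})
    users = {user for user, _ in score}
    result = {}
    for wk in weeks:
        result[wk] = sum(
            1 for u in users
            if sum(v for (uu, w), v in score.items()
                   if uu == u and w <= wk) >= threshold)
    return result
```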