- This manual is tested on Mac OS X.
- This manual is tested on the course data of EX101x-3T2015 and FP101x-3T2015.
- Python 2.7 should be installed on the machine.
- Downloading MySQL Community Server
- Installing it and recording the initial root password
- Checking that MySQL is installed in `/usr/local/mysql/bin`
- Setting PATH
  - Editing the bash profile: `vim ~/.bash_profile`
  - Setting the path: `PATH=$PATH:/usr/local/mysql/bin`
  - Adding these aliases to the bash profile:
    `alias mysql=/usr/local/mysql/bin/mysql`
    `alias mysqladmin=/usr/local/mysql/bin/mysqladmin`
  - Saving the file: press `esc`, then type `:wq`
  - Sourcing the bash profile: `source ~/.bash_profile`
- Running MySQL Server
  - Mac OS X users can start it from System Preferences.
- Setting the password of root
  - Logging in to MySQL with the initial password: `mysql -uroot -p`
  - Inputting `SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpass');`
- Installing MySQL Workbench for connecting to the MySQL Server
- Installing Connector/Python
  - Please check the MySQL Connector/Python Developer Guide.
  - Mac OS X users can install it with `sudo pip install mysql-connector-repackaged`.
  - Checking that the package is installed by running `import mysql.connector` in Python
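The install check above can be scripted. The sketch below only verifies that the package imports; the commented-out connection settings (`host`, `user`, `password`) are placeholders, not values prescribed by this manual:

```python
def connector_available():
    """Return True if the mysql.connector package can be imported."""
    try:
        import mysql.connector  # noqa: F401
        return True
    except ImportError:
        return False

# If the import succeeds, a live check against the server would look like
# (placeholder credentials -- use the root password you set earlier):
#   import mysql.connector
#   conn = mysql.connector.connect(host="127.0.0.1", user="root",
#                                  password="newpass")
#   print(conn.is_connected())
#   conn.close()
print(connector_available())
```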
- Editing the self-contained file `moocdb.sql` by changing the database name to `<database name>`
- Opening the file `moocdb.sql` in MySQL Workbench and running the SQL script
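If you prefer the command line to Workbench, the script can also be piped into the `mysql` client. This is a sketch: `mysql_command` only assembles the argument list, and `run_schema` assumes `moocdb.sql` sits in the current directory (`-p` makes the client prompt for the root password):

```python
import subprocess

def mysql_command(user, database=None):
    """Assemble the mysql CLI argument list for executing a script via stdin."""
    cmd = ["mysql", "-u" + user, "-p"]
    if database is not None:
        cmd.append(database)
    return cmd

def run_schema(script_path="moocdb.sql", user="root"):
    """Pipe the schema file into the mysql client (prompts for the password)."""
    with open(script_path) as script:
        subprocess.check_call(mysql_command(user), stdin=script)
```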
- Preprocessing daily log files:
  - Editing the `path` of the uncompressed course daily files in `uncompressAndFilter.py`
  - Editing the `starttime` and `endtime` of the target course for processing daily log files in `uncompressAndFilter.py`
  - Editing the `course name` in `uncompressAndFilter.py` for filtering the daily log files of a specific course
  - Running `python uncompressAndFilter.py` to generate the preprocessed daily log files
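The filtering these settings control can be sketched as below. This is a hypothetical reimplementation, not the code of `uncompressAndFilter.py`; the field names (`time`, `context.course_id`) assume the edX tracking-log JSON format:

```python
# Sketch: keep only events of the target course whose timestamp falls
# inside [starttime, endtime] (dates given as YYYY-MM-DD strings).
import json

def keep_event(line, course_name, starttime, endtime):
    """Return True if this raw log line belongs to the course and window."""
    try:
        event = json.loads(line)
    except ValueError:
        return False                            # skip malformed lines
    date = event.get("time", "")[:10]           # ISO date prefix, YYYY-MM-DD
    course_id = event.get("context", {}).get("course_id", "")
    return starttime <= date <= endtime and course_name in course_id
```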
- All the daily logs and metadata of the course should be put into the same folder.
  - Making a folder for the course; for example, you can name it `FP101x_3T2015`
  - Uncompressing all the metadata of the course (the file `<course name>.zip`)
  - Putting all the metadata of the course into the folder
  - Putting all the preprocessed daily log files into the folder
- Editing the course folder path in `main.py`
- Editing the database info in each `.py` file
- Running the code with `python main.py`
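The "database info" in each `.py` file usually amounts to a single dict of connection settings. The key names below match what `mysql.connector.connect()` accepts, but the values are placeholders to replace with your own:

```python
# Placeholder connection settings -- substitute your own values.
DB_CONFIG = {
    "host": "127.0.0.1",
    "user": "root",
    "password": "newpass",        # the root password set earlier
    "database": "FP101x_3T2015",  # the <database name> used in moocdb.sql
}

def connection_kwargs(config):
    """Check that all required settings are present before connecting."""
    required = ("host", "user", "password", "database")
    missing = [k for k in required if k not in config]
    if missing:
        raise KeyError("missing database settings: " + ", ".join(missing))
    return dict(config)
```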
- If `null` cannot be added into integer columns in some tables, changing the mode of MySQL to non-strict:
  - Running `mkdir mysql` in the folder `/etc`
  - Running `sudo vi /etc/mysql/my.cnf`
  - Adding this content to `my.cnf`:
    `[mysqld]`
    `sql_mode=NO_ENGINE_SUBSTITUTION`
  - Restarting MySQL Server
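If editing `my.cnf` is not an option, the same relaxation can be applied per session from Python. This is a sketch that works with any DB-API cursor (e.g. one from Connector/Python); note it only affects the connection it runs on, so each loading script would need to call it:

```python
def relax_sql_mode(cursor):
    """Switch the current session out of strict mode so NULLs are accepted."""
    cursor.execute("SET SESSION sql_mode = 'NO_ENGINE_SUBSTITUTION'")
```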
The MOOCdb Project is an open-source framework that sets a shared data model standard for organizing data generated by MOOCs.
The initial MOOCdb schema consists of four modules: Observing, Submitting, Collaborating, and Feedback.
Our current schema is mainly based on the MOOCdb project. It consists of four modules, named Observations, Submissions, Collaborations, and UserModes.
As shown in Figure 1, each module in our schema has several tables of information. The differences between our current schema and the initial MOOCdb schema are discussed in the following sections.
In the original MOOCdb schema, the Observing module has five tables: observed_event, resources, resources_urls, resources_types, and urls.
In our current schema, we merge them into two tables, named observations and resources. These two tables represent the observed events of students and the resources relevant to those events.
In the original MOOCdb schema, the Submitting module has four tables: problem_type, problems, submissions, and assessments.
In our current schema, we merge the problem_type table into the problems table. In addition, a table named quiz_sessions is added, which represents how users answer quiz sessions.
In the original MOOCdb schema, the Collaborating module has two tables: collaborations and collaboration_types.
In our current schema, the two original tables collaborations and collaboration_types are combined into one table, collaborations. forum_sessions is added as a new table, which represents users' activities on the forum.
Our schema also has another part, named UserModes, which contains four tables: courses, global_user, course_user, and user_pii.
The courses table contains the meta-information of courses. The global_user table represents the relations between users and courses. The course_user table represents users' status and grades in courses. The user_pii table represents course users' demographic data.
- Manually checking the weight of each problem in the metadata file "course_structure" and fixing it based on the course settings on edX
- Building a new table named `problem_structure` by running the file `ProblemStructure.py`
- Running the SQL query to join the three tables `submissions`, `assessments`, and `problem_structure`, and select the student submission and grade information we need:

  ```sql
  SELECT assessments.course_user_id AS course_user_id,
         assessments.grade AS grade,
         assessments.max_grade AS max_grade,
         ps.weight AS weight,
         submissions.problem_id AS problem_id,
         ps.relevant_week AS relevant_week,
         submissions.submission_timestamp AS submission_timestamp
  FROM FP101x_3T2015.assessments AS assessments
  JOIN FP101x_3T2015.submissions AS submissions
    ON assessments.assessment_id = submissions.submission_id
  JOIN FP101x_3T2015.problem_structure AS ps
    ON submissions.problem_id = ps.problem_id
  WHERE
    # select all the pass users
    assessments.course_user_id IN (
      SELECT FP101x_3T2015.course_user.course_user_id
      FROM FP101x_3T2015.course_user
      WHERE FP101x_3T2015.course_user.certificate_status <> "notpassing")
    AND ps.weight > 0
    AND assessments.grade > 0
    AND assessments.max_grade > 0
  ```
- Exporting the new tables generated in Step 3 into CSV files
  - If you run the SQL script in MySQL Workbench in Step 3, you also need to set the limit on the number of rows in the results.
  - If you run Step 3 in your own code or in an R/Python environment, you may skip this step.
- Running the R script `ProgressAggregation.r` to aggregate student grades by their id and relevant weeks (can be rewritten in Python later)
- Running the R script `ProgressPlot.r` to plot the number of passing students in each week, based on the aggregation from Step 5 (can be rewritten in Python later)
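In the spirit of the "can be rewritten in Python later" notes, the aggregation can be sketched in pure Python. Each row carries the columns selected in Step 3; the 0.6 passing threshold is an assumption for illustration, not a value from this manual:

```python
# Sketch of the ProgressAggregation.r computation: sum each student's
# weighted grade per week, then count how many students have reached a
# passing threshold by each week (cumulatively).
from collections import defaultdict

def passing_per_week(rows, threshold=0.6):
    """rows: (course_user_id, grade, max_grade, weight, relevant_week).

    Returns {week: number of students whose cumulative weighted grade
    up to and including that week reaches the threshold}.
    """
    score = defaultdict(float)          # (user, week) -> weighted grade
    for user, grade, max_grade, weight, week in rows:
        score[(user, week)] += (grade / max_grade) * weight
    weeks = sorted({week for _, week in score})
    users = {user for user, _ in score}
    result = {}
    for wk in weeks:
        result[wk] = sum(
            1 for u in users
            if sum(v for (uu, w), v in score.items()
                   if uu == u and w <= wk) >= threshold)
    return result
```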