Data Processing in Python (JEM207)

This is the course site for Data Processing in Python at IES. See the course information on SIS. The course is taught by Martin Hronec and Vítek Macháček.

Course description

The aim of the course is to provide hands-on experience with data-manipulation techniques in Python. Special emphasis is put on standard libraries such as pandas, NumPy, and Matplotlib, and on collecting web data with requests and BeautifulSoup. Students will also be guided through modern social-coding and open-source technologies such as GitHub, Jupyter, and Open Data.

Students will gain this experience by working with data from the IES website and with subject evaluation protocols.

The course uses DataCamp online resources to give students reliable yet simple material for learning Python programming.

Learning outcomes

After passing the course, students will be able to download data from APIs or directly from the web, pre-process it, analyze it, and visualize it.
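
As a rough illustration of that workflow, the sketch below parses a small JSON payload, cleans it, and computes a summary statistic using only the standard library. The payload, the field names, and the helper functions `preprocess` and `analyze` are made up for this example; in a real project the text would come from an actual API or web page (for instance via `requests.get(url).text`), and the result would typically be plotted with Matplotlib, which is left out here.

```python
import json
from statistics import mean

# Stand-in for the body of an API response (invented data, not a real endpoint).
RAW_RESPONSE = """
[
  {"course": "A", "rating": "4.5"},
  {"course": "B", "rating": null},
  {"course": "C", "rating": "3.5"}
]
"""


def preprocess(raw_text):
    """Parse the JSON text and keep only records with a usable rating."""
    records = json.loads(raw_text)
    return [
        {"course": r["course"], "rating": float(r["rating"])}
        for r in records
        if r["rating"] is not None
    ]


def analyze(records):
    """Return the average rating across the cleaned records."""
    return mean(r["rating"] for r in records)


clean = preprocess(RAW_RESPONSE)
print(len(clean), analyze(clean))
```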

Prerequisites

Econometrics II (JEB110) is an explicit prerequisite for bachelor students.

The course is designed for students who have at least some basic coding experience. It does not need to be very advanced, but they should be familiar with concepts such as a for loop, if and else, a variable, or a function.
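
For orientation, the snippet below shows roughly the level of familiarity meant here. It is written in Python only for consistency with the rest of the course; the function name `classify_scores` and the threshold are invented for this example, and the same ideas in any other language are equally sufficient.

```python
def classify_scores(scores, threshold=50):
    """Label each score as 'pass' or 'fail' (illustrative helper only)."""
    labels = []                  # a variable holding the results
    for score in scores:         # a for loop
        if score > threshold:    # if and else
            labels.append("pass")
        else:
            labels.append("fail")
    return labels                # a function returns a value


print(classify_scores([35, 50, 72]))  # ['fail', 'fail', 'pass']
```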

No knowledge of Python is required for entering the course.

Materials

Git

Pro Git book, Atlassian Git tutorials, GitHub resources for learning Git

Python

Resources from the official Python webpage

Documentation

Python, Pandas, Numpy, requests, BeautifulSoup and Matplotlib.

Recommended DataCamp Courses

Others

LearnPython

Learn Python on Codecademy

pandas Cookbook

Practical Introduction to Web Scraping in Python

Credits

Passing the course is rewarded with 5 ECTS credits.

Course requirements

The requirements for passing the course are the DataCamp assignments (0 pts, but compulsory), the midterm (30 pts), and the final project (70 pts).

DataCamp Assignments (0%, compulsory)

Submitting 4 of the assignments 1-6 on time is required.

Assignment 0 - (Introduction to Git)

Not compulsory, but strongly recommended. Git is hard and you will need it throughout the course.

Assignment 1 - Submission on 8/10 (Introduction to Python Course)

Python Lists
Python Basics
Functions and Packages

Assignment 2 - Submission on 15/10 (Manipulating DataFrames with pandas)

Numpy
Extracting and Transforming Data
Advanced Indexing

Assignment 3 - Submission on 22/10 (Object-Oriented Programming in Python)

Getting ready for object-oriented programming
Deep dive into classes and objects
Fancy classes, fancy objects

Assignment 4 - Submission on 29/10 (Web Scraping in Python Course)

Introduction to HTML
XPaths and Selectors
CSS Locators, Chaining, and Responses

Assignment 5 - Submission on 12/11 (Importing Data in Python (Part 2) Course)

The Intro to SQL for Data Science (full course)

Assignment 6 - Submission on 19/11 (Merging DataFrames with pandas Course)

Concatenating and merging data
Rearranging and reshaping data
Grouping data

Midterm exam (30%)

Description:

November 26th

Final project (70%)

Description:

Students work in teams of two.
The task is to download any data from an API or directly from the web. The data should be processed and visualized in a Jupyter Notebook, with auxiliary scripts as .py files. The project should be submitted as a GitHub repository.
The selection of the data is entirely up to the students.
More details during the lecture.

See an example project from last year.

Deadlines:

November 12th: Project Topic First Submission

November 26th: Midterm Exam

December 3rd: Project Topic Final Submission

January 21st: Project Submission (to be confirmed)

Evaluation Criteria:

The project uses data correctly downloaded from a public API or website
The download is easily reproducible
The data were cleaned appropriately
The data are visualized
The project is submitted as a public GitHub repository
All team members visibly collaborated on the GitHub repository
The code is readable and well documented
The code is object-oriented
The project's summary is submitted as a Jupyter notebook
The project is distributed as a Python package
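
To make the "object-oriented" and "cleaned appropriately" criteria more concrete, here is a minimal sketch of how downloaded data might be wrapped in a class. The class name `Dataset`, its methods, and the sample CSV text are invented for this illustration and are not a required design; a real project would more likely build on pandas and choose its own structure.

```python
import csv
import io


class Dataset:
    """Hypothetical container for downloaded rows (illustrative only)."""

    def __init__(self, rows):
        self.rows = list(rows)

    @classmethod
    def from_csv_text(cls, text):
        """Build a Dataset from CSV text, e.g. the body of a downloaded file."""
        return cls(csv.DictReader(io.StringIO(text)))

    def drop_missing(self, column):
        """Return a new Dataset without rows where `column` is empty."""
        return Dataset(row for row in self.rows if row[column] != "")

    def column(self, name):
        """Return all values of one column as a list of strings."""
        return [row[name] for row in self.rows]


# Synthetic sample: the second row has a missing value.
sample = "name,value\na,1\nb,\n"
data = Dataset.from_csv_text(sample).drop_missing("value")
print(data.column("name"))
```

Keeping the download, cleaning, and access steps as separate methods is one way to make each step easy to rerun and to document on its own.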

Grading scale

A: above 90 (not inclusive)
B: between 80 (not inclusive) and 90 (inclusive)
C: between 70 (not inclusive) and 80 (inclusive)
D: between 60 (not inclusive) and 70 (inclusive)
E: between 50 (not inclusive) and 60 (inclusive)
F: below 50 (inclusive)
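
The boundaries above can be read as a simple chain of comparisons. The sketch below encodes them; the function names `total_points` and `letter_grade` are made up for this example, and it deliberately ignores the separate requirement that the DataCamp assignments be submitted.

```python
def total_points(midterm, project):
    """Sum the midterm (max 30) and final project (max 70) points."""
    return midterm + project


def letter_grade(points):
    """Map total points to a letter grade; every lower bound is exclusive."""
    if points > 90:
        return "A"
    if points > 80:
        return "B"
    if points > 70:
        return "C"
    if points > 60:
        return "D"
    if points > 50:
        return "E"
    return "F"


print(letter_grade(total_points(25, 65)))  # exactly 90 points is a B
```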

Our materials

Jupyter and GitHub intro here

The Jupyter notebook with IES web parser

Course syllabus

Date	Topic	Lecturer	Project	HW
1/10	Intro, Jupyter, Git (+ GitHub)	Martin		HW 0
8/10	Strings, Floats, Lists, Dictionaries, Functions	Vítek		HW 1
15/10	Numpy, Pandas, Matplotlib	Martin		HW 2
22/10	Object-Oriented Programming	Martin		HW 3
29/10	HTML, XML, JSON, requests, APIs, BeautifulSoup	Vítek		HW 4
5/11	IES Web Scraper	Vítek
12/11	Introduction to Databases	Vítek	Project Topic Proposal	HW 5
19/11	Advanced Pandas	Martin		HW 6
26/11	MIDTERM	Vítek
3/12	Project Work 1		Project Topic Approval
10/12	Guest Lecture (TBA)	Guest
17/12	Efficient Computing / Parallelization	Martin
7/1	Project Work 2
