The course site for the Data Processing in Python from IES

Data Processing in Python (JEM207)

Link to the first lecture.

The course site for the Data Processing in Python course at IES. See information on SIS. The course is taught by Martin Hronec, Vítek Macháček and Jan Šíla.

Date | Topic | Who | Project | HW
29/9 | Intro, Jupyter, Git (+ GitHub) | Martin | |
6/10 | Strings, Floats, Lists, Dictionaries, Functions | Jan | |
12/10 | Seminar (Git + Basic Python) | Martin | | HW 0 & 1
13/10 | Numpy, Pandas, Matplotlib | Jan | | HW 2
20/10 | Object-Oriented Programming | Jan | | HW 3
26/10 | Seminar | Jan | |
27/10 | HTML, XML, JSON, requests, APIs, BeautifulSoup | Vitek | |
3/11 | IES Web Scraper | Vitek | | HW 4
9/11 | Seminar | Vitek | |
10/11 | Advanced Pandas | Vitek | | HW 5
17/11 | State Holiday | -- | |
23/11 | Seminar - MIDTERM | full house | |
24/11 | Introduction to Databases | Jan | Project Topic Proposal | HW 6
1/12 | Efficient Computing | Martin | |
7/12 | Parallelization | Martin | |
8/12 | Seminar | Martin | Project Topic Approval |
15/12 | Guest Lecture + Python BEER | TBD | |
21/12 | Project Work 2 (Seminar) | full house | Work-in-progress |
22/12 | Project Work 2 | full house | Work-in-progress |

Course requirements

The requirements for passing the course are the DataCamp assignments (5pts), the midterm (25pts), the work-in-progress presentation (10pts), and the final project, including the final delivery presentation (60pts). At least 50% of the points from the DataCamp assignments and the work-in-progress presentation is required to pass the course.

Final project (60%)

  • Students work in teams of two.
  • The task is to download data from an API or directly from the web. The data should be processed and visualized in a Jupyter Notebook, with auxiliary function and class definitions kept in separate .py files. The project should be submitted as a GitHub repository.
  • The selection of the data is up to the students, subject to our approval.
  • The Git history serves as proof that both students contributed.
  • More details during the lecture.
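
As a rough illustration of the "auxiliary .py file" idea, the sketch below shows what a small helper module might look like. Everything in it is hypothetical: the module name `helpers.py`, the `fetch_records` function, the `RecordTable` class, and the `country`/`value` field names are made up for this example and are not a required project layout. It uses only the standard library so that it runs anywhere; in an actual project the processing step would more likely be done with requests and pandas, as covered in the lectures.

```python
"""helpers.py: a hypothetical auxiliary module imported by the project notebook."""
import json
from collections import defaultdict
from statistics import mean
from urllib.request import urlopen


def fetch_records(url, timeout=10):
    """Download a JSON array of records from `url` (any endpoint you choose)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class RecordTable:
    """Thin wrapper around a list of dict records."""

    def __init__(self, records):
        self.records = list(records)

    def mean_by(self, group_key, value_key):
        """Return {group: mean of value_key}, skipping rows with missing fields."""
        groups = defaultdict(list)
        for row in self.records:
            if group_key in row and row.get(value_key) is not None:
                groups[row[group_key]].append(float(row[value_key]))
        return {group: mean(values) for group, values in groups.items()}


# Offline sample standing in for an API response (made-up values).
sample = [
    {"country": "CZ", "value": 1.0},
    {"country": "CZ", "value": 3.0},
    {"country": "SK", "value": 2.0},
    {"country": "SK", "value": None},
]
table = RecordTable(sample)
averages = table.mean_by("country", "value")
```

In the notebook, one would then write something like `from helpers import fetch_records, RecordTable` and keep only the analysis and plots there, which keeps the notebook short and the helper code reusable.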

See an example project from last year here.

Project work - presentation (10%)

  • Presentation of work-in-progress related to the final project.

Midterm exam (25%)

23/11. Live coding (80 minutes), "open browser", no collaboration between students. More details in the lecture the week before.

DataCamp Assignments (5%)

Three of assignments 1-6 must be submitted on time.

Assignment 0 - Submission on 12/10 (Introduction to Git)

  • Compulsory. Git is hard and you will need it throughout the course.

Assignment 1 - Submission on 12/10 (Introduction to Python Course)

  1. Python Lists
  2. Python Basics
  3. Functions and Packages

Assignment 2 - Submission on 13/10 (Manipulating DataFrames with pandas)

  1. Numpy
  2. Extracting and Transforming Data
  3. Advanced Indexing

Assignment 3 - Submission on 20/10 (Object-Oriented Programming in Python)

  1. Getting ready for object-oriented programming
  2. Deep dive into classes and objects
  3. Fancy classes, fancy objects

Assignment 4 - Submission on 3/11 (Web Scraping in Python Course)

  1. Introduction to HTML
  2. XPaths and Selectors
  3. CSS Locators, Chaining, and Responses

Assignment 5 - Submission on 10/11 (Merging DataFrames with pandas Course)

  1. Concatenating and merging data
  2. Rearranging and reshaping data
  3. Grouping data

Assignment 6 - Submission on 24/11 (Importing Data in Python (Part 2) Course)

  1. The Intro to SQL for Data Science (full course)

Recommended DataCamp Courses

Tools

Introduction to Git for Data Science

General Python

Introduction to Python

Intermediate Python for Data Science

pandas

pandas Foundations

Manipulating DataFrames with pandas

Merging DataFrames with pandas

Cleaning Data in Python

Web Data Formats

Importing Data in Python (Part 1)

Importing Data in Python (Part 2)

Web Scraping with Python

Data Visualizations

Introduction to Data Visualization

Interactive Data Visualization in Bokeh

SQL

Introduction to SQL for Data Science

Introduction to Databases in Python

Prerequisites

Econometrics II (JEB110) is an explicit prerequisite for bachelor students.

The course is designed for students who have at least some basic coding experience. It does not need to be advanced, but students should be familiar with concepts such as for loops, if and else, variables, and functions.

No knowledge of Python is required for entering the course.

Credits

Passing the course is rewarded with 5 ECTS credits.

A sneak peek

IES web parser.

Materials

Git

Pro Git book, Atlassian Git tutorials, GitHub resources for learning Git

Python

Resources from the official Python webpage

Documentation

Python, Pandas, Numpy, requests, BeautifulSoup and Matplotlib.
