Contains Jupyter notebooks and other materials prepared for the course Introduction to Data Science offered at TIFR Hyderabad (https://moldis-group.github.io/teaching.html)

soumen261/DataScience

 
 

DataScience

License: MIT Python3

This repository contains Jupyter notebooks, images, PDFs, and other material prepared for the course Introduction to Data Science, offered to Ph.D. students at TIFR Hyderabad (https://moldis-group.github.io/teaching.html).

How to access this material?

This material is made available on GitHub to encourage others to access it freely, maintain a local copy, and maybe even contribute corrections or new material. You can follow the three steps listed below freely, i.e., without committing to any responsibilities.

  1. If you think you will ever use (or reuse) this material, or a part of it, for any purpose, sign in to GitHub (create an account if you do not have one; a Google account may also work for this) and then click the 'Fork' button at the top right. This gives you your own copy of the repository to play with. You will also be able to see when changes are made to this master version and merge them into your own copy. Others can also pull the changes you make in your version.

  2. To download the content to your computer, type the following in a terminal

git clone https://github.com/raghurama123/DataScience.git

or click the 'Code' button above and then click 'Download ZIP'.

If you have forked the material, replace 'raghurama123' in the above line with your username.

  3. If you want to try the material in a web browser, e.g., to test the code or to make small changes and run it, you can open this repository on the interactive platform Binder by following the link: https://mybinder.org/v2/gh/raghurama123/DataScience/HEAD

If you have forked the material, replace 'raghurama123' in the above link with your username.
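The username substitution described in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration and not part of the course material: the helper name `repo_urls` is made up here, and it assumes that your fork keeps the repository name 'DataScience' and follows the same URL patterns as the two links above.

```python
def repo_urls(username: str, repo: str = "DataScience") -> dict:
    """Build the clone and Binder URLs for a given GitHub username.

    Hypothetical helper: it only mirrors the URL patterns shown in
    this README and does not check that the fork actually exists.
    """
    return {
        # URL to pass to `git clone` in a terminal (step 2)
        "clone": f"https://github.com/{username}/{repo}.git",
        # Link that opens the repository on Binder (step 3)
        "binder": f"https://mybinder.org/v2/gh/{username}/{repo}/HEAD",
    }


if __name__ == "__main__":
    # 'raghurama123' is the master version; use your own username for a fork.
    for name, url in repo_urls("raghurama123").items():
        print(f"{name}: {url}")
```

Calling `repo_urls` with your own username gives the two addresses to use in place of the ones listed in the steps.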

Syllabus:

The syllabus of this course is evolving over time. The original plan was to cover the following topics:

  1. Data Science: Big Data, Facets of data (structured/unstructured data)
  2. Toolboxes: Python libraries, scikit-learn, pandas
  3. Statistics: Distributions, Outliers, Skewness, Pearson’s/Spearman’s/Kendall’s correlation coefficients, Kernel density estimation
  4. Statistical Inference: Hypothesis testing, Confidence Intervals
  5. Supervised Machine Learning: What is machine learning? Learning curves, Support Vector Machines, Random Forest
  6. Regression: Linear Regression, Logistic Regression
  7. Unsupervised Machine Learning: Clustering, Case studies
  8. Big Data concepts: Handling large data, Hadoop, Spark, NoSQL, Graph databases, Natural language processing, MapReduce

Additional reading:

  1. 10 minutes to Pandas
  2. Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications, Laura Igual, Santi Seguí, Springer (2017).
  3. Introducing Data Science, Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Manning (2016).
  4. Learn Git

Data sources:

  1. https://www.kaggle.com/

Contact

For comments, questions, suggestions or requests please write to ramakrishnan@tifrh.res.in
