
NLP Modules (1 month)

Outline

S/N   Domain       Estimated Duration
1     Lectures     1 week
2     Tutorials    1 week
3     Assessment   2 weeks

Lectures

Natural Language Processing

  • Stanford Natural Language Processing Slides
  • Stanford Natural Language Processing Lectures (YouTube)
    • Introduction (1.1)
    • Basic Text Processing (2.1 to 2.5)
    • Minimum Edit Distance (3.1 to 3.3)
    • Language Modeling (4.1 to 4.6)
    • Text Classification (6.1 to 6.9)
    • Sentiment Analysis (7.1 to 7.5)
    • Information Extraction and Named Entity Recognition (9.1 to 9.3)
    • Relation Extraction (10.1)
    • Part-Of-Speech Tagging (12.1 and 12.2)
    • Information Retrieval (18.1 to 18.3)
    • Ranked Information Retrieval (19.1 to 19.5)
    • Semantics (20.1 to 20.5)
    • Question Answering (21.1 to 21.3)
    • Summarization (22.1)
  • A backup of the slides and videos is available here
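The Minimum Edit Distance lectures build up a dynamic-programming table. Here is a minimal Python sketch of that idea. It assumes insertions and deletions cost 1 and makes the substitution cost a parameter (`sub_cost`, a name chosen only for this illustration). A value of 1 gives classic Levenshtein distance. A value of 2 gives the common variant in which a substitution costs as much as a deletion plus an insertion.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance computed with a dynamic-programming table.

    Insertions and deletions cost 1; substitutions cost `sub_cost`
    (illustrative parameter: 1 = classic Levenshtein, 2 = delete + insert).
    """
    n, m = len(source), len(target)
    # dist[i][j] holds the cost of turning source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

The same function could also serve as a simple title-similarity feature in the assessment, for example after dividing by the length of the longer title.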

Word Embeddings

Tutorials

Natural Language Processing

Word Embeddings

Practical Application

Available Toolkits

  • NLTK
  • scikit-learn
  • fastText
  • Gensim
  • spaCy

Assessment

This section is a hands-on assessment in which practitioners attempt the Kaggle Avito Duplicate Ads Detection competition. You are expected to write your own code from scratch, using concepts learnt from the Kaggle Titanic competition and NLP methods (e.g. word2vec). Afterwards, you will present your work to your mentor/supervisor.
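One possible starting point is to turn each pair of ads into a few similarity features and then train a classifier on them, much as in the Titanic workflow. The sketch below is illustrative only. The `embeddings` dictionary is a hypothetical stand-in for word vectors you would train with word2vec (for example with Gensim's `Word2Vec`), and it uses made-up 3-dimensional vectors. The whitespace tokenizer and the feature names (`title_jaccard`, `title_w2v_cosine`, `title_len_diff`) are also hypothetical choices, not part of the competition data.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy word vectors; in practice these would come from a
# word2vec model trained on the ad titles and descriptions.
embeddings: Dict[str, List[float]] = {
    "iphone": [0.9, 0.1, 0.0],
    "apple": [0.8, 0.2, 0.1],
    "phone": [0.7, 0.3, 0.0],
    "sofa": [0.0, 0.1, 0.9],
}


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (placeholder for a real one).
    return text.lower().split()


def jaccard(a: List[str], b: List[str]) -> float:
    # Size of the intersection of the two token sets over the size of their union.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mean_vector(tokens: List[str], vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    # Average the vectors of in-vocabulary tokens; None if no token is known.
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pair_features(title_1: str, title_2: str, vectors: Dict[str, List[float]]) -> Dict[str, float]:
    # Build a small feature dict for one pair of ad titles.
    t1, t2 = tokenize(title_1), tokenize(title_2)
    v1, v2 = mean_vector(t1, vectors), mean_vector(t2, vectors)
    return {
        "title_jaccard": jaccard(t1, t2),
        "title_w2v_cosine": cosine(v1, v2) if v1 is not None and v2 is not None else 0.0,
        "title_len_diff": float(abs(len(t1) - len(t2))),
    }


print(pair_features("Apple iPhone", "iphone phone", embeddings))
```

Feature dicts like these, computed for every ad pair, could then be collected into a table and passed to whichever classifier you choose. The same pattern extends to descriptions and other ad fields.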

Kaggle Avito Duplicate Ads Detection

Recommended features and models

Vaex (an alternative to pandas if pandas is too slow)