NLP-project

This NLP project uses the Quora question pairs dataset to decide whether two given questions are duplicates in meaning. Detecting duplicates lets the organization merge the answers into one thread and provide better responses. The dataset used in this code is available at https://www.kaggle.com/datasets/quora/question-pairs-dataset

In this project I applied the important preprocessing steps to the question text. The dataset is large, around 400k rows, so it is computationally expensive: even in Colab I could run the program with only 50k rows before running out of RAM.

I used Bag of Words (BoW) and Word2Vec techniques, and the results are shown in the code. BoW gives better results here, with an accuracy of 78-80% on 50k rows. Accuracy can likely be increased by feeding more rows to the model, which will require more computational power. Hyperparameter tuning could improve the results further.
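As a rough illustration of the Bag of Words approach, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the function names, the cleaning rules in `preprocess`, the `max_features` cap, and the toy question pairs are all assumptions made for this example. In practice a library vectorizer such as scikit-learn's `CountVectorizer` would usually do this job.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters with spaces, and tokenize."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def build_vocabulary(questions, max_features=3000):
    """Map the most frequent tokens to column indices (max_features is an assumed cap)."""
    counts = Counter(tok for q in questions for tok in preprocess(q))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_features))}


def bow_vector(question, vocab):
    """Return a token-count vector for one question over the vocabulary."""
    vec = [0] * len(vocab)
    for tok in preprocess(question):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


def pair_features(q1, q2, vocab):
    """Concatenate the BoW vectors of both questions into one feature row."""
    return bow_vector(q1, vocab) + bow_vector(q2, vocab)


# Toy pairs made up for illustration; they are not taken from the dataset.
pairs = [
    ("How do I learn Python?", "What is the best way to learn Python?"),
    ("What is the capital of France?", "How do I cook rice?"),
]
vocab = build_vocabulary([q for pair in pairs for q in pair])
X = [pair_features(q1, q2, vocab) for q1, q2 in pairs]
```

Each row of `X` has twice as many columns as the vocabulary has entries, so capping the vocabulary size is one way to limit memory use when the number of rows grows. The rows can then be passed, together with the duplicate labels, to any standard classifier.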

Duplicate_quora_question_pair_project.ipynb
README.md

divyanshusharma1997/NLP-project

Folders and files

Latest commit

History

Repository files navigation

NLP-project

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages