Replace pyenchant with pyspellchecker. get rid of quotation marks for typing keyword
justanhduc committed Oct 6, 2018
1 parent 1f7a8ab commit 1a030d4
Showing 3 changed files with 18 additions and 20 deletions.
7 changes: 6 additions & 1 deletion README.md
@@ -2,12 +2,16 @@

This script runs using Python 3.

First, install the required packages. This script only requires ``nltk`` and ``PyEnchant``.
## Requirements

First, install the required packages. This script only requires ``nltk`` and ``pyspellchecker``.

```bash
$ pip3 install -r requirements.txt
```

## Known bugs and fix

If you run into the error that the package ``punkt`` doesn't exist, download it by going into your Python environment and running:

```bash
@@ -30,6 +34,7 @@ this will be fixed by reinstalling certificates
$ /Applications/Python\ 3.x/Install\ Certificates.command
```

## Usage

To query for a certain keyword, run:

2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,2 +1,2 @@
PyEnchant
pyspellchecker
nltk
29 changes: 11 additions & 18 deletions sotawhat.py
@@ -2,7 +2,7 @@
import sys
import urllib.error
import urllib.request
import enchant
from spellchecker import SpellChecker
from nltk.tokenize import word_tokenize
from six.moves.html_parser import HTMLParser

@@ -31,7 +31,7 @@ def get_authors(lines, i):
def get_next_result(lines, start):
"""
Extract paper from the xml file obtained from arxiv search.
Each paper is a dict that contains:
+ 'title': str
+ 'pdf_link': str
@@ -224,8 +224,9 @@ def get_papers(keyword, num_results=5):
keyword = keyword.lower()
else:
keyword = keyword.lower()
d = enchant.Dict('en_US')
if d.check(keyword):
words = keyword.split()
d = SpellChecker()
if not d.unknown(words):
query_temp = 'https://arxiv.org/search/advanced?advanced=&terms-0-operator=AND&terms-0-term={}&terms-0-field=all&classification-computer_science=y&classification-physics_archives=all&date-filter_by=all_dates&date-year=&date-from_date=&date-to_date=&date-date_type=submitted_date&abstracts=show&size={}&order=-announced_date_first&start={}'
else:
query_temp = 'https://arxiv.org/search/?searchtype=all&query={}&abstracts=show&size={}&order=-announced_date_first&start={}'
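
Note on the hunk above: the keyword is now split into words, and the advanced arXiv search is used only when the spell checker reports no unknown words; otherwise the plain search is used. A minimal sketch of that branch follows. It assumes a stand-in for `SpellChecker().unknown` (which returns the set of words not found in its dictionary) so that it runs without `pyspellchecker` installed. `make_unknown`, `choose_search`, the toy vocabulary and the two short labels are hypothetical names used for illustration, not code from this commit.

```python
# Short labels standing in for the two long URL templates in get_papers().
ADVANCED_SEARCH = 'advanced'
SIMPLE_SEARCH = 'simple'


def make_unknown(vocabulary):
    """Build a stand-in for SpellChecker().unknown backed by a toy vocabulary."""
    vocab = {word.lower() for word in vocabulary}

    def unknown(words):
        # Like the real method, return the set of words missing from the dictionary.
        return {word for word in words if word.lower() not in vocab}

    return unknown


def choose_search(keyword, unknown):
    """Pick the advanced search only if every word of the keyword is known."""
    words = keyword.lower().split()
    return ADVANCED_SEARCH if not unknown(words) else SIMPLE_SEARCH


unknown = make_unknown(['speech', 'recognition', 'image'])
mode_plain = choose_search('speech recognition', unknown)  # all words known
mode_acronym = choose_search('resnet', unknown)            # not in the toy vocabulary
```

With the real package, `unknown` would simply be `SpellChecker().unknown`, as in the diff.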
@@ -266,21 +267,13 @@ def get_papers(keyword, num_results=5):
def main():
if len(sys.argv) < 2:
raise ValueError('You must specify a keyword')
if len(sys.argv) > 3:
raise ValueError("Too many arguments")

keyword = sys.argv[1]

if len(sys.argv) == 3:
try:
num_results = int(sys.argv[2])
except:
print('The second argument must be an integer')
return
if num_results <= 0:
raise ValueError('You must choose to show a positive number of results')

else:
try:
num_results = int(sys.argv[-1])
assert num_results > 0, 'You must choose to show a positive number of results'
keyword = ' '.join([arg for arg in sys.argv[1:-1]])
except ValueError:
keyword = ' '.join([arg for arg in sys.argv[1:]])
num_results = 5

get_papers(keyword, num_results)
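
Note on the `main()` hunk above: the new parsing treats the last argument as the result count if it parses as an integer and joins everything before it into the keyword, which is why multi-word keywords no longer need quotation marks. The same logic can be sketched as a pure function that is easy to test. `parse_args` is a hypothetical helper, not part of the commit, and it raises `ValueError` for a non-positive count where the commit uses `assert`.

```python
def parse_args(argv):
    """Return (keyword, num_results) from a sys.argv-like list (hypothetical helper)."""
    if len(argv) < 2:
        raise ValueError('You must specify a keyword')
    try:
        num_results = int(argv[-1])
    except ValueError:
        # Last argument is not a number: every argument is part of the keyword.
        return ' '.join(argv[1:]), 5
    if num_results <= 0:
        # Assumption: an explicit exception instead of the assert used in the commit.
        raise ValueError('You must choose to show a positive number of results')
    return ' '.join(argv[1:-1]), num_results


with_count = parse_args(['sotawhat.py', 'speech', 'recognition', '8'])
without_count = parse_args(['sotawhat.py', 'speech', 'recognition'])
```

One edge case shared with the committed code: if the only argument is a number, the keyword comes out as an empty string, so a caller may want to reject that case explicitly.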