word counts #1

mmaz · 2020-06-26T23:58:03Z

Really great job kicking off the word count feature, Tejas! Excited to see you making progress so fast. Some suggestions on next steps:

- It looks like the current script produces a CSV of word counts for an input list of keywords. I think what we're looking for instead is a CSV of word counts for all words present in the .tsv file (after they have been normalized with clean_and_filter). Let me know if you have questions about this.
- Excellent to see type annotations! Can you also add docstrings, please?
- Use the standard __main__ idiom (link).
  - I think you can omit sys.argv since you're using argparse.
- Format with black (https://github.com/psf/black).
- Rename the file to snake_case (I have a bad habit of camel-casing .ipynb files, but I think Python files should be lowercase; eventually we will move several of these functions into a library).
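To make the first suggestion concrete, here is a rough sketch of counting every word in the file instead of a fixed keyword list. It is only an illustration under assumptions: `normalize` is a hypothetical stand-in for clean_and_filter (whose real signature is not shown in this thread), the "sentence" column name is assumed, and the sample TSV is made up.

```python
import csv
import io
import re
from collections import Counter
from typing import List, TextIO


def normalize(sentence: str) -> List[str]:
    """Hypothetical stand-in for clean_and_filter: lowercase, keep letter runs."""
    return re.findall(r"[a-z']+", sentence.lower())


def count_all_words(tsv_file: TextIO, column: str = "sentence") -> Counter:
    """Count every normalized word found in one column of a TSV file."""
    counts: Counter = Counter()
    # The column name is an assumption about the .tsv layout.
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        counts.update(normalize(row[column]))
    return counts


def write_counts(counts: Counter, out_file: TextIO) -> None:
    """Write a header plus word,count rows, most frequent first."""
    writer = csv.writer(out_file)
    writer.writerow(["word", "count"])
    writer.writerows(counts.most_common())


# Made-up sample data, held in memory so the sketch runs on its own.
sample = io.StringIO("path\tsentence\na.mp3\tThe cat saw the other cat\n")
counts = count_all_words(sample)
```

Because both functions take file objects, the same code works with `open(...)` in the real script and with `io.StringIO` in a test.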

Again, great job!! Let me know if you have any questions or if these suggestions don't make sense.

tejasprabhune · 2020-06-27T18:13:27Z

I made the changes! Let me know what you think.

mmaz · 2020-06-27T19:46:45Z

Nice! One thing to consider is whether it would be better to run clean_and_filter on each input sentence, though (with some refactoring). In other words, will keyword_set have separate entries for lowercase and uppercase words, for example?
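A tiny illustration of the concern, assuming the set is built from a plain whitespace split (I have not checked how keyword_set is actually built, so treat this as a guess at the failure mode, not a description of the script):

```python
sentence = "Three apples, three oranges"

# Splitting without normalizing keeps "Three" and "three" as distinct keys.
raw_keywords = set(sentence.replace(",", "").split())

# Folding case first merges them into a single entry.
folded_keywords = {word.lower() for word in raw_keywords}
```

Here `raw_keywords` has four entries and `folded_keywords` has three, which is the difference normalizing each sentence up front would make.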

It might be helpful to also create a unit test to test some corner cases for this script, and also to document some shortcomings that we aren't currently addressing (such as not combining word stems, which is fine for now). For example:

input = """#sentence
Three apples, three oranges 3 pears & one more pear"""

Your unit test might verify we have

{ "three" : 2,
  "pears": 1,
  "pear": 1,
  "and": 0,
  ...}
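A possible shape for that test, as a sketch only: `count_words` below is a hypothetical stand-in for the script's counting function, and skipping lines that start with "#" is my assumption about how the header line should be treated.

```python
import re
import unittest
from collections import Counter


def count_words(text: str) -> Counter:
    """Hypothetical counter: skip '#' header lines, lowercase, keep letter runs."""
    counts: Counter = Counter()
    for line in text.splitlines():
        if line.startswith("#"):  # assumed header convention
            continue
        counts.update(re.findall(r"[a-z]+", line.lower()))
    return counts


class WordCountCornerCases(unittest.TestCase):
    SAMPLE = "#sentence\nThree apples, three oranges 3 pears & one more pear"

    def test_case_is_folded(self):
        # "Three" and "three" should land in the same entry.
        self.assertEqual(count_words(self.SAMPLE)["three"], 2)

    def test_stems_are_not_combined(self):
        # Known shortcoming: "pear" and "pears" stay separate.
        counts = count_words(self.SAMPLE)
        self.assertEqual(counts["pears"], 1)
        self.assertEqual(counts["pear"], 1)

    def test_symbols_and_digits_are_not_spelled_out(self):
        # "&" is not counted as "and", and "3" is not counted at all.
        counts = count_words(self.SAMPLE)
        self.assertEqual(counts["and"], 0)
        self.assertNotIn("3", counts)
```

Saved in a test file, this can be run with `python -m unittest`. The test names double as documentation of the shortcomings we are not addressing yet.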

mmaz assigned tejasprabhune Jun 26, 2020

mmaz closed this as completed Jun 30, 2021
