Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pylint to sniff for PEP8 compliance #507

Closed
ArtPoon opened this issue Jan 23, 2024 · 17 comments
Closed

Add pylint to sniff for PEP8 compliance #507

ArtPoon opened this issue Jan 23, 2024 · 17 comments
Assignees

Comments

@ArtPoon
Copy link
Contributor

ArtPoon commented Jan 23, 2024

Add as a GitHub action

@ArtPoon ArtPoon self-assigned this Feb 6, 2024
@GopiGugan
Copy link
Contributor

  1. Go to "Actions" tab
  2. Search for pylint
  3. Click on configure
  4. Add the following yaml:
name: Pylint check

on:
  pull_request:
    branches:
      - "master"
      - "dev"

jobs:
  pylint-check:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v2

    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: '3.10'

    - name: Create virtual environment
      run: python -m venv venv
      working-directory: ${{ github.workspace }}

    - name: Activate virtual environment and install dependencies
      run: |
        source venv/bin/activate
        python -m pip install --upgrade pip
        deactivate
      working-directory: ${{ github.workspace }}

    - name: Analyze code with Pylint
      run: |
        source venv/bin/activate
        pylint $(git ls-files '*.py')
        deactivate
      working-directory: ${{ github.workspace }}

I've attached the list of warnings that pylint is currently throwing. We should resolve these before setting this Github Action or else our pull requests will fail
pylint_report.log

@ArtPoon ArtPoon assigned WilliamZekaiWang and unassigned ArtPoon Mar 5, 2024
@ArtPoon
Copy link
Contributor Author

ArtPoon commented Mar 5, 2024

@WilliamZekaiWang can you please split a new branch off dev and start working through the above log?

@ArtPoon
Copy link
Contributor Author

ArtPoon commented Mar 5, 2024

python3 -m venv venv  # install virtual environment
source venv/bin/activate  # start environment
pip install pylint  # run this first time to install PyLint
pylint $(git ls-files '*.py')
deactivate  # exit environment

@GopiGugan
Copy link
Contributor

@WilliamZekaiWang you do not need to address warnings for the deprecated files in covizu/deprecated. We should be able to ignore that directory when running pylint

@ArtPoon
Copy link
Contributor Author

ArtPoon commented Mar 12, 2024

  • @WilliamZekaiWang you can have free reign to determine what style changes are unnecessary and can be added to a list of exceptions for pylint
  • A lot of the minor changes might be easier to deal with using an automated tool such as autopep8, although I would try this out on one script file first and run the unit tests to make sure that nothing has changed functionally

@WilliamZekaiWang
Copy link
Collaborator

I've used autopep8 for a lot of the files, it seems pretty good for the most part. The way autopep8 works is that you can control how aggressive it is with its changes by adding --aggressive however many times you want. I found that with no aggressives, it only removes white space at the end of lines, but as you add 2, it starts to introduce line breaks for lines exceeding > 100 characters in length (extra aggressives seemed to do nothing). I used 2 --aggressive for the files and it seemed to keep most things working!

There are a few things that I realized autopep8 can't change which I'm still manually editing:

  • Wildcard imports
  • Vague Exceptions
  • Snake case/All caps for constant naming conventions
  • Docstrings
  • Too many arguments in the function (>5)
  • Too many local variables in the function (>15)
  • Too many nested blocks (>5)
  • Redefining local variables

Sounds like a lot, but it does genuinely lighten the workload

I also used flynt to auto change .format to f-strings. It doesn't work too well, but it can get some of them.

@ArtPoon
Copy link
Contributor Author

ArtPoon commented Mar 26, 2024

Thanks @WilliamZekaiWang for the update!

@WilliamZekaiWang
Copy link
Collaborator

finishing up now, I have finished all the files in the main directory and the covizu directory. They all seemed to pass the unittests as well

generally, there's one small thing that I'm iffy on changing:

def retrieve_genomes(
        by_lineage_in,
        known_seqs,
        ref_file,
        outgroup=None,
        earliest=True,
        callback=None):

For this function in timetree.py, pylint doesn't like it when there are greater than 5 arguments. However, I'm not sure how to change functions in these scenarios. One thought I had is that I change a few of these arguments to be a dictionary, such that when we call it:

retrieve_genomes(
        by_lineage_in,
        known_seqs,
        ref_file,
        {outgroup: None,
        earliest: True,
        callback: None})

also, pylint requires a docstring at the top of the file to explain what it does. I'm not quite sure what to add for most of the files

file specifically, there were 2 things I couldn't quite figure out. The first I can't find that error existing. The second error, I assumed value error?

batch.py:299:8: E0611: No name 'DuplicateDatabase' in module 'psycopg2.errors' (no-name-in-module)

line 299: from psycopg2.errors import DuplicateDatabase
beadplot.py:253:15: W0718: Catching too general exception BaseException (broad-exception-caught)
        
try:
   ctree = Phylo.read(args.tree, 'newick')
except ValueError as e:

@GopiGugan
Copy link
Contributor

  • Regarding the number of arguments for functions, we can increase the threshold to 6
  • Please list out the functions you aren't sure about and we can work on adding docstrings together
  • pyscopg2 - version 2.9.9 - We'll have to see if DuplicateDatabase was deprecated

@WilliamZekaiWang
Copy link
Collaborator

WilliamZekaiWang commented May 15, 2024

I have finished pep8-ing all files including utils and unittests

I think it's ok if we don't have docstrings for unittests. There are also a few TODO/FIXMEs in the files which raises a concern on pylint

specific functions/classes I don't really understand that need docstring:

  • pangolin class in pangolin_utils.py
  • 'SC2locator' in 'seq_utils.py'

Another except that I'm not sure the specific catch would be:

try:
robjects.r('''
set.seed(123456)
tree = read.nexus(tree_filename)
sequence_labels = read.csv(sequence_labels_file)
colnames(sequence_labels) = c("index", "value")
#Adjust tree to include branches of length 0 on identical sequences
tip_count = table(sequence_labels$index)
add_tip_count = data.frame(tip_count - 1)
for (tip_place_in_table in 1:nrow(add_tip_count)){
tip_name = add_tip_count[tip_place_in_table,1]
freq = add_tip_count[tip_place_in_table,2]
if(freq != 0){
for (counter in 1:freq){
tree <- bind.tip(tree, paste0(tip_name,"_", counter), edge.length = 0,
where=which(tree$tip.label == tip_name))
}
}
}
#Run skyline estimation
alpha = betacoal.maxlik(tree)
skyline = (skyline.multi.phylo(tree, alpha$p1))
#Output skyline estimation
pop_sizes <- head(skyline$population.size, n = 5)
mean_pop_size <- mean(pop_sizes, na.rm = TRUE)
''')
ne_out = list(robjects.r('mean_pop_size'))[0]
except Exception as error:
print(f'Error in finding Ne, {error}')
ne_out = ''
# Remove the temporary file
os.remove(tree_filename.name)
return ne_out

Functions that have greater than 6 arguments:

  • filter_problematic in gisaid_utils.py
  • make_beadplots in batch_utils.py
  • Callback's __init__ in progress_utils.py,

Functions that have too many local variables (>15) that I couldn't really figure out a way to break down:

  • make_beadplots in batch_utils.py
  • parse_mutations in seq_utils.py
  • stream_local in local.py.

make_beadplots also has too many branches

I should also add I tried my best to solve the >15 local variables by adding more function. I think most of them help with readability except parse_nexus in timetree.py, which I think I made more messy

2 classes had too few public methods:

  • SC2locator in seq_utils.py
  • Callback in progress_utils.py

@ArtPoon
Copy link
Contributor Author

ArtPoon commented May 21, 2024

  • increase threshold for number of arguments that can be passed to a function
  • force pylint to skip make_beadplots and other functions that have too many local variables or branches
  • write unit tests for functions that don't have them!

@ArtPoon
Copy link
Contributor Author

ArtPoon commented May 21, 2024

I forgot that we are in the middle of refactoring make_beadplots.py anyways in another branch, so pylint isn't wrong, the function is hideous (#497)
Let's wait until these changes are merged into dev

@ArtPoon
Copy link
Contributor Author

ArtPoon commented Jun 4, 2024

  • for testing a function that returns a Bio.Phylo.BaseTree object, I use the BaseTree or Clade constructor and manually build a small test fixture to compare against the function output
  • for scripts/mut_annotations_edit.py, a constellation is a combination of mutations that is characteristic of a specific SARS-CoV-2 lineage

@ArtPoon
Copy link
Contributor Author

ArtPoon commented Jun 4, 2024

You might find examples of unit tests for tree objects in

""" Returns a Phylo.BaseTree object given Newick string """

@ArtPoon
Copy link
Contributor Author

ArtPoon commented Jun 11, 2024

Wait for PR #529 to be resolved before making a new PR from this branch into dev

@GopiGugan
Copy link
Contributor

@WilliamZekaiWang to create PR

@ArtPoon
Copy link
Contributor Author

ArtPoon commented Jul 16, 2024

Next step will be to issue a PR from dev to master, for which we will want to have set up GitHub actions for screening PEP8 compliance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants