calculate_annotation_association chokes on columns with only one unique value #1

tobsecret · 2020-09-09T21:53:44Z

Expected behavior

Ignore columns in the annotation table that have only one unique value.

data = phdc.ProteomicsData(
    phospho = phospho,
    protein = protein,
    normed_phospho = normed_phospho,
    modules = modules,
    possible_regulator_list = possible_regulator_list,
)
data.add_annotations(
    # Filtering for tumor samples incidentally leaves 'Tumor' as the only value in the Type column
    annotations.loc[annotations.Type == 'Tumor'],
    pd.Series(col_types)
)
data.calculate_annotation_association(cat_method='RRA', cont_method='spearmanr')

Either calculate_annotation_association should ignore columns with only a single unique value, or add_annotations should treat them differently, e.g. by moving them into their own attribute:

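################### new code ####################
        # Columns with a single unique value cannot be split into True/False groups
        constant_columns = [
            column for column in annotations.columns
            if annotations[column].nunique() == 1
        ]
        self.non_unique_annotations = annotations[constant_columns]
        annotations = annotations.drop(columns=constant_columns)
        # NOTE: column_types has to be subset to the remaining columns as well,
        # otherwise the boolean masks below no longer line up
###############################################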
        self.categorical_annotations = binarize_categorical(
            annotations,
            annotations.columns[column_types == 0]
        )
        self.continuous_annotations = annotations[
            annotations.columns[column_types == 1]
        ].astype(float)
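
The same idea as a standalone, runnable sketch. `split_constant_columns` and the toy table are hypothetical and only illustrate the filtering step; they are not part of phosphodisco.

```python
import pandas as pd


def split_constant_columns(annotations: pd.DataFrame):
    """Split an annotation table into (informative, constant) parts.

    Hypothetical helper for illustration, not part of phosphodisco.
    """
    # nunique(dropna=True) counts the distinct non-missing values per column.
    n_unique = annotations.nunique(dropna=True)
    constant_cols = n_unique[n_unique <= 1].index
    constant = annotations[constant_cols]
    informative = annotations.drop(columns=constant_cols)
    return informative, constant


# Toy annotation table (made-up values): Type is constant after filtering.
annotations = pd.DataFrame(
    {
        "Type": ["Tumor", "Tumor", "Tumor"],
        "Stage": ["I", "II", "I"],
        "Age": [61, 54, 70],
    },
    index=["s1", "s2", "s3"],
)

informative, constant = split_constant_columns(annotations)
print(list(informative.columns))  # ['Stage', 'Age']
print(list(constant.columns))     # ['Type']
```

Using `<= 1` rather than `== 1` also catches columns that are entirely missing, which would be just as untestable.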

Observed behavior

Columns in the annotation DataFrame that contain only a single unique value cause a KeyError. After each column is converted to a dummy variable, such a column contains only True.
As a result, when calculate_annotation_association tries to pull the True and False rows for such a column, it finds no rows containing False:

```
KeyError                                  Traceback (most recent call last)
<ipython-input-47-e84a77b246a7> in <module>
      1 data.add_annotations(annotations.loc[phospho.columns], pd.Series(col_types))
----> 2 data.calculate_annotation_association(cat_method='RRA', cont_method='spearmanr')

/gpfs/data/ruggleslab/phosphodisco/phosphodisco/phosphodisco/classes.py in calculate_annotation_association(self, cat_method, cont_method, **multitest_kwargs)
    502         cat_annots = self.categorical_annotations
    503 
--> 504         cat = categorical_score_association(
    505             cat_annots,
    506             self.module_scores,

/gpfs/data/ruggleslab/phosphodisco/phosphodisco/phosphodisco/annotation_association.py in categorical_score_association(annotations, module_scores, cat_method, **test_kws)
    151         temp = annotations[col].reset_index()
    152         temp = temp.groupby(col)[indname].apply(list)
--> 153         results[col] = scores.apply(
    154             compare_fn,
    155             axis=1

~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   6876             kwds=kwds,
   6877         )
-> 6878         return op.get_result()
   6879 
   6880     def applymap(self, func) -> "DataFrame":

~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/apply.py in get_result(self)
    184             return self.apply_raw()
    185 
--> 186         return self.apply_standard()
    187 
    188     def apply_empty_result(self):

~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/apply.py in apply_standard(self)
    293 
    294             try:
--> 295                 result = libreduction.compute_reduction(
    296                     values, self.f, axis=self.axis, dummy=dummy, labels=labels
    297                 )

pandas/_libs/reduction.pyx in pandas._libs.reduction.compute_reduction()

pandas/_libs/reduction.pyx in pandas._libs.reduction.Reducer.get_result()

/gpfs/data/ruggleslab/phosphodisco/phosphodisco/phosphodisco/annotation_association.py in <lambda>(row)
    146 
    147     compare_fn = lambda row: categorial_methods[cat_method](
--> 148         row[temp[True]], row[temp[False]], **test_kws
    149     )[1]
    150     for col in annotations.columns:

~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4402         k = self._convert_scalar_indexer(k, kind="getitem")
   4403         try:
-> 4404             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4405         except KeyError as e1:
   4406             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False
```
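
The failure can probably be reproduced without phosphodisco at all. The sketch below mimics the `reset_index` / `groupby` pattern visible in the traceback on a made-up all-True column; the column name `Type_Tumor` and the index name `sample` are assumptions chosen for illustration.

```python
import pandas as pd

# A dummy-coded column in which every sample is True, as happens when the
# original annotation column held a single value.
col = pd.Series([True, True, True], index=["s1", "s2", "s3"], name="Type_Tumor")
col.index.name = "sample"

# Same grouping pattern as in the traceback: sample labels per dummy value.
groups = col.reset_index().groupby("Type_Tumor")["sample"].apply(list)
print(groups.index.tolist())  # [True]

# Only the True group exists, so looking up False fails.
try:
    groups[False]
    raised = False
except KeyError:
    raised = True
print(raised)
```

Checking `False in groups.index` before calling the comparison function would be one way to skip such columns inside the association step instead of filtering them out beforehand.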


tobsecret · 2020-09-10T04:13:04Z

add_annotations should probably also separate out any categorical columns in which every value is different, as can be the case for e.g. Sample.ID or Patient.ID. These columns add a disproportionate number of meaningless statistical tests, which comes back to bite us when we correct for multiple testing.
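
A rough sketch of how such ID-like columns could be detected, and of how many tests they would contribute. It uses `pd.get_dummies` as a stand-in for `binarize_categorical` (an assumption, the two may differ in detail) and a made-up annotation table.

```python
import pandas as pd

# Made-up annotation table: Sample.ID is unique per row, Stage is not.
annotations = pd.DataFrame(
    {
        "Sample.ID": ["s1", "s2", "s3", "s4"],
        "Stage": ["I", "II", "I", "II"],
    }
)

# Columns in which every row carries a distinct value (ID-like columns).
n_unique = annotations.nunique(dropna=True)
id_like = n_unique[n_unique == len(annotations)].index.tolist()

# Number of dummy columns, and hence tests per module, each column would add.
dummies_per_column = {
    col: pd.get_dummies(annotations[col]).shape[1] for col in annotations.columns
}

print(id_like)             # ['Sample.ID']
print(dummies_per_column)  # {'Sample.ID': 4, 'Stage': 2}
```

This check only makes sense for columns typed as categorical; a continuous column will usually have all-distinct values too and should not be dropped on that basis.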

tobsecret self-assigned this Sep 9, 2020

tobsecret mentioned this issue Nov 9, 2021

calculate_annotation_association chokes on columns with only one unique value ruggleslab/phosphodisco#7

Open
