Ignore columns in the annotation table that have only one unique value.
import pandas as pd
import phosphodisco as phdc

# phospho, protein, normed_phospho, modules, possible_regulator_list,
# annotations and col_types are assumed to be defined earlier in the session.
data = phdc.ProteomicsData(
    phospho=phospho,
    protein=protein,
    normed_phospho=normed_phospho,
    modules=modules,
    possible_regulator_list=possible_regulator_list,
)
data.add_annotations(
    # Filtering for tumor samples incidentally also makes it so that
    # all values in the Type column are 'Tumor'.
    annotations.loc[annotations.Type == 'Tumor'],
    pd.Series(col_types)
)
data.calculate_annotation_association(cat_method='RRA', cont_method='spearmanr')
Either calculate_annotation_association should ignore columns with only a single unique value, or add_annotations should treat them differently, e.g. set them aside in their own attribute.
Columns in the annotation DataFrame that contain only a single unique value cause a KeyError: after each column is converted to a dummy variable, a column that had only one value to begin with contains nothing but True.
As a result, when calculate_annotation_association tries to pull the True and False rows for such a column, it does not find any rows containing False:
KeyError Traceback (most recent call last)
<ipython-input-47-e84a77b246a7> in <module>
1 data.add_annotations(annotations.loc[phospho.columns], pd.Series(col_types))
----> 2 data.calculate_annotation_association(cat_method='RRA', cont_method='spearmanr')
/gpfs/data/ruggleslab/phosphodisco/phosphodisco/phosphodisco/classes.py in calculate_annotation_association(self, cat_method, cont_method, **multitest_kwargs)
502 cat_annots = self.categorical_annotations
503
--> 504 cat = categorical_score_association(
505 cat_annots,
506 self.module_scores,
/gpfs/data/ruggleslab/phosphodisco/phosphodisco/phosphodisco/annotation_association.py in categorical_score_association(annotations, module_scores, cat_method, **test_kws)
151 temp = annotations[col].reset_index()
152 temp = temp.groupby(col)[indname].apply(list)
--> 153 results[col] = scores.apply(
154 compare_fn,
155 axis=1
~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
6876 kwds=kwds,
6877 )
-> 6878 return op.get_result()
6879
6880 def applymap(self, func) -> "DataFrame":
~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/apply.py in get_result(self)
184 return self.apply_raw()
185
--> 186 return self.apply_standard()
187
188 def apply_empty_result(self):
~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/apply.py in apply_standard(self)
293
294 try:
--> 295 result = libreduction.compute_reduction(
296 values, self.f, axis=self.axis, dummy=dummy, labels=labels
297 )
pandas/_libs/reduction.pyx in pandas._libs.reduction.compute_reduction()
pandas/_libs/reduction.pyx in pandas._libs.reduction.Reducer.get_result()
/gpfs/data/ruggleslab/phosphodisco/phosphodisco/phosphodisco/annotation_association.py in <lambda>(row)
146
147 compare_fn = lambda row: categorial_methods[cat_method](
--> 148 row[temp[True]], row[temp[False]], **test_kws
149 )[1]
150 for col in annotations.columns:
~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
869 key = com.apply_if_callable(key, self)
870 try:
--> 871 result = self.index.get_value(self, key)
872
873 if not is_scalar(result):
~/scratch/miniconda3/envs/phdis/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4402 k = self._convert_scalar_indexer(k, kind="getitem")
4403 try:
-> 4404 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4405 except KeyError as e1:
4406 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: False
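For anyone who wants to see the failure without the full dataset, here is a minimal sketch of the mechanism. It only mimics the `reset_index` / `groupby` / `apply(list)` steps visible in the traceback; the column name `Type_Tumor`, the index name `sample`, and the three sample labels are made up for illustration and are not taken from the package.

```python
import pandas as pd

# Hypothetical dummy-encoded annotation column: every sample is a tumor,
# so the column holds nothing but True.
annotations = pd.DataFrame(
    {"Type_Tumor": [True, True, True]},
    index=pd.Index(["s1", "s2", "s3"], name="sample"),
)

col = "Type_Tumor"
temp = annotations[col].reset_index()
# Group the sample labels by the dummy value, as the traceback shows.
groups = temp.groupby(col)["sample"].apply(list)

# Only a True group exists, so looking up the False group fails.
has_false_group = False in groups.index
try:
    groups[False]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(groups.index.tolist(), has_false_group, raised_key_error)
```

The lookup of the `False` group is the step that fails, which matches the `KeyError: False` at the bottom of the traceback.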
add_annotations should probably also separate out any categorical columns where every value is different, as can be true for e.g. Sample.ID or Patient.ID. These columns add a disproportionate number of meaningless statistical tests, which comes back to bite us when we correct for multiple testing.
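One possible way to do the separation, as a rough sketch rather than a proposal for the actual implementation: count unique values per column and set aside both the constant columns and the categorical columns where every value is different. The function name `split_annotation_columns`, the `categorical_cols` argument, and the toy table are all hypothetical and not part of the package.

```python
import pandas as pd

def split_annotation_columns(annotations, categorical_cols):
    """Partition columns into constant, all-unique categorical, and testable."""
    n_unique = annotations.nunique(dropna=True)
    n_present = annotations.notna().sum()

    # Columns with at most one distinct non-missing value cannot be tested.
    constant = [c for c in annotations.columns if n_unique[c] <= 1]
    # Categorical columns where every non-missing value is different
    # (e.g. sample or patient identifiers) only inflate the number of tests.
    all_unique = [
        c for c in categorical_cols
        if c not in constant and n_unique[c] == n_present[c]
    ]
    skipped = set(constant) | set(all_unique)
    testable = [c for c in annotations.columns if c not in skipped]
    return constant, all_unique, testable

# Toy annotation table (made-up values).
annotations = pd.DataFrame({
    "Type": ["Tumor", "Tumor", "Tumor", "Tumor"],
    "Sample.ID": ["s1", "s2", "s3", "s4"],
    "Stage": ["I", "II", "I", "II"],
    "Age": [51.0, 63.0, 47.0, 70.0],
})
constant, all_unique, testable = split_annotation_columns(
    annotations, categorical_cols=["Type", "Sample.ID", "Stage"]
)
print(constant, all_unique, testable)
```

Continuous columns such as `Age` are deliberately left alone here, since having all distinct values is normal for them. The set-aside columns could be kept in their own attribute so nothing is silently lost.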