
ECOD implementation does not correspond to what is described in the paper (and as it stands the skewness part does nothing here because of it) #453

Open
L-F-A opened this issue Nov 3, 2022 · 3 comments

Comments

L-F-A commented Nov 3, 2022

Summary:

Looking at the implementation of decision_function in ecod.py (and likewise the parallel version), it does not match what is described in the paper (https://arxiv.org/abs/2201.00382).

Furthermore, as a side effect of this implementation, the score obtained when considering the skewness has no effect whatsoever, i.e.:

self.O = np.maximum(self.U_l, self.U_r)
self.O = np.maximum(self.U_skew, self.O)

will always give the same result as just (one can comment out self.O = np.maximum(self.U_skew, self.O) and verify, without even having to think about it, that nothing changes):

self.O = np.maximum(self.U_l, self.U_r)

This is a real problem in the sense that we are doing work for nothing. The method as implemented can still be useful, but it is not ECOD as described in the paper.

Explanation:

Here is an in-depth explanation:

[screenshot: ecod.py code computing the negative log ECDF scores U_l, U_r and U_skew]

This correctly calculates the negative log of the left, right and skewness-corrected CDF for each feature of each data point.
However, the next operation chooses, for each feature of each data point, whichever of the left or right score is larger for that feature and data point and discards the other one.

[screenshot: ecod.py line 153, element-wise maximum of U_l and U_r]

This is not what is described in the paper. Here (line 153) we are left with a matrix where each element is either the left score or the right score, whichever is larger, for that feature. There is no notion of a maximum at the feature level in the paper as far as I can tell; it is at the data point level. And then the next line of code does not do anything at all.

[screenshot: ecod.py line taking the maximum of U_skew and the previous result]

Indeed, self.U_skew as calculated (see lines 150 and 151 above) is a matrix where each element is either the feature's left or right score, depending on the sign of that feature's skewness. But each element of self.O is already the maximum of the left and right scores of that feature, so taking the maximum again with either of them gives the same answer; we already have the maximum for that feature. Therefore, this way of implementing the score eliminates any effect of the skewness score, which I thought was one of the main points of the paper.
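To make this concrete, here is a small self-contained check with random stand-in values (not the PyOD code itself) showing that the U_skew maximum never changes self.O:

import numpy as np

rng = np.random.default_rng(0)
U_l = rng.random((5, 3))                     # stand-in for the per-feature left-tail scores
U_r = rng.random((5, 3))                     # stand-in for the per-feature right-tail scores
skewness = np.sign(rng.normal(size=3))       # per-feature skewness sign
U_skew = np.where(skewness < 0, U_l, U_r)    # per feature, either the left or the right score

O = np.maximum(U_l, U_r)
O_with_skew = np.maximum(U_skew, O)
print(np.array_equal(O, O_with_skew))        # True: the extra maximum is a no-op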

Finally, the sum over features is taken, i.e. for each row (data point) the sum is a mix of left and right scores, since each feature has a different maximum:

[screenshot: ecod.py code summing the scores over the features]

This is not what is described in Eqs. 4, 5 and 6 of the paper.

In conclusion:

PYOD:

The implementation here is: find a left and a right score for each feature of a data point, keep only the larger of the two for each feature, and then sum these per-feature maxima together. This effectively swaps the order of the sum and max operations compared to the paper.
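In short (paraphrasing the current ecod.py, with arrays of shape (n_samples, n_features)):

O = np.maximum(U_l, U_r)      # element-wise max per feature
O = np.maximum(U_skew, O)     # no-op, as argued above
scores = O.sum(axis=1)        # sum of mixed left/right scores per sample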

PAPER:
In the paper it is rather: find a left, a right and a skew score (left or right depending on the sign of the skewness) for each feature of a data point. Sum these scores over the features independently, left with left, right with right and skew with skew. Then choose the maximum of these three sums for each data point. This is very different from what is implemented.
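In symbols (my paraphrase of Eqs. 4-6, notation approximate), with $d$ features, per-feature left/right ECDFs $\hat{F}^{\text{left}}_j$, $\hat{F}^{\text{right}}_j$ and per-feature skewness $\gamma_j$:

$O_{\text{left}}(X_i)  = -\sum_{j=1}^{d} \log \hat{F}^{\text{left}}_j(X_i^{(j)})$
$O_{\text{right}}(X_i) = -\sum_{j=1}^{d} \log \hat{F}^{\text{right}}_j(X_i^{(j)})$
$O_{\text{auto}}(X_i)  = -\sum_{j=1}^{d} \left[ \mathbb{1}\{\gamma_j < 0\}\, \log \hat{F}^{\text{left}}_j(X_i^{(j)}) + \mathbb{1}\{\gamma_j \ge 0\}\, \log \hat{F}^{\text{right}}_j(X_i^{(j)}) \right]$
$O(X_i) = \max\{\, O_{\text{left}}(X_i),\; O_{\text{right}}(X_i),\; O_{\text{auto}}(X_i) \,\}$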

Discussion:

It seems that the way to implement what is in the paper, if one wants that algorithm, would be to replace the code at lines 153 to 159 with something like the following.

Sum across features independently for left, right and skew:

self.O_left = self.U_l.sum(axis=1)
self.O_right = self.U_r.sum(axis=1)
self.O_skew = self.U_skew.sum(axis=1)

For each row, keep the maximum of the left, right and skew scores:

O_tmp = np.concatenate((self.O_left.reshape(-1, 1), self.O_right.reshape(-1, 1), self.O_skew.reshape(-1, 1)), axis=1)

self.O = np.max(O_tmp, axis=1)

The decision score is then just self.O:

if hasattr(self, 'X_train'):
    decision_scores_ = self.O[-original_size:]
else:
    decision_scores_ = self.O
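For reference, here is a minimal self-contained sketch of the paper-style scoring (my own function name, not the PyOD API; the ECDF here is a naive O(n^2) version purely for illustration):

import numpy as np
from scipy.stats import skew

def ecod_scores_paper_style(X):
    # X: array of shape (n_samples, n_features)
    # naive ECDFs evaluated at the sample points (for illustration only)
    cdf_left = (X[None, :, :] <= X[:, None, :]).mean(axis=1)
    cdf_right = (X[None, :, :] >= X[:, None, :]).mean(axis=1)
    U_l = -np.log(cdf_left)
    U_r = -np.log(cdf_right)
    # skewness-corrected score: left tail for left-skewed features, right tail otherwise
    U_skew = np.where(skew(X, axis=0) < 0, U_l, U_r)
    # paper ordering: sum over features first, then take the per-sample maximum
    O_left = U_l.sum(axis=1)
    O_right = U_r.sum(axis=1)
    O_skew = U_skew.sum(axis=1)
    return np.maximum(np.maximum(O_left, O_right), O_skew)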

Lucew (Contributor) commented Dec 2, 2022

General

Hey Louis, Hey Yue,

I went through this issue, read the paper, and compared the formulas. IMHO Louis is correct. The summation definitely comes before the maximum selection (see steps 6 and 7 below), and the maximum should not be taken on the per-feature tail probabilities but on the accumulated log probability.

The steps I'm referring to are shown below in a screenshot from the original paper.

[screenshot: ECOD algorithm from the original paper]

In the algorithmic overview, the left tail probability U_l and the right tail probability U_r are clearly computed by summing over j, the feature dimension index. This is done in step 6. Only after that (in step 7) is the maximum selected. It also makes logical sense to first build the joint (log) probability of a sample across its features and then search for the maximum probability of being an outlier.

IMHO it is the same for the COPOD algorithm! If we decide this is unintentional and fix it, we should touch COPOD as well. See the algorithm excerpt and code below. One can quickly see that COPOD is implemented quite similarly and has the same behavior.

[screenshot: COPOD algorithm from the original COPOD paper]

Copied from the COPOD Implementation

self.U_l = -1 * np.log(np.apply_along_axis(ecdf, 0, X))
self.U_r = -1 * np.log(np.apply_along_axis(ecdf, 0, -X))

skewness = np.sign(skew(X, axis=0))
self.U_skew = self.U_l * -1 * np.sign(skewness - 1) + self.U_r * np.sign(skewness + 1)
self.O = np.maximum(self.U_skew, np.add(self.U_l, self.U_r) / 2)
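If we applied the same sum-before-max ordering to this COPOD excerpt, it might look roughly as follows. This keeps the left/right averaging that the excerpt already uses and only moves the feature-wise sum before the maximum; whether this is exactly what the COPOD paper prescribes would still need to be checked against that paper.

O_skew = self.U_skew.sum(axis=1)
O_avg = np.add(self.U_l, self.U_r).sum(axis=1) / 2
self.O = np.maximum(O_skew, O_avg)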

Additional remarks

Other than that, I think even the U_skew array is not computed entirely correctly, though this is only very minor. The problem is that we add up U_l and U_r in the case where the estimated skewness of a feature column is exactly zero. See the explanation below.

Let's analyze the code:

self.U_l = -1 * np.log(np.apply_along_axis(ecdf, 0, X))
self.U_r = -1 * np.log(np.apply_along_axis(ecdf, 0, -X))

skewness = np.sign(skew(X, axis=0))

self.U_skew = self.U_l * -1 * np.sign(skewness - 1) + self.U_r * np.sign(skewness + 1)

self.O = np.maximum(self.U_l, self.U_r)
self.O = np.maximum(self.U_skew, self.O)

The variable skewness refers to a vector with as many values as we have features per sample. Let's check skewness = np.sign(skew(X, axis=0)):

  1. skew < 0 -> np.sign(skew) = -1
  2. skew = 0 -> np.sign(skew) = 0
  3. skew > 0 -> np.sign(skew) = 1

The algorithm states we should select either the left tail probability or the right tail probability, per sample and feature, based on the feature's skewness.

Specifically, the left tail probability U_l should be selected if the skewness is negative, and the right tail probability U_r should be selected if the skewness is zero or positive. Let's check the values of np.sign(skewness - 1) and np.sign(skewness + 1), which are used in self.U_skew = self.U_l * -1 * np.sign(skewness - 1) + self.U_r * np.sign(skewness + 1):

  1. skew < 0 -> skewness = -1
    a) np.sign(skewness - 1) = np.sign(-1-1) = -1 -> U_l gets selected
    b) np.sign(skewness + 1) = np.sign(-1+1) = 0 -> U_r omitted
  2. skew = 0 -> skewness = 0
    a) np.sign(skewness - 1) = np.sign(0-1) = -1 -> U_l gets selected
    b) np.sign(skewness + 1) = np.sign(0+1) = 1 -> U_r gets selected
  3. skew > 0 -> skewness = 1
    a) np.sign(skewness - 1) = np.sign(1-1) = 0 -> U_l gets omitted
    b) np.sign(skewness + 1) = np.sign(1+1) = 1 -> U_r gets selected

So in the second case, both U_l and U_r get selected, which is not what is written in the paper. The algorithm there states that in the case of skewness = 0, U_r should be selected. I know this is a very unlikely case, as the skewness would have to be exactly, numerically zero, which will most likely never happen for an empirically estimated skewness. But I thought that if we discuss how to fix ECOD, we should consider everything.
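A tiny numeric check of the skewness = 0 case (hypothetical values, scalars instead of arrays for brevity):

import numpy as np

U_l, U_r = 2.0, 3.0
skewness = np.sign(0.0)                                                  # a feature with exactly zero skewness
U_skew = U_l * -1 * np.sign(skewness - 1) + U_r * np.sign(skewness + 1)
print(U_skew)                                                            # 5.0 = U_l + U_r, instead of just U_r = 3.0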

I would maybe fix it like this:

skewness = skew(X, axis=0)
self.U_skew = self.U_l.copy()
self.U_skew[:, skewness >= 0] = self.U_r[:, skewness >= 0]

or

skewness_weight = skew(X, axis=0) < 0
self.U_skew = self.U_l * skewness_weight + self.U_r * (~skewness_weight)
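With this version, a feature with exactly zero estimated skewness gets the right-tail score, as the paper prescribes. A quick check with hypothetical values:

import numpy as np
from scipy.stats import skew

X = np.array([[-1.0], [0.0], [1.0]])           # one feature with exactly zero skewness
U_l = np.full((3, 1), 2.0)
U_r = np.full((3, 1), 3.0)
skewness_weight = skew(X, axis=0) < 0          # array([False]) -> right tail selected
U_skew = U_l * skewness_weight + U_r * (~skewness_weight)
print(U_skew.ravel())                          # [3. 3. 3.], i.e. U_r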

I also have ideas on how to rewrite the estimated ECDF in order to get rid of the statsmodels dependency and only have NlogN runtime instead of the current 2NlogN. This would require implementing the ECDF myself, which is doable.
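For what it's worth, one possible numpy-only version could look like the sketch below (just an illustration, not necessarily how I would implement it): sort each feature column once and use searchsorted to get the counts, which also handles ties correctly.

import numpy as np

def column_ecdf(X):
    # ECDF of every entry within its own feature column:
    # out[i, j] = (# of rows k with X[k, j] <= X[i, j]) / n_samples
    n = X.shape[0]
    X_sorted = np.sort(X, axis=0)              # one O(n log n) sort per column
    out = np.empty(X.shape, dtype=float)
    for j in range(X.shape[1]):
        out[:, j] = np.searchsorted(X_sorted[:, j], X[:, j], side='right')
    return out / n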

If I have time this weekend, I might work on this!
Nice catch Louis!

Best regards
Lucas

yzhao062 (Owner) commented Dec 4, 2022

This is nice. PR welcomed!!!!

Lucew (Contributor) commented Dec 20, 2022

Hey!

Unfortunately, it took me a while... Christmas and Advent of Code got in the way.

I just now implemented an ECDF estimator that runs in less time than the previous one since it reuses the feature columns instead of searching for each value in them.

> I also have ideas on how to rewrite the estimated ECDF in order to get rid of the statsmodels dependency and only have NlogN runtime instead of the current 2NlogN. This would require implementing the ECDF myself, which is doable.

I will open another issue and a pull request for that once the CI tests are successful. This would yield the possibility of completely getting rid of the statsmodels dependency.

After that, I will go and fix the issue we discussed here.

Lucas
