ECOD and COPOD decision functions switched #561

Open
Hellsice opened this issue Apr 30, 2024 · 3 comments
Hellsice commented Apr 30, 2024

I fixed the problem in my latest pull request, but I'll leave this here for context if needed, and to get an answer to the question about aggregation.

I found a small inconsistency between the ECOD and COPOD decision functions in the repository and how they were explained in their corresponding papers.

The COPOD paper (arXiv:2009.09463v1) specifies: "We take the maximum of the negative log of the probability generated by the left tail empirical copula, right tail empirical copula and skewness corrected empirical copula to be the outlier score."
yet the code shows:
self.O = np.maximum(self.U_skew, np.add(self.U_l, self.U_r) / 2)

which is the maximum of the skewness-corrected score and the average of the left-tail and right-tail scores, i.e. an aggregation of the left and right ECDF.

The ECOD paper (arXiv:2201.00382v3) specifies: "we aggregate its tail probabilities F̂_left and F_right to come up with a final outlier score."

yet the code shows:
self.O = np.maximum(self.U_l, self.U_r)
self.O = np.maximum(self.U_skew, self.O)

which is the maximum of the left-tail, right-tail, and skewness-corrected (SC) scores.

Could they have been switched at some point?
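
To make the difference between the two quoted lines concrete, here is a minimal NumPy sketch of both aggregations on toy data. It is not the repository's code: the helpers `ecdf`, `tail_scores`, `max_of_three`, and `skew_or_average` are hypothetical names, the skew handling is simplified to the sign of the third central moment, and summing the per-dimension scores over `axis=1` is an assumption about how a final per-sample score is formed.

```python
import numpy as np

def ecdf(column):
    """Empirical CDF at each sample: fraction of values <= that sample."""
    sorted_col = np.sort(column)
    return np.searchsorted(sorted_col, column, side="right") / len(column)

def tail_scores(X):
    """Per-dimension negative log tail probabilities plus a skew-selected score."""
    u_l = -np.log(np.apply_along_axis(ecdf, 0, X))    # left tail
    u_r = -np.log(np.apply_along_axis(ecdf, 0, -X))   # right tail
    centered = X - X.mean(axis=0)
    skew_sign = np.sign((centered ** 3).mean(axis=0))  # simplified skew sign (assumption)
    u_skew = np.where(skew_sign < 0, u_l, u_r)         # pick one tail per dimension
    return u_l, u_r, u_skew

def max_of_three(u_l, u_r, u_skew):
    """Maximum of left, right, and skew-corrected scores (the COPOD paper's wording)."""
    return np.maximum(u_skew, np.maximum(u_l, u_r)).sum(axis=1)

def skew_or_average(u_l, u_r, u_skew):
    """Maximum of the skew-corrected score and the average of both tails."""
    return np.maximum(u_skew, (u_l + u_r) / 2).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [8.0, -8.0, 8.0]  # planted outlier, extreme in every dimension

u_l, u_r, u_skew = tail_scores(X)
scores_max = max_of_three(u_l, u_r, u_skew)
scores_avg = skew_or_average(u_l, u_r, u_skew)
```

In this sketch the skew-selected score is always one of the two tails, so the maximum of three reduces to the maximum of the two tails and can never be smaller than the skew-or-average variant. The two variants differ only in dimensions where the skew correction picks the smaller tail.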

I also have a question about the aggregation. What is the benefit of averaging the two tail scores as opposed to adding them?
Why not use addition, as the data is normalized afterwards anyway?
As for the outliers: if I have heavily left-tailed data, the negative log of the left-tail CDF will be much higher than the negative log of the right-tail CDF, so the right-tail term becomes negligible. The result is then also very similar to the skew correction.
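
One way to probe the averaging-versus-addition question is a small experiment, sketched here under stated assumptions: the tail scores are made-up non-negative values rather than real ECDF output, and the skew-corrected score is assumed to select exactly one tail per dimension (`pick_left` is a hypothetical mask). Under those assumptions, replacing the average with a plain sum inside the maximum makes the skew-corrected term irrelevant, because the sum of two non-negative scores is never smaller than either one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-dimension tail scores; negative logs of probabilities are non-negative.
u_l = -np.log(rng.uniform(0.01, 1.0, size=(100, 4)))
u_r = -np.log(rng.uniform(0.01, 1.0, size=(100, 4)))

# Assumption: the skew-corrected score picks exactly one tail per dimension.
pick_left = np.array([True, False, True, False])
u_skew = np.where(pick_left, u_l, u_r)

with_sum = np.maximum(u_skew, u_l + u_r)        # addition inside the maximum
with_avg = np.maximum(u_skew, (u_l + u_r) / 2)  # averaging inside the maximum

# With addition, the maximum always returns the sum, so u_skew never contributes.
skew_never_wins_with_sum = np.array_equal(with_sum, u_l + u_r)

# With averaging, u_skew wins whenever the selected tail is the larger one.
skew_wins_with_avg = float(np.mean(u_skew > (u_l + u_r) / 2))

# Without the maximum, sum and average differ only by a factor of 2,
# so they rank samples identically.
same_ranking = np.array_equal(
    np.argsort((u_l + u_r).sum(axis=1)),
    np.argsort(((u_l + u_r) / 2).sum(axis=1)),
)
```

So the factor of 2 is harmless only when the aggregated tails are used on their own. Once they are compared against the skew-corrected score, the choice between average and sum changes which term is selected, and later normalization cannot undo that.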

Contributor

Lucew commented Jun 3, 2024

Hi! Have you checked #453? I made a pull request regarding this issue #493. I will look at this later in more detail, but maybe you are quicker. Perhaps there are some parallels.

Author

Hellsice commented Jun 3, 2024

> Hi! Have you checked #453? I made a pull request regarding this issue #493. I will look at this later in more detail, but maybe you are quicker. Perhaps there are some parallels.

I hadn't seen that one, but looking at it, https://github.com/yzhao062/pyod/issues/453 seems to have found the same problem I did. They also report some problems with the skew correction, which I can't comment on: I am just a bachelor student using this repo for my thesis and don't understand the topic well enough. I just noticed that the COPOD and ECOD decision functions seem to have been swapped.

Contributor

Lucew commented Jun 4, 2024

I also checked again. You are correct: the wording for the two algorithms is different. Confusingly, the pseudocode in Algorithm 1 (in both papers) looks similar. I currently don't have time to look into this further, but there seems to be some mix-up in the implementation as well. Additional fixes may be necessary.

Discussion #548 seems to go in a similar direction. I will be watching this one.
