Periodic Features #4281

Closed
Fish-Soup opened this issue May 12, 2021 · 3 comments
Comments

Fish-Soup commented May 12, 2021

Summary

Create periodic features such that there is no discontinuity in feature space where there shouldn't be one. E.g. the 365th day of the year should be adjacent to the 1st. I imagine the API would work similarly to the specification of categorical features, with the additional component of the minimum and maximum feature values that are equivalent (e.g. hour 0 and hour 24 of the day are the same).

Motivation

I primarily work with time series forecasting, and common features we use are hour of day, day of week or day of year. Taking day of year as an example, if I use the feature as is, the 365th day is not adjacent to the 1st day in feature space, although it is in reality. The model has to learn that these days are likely to be similar rather than starting from that prior. I am often in the position where I have data only for, say, January to May; my prediction for December would then have the day-of-year feature built off May's data, when in fact it should be more like January's. This would also provide an additional constraint that should help the model fit better in the case of hour of day or day of week. Another periodic feature could be angle.

Description

From a user perspective, I imagine that we specify which features are periodic and which min and max feature values are equivalent, e.g. 0 and 24 for hours. This could be done by passing a Dict[feature_name, Tuple[minval, maxval]], in the same way as the categorical features are defined.
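To make the proposed specification concrete, here is a minimal sketch of what such a mapping could look like, together with a helper showing the adjacency it is meant to express. The name `periodic_features` and the helper `circular_distance` are hypothetical illustrations, not existing LightGBM parameters or functions, and the day-of-year bounds of 0 and 365 are an assumption.

```python
from typing import Dict, Tuple

# Hypothetical specification (NOT an existing LightGBM parameter):
# feature name -> (minval, maxval), where minval and maxval are equivalent.
periodic_features: Dict[str, Tuple[float, float]] = {
    "hour_of_day": (0.0, 24.0),
    "day_of_year": (0.0, 365.0),  # assumed bounds for illustration
}


def circular_distance(a: float, b: float, bounds: Tuple[float, float]) -> float:
    """Shortest distance between a and b when the feature wraps at its bounds."""
    lo, hi = bounds
    period = hi - lo
    diff = abs(a - b) % period
    return min(diff, period - diff)
```

Under this reading, `circular_distance(365, 1, periodic_features["day_of_year"])` is 1 rather than 364, which is the prior the request asks the model to start from.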

Internally in the tree algorithm, in order to split the periodic feature, two leaf boundaries would have to be defined initially for a given feature, so the best pair of boundaries would be chosen. After that, I imagine the algorithm working as it currently does.
In the hour-of-day example, the optimum first split might be defined with hours 3 and 12, in which case one leaf is 3<h<12 and the other is 12<h<24 & 0<h<3.
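The two-boundary first split can be sketched as a brute-force search over pairs of candidate thresholds. This is only an illustration of the idea under stated assumptions, not how LightGBM's histogram-based split finding works: the function name `best_periodic_split` is hypothetical, the criterion is a plain sum of squared errors, and the inner leaf uses a half-open interval a <= h < b.

```python
import numpy as np


def best_periodic_split(x, y, candidates):
    """Return (sse, a, b) for the boundary pair a < b with the lowest error.

    One leaf holds a <= x < b; the other leaf holds everything else,
    i.e. the values that wrap around the end of the period.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            inside = (x >= a) & (x < b)
            # Skip pairs that leave one of the two leaves empty.
            if inside.all() or not inside.any():
                continue
            left, right = y[inside], y[~inside]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best


# Toy data: the target is 1 between hours 3 and 12 and 0 elsewhere.
hours = np.arange(24)
target = np.where((hours >= 3) & (hours < 12), 1.0, 0.0)
sse, a, b = best_periodic_split(hours, target, candidates=list(range(25)))
```

On this toy data the search recovers the boundaries 3 and 12. The pairwise search is quadratic in the number of candidate thresholds, which is one cost such a feature would need to address.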

If linear_tree=True then only one initial split would be required to fit a linear relationship.

References

Periodic constraints have been implemented in the pygam package.
https://pygam.readthedocs.io/en/latest/notebooks/tour_of_pygam.html

@StrikerRUS
Collaborator

Closed in favor of being in #2302. We decided to keep all feature requests in one place.

Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.

@Fish-Soup
Author

@candalfigomoro
I've seen that workaround; it's OK when you have complete days for the range. But in my case I often don't have, say, the second part of the year, and you can get the features creating something like the reverse pattern in the second half of the year.

Thanks though, I might try it out again and see how it performs.
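
The workaround being discussed is not quoted in this thread. Assuming it refers to the common sine/cosine encoding of a cyclical feature, a minimal sketch looks like this; the function names `cyclical_encode` and `encoded_gap` are hypothetical.

```python
import math


def cyclical_encode(value, period):
    """Map a cyclical value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_gap(a, b, period):
    """Euclidean distance between two values after encoding."""
    sa, ca = cyclical_encode(a, period)
    sb, cb = cyclical_encode(b, period)
    return math.hypot(sa - sb, ca - cb)
```

With both components, day 365 and day 1 end up close together. Each component on its own is symmetric, though: the cosine takes the same value at day d and day 365 - d, so a tree splitting on one component at a time may mirror what it learned from the first half of the year onto the second. That may be related to the reversed pattern described above, but this is an interpretation, not something the thread confirms.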

4 participants