
Feature/transf date #29

Merged
merged 3 commits into from
Sep 23, 2022
Conversation

armgilles
Contributor

Allow datetime column in Eurybia

Reference Issues/PRs : #28

Signed-off-by: Gillesa <arm.gilles@gmail.com>
@ThomasBouche
Collaborator

Great!

Maybe your transformation should not live in the _analyze_consistency method, but run before it, to make the dataset transformation clearer.
You could add a method that performs all your transformations and execute it in compile.

Signed-off-by: Gillesa <arm.gilles@gmail.com>
@armgilles
Contributor Author

  • Create a method to check columns datetime before _analyze_consistency
  • Fix duplicate name in test.
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from eurybia.data.data_loader import data_loading
from eurybia import SmartDrift


house_df, house_dict = data_loading('house_prices')
house_df_learning = house_df.loc[house_df['YrSold'] == 2006]
house_df_2007 = house_df.loc[house_df['YrSold'] == 2007]

y_df_learning = house_df_learning['SalePrice'].to_frame()
X_df_learning = house_df_learning[house_df_learning.columns.difference(['SalePrice', 'YrSold'])]
y_df_2007 = house_df_2007['SalePrice'].to_frame()
X_df_2007 = house_df_2007[house_df_2007.columns.difference(['SalePrice', 'YrSold'])]

# Create random datetime columns
X_df_learning['random_col_date'] = np.random.choice(pd.date_range(start='2000-01-01', end='2006-12-31'), size=len(X_df_learning))
X_df_learning['other_random_col_date'] = np.random.choice(pd.date_range(start='2000-01-01', end='2006-12-31'), size=len(X_df_learning))

X_df_2007['random_col_date'] = np.random.choice(pd.date_range(start='2007-01-01', end='2007-12-31'), size=len(X_df_2007))
X_df_2007['other_random_col_date'] = np.random.choice(pd.date_range(start='2007-01-01', end='2007-12-31'), size=len(X_df_2007))

# Just a random model
regressor = RandomForestRegressor(n_estimators=2).fit(X_df_learning[['1stFlrSF', '2ndFlrSF']],
                                                      y_df_learning['SalePrice'])

# Should be OK & inform the user of the transformation
SD = SmartDrift(df_current=X_df_2007,
                df_baseline=X_df_learning,
                dataset_names={"df_current": "2007 dataset", "df_baseline": "Learning dataset"}
               )
SD.compile()
# Column random_col_date will be dropped and transformed in df_current by : random_col_date_year, random_col_date_month, random_col_date_day
# Column other_random_col_date will be dropped and transformed in df_current by : other_random_col_date_year, other_random_col_date_month, other_random_col_date_day
# Column random_col_date will be dropped and transformed in df_baseline by : random_col_date_year, random_col_date_month, random_col_date_day
# Column other_random_col_date will be dropped and transformed in df_baseline by : other_random_col_date_year, other_random_col_date_month, other_random_col_date_day

# Should raise an error when a deployed_model is provided
SD = SmartDrift(df_current=X_df_2007,
                df_baseline=X_df_learning,
                deployed_model=regressor,
                dataset_names={"df_current": "2007 dataset", "df_baseline": "Learning dataset"}
               )
SD.compile()
# TypeError: df_current have datetime column. You should drop it
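The transformation logged by compile() above (drop each datetime column and replace it with _year/_month/_day columns) can be sketched in plain pandas; this is an illustrative stand-alone helper, not Eurybia's actual implementation:

```python
import pandas as pd

def expand_datetime_columns(df):
    # Hypothetical helper mirroring the behaviour described in the
    # compile() log: split every datetime column into year/month/day
    # columns and drop the original
    df = df.copy()
    for col in df.select_dtypes(include=["datetime64[ns]"]).columns:
        df[f"{col}_year"] = df[col].dt.year
        df[f"{col}_month"] = df[col].dt.month
        df[f"{col}_day"] = df[col].dt.day
        df = df.drop(columns=[col])
    return df

df = pd.DataFrame({"d": pd.to_datetime(["2007-01-15", "2007-06-30"]), "x": [1, 2]})
out = expand_datetime_columns(df)
print(sorted(out.columns))  # ['d_day', 'd_month', 'd_year', 'x']
```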

@ThomasBouche
Collaborator

Great, thanks for your contribution!

@ThomasBouche ThomasBouche merged commit f60be26 into MAIF:master Sep 23, 2022