How I Can Apply Inverse Transform #1858

Closed
mustfkeskin opened this issue Aug 4, 2023 · 4 comments
Labels
question Further information is requested

Comments

@mustfkeskin

How can I get back to my original feature set?
What is the equivalent of scikit-learn's inverse transform function here?

After training a retrieval model, I want to recover my original user and item IDs :)

@rnyak
Contributor

rnyak commented Aug 7, 2023

@mustfkeskin your mapped values are stored in the categories folder when you run the Categorify op. For your item_id column, you will see a unique.item_id.parquet file under the categories folder. The Categorify op does the mapping as follows:

  • 0 is reserved for padding, so you should not have any 0 in your transformed data.
  • 1 is reserved for nulls, so any nulls in a categorical column are mapped to 1.
  • OOVs (out-of-vocabulary values) are mapped to 2.
  • The regular encoding starts from 3: the most frequent item in a categorical column is encoded as 3, the second most frequent as 4, and so on.
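
The scheme in the list above can be illustrated with a few lines of pandas. This is only a sketch of the described mapping, not NVTabular's implementation: `build_vocab`, `encode`, and the constant names are hypothetical, and how ties between equally frequent values are ordered is not covered here.

```python
import pandas as pd

# Reserved ids, following the list above (names are hypothetical).
PAD_ID, NULL_ID, OOV_ID, FIRST_REGULAR_ID = 0, 1, 2, 3

def build_vocab(train_values: pd.Series) -> dict:
    """Assign ids by descending frequency, starting at 3."""
    counts = train_values.dropna().value_counts()  # most frequent first
    return {value: FIRST_REGULAR_ID + rank
            for rank, value in enumerate(counts.index)}

def encode(values: pd.Series, vocab: dict) -> pd.Series:
    """Nulls -> 1, values missing from the vocab -> 2, known values -> their id."""
    encoded = values.map(vocab)                   # unknown and null become NaN
    encoded = encoded.fillna(OOV_ID)              # treat everything unmapped as OOV
    encoded = encoded.mask(values.isna(), NULL_ID)  # then mark true nulls
    return encoded.astype("int64")

# Toy data: "a" appears 3 times, "b" twice, "c" once.
train = pd.Series(["a", "b", "a", "c", "a", "b"])
vocab = build_vocab(train)
encoded = encode(pd.Series(["b", None, "z", "a"]), vocab)
```

Note that `PAD_ID` never appears in the output: per the list above, 0 is only used for padding.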

The index column holds your encoded values. From there, you can write a simple pandas mapping script to convert the encoded IDs back to the original IDs.
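
A minimal sketch of such a mapping script, assuming (as described above) that the index of the categories table is the encoded id and that the column named after the feature holds the original value. The `decode_ids` helper and the toy `categories` frame are hypothetical stand-ins; in practice the frame would come from reading the unique.item_id.parquet file with `pd.read_parquet`.

```python
import pandas as pd

def decode_ids(encoded: pd.Series, categories: pd.DataFrame, column: str) -> pd.Series:
    """Map encoded ids back to original values using the categories table.

    Assumes categories.index holds the encoded id and categories[column]
    holds the original value. Ids with no row in the table become NaN.
    """
    lookup = categories[column]   # Series: encoded id -> original value
    return encoded.map(lookup)

# Toy stand-in for the categories file (hypothetical values).
categories = pd.DataFrame({"item_id": ["sku_a", "sku_b", "sku_c"]}, index=[3, 4, 5])

encoded_items = pd.Series([4, 3, 2, 5])
decoded_items = decode_ids(encoded_items, categories, "item_id")
```

In this toy table the OOV id 2 has no row, so it decodes to NaN rather than to an original ID.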

@mustfkeskin
Author

Thank you @rnyak
This solved my problem.
This is a stumbling block for newbies like me, since there was no example of it in the tutorials.

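import pandas as pd

# Load the Categorify lookup table for the query_sku column.
unique_query_sku_df = pd.read_parquet("../data/categories/categories/unique.query_sku.parquet")
# Expose the encoded id (the DataFrame index) as a regular column to join on.
unique_query_sku_df["index"] = unique_query_sku_df.index
unique_query_sku_df.head()

# query_embs_df (defined elsewhere) is expected to hold the encoded id in an "index" column.
query_embs_df = pd.merge(query_embs_df,
                         unique_query_sku_df,
                         how="inner",
                         on="index")
# Keep only the original id and its embedding, then rename the columns.
query_embs_df = query_embs_df[["query_sku", "embeddings"]]
query_embs_df.columns = ["id", "embeddings"]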

@hkristof03

@rnyak would it be possible to include the reverse transformation of the item IDs as a built-in utility mechanism?

@CarloNicolini

> @rnyak would it be possible to include the reverse transformation of the item IDs as a built-in utility mechanism?

I also take advantage of the categories/unique.feature.parquet file
