Skip to content
This repository has been archived by the owner on Jan 4, 2023. It is now read-only.

feature to explode join columns #80

Open
wants to merge 1 commit into
base: LS/link_between_columns_kept
Choose a base branch
from

Conversation

MarionBla
Copy link
Contributor

The columns concerned with join need to be in row view (and not in list)
But it is necesary to keep the link between this(or these) column(s) and the others
The idea is to use the trees created to get the link between columns and the join column in the dataframe.

@MarionBla MarionBla requested a review from SaulLu August 28, 2020 15:12
Copy link
Contributor

@Jasopaum Jasopaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand anything about the logic but I left a few comments about how we can write stuff in Python, it would make the code a bit easier to read

Comment on lines +274 to +275
for j in range(i + 1, len(elements)):
element2 = elements[j]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write for element2 in elements[i+1:]:

if degres[j] > 0:
if isinstance(self.df[cols[j]].iloc[0], list):
lst_cols.append(cols[j])
degres[j] = degres[j] - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is both in the if and the else so you can put it outside
Also a = a - 1 can be written a -= 1 in Python

for index, row in df.iterrows():
for col in lst_cols:
for element in row[col]:
dico[col].append(element)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the whole function can be written

from collections import defaultdict

def to_list(self, df, lst_cols):
    dict = defaultdict(list)
    for _, row in df.iterrows():
        for col in lst_cols:
            dict[col].extend(row[col])
    return dict

But aren't you mixing the elements from the different rows here?

dict_degre = self.compute_degres(element)
if dict_degre:
cols = [self.elements.get_element(i).col_name for i in dict_degre.keys()]
degres = [dict_degre[i] for i in dict_degre.keys()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to degres = [val for i in dict_degre.values()]

for element in self.elements.get_subset_elements(goal="join"):
dict_degre = self.compute_degres(element)
if dict_degre:
cols = [self.elements.get_element(i).col_name for i in dict_degre.keys()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for i in dict_degre will run through the keys (.keys() is not needed)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants