-
Notifications
You must be signed in to change notification settings - Fork 2
feature to explode join columns #80
base: LS/link_between_columns_kept
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand anything about the logic but I left a few comments about how we can write stuff in Python, it would make the code a bit easier to read
for j in range(i + 1, len(elements)): | ||
element2 = elements[j] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write for element2 in elements[i+1:]:
if degres[j] > 0: | ||
if isinstance(self.df[cols[j]].iloc[0], list): | ||
lst_cols.append(cols[j]) | ||
degres[j] = degres[j] - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is both in the if and the else so you can put it outside
Also a = a - 1
can be written a -= 1
in Python
for index, row in df.iterrows(): | ||
for col in lst_cols: | ||
for element in row[col]: | ||
dico[col].append(element) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the whole function can be written
from collections import defaultdict
def to_list(self, df, lst_cols):
dict = defaultdict(list)
for _, row in df.iterrows():
for col in lst_cols:
dict[col].extend(row[col])
return dict
But aren't you mixing the elements from the different rows here?
dict_degre = self.compute_degres(element) | ||
if dict_degre: | ||
cols = [self.elements.get_element(i).col_name for i in dict_degre.keys()] | ||
degres = [dict_degre[i] for i in dict_degre.keys()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is equivalent to degres = [val for i in dict_degre.values()]
for element in self.elements.get_subset_elements(goal="join"): | ||
dict_degre = self.compute_degres(element) | ||
if dict_degre: | ||
cols = [self.elements.get_element(i).col_name for i in dict_degre.keys()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i in dict_degre
will run through the keys (.keys() is not needed)
The columns concerned with join need to be in row view (and not in list)
But it is necesary to keep the link between this(or these) column(s) and the others
The idea is to use the trees created to get the link between columns and the join column in the dataframe.