Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop is46 #48

Merged
merged 6 commits into from
Aug 13, 2021
Merged

Develop is46 #48

merged 6 commits into from
Aug 13, 2021

Conversation

tacazares
Copy link
Member

This pull request has made changes to the training data generator to fix the bug in #46. The fix that was implemented is here:

            try:
                # Get the target matrix
                target_vector = np.array(binding_stream.values(chrom_name, start, end)).T

            except:
                # TODO change length of array
                target_vector = np.zeros(1024)

Additional updates to the docs were made. I also made changes to the setup.py file by including the import of pyyaml. There is still an error with sklearn and the new updates that is fixed modifying the import statements in modisco.

Cazares, Tareian added 6 commits August 6, 2021 15:11
…al error

This push adds a flag to training that will allow you randomly generate the reverse complement examples for training in addition to the reference samples. The pyyaml req was added. The invalid interval bounds error was also patched by trying to get the input matrix and passing a matrix of zeros if the interval is not found.
@FaizRizvi FaizRizvi linked an issue Aug 11, 2021 that may be closed by this pull request
@tacazares tacazares linked an issue Aug 11, 2021 that may be closed by this pull request
@tacazares tacazares merged commit 1bd9727 into develop Aug 13, 2021
@tacazares tacazares deleted the develop_IS46 branch August 13, 2021 19:09
### `--shuffle_cell_type`

If shuffle_cell_type, then shuffle training ROI cell type label. Default: `False`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition Tareain! This should make it easier to understand for the user.

@@ -115,7 +116,8 @@ def run_training(args):
quant=args.quant,
batch_size=args.batch_size,
target_scale_factor=args.target_scale_factor,
shuffle_cell_type=args.shuffle_cell_type
shuffle_cell_type=args.shuffle_cell_type,
rev_comp_train=args.rev_comp
)

# Create keras.utils.sequence object from validation generator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the rev_comp flag will help make our publication figures.

# Our workaround for issue#42 is to provide a zero matrix for that position
try:
# Get the target matrix
target_vector = np.array(binding_stream.values(chrom_name, start, end)).T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling that we do not need to Transpose this at the end.

targets_batch.append(bin_vector)
except:
# TODO change length of array
target_vector = np.zeros(1024)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the fix for the issue. Do we need to count the number of times this part is executed in the handful of TFs where it actually is implemented? (Might be overkill)

@@ -24,24 +24,23 @@ def get_description():
license="Apache-2.0",
include_package_data=True,
packages=find_packages(),
install_requires=["tensorflow==2.5.0",#add -gpu
install_requires=["tensorflow==2.5.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be installing "tensorflow==2.5.0" and "tensorflow-gpu==2.5.0"? or should the user decide on his own?

"deeplift",
"seaborn",
"pyyaml",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition!

Copy link
Member

@FaizRizvi FaizRizvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice code Tareian! I think it looks better from before.

"deeplift",
"seaborn",
"pyyaml",
"graphviz",
"shap @ git+https://github.com/AvantiShri/shap.git@master#egg=shap",
"modisco @ git+https://github.com/XiaotingChen/tfmodisco.git@0.5.9.2#egg-modisco"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we ask Xiaoting to modify his install of modisco so we do not run into the sklearn issues when intalling maxatac? (detailed in Issue #47)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid Interval Bounds Error When Training Reverse_complement training flag
2 participants