Develop is46 #48

tacazares · 2021-08-11T16:28:33Z

This pull request changes the training data generator to fix the bug in #46. The fix is shown here:

            try:
                # Get the target matrix
                target_vector = np.array(binding_stream.values(chrom_name, start, end)).T

            except:
                # TODO change length of array
                target_vector = np.zeros(1024)

Additional updates to the docs were made. I also changed the setup.py file to include pyyaml as a requirement. There is still an sklearn error with the new updates; it is fixed by modifying the import statements in modisco.

…al error This push adds a flag to training that allows you to randomly generate reverse complement examples for training in addition to the reference samples. The pyyaml requirement was added. The invalid interval bounds error was also patched by trying to get the input matrix and passing a matrix of zeros if the interval is not found.
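As a rough illustration of what a reverse complement training flag does conceptually, the sketch below randomly swaps a training sequence for its reverse complement. It works on plain strings rather than on the encoded input matrices used in training, and the names `reverse_complement`, `maybe_reverse_complement`, and the `prob` parameter are hypothetical, not taken from the codebase.

```python
import random

# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def maybe_reverse_complement(sequence, rng, prob=0.5):
    """Return the reverse complement with probability `prob`, else the input."""
    if rng.random() < prob:
        return reverse_complement(sequence)
    return sequence


# Hypothetical usage: each draw is either the reference or its reverse complement.
rng = random.Random(0)
examples = [maybe_reverse_complement("AACGT", rng) for _ in range(4)]
```

Passing the random generator in explicitly keeps the augmentation reproducible when a seed is fixed.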

FaizRizvi · 2021-08-13T19:47:48Z

docs/train.md

+### `--shuffle_cell_type`
+
+If shuffle_cell_type, then shuffle training ROI cell type label. Default: `False`
+


Nice addition Tareian! This should make it easier for the user to understand.

FaizRizvi · 2021-08-13T19:48:21Z

maxatac/analyses/train.py

@@ -115,7 +116,8 @@ def run_training(args):
                            quant=args.quant,
                            batch_size=args.batch_size,
                            target_scale_factor=args.target_scale_factor,
-                            shuffle_cell_type=args.shuffle_cell_type
+                            shuffle_cell_type=args.shuffle_cell_type,
+                            rev_comp_train=args.rev_comp
                            )

    # Create keras.utils.sequence object from validation generator


Adding the rev_comp flag will help make our publication figures.

FaizRizvi · 2021-08-13T19:55:15Z

maxatac/utilities/training_tools.py

+            # Our workaround for issue#42 is to provide a zero matrix for that position
+            try:
+                # Get the target matrix
+                target_vector = np.array(binding_stream.values(chrom_name, start, end)).T


I have a feeling that we do not need to transpose this at the end.
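One quick way to check this: in NumPy, `.T` on a one-dimensional array is a no-op. So if `binding_stream.values(...)` returns a flat sequence of floats (an assumption here; the list below is a stand-in, not real signal data), the transpose changes nothing.

```python
import numpy as np

# Stand-in for the flat sequence of per-base values assumed to come back
# from binding_stream.values(chrom_name, start, end).
values = [0.0, 1.0, 1.0, 0.0]

target_vector = np.array(values)
transposed = target_vector.T

# For a 1-D array the transpose has the same shape and the same contents.
same_shape = transposed.shape == target_vector.shape
same_values = np.array_equal(transposed, target_vector)
```

If the values ever came back as a two-dimensional array, the transpose would matter, so this only holds under the flat-sequence assumption.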

FaizRizvi · 2021-08-13T19:56:15Z

maxatac/utilities/training_tools.py

-                    targets_batch.append(bin_vector)
+            except:
+                # TODO change length of array
+                target_vector = np.zeros(1024)


This should be the fix for the issue. Do we need to count the number of times this part is executed in the handful of TFs where it actually is implemented? (Might be overkill)
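If counting turns out to be useful, a minimal sketch might look like the following. `FakeStream`, `get_target_vector`, and the `fallback_counts` counter are hypothetical names for illustration, and catching `RuntimeError` assumes that is the exception the underlying reader raises for invalid interval bounds.

```python
from collections import Counter

import numpy as np

# Number of times the zero fallback was used, keyed by chromosome.
fallback_counts = Counter()


class FakeStream:
    """Hypothetical stand-in for the signal reader used during training."""

    def __init__(self, chrom_length):
        self.chrom_length = chrom_length

    def values(self, chrom_name, start, end):
        # Mimic a reader that rejects intervals outside the chromosome.
        if start < 0 or end > self.chrom_length or start >= end:
            raise RuntimeError("Invalid interval bounds!")
        return [0.0] * (end - start)


def get_target_vector(stream, chrom_name, start, end):
    """Return the target vector, or a zero vector if the interval is rejected."""
    try:
        return np.array(stream.values(chrom_name, start, end))
    except RuntimeError:
        # Record the fallback so it can be reported after training.
        fallback_counts[chrom_name] += 1
        return np.zeros(max(end - start, 0))


stream = FakeStream(chrom_length=2000)
ok = get_target_vector(stream, "chr1", 0, 1024)
bad = get_target_vector(stream, "chr1", 1500, 2524)
```

Sizing the zero vector from `end - start` instead of a hard-coded 1024 is one way the TODO in the diff could be handled, and catching a specific exception avoids hiding unrelated errors behind the fallback.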

FaizRizvi · 2021-08-13T19:57:55Z

setup.py

@@ -24,24 +24,23 @@ def get_description():
      license="Apache-2.0",
      include_package_data=True,
      packages=find_packages(),
-      install_requires=["tensorflow==2.5.0",#add -gpu
+      install_requires=["tensorflow==2.5.0",


Should we be installing both "tensorflow==2.5.0" and "tensorflow-gpu==2.5.0", or should the user decide on their own?

FaizRizvi · 2021-08-13T19:58:09Z

setup.py

                        "deeplift",
                        "seaborn",
+                        "pyyaml",


Nice addition!

FaizRizvi

Nice code Tareian! I think it looks better than before.

FaizRizvi · 2021-08-13T19:59:31Z

setup.py

                        "deeplift",
                        "seaborn",
+                        "pyyaml",
                        "graphviz",
                        "shap @ git+https://github.com/AvantiShri/shap.git@master#egg=shap",
                        "modisco @ git+https://github.com/XiaotingChen/tfmodisco.git@0.5.9.2#egg-modisco"


Can we ask Xiaoting to modify his install of modisco so we do not run into the sklearn issues when installing maxatac? (detailed in Issue #47)

Cazares, Tareian added 6 commits August 6, 2021 15:11

cleanup docs

1e48b35

Added pybedtools import

87c14eb

Update training_tools.py

627c683

Update docs

33e1e8b

Update README.md

b227f68

FaizRizvi linked an issue Aug 11, 2021 that may be closed by this pull request

Reverse_complement training flag #45

Closed

tacazares linked an issue Aug 11, 2021 that may be closed by this pull request

Invalid Interval Bounds Error When Training #46

Closed

tacazares requested a review from FaizRizvi August 13, 2021 19:00

tacazares merged commit 1bd9727 into develop Aug 13, 2021

tacazares deleted the develop_IS46 branch August 13, 2021 19:09

FaizRizvi reviewed Aug 13, 2021

FaizRizvi reviewed Aug 15, 2021
