Implemented the ability to train rewards in preference comparison against multiple policies #529

levmckinney · 2022-08-17T05:18:39Z

Description

This pull request adds the MixtureOfTrajectoryGenerators class which allows preference comparison to be trained on data from multiple sources. This includes the ability to train against multiple agents and train against a mixture of data from an agent and a dataset.

To facilitate this I have made modifications to HeirachicalLogger class allowing for arbitrary prefixes to be added so that different trajectory generators logs don't clobber each other.

Testing

Changes to the HeirachicalLogger class have been unit tested. MixtureOfTrajectoryGenerators has also been united test. I've also included new integration tests for the train_preference_comparison.py script to test the use case of having multiple policies.

…gging

codecov · 2022-08-18T21:05:47Z

Codecov Report

Merging #529 (408e248) into master (91c66b7) will increase coverage by 0.03%.
The diff coverage is 99.29%.

❗ Current head 408e248 differs from pull request most recent head 4df8551. Consider uploading reports for the commit 4df8551 to get more accurate results

@@            Coverage Diff             @@
##           master     #529      +/-   ##
==========================================
+ Coverage   97.16%   97.19%   +0.03%     
==========================================
  Files          85       85              
  Lines        7646     7769     +123     
==========================================
+ Hits         7429     7551     +122     
- Misses        217      218       +1

Impacted Files	Coverage Δ
src/imitation/algorithms/preference_comparisons.py	`99.04% <97.82%> (-0.13%)`	⬇️
...ion/scripts/config/train_preference_comparisons.py	`85.71% <100.00%> (+0.38%)`	⬆️
.../imitation/scripts/train_preference_comparisons.py	`97.22% <100.00%> (+0.50%)`	⬆️
src/imitation/util/logger.py	`100.00% <100.00%> (ø)`
tests/algorithms/test_preference_comparisons.py	`100.00% <100.00%> (ø)`
tests/scripts/test_scripts.py	`100.00% <100.00%> (ø)`
tests/util/test_logger.py	`100.00% <100.00%> (ø)`

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

AdamGleave · 2022-08-21T03:55:48Z

@levmckinney let me know if there's any particular areas you'd like my feedback on, I'll aim to try and do a high-level review but may not have time to look at it in detail.

@Rocamonde can you do a style/technical review on Monday?

Rocamonde · 2022-08-22T13:09:24Z

When reading through the logging code, I see we are maintaining a somewhat advanced and general logging infrastructure for hierarchical logging. I'm surprised there's not an already existing way to handle this without a bespoke implementation. What are other people doing? (i.e. in similar ML projects) Are our logging needs very specific and different to the way this is normally done? It might be interesting to spin off the logging module into e.g. an SB3 PR, so people don't have to reinvent the wheel every time.

Rocamonde

Some initial comments. Have not looked at the tests yet. Let me know what you think and when/if you implement any changes from my suggestions, and I can take another look.

src/imitation/util/logger.py

Rocamonde · 2022-08-22T13:05:39Z

src/imitation/util/logger.py

@@ -72,27 +74,44 @@ def _update_name_to_maps(self) -> None:
        self.name_to_excluded = self._logger.name_to_excluded

    @contextlib.contextmanager
-    def accumulate_means(self, subdir: types.AnyPath) -> Generator[None, None, None]:
+    def add_prefix(self, prefix: str) -> Generator[None, None, None]:


If the prefix is removed after leaving the context, and only one prefix is supported as input, why are you adding support for a list of prefixes? How and why would I use multiple prefixes? (Is the expectation that I should enter nested prefix contexts, as that might be quite hard to read and understand in practice, e.g. if this happens in different files or function calls.) I also think this prefix/name idea should be explained with an examlpe in e.g. the class-level docstring.

If I'm understanding correctly how this is supposed to be used, entering this should be disallowed if one is using an accumulate_means context, otherwise that would mess with the path where the rest of the logs are being recorded. Do you agree and if so do you think you can add a way to throw an error?

If I'm understanding correctly how this is supposed to be used, entering this should be disallowed if one is using an accumulate_means context, otherwise that would mess with the path where the rest of the logs are being recorded. Do you agree and if so do you think you can add a way to throw an error?

Agreed, I've added a runtime error for this case.

I've added a doctest documenting how to use it.

src/imitation/scripts/train_preference_comparisons.py

src/imitation/algorithms/preference_comparisons.py

AdamGleave · 2022-08-23T03:51:37Z

When reading through the logging code, I see we are maintaining a somewhat advanced and general logging infrastructure for hierarchical logging. I'm surprised there's not an already existing way to handle this without a bespoke implementation. What are other people doing? (i.e. in similar ML projects) Are our logging needs very specific and different to the way this is normally done? It might be interesting to spin off the logging module into e.g. an SB3 PR, so people don't have to reinvent the wheel every time.

What we are doing is kind of weird -- you don't normally have several RL trainers that all need to log to the same place -- but it could come up in some other applications, e.g. population-based training. You could look at how rllib handles this.

…ault to true

levmckinney · 2022-08-24T06:05:24Z

@levmckinney let me know if there's any particular areas you'd like my feedback on, I'll aim to try and do a high-level review but may not have time to look at it in detail.

@AdamGleave Could you look at the changes to the HierarchicalLogger?

src/imitation/algorithms/preference_comparisons.py

Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Juan Rocamonde <juancarlosrocamonde@gmail.com>

Rocamonde

The new changes LGTM. Do complete the docstring (see comment) before merging. Others might want to have a chance to take a look too before then.

src/imitation/util/logger.py

src/imitation/scripts/train_preference_comparisons.py

src/imitation/util/logger.py

AdamGleave

Left some minor comments, but overall looks good. Only did a shallow review. Let me know if there's anything specific I should pay closer attention to.

src/imitation/algorithms/preference_comparisons.py

AdamGleave · 2022-09-14T02:07:34Z

src/imitation/scripts/train_preference_comparisons.py

@@ -133,6 +134,9 @@ def train_preference_comparisons(
            be allocated to each iteration. "hyperbolic" and "inverse_quadratic"
            apportion fewer queries to later iterations when the policy is assumed
            to be better and more stable.
+        share_training_steps_among_agents: If True (default), when training with


Having default behavior preserve total training timesteps budget sems good to me. Integer division // vs something more clever that handles remainders seems unimportant -- we usually have hundreds of thousands of timesteps and single-digit numbers of agents.

AdamGleave · 2022-09-14T02:08:20Z

src/imitation/scripts/train_preference_comparisons.py

+                    relabel_reward_fn=relabel_reward_fn,
+                )
+            else:
+                agent = rl_common.load_rl_algo_from_path(


There'll be hilariously little diversity between agents in this case, but not much we can do there. (Support loading different agents I guess? But that seems overkill for what's a rare use case.)

AdamGleave · 2022-09-14T02:14:03Z

src/imitation/scripts/train_preference_comparisons.py

        else:
+            single_agent = False


Not single agent in the sense that it's... zero agent? This is a bit counterintuitive.

AdamGleave · 2022-09-14T02:14:15Z

src/imitation/scripts/train_preference_comparisons.py

@@ -251,7 +281,7 @@ def save_callback(iteration_num):
                        "checkpoints",
                        f"{iteration_num:04d}",
                    ),
-                    allow_save_policy=bool(trajectory_path is None),
+                    allow_save_policy=single_agent,


We can't save checkpoints if there's multiple agents? That's a little sad.

* Skeleton of regularization techniques * Added loss and weight regularizer, update_param fn protocol * Fix type error * Added logging * Renamed reg to regularization, passed logger down * Fixes linting issues * Silly linting error only visible on CircleCI * Update src/imitation/regularization/__init__.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/regularization/__init__.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/regularization/__init__.py Co-authored-by: Adam Gleave <adam@gleave.me> * Make regularizer initialization control flow more readable * Typing improvements * Multiple fixes re: types and assertions * Improved input validation * Add support for Lp norms, remove singleton L1/L2 implementations * Add support for arbitrary regularizers, improve input validation * Typing and linting * Fix silly errata causing test to fail * Add missing docstring args * Restructured folder into submodules and improved typing by adding generics * Updated imports after restructuring folder * Added tests for updaters.py * Added tests for regularizers.py, except weight decay * Fixed tests for lp regularization * Added tests for weight decay * Linting / formatting * Linting / typing * Final tests to improve code coverage * Tweaks for code coverage * Tweaks for code coverage v2 * Fix logging issues * Formatting * Formatting (docstring) * Fix file open issue * Linting * Remove useless conversion to float * Replace assert with ValueError * Check for lambda being negative in the scaler * Guard against losses being negative in interval param scaler * Split tests up to cover more cases * Clean up repetitive code for readability * Remove old TODO message * Merge RewardTrainer seeds into one * Fix interval param tests to new error messages * Move regularization input validation to factory class * Remake the regularizer factory for better API design * Fix bugs and tests in new factory design * Added docstring to regularizer factory. * Update src/imitation/regularization/regularizers.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/regularization/regularizers.py Co-authored-by: Adam Gleave <adam@gleave.me> * Add todo to refactor once #529 is merged. * Rename regularize to regularize_and_backward * Fix bug in tests and docstrings * Rename mode to prefix * Added exceptions to docstrings * Make type ignore only specific to pytype * Add verbatim double-`` to some docstrings * Change phrasing in docstring Co-authored-by: Adam Gleave <adam@gleave.me>

Co-authored-by: Adam Gleave <adam@gleave.me>

* Add support for prefix context manager in logger (from #529) * Added back accidentally removed code * Replaced preference comparisons prefix with ctx manager * Fixed errors * Docstring fixes * Address PR comments * Point SB3 to master to include bug fix * Format / fix tests for context manager * Switch to sb3 1.6.1 * Formatting * Remove comment

#534) * Initial mypy configuration * Fix types: test_envs.py * Fix types: conftest.py * Fix types: tests/util * Fix types: tests/scripts * Fix types: tests/rewards * Fix types: tests/policies * Incorrect decorator in update_stats method form networks.py::BaseNorm * Fix types: tests/algorithms (adersarial and bc) * Fix types: tests/algorithms (dagger and pc) * Fix types: tests/data * Linting * Linting * Fix types: algorithms/preference_comparisons.py * Fix types: algorithms/mce_irl.py * Formatting, fixed minor bug * Clarify why types are ignored * Started fixing types on algorithms/density.py * Linting * Linting (add back type ignore after reformatting) * Fixed types: imitation/data/types.py * Fixed types (started): imitation/data/ * Fixed types: imitation/data/buffer.py * Fixed bug in buffer.py * Fixed types: imitation/data/rollout.py * Fixed types: imitation/data/wrappers.py * Improve makefile to support automatic cache cleaning * Fixed types: imitation/testing/ * Linting, fixed wrong return type in rewards.predict_processed_all * Fixed types: imitation/policies/ * Formatting * Fixed types: imitation/rewards/ * Fixed types: imitation/rewards/ * Fixed types: imitation/scripts/ * Fixed types: imitation/util/ and formatting * Linting and formatting * Bug fixes for test errors * Linting and typing * Improve typing in algorithms * Formatting * Bug fix * Formatting * Fixes suggested by Adam. * Fix mypy version. * Update TabularPolicy.predict to match base class * Fix not checking for dones * Change for loop to dict comprehension * Remove is_ensemble to clear up type checking errors * Reduce code duplication and general cleanup * Fix type annotation of step_dict * Change List to Sequence * Fix density.py::DensityAlgorithm._set_demo_from_batch * Fixed n_steps (OnPolicyAlgorithm) * Fix errors in tests * Include some suggestions into rollout.py and preference_comparisons.py * Formatting * Fix setter error as per python/mypy#5936 * add reason for assertion. * Fix style guide violation: https://google.github.io/styleguide/pyguide.html#22-imports * Update src/imitation/scripts/parallel.py Co-authored-by: Adam Gleave <adam@gleave.me> * Move kwargs to the end. * Swap order of expert_policy_type and expert_policy_path validation check * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update tests/rewards/test_reward_fn.py Co-authored-by: Adam Gleave <adam@gleave.me> * Explicit random state setting and fix corresponding tests (except notebooks, sacred config, scripts) * Fix notebooks; add script to clean notebooks * Fix all tests. * Formattting. * Additional fixes * Linting * Remove automatically generated `_api` docs files too on `make clean` * Fix docstrings. * Fix issue with next(iter(iterable)) * Formatting * Remove whitespace * Add TODO message to remove type ignore later * Remove unnecessary assertion. * Fixed types in density.py set_demonstrations * Added type ignore to pytype bug * Fix_get_first_iter_element and add tests * Bugfix in BC and tests -- masked as previously iterator ran out too early! * Remove makefile for now * Added link to SB3 issue for future reference. * Fix types of train_imitation Only return "expert_stats" if all trajectories have reward. * Modify assert in test_bc to reflect correct type * Add ci/clean_notebooks.py to CI checks * Improve clean_notebooks.py by allowing checking only mode. * Add ipynb notebook checks to CI * Add support for explicit files for notebook cleaning * Clean notebooks * Small improvements in util.py * Replace TransitionKind with TransitionsMinimal * Delete unused statement in test * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Make type ignore specific to pytype * Linting * Migrate from RandomState (deprecated) to Generator * Add backticks to error message * Create "AnyNorm" alias * Small fix * Add additional checks to shapes in _set_demo_from_batch * Fix RolloutStatsComputer type * Improved logging/messages in clean_notebooks.py * Fix issues resulting from merge * Bug fix * Bug fix (wasn't really fixed before) * Fixed docs example of BC * Fix bugs resulting from merge * Fix docs (dagger.rst) caught by sphinx CI * Add mypy to CI * Continue fixing miscellaneous type errors * Linting * Fix issue with normalize_input_layer type * Add support for checking presence of generic type ignores * Allow subdirectories in notebook clean * Add full typing support for TransitionsMinimal as a sequence * Fix types for density.py * Misc fixes * Add support for prefix context manager in logger (from #529) * Added back accidentally removed code * Replaced preference comparisons prefix with ctx manager * Fixed errors * Bug fixes * Docstring fixes * Fix bug in serialize.py * Fixed codecheck by pointing notebook checks to docs * Add rng to mce_irl.rst (doctest) * Add rng to density.rst (doctest) * Fix remaining rst files * Increase sample size to reduce flakiness * Ignore files not passing mypy for now * Comment in wrong line * Comment in wrong line * Move excluded files to argument * Add quotes to mypy arg call * Fix CI mypy call * Fix CI yaml * Break ignored files up into one line each * Address PR comments * Point SB3 to master to include bug fix * Do not follow imports for ignored files * Format / fix tests for context manager * Switch to sb3 1.6.1 * Formatting * Remove unused import * Remove unused fixture * Add coveragerc file * Add utils test * Add tests and asserts * Add test to synthetic gatherer * Add trajectory unwrap tests * Formatting * Remove bracket typo * Fix .coveragerc instruction * Improve density algo coverage and bug fixes * Fix bug in test * Add pragma no cover updates * Minor coverage tweaks * Fix iterator test * Update ci/check_typeignore.py Co-authored-by: Adam Gleave <adam@gleave.me> * DRY clean_notebooks.py * Minor tweak in check_typeignore.py * Added all checks to CI * Move imports to top-level * Move main to main() function in script * Minor fixes * Remove tweak after new SB3 release * Split main() into helper functions. * Fix edge case of n=0 in seed generator * Update src/imitation/scripts/common/rl.py Co-authored-by: Adam Gleave <adam@gleave.me> * Fix general type ignore in src * Fix type ignore errors * Formatting * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/scripts/common/rl.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Replace todo with todo+issue link * Add explicit type ignore arg * Add excluded files to code_checks.sh * Unbreak line * Misc fixes * Add training to the density algorithm * Fix no attribute error * Type ignore to `with raises` test * Remove unused import * Check typeignore for all SRC files * Clean notebooks * Remove unused import * Ignore the file itself from typeignore check * Add exception to docstring * Fix bad naming for clean_notebooks Co-authored-by: Adam Gleave <adam@gleave.me>

…570)" This reverts commit a738955.

…#529) (#570)"" This reverts commit 849f7e7.

* Revert "Add support for prefix context manager in logger (from #529) (#570)" This reverts commit a738955. * Revert "Revert "Add support for prefix context manager in logger (from #529) (#570)"" This reverts commit 849f7e7. * Fix format of error string * Put the f in the right place

* Initial mypy configuration * Initial change to get the PR up * Initial review at replacing os.path * Bug fixes from tests * Fix types: test_envs.py * Fix types: conftest.py * Fix types: tests/util * Fix types: tests/scripts * Fix types: tests/rewards * Fix types: tests/policies * Incorrect decorator in update_stats method form networks.py::BaseNorm * Fix types: tests/algorithms (adersarial and bc) * Fix types: tests/algorithms (dagger and pc) * Fix types: tests/data * Linting * Linting * Fix types: algorithms/preference_comparisons.py * Fix types: algorithms/mce_irl.py * Formatting, fixed minor bug * Clarify why types are ignored * Started fixing types on algorithms/density.py * Linting * Linting (add back type ignore after reformatting) * Fixed types: imitation/data/types.py * Fixed types (started): imitation/data/ * Fixed types: imitation/data/buffer.py * Fixed bug in buffer.py * Fixed types: imitation/data/rollout.py * Fixed types: imitation/data/wrappers.py * Improve makefile to support automatic cache cleaning * Fixed types: imitation/testing/ * Linting, fixed wrong return type in rewards.predict_processed_all * Fixed types: imitation/policies/ * Formatting * Fixed types: imitation/rewards/ * Fixed types: imitation/rewards/ * Fixed types: imitation/scripts/ * Fixed types: imitation/util/ and formatting * Linting and formatting * Bug fixes for test errors * Linting and typing * Improve typing in algorithms * Formatting * Bug fix * Formatting * Fixes suggested by Adam. * Fix mypy version. * Fix bugs * Remove unused imports * Formatting * Added parse_path func and refactored code to use it * Fix typing, linting * Update TabularPolicy.predict to match base class * Fix not checking for dones * Change for loop to dict comprehension * Remove is_ensemble to clear up type checking errors * Reduce code duplication and general cleanup * Fix type annotation of step_dict * Change List to Sequence * Fix density.py::DensityAlgorithm._set_demo_from_batch * Fixed n_steps (OnPolicyAlgorithm) * Fix errors in tests * Include some suggestions into rollout.py and preference_comparisons.py * Formatting * Fix setter error as per python/mypy#5936 * add reason for assertion. * Fix style guide violation: https://google.github.io/styleguide/pyguide.html#22-imports * Update src/imitation/scripts/parallel.py Co-authored-by: Adam Gleave <adam@gleave.me> * Move kwargs to the end. * Swap order of expert_policy_type and expert_policy_path validation check * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update tests/rewards/test_reward_fn.py Co-authored-by: Adam Gleave <adam@gleave.me> * Explicit random state setting and fix corresponding tests (except notebooks, sacred config, scripts) * Fix notebooks; add script to clean notebooks * Fix all tests. * Formattting. * Additional fixes * Linting * Remove automatically generated `_api` docs files too on `make clean` * Fix docstrings. * Fix issue with next(iter(iterable)) * Formatting * Remove whitespace * Add TODO message to remove type ignore later * Remove unnecessary assertion. * Fixed types in density.py set_demonstrations * Added type ignore to pytype bug * Fix_get_first_iter_element and add tests * Bugfix in BC and tests -- masked as previously iterator ran out too early! * Remove makefile for now * Added link to SB3 issue for future reference. * Fix types of train_imitation Only return "expert_stats" if all trajectories have reward. * Modify assert in test_bc to reflect correct type * Add ci/clean_notebooks.py to CI checks * Improve clean_notebooks.py by allowing checking only mode. * Add ipynb notebook checks to CI * Add support for explicit files for notebook cleaning * Clean notebooks * Small improvements in util.py * Replace TransitionKind with TransitionsMinimal * Delete unused statement in test * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/util/util.py Co-authored-by: Adam Gleave <adam@gleave.me> * Make type ignore specific to pytype * Linting * Migrate from RandomState (deprecated) to Generator * Add backticks to error message * Create "AnyNorm" alias * Small fix * Add additional checks to shapes in _set_demo_from_batch * Fix RolloutStatsComputer type * Improved logging/messages in clean_notebooks.py * Fix issues resulting from merge * Bug fix * Bug fix (wasn't really fixed before) * Fixed docs example of BC * Fix bugs resulting from merge * Fix docs (dagger.rst) caught by sphinx CI * Add mypy to CI * Continue fixing miscellaneous type errors * Linting * Fix issue with normalize_input_layer type * Add support for checking presence of generic type ignores * Allow subdirectories in notebook clean * Add full typing support for TransitionsMinimal as a sequence * Fix types for density.py * Misc fixes * Add support for prefix context manager in logger (from #529) * Added back accidentally removed code * Replaced preference comparisons prefix with ctx manager * Fixed errors * Bug fixes * Docstring fixes * Fix bug in serialize.py * Fixed codecheck by pointing notebook checks to docs * Add rng to mce_irl.rst (doctest) * Add rng to density.rst (doctest) * Fix remaining rst files * Increase sample size to reduce flakiness * Ignore files not passing mypy for now * Comment in wrong line * Comment in wrong line * Move excluded files to argument * Add quotes to mypy arg call * Fix CI mypy call * Fix CI yaml * Break ignored files up into one line each * Address PR comments * Point SB3 to master to include bug fix * Small bug fixes * Small bug fixes * Sort import * Linting * Do not follow imports for ignored files * Fix tests for context managers * Format / fix tests for context manager * Switch to sb3 1.6.1 * Formatting * Upgrade Python version in Windows CI * Remove unused import * Remove unused fixture * Add coveragerc file * Add utils test * Add tests and asserts * Add test to synthetic gatherer * Add trajectory unwrap tests * Formatting * Remove bracket typo * Fix .coveragerc instruction * Improve density algo coverage and bug fixes * Fix bug in test * Add pragma no cover updates * Minor coverage tweaks * Fix iterator test * Add test for parse_path * Updates on sacred util * Mark type ignore rule * Mark type ignore rule * Miscellaneous bug fixes and improvements * Reformat hanging line * Ignore parse path checks for windows * Add trailing comma * Minor changes * No newline end of file * Update src/imitation/data/types.py Co-authored-by: Adam Gleave <adam@gleave.me> * Update src/imitation/data/types.py Co-authored-by: Adam Gleave <adam@gleave.me> * Include suggestions from Adam Co-authored-by: Adam Gleave <adam@gleave.me>

Rocamonde · 2022-10-25T16:03:57Z

What's the status of this? I borrowed from this PR partially in other PRs that have been merged to avoid blockage / reinventing the wheel, so it probably requires some re-writing.

AdamGleave · 2022-10-25T19:53:08Z

@levmckinney, what's outstanding here? I know you're busy now, but I can likely find someone else to finish the PR off.

However, IIRC the policy ensemble didn't help performance much, so perhaps it's served its purpose (of running those experiments) and we should just cherry-pick any helpful features from it then close this PR.

levmckinney · 2022-11-18T02:27:56Z

@levmckinney, what's outstanding here? I know you're busy now, but I can likely find someone else to finish the PR off.

However, IIRC the policy ensemble didn't help performance much, so perhaps it's served its purpose (of running those experiments) and we should just cherry-pick any helpful features from it then close this PR.

I think we can close this. The main thing we might want to cherry-pick out is the changes to the hierarchical logger. However, that only really matters if we are planning to move forward with this logging framework.

levmckinney added 4 commits August 8, 2022 21:35

implemented MixtureOfTrajectoryGeneratos

a7eda89

added add_prefix method to HierarchicalLogger

1a9ab5d

moved logger.accumulate_means into AgentTrainer

d442baa

added option for multiple agents to train_preference_comparison

003bbcd

levmckinney requested a review from yawen-d August 17, 2022 05:24

levmckinney added 5 commits August 18, 2022 10:25

reduced the length of the prefix used in mixture of generators for lo…

f6a3d44

…gging

Merge branch 'master' into policy_ensemble

33110aa

improved logic for whether to try and checkpoint policy

8c5433d

fixed final save checkpoint call

83f8b8b

fixed log key when using prefix on windows

803ffde

levmckinney marked this pull request as ready for review August 18, 2022 21:42

levmckinney requested a review from AdamGleave August 20, 2022 18:36

AdamGleave requested a review from Rocamonde August 21, 2022 03:54

Rocamonde reviewed Aug 22, 2022

View reviewed changes

clarified doc string and added runtime error

21e11fe

levmckinney mentioned this pull request Aug 22, 2022

Switch to using pathlib #531

Closed

levmckinney added 3 commits August 22, 2022 15:56

responded to reviewers comments

5c9fc1e

fixed logic bug

e478ec3

fixed test

75c0b80

levmckinney added 2 commits August 22, 2022 21:43

added pragma: no cover to test case which needs it

73ace03

added doctest to logger explaining behavour

b0b3b93

levmckinney requested a review from Rocamonde August 24, 2022 03:58

levmckinney added 2 commits August 23, 2022 22:13

Merge branch 'master' into policy_ensemble

dc60be8

added option to split training steps among the agents and made it def…

2958d08

…ault to true

levmckinney commented Aug 24, 2022

View reviewed changes

src/imitation/algorithms/preference_comparisons.py Show resolved Hide resolved

Rocamonde and others added 8 commits September 3, 2022 18:53

Formatter

07fa830

Fix "else" that was accidentally removed.

dd72656

Roll back change

709cb42

Added description to tests

06084ad

Unnested with statements

a2b790f

removed duplicate sentence in documentation

510abd8

Apply suggestions from code review

232b56e

Co-authored-by: Adam Gleave <adam@gleave.me> Co-authored-by: Juan Rocamonde <juancarlosrocamonde@gmail.com>

Merge branch 'master' into policy_ensemble

09670d3

levmckinney requested a review from Rocamonde September 7, 2022 23:35

Rocamonde approved these changes Sep 8, 2022

View reviewed changes

src/imitation/util/logger.py Show resolved Hide resolved

src/imitation/scripts/train_preference_comparisons.py Show resolved Hide resolved

taufeeque9 reviewed Sep 12, 2022

View reviewed changes

src/imitation/util/logger.py Outdated Show resolved Hide resolved

levmckinney added 2 commits September 12, 2022 23:08

Merge branch 'master' into policy_ensemble

a5ab914

fixed comment in doctest

408e248

Rocamonde added a commit that referenced this pull request Sep 13, 2022

Add todo to refactor once #529 is merged.

b6cb2cc

AdamGleave approved these changes Sep 14, 2022

View reviewed changes

Update src/imitation/algorithms/preference_comparisons.py

4df8551

Co-authored-by: Adam Gleave <adam@gleave.me>

taufeeque9 mentioned this pull request Sep 15, 2022

Improve bagging method for ensemble training (#530) #542

Merged

Rocamonde added a commit that referenced this pull request Sep 26, 2022

Add support for prefix context manager in logger (from #529)

e56c0e3

Rocamonde mentioned this pull request Sep 26, 2022

Add support for prefix context manager in logger (from #529) #570

Merged

dfilan added a commit that referenced this pull request Oct 10, 2022

Revert "Add support for prefix context manager in logger (from #529) (#…

849f7e7

…570)" This reverts commit a738955.

dfilan added a commit that referenced this pull request Oct 10, 2022

Revert "Revert "Add support for prefix context manager in logger (from …

cc4c69b

…#529) (#570)"" This reverts commit 849f7e7.

AdamGleave closed this Dec 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented the ability to train rewards in preference comparison against multiple policies #529

Implemented the ability to train rewards in preference comparison against multiple policies #529

levmckinney commented Aug 17, 2022 •

edited

Loading

codecov bot commented Aug 18, 2022 •

edited

Loading

AdamGleave commented Aug 21, 2022

Rocamonde commented Aug 22, 2022

Rocamonde left a comment

Rocamonde Aug 22, 2022

Rocamonde Aug 22, 2022

levmckinney Aug 22, 2022

levmckinney Aug 24, 2022

AdamGleave commented Aug 23, 2022

levmckinney commented Aug 24, 2022

Rocamonde left a comment

AdamGleave left a comment

AdamGleave Sep 14, 2022

AdamGleave Sep 14, 2022

AdamGleave Sep 14, 2022

AdamGleave Sep 14, 2022

Rocamonde commented Oct 25, 2022

AdamGleave commented Oct 25, 2022

levmckinney commented Nov 18, 2022

Implemented the ability to train rewards in preference comparison against multiple policies #529

Implemented the ability to train rewards in preference comparison against multiple policies #529

Conversation

levmckinney commented Aug 17, 2022 • edited Loading

Description

Testing

codecov bot commented Aug 18, 2022 • edited Loading

Codecov Report

AdamGleave commented Aug 21, 2022

Rocamonde commented Aug 22, 2022

Rocamonde left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AdamGleave commented Aug 23, 2022

levmckinney commented Aug 24, 2022

Rocamonde left a comment

Choose a reason for hiding this comment

AdamGleave left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Rocamonde commented Oct 25, 2022

AdamGleave commented Oct 25, 2022

levmckinney commented Nov 18, 2022

levmckinney commented Aug 17, 2022 •

edited

Loading

codecov bot commented Aug 18, 2022 •

edited

Loading