Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open gc by default #18836

Merged
merged 7 commits into from
Aug 2, 2019
Merged

Open gc by default #18836

merged 7 commits into from
Aug 2, 2019

Conversation

sneaxiy
Copy link
Collaborator

@sneaxiy sneaxiy commented Jul 26, 2019

Enable GC when PADDLE_ON_INFERENCE is off.

This PR also fixes bug of GC when using conditional_block op.

See CE Debug here.

There are 3 models in CE Debug failed.

[22:30:44][Step 3/3] task: model_cycle_gan
[22:30:44][Step 3/3] passed:  False
[22:30:44][Step 3/3] infos [d_train_cost] pass
[22:30:44][Step 3/3] [g_train_cost] pass
[22:30:44][Step 3/3] [duration] failed, diff ratio: 0.06092244084538961 larger than 0.05.
[22:30:44][Step 3/3] kpis keys ['d_train_cost', 'g_train_cost', 'duration']
[22:30:44][Step 3/3] kpis values [[[2.12036180496]], [[12.1869783401]], [[1.05452570915]]]
[22:30:44][Step 3/3] task: model_icnet
[22:30:44][Step 3/3] passed:  False
[22:30:44][Step 3/3] infos [train_cost] pass
[22:30:44][Step 3/3] [train_duration] failed, diff ratio: 0.04116670226811877 larger than 0.02.
[22:30:44][Step 3/3] kpis keys ['train_cost', 'train_duration']
[22:30:44][Step 3/3] kpis values [[[2.1491]], [[41.469499]]]
[22:30:44][Step 3/3] task: model_ocr_recognition
[22:30:44][Step 3/3] passed:  False
[22:30:44][Step 3/3] infos [train_cost] pass
[22:30:44][Step 3/3] [train_duration] failed, diff ratio: 0.06440406463978306 larger than 0.06.
[22:30:44][Step 3/3] kpis keys ['train_cost', 'train_duration']
[22:30:44][Step 3/3] kpis values [[[58.981466]], [[46.207872]]]
[22:31:02][Step 3/3] model_cycle_gan model_icnet model_ocr_recognition 
  • model_cycle_gan: CE Debug uses legacy CycleGAN code in release/1.1 branch, which uses Executor and there is no cache. In develop branch of models, GC is enabled in CycleGAN model.
  • model_icnet: I find that this model fails randomly because I have tested a PR which does nothing to do with memory optimization. It also fails with 4.48% differences in train duration. See here.
  • model_ocr_recognition: do not know why. The results I tested on my machine is 39.386209s(develop) vs 39.1990187s(This PR). The difference is very small.

I have tried to run CE Debug for the second time. All tasks have passed. Please see here.

@sneaxiy sneaxiy force-pushed the open_gc_by_default branch 4 times, most recently from 4efff0f to 3b239df Compare July 29, 2019 03:45
@sneaxiy sneaxiy force-pushed the open_gc_by_default branch 2 times, most recently from 97bf814 to 10e3d1d Compare July 29, 2019 13:44
@sneaxiy sneaxiy closed this Jul 30, 2019
@sneaxiy sneaxiy reopened this Jul 30, 2019
@sneaxiy sneaxiy changed the title [WIP] Open gc by default Open gc by default Jul 31, 2019
@@ -118,7 +118,7 @@ void GarbageCollector::Add(Container &&objs, Callback &&callback) {

GarbageQueue *garbage_queue = nullptr;
{
std::lock_guard<std::mutex> guard(mutex_);
std::lock_guard<std::mutex> guard(*mutex_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that if the max_memory_size_ is little 1, there doesn't need a lock?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

Copy link
Contributor

@chengduoZH chengduoZH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -46,7 +46,7 @@ class GarbageCollector {

platform::DeviceContext *dev_ctx_;
std::unique_ptr<GarbageQueue> garbages_;
mutable std::mutex mutex_;
mutable std::unique_ptr<std::mutex> mutex_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, this checking is unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants