Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sleep after update manifests to avoid sibling explosion #959

Merged
merged 2 commits into from
Aug 28, 2014

Conversation

shino
Copy link
Contributor

@shino shino commented Aug 22, 2014

Another candidate solution #882 of based on the @kuenishi 's idea (see also #905).

Idea is simple:
sleep some interval after get-modify-update manifests depending on #(siblings)

knobs

  • manifest_siblings_bp_threashold (default 5)

    threshold to trigger backpressure by sleep

  • manifest_siblings_bp_coefficient (default 200 [msec])

    Coefficient of sleep interval, which is manifest_siblings_bp_coefficient * #(siblings)

  • manifest_siblings_bp_max_sleep (default 30*1000 [msec] = 30sec)

    Max sleep interval. Mean sleep interval is determined by minimum of this value and the above.

Actual sleep interval is uniformly randomized between
(0.5 * mean sleep interval, 1.5 * mean sleep interval).

@shino shino added this to the 1.5.1 milestone Aug 22, 2014
@shino shino force-pushed the feature/mp-backpressure-simple-sleep branch from 90b6e73 to 075af4f Compare August 22, 2014 06:28
@shino
Copy link
Contributor Author

shino commented Aug 27, 2014

Did some measurement.

Details are in [1] and [2]

Setup:

  • 4-node riak cluster + 1 riak-cs

  • gof3r - riak-cs - riak in single box,

    other 3 riak nodes are other 3 boxes

  • Scope: investigate behaivior up to 80 or 160 concurrency

  • Upload 2GB file by MP, part size 5MB = 5242880 bytes

  • S3 client is gof3r (in s3gof3r), with patch to change timeout from 5sec to 300sec

  • target branches: feature/mp-backpressure and feature/mp-backpressure-simple-sleep

Summary:

  • Both feature/mp-backpressure and feature/mp-backpressure-simple-sleep are
    better behaivior (=smaller siblings) above concurrency 40, than release/1.5.
  • feature/mp-backpressure-simple-sleep is slightly better (shorter) elapsed time
    than feature/mp-backpressure at concurrency 80.
  • feature/mp-backpressure, which has retry logic, managed to retry even
    after client already closed the connection. It leads unnecessary load.
  • feature/mp-backpressure-simple-sleep with manifest_siblings_bp_max_sleep
    changed to 30 sec suppress numbers of siblings at concurrency 160.
    Although, it leads to 50+ siblings.

[1] https://gist.github.com/shino/2c4079d4e489ccd57ab2
[2] https://gist.github.com/shino/4617a628408537ef6a98
[3] https://github.com/rlmcpherson/s3gof3r/tree/master/gof3r

@shino
Copy link
Contributor Author

shino commented Aug 27, 2014

Extended max sleep interval at f3cbbd9, updated the description of this PR.

@kuenishi
Copy link
Contributor

Note that s3gof3r by default has 5 sec retry, which leads to sibling explosion anyway. This patch expects ther clients will retry within longer interval, say, > 30~60 sec.

There is a tradeoff between number of concurrent part uploads, its retry interval. Choose carefully the knobs listed above.

borshop added a commit that referenced this pull request Aug 28, 2014
Add sleep after update manifests to avoid sibling explosion

Reviewed-by: kuenishi
@kuenishi
Copy link
Contributor

@borshop merge

@borshop borshop merged commit f3cbbd9 into release/1.5 Aug 28, 2014
@kuenishi kuenishi deleted the feature/mp-backpressure-simple-sleep branch August 28, 2014 16:48
@shino
Copy link
Contributor Author

shino commented Oct 24, 2014

Memo to self:

To disable the backpressure sleep completely:

> application:set_env(riak_cs, manifest_siblings_bp_threashold, infinity).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants