Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Batch disk operations #7895

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

hickeng
Copy link
Member

@hickeng hickeng commented May 4, 2018

This adds disk batching to the endpointVM so operations such as volume
filesystem creation can occur in parallel.
It also adjusts the volume cache to remove a lock that seralizes all
creates. This is NOT complete as of this commit - it requires singleflight
or similar wrapping the volume create path, and also consideration as to
the other inspect/delete operations as they apply to a partially completed
volume.

This should also allow an operation to specify if it can tolerate some
latency. The desire behind this is so that disk detach operations for pull
can be deferred and batched with the attach operation for the subsequent
child disk which should halve the number of reconfigure operations
associated with a pull request.

This has additional modifications to the docker personality processing for
multiple anonymous volumes to parallelize the volume creation.

This means that most concurrent volume operations batch, reducing
container creation time by ~50% with 10 anonymous volumes in my tests.

Related: #5677
Fixes: #7929

@hickeng hickeng changed the title WIP: Batch disk operations WIP: Batch disk operations [skip ci] May 4, 2018
@hickeng hickeng requested a review from a team as a code owner June 16, 2018 00:38
@@ -98,6 +99,22 @@ func main() {
extraconfig.SetLogLevel(log.DebugLevel)
}

if vchConfig.Diagnostics.DebugLevel > 4 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is here solely to allow capture of the raw soap requests issued to vSphere.
This may be a useful capability to keep, but likely needs tidying up somewhat.

@zjs
Copy link
Member

zjs commented Sep 10, 2018

The disk-specific portions of this should be preserved, but the batching logic should be replaced with the general-purpose batching from #8034 .

@zjs zjs added kind/defect/performance Behavior that is functionally correct, but performs worse than intended source/customer Reported by a customer, directly or via an intermediary labels Sep 10, 2018
This adds disk batching to the endpointVM so operations such as volume
filesystem creation can occur in parallel.
It also adjusts the volume cache to remove a lock that seralizes all
creates. This is NOT complete as of this commit - it requires singleflight
or similar wrapping the volume create path, and also consideration as to
the other inspect/delete operations as they apply to a partially completed
volume.

This should also allow an operation to specify if it can tolerate some
latency. The desire behind this is so that disk detach operations for pull
can be deferred and batched with the attach operation for the subsequent
child disk which should halve the number of reconfigure operations
associated with a pull request.
This adds singleflight for device listing to avoid large numbers of
duplicate calls after having batched disk reconfigure operations.

This also adds debug logic for capturing the govmomi soap output at debug
> 4
This was a minor change to resolve a compile error from failure
to reference the `shared` return value from singleflight.

I don't know if that's actually needed, but adjusted to preserve
both the TODO comment and a separate return path based on shared
vs unique result.

Corrects a missing package name in a type reference, and a spelling
mistake to allow for compilation.

A basic test at this point shows we can pull images, create
containers with volumes attached, and parallel creation of volumes
(via multiple concurrent `docker volume create` commands) shows
batching of reconfigure operations occurring.
@hickeng hickeng force-pushed the disk-batching branch 2 times, most recently from 15ba6ec to 41de10a Compare December 3, 2022 00:05
Observed an outofspace question causing a stun of the VCH until
answering the question via the UI. In this case the question was
msg.hbacommon.outofspace

That has NOT been fully resolved. The following steps have been
taken:
1. switch to retry on transient errors in vm.reconfigure in
   disk manager
2. added Retry answer to the question so the operation responds
   in a predictable manner instead of stunning the VCH
   indefinitely.

Additionally async error handlers have been added on some disk
management paths to allow for increased parallelization of
attach/detact.
@hickeng hickeng changed the title WIP: Batch disk operations [skip ci] WIP: Batch disk operations Dec 3, 2022
Prior test was unnecessarily complex and resulting in
a non-useful message in error case

The automation token can be added into drone using the drone cli:
* drone secret add
or
* login to the drone webui, go to profile->secrets, and update via
the box in the top right

https://ci-vic.vmware.com/vmware/vic/settings/secrets at time of
writing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage Storage-related functionality cla-not-required component/persona/docker component/portlayer/storage kind/defect/performance Behavior that is functionally correct, but performs worse than intended source/customer Reported by a customer, directly or via an intermediary
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support concurrent creation of volumes (vmdks)
4 participants