Suboptimal usage of multiple segment cache locations #7641

Open

dclim opened this issue May 11, 2019 · 2 comments

dclim (Contributor) commented May 11, 2019

Affected Version

0.14.0-incubating

Description

There seems to be room for improvement in the logic that distributes segments among multiple segment cache locations, which could speed up overall I/O if the locations are backed by different physical drives. Right now, the algorithm appears to pick one location, fill it up to capacity, and then move on to the next one.

An example test configuration:

druid.server.maxSize=35000000000
druid.segmentCache.locations=[{"path"\:"/mnt/var/druid/segment-cache","maxSize"\:"5000000000"}, {"path"\:"/mnt1/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt2/var/druid/segment-cache","maxSize"\:"10000000000"}, {"path"\:"/mnt3/var/druid/segment-cache","maxSize"\:"10000000000"}]

Some point-in-time snapshots of disk usage:

t=1
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  3.9G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T   77M  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=2
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  952M  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=3
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  5.4G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T   77M  1.7T   1% /mnt3

t=4
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  937M  1.7T   1% /mnt3

t=5
/dev/nvme0n1    1.7T   79M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  5.1G  1.7T   1% /mnt3

t=6
/dev/nvme0n1    1.7T  859M  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.2G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  9.2G  1.7T   1% /mnt3

t=7 (steady state, all locations are 'full')
/dev/nvme0n1    1.7T  4.6G  1.7T   1% /mnt
/dev/nvme1n1    1.7T  9.3G  1.7T   1% /mnt1
/dev/nvme2n1    1.7T  9.2G  1.7T   1% /mnt2
/dev/nvme3n1    1.7T  9.2G  1.7T   1% /mnt3

Additionally, I'm seeing some misleading WARN-level logs when it attempts to use a location that is at capacity and fails, after which it moves on to the next location:

2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt1/var/druid/segment-cache:268,514,257]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 WARN [ZkCoordinator] org.apache.druid.segment.loading.StorageLocation - Segment[tpch_lineitem_1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z_2018-04-03T22:29:17.915Z_1:548,961,952] too large for storage[/mnt2/var/druid/segment-cache:237,382,486]. Check your druid.segmentCache.locations maxSize param
2019-05-11T06:45:07,615 INFO [ZkCoordinator] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://imply-awstest-druid/a336020b-9c30-4ecd-bd15-4cd4433998c3/segments/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1/index.zip] to outDir[/mnt3/var/druid/segment-cache/tpch_lineitem/1997-03-01T00:00:00.000Z_1997-04-01T00:00:00.000Z/2018-04-03T22:29:17.915Z/1]
sashidhar (Contributor) commented:

@dclim, I would like to work on this. I'll get back to you with my understanding before making any changes.

sashidhar (Contributor) commented Jul 1, 2019

@dclim

  1. I want to understand whether the proposed algorithm has any side effects, for example on data locality. With the current algorithm, all the segments of a particular interval might end up in the same location (correct me if I'm wrong). With the new algorithm, segments of the same interval will be distributed among multiple locations. Does this have any effect on query performance?

  2. SegmentLoaderLocalCacheManager.loadSegmentWithRetry(..)

private StorageLocation loadSegmentWithRetry(DataSegment segment, String storageDirStr) throws SegmentLoadingException
{
  for (StorageLocation loc : locations) {
    if (loc.canHandle(segment)) {
      File storageDir = new File(loc.getPath(), storageDirStr);
      try {
        loadInLocationWithStartMarker(segment, storageDir);
        return loc;
      }
      catch (SegmentLoadingException e) {
        // ... handle the failure and try the next location
      }
    }
  }
  throw new SegmentLoadingException("Failed to load segment %s in all locations.", segment.getId());
}

Whenever loadSegmentWithRetry() is called, the for loop above picks the same location (the first in the list, initially) until that location's capacity is exhausted. Only once a location's capacity is exhausted does it move on to the next one.

The implementation of the proposed algorithm would look something like this:

private StorageLocation loadSegmentWithRetry(DataSegment segment, String storageDirStr) throws SegmentLoadingException
{
  StorageLocation nextLocation = getNextLocation();
  if (nextLocation.canHandle(segment)) {
    // load the segment into nextLocation
    return nextLocation;
  }
  ...
}

private StorageLocation getNextLocation()
{
  // Loop through the locations in a round-robin fashion.
  return nextLocation;
}
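
For illustration only, here is a minimal, self-contained sketch of what such a round-robin selector could look like (not the actual Druid implementation); the class name is hypothetical, and it assumes a fixed List<StorageLocation> is available:

import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical round-robin selector over the configured segment cache locations.
// StorageLocation is Druid's class; only locations.size() and locations.get() are used here.
class RoundRobinLocationSelector
{
  private final List<StorageLocation> locations;
  private final AtomicLong counter = new AtomicLong(0);

  RoundRobinLocationSelector(List<StorageLocation> locations)
  {
    this.locations = locations;
  }

  // Returns the next location in round-robin order, regardless of how full it is;
  // the caller still needs to check canHandle(segment) and fall back to other locations.
  StorageLocation getNextLocation()
  {
    int idx = (int) Math.floorMod(counter.getAndIncrement(), (long) locations.size());
    return locations.get(idx);
  }
}

A thread-safe counter matters here because segments can be loaded concurrently; the modulo simply cycles through the configured locations in order.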

sashidhar pushed a commit to sashidhar/incubator-druid that referenced this issue Jul 6, 2019
…egments to multiple segment cache locations
dclim pushed a commit that referenced this issue Sep 28, 2019
* #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations

* Fixing indentation

* WIP

* Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy

* fixing code style

* Fixing test

* Adding a method visible only for testing, fixing tests

* 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes

* fixing the conditional statement

* Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy

* to trigger CI build

* Add documentation for the selection strategy configuration

* to re trigger CI build

* updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronized method, other minor fixes

* In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail

* Implementing review comments. Added tests for StorageLocationSelectorStrategy

* Checkstyle fixes

* Adding java doc comments for StorageLocationSelectorStrategy interface

* checkstyle

* empty commit to retrigger build

* Empty commit

* Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file

* Impl review comments including updating docs as suggested

* Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose

* Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy

* Fixing the round robin iterator

* Removed numLocationsToTry, updated java docs

* changing property attribute value from tier to type

* Fixing assert messages
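
Per the commit messages above, the merged change introduced a pluggable location selector strategy, with a least-bytes-used strategy as the default and a round-robin strategy as an alternative. As a hedged configuration sketch (the exact property name may differ by Druid version; this follows the form documented in later releases), opting into round-robin distribution would look roughly like:

druid.segmentCache.locationSelector.strategy=roundRobin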
Fokko pushed a commit to Fokko/druid that referenced this issue Sep 30, 2019
gianm pushed a commit to implydata/druid-public that referenced this issue Oct 1, 2019