Harbor 2.6 - Jobservice cannot run replicated with RWO PVCs #1320

Closed
Tonkari opened this issue Oct 12, 2022 · 20 comments · Fixed by #1378


Tonkari commented Oct 12, 2022

Summary

Jobservice can only run as a single pod when the underlying Kubernetes cluster does not support the ReadWriteMany access mode.

Environment

We are using this chart to deploy Harbor on a Kubernetes cluster that only supports ReadWriteOnce for persistent volumes. Currently we are running 3 replicas of the jobservice component. We are using the database JobLogger and S3 as the storage backend for the registry.

Problem

When installing or upgrading to chart version 1.10.0 (Harbor 2.6), only one jobservice pod becomes available, while the others stay in the "ContainerCreating" phase. This happens because they all try to mount the same new scan data export volume. As a workaround we can only reduce the number of replicas to 1, which we have been trying to avoid.
The chart does not allow any configuration of this volume: if persistence is enabled, a single scan data export claim is always created and used by the deployment, roughly as sketched below.
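
For illustration, this is roughly the claim the chart renders when persistence is enabled (the claim name is taken from the workaround further down this thread and depends on the release name; the storage size is an assumption):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: registry-harbor-jobservice-scandata
    spec:
      accessModes:
        - ReadWriteOnce   # mountable read-write by a single node only
      resources:
        requests:
          storage: 1Gi    # assumed size, not relevant to the problem

With anti-affinity spreading the replicas across nodes, only the pod on the node that attaches this volume can start; the others stay stuck in ContainerCreating.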

Possible solutions

Separate persistence option for scan data export

Without knowing whether it's necessary to persist the scan data exports, a possible solution would be to add a separate option to disable persistence for this feature. Instead, exports could be stored in the pod's filesystem - long enough to download them - and disappear when the pod gets terminated. Since I have not really dug into the Harbor code, I'm not sure if this is realistic. A sketch of such an option follows below.
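
Purely illustrative - neither the option name nor the structure below exists in chart 1.10.0; it is only what such a toggle could look like:

    # hypothetical values.yaml fragment
    jobservice:
      scanDataExports:
        persistence: false   # fall back to an emptyDir inside the pod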

Store scan data exports in the database

The exports could be stored in the database. Same as above, I do not know if it's realistic, especially regarding the size of the generated CSV files.

External storage support for scan data export

Since the registry already has an S3 bucket to use for storage, the same could be done for the jobservice component. Whether to reuse the same bucket with a different path or use a separate bucket can be discussed; a hypothetical configuration is sketched below.
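
Again hypothetical - the key names below merely mirror the chart's existing persistence.imageChartStorage.s3 section; no such jobservice option exists:

    # hypothetical values.yaml fragment, modelled on persistence.imageChartStorage.s3
    jobservice:
      scanDataExports:
        storage:
          type: s3
          s3:
            bucket: my-harbor-bucket          # could reuse the registry bucket...
            rootdirectory: /scandata-exports  # ...under a dedicated prefix
            region: us-east-1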

Use statefulSet for jobservice

The jobservice deployment could be converted to a StatefulSet, and a volumeClaimTemplate could be added for the scan data exports, so each replica gets its own RWO claim (see the fragment below). For the file JobLogger the persistentVolumeClaim could remain the same.
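
A minimal sketch of the relevant fragment, assuming the conversion were done (storage size again assumed):

    apiVersion: apps/v1
    kind: StatefulSet
    # metadata and the rest of the pod spec as in the current deployment
    spec:
      volumeClaimTemplates:
        - metadata:
            name: job-scandata-exports
          spec:
            accessModes:
              - ReadWriteOnce   # fine here: each replica gets its own claim
            resources:
              requests:
                storage: 1Gi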

Update docs

If none of the above solutions are desirable, this incompatibility should be mentioned in the docs.


r-ising commented Oct 14, 2022

We have the same issue after upgrading to Harbor 2.6.

@niklasweimann

+1

@janosmiko

+1

@zyyw self-assigned this Oct 20, 2022
@Duke1616

+1


blufor commented Oct 26, 2022

I can confirm this is a valid bug


maxdanilov commented Oct 27, 2022

Struggling with the same issue, I went for an approach of choosing HA over scan data export, by manually switching the volume type from a PVC to an emptyDir:

      - name: job-scandata-exports
        persistentVolumeClaim:
          claimName: registry-harbor-jobservice-scandata

to

      - name: job-scandata-exports
        emptyDir: {}

very meh (scan data export does not seem to work at all), but at least jobservice can run in HA mode.
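
For reference, one way to apply this change in place is a strategic merge patch (the deployment name below is inferred from the claim name above and depends on the release). Pod volumes merge by name, and the $retainKeys directive ensures the old persistentVolumeClaim key is dropped rather than kept alongside emptyDir:

    # scandata-emptydir-patch.yaml (hypothetical file name), applied with:
    #   kubectl patch deployment registry-harbor-jobservice --patch-file scandata-emptydir-patch.yaml
    spec:
      template:
        spec:
          volumes:
            - name: job-scandata-exports
              emptyDir: {}
              $retainKeys:
                - name
                - emptyDir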


derekcha commented Dec 6, 2022

+1

Vad1mo (Member) commented Jan 5, 2023

Thanks for the report. I will bring this issue up during the next community meeting; please join:

https://github.com/goharbor/community/wiki/Harbor-Community-Meetings.

@chlins removed the "help wanted" label Jan 11, 2023
chlins (Member) commented Jan 11, 2023

@Tonkari Hi, there is another volume mounted in RWO mode by default in the jobservice before v2.6:

    accessMode: ReadWriteOnce

So did you work around this by switching the job log location to the database?

Tonkari (Author) commented Jan 11, 2023

@chlins Yes, we did; I described our environment in the issue description. The possibility of using the database for the job log was also the reason why I proposed using the database for the exports as well.
Any solution for the export issue could also be applied to the job log data, so RWO clusters would not have to use the database. Currently that is the only way to get this running replicated.

chlins (Member) commented Jan 14, 2023

@maxdanilov Hi, what problem did you encounter with scan data export when you changed the volume to emptyDir?

chlins (Member) commented Jan 14, 2023

@Tonkari Are you running Kubernetes on AWS or Azure, or have you configured any affinity/anti-affinity policies? In my environment, after scaling the jobservice up to 3 replicas, all 3 pods become ready because they are scheduled to the same node.

Tonkari (Author) commented Jan 14, 2023

@chlins Yes, we did specify anti-affinity. Having all pods run on the same node is not an option for us, because we replace nodes fairly often. Also, it only compensates for pod-level failures; node-level failures would be met with downtime. We are also running our cluster in multiple availability zones, and a volume in a single availability zone does not protect us from zone-level failures.


maxdanilov commented Jan 15, 2023

@chlins we're running Harbor on GCP, with multiple instances of the jobservice and anti-affinity so the pods run on different nodes. With the standard configuration from the chart, the second replica of jobservice won't start, since a persistent volume in GCP can't be mounted on multiple nodes in RW mode.

chlins (Member) commented Jan 16, 2023

@maxdanilov Yes, HA is the problem this issue should fix. But another question: you mentioned that when you changed the volume type to emptyDir, the scan data export function did not work at all. Could you share details of that problem, like the jobservice logs?


maxdanilov commented Jan 16, 2023

@chlins

It doesn't seem to complain when exporting project CVEs:

2023-01-16T12:51:22Z [INFO] [/jobservice/worker/cworker/c_worker.go:77]: Job incoming: {"name":"SCAN_DATA_EXPORT","id":"bda3478934353bb315599b43","t":1673873482,"args":null}
2023-01-16T12:51:22Z [INFO] [/pkg/config/rest/rest.go:47]: get configuration from url: http://registry-harbor-core:80/api/v2.0/internalconfig
2023-01-16T12:51:22Z [INFO] [/pkg/config/rest/rest.go:47]: get configuration from url: http://registry-harbor-core:80/api/v2.0/internalconfig
2023-01-16T12:51:22Z [INFO] [/pkg/scan/export/filter_processor.go:53]: Retrieved user id :6 for user name : <redacted>
2023-01-16T12:51:22Z [INFO] [/pkg/scan/export/filter_processor.go:234]: User <redacted> is not sys admin. Selecting projects with admin roles for export.
2023-01-16T12:51:22Z [INFO] [/pkg/scan/export/filter_processor.go:65]: Selected 1 projects administered by user <redacted>
2023-01-16T12:51:22Z [INFO] [/pkg/config/rest/rest.go:47]: get configuration from url: http://registry-harbor-core:80/api/v2.0/internalconfig
2023-01-16T12:51:24Z [INFO] [/jobservice/runner/redis.go:152]: Job 'SCAN_DATA_EXPORT:bda3478934353bb315599b43' exit with success

But the files in the Event Log bar on the right are empty when downloaded (despite images in the project having vulnerabilities). I'm not selecting any filters, just trying to export everything.

@thangamani-arun

Same with us. We are running 2.4.x; when upgrading to 2.6, it fails to unmount or mount the PVC because it has ReadWriteOnce.

@Vad1mo Would using S3 for these HA pods solve the upgrade failures?

chlins (Member) commented Feb 1, 2023

@maxdanilov Judging from the log, the export function works; finding no CVEs may be caused by some other problem, like a filter or permissions.

chlins (Member) commented Feb 1, 2023

@thangamani-arun The temporary workaround is to change the scandata volume to emptyDir; we'll fix this issue completely in the coming patch.

@maxdanilov

@chlins

I don't think that's the case: the export was run by a user with access to everything, no filters were applied, and vulnerabilities were present in the exported repos.
