COS objetstore client MaxIdleConnsPerHost too few #4479

Closed
hanjm opened this issue Jul 24, 2021 · 1 comment · Fixed by #4482

Comments

@hanjm
Member

hanjm commented Jul 24, 2021

Thanos, Prometheus and Golang version used: v0.21.1

Object Storage Provider: COS

What happened:
After `thanos tools bucket web` has been running for a while, it logs many errors like `dial tcp 169.xxx:443: connect: cannot assign requested address`. `netstat` shows many connections in TIME_WAIT.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
Run `thanos tools bucket web` against an object store that contains many objects.

Full logs to relevant components:

Logs

2021-07-24T19:55:27.837291407+08:00 level=debug ts=2021-07-24T11:55:27.837083718Z caller=fetcher.go:321 component=block.BaseFetcher msg="fetching meta data" concurrency=32
2021-07-24T19:57:05.947864137+08:00 level=error ts=2021-07-24T11:57:05.94755018Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="incomplete view: 8 errors: meta.json file exists: 01F3Z65VWTE6C5DSB7P6XZWQ2W/meta.json: head cos object: Head \"https://${cos url, i replaced}/01F3Z65VWTE6C5DSB7P6XZWQ2W%2Fmeta.json\": dial tcp *.*.*.*:443: connect: cannot assign requested address; meta.json file exists: 01F3YZA8HTSR577AXRYFSJ057K/meta.json: head cos object: Head \"https://*******.cos.ap-guangzhou.myqcloud.com/01F3YZA8HTSR577AXRYFSJ057K%2Fmeta.json\": dial tcp *.*.*.*:443: connect: cannot assign requested address; meta.json file exists: 01F3Z65VXQ5YR6MCG6K29NG5Z2/meta.json: head cos object: Head \"https://*******.cos.ap-guangzhou.myqcloud.com/01F3Z65VXQ5YR6MCG6K29NG5Z2%2Fmeta.json\": dial tcp *.*.*.*:443: connect: cannot assign requested address; meta.json file exists: 01F3Z65VXHMKDV684WB7XJ0VJJ/meta.json: head cos object: Head \"https://*******.cos.ap-guangzhou.myqcloud.com/01F3Z65VXHMKDV684WB7XJ0VJJ%2Fmeta.json\": dial tcp *.*.*.*:443: connect: cannot assign requested address; meta.json file exists: 01F3Z65VWKX3T5QMAH92V0NJZ8/meta.json: head cos object: Head \"https://*******.cos.ap-guangzhou.myqcloud.com/01F3Z65VWKX3T5QMAH92V0NJZ8%2Fmeta.json\": dial tcp *.*.*.*:443: connect: cannot assign requested address; meta.json file exists: 01F3Z65VX487XNYTSZ7VDWA65H/meta.json: head cos object: Head \"https://*******.cos.ap-guangzhou.myqcloud.com/01F3Z65VX487XNYTSZ7VDWA65H%2Fmeta.json\": dial tcp *.*.*.*:443: connect: cannot assign requested address; meta.json file exists: 01F3Z65XVZKBXMT7Y4EN4MH7VX/meta.json: head cos object: Head \"https://*******.cos.ap-guangzhou.myqcloud.com/01F3Z65XVZKBXMT7Y4EN4MH7VX%2Fmeta.json\": dial tcp *.*.*.*:443: connect: cannot assign requested address; meta.json file exists: 01F3ZD1N3Z15EPT9X677B5H913/meta.json: head cos object: Head \"https://*******.cos.ap-guangzhou.myqcloud.com/01F3ZD1N3Z15EPT9X677B5H913%2Fmeta.json\": 
dial tcp *.*.*.*:443: connect: cannot assign requested address"
2021-07-24T19:57:05.947916124+08:00 level=debug ts=2021-07-24T11:57:05.947797162Z caller=fetcher.go:321 component=block.BaseFetcher msg="fetching meta data" concurrency=32

Anything else we need to know:

Environment:

  • OS (e.g. from /etc/os-release): Linux
  • Kernel (e.g. uname -a): 4.14.105
  • Others:
    inspect metrics
$ curl -s *${thanos tools web addr}/metrics|grep fail
# HELP thanos_bucket_blocks_meta_sync_failures_total Total blocks metadata synchronization failures
# TYPE thanos_bucket_blocks_meta_sync_failures_total counter
thanos_bucket_blocks_meta_sync_failures_total 4
thanos_bucket_blocks_meta_synced{state="failed"} 123
# HELP thanos_objstore_bucket_operation_failures_total Total number of operations against a bucket that failed, but were not expected to fail in certain way from caller perspective. Those errors have to be investigated.
# TYPE thanos_objstore_bucket_operation_failures_total counter
thanos_objstore_bucket_operation_failures_total{bucket="***",operation="attributes"} 0
thanos_objstore_bucket_operation_failures_total{bucket="***",operation="delete"} 0
thanos_objstore_bucket_operation_failures_total{bucket="***",operation="exists"} 753
thanos_objstore_bucket_operation_failures_total{bucket="***",operation="get"} 0
thanos_objstore_bucket_operation_failures_total{bucket="***",operation="get_range"} 0
thanos_objstore_bucket_operation_failures_total{bucket="***",operation="iter"} 0
thanos_objstore_bucket_operation_failures_total{bucket="***",operation="upload"} 0
# HELP thanos_status Represents status (0 indicates failure, 1 indicates success) of the component.
@hanjm hanjm changed the title block fetcher not read body when close, which lead leak Block fetcher not read body when close, which lead leak Jul 24, 2021
hanjm added a commit to hanjm/thanos that referenced this issue Jul 24, 2021
@hanjm hanjm changed the title Block fetcher not read body when close, which lead leak Block seems lead leak Jul 24, 2021
@hanjm hanjm changed the title Block seems lead leak Block fetcher seems lead leak Jul 24, 2021
hanjm added a commit to hanjm/thanos that referenced this issue Jul 24, 2021
@hanjm hanjm changed the title Block fetcher seems lead leak COS objetstore client MaxIdleConnsPerHost too low Jul 24, 2021
@hanjm hanjm changed the title COS objetstore client MaxIdleConnsPerHost too low COS objetstore client MaxIdleConnsPerHost too few Jul 24, 2021
@hanjm
Member Author

hanjm commented Jul 24, 2021

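I think the root cause is that the COS client uses the Go default of `MaxIdleConnsPerHost` = 2 (`http.DefaultMaxIdleConnsPerHost`), while the metadata fetcher runs with a concurrency of 32. Only 2 connections can be kept idle for reuse, so most HTTP connections are created and then closed right after each request, which leaves too many sockets in TIME_WAIT.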

hanjm added a commit to hanjm/thanos that referenced this issue Jul 25, 2021
…s-io#4479)

The http.Transport is not auto-tuning and one size does not seem to fit all cases.
Add `HTTPConfig` to allow tuning most of the http.Transport parameters, and set `MaxIdleConnsPerHost` to 100.

Signed-off-by: hanjm <hanjinming@outlook.com>
bwplotka pushed a commit that referenced this issue Jul 27, 2021
* Fix cos client MaxIdleConnsPerHost too low, too many TIME_WAIT (#4479)
The http.Transport is not auto-tuning and one size does not seem to fit all cases.
Add `HTTPConfig` to allow tuning most of the http.Transport parameters, and set `MaxIdleConnsPerHost` to 100.

Signed-off-by: hanjm <hanjinming@outlook.com>

* Add docs about cos object store config `http_config`

Signed-off-by: hanjm <hanjinming@outlook.com>

* Add changelog

Signed-off-by: hanjm <hanjinming@outlook.com>

* Add copyright

Signed-off-by: hanjm <hanjinming@outlook.com>