Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finding user home space is flaky #3749

Closed
pascalwengerter opened this issue May 9, 2022 · 10 comments
Closed

Finding user home space is flaky #3749

pascalwengerter opened this issue May 9, 2022 · 10 comments
Labels
OCIS-Fastlane Planned outside of the sprint Status:Accepted Type:Bug

Comments

@pascalwengerter
Copy link
Contributor

Describe the bug

A clear and concise description of what the bug is.

Steps to reproduce

Steps to reproduce the behavior:

  1. Log in as any user
  2. Loading personal space fails unreliably

Expected behavior

Solid loading of personal space

Actual behavior

Flakiness, both when navigating and when doing full-page reloads

Logs

Success case

{"level":"info","service":"gateway","pkg":"rgrpc","traceid":"00000000000000000000000000000000","time":"2022-05-09T18:00:18+02:00","message":"listing public shares"}
{"level":"info","service":"sharing","pkg":"rgrpc","traceid":"00000000000000000000000000000000","publicshareprovider":"list","time":"2022-05-09T18:00:18+02:00","message":"list public share"}
{"level":"debug","service":"sharing","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0","from":"tcp://127.0.0.1:62003","uri":"/cs3.sharing.link.v1beta1.LinkAPI/ListPublicShares","start":"09/May/2022:18:00:18 +0200","end":"09/May/2022:18:00:18 +0200","time_ns":845542,"code":"OK","time":"2022-05-09T18:00:18+02:00","message":"unary"}
{"level":"debug","service":"gateway","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0","from":"tcp://127.0.0.1:61919","uri":"/cs3.gateway.v1beta1.GatewayAPI/ListPublicShares","start":"09/May/2022:18:00:18 +0200","end":"09/May/2022:18:00:18 +0200","time_ns":2089375,"code":"OK","time":"2022-05-09T18:00:18+02:00","message":"unary"}
{"level":"debug","service":"ocdav","name":"com.owncloud.web.ocdav","traceid":"00000000000000000000000000000000","md":{"type":2,"id":{"storage_id":"1284d238-aa92-42ce-bdc4-0b0000009157$058bff95-6708-4fe5-91e4-9ea3d377588b","opaque_id":"058bff95-6708-4fe5-91e4-9ea3d377588b"},"etag":"\"6c6e688fc3d18187aa9cc3c409027ea0\"","mime_type":"httpd/unix-directory","mtime":{"seconds":1652111982,"nanos":543652874},"path":"/1284d238-aa92-42ce-bdc4-0b0000009157$058bff95-6708-4fe5-91e4-9ea3d377588b","permission_set":{"add_grant":true,"create_container":true,"delete":true,"get_path":true,"get_quota":true,"initiate_file_download":true,"initiate_file_upload":true,"list_grants":true,"list_container":true,"list_file_versions":true,"list_recycle":true,"move":true,"remove_grant":true,"purge_recycle":true,"restore_file_version":true,"restore_recycle_item":true,"stat":true,"update_grant":true},"owner":{"idp":"https://localhost:9200","opaque_id":"058bff95-6708-4fe5-91e4-9ea3d377588b","type":1},"arbitrary_metadata":{"metadata":{"http://owncloud.org/ns/favorite":""}}},"ns":"","role":{"Name":"manager"},"dav-permissions":"RDNVCK","time":"2022-05-09T18:00:18+02:00","message":"converted PermissionSet"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request":"fe47dac0-3a5a-42e4-b33d-1f50e000fc46","remote-addr":"127.0.0.1:61924","method":"PROPFIND","status":207,"path":"/remote.php/dav/spaces/1284d238-aa92-42ce-bdc4-0b0000009157$058bff95-6708-4fe5-91e4-9ea3d377588b","duration":15.399459,"bytes":1083,"time":"2022-05-09T18:00:18+02:00","message":"access-log"}

Failure case


{"level":"debug","service":"proxy","claims":{"email":"moss@example.org","email_verified":false,"family_name":"Moss","given_name":"Maurice","name":"Maurice Moss","preferred_username":"moss","sub":"PBcG4gQTJZl-U92i@5HvWa9OJl9iiQyTS8ov76TiCiyp0Oray0MJKf-d4yPge5mDBA_x21oXJtHoXbY3U--Qeeg"},"user":{"id":{"idp":"https://localhost:9200","opaque_id":"058bff95-6708-4fe5-91e4-9ea3d377588b","type":1},"username":"moss","mail":"moss@example.org","display_name":"Maurice Moss","groups":["509a9dcd-bb37-4f4f-a01a-19dca27d9cfa"],"opaque":{"map":{"roles":{"decoder":"json","value":"WyI3MTg4MTg4My0xNzY4LTQ2YmQtYTI0ZC1hMzU2YTJhZmRmN2YiXQ=="}}},"uid_number":99,"gid_number":99},"time":"2022-05-09T18:00:20+02:00","message":"associated claims with user"}
{"level":"debug","service":"gateway","pkg":"rgrpc","traceid":"00000000000000000000000000000000","method":"/cs3.gateway.v1beta1.GatewayAPI/CreateHome","time":"2022-05-09T18:00:20+02:00","message":"skipping auth"}
{"level":"debug","service":"gateway","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"grpc-go/1.46.0","from":"tcp://127.0.0.1:61958","uri":"/cs3.storage.registry.v1beta1.RegistryAPI/GetStorageProviders","start":"09/May/2022:18:00:20 +0200","end":"09/May/2022:18:00:20 +0200","time_ns":70209,"code":"OK","time":"2022-05-09T18:00:20+02:00","message":"unary"}
{"level":"debug","service":"storage-users","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"grpc-go/1.46.0","from":"tcp://127.0.0.1:61959","uri":"/cs3.storage.provider.v1beta1.ProviderAPI/CreateStorageSpace","start":"09/May/2022:18:00:20 +0200","end":"09/May/2022:18:00:20 +0200","time_ns":312750,"code":"OK","time":"2022-05-09T18:00:20+02:00","message":"unary"}
{"level":"debug","service":"gateway","pkg":"rgrpc","traceid":"00000000000000000000000000000000","user-agent":"grpc-go/1.46.0","from":"tcp://127.0.0.1:61917","uri":"/cs3.gateway.v1beta1.GatewayAPI/CreateHome","start":"09/May/2022:18:00:20 +0200","end":"09/May/2022:18:00:20 +0200","time_ns":775625,"code":"OK","time":"2022-05-09T18:00:20+02:00","message":"unary"}
{"level":"debug","service":"proxy","policy":"ocis","method":"PROPFIND","prefix":"/remote.php/","path":"/remote.php/dav/spaces/1284d238-aa92-42ce-bdc4-0b0000009157$058bff95-6708-4fe5-91e4-9ea3d377588b","routeType":"prefix","time":"2022-05-09T18:00:20+02:00","message":"director found"}
{"level":"debug","service":"idm","scope":"ldap","time":"2022-05-09T18:00:20+02:00","message":"http: proxy error: dial tcp 127.0.0.1:38525: connect: connection refused"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request":"072b9b89-e029-40ee-beb7-7581dd95f25e","remote-addr":"127.0.0.1:61924","method":"PROPFIND","status":502,"path":"/remote.php/dav/spaces/1284d238-aa92-42ce-bdc4-0b0000009157$058bff95-6708-4fe5-91e4-9ea3d377588b","duration":4.445,"bytes":0,"time":"2022-05-09T18:00:20+02:00","message":"access-log"}

Setup

oCIS on curren master (today 13:00)

@pascalwengerter
Copy link
Contributor Author

pascalwengerter commented May 9, 2022

Hint: Seems to fail for Golang version <1.18 initial suspect was that a Go version bump would resolve it, for me (and others) it still fails on Go v1.18.X

@micbar
Copy link
Contributor

micbar commented May 9, 2022

The problem happens here https://github.com/cs3org/reva/blob/edge/pkg/storage/utils/filelocks/filelocks.go#L144

We need to find a solution to suppress the error.

@micbar micbar added Status:Accepted OCIS-Fastlane Planned outside of the sprint labels May 9, 2022
@butonic
Copy link
Member

butonic commented May 10, 2022

Do we need to always supress the error, or only if we tried to delete a file that does no longer exist?

@pascalwengerter
Copy link
Contributor Author

Do we need to always supress the error, or only if we tried to delete a file that does no longer exist?

From what I witness and the way I understand it the problem occurs when trying to fetch the personal space or upload files into it (also initially when no files are present, but not limited to that)

@butonic
Copy link
Member

butonic commented May 10, 2022

Hm, I can trigger

2022-05-10T07:08:17Z ERR failed to list storage spaces error="remove /home/vscode/.ocis/storage/users/spaces/so/me-admin-user-id-0000-000000000000/nodes/so/me/-a/dm/in-user-id-0000-000000000000.flock: no such file or directory" filters=[{"Term":{"Id":{"opaque_id":"some-admin-user-id-0000-000000000000!some-admin-user-id-0000-000000000000"}},"type":2}] pkg=rgrpc service=storage-users status={"code":15,"message":"error listing spaces","trace":"00000000000000000000000000000000"} traceid=00000000000000000000000000000000

by sending 50-100 concurrent PROPFINDS to a spaces endpoint.

for f in `seq 1 100`; do curl -o /dev/null -s -w "%{http_code}\n" -k -X PROPFIND 'https://cloud.ocis.test/dav/spaces/1284d238-aa92-42ce-bdc4-0b0000009157$some-admin-user-id-0000-000000000000' -u 'admin:admin' & ; done

Which is the locking issue mentioned in #3749 (comment)

But in your case I was stumbling over http: proxy error: dial tcp 127.0.0.1:38525: connect: connection refused .
@pascalwengerter do you see .flock: no such file or directory in your log when listing fails?

@pascalwengerter
Copy link
Contributor Author

I posted my logs above and I don't remember seeing any .flock logs there, happy to help with debugging again later today? @dragotin and @lookacat can also reproduce it IIRC

@lookacat
Copy link
Contributor

lookacat commented May 10, 2022

yes I get this error with the latest master and latest build for osx. For me it fails every time sadly :/

@lookacat
Copy link
Contributor

lookacat commented May 10, 2022

First load of the page after login

2022-05-10T10:00:24+02:00 ERR error initializing metadata client error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:9215: socket: too many open files\"" service=ocis
2022-05-10T10:00:24+02:00 ERR failed to list storage spaces error="xattrs: Unable to lock file for read: could not acquire lock after wait" filters=[{"Term":{"Id":{"opaque_id":"bda94c72-625e-4a4b-80ca-d9c046534c50!bda94c72-625e-4a4b-80ca-d9c046534c50"}},"type":2}] pkg=rgrpc service=storage-users status={"code":15,"message":"error listing spaces","trace":"00000000000000000000000000000000"} traceid=00000000000000000000000000000000
2022-05-10T10:00:24+02:00 ERR error sending get quota grpc request service=graph
2022-05-10T10:00:28+02:00 ERR Could not add default role error="{\"id\":\"go.micro.client\",\"code\":503,\"detail\":\"connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:9191: socket: too many open files\\\"\",\"status\":\"Service Unavailable\"}" service=proxy
2022-05-10T10:00:28+02:00 ERR failed to get roles for user error="{\"id\":\"go.micro.client\",\"code\":503,\"detail\":\"connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:9191: socket: too many open files\\\"\",\"status\":\"Service Unavailable\"}" service=ocs userid=bda94c72-625e-4a4b-80ca-d9c046534c50
2022-05-10T10:00:29+02:00 ERR Could not load roles error="{\"id\":\"go.micro.client\",\"code\":503,\"detail\":\"connection error: desc = \\\"transport: Error while dialing dial tcp 127.0.0.1:9191: socket: too many open files\\\"\",\"status\":\"Service Unavailable\"}" service=proxy
2022-05-10T10:00:29+02:00 ERR Could not add default role error="{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}" service=proxy
2022-05-10T10:00:29+02:00 ERR Could not load roles error="{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}" service=proxy

this happens after i reload a couple of times

@kobergj
Copy link
Collaborator

kobergj commented May 10, 2022

Reason is the flock issue. Closing this in favor of #3757

@kobergj kobergj closed this as completed May 10, 2022
@butonic
Copy link
Member

butonic commented May 10, 2022

@lookacat your logs indicate a problem with
Error while dialing dial tcp 127.0.0.1:9215: socket: too many open files

That is a different root cause. could you open a new issue, as #222 is closed.

To check the current number of open file descriptors you can use the command in #268 (comment)

The solution might be as easy as raising the limit of open file descriptors. The microservice architecture opens quite a lot of tcp connections between different ports.

@mmattel I cannot find something on file descriptors / ulimits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCIS-Fastlane Planned outside of the sprint Status:Accepted Type:Bug
Projects
None yet
Development

No branches or pull requests

5 participants