Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nightly Test 5-22- NFS Volume portlayer server error #5819

Closed
sgairo opened this issue Jul 25, 2017 · 19 comments · Fixed by #6044
Closed

Nightly Test 5-22- NFS Volume portlayer server error #5819

sgairo opened this issue Jul 25, 2017 · 19 comments · Fixed by #6044
Labels
component/portlayer kind/defect Behavior that is inconsistent with what's intended priority/p0 source/scenario Found via a scenario failure team/container team/foundation

Comments

@sgairo
Copy link
Contributor

sgairo commented Jul 25, 2017

Seen in 6.5:
5-22-NFS-Volume.zip

Seen in Personality:

Jul 24 2017 07:02:41.332Z DEBUG * stdout attach end Mon Jul 24 07:02:41 UTC 2017
Jul 24 2017 07:02:41.332Z ERROR container attach: stdout (50b6f4a03b7a6d57dab7f00642d8eb580dd915095e6f9345b84af6b7bb22650c): Server error from portlayer: &{0 } (*models.Error) is not supported by the ByteStreamConsumer, can be resolved by supporting Writer/BinaryUnmarshaler interface
Jul 24 2017 07:02:41.332Z INFO  cleaned up connections to 50b6f4a03b7a6d57dab7f00642d8eb580dd915095e6f9345b84af6b7bb22650c. Checking errors
Jul 24 2017 07:02:41.332Z ERROR container attach error: Server error from portlayer: &{0 } (*models.Error) is not supported by the ByteStreamConsumer, can be resolved by supporting Writer/BinaryUnmarshaler interface
Jul 24 2017 07:02:41.332Z ERROR Operation failed with errors.apiError{error:(*errors.errorString)(0xc420168430), statusCode:500}
Jul 24 2017 07:02:41.332Z DEBUG [ END ] [github.com/vmware/vic/pkg/retry.DoWithConfig:73] [30.776609454s] 
Jul 24 2017 07:02:41.332Z DEBUG [ END ] [github.com/vmware/vic/lib/apiservers/engine/backends.(*Container).ContainerAttach:1596] [30.776642372s] 50b6f4a03b7a6d57dab7f00642d8eb580dd915095e6f9345b84af6b7bb22650c
Jul 24 2017 07:02:41.332Z ERROR Handler for POST /v1.25/containers/50b6f4a03b7a6d57dab7f00642d8eb580dd915095e6f9345b84af6b7bb22650c/attach returned error: Server error from portlayer: &{0 } (*models.Error) is not supported by the ByteStreamConsumer, can be resolved by supporting Writer/BinaryUnmarshaler interface
Jul 24 2017 07:02:41.332Z ERROR Error Hijacking: Server error from portlayer: &{0 } (*models.Error) is not supported by the ByteStreamConsumer, can be resolved by supporting Writer/BinaryUnmarshaler interface
Jul 24 2017 07:05:22.053Z ERROR Will stop trying again. Operation failed with errors.apiError{error:(*errors.errorString)(0xc420840280), statusCode:500}
Jul 24 2017 07:05:22.053Z DEBUG [ END ] [github.com/vmware/vic/pkg/retry.DoWithConfig:73] [3m10.13676052s] 
Jul 24 2017 07:05:22.054Z DEBUG [ END ] [github.com/vmware/vic/lib/apiservers/engine/backends.(*Container).ContainerStart:841] [3m10.136822193s] 50b6f4a03b7a6d57dab7f00642d8eb580dd915095e6f9345b84af6b7bb22650c
Jul 24 2017 07:05:22.054Z ERROR Handler for POST /v1.25/containers/50b6f4a03b7a6d57dab7f00642d8eb580dd915095e6f9345b84af6b7bb22650c/start returned error: Server error from portlayer: unable to wait for process launch status: Post https://10.161.246.183/sdk: context deadline exceeded
Jul 24 2017 07:05:22.121Z DEBUG Client context cancelled, stop sending events 


Initial Thoughts: Might be related to #5015 so possibly and #5215 
@sgairo sgairo added component/portlayer component/test Tests not covered by a more specific component label priority/p0 labels Jul 25, 2017
@sgairo
Copy link
Contributor Author

sgairo commented Jul 26, 2017

Seen in 6.5 7/25 Nightly Report:
5-22-NFS-Volume.zip

@mdubya66 mdubya66 added the kind/defect Behavior that is inconsistent with what's intended label Jul 26, 2017
@sgairo
Copy link
Contributor Author

sgairo commented Aug 1, 2017

Seen in 6.5 7/31 Nightly Report:
5-22-NFS-Volume.zip

And in 6.0:
5-22-NFS-Volume.zip

@mhagen-vmware mhagen-vmware removed the component/test Tests not covered by a more specific component label label Aug 2, 2017
@jzt jzt self-assigned this Aug 2, 2017
@jzt jzt added this to the Sprint 14 milestone Aug 2, 2017
@emlin emlin assigned emlin and unassigned jzt Aug 11, 2017
@emlin
Copy link
Contributor

emlin commented Aug 12, 2017

Found sth interesting in the portlayer log:

The connection is created at Jul 31 2017 07:15:22.774Z, but the waiting thread still report CDE at Jul 31 2017 07:15:48.940Z, 26 seconds later.

Jul 31 2017 07:15:22.773Z DEBUG attach connector: Unblocking for 4d2369c72b83fd4caf29cf17d168aa294e62a62cf27c84182c4830fc4244b045
Jul 31 2017 07:15:22.774Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/attach/communication.(*interaction).Unblock:227]
Jul 31 2017 07:15:48.940Z ERROR attach connector: Connection not found error for id:4d2369c72b83fd4caf29cf17d168aa294e62a62cf27c84182c4830fc4244b045: context deadline exceeded

@emlin
Copy link
Contributor

emlin commented Aug 14, 2017

There are two problems in this issue:

  • Error message like (*models.Error) is not supported by the ByteStreamConsumer, can be resolved by supporting Writer/BinaryUnmarshaler interface is not the real error returned by PL.
    This problem is because go-swagger cannot parse error correctly, which might because of the vendor code version github.com/go-openapi/runtime.
    The go-openapi/runtime version is updated in go-swagger latest code, but we might not want to update go-swagger atm. @casualjim is there an easy way to fix this issue in vendored go-swagger code?

  • Connection was created, but still reported CDE in PL attach, and that will eventually cause persona return the above error
    Is still looking at this issue

@mhagen-vmware
Copy link
Contributor

mhagen-vmware commented Nov 17, 2017

Re-opening this based on the dupe of #5941:

Manual-Test-Cases.Group5-Functional-Tests.5-22-NFS-Volume-VCH-0-2479-container-logs.zip

Nov 15 2017 18:10:03.539Z ERROR Handler for POST /v1.25/containers/409e9c12ff94c26d6e98787ccd8c566f1e34c2c7ff90df129c7d7eb1a6550401/attach returned error: Server error from portlayer: [GET /interaction/{id}/stderr][404] containerGetStderrNotFound 

@matthewavery
Copy link
Contributor

This looks like an attach streaming issue which might be seen by other tests as well. This is definitely coming from one of the exec calls(most likely). And we should have some much better logic in place surrounding failed attach behavior soon. Once that PR is in for the exec path we may revisit this and think about closing it once again.

@mhagen-vmware
Copy link
Contributor

Seen in 1-06-Docker-Run here: https://ci.vcna.io/vmware/vic/15691

@hickeng
Copy link
Member

hickeng commented Jan 18, 2018

I don't think that 1-06-Docker-Run failure is related to whatever this is. I have commented here with some analysis, but it relates to a failure to successfully pull redis:alpine due to a 503 Service unavailable from ESX when updating the docker repository kvdata.

@matthewavery matthewavery added the source/scenario Found via a scenario failure label Jan 25, 2018
@sgairo sgairo removed this from the Sprint 15 milestone Feb 6, 2018
@sgairo
Copy link
Contributor Author

sgairo commented Feb 12, 2018

@AngieCris
Copy link
Contributor

Seen in nightly 02/22/18
VC version: 6.5
Test suite: 5-22-NFS-Volume
Test case: Docker Single Write and Read to/from File from one Container using NFS Volume (this step happens before reboot)

docker command that failed:
docker -H 10.193.4.99:2376 --tls exec -i createFileContainer sh -c "echo 'The Texas and Chile flag look similar.' > /mydata/test_nfs_file.txt"

error message:
Server error from portlayer: [GET /interaction/{id}/stderr][404] containerGetStderrNotFound

(the other failure in the test suite follows from this failure.)

portlayer log before VM reboot shows a CDE during container handler attach:

Feb 22 2018 09:37:55.034Z DEBUG [ END ]  [vic/lib/apiservers/portlayer/restapi/handlers.(*ContainersHandlersImpl).GetHandler:207] [261.798517ms] 6aaf58b7b97fe529cce608df49de542f0a6753300e6f2e3c8345b2d9622c3954
Feb 22 2018 09:37:55.035Z DEBUG [BEGIN]  [vic/lib/apiservers/portlayer/restapi/handlers.(*TaskHandlersImpl).WaitHandler:251]
Feb 22 2018 09:37:55.035Z DEBUG op=298.94: [NewOperation] op=298.94 [vic/lib/apiservers/portlayer/restapi/handlers.(*TaskHandlersImpl).WaitHandler:252]
Feb 22 2018 09:37:55.035Z DEBUG [BEGIN]  [vic/lib/portlayer/exec.HandleFromInterface:172]
Feb 22 2018 09:37:55.035Z DEBUG [ END ]  [vic/lib/portlayer/exec.HandleFromInterface:172] [4.721µs] 
Feb 22 2018 09:37:55.035Z DEBUG [BEGIN] op=298.94 [vic/lib/portlayer/task.Wait:28] 260d8d203058a36a7c9ce9c726cf83313965982d6720761eb8721231b6b47d62
Feb 22 2018 09:37:55.035Z DEBUG [BEGIN] op=298.94 [vic/lib/portlayer/exec.(*containerBase).WaitForExec:167] 260d8d203058a36a7c9ce9c726cf83313965982d6720761eb8721231b6b47d62
Feb 22 2018 09:37:55.035Z DEBUG [BEGIN] op=298.94 [vic/lib/portlayer/exec.(*containerBase).waitForExec:412] 260d8d203058a36a7c9ce9c726cf83313965982d6720761eb8721231b6b47d62
Feb 22 2018 09:37:55.741Z DEBUG HandshakeClient: Sending syn with pos 2
Feb 22 2018 09:37:56.284Z DEBUG HandshakeClient: HandshakeClient: Unexpected server response: 0x6f
Feb 22 2018 09:37:56.284Z DEBUG HandshakeClient: Sending syn with pos 174
Feb 22 2018 09:37:57.284Z DEBUG HandshakeClient: Sending syn with pos 173
Feb 22 2018 09:37:58.284Z DEBUG HandshakeClient: Sending syn with pos 118
Feb 22 2018 09:37:59.284Z DEBUG HandshakeClient: Sending syn with pos 92
Feb 22 2018 09:38:00.285Z DEBUG HandshakeClient: Sending syn with pos 121
Feb 22 2018 09:38:01.285Z DEBUG HandshakeClient: Sending syn with pos 19
Feb 22 2018 09:38:02.285Z DEBUG HandshakeClient: Sending syn with pos 100
Feb 22 2018 09:38:03.285Z DEBUG HandshakeClient: Sending syn with pos 16
Feb 22 2018 09:38:04.285Z DEBUG HandshakeClient: Sending syn with pos 43
Feb 22 2018 09:38:05.285Z DEBUG HandshakeClient: Sending syn with pos 223
Feb 22 2018 09:38:06.286Z DEBUG HandshakeClient: Sending syn with pos 63
Feb 22 2018 09:38:07.286Z DEBUG HandshakeClient: Sending syn with pos 171
Feb 22 2018 09:38:08.286Z DEBUG HandshakeClient: Sending syn with pos 67
Feb 22 2018 09:38:09.286Z DEBUG HandshakeClient: Sending syn with pos 212
Feb 22 2018 09:38:10.286Z DEBUG HandshakeClient: Sending syn with pos 186
Feb 22 2018 09:38:11.287Z DEBUG HandshakeClient: Sending syn with pos 5
Feb 22 2018 09:38:12.287Z DEBUG HandshakeClient: Sending syn with pos 37
Feb 22 2018 09:38:13.287Z DEBUG HandshakeClient: Sending syn with pos 88
Feb 22 2018 09:38:14.287Z DEBUG HandshakeClient: Sending syn with pos 132
Feb 22 2018 09:38:15.287Z DEBUG HandshakeClient: Sending syn with pos 47
Feb 22 2018 09:38:16.287Z DEBUG HandshakeClient: Sending syn with pos 179
Feb 22 2018 09:38:17.287Z DEBUG HandshakeClient: Sending syn with pos 133
Feb 22 2018 09:38:18.288Z DEBUG HandshakeClient: Sending syn with pos 239
Feb 22 2018 09:38:19.288Z DEBUG HandshakeClient: Sending syn with pos 63
Feb 22 2018 09:38:20.288Z DEBUG HandshakeClient: Sending syn with pos 103
Feb 22 2018 09:38:21.288Z DEBUG HandshakeClient: Sending syn with pos 187
Feb 22 2018 09:38:22.288Z DEBUG HandshakeClient: Sending syn with pos 215
Feb 22 2018 09:38:23.288Z DEBUG HandshakeClient: Sending syn with pos 38
Feb 22 2018 09:38:24.289Z DEBUG HandshakeClient: Sending syn with pos 27
Feb 22 2018 09:38:25.289Z DEBUG HandshakeClient: Sending syn with pos 144
Feb 22 2018 09:38:26.289Z DEBUG HandshakeClient: Sending syn with pos 220
Feb 22 2018 09:38:27.289Z DEBUG HandshakeClient: Sending syn with pos 217
Feb 22 2018 09:38:28.289Z DEBUG HandshakeClient: Sending syn with pos 219
Feb 22 2018 09:38:29.289Z DEBUG HandshakeClient: Sending syn with pos 64
Feb 22 2018 09:38:30.290Z DEBUG HandshakeClient: Sending syn with pos 252
Feb 22 2018 09:38:31.290Z DEBUG HandshakeClient: Sending syn with pos 27
Feb 22 2018 09:38:32.290Z DEBUG HandshakeClient: Sending syn with pos 206
Feb 22 2018 09:38:33.290Z DEBUG HandshakeClient: Sending syn with pos 128
Feb 22 2018 09:38:34.290Z DEBUG HandshakeClient: Sending syn with pos 140
Feb 22 2018 09:38:35.290Z DEBUG HandshakeClient: Sending syn with pos 205
Feb 22 2018 09:38:36.290Z DEBUG HandshakeClient: Sending syn with pos 68
Feb 22 2018 09:38:37.291Z DEBUG HandshakeClient: Sending syn with pos 51
Feb 22 2018 09:38:38.291Z DEBUG HandshakeClient: Sending syn with pos 155
Feb 22 2018 09:38:39.291Z DEBUG HandshakeClient: Sending syn with pos 84
Feb 22 2018 09:38:40.291Z DEBUG HandshakeClient: Sending syn with pos 196
Feb 22 2018 09:38:41.291Z DEBUG HandshakeClient: Sending syn with pos 155
Feb 22 2018 09:38:42.292Z DEBUG HandshakeClient: Sending syn with pos 193
Feb 22 2018 09:38:43.292Z DEBUG HandshakeClient: Sending syn with pos 142
Feb 22 2018 09:38:44.293Z DEBUG HandshakeClient: Sending syn with pos 151
Feb 22 2018 09:38:44.773Z ERROR attach connector: Connection not found error for id:260d8d203058a36a7c9ce9c726cf83313965982d6720761eb8721231b6b47d62: context deadline exceeded

looks like the handshake client sort of aborted?

tether debug of this failed container createFileContainer-6aaf58b7b97f also shows an ssh handshake failure:

Feb 22 2018 09:37:56.316Z DEBUG Passing config to Attach
Feb 22 2018 09:37:56.317Z DEBUG [BEGIN]  [main.(*attachServerSSH).Reload:110] attach reload
Feb 22 2018 09:37:56.318Z DEBUG [BEGIN]  [main.(*attachServerSSH).start:190] start attach server
Feb 22 2018 09:37:56.318Z DEBUG Start called while enabled, reloading
Feb 22 2018 09:38:46.957Z DEBUG [BEGIN]  [vic/lib/tether.ReloadConfig:50]
Feb 22 2018 09:38:46.961Z INFO  Reloading tether configuration
Feb 22 2018 09:38:46.964Z DEBUG [BEGIN]  [vic/lib/tether.(*tether).Reload:614]
Feb 22 2018 09:38:46.967Z DEBUG [ END ]  [vic/lib/tether.(*tether).Reload:614] [3.298988ms] 
Feb 22 2018 09:38:46.967Z DEBUG [ END ]  [vic/lib/tether.ReloadConfig:50] [10.384525ms] 
Feb 22 2018 09:40:02.776Z DEBUG Closing the RawConn (/dev/ttyS0:/dev/ttyS0)
Feb 22 2018 09:40:02.785Z DEBUG Pushing EOF to any blocked readers on the raw connection (/dev/ttyS0:/dev/ttyS0)
Feb 22 2018 09:40:02.790Z ERROR failed to establish ssh handshake: ssh: overflow reading version string

Suites.Group5-Functional-Tests.5-22-NFS-Volume-VCH-0-2576-container-logs-before-reboot.zip
Uploading Suites.Group5-Functional-Tests.5-22-NFS-Volume-VCH-0-2576-container-logs.zip…
The failure is before reboot.

VCH vmware.log and tether.debug bundle:
VCH-0-2576.zip
nightly test log:
Suites.Group5-Functional-Tests.5-22-NFS-Volume.zip

@matthewavery
Copy link
Contributor

I will be referencing this issue in my current exec related PR. This is not hard to force with enough concurrent exec calls against a container. My PR should help immensely with one off execs like these. It does not fix this issue completely though. more work will need to be done, I believe we will see this issue far far less in the future.

@AngieCris
Copy link
Contributor

AngieCris commented Feb 26, 2018

Similar behavior is seen for non-exec docker operations, not sure if it's related? @matthewavery Or should I move this comment to some other ticket

Nightly 02/24/18
VC 6.0
Test suite 5-22-NFS-Volume
Test case Docker read and remove file

docker command:
'docker -H 10.161.136.185:2376 --tls run -v nfs-volume_VCH-0-7068:/mydata busybox sh -c "cat mydata/test_nfs_file.txt
output:
docker: Error response from daemon: Server error from portlayer: unable to wait for process launch status: Post https://10.161.132.139/sdk: context deadline exceeded.

From tether debug log of the appliance VCH, after attach connector found the connection and during handshake client sending info, attach connector first failed with containerGetStderrNotFound, then stdout connector failed:

...
Feb 25 2018 04:28:26.660Z DEBUG HandshakeClient: Sending syn with pos 140
Feb 25 2018 04:28:26.662Z DEBUG [0xc4203d5b60] write "Feb 25 2018 04:28:26.660Z DEBUG HandshakeClient: Sending syn with pos 140\n" to 2 writers (err: <nil>)
Feb 25 2018 04:28:26.914Z ERROR container attach: stderr (5b062ea616a1ca4ab95c3f986b4d543e964eb06ea6d062323dca8805c20df804): Server error from portlayer: [GET /interaction/{id}/stderr][404] containerGetStderrNotFound 
Feb 25 2018 04:28:26.914Z DEBUG * stdout attach end Sun Feb 25 04:28:26 UTC 2018
Feb 25 2018 04:28:26.914Z ERROR container attach: stdout (5b062ea616a1ca4ab95c3f986b4d543e964eb06ea6d062323dca8805c20df804): No such interaction connection for container: 5b062ea616a1ca4ab95c3f986b4d543e964eb06ea6d062323dca8805c20df804

PL log snippet at the same time:

Feb 25 2018 04:28:25.659Z DEBUG HandshakeClient: Sending syn with pos 106
Feb 25 2018 04:28:26.660Z DEBUG HandshakeClient: Sending syn with pos 140
Feb 25 2018 04:28:26.913Z ERROR attach connector: Connection not found error for id:5b062ea616a1ca4ab95c3f986b4d543e964eb06ea6d062323dca8805c20df804: context deadline exceeded

container logs before reboot:
Suites.Group5-Functional-Tests.5-22-NFS-Volume-VCH-0-7068-container-logs-before-reboot.zip
after reboot:
Suites.Group5-Functional-Tests.5-22-NFS-Volume-VCH-0-7068-container-logs.zip
appliance logs:
VCH-0-7068.zip
nightly output:
Suites.Group5-Functional-Tests.5-22-NFS-Volume.zip

@hickeng
Copy link
Member

hickeng commented Feb 28, 2018

@matthewavery The failure @AngieCris referenced in the prior comment isn't related to exec.
We have the following in the container tether.debug - failing container is epic_hodgkin-5b062ea616a1:

Feb 25 2018 04:27:46.306Z DEBUG [BEGIN]  [vic/lib/tether.(*BaseOperations).MountTarget:888] Mounting nfs://10.161.157.20/store/volumes/nfs-volume_VCH-0-7068 on /mydata
Feb 25 2018 04:32:03.860Z ERROR mounting nfs://10.161.157.20/store/volumes/nfs-volume_VCH-0-7068 on /mydata failed: connection timed out
Feb 25 2018 04:32:03.871Z DEBUG [ END ]  [vic/lib/tether.(*BaseOperations).MountTarget:888] [4m17.564558873s] Mounting nfs://10.161.157.20/store/volumes/nfs-volume_VCH-0-7068 on /mydata
Feb 25 2018 04:32:03.878Z ERROR connection timed out

@anchal-agrawal Is this the same issue you were chasing, inherited from @matthewavery ?

@andrewtchin
Copy link
Contributor

@andrewtchin
Copy link
Contributor

VIC version:

V1.4.0-DEV-17572-42375AA

Deployment details:

Nightly 6.5

Logs:

Running command 'docker -H 10.160.210.55:2376 --tls start 525b431776ab2f1e6c67aa9a9c56321cac13cbd815101c3b16ddad7db432c30e 2>&1'.
${output} = Error response from daemon: Server error from portlayer: unable to wait for process launch status: Post https://10.160.212.143/sdk: context deadline exceeded Error: failed to start containers: 525b431...

kind_franklin-525b431776ab tether.debug

Mar 16 2018 11:18:39.898Z DEBUG [BEGIN]  [vic/lib/tether.(*BaseOperations).MountTarget:930] Mounting nfs://10.160.221.3/store/volumes/nfs-volume_VCH-0-3546 on /mydata
Mar 16 2018 11:18:39.907Z DEBUG Inspecting children with status change
Mar 16 2018 11:18:39.908Z DEBUG No more child processes to reap
Mar 16 2018 11:22:57.380Z ERROR mounting nfs://10.160.221.3/store/volumes/nfs-volume_VCH-0-3546 on /mydata failed: connection timed out
Mar 16 2018 11:22:57.388Z DEBUG [ END ]  [vic/lib/tether.(*BaseOperations).MountTarget:930] [4m17.490092862s] Mounting nfs://10.160.221.3/store/volumes/nfs-volume_VCH-0-3546 on /mydata
Mar 16 2018 11:22:57.397Z ERROR connection timed out

@andrewtchin
Copy link
Contributor

VIC version:

v1.4.0-dev-17605-6144b31

Deployment details:

Nightly 6.7 2018-03-20
Kill NFS Server

Logs:

Running command 'docker -H 10.193.52.242:2376 --tls run -v nfs-volume_VCH-0-2514:/mydata busybox sh -c "tail -5 /mydata/test_nfs_kill.txt" 2>&1'.
${rc} = 125
${tailOutput} = docker: Error response from daemon: Conflict error from portlayer: device nfs-volume_VCH-0-2514 in use.

https://storage.cloud.google.com/vic-ci-logs/vic_nightly_logs_2018-03-20-05-11-24.zip?authuser=1

@hickeng
Copy link
Member

hickeng commented Apr 6, 2018

The prior comment is not related to NFS timeouts, but to a different issue that's tracked in #7621.

@mdubya66
Copy link
Contributor

per the last comment, i'm closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/portlayer kind/defect Behavior that is inconsistent with what's intended priority/p0 source/scenario Found via a scenario failure team/container team/foundation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants