nightly 5-4-High-Availability failure: ESX host is temporarily disconnected #7005

AngieCris · 2017-12-28T16:19:02Z

From portlayer log:

Dec 27 2017 22:37:52.768Z ERROR op=290.26: unexpected fault on task retry: &types.HostNotConnected{HostCommunication:types.HostCommunication{RuntimeFault:types.RuntimeFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}}
Dec 27 2017 22:37:52.792Z DEBUG op=290.26: Unhandled fault while attempting to destroy vm fd5e455f572cbd2c0f1a07198b94ab1d96d713431a9f2858efccf15c7a02f357: &types.HostNotConnected{HostCommunication:types.HostCommunication{RuntimeFault:types.RuntimeFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}}

5-4-High-Availability.zip


hickeng · 2018-01-03T22:58:58Z

Possible candidate for adding a task retry - deleting a container should be viable regardless of whether a single host is down.

hickeng · 2018-01-04T15:44:45Z

#6370 is discussing which task errors should be retried at a low level in the tasks package (e.g. TaskInProgress, HostNotConnected) and which should propagate up to the higher-level logic for re-dispatch (e.g. ConcurrentModificationError).

hickeng · 2018-02-12T21:23:01Z

Talking to dbeard, I'm told that we could well be seeing a delay in updating the host list used for routing operations. However, I don't see why we should settle for this, given that the HA remediation has already occurred.
@mhagen-vmware Derek has suggested we grab a support bundle when we see this again and open an issue. I've opened bug2057397 with the details we have and the vpxd log fragment gathered as part of the VCH log bundle. Adding this to the log collection epic, as it's another case where we'd like to be able to trigger specific log collection on a given symptom.

hickeng · 2018-02-14T02:22:42Z

dup of #6667. Updated that with the visible symptom observed here.

AngieCris added the component/test, source/scenario, priority/p0, and team/foundation labels Dec 28, 2017

hickeng mentioned this issue Feb 12, 2018

HA nightly: "host is temporarily disconnected" - fails to delete container, then fails to delete image in use by container. #6667

Closed

hickeng closed this as completed Feb 14, 2018
