New crc release start failed on RHEL and MacOS where previous release deployed #815

Closed
morningspace opened this issue Nov 18, 2019 · 24 comments

morningspace commented Nov 18, 2019

General information

  • OS: Linux / macOS
  • Hypervisor: KVM / hyperkit
  • Did you run crc setup before starting it (Yes)?

CRC version

$ crc version
crc version: 1.1.0+95966a9
OpenShift version: 4.2.2 (embedded in binary)

CRC status

# For RHEL
$ crc status
Machine 'crc' does not exist. Use 'crc start' to create it.

# For Mac
$ crc status
ERRO error: stat /Users/morningspace/.crc/machines/crc/kubeconfig: no such file or directory
 - exit status 1

CRC config

# Put the output of `crc config view`
Nothing returned

Host Operating System

# Put the output of `cat /etc/os-release` in case of Linux
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.6 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.6:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.6"

# Put the output of `sw_vers` in case of Mac
$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.1
BuildVersion:	19B88

Steps to reproduce

  1. Both the MacOS and RHEL machines have been used to successfully deploy a previous release of crc (1.0.0).
  2. After I downloaded the new release (1.1.0) and ran crc setup, crc start failed:
  • On RHEL, it failed with the error domain 'crc' already exists with uuid xxxxx, e.g.:
INFO Extracting bundle: crc_libvirt_4.2.2.crcbundle ...
INFO Creating CodeReady Containers VM for OpenShift 4.2.2...
ERRO Error creating host: Error creating the VM: Error creating machine: Error in driver during machine creation: virError(Code=9, Domain=20, Message='operation failed: domain 'crc' already exists with uuid 84b5685a-03bd-4a27-9d44-266d8f2a9272')
  • On MacOS, it hangs at INFO Creating CodeReady Containers VM for OpenShift 4.2.2... indefinitely. I waited for a few hours before pressing Ctrl+C, e.g.:
$ crc start -p ~/.crc/pull-secret.txt
INFO Checking if running as non-root
INFO Checking if oc binary is cached
INFO Checking if HyperKit is installed
INFO Checking if crc-driver-hyperkit is installed
INFO Checking file permissions for /etc/resolver/testing
INFO Checking file permissions for /etc/hosts
INFO Extracting bundle: crc_hyperkit_4.2.2.crcbundle ...
INFO Creating CodeReady Containers VM for OpenShift 4.2.2...
  3. Run crc stop, crc delete, and crc status after the start failed.
  • On RHEL:
$ crc stop
Machine 'crc' does not exist. Use 'crc start' to create it.
$ crc delete
Machine 'crc' does not exist. Use 'crc start' to create it.
$ crc status
Machine 'crc' does not exist. Use 'crc start' to create it.
  • On MacOS:
$ crc stop
Stopping the OpenShift cluster, this may take a few minutes...
ERRO Machine "crc" is already stopped.
$ crc delete
Do you want to delete the OpenShift cluster? [y/N]: y
Deleted the OpenShift cluster
$ crc status
Machine 'crc' does not exist. Use 'crc start' to create it.
  4. The same issue happens even after I remove ~/.crc.
@morningspace morningspace changed the title new crc release starts failed on RHEL and MacOS machines where previous release deployed New crc release starts failed on RHEL and MacOS machines where previous release deployed Nov 18, 2019
@morningspace morningspace changed the title New crc release starts failed on RHEL and MacOS machines where previous release deployed New crc release start failed on RHEL and MacOS machines where previous release deployed Nov 18, 2019
@morningspace morningspace changed the title New crc release start failed on RHEL and MacOS machines where previous release deployed New crc release start failed on RHEL and MacOS where previous release deployed Nov 18, 2019
@gbraad
Contributor

gbraad commented Nov 18, 2019 via email

@morningspace
Author

morningspace commented Nov 18, 2019

Hmmm... so that's for RHEL. How about MacOS?

I tried virsh undefine crc, but got the following error:

error: failed to get domain 'crc'
error: Domain not found: no domain with matching name 'crc'

But, after that, when I run crc start again, it says:

INFO Creating CodeReady Containers VM for OpenShift 4.2.2...
ERRO Error creating host: Error creating the VM: Error creating machine: Error in driver during machine creation: virError(Code=9, Domain=20, Message='operation failed: domain 'crc' already exists with uuid 84b5685a-03bd-4a27-9d44-266d8f2a9272')

This confuses me. I guess I could revert my machine to a clean state (it's a VM) to avoid the error, but I'd like to know whether there's a way to fix it before I do that.

@gbraad

@gbraad
Contributor

gbraad commented Nov 18, 2019 via email

@morningspace
Author

Thanks @gbraad, I have successfully launched crc on RHEL based on your comment above.

But for MacOS, it still failed, any ideas?

$ crc start -p ~/pull-secret.txt
INFO Checking if running as non-root
INFO Checking if oc binary is cached
INFO Checking if HyperKit is installed
INFO Checking if crc-driver-hyperkit is installed
INFO Checking file permissions for /etc/resolver/testing
INFO Checking file permissions for /etc/hosts
INFO Starting CodeReady Containers VM for OpenShift 4.2.2...
ERRO Failed to connect to the CRC VM with SSH
$ crc status
ERRO error: stat /Users/moyingbj/.crc/machines/crc/kubeconfig: no such file or directory
 - exit status 1
$ ls ~/.crc/machines/crc/
config.json	console-ring	crc.disk	hyperkit.json	hyperkit.pid	tty

@gbraad
Contributor

gbraad commented Nov 19, 2019 via email

@praveenkumar
Member

@morningspace Can you share your Mac details, such as the sw_vers output? Also, can you please run crc start -p ~/pull-secret.txt --log-level=debug and paste the output so we can see what is causing the issue?

@cfergeau
Contributor

This looks like a partially created crc instance, so I'd start with crc delete && crc start ... to recreate it from scratch.

@morningspace
Author

@cfergeau I did run delete before start, but it did not work...
@praveenkumar Sure, I reran it with debug logging enabled, see below. It appears that ssh to core@192.168.64.56 keeps failing:

...
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.14 HWAddress:ee:84:2a:5c:9b:19 ID:1,ee:84:2a:5c:9b:19 Lease:0x5d95b294}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.13 HWAddress:f6:b:e5:1e:db:7 ID:1,f6:b:e5:1e:db:7 Lease:0x5d95a622}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.12 HWAddress:f2:da:92:b6:3f:c5 ID:1,f2:da:92:b6:3f:c5 Lease:0x5d959fa7}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.11 HWAddress:72:8e:fa:c6:ee:73 ID:1,72:8e:fa:c6:ee:73 Lease:0x5d94a49f}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.10 HWAddress:f2:c3:89:8d:f1:cb ID:1,f2:c3:89:8d:f1:cb Lease:0x5d94a07d}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.9 HWAddress:f2:1d:c5:bb:27:94 ID:1,f2:1d:c5:bb:27:94 Lease:0x5d949864}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.8 HWAddress:1a:c0:89:4f:0:eb ID:1,1a:c0:89:4f:0:eb Lease:0x5d949255}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.7 HWAddress:ee:c3:da:4b:86:cc ID:1,ee:c3:da:4b:86:cc Lease:0x5d946617}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.6 HWAddress:9a:31:d3:63:82:8e ID:1,9a:31:d3:63:82:8e Lease:0x5d945f9e}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.5 HWAddress:5e:1a:a3:d6:73:3a ID:1,5e:1a:a3:d6:73:3a Lease:0x5d943f25}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.4 HWAddress:6e:2:41:40:11:78 ID:1,6e:2:41:40:11:78 Lease:0x5d943a38}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.3 HWAddress:42:57:bd:c6:43:71 ID:1,42:57:bd:c6:43:71 Lease:0x5d93ea6e}
(crc) DBG | dhcp entry: {Name:crc-vsqrt-master-0 IPAddress:192.168.64.2 HWAddress:22:63:9:7a:6d:77 ID:1,22:63:9:7a:6d:77 Lease:0x5d936c9a}
(crc) DBG | error: Temporary Error: could not find an IP address for 46:ef:ab:8f:52:53 - sleeping 2s
(crc) DBG | retry loop 5
(crc) DBG | exe=/Users/moyingbj/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 91994
(crc) DBG | Searching for 46:ef:ab:8f:52:53 in /var/db/dhcpd_leases ...
(crc) DBG | Found 55 entries in /var/db/dhcpd_leases!
(crc) DBG | dhcp entry: {Name:crc-shdl4-master-0 IPAddress:192.168.64.56 HWAddress:46:ef:ab:8f:52:53 ID:1,46:ef:ab:8f:52:53 Lease:0x5dd51d14}
(crc) DBG | Found match: 46:ef:ab:8f:52:53
(crc) DBG | IP: 192.168.64.56
(crc) Calling .GetConfigRaw
(crc) Calling .DriverName
(crc) Calling .DriverName
Waiting for machine to be running, this may take a few minutes...
(crc) Calling .GetState
(crc) DBG | exe=/Users/moyingbj/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 91994
Machine is up and running!
Machine successfully created
Created /Users/moyingbj/.crc/machines/crc/.crc-exist
(crc) Calling .GetState
(crc) DBG | exe=/Users/moyingbj/.crc/bin/crc-driver-hyperkit uid=0
(crc) DBG | hyperkit pid from json: 91994
Found binary path at /Users/moyingbj/.crc/bin/crc-driver-hyperkit
Launching plugin server for driver hyperkit
Plugin server listening at address 127.0.0.1:50219
() Calling .GetVersion
Using API Version  1
() Calling .SetConfigRaw
() Calling .GetMachineName
DEBU Waiting until ssh is available
(crc) Calling .GetSSHHostname
(crc) Calling .GetSSHPort
(crc) Calling .GetSSHKeyPath
(crc) Calling .GetSSHKeyPath
(crc) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/moyingbj/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc (-r--------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null core@192.168.64.56 -o IdentitiesOnly=yes -i /Users/moyingbj/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
exit 0
SSH cmd err, output: exit status 255:
DEBU error: Temporary Error: ssh command error:
command : exit 0
err     : exit status 255
output  :  - sleeping 1s
DEBU retry loop 1
(crc) Calling .GetSSHHostname
(crc) Calling .GetSSHPort
(crc) Calling .GetSSHKeyPath
(crc) Calling .GetSSHKeyPath
(crc) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/moyingbj/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc (-r--------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null core@192.168.64.56 -o IdentitiesOnly=yes -i /Users/moyingbj/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
exit 0
SSH cmd err, output: exit status 255:
DEBU error: Temporary Error: ssh command error:
command : exit 0
err     : exit status 255
output  :  - sleeping 1s
DEBU retry loop 2
(crc) Calling .GetSSHHostname
(crc) Calling .GetSSHPort
(crc) Calling .GetSSHKeyPath
(crc) Calling .GetSSHKeyPath
(crc) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /Users/moyingbj/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc (-r--------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null core@192.168.64.56 -o IdentitiesOnly=yes -i /Users/moyingbj/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
exit 0
SSH cmd err, output: exit status 255:
DEBU error: Temporary Error: ssh command error:
command : exit 0
err     : exit status 255
output  :  - sleeping 1s
DEBU retry loop 3
...

@cfergeau
Contributor

Yup, if there are no more logs, it looks like the ssh connection to 192.168.64.56 (the VM IP) fails.
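
For reference, the lease lookup that the driver logs above ("Searching for ... in /var/db/dhcpd_leases") can be repeated by hand to confirm which IP the VM was given. This is a minimal sketch, not the driver's actual code: it assumes the lease file consists of `{ ... }` blocks containing `ip_address=` and `hw_address=1,<mac>` lines, and `lease_ip` is a hypothetical helper name.

```shell
# Sketch: print the IP address leased to a given MAC address.
# Usage: lease_ip MAC [LEASE_FILE]
# Assumes each lease is a "{ ... }" block in which ip_address= appears
# before hw_address=, as in a typical macOS /var/db/dhcpd_leases file.
lease_ip() {
  awk -v mac="$1" '
    $1 == "{" { ip = "" }                      # new lease block: forget the previous IP
    { n = split($1, kv, "=") }                 # split "key=value" into kv[1], kv[2]
    n == 2 && kv[1] == "ip_address" { ip = kv[2] }
    n == 2 && kv[1] == "hw_address" {
      sub(/^1,/, "", kv[2])                    # drop the "1," hardware-type prefix
      if (kv[2] == mac) { print ip; exit }
    }
  ' "${2:-/var/db/dhcpd_leases}"
}
```

`lease_ip 46:ef:ab:8f:52:53` would print the address recorded for that MAC, if any. The log above shows MAC octets without leading zeros (for example `f6:b:e5:1e:db:7`), so the MAC has to be passed in the same form.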

@gbraad
Contributor

gbraad commented Nov 19, 2019 via email

@morningspace
Author

@gbraad I tried, but no luck:

$ ssh -i /Users/moyingbj/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc core@192.168.64.56
ssh: connect to host 192.168.64.56 port 22: Operation timed out
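
A bounded version of this manual probe makes the failure easier to see than an open-ended hang. The sketch below mirrors the retry loop visible in the debug log, under stated assumptions: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary. `ConnectTimeout` and `BatchMode` are standard OpenSSH client options.

```shell
# Sketch: run a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  try=1
  until "$@"; do
    if [ "$try" -ge "$attempts" ]; then
      echo "giving up after $try attempts" >&2
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}

# Example (key path and IP taken from the log in this thread):
# retry 5 2 ssh -o ConnectTimeout=5 -o BatchMode=yes \
#   -i ~/.crc/cache/crc_hyperkit_4.2.2/id_rsa_crc core@192.168.64.56 exit 0
```

If every attempt times out while the VM process is running, the problem is more likely in host networking than in the SSH key or user.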

I noticed a couple of interesting things that I hope will help debug the issue.

According to the log, before the ssh timeout error crc was trying to look up the VM's IP via DHCP. It got the result after a few retries, e.g.

(crc) DBG | dhcp entry: {Name:crc-shdl4-master-0 IPAddress:192.168.64.57 HWAddress:8a:5d:af:b4:a9:23 ID:1,8a:5d:af:b4:a9:23 Lease:0x5dd717f3}
(crc) DBG | Found match: 8a:5d:af:b4:a9:23
(crc) DBG | IP: 192.168.64.57

I assume that's the VM IP, 192.168.64.57 in this case. However, when I check /etc/hosts, the IP has not been updated accordingly. BTW, the VM IP seems to be incremented every time I launch crc, which I had not noticed before.

192.168.64.52 api.crc.testing oauth-openshift.apps-crc.testing

And, the /etc/resolver/testing:

port 53
nameserver 192.168.64.52
search_order 1

Here, 192.168.64.52 is the old IP. I assume it needs to be updated to 192.168.64.57.
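
This kind of mismatch can be checked mechanically. The following is a minimal sketch, assuming the two files have the layout shown in this comment (an `api.crc.testing` entry in the hosts file and a `nameserver` line in the resolver file). The function names and the optional file arguments are hypothetical and only there for illustration.

```shell
# Sketch: compare the VM IP reported by the driver with the IPs found in
# the hosts file and the resolver file.

# Print the IP mapped to api.crc.testing in a hosts-style file.
hosts_ip() {
  awk '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == "api.crc.testing") { print $1; exit } }' "$1"
}

# Print the nameserver IP from a resolver-style file.
resolver_ip() {
  awk '$1 == "nameserver" { print $2; exit }' "$1"
}

# Usage: check_crc_dns VM_IP [HOSTS_FILE] [RESOLVER_FILE]
# Returns 0 when both files point at VM_IP, 1 otherwise.
check_crc_dns() {
  vm_ip=$1
  hosts_file=${2:-/etc/hosts}
  resolver_file=${3:-/etc/resolver/testing}
  rc=0
  h=$(hosts_ip "$hosts_file")
  r=$(resolver_ip "$resolver_file")
  if [ "$h" != "$vm_ip" ]; then echo "hosts: ${h:-none} (expected $vm_ip)"; rc=1; fi
  if [ "$r" != "$vm_ip" ]; then echo "resolver: ${r:-none} (expected $vm_ip)"; rc=1; fi
  return $rc
}
```

With the values from this comment, `check_crc_dns 192.168.64.57` would report both files as stale, since they still contain 192.168.64.52.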

The interesting thing is that if I try to modify /etc/hosts manually and save it, I get an error like this: E828: Cannot open undo file for writing: /private/etc/.hosts.un~. This happens even though I've confirmed that both the file mode and the owner are correct.

$ ls -l /etc/hosts
-rw-r--r--  1 moyingbj  wheel  2094 Nov 21 08:33 /etc/hosts

This keeps happening until I change the owner of the parent folder /private/etc to moyingbj.

$ ls -l /private/
total 0
drwxr-xr-x  128 moyingbj  wheel  4096 Nov 21 08:40 etc
...

But even with that, I still cannot get crc up. Also, I can neither ping nor ssh to the new VM IP:

$ ping 192.168.64.57
PING 192.168.64.57 (192.168.64.57): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
^C
--- 192.168.64.57 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss

My previous log pasted above was truncated for its first half part due to too many ssh retries which exceeded the terminal buffer, here's a complete one with less retries: crc-start.log

Also note that I just upgraded my MacOS to Catalina v10.15.1. Before that I used High Sierra v10.13.4 and never saw this issue.

@cfergeau
Contributor

If you crc delete/crc start, then yes, the VM ip will change. At the beginning of the previous comment, you try to ssh to 192.168.64.56, but then you say the VM IP is .57, so a failure seems normal. Your permissions issues are not when trying to modify /etc/hosts, but when your editor tries to create a temporary file in the same directory. This is expected to fail.
What is not expected though is that /etc/resolver/testing and /etc/hosts are not updated.

@cfergeau
Contributor

My previous log pasted above was truncated for its first half part due to too many ssh retries which exceeded the terminal buffer, here's a complete one with less retries: crc-start.log

The log you uploaded seems truncated too? It ends with ^C and is not showing the messages which would be shown after a successful crc start.

@morningspace
Author

@cfergeau

At the beginning of the previous comment, you try to ssh to 192.168.64.56

Yep, that was from the run before the IP changed to 57; the log I pasted here is from the next run, in which the IP was incremented to 57. But whether it's 56 or 57, the result is the same.

What is not expected though is that /etc/resolver/testing and /etc/hosts are not updated.

Exactly, that's what confuses me, and it is probably the cause of the issue.

The log you uploaded seems truncated too?

No, I just stopped it manually by pressing Ctrl + C after a few retries. Left alone, crc keeps retrying until it finally fails.

@praveenkumar
Member

praveenkumar commented Nov 21, 2019

Hi @morningspace, can you try the following on that Mac and let us know if you still see the same issue?

$ crc delete
$ cd ~/.crc/bin
$ rm hyperkit
$ curl -L -O https://686-55985023-gh.circle-artifacts.com/0/hyperkit
$ chmod +x hyperkit
$ sudo chown root:wheel hyperkit 
$ sudo chmod u+s hyperkit

$ ./hyperkit version
hyperkit: v0.20190802-3-g3b296c

Homepage: https://github.com/docker/hyperkit
License: BSD

$ crc start --log-level debug

@praveenkumar
Member

praveenkumar commented Nov 25, 2019

@morningspace I upgraded to Catalina and didn't see any issue running CRC. Are you able to check with your colleagues whether they are facing the same problem?

@zeenix
Contributor

zeenix commented Nov 25, 2019

@morningspace Same here as @praveenkumar. I am on Catalina, but once I work around #836, everything works fine and the cluster comes up.

@morningspace
Author

morningspace commented Nov 26, 2019

@praveenkumar It's good to know that's not reproducible on your side. I am asking my colleague today to have a try on his Catalina Mac, and will cycle back when done.

@zeenix I checked #836 and tried the workaround, but didn't see the warning dialog. It looks like my Mac is in a very special situation, which is good for others but bad for me :-)

@praveenkumar
Member

I am asking my colleague today to have a try on his Catalina Mac, and will cycle back when done.

@morningspace Thanks, we are waiting for the feedback on this one, do let us know.

@morningspace
Author

@praveenkumar , sorry for my late response!

My colleague tried on his Mac a few days ago and could not reproduce the issue either... So I assume this is an issue particular to my Mac, unless other people report something similar on Catalina, whether on a new installation or after an upgrade...

With that, I'm happy to close it for now and keep an eye on it. Thanks a lot for taking the time to look into it!

@renegadeandy

I am getting the same issue here. Exactly as per @morningspace

@renegadeandy

It turned out to be because of a VPN "Cisco Anyconnect" - when this was running, I got the same behaviour as you. When this was not running and closed, crc behaved as expected.

@morningspace
Author

Aha, that sounds like a reasonable cause... I've since moved to a new MacBook and deployed a fresh CRC, so it hasn't happened again... But, IIRC, I was seeing the same issue when I was in the office, where I don't need Cisco AnyConnect.

@lgc0313

lgc0313 commented Feb 6, 2020

I solved it on RHEL.

virsh list --all
virsh shutdown <name> or virsh destroy <name>
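
Spelled out a little further, the cleanup on RHEL might look like the sketch below. It assumes the leftover domain is named `crc` and lives on the system libvirt connection (`qemu:///system`). A plain `virsh` run as a non-root user typically talks to `qemu:///session` instead, which may be why `virsh undefine crc` reported "Domain not found" earlier in this thread. The `run` and `cleanup_domain` helpers and the `DRY_RUN` switch are hypothetical, and `sudo` may be needed depending on how libvirt access is configured.

```shell
# Sketch: remove a stale libvirt domain left behind by an earlier crc release.
# With DRY_RUN=1 (the default here) the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: cleanup_domain [NAME]
cleanup_domain() {
  name=${1:-crc}
  uri=${LIBVIRT_DEFAULT_URI:-qemu:///system}   # assumption: domain is on the system connection
  run virsh -c "$uri" list --all
  # destroy force-stops a running domain; it fails harmlessly if already stopped
  run virsh -c "$uri" destroy "$name" || echo "destroy failed (domain may already be stopped)" >&2
  # undefine removes the domain definition so the name can be reused
  run virsh -c "$uri" undefine "$name"
}
```

Running `cleanup_domain crc` first with the default dry run shows what would be executed; `DRY_RUN=0` then performs the cleanup before retrying crc start.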


7 participants