fix: autohttps cert acquisition stability fixed #1719

Merged

@consideRatio consideRatio commented Jul 16, 2020

Sometimes autohttps cert acquisition failed because the challenge was
initiated from a pod that Kubernetes did not yet consider ready: its
readinessProbe required a successful response before the pod was
marked as ready.

See: traefik/traefik#7033 (comment)

Closes #1716 - @AndrewSav provided very useful guidance that made me realize this, and I'm truly thankful for this help - it made my day! Thank you!!! 🎉 ❤️

@consideRatio consideRatio merged commit a323fc3 into jupyterhub:master Jul 16, 2020
@zhuzeyu22

I think there may be a bug here (letsencrypt).
I always get this log:

time="2021-11-25T06:35:26Z" level=debug msg="legolog: [INFO] [hub.jupyterhub-system.svc.cluster.local] acme: Obtaining bundled SAN certificate"
time="2021-11-25T06:35:27Z" level=error msg="Unable to obtain ACME certificate for domains "hub.jupyterhub-system.svc.cluster.local" : unable to generate a certificate for the domains [hub.jupyterhub-system.svc.cluster.local]: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/new-order :: urn:ietf:params:acme:error:rejectedIdentifier :: Error creating new order :: Cannot issue for "hub.jupyterhub-system.svc.cluster.local": Domain name does not end with a valid public suffix (TLD)" providerName=default.acme
time="2021-11-25T06:45:24Z" level=warning msg="A new release has been found: 2.5.4. Please consider updating."

Why not use cert-manager instead of letsencrypt?
letsencrypt is so clumsy, I think.

@consideRatio
Member Author

You must use a public domain, not a local one.

When you get a cert from letsencrypt, they need to verify you are in control of that public domain, which means they try to reach it from the public internet. Reaching hub.jupyterhub-system.svc.cluster.local is not possible. You need something like my-domain.com instead.

Now this isn't a bug, and cert-manager would do the same. Please refer to z2jh.jupyter.org for details, and discourse.jupyter.org for help on non-bugs.
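
As a rough illustration of the point above, a hostname can be pre-checked before it is handed to an ACME client. This is only a heuristic sketch: the function name and the suffix list are assumptions made for this example, and it does not reproduce the actual validation Let's Encrypt performs (which checks against the public suffix list and then tries to reach the domain).

```python
# Heuristic pre-check only; NOT Let's Encrypt's real validation logic.
# Illustrative (assumed) list of suffixes that are not publicly resolvable.
INTERNAL_SUFFIXES = (".local", ".localhost", ".internal", ".svc")


def looks_publicly_issuable(hostname: str) -> bool:
    """Return False for names that clearly cannot be validated from the internet."""
    name = hostname.strip().rstrip(".").lower()
    if "." not in name:
        # Single-label names such as "localhost" have no public suffix at all.
        return False
    return not any(name.endswith(suffix) for suffix in INTERNAL_SUFFIXES)


if __name__ == "__main__":
    for host in ("hub.jupyterhub-system.svc.cluster.local", "my-domain.com"):
        print(host, looks_publicly_issuable(host))
```

A name that passes this check can still be rejected, for example if its DNS record does not point at the cluster's public entry point.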

zhuzeyu22 commented Nov 25, 2021

I don't think so. I changed my config like this:

hosts:
      - "*.jupyterhub-system.svc.cluster.local"
      - "*.*"
      - "*"

First, the JH is in my own local cluster, so I don't need a public domain.
Also, letsencrypt supports an offline nslookup from DNS,
and I can control the local DNS, so that seems useful.
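
For context on the wildcard entries in the config above: ACME (RFC 8555) only allows a wildcard as the single leftmost label, and Let's Encrypt issues wildcard certificates only through the DNS-01 challenge, which still requires a publicly registered domain. The sketch below is a hypothetical helper written for this example (not part of Traefik or the chart) that reports why a `hosts` entry would be unusable.

```python
from typing import Optional


def wildcard_problem(host: str) -> Optional[str]:
    """Return a reason a wildcard entry is problematic, or None if it has no wildcard."""
    if "*" not in host:
        return None
    labels = host.strip().lower().split(".")
    # Only the form "*.<domain>" is a valid ACME wildcard identifier.
    if labels[0] != "*" or any("*" in label for label in labels[1:]):
        return "wildcard must be the single leftmost label"
    # Heuristic: require at least two labels after the wildcard.
    if len(labels) < 3:
        return "wildcard needs a registrable base domain after '*.'"
    return "wildcard names need a DNS-01 challenge with Let's Encrypt"


if __name__ == "__main__":
    for entry in ("*.jupyterhub-system.svc.cluster.local", "*.*", "*"):
        print(repr(entry), "->", wildcard_problem(entry))
```

The helper only looks at the shape of the wildcard; an entry under `svc.cluster.local` is additionally rejected because it is not a public domain.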

Then, I got a new error.


time="2021-11-25T09:27:21Z" level=debug msg="Creating TCP server 0 at jupyterhub-ssh:22" serviceName=ssh-service serverName=0 entryPointName=ssh-entrypoint routerName=ssh-router@file
time="2021-11-25T09:27:21Z" level=debug msg="Adding route * on TCP" entryPointName=ssh-entrypoint routerName=ssh-router@file
time="2021-11-25T09:31:21Z" level=debug msg="Handling connection from 127.0.0.1:50420"
time="2021-11-25T09:31:21Z" level=debug msg="Handling connection from 127.0.0.1:50428"
time="2021-11-25T09:31:21Z" level=debug msg="Handling connection from 127.0.0.1:50432"
time="2021-11-25T09:31:52Z" level=error msg="Error while connection to backend: dial tcp 172.19.96.183:22: connect: connection timed out"
time="2021-11-25T09:31:52Z" level=error msg="Error while connection to backend: dial tcp 172.19.96.183:22: connect: connection timed out"
time="2021-11-25T09:31:52Z" level=error msg="Error while connection to backend: dial tcp 172.19.96.183:22: connect: connection timed out"
time="2021-11-25T09:37:23Z" level=warning msg="A new release has been found: 2.5.4. Please consider updating."
time="2021-11-25T09:46:38Z" level=debug msg="Serving default certificate for request: \"\""
time="2021-11-25T09:46:38Z" level=debug msg="http: TLS handshake error from 127.0.0.1:59190: EOF"
time="2021-11-25T09:46:38Z" level=debug msg="Serving default certificate for request: \"\""
time="2021-11-25T09:46:38Z" level=debug msg="http: TLS handshake error from 127.0.0.1:59192: EOF"
time="2021-11-25T09:46:46Z" level=debug msg="Serving default certificate for request: \"\""
time="2021-11-25T09:46:46Z" level=debug msg="Serving default certificate for request: \"\""

It seems like traefik hit some unknown error. I don't know; I was just trying to configure something.

@consideRatio
Member Author

@zhuzeyu22, write a proper post on discourse.jupyter.org that isn't just an error message; then it won't be marked as spam as the last one was.

Please stop writing in GitHub issues/PRs for now.

Successfully merging this pull request may close these issues.

CI/CD: Autohttps reliability issues when acquiring certs with Pebble (unknown behavior with Let's Encrypt)