diff --git a/CHANGELOG.md b/CHANGELOG.md
index b22ccce02..44eff9a59 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,362 @@
+## v0.6.0 - 2021-01-22
+
+We're happy to announce the release of Lokomotive v0.6.0 (Flying Scotsman).
+
+This release includes several new features, many component updates, and a new platform: [Tinkerbell](https://tinkerbell.org/).
+
+### Changes in v0.6.0
+
+#### Kubernetes updates
+
+- Update Kubernetes to v1.19.4 and AKS to v1.18.10 ([#1189](https://github.com/kinvolk/lokomotive/pull/1189)).
+
+#### Component updates
+
+- Update `external-dns` to v0.7.4 ([#1115](https://github.com/kinvolk/lokomotive/pull/1115)).
+- Update `metrics-server` to v2.11.2 ([#1116](https://github.com/kinvolk/lokomotive/pull/1116)).
+- Update `cluster-autoscaler` to v1.1.0 ([#1137](https://github.com/kinvolk/lokomotive/pull/1137)).
+- Update `rook` to v1.4.6 ([#1117](https://github.com/kinvolk/lokomotive/pull/1117)).
+- Update `velero` to v1.5.2 ([#1131](https://github.com/kinvolk/lokomotive/pull/1131)).
+- Update `openebs-operator` to v2.2.0 ([#1095](https://github.com/kinvolk/lokomotive/pull/1095)).
+- Update `contour` to v1.10.0 ([#1170](https://github.com/kinvolk/lokomotive/pull/1170)).
+- Update `experimental-linkerd` to stable-2.9.0 ([#1123](https://github.com/kinvolk/lokomotive/pull/1123)).
+- Update `web-ui` to v0.1.3 ([#1237](https://github.com/kinvolk/lokomotive/pull/1237)).
+- Update `prometheus-operator` to v0.43.2 ([#1162](https://github.com/kinvolk/lokomotive/pull/1162)).
+- Update Calico to v3.17.0 ([#1251](https://github.com/kinvolk/lokomotive/pull/1251)).
+- Update `aws-ebs-csi-driver` to v0.7.0 ([#1135](https://github.com/kinvolk/lokomotive/pull/1135)).
+- Update `etcd` to v3.4.14 ([#1309](https://github.com/kinvolk/lokomotive/pull/1309)).
+
+#### Terraform provider updates
+
+- Update Terraform providers to their latest versions ([#1133](https://github.com/kinvolk/lokomotive/pull/1133)).
+
+#### New platforms
+
+- Add support for the Tinkerbell platform ([#392](https://github.com/kinvolk/lokomotive/pull/392)).
+
+#### Bug fixes
+
+- Adding new worker pools while TLS bootstrap is enabled no longer gets stuck in the installation phase ([#1181](https://github.com/kinvolk/lokomotive/pull/1181)).
+- `contour`: Consistently apply node affinity and tolerations to all scheduled workloads ([#1161](https://github.com/kinvolk/lokomotive/pull/1161)).
+- Don't run control plane components as DaemonSets on clusters with a single control plane node ([#1193](https://github.com/kinvolk/lokomotive/pull/1193)).
+
+#### Features
+
+- Add the Packet CCM to the Packet platform ([#1155](https://github.com/kinvolk/lokomotive/pull/1155)).
+- `contour`: Parameterize the Envoy scraping interval ([#1229](https://github.com/kinvolk/lokomotive/pull/1229)).
+- Expose the `--conntrack-max-per-core` kube-proxy flag ([#1187](https://github.com/kinvolk/lokomotive/pull/1187)).
+- Add `require_volume_annotation` for the restic plugin ([#1132](https://github.com/kinvolk/lokomotive/pull/1132)).
+- Print the bootkube journal if cluster bootstrap fails ([#1166](https://github.com/kinvolk/lokomotive/pull/1166)). This makes cluster bootstrap problems easier to debug.
+- `aws-ebs-csi-driver`: Add dynamic provisioning, resizing and snapshot options ([#1277](https://github.com/kinvolk/lokomotive/pull/1277)). Users can now enable or disable dynamic provisioning, volume resizing and snapshots for the AWS EBS CSI driver.
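+
+  For example, these options could be toggled from the component configuration. The snippet below is only a sketch: the attribute names (`enable_volume_scheduling`, `enable_volume_resizing` and `enable_volume_snapshot`) are assumptions made for illustration, so check the `aws-ebs-csi-driver` component reference documentation for the exact syntax.
+
+  ```hcl
+  component "aws-ebs-csi-driver" {
+    # Assumed attribute names -- consult the component reference before applying.
+    enable_volume_scheduling = true
+    enable_volume_resizing   = true
+    enable_volume_snapshot   = true
+  }
+  ```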
+
+#### Security enhancements
+
+- `calico-host-protection`: Add a custom locked-down PSP configuration ([#1274](https://github.com/kinvolk/lokomotive/pull/1274)).
+
+#### Documentation
+
+- Add an `openebs-operator` update guide ([#1163](https://github.com/kinvolk/lokomotive/pull/1163)).
+- Add a `rook-ceph` update guide ([#1165](https://github.com/kinvolk/lokomotive/pull/1165)).
+
+#### Miscellaneous
+
+- Pull control plane images from Quay to avoid hitting Docker Hub pull limits ([#1226](https://github.com/kinvolk/lokomotive/pull/1226)).
+- Bootkube now waits for all control plane charts to converge before exiting, which should make the bootstrapping process more stable ([#1085](https://github.com/kinvolk/lokomotive/pull/1085)).
+- Remove deprecated CoreOS mentions from AWS ([#1245](https://github.com/kinvolk/lokomotive/pull/1245)) and bare metal ([#1246](https://github.com/kinvolk/lokomotive/pull/1246)).
+- Improve hardware reservation validation rules on Equinix Metal ([#1186](https://github.com/kinvolk/lokomotive/pull/1186)).
+
+### Updating from v0.5.0
+
+#### Configuration syntax changes
+
+##### AWS
+
+The undocumented `cluster.os_name` parameter has been removed, since Lokomotive supports Flatcar Container Linux only.
+
+##### Bare-metal
+
+The `cluster.os_channel` parameter has been simplified by removing the `flatcar-` prefix.
+
+###### Old
+
+```hcl
+os_channel = "flatcar-stable"
+```
+
+###### New
+
+```hcl
+os_channel = "stable"
+```
+
+##### Velero
+
+Velero now requires an explicit `provider` field to select the provider.
+Example:
+
+```hcl
+component "velero" {
+  provider = "openebs"
+
+  openebs {
+    ...
+  }
+}
+```
+
+#### Updating Prometheus Operator
+
+Due to a change in the upstream Helm chart, updating the Prometheus Operator component incurs downtime. Update this component before updating the cluster, so that monitoring visibility is not lost while the cluster update is happening.
+
+1. Patch the `PersistentVolume`s created/used by the `prometheus-operator` component to use the `Retain` reclaim policy.
+
+   ```bash
+   kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')
+
+   kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')
+   ```
+
+   > **NOTE:** To execute the above commands, the user must have cluster-wide permissions.
+
+2. Uninstall the `prometheus-operator` release, delete the existing `PersistentVolumeClaim`s, and verify that the `PersistentVolume`s become `Released`.
+
+   ```bash
+   lokoctl component delete prometheus-operator
+   ```
+
+   ```bash
+   kubectl delete pvc data-prometheus-prometheus-operator-prometheus-0 -n monitoring
+   kubectl delete pvc data-alertmanager-prometheus-operator-alertmanager-0 -n monitoring
+   ```
+
+3. Remove the current `spec.claimRef` values to change the PVs' status from `Released` to `Available`.
+ + ```bash + kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}') + + kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}') + ``` + + > **NOTE:** To execute the above command, the user must have a cluster wide permission. + +4. Make sure that the `prometheus-operator`'s `storage_class` and `prometheus.storage_size` are unchanged during the upgrade process. + +5. Proceed to a fresh `prometheus-operator` component installation. The new release should now re-attach your previously released PV with its content. + + ``` + lokoctl component apply prometheus-operator + ``` + + > **NOTE:** Etcd dashboard will only start showing data after the cluster is updated. + +6. Delete the old kubelet service. + + ```bash + kubectl -n kube-system delete svc prometheus-operator-kubelet + ``` + +7. If monitoring was enabled for `rook`, `contour`, `metallb` components, make sure you update them as well after the cluster is updated. + +#### Cluster update steps + +> **NOTE:** Updating multiple Lokomotive versions at a time is not supported. If your cluster is running a version older than `v0.5.0`, update to `v0.5.0` first and only then proceed with the update to `v0.6.0`. + +Please perform the following manual steps in your cluster configuration directory. + +1. Download the release bundle. + + ```bash + curl -LO https://github.com/kinvolk/lokomotive/archive/v0.6.0.tar.gz + tar -xvzf v0.6.0.tar.gz + ``` + +2. Install the Packet CCM. + + If you are running Lokomotive on Equinix Metal (formerly Packet), then install Packet CCM. Export your Packet cluster's project ID and API Key. + + ```bash + export PACKET_AUTH_TOKEN="" + export PACKET_PROJECT_ID="" + + echo "apiKey: $PACKET_AUTH_TOKEN +projectID: $PACKET_PROJECT_ID" > /tmp/ccm-values.yaml + + helm install packet-ccm --namespace kube-system --values=/tmp/ccm-values.yaml ./lokomotive-0.6.0/assets/charts/control-plane/packet-ccm/ + ``` + +3. Update node config. + + On Equinix Metal (formerly Packet), this script shipped with the release tarball will add permanent MetalLB labels and kubelet config to use CCM. + + > **NOTE:** Please edit this script to disable updating certain nodes. Modify the `update_other_nodes` function as required. + + ```bash + UPDATE_BOOTSTRAP_COMPONENTS=false + ./lokomotive-0.6.0/scripts/update/0.5-0.6/update.sh $UPDATE_BOOTSTRAP_COMPONENTS + ``` + +4. If you're using the self-hosted kubelet, apply the `--cloud-provider` flag to it. + + > **NOTE:** If you're unsure you can run the command as it's harmless if you're not using the self-hosted kubelet. + + ```bash + kubectl -n kube-system get ds kubelet -o yaml | \ + sed '/client-ca-file.*/a \ \ \ \ \ \ \ \ \ \ --cloud-provider=external \\' | \ + kubectl apply -f - + ``` + +5. Export assets directory. + + ```bash + export ASSETS_DIR="assets" + ``` + +6. Remove BGP sessions from Terraform state. + + If you are running Lokomotive on Equinix Metal (formerly Packet), then run the following commands: + + ```bash + cd $ASSETS_DIR/terraform + terraform state rm $(terraform state list | grep packet_bgp_session.bgp) + cd - + ``` + +7. Remove old asset files. + + ```bash + rm -rf $ASSETS_DIR/cluster-assets + rm -rf $ASSETS_DIR/terraform-modules + ``` + +8. Update control plane. 
+ + ```bash + lokoctl cluster apply --skip-components -v + ``` + + > **NOTE:** If the update process gets interrupted, rerun above command. + + > **NOTE:** If you are running self-hosted kubelet then append the above command with flag `--upgrade-kubelets`. + + The update process typically takes about 10 minutes. + After the update, running `lokoctl health` should result in an output similar to the following: + + ```bash + Node Ready Reason Message + + lokomotive-controller-0 True KubeletReady kubelet is posting ready status + lokomotive-1-worker-0 True KubeletReady kubelet is posting ready status + lokomotive-1-worker-1 True KubeletReady kubelet is posting ready status + lokomotive-1-worker-2 True KubeletReady kubelet is posting ready status + Name Status Message Error + + etcd-0 True {"health":"true"} + ``` + +9. Update the bootstrap components: kubelet and etcd. + + This script shipped with the release tarball will update all the nodes to run the latest kubelet and etcd. + + > **NOTE:** Please edit this script to disable updating certain nodes. Modify `update_other_nodes` function as required. + + ```bash + UPDATE_BOOTSTRAP_COMPONENTS=true + ./lokomotive-0.6.0/scripts/update/0.5-0.6/update.sh $UPDATE_BOOTSTRAP_COMPONENTS + ``` + +10. If you're using the self-hosted kubelet, reload its config. + + > **NOTE:** If you're unsure you can run the command as it's harmless if you're not using the self-hosted kubelet. + + ```bash + kubectl -n kube-system rollout restart ds kubelet + ``` + +#### Update Docker log settings + +We've added log rotation to the Docker daemon running on cluster nodes. However, this only takes effect in new nodes. For this to apply to existing cluster nodes, you need to manually configure each node. + +- Drain the node. + + This step ensures that you don't see any abrupt changes. Any workloads running on this node are evicted and scheduled to other nodes. The node is marked as unschedulable after running this command. + + ```bash + kubectl drain --ignore-daemonsets + ``` + +- SSH into the node and become root with `sudo -s`. +- Create the Docker config file: + + ```json + echo ' + { + "live-restore": true, + "log-opts": { + "max-size": "100m", + "max-file": "3" + } + } + ' | tee /etc/docker/daemon.json + ``` + +- Restart the Docker daemon: + + > **NOTE:** This will restart all the containers on the node, including the kubelet. This step cannot be part of the automatic update script because restarting the Docker daemon will also kill the update script pod. + + ```bash + systemctl restart docker + ``` + +- Make the node schedulable: + + ```bash + kubectl uncordon + ``` + +#### Updating Contour + +Manually update the CRDs before updating the component `contour`: + +```bash +kubectl apply -f https://raw.githubusercontent.com/kinvolk/lokomotive/v0.6.0/assets/charts/components/contour/crds/01-crds.yaml +``` + +Update the component: + +```bash +lokoctl component apply contour +``` + +#### Updating Velero + +Manually update the CRDs before updating the component `velero`: + +```bash +kubectl apply -f ./lokomotive-0.6.0/assets/charts/components/velero/crds/ +``` + +Update the component: + +```bash +lokoctl component apply velero +``` + +#### Updating openebs-operator + +Follow the [OpenEBS update guide](https://kinvolk.io/docs/lokomotive/0.6/how-to-guides/update-openebs/). + +#### Updating rook-ceph + +Follow the [Rook Ceph update guide](https://kinvolk.io/docs/lokomotive/0.6/how-to-guides/update-rook-ceph/). 
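+
+Before moving on to the remaining components, it is worth confirming that the cluster is still healthy. A quick check could look like the following; `lokoctl health` is the same command used in the cluster update steps above, and the `kubectl` filter is just a convenience to surface pods that are not `Running` or `Completed`:
+
+```bash
+lokoctl health
+kubectl get pods --all-namespaces | grep -vE 'Running|Completed'
+```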
+ +#### Updating other components + +Other components are safe to update by running the following command: + +```bash +lokoctl component apply +``` + ## v0.5.0 - 2020-10-27 We're happy to announce the release of Lokomotive v0.5.0 (Eurostar). @@ -163,9 +522,9 @@ taints = { } ``` -This release also changes the default `cluster.oidc.client_id` value from `gangway` to `clusterauth`. +This release also changes the default `cluster.oidc.client_id` value from `gangway` to `clusterauth`. -This setting must match `gangway.client_id` and `dex.static_client.id`. +This setting must match `gangway.client_id` and `dex.static_client.id`. If you use default settings for oidc you'll need to add `client_id = "gangway"` or change the `static_client.id` and `client_id` parameters for dex and gangway to `clusterauth` respectively. @@ -191,11 +550,11 @@ packet { #### Cluster update steps -Ensure your cluster is in a healthy state by running `lokoctl cluster apply` using the `v0.4.1` version. +Ensure your cluster is in a healthy state by running `lokoctl cluster apply` using the `v0.4.1` version. Updating multiple versions at a time is not supported so, if your cluster is older, update to `v0.4.1` and only then proceed with the update to `v0.5.0`. -Due to [Terraform](https://github.com/kinvolk/lokomotive/pull/824) and [Kubernetes](https://github.com/kinvolk/lokomotive/pull/1030) updates to v0.13+ and v1.19.3 respectively. +Due to [Terraform](https://github.com/kinvolk/lokomotive/pull/824) and [Kubernetes](https://github.com/kinvolk/lokomotive/pull/1030) updates to v0.13+ and v1.19.3 respectively. Some manual steps need to be performed when updating. In your cluster configuration directory, follow these steps: diff --git a/docs/how-to-guides/monitoring-with-prometheus-operator.md b/docs/how-to-guides/monitoring-with-prometheus-operator.md index 474da10a0..ff7a62b31 100644 --- a/docs/how-to-guides/monitoring-with-prometheus-operator.md +++ b/docs/how-to-guides/monitoring-with-prometheus-operator.md @@ -61,7 +61,7 @@ kubectl -n monitoring get pods Execute the following command to forward port `9090` locally to the Prometheus pod: ```bash -kubectl -n monitoring port-forward svc/prometheus-operator-prometheus 9090 +kubectl -n monitoring port-forward svc/prometheus-operator-kube-p-prometheus 9090 ``` Open the following URL: [http://localhost:9090](http://localhost:9090). @@ -91,7 +91,7 @@ Open the following URL: `https://prometheus..`. Execute the following command to forward port `9093` locally to the Alertmanager pod: ```bash -kubectl -n monitoring port-forward svc/prometheus-operator-alertmanager 9093 +kubectl -n monitoring port-forward svc/prometheus-operator-kube-p-alertmanager 9093 ``` Open the following URL: [http://localhost:9093](http://localhost:9093). @@ -100,16 +100,16 @@ Open the following URL: [http://localhost:9093](http://localhost:9093). 
#### Using port forward -Execute the following command to forward port `8080` locally to the Grafana dashboard pod on port `80`: +Obtain the password for the `admin` Grafana user by running the following command: ```bash -kubectl -n monitoring port-forward svc/prometheus-operator-grafana 8080:80 +kubectl -n monitoring get secret prometheus-operator-grafana -o jsonpath='{.data.admin-password}' | base64 -d && echo ``` -Obtain the password for the `admin` Grafana user by running the following command: +Execute the following command to forward port `8080` locally to the Grafana dashboard pod on port `80`: ```bash -kubectl -n monitoring get secret prometheus-operator-grafana -o jsonpath='{.data.admin-password}' | base64 -d && echo +kubectl -n monitoring port-forward svc/prometheus-operator-grafana 8080:80 ``` Open the following URL: [http://localhost:8080](http://localhost:8080). Enter the username `admin` and password obtained from the previous step. diff --git a/docs/troubleshooting/re-register-nodes.md b/docs/troubleshooting/re-register-nodes.md new file mode 100644 index 000000000..0adefae7e --- /dev/null +++ b/docs/troubleshooting/re-register-nodes.md @@ -0,0 +1,34 @@ +--- +title: Reregistering a worker node +weight: 10 +--- + + +## Re-register nodes + +### Scenario + +On Equinix Metal (Packet) if a cluster is updated from `v0.5.0` to `v0.6.0`, it also installs Packet's Cloud Controller Manager (CCM). The nodes added before `v0.6.0` were configured with BGP using Terraform. From `v0.6.0` onwards they are configured to use CCM for BGP setup. During the update, the older nodes won't see BGP resource removal, but it is possible that someone can disable the BGP from the Equinix Metal console. To make the nodes resilient against such human errors, follow the next steps. + +### Steps + +> **NOTE:** SSH into a given node in a separate console upfront, before starting to follow steps. + +This step ensures that you don't see any abrupt changes. Any workloads running on this node are evicted and scheduled to other nodes. The node is marked as unschedulable after running this command. + +```bash +export nodename="" +kubectl drain --ignore-daemonsets $nodename +``` + +Delete the node object: + +```bash +kubectl delete node $nodename +``` + +SSH into the node: + +```bash +sudo systemctl restart kubelet +``` diff --git a/scripts/changelog.sh b/scripts/changelog.sh index 9d54bda92..46b98a9ad 100755 --- a/scripts/changelog.sh +++ b/scripts/changelog.sh @@ -25,11 +25,12 @@ if [ -z "${RANGE}" ]; then fi if [ ! -z "${GITHUB_TOKEN}" ]; then - GITHUB_AUTH="--header \"authorization: Bearer ${GITHUB_TOKEN}\"" + GITHUB_AUTH="authorization: Bearer ${GITHUB_TOKEN}" + HEADERS=(--header "${GITHUB_AUTH}") fi for pr in $(git log --pretty=%s --first-parent "${RANGE}" | egrep -o '#\w+' | tr -d '#'); do - body=$(curl -s ${GITHUB_AUTH} https://api.github.com/repos/kinvolk/lokomotive/pulls/"${pr}" | \ + body=$(curl -s "${HEADERS[@]}" https://api.github.com/repos/kinvolk/lokomotive/pulls/"${pr}" | \ jq -r '{title: .title, body: .body}') echo "-" \ diff --git a/scripts/update/0.5-0.6/cluster.sh b/scripts/update/0.5-0.6/cluster.sh new file mode 100644 index 000000000..9e3ad6a40 --- /dev/null +++ b/scripts/update/0.5-0.6/cluster.sh @@ -0,0 +1,169 @@ +#!/bin/bash + +# This script: +# 1) Appends the label `lokomotive.alpha.kinvolk.io/bgp-enabled=true` in the env file for the nodes +# running MetalLB. +# 2) Update the image tag of Kubelet. +# 3) Update etcd if it is a controller node. 
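+#
+# Usage: cluster.sh <mode> <update_kubelet_etcd>
+#   mode                - one of "controller", "metallb" or "general", as passed by update.sh.
+#   update_kubelet_etcd - "true" to also update the kubelet and etcd versions, "false" otherwise.
+#
+# This script is normally executed on each node by the pods created by update.sh,
+# rather than being invoked directly.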
+ +set -euo pipefail + +mode="${1}" +update_kubelet_etcd="${2}" + +readonly kubelet_env="/etc/kubernetes/kubelet.env" +kubelet_needs_restart=false +packet_environment=false + +function run_on_host() { + nsenter -a -t 1 /bin/sh -c "${1}" +} + +function is_packet_environment() { + if grep -i packet /run/metadata/flatcar > /dev/null; then + packet_environment=true + fi +} + +is_packet_environment + +function update_kubelet_version() { + readonly kubelet_version="v1.19.4" + + if grep "${kubelet_version}" "${kubelet_env}" >/dev/null; then + echo "Kubelet env var file ${kubelet_env} already updated, version ${kubelet_version} exists." + return + fi + + echo -e "\nUpdating Kubelet env file...\nOld Kubelet env file:\n" + cat "${kubelet_env}" + + # Update the kubelet image version. + sed "s|^KUBELET_IMAGE_TAG.*|KUBELET_IMAGE_TAG=${kubelet_version}|g" "${kubelet_env}" >/tmp/kubelet.env + + # This copy is needed because `sed -i` tries to create a new file, this changes the file inode and + # docker does not allow it. We save changes using `sed` to a temporary file and then overwrite + # contents of actual file from temporary file. + cat /tmp/kubelet.env >"${kubelet_env}" + + echo -e "\nNew Kubelet env file:\n" + cat "${kubelet_env}" + + kubelet_needs_restart=true +} + +function update_kubelet_labels() { + # Update the label only on MetalLB nodes. + if [ "${mode}" != "metallb" ]; then + echo "Nothing to do. Not a MetalLB node." + return + fi + + readonly metallb_label="lokomotive.alpha.kinvolk.io/bgp-enabled=true" + + if grep "${metallb_label}" "${kubelet_env}" >/dev/null; then + echo "Kubelet env var file ${kubelet_env} already updated, label ${metallb_label} exists." + return + fi + + label=$(grep ^NODE_LABELS "${kubelet_env}") + label_prefix="${label::-1}" + augmented_label="${label_prefix},${metallb_label}\"" + + echo -e "\nUpdating Kubelet env file...\nOld Kubelet env file:\n" + cat "${kubelet_env}" + + # Update the kubelet image version. + sed "s|^NODE_LABELS.*|${augmented_label}|g" "${kubelet_env}" >/tmp/kubelet.env + + cat /tmp/kubelet.env >"${kubelet_env}" + + echo -e "\nNew Kubelet env file:\n" + cat "${kubelet_env}" + + kubelet_needs_restart=true +} + +function update_kubelet_service_file() { + if ! "${packet_environment}"; then + echo "Nothing to do. Not a Packet node." + return + fi + + readonly kubeletsvcfile="/etc/systemd/system/kubelet.service" + readonly newline='--cloud-provider=external' + + if grep "cloud-provider=external" "${kubeletsvcfile}" >/dev/null; then + echo "Kubelet service file ${kubeletsvcfile} is already updated." + return + fi + + echo -e "\nUpdating Kubelet service file...\nOld Kubelet service file:\n" + cat "${kubeletsvcfile}" + + sed '/client-ca-file.*/a \ \ --cloud-provider=external \\' "${kubeletsvcfile}" > /tmp/kubeletsvcfile + cat /tmp/kubeletsvcfile > "${kubeletsvcfile}" + + echo -e "\nNew Kubelet service file:\n" + cat "${kubeletsvcfile}" + + kubelet_needs_restart=true +} + +function restart_host_kubelet() { + if ! "${kubelet_needs_restart}"; then + return + fi + + echo -e "\nRestarting Kubelet...\n" + run_on_host "systemctl daemon-reload && systemctl restart kubelet && systemctl status --no-pager kubelet" +} + +function update_etcd() { + if [ "${mode}" != "controller" ]; then + echo "Nothing to do. Not a controller node." 
+ return + fi + + rkt_etcd_cfg="/etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf" + docker_etcd_cfg="/etc/kubernetes/etcd.env" + readonly etcd_version="v3.4.14" + + if [ -f "${rkt_etcd_cfg}" ]; then + cfg_file="${rkt_etcd_cfg}" + sed_cmd="sed 's|^Environment=\"ETCD_IMAGE_TAG.*|Environment=\"ETCD_IMAGE_TAG=${etcd_version}\"|g' ${cfg_file} > /tmp/etcd.env" + restart_etcd_command="systemctl is-active etcd-member && systemctl restart etcd-member && systemctl status --no-pager etcd-member" + + elif [ -f "${docker_etcd_cfg}" ]; then + cfg_file="${docker_etcd_cfg}" + sed_cmd="sed 's|^ETCD_IMAGE_TAG.*|ETCD_IMAGE_TAG=${etcd_version}|g' ${cfg_file} > /tmp/etcd.env" + restart_etcd_command="systemctl is-active etcd && systemctl restart etcd && systemctl status --no-pager etcd" + fi + + if grep "${etcd_version}" "${cfg_file}" >/dev/null; then + echo "etcd env var file ${cfg_file} is already updated." + return + fi + + echo -e "\nUpdating etcd file...\nOld etcd file:\n" + cat "${cfg_file}" + + eval "${sed_cmd}" + + cat /tmp/etcd.env >"${cfg_file}" + + echo -e "\nNew etcd file...\n" + cat "${cfg_file}" + + echo -e "\nRestarting etcd...\n" + run_on_host "${restart_etcd_command}" +} + +if "${update_kubelet_etcd}"; then + update_etcd + update_kubelet_version +fi + +update_kubelet_labels +update_kubelet_service_file +restart_host_kubelet diff --git a/scripts/update/0.5-0.6/update.sh b/scripts/update/0.5-0.6/update.sh new file mode 100755 index 000000000..862730d66 --- /dev/null +++ b/scripts/update/0.5-0.6/update.sh @@ -0,0 +1,101 @@ +#!/bin/bash + +set -euo pipefail + +update_kubelet_etcd=$1 + +readonly script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd -P) +readonly namespace="update-host-files" + +kubectl create ns "${namespace}" --dry-run=client -o yaml | kubectl apply -f - +kubectl create -n "${namespace}" cm script --from-file "${script_dir}"/cluster.sh --dry-run=client -o yaml | kubectl apply -f - + +function update_node_files() { + nodename=$1 + mode=$2 + + podname="uhf-$nodename-$RANDOM" + + echo " +apiVersion: v1 +kind: Pod +metadata: + labels: + run: ${podname} + name: ${podname} + namespace: ${namespace} +spec: + containers: + - image: registry.fedoraproject.org/fedora:32 + name: update-host-files + imagePullPolicy: IfNotPresent + securityContext: + privileged: true + args: + - sh + - -c + - bash /tmp/script/cluster.sh ${mode} ${update_kubelet_etcd} + volumeMounts: + - name: etc-kubernetes + mountPath: /etc/kubernetes/ + - name: script + mountPath: /tmp/script/ + - name: rkt-etcd + mountPath: /etc/systemd/system/etcd-member.service.d/ + - name: flatcar-metadata + mountPath: /run/metadata/flatcar + - name: kubelet-service + mountPath: /etc/systemd/system/kubelet.service + nodeName: ${nodename} + restartPolicy: Never + hostPID: true + volumes: + - name: etc-kubernetes + hostPath: + path: /etc/kubernetes/ + - name: script + configMap: + name: script + - name: rkt-etcd + hostPath: + path: /etc/systemd/system/etcd-member.service.d/ + - name: flatcar-metadata + hostPath: + path: /run/metadata/flatcar + - name: kubelet-service + hostPath: + path: /etc/systemd/system/kubelet.service +" | kubectl apply -f - + + echo -e "\n\nLogs: ${podname}\n\n" + + # Wait until pod exits. Show logs to the user. + while ! 
kubectl -n "${namespace}" logs -f "${podname}" 2>/dev/null; do + sleep 1 + done + + echo '-------------------------------------------------------------------------------------------' +} + +function update_metallb_nodes() { + for nodename in $(kubectl get nodes -l metallb.lokomotive.io/my-asn -ojsonpath='{.items[*].metadata.name}'); do + update_node_files "${nodename}" "metallb" + done +} + +function update_controller_nodes() { + for nodename in $(kubectl get nodes -l node.kubernetes.io/master -ojsonpath='{.items[*].metadata.name}'); do + update_node_files "${nodename}" "controller" + done +} + +function update_other_nodes() { + # Add an inequality to the following node selector to exclude a certain worker pool. + for nodename in $(kubectl get nodes -l node.kubernetes.io/master!="",metallb.lokomotive.io/my-asn!="65000" -ojsonpath='{.items[*].metadata.name}'); do + update_node_files "${nodename}" "general" + done +} + +update_metallb_nodes +update_controller_nodes +update_other_nodes
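+
+# Example invocation, as documented in the v0.6.0 "Cluster update steps" of the changelog:
+#
+#   UPDATE_BOOTSTRAP_COMPONENTS=false
+#   ./lokomotive-0.6.0/scripts/update/0.5-0.6/update.sh $UPDATE_BOOTSTRAP_COMPONENTS
+#
+# The same script is run again with UPDATE_BOOTSTRAP_COMPONENTS=true in the later
+# step that updates the kubelet and etcd on every node.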