Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce experimental live mode [squashfs] #1248

Merged
merged 3 commits into from
Aug 16, 2024
Merged

Conversation

MichalHe
Copy link
Member

@MichalHe MichalHe commented Jun 2, 2024

This work is based on @bessonc's (cbesson) awesome livemode actors.

Description

This patch introduces a feature named live mode allowing to boot into a squashfs of the target userspace, reaching the target multi-user. Therefore, it is possible to interact with the upgrade environment while the upgrade is running in the background, or, debug crash of the service while having a fully booted system (although the system is relatively minimal). Reaching multi-user means that the environment could have network available, and the patch will set up network configurations for target userspace, however, it is a best-effort solution at the moment. The patch somehow simplifies development of such features, as one can boot into the squashfs, log in, check whether everything is as expected and if not the developer can manually play with the system recording his/hers modifications. Later, the developer just needs to reproduce these modifications in the code.

If the feature is not used/enabled, then leapp's behavior remains unchanged, although this patch refactors some of leapp's core actors (e.g., add_upgrade_boot_entry).

How to run:

Note that one needs to have squashfs-tools installed. Moreover, the feature is currently limited to x86_64 only and attempting to run it on a different architecture will be prevented by leapp.

LEAPP_UNSUPPORTED=1 LEAPP_DEVEL_ENABLE_LIVE_MODE=1 leapp upgrade --debug \
    --whitelist-experimental='live_image_generator' \
    --whitelist-experimental='live_mode_config_scanner' \
    --whitelist-experimental='live_mode_reporter' \
    --whitelist-experimental='prepare_live_image' \
    --whitelist-experimental='emit_livemode_requirements' \
    --whitelist-experimental='remove_live_image'

Tests coverage:

  • add_upgrade_boot_entry
  • live_image_generator
  • live_mode_config
  • live_mode_report
  • prepare_live_image
  • emit_livemode_userspace_requirements

Jira ref: RHEL-45280

@abadger
Copy link
Member

abadger commented Aug 13, 2024

I reviewed the whole PR yesterday and it looks great. The approach looks good and I didn't find anything that looks like it will introduce a bug. Everything I noted is a small thing and can be deferred until a later PR. I did find that my spotting of little things that could be changed went down in the second half of the PR, though, so it's possible I've missed things. I've asked @MichalHe to point out the places that I should take a harder look at (because those places will run when we are not in livemode) and I will take a second look at those.

@pirat89 pirat89 added the enhancement New feature or request label Aug 14, 2024
Comment on lines +326 to +340
[ -f "$NEWROOT$LEAPP_FAILED_FLAG_FILE" ] && {
echo >&2 "Found file $NEWROOT$LEAPP_FAILED_FLAG_FILE"
echo >&2 "Error: Leapp previously failed and cannot continue, returning back to emergency shell"
echo >&2 "Please file a support case with $NEWROOT/var/log/leapp/leapp-upgrade.log attached"
echo >&2 "To rerun the upgrade upon exiting the dracut shell remove the $NEWROOT$LEAPP_FAILED_FLAG_FILE file"
exit 1
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the shell script does not reflect some recent upstream changes requested by support. e.g. this should not block the upgrade (even when it will most likely lead to crash later). As this is experimental only, it's not a problem, but we should reflect the changes at least in a follow up PR

@pirat89
Copy link
Member

pirat89 commented Aug 15, 2024

/packit copr-build

The experimental "live mode" feature that allows booting into
a squashfs image of the target userspace and running leapp
via a service builds on several models added by this commit.
The most important role has the LiveModeConfig model, storing
user-defined configuration of the feature.

Jira ref: RHEL-45280
@MichalHe MichalHe marked this pull request as ready for review August 15, 2024 14:29
@Rezney
Copy link
Member

Rezney commented Aug 15, 2024

/packit copr-build

@Rezney
Copy link
Member

Rezney commented Aug 16, 2024

Great job! Once the last failing "unittest" is fixed we are ready to go.

@MichalHe MichalHe force-pushed the livemode branch 5 times, most recently from f6de260 to aaf7528 Compare August 16, 2024 10:26
@MichalHe MichalHe force-pushed the livemode branch 5 times, most recently from e5e045c to 86742d1 Compare August 16, 2024 11:12
Michal Hecko added 2 commits August 16, 2024 13:21
Modify core actors to support upgrades with "live mode". Whereas live
mode implies a new, separate, code path for generating the live image
initramfs, the changes introduced in add_upgrade_boot_entry actor
interfere deeply with the old implementation. Kernel cmdline arguments
for the created boot entry are now manipulated uniformly, avoiding ad-
hoc string formatting. It is also possible to remove kernel cmdline
args from the entry. Addition of arguments precedes removal, i.e., if
arg=value should be added and also removed, it will be removed. The
root cmdline parameter is modified separately, due to a bug in grubby.

Jira ref: RHEL-45280
Add actors that scan the new configuration file devel-livemode.ini,
informing the rest of the actor collective about the configuration.
Based on this configuration, additional packages are requested to be
installed into the target userspace. The target userspace is also
modified to contain services that execute leapp based on kernel cmdline.
For a full list of modifications, see models/livemode.py added in
a previous commit.

The feature can be enabled by setting LEAPP_UNSUPPORTED=1 together
with LEAPP_DEVEL_ENABLE_LIVE_MODE=1. Note, that the squashfs-tools
package must be installed (otherwise an error will be raised). The
live mode feature is currently tested only for x86_64, and, therefore,
attempting to use this feature on a different architecture will be
prohibited by the implementation.

Jira ref: RHEL-45280
@pirat89
Copy link
Member

pirat89 commented Aug 16, 2024

I am trying to get the tests fixed. There are 2 problems

  • broken building for rhel 7 in copr due to rhel 7 EOL kind of (so we ignore that one)
  • broken test for leapp data files which hasn't expected that the leapp-upgrade-elXtoelY rpm could contain additional configuration files

I've already created a MR for the problematic test. let's see if we will be able to proceed in few hours.

@pirat89
Copy link
Member

pirat89 commented Aug 16, 2024

/packit copr-build

Comment on lines +11 to +24
summary = (
'The Live Upgrade Mode requires at least 2 GB of additional space '
'in the partition that hosts /var/lib/leapp in order to create '
'the squashfs image. During the "reboot phase", the image will '
'need more space into memory, in particular for booting over the '
'network. The recommended memory for this mode is at least 4 GB.'
)
reporting.create_report([
reporting.Title('Live Upgrade Mode enabled'),
reporting.Summary(summary),
reporting.Severity(reporting.Severity.HIGH),
reporting.Groups([reporting.Groups.BOOT]),
reporting.RelatedResource('file', '/etc/leapp/files/devel-livemode.ini')
])
Copy link
Member

@pirat89 pirat89 Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future, we should check the available memory and raise this output just in case we see that there is not enough memory avilable. Note that information about total memory is covered by MemoryInfo msg, but I do not remember now whether it's physical only or whether it includes swap as well - note that in some cases we should not count with SWAP.

Copy link
Member

@pirat89 pirat89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passed! Reviewed most of the code (I haven't reviewed deeply some experimental actors). It seems the solution works as expected. Additional changes are expected to be delivered in another PRs in future.

Great job guys!!! 🚀

@pirat89 pirat89 merged commit d253c21 into oamg:master Aug 16, 2024
17 of 20 checks passed
@pirat89 pirat89 added the changelog-checked The merger/reviewer checked the changelog draft document and updated it when relevant label Aug 16, 2024
@pirat89 pirat89 changed the title Use a squashfs of a target RHEL userspace instead of containers Introduce experimental live mode [squashfs] Aug 16, 2024
pirat89 added a commit to pirat89/leapp-repository that referenced this pull request Aug 16, 2024
## Packaging
- .. names of packages, dependencies, changes in provided capabilities....

## Upgrade handling
### Fixes
- Add missing RHUI GCP config info for RHEL for SAP (oamg#1253)
- Fix creation of the post upgrade report about changes in states of systemd services (oamg#1210)
- Fix detection of valid sshd config with internal-sftp subsystem in Leapp (oamg#1212)
- Fix evaluation of PES data (oamg#1194)
- Fix failing "update-ca-trust" command caused by missing util-linux package (oamg#1169)
- Fix handling of versions in RHUI configuration for ELS and SAP upgrades (oamg#1240)
- Fix the parsing of the lscpu output (oamg#1184, oamg#1208)
- Fix the upgrade of systems using RHUI on AWS after changes in RHUI client package (oamg#1178)
- Fix upgrade on aarch64 via RHUI on AWS (oamg#1240)
- Handle a false positive GPG check error when TargetUserSpaceInfo is missing  (oamg#1269)
- Target by default always "GA" channel repositories unless a different channel is specified for the leapp execution (oamg#1205)
- Update the default kernel cmdline (oamg#1193, oamg#1216)
- Update the device driver deprecation data, fixing invalid fields for some AMD CPUs (oamg#1211)
- Wait for the storage initialization when /usr is on separate file system - covering SAN (oamg#1218, oamg#1219)
- [IPU 7 -> 8] Drop enforced tomcat removal for satellite when upgrading to RHEL 8.10 (oamg#1243)
- [IPU 7 -> 8] Fix detection of bootable device on RAID (oamg#1260)
- [IPU 8 -> 9] Inhibit the upgrade to RHEL 9.5 on ARM architecture due to incompatibility of the RHEL 8 bootloader and RHEL 9.5 kernel (oamg#1270)

### Enhancements
- [IPU 8 -> 9] Introduce upgrade path 8.10 -> 9.5 (oamg#1245, oamg#1246)
- Apply solutions for leftover rpms for all major upgrade paths - including experimental actors (oamg#1199)
- Do not terminate the upgrade dracut module execution anymore if /sysroot/root/tmp_leapp_py3/.leapp_upgrade_failed exists (oamg#1197)
- Improve set_systemd_services_states logging (oamg#1213)
- Include leapp command execution and defined leapp envars inside leapp.db - (oamg#1152)
- Introduce experimental upgrades in 'live' mode for the testing (oamg#1248)
- Load obsoleted GPG keys from gpg-signatures.json file instead of hardcoding them (oamg#1241)
- Several minor improvements in messages printed in console output (oamg#1173, oamg#1214, oamg#1274)
- Several minor improvements in report and error messages (oamg#1207, oamg#1217, oamg#1234, oamg#1235, oamg#1242)
- Sort lists in dnf-plugin-data for easier overview (oamg#1231)
- [IPU 7 -> 8] Allow upgrade of content from ELS repositories (oamg#1198)
- [IPU 7 -> 8] Inhibit the upgrade when Legacy GRUB is detected (oamg#1206)
- [IPU 7 -> 8] Inhibit the upgrade when embedding area is small to prevent failed bootloader update (oamg#1195)
- [IPU 8 -> 9] Enable EL 8 > 9 upgrades on Alibaba cloud (oamg#1249)
- [IPU 8 -> 9] Enable EL 8 to 9 upgrade of Satellite/Foreman server (oamg#1181)
- [IPU 9 -> 10] Introduced number of changes to enable experimental IPU 9 -> 10 (oamg#1169)
- [IPU 9 -> 10] Prevent upgrading if NetworkManager is configured with dhcp=dhclient (oamg#1268)
- [IPU 9 -> 10] Update URLs in reports to reflect the next planned major upgrade path (oamg#1169, oamg#1273)

## Additional changes interesting for devels
- drop unused `packager` field from gpg-signatures.json (oamg#1233)
- [IPU 9 -> 10] make system_upgrade/common leapp repo Python 3.12 compatible
- [IPU 9 -> 10] introduced system_upgrade/el9toel10 leapp repo
@pirat89 pirat89 mentioned this pull request Aug 16, 2024
pirat89 added a commit that referenced this pull request Aug 16, 2024
## Packaging
- Start building for EL 9 in the upstream repository on COPR (#1169)

## Upgrade handling
### Fixes
- Add missing RHUI GCP config info for RHEL for SAP (#1253)
- Fix creation of the post upgrade report about changes in states of systemd services (#1210)
- Fix detection of valid sshd config with internal-sftp subsystem in Leapp (#1212)
- Fix evaluation of PES data (#1194)
- Fix failing "update-ca-trust" command caused by missing util-linux package (#1169)
- Fix handling of versions in RHUI configuration for ELS and SAP upgrades (#1240)
- Fix the parsing of the lscpu output (#1184, #1208)
- Fix the upgrade of systems using RHUI on AWS after changes in RHUI client package (#1178)
- Fix upgrade on aarch64 via RHUI on AWS (#1240)
- Handle a false positive GPG check error when TargetUserSpaceInfo is missing  (#1269)
- Target by default always "GA" channel repositories unless a different channel is specified for the leapp execution (#1205)
- Update the default kernel cmdline (#1193, #1216)
- Update the device driver deprecation data, fixing invalid fields for some AMD CPUs (#1211)
- Wait for the storage initialization when /usr is on separate file system - covering SAN (#1218, #1219)
- [IPU 7 -> 8] Drop enforced tomcat removal for satellite when upgrading to RHEL 8.10 (#1243)
- [IPU 7 -> 8] Fix detection of bootable device on RAID (#1260)
- [IPU 8 -> 9] Inhibit the upgrade to RHEL 9.5 on ARM architecture due to incompatibility of the RHEL 8 bootloader and RHEL 9.5 kernel (#1270)

### Enhancements
- [IPU 8 -> 9] Introduce upgrade path 8.10 -> 9.5 (#1245, #1246)
- Update leapp data files (#1280)
- Apply solutions for leftover rpms for all major upgrade paths - including experimental actors (#1199)
- Do not terminate the upgrade dracut module execution anymore if /sysroot/root/tmp_leapp_py3/.leapp_upgrade_failed exists (#1197)
- Improve set_systemd_services_states logging (#1213)
- Include leapp command execution and defined leapp envars inside leapp.db - (#1152)
- Introduce experimental upgrades in 'live' mode for the testing (#1248)
- Load obsoleted GPG keys from gpg-signatures.json file instead of hardcoding them (#1241)
- Several minor improvements in messages printed in console output (#1173, #1214, #1274)
- Several minor improvements in report and error messages (#1207, #1217, #1234, #1235, #1242)
- Sort lists in dnf-plugin-data for easier overview (#1231)
- [IPU 7 -> 8] Allow upgrade of content from ELS repositories (#1198)
- [IPU 7 -> 8] Inhibit the upgrade when Legacy GRUB is detected (#1206)
- [IPU 7 -> 8] Inhibit the upgrade when embedding area is small to prevent failed bootloader update (#1195)
- [IPU 8 -> 9] Enable EL 8 > 9 upgrades on Alibaba cloud (#1249)
- [IPU 8 -> 9] Enable EL 8 to 9 upgrade of Satellite/Foreman server (#1181)
- [IPU 9 -> 10] Introduced number of changes to enable IPU 9 -> 10 for testing (#1169)
- [IPU 9 -> 10] Prevent upgrading if NetworkManager is configured with dhcp=dhclient (#1268)
- [IPU 9 -> 10] Update URLs in reports to reflect the next planned major upgrade path (#1169, #1273)

## Additional changes interesting for devels
- drop unused `packager` field from gpg-signatures.json (#1233)
- [IPU 9 -> 10] make system_upgrade/common leapp repo Python 3.12 compatible
- [IPU 9 -> 10] introduced system_upgrade/el9toel10 leapp repo
yuravk pushed a commit to yuravk/leapp-repository that referenced this pull request Aug 20, 2024
## Packaging
- Start building for EL 9 in the upstream repository on COPR (oamg#1169)

## Upgrade handling
### Fixes
- Add missing RHUI GCP config info for RHEL for SAP (oamg#1253)
- Fix creation of the post upgrade report about changes in states of systemd services (oamg#1210)
- Fix detection of valid sshd config with internal-sftp subsystem in Leapp (oamg#1212)
- Fix evaluation of PES data (oamg#1194)
- Fix failing "update-ca-trust" command caused by missing util-linux package (oamg#1169)
- Fix handling of versions in RHUI configuration for ELS and SAP upgrades (oamg#1240)
- Fix the parsing of the lscpu output (oamg#1184, oamg#1208)
- Fix the upgrade of systems using RHUI on AWS after changes in RHUI client package (oamg#1178)
- Fix upgrade on aarch64 via RHUI on AWS (oamg#1240)
- Handle a false positive GPG check error when TargetUserSpaceInfo is missing  (oamg#1269)
- Target by default always "GA" channel repositories unless a different channel is specified for the leapp execution (oamg#1205)
- Update the default kernel cmdline (oamg#1193, oamg#1216)
- Update the device driver deprecation data, fixing invalid fields for some AMD CPUs (oamg#1211)
- Wait for the storage initialization when /usr is on separate file system - covering SAN (oamg#1218, oamg#1219)
- [IPU 7 -> 8] Drop enforced tomcat removal for satellite when upgrading to RHEL 8.10 (oamg#1243)
- [IPU 7 -> 8] Fix detection of bootable device on RAID (oamg#1260)
- [IPU 8 -> 9] Inhibit the upgrade to RHEL 9.5 on ARM architecture due to incompatibility of the RHEL 8 bootloader and RHEL 9.5 kernel (oamg#1270)

### Enhancements
- [IPU 8 -> 9] Introduce upgrade path 8.10 -> 9.5 (oamg#1245, oamg#1246)
- Update leapp data files (oamg#1280)
- Apply solutions for leftover rpms for all major upgrade paths - including experimental actors (oamg#1199)
- Do not terminate the upgrade dracut module execution anymore if /sysroot/root/tmp_leapp_py3/.leapp_upgrade_failed exists (oamg#1197)
- Improve set_systemd_services_states logging (oamg#1213)
- Include leapp command execution and defined leapp envars inside leapp.db - (oamg#1152)
- Introduce experimental upgrades in 'live' mode for the testing (oamg#1248)
- Load obsoleted GPG keys from gpg-signatures.json file instead of hardcoding them (oamg#1241)
- Several minor improvements in messages printed in console output (oamg#1173, oamg#1214, oamg#1274)
- Several minor improvements in report and error messages (oamg#1207, oamg#1217, oamg#1234, oamg#1235, oamg#1242)
- Sort lists in dnf-plugin-data for easier overview (oamg#1231)
- [IPU 7 -> 8] Allow upgrade of content from ELS repositories (oamg#1198)
- [IPU 7 -> 8] Inhibit the upgrade when Legacy GRUB is detected (oamg#1206)
- [IPU 7 -> 8] Inhibit the upgrade when embedding area is small to prevent failed bootloader update (oamg#1195)
- [IPU 8 -> 9] Enable EL 8 > 9 upgrades on Alibaba cloud (oamg#1249)
- [IPU 8 -> 9] Enable EL 8 to 9 upgrade of Satellite/Foreman server (oamg#1181)
- [IPU 9 -> 10] Introduced number of changes to enable IPU 9 -> 10 for testing (oamg#1169)
- [IPU 9 -> 10] Prevent upgrading if NetworkManager is configured with dhcp=dhclient (oamg#1268)
- [IPU 9 -> 10] Update URLs in reports to reflect the next planned major upgrade path (oamg#1169, oamg#1273)

## Additional changes interesting for devels
- drop unused `packager` field from gpg-signatures.json (oamg#1233)
- [IPU 9 -> 10] make system_upgrade/common leapp repo Python 3.12 compatible
- [IPU 9 -> 10] introduced system_upgrade/el9toel10 leapp repo

(cherry picked from commit 03c257b)
yuravk pushed a commit to yuravk/leapp-repository that referenced this pull request Aug 20, 2024
## Packaging
- Start building for EL 9 in the upstream repository on COPR (oamg#1169)

## Upgrade handling
### Fixes
- Add missing RHUI GCP config info for RHEL for SAP (oamg#1253)
- Fix creation of the post upgrade report about changes in states of systemd services (oamg#1210)
- Fix detection of valid sshd config with internal-sftp subsystem in Leapp (oamg#1212)
- Fix evaluation of PES data (oamg#1194)
- Fix failing "update-ca-trust" command caused by missing util-linux package (oamg#1169)
- Fix handling of versions in RHUI configuration for ELS and SAP upgrades (oamg#1240)
- Fix the parsing of the lscpu output (oamg#1184, oamg#1208)
- Fix the upgrade of systems using RHUI on AWS after changes in RHUI client package (oamg#1178)
- Fix upgrade on aarch64 via RHUI on AWS (oamg#1240)
- Handle a false positive GPG check error when TargetUserSpaceInfo is missing  (oamg#1269)
- Target by default always "GA" channel repositories unless a different channel is specified for the leapp execution (oamg#1205)
- Update the default kernel cmdline (oamg#1193, oamg#1216)
- Update the device driver deprecation data, fixing invalid fields for some AMD CPUs (oamg#1211)
- Wait for the storage initialization when /usr is on separate file system - covering SAN (oamg#1218, oamg#1219)
- [IPU 7 -> 8] Drop enforced tomcat removal for satellite when upgrading to RHEL 8.10 (oamg#1243)
- [IPU 7 -> 8] Fix detection of bootable device on RAID (oamg#1260)
- [IPU 8 -> 9] Inhibit the upgrade to RHEL 9.5 on ARM architecture due to incompatibility of the RHEL 8 bootloader and RHEL 9.5 kernel (oamg#1270)

### Enhancements
- [IPU 8 -> 9] Introduce upgrade path 8.10 -> 9.5 (oamg#1245, oamg#1246)
- Update leapp data files (oamg#1280)
- Apply solutions for leftover rpms for all major upgrade paths - including experimental actors (oamg#1199)
- Do not terminate the upgrade dracut module execution anymore if /sysroot/root/tmp_leapp_py3/.leapp_upgrade_failed exists (oamg#1197)
- Improve set_systemd_services_states logging (oamg#1213)
- Include leapp command execution and defined leapp envars inside leapp.db - (oamg#1152)
- Introduce experimental upgrades in 'live' mode for the testing (oamg#1248)
- Load obsoleted GPG keys from gpg-signatures.json file instead of hardcoding them (oamg#1241)
- Several minor improvements in messages printed in console output (oamg#1173, oamg#1214, oamg#1274)
- Several minor improvements in report and error messages (oamg#1207, oamg#1217, oamg#1234, oamg#1235, oamg#1242)
- Sort lists in dnf-plugin-data for easier overview (oamg#1231)
- [IPU 7 -> 8] Allow upgrade of content from ELS repositories (oamg#1198)
- [IPU 7 -> 8] Inhibit the upgrade when Legacy GRUB is detected (oamg#1206)
- [IPU 7 -> 8] Inhibit the upgrade when embedding area is small to prevent failed bootloader update (oamg#1195)
- [IPU 8 -> 9] Enable EL 8 > 9 upgrades on Alibaba cloud (oamg#1249)
- [IPU 8 -> 9] Enable EL 8 to 9 upgrade of Satellite/Foreman server (oamg#1181)
- [IPU 9 -> 10] Introduced number of changes to enable IPU 9 -> 10 for testing (oamg#1169)
- [IPU 9 -> 10] Prevent upgrading if NetworkManager is configured with dhcp=dhclient (oamg#1268)
- [IPU 9 -> 10] Update URLs in reports to reflect the next planned major upgrade path (oamg#1169, oamg#1273)

## Additional changes interesting for devels
- drop unused `packager` field from gpg-signatures.json (oamg#1233)
- [IPU 9 -> 10] make system_upgrade/common leapp repo Python 3.12 compatible
- [IPU 9 -> 10] introduced system_upgrade/el9toel10 leapp repo

(cherry picked from commit 03c257b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-checked The merger/reviewer checked the changelog draft document and updated it when relevant enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants