[OSDOCS#20426]: Added files for zstream release notes 4.22.3#114333
bjahagir-OpenShift wants to merge 1 commit into
Conversation
|
🤖 Tue Jun 30 08:19:50 - Prow CI generated the docs preview: |
| [id="zstream-4-22-3-fixed-issues_{context}"] | ||
| == Fixed issues | ||
|
|
||
| * Before this update, during parallel deployments of Single Node OpenShift (SNO), specifically with hypervisor-based SNOs, some systems would hang in the middle of the deployment due to a race condition involving the BareMetalHost (BMH) custom resource. The SNO was never powered on by metal3 after virtual media was attached. As a consequence, parallel deployments at scale (10+ nodes) had approximately 80% success rate, with the remaining nodes requiring manual intervention to patch the `online` field to `true` for the impacted BMH custom resources. With this release, the race condition in the BMH power-on process has been resolved. As a result, parallel SNO deployments successfully power on all nodes without manual intervention, even at scale. (link:https://issues.redhat.com/browse/OCPBUGS-73622[OCPBUGS-73622]) |
🤖 [error] Vale.Avoid: Avoid using 'Single Node OpenShift'.
| [id="zstream-4-22-3-fixed-issues_{context}"] | ||
| == Fixed issues | ||
|
|
||
| * Before this update, during parallel deployments of Single Node OpenShift (SNO), specifically with hypervisor-based SNOs, some systems would hang in the middle of the deployment due to a race condition involving the BareMetalHost (BMH) custom resource. The SNO was never powered on by metal3 after virtual media was attached. As a consequence, parallel deployments at scale (10+ nodes) had approximately 80% success rate, with the remaining nodes requiring manual intervention to patch the `online` field to `true` for the impacted BMH custom resources. With this release, the race condition in the BMH power-on process has been resolved. As a result, parallel SNO deployments successfully power on all nodes without manual intervention, even at scale. (link:https://issues.redhat.com/browse/OCPBUGS-73622[OCPBUGS-73622]) |
🤖 [error] Vale.Avoid: Avoid using 'SNO'.
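For reviewers who want to see what the manual workaround in this note amounts to (setting `online` to `true` on each affected BMH resource), it could be scripted roughly as follows. This is a minimal sketch and not part of the PR: it assumes the field lives at `spec.online`, that the `oc` CLI is used to apply a merge patch, and the host and namespace names are hypothetical.

```python
import json
from typing import List


def build_online_patch_command(host_name: str, namespace: str) -> List[str]:
    """Build an `oc patch` argument list that sets spec.online=true on one BareMetalHost."""
    # Assumption: the `online` field mentioned in the note is `spec.online`.
    patch = json.dumps({"spec": {"online": True}})
    return [
        "oc", "patch", "baremetalhost", host_name,
        "-n", namespace,
        "--type", "merge",
        "-p", patch,
    ]


if __name__ == "__main__":
    # Hypothetical host and namespace names, for illustration only.
    for host in ["host-01", "host-02"]:
        print(" ".join(build_online_patch_command(host, "cluster-ns-01")))
```

The function only builds the argument list, so it can be inspected or logged before anything is run against a cluster.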
|
|
||
| * Before this update, during parallel deployments of Single Node OpenShift (SNO), specifically with hypervisor-based SNOs, some systems would hang in the middle of the deployment due to a race condition involving the BareMetalHost (BMH) custom resource. The SNO was never powered on by metal3 after virtual media was attached. As a consequence, parallel deployments at scale (10+ nodes) had approximately 80% success rate, with the remaining nodes requiring manual intervention to patch the `online` field to `true` for the impacted BMH custom resources. With this release, the race condition in the BMH power-on process has been resolved. As a result, parallel SNO deployments successfully power on all nodes without manual intervention, even at scale. (link:https://issues.redhat.com/browse/OCPBUGS-73622[OCPBUGS-73622]) | ||
|
|
||
| * Before this update, the `kube-apiserver-check-endpoints` container generated a TLS certificate for the `check-endpoint` service on port `17697` with a validity of only 1 second. As a consequence, the certificate expired almost immediately after generation, which differed from previous OpenShift Container Platform versions where the certificate was valid for 1 month. With this release, the `kube-apiserver-check-endpoints` container generates certificates with an appropriate validity period. As a result, the `check-endpoint` service certificate remains valid for the expected duration, consistent with previous releases. (link:https://issues.redhat.com/browse/OCPBUGS-84536[OCPBUGS-84536]) |
🤖 [error] OpenShiftAsciiDoc.SuggestAttribute: Use the AsciiDoc attribute '{product-title}' rather than the plain text product term 'OpenShift Container Platform', unless your use case is an exception.
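A regression like the 1-second certificate described in this note can be caught by comparing a certificate's validity window against a minimum expected lifetime. The sketch below is an illustration under stated assumptions, not the fix from the bug: it takes the `notBefore` and `notAfter` timestamps as plain `datetime` values (parsing the certificate itself is out of scope), and the 30-day threshold is an assumed stand-in for the "valid for 1 month" behavior of earlier releases.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: the note says earlier releases issued certificates valid for 1 month.
MIN_EXPECTED_VALIDITY = timedelta(days=30)


def validity_period(not_before: datetime, not_after: datetime) -> timedelta:
    """Return the length of the certificate's validity window."""
    return not_after - not_before


def is_validity_too_short(
    not_before: datetime,
    not_after: datetime,
    minimum: timedelta = MIN_EXPECTED_VALIDITY,
) -> bool:
    """Return True when the validity window is shorter than the expected minimum."""
    return validity_period(not_before, not_after) < minimum


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
    # A 1-second certificate, as in the bug, is flagged as too short.
    print(is_validity_too_short(issued, issued + timedelta(seconds=1)))
```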
|
|
||
| * Before this update, the `kube-apiserver-check-endpoints` container generated a TLS certificate for the `check-endpoint` service on port `17697` with a validity of only 1 second. As a consequence, the certificate expired almost immediately after generation, which differed from previous OpenShift Container Platform versions where the certificate was valid for 1 month. With this release, the `kube-apiserver-check-endpoints` container generates certificates with an appropriate validity period. As a result, the `check-endpoint` service certificate remains valid for the expected duration, consistent with previous releases. (link:https://issues.redhat.com/browse/OCPBUGS-84536[OCPBUGS-84536]) | ||
|
|
||
| * Before this update, the oslat latency test hardcoded the runner pod memory to 1 GB regardless of the `LATENCY_TEST_CPUS` value. As a consequence, when running the CNF latency test with high CPU counts such as `LATENCY_TEST_CPUS=126`, the oslat pod was OOMKilled because the fixed 1 GB memory limit was insufficient, blocking hardware platform evaluation. With this release, the oslat test runner pod memory is configurable or appropriately scaled based on the `LATENCY_TEST_CPUS` setting. As a result, the documented CNF latency test flow completes successfully with higher CPU counts without OOMKilled failures. (link:https://issues.redhat.com/browse/OCPBUGS-86071[OCPBUGS-86071]) |
🤖 [error] RedHat.TermsErrors: Use 'hard-coded' rather than 'hardcoded'. For more information, see RedHat.TermsErrors.
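The note does not say how the runner pod memory is derived from `LATENCY_TEST_CPUS`. As a purely hypothetical illustration of "scaled based on the CPU count", the sketch below keeps roughly the old 1 GB value as a floor and adds an assumed per-CPU allowance. The constant `PER_CPU_MIB` and the function name are inventions for this example, not values taken from the fix.

```python
BASE_MEMORY_MIB = 1024  # roughly the previous fixed 1 GB limit, used here as a floor
PER_CPU_MIB = 16        # hypothetical per-CPU allowance, not from the actual fix


def oslat_memory_limit_mib(latency_test_cpus: int) -> int:
    """Return a memory limit in MiB that grows with the number of test CPUs."""
    if latency_test_cpus < 1:
        raise ValueError("latency_test_cpus must be at least 1")
    # Small CPU counts keep the old limit; large counts scale linearly.
    return max(BASE_MEMORY_MIB, latency_test_cpus * PER_CPU_MIB)


if __name__ == "__main__":
    for cpus in (4, 126):
        print(cpus, oslat_memory_limit_mib(cpus))
```

With these assumed constants, a run with `LATENCY_TEST_CPUS=126` would request more than the old fixed limit, while small runs are unchanged.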
|
|
||
| * Before this update, the oslat latency test hardcoded the runner pod memory to 1 GB regardless of the `LATENCY_TEST_CPUS` value. As a consequence, when running the CNF latency test with high CPU counts such as `LATENCY_TEST_CPUS=126`, the oslat pod was OOMKilled because the fixed 1 GB memory limit was insufficient, blocking hardware platform evaluation. With this release, the oslat test runner pod memory is configurable or appropriately scaled based on the `LATENCY_TEST_CPUS` setting. As a result, the documented CNF latency test flow completes successfully with higher CPU counts without OOMKilled failures. (link:https://issues.redhat.com/browse/OCPBUGS-86071[OCPBUGS-86071]) | ||
|
|
||
| * Before this update, when the `etcd-endpoints` configmap contained only the IP addresses of failed or unreachable etcd members, the etcd-operator entered a permanent deadlock. The EtcdEndpointsController, which updates the configmap, required a working etcd connection to list members, but the etcd client pool read endpoints exclusively from the stale configmap. This circular dependency prevented all operator controllers from functioning. As a consequence, the operator retried indefinitely against dead endpoints, logging `context deadline exceeded` errors continuously, and required manual intervention to patch the configmap with healthy member IPs. With this release, the operator detects when all configmap-derived endpoints are unreachable and falls back to node-based endpoint discovery to re-establish connectivity with healthy etcd members. As a result, the EtcdEndpointsController automatically updates the configmap with correct IPs and recovery proceeds without manual intervention. (link:https://issues.redhat.com/browse/OCPBUGS-88490[OCPBUGS-88490]) |
🤖 [error] Vale.Terms: Use 'Operators?' instead of 'operator'.
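The fallback described in the etcd note can be pictured as a two-stage endpoint selection: prefer the endpoints from the configmap, and consult node-derived endpoints only when every configmap endpoint is unreachable. The following is a minimal sketch of that idea, not the operator's implementation. The function name, the `is_reachable` probe, and the example addresses are all hypothetical.

```python
from typing import Callable, Iterable, List


def select_etcd_endpoints(
    configmap_endpoints: Iterable[str],
    node_endpoints: Iterable[str],
    is_reachable: Callable[[str], bool],
) -> List[str]:
    """Return reachable configmap endpoints, or fall back to reachable node endpoints."""
    healthy = [ep for ep in configmap_endpoints if is_reachable(ep)]
    if healthy:
        return healthy
    # Every configmap-derived endpoint is unreachable: break the circular
    # dependency by discovering endpoints from the nodes instead.
    return [ep for ep in node_endpoints if is_reachable(ep)]


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range 192.0.2.0/24.
    reachable = {"192.0.2.20", "192.0.2.21"}
    stale_configmap = ["192.0.2.10", "192.0.2.11"]
    nodes = ["192.0.2.20", "192.0.2.21", "192.0.2.10"]
    print(select_etcd_endpoints(stale_configmap, nodes, reachable.__contains__))
```

Passing the reachability probe in as a parameter keeps the selection logic separate from any real network check, which also makes the stale-configmap case easy to exercise.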
| [id="zstream-4-22-3-updating_{context}"] | ||
| == Updating | ||
|
|
||
| To update an {product-title} 4.22 cluster to this latest release, see xref:../updating/updating_a_cluster/updating-cluster-cli.adoc#updating-cluster-cli[Updating a cluster using the CLI]. |
🤖 [error] OpenShiftAsciiDoc.NoXrefInModules: Do not include xrefs in modules, only assemblies (exception: release notes modules).
|
@bjahagir-OpenShift: all tests passed! |
Version(s):
4.22
Issue:
https://redhat.atlassian.net/browse/OSDOCS-20426
Link to docs preview:
https://114333--ocpdocs-pr.netlify.app/openshift-enterprise/latest/release_notes/ocp-4-22-release-notes.html#zstream-4-22-3_release-notes
QE review:
N/A