sonataflow platform changes for osl 1.37 #292

Closed
elai-shalev wants to merge 1 commit into redhat-developer:main from elai-shalev:osl-1.37-fixes

@elai-shalev elai-shalev commented Jan 7, 2026

Description of the change

This PR introduces the changes needed in Orchestrator's SonataFlow Platform following the switch to OpenShift Serverless Logic (OSL) v1.37.

For more information, see this Slack thread.

Which issue(s) does this PR fix or relate to

- JIRA_issue_link

How to test changes / Special notes to the reviewer

Checklist

  • For each Chart updated, version bumped in the corresponding Chart.yaml according to Semantic Versioning.
  • For each Chart updated, variables are documented in the values.yaml and added to the corresponding README.md. The pre-commit utility can be used to generate the necessary content. Use pre-commit run -a to apply changes. The pre-commit Workflow will do this automatically for you if needed.
  • JSON Schema template updated and re-generated the raw schema via the pre-commit hook.
  • Tests pass using the Chart Testing tool and the ct lint command.
  • If you updated the orchestrator-infra chart, make sure the versions of the Knative CRDs are aligned with the versions of the CRDs installed by the OpenShift Serverless operators declared in the values.yaml file. See Installing Knative Eventing and Knative Serving CRDs for more details.

sonarqubecloud Bot commented Jan 7, 2026

@rhdh-qodo-merge

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

RHIDP-11265 - Partially compliant

Compliant requirements:

  • Update Orchestrator installation to reflect Sonataflow Platform changes introduced by OSL v1.37.

Non-compliant requirements:

  • Documentation for upgrade scenarios should include the changes.
  • Upstream documentation updates (design docs, release notes, etc.).

Requires further human verification:

  • QE automation changes required to test these changes.
  • Any documentation updates outside this repo/PR that may be done in parallel.
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🔒 Security concerns

TLS verification weakening:
QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL=true together with QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE=allow may disable/relax certificate validation and allow non-TLS connections depending on server/client negotiation. This can expose DB credentials and traffic to interception if used outside tightly controlled environments. Consider making these settings optional/configurable and defaulting to secure verification.
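
For comparison, a stricter set of defaults could look roughly like the sketch below. It is an illustration only, not something this PR contains. The CA file path is hypothetical, and the trust-certificate and hostname-verification variables are assumed to map to the Quarkus reactive datasource properties `quarkus.datasource.reactive.trust-certificate-pem`, `quarkus.datasource.reactive.trust-certificate-pem.certs` and `quarkus.datasource.reactive.hostname-verification-algorithm`; they should be checked against the Quarkus version shipped in the job service image.

```yaml
# Sketch only: job service env vars that keep certificate verification on.
env:
  - name: QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE
    value: verify-full
  - name: QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL
    value: 'false'
  # Assumption: verify-full also needs an explicit hostname verification algorithm.
  - name: QUARKUS_DATASOURCE_REACTIVE_HOSTNAME_VERIFICATION_ALGORITHM
    value: HTTPS
  # Assumption: the database CA certificate is mounted into the container
  # at this hypothetical path (the volume and mount are not shown here).
  - name: QUARKUS_DATASOURCE_REACTIVE_TRUST_CERTIFICATE_PEM
    value: 'true'
  - name: QUARKUS_DATASOURCE_REACTIVE_TRUST_CERTIFICATE_PEM_CERTS
    value: /etc/db-ca/ca.crt
```

With settings along these lines the client would refuse a server whose certificate does not chain to the mounted CA, which is the behavior that `trust-all` switches off.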

⚡ No major issues detected
📚 Focus areas based on broader codebase context

Hardcoded Config

The template introduces hardcoded DB-related behavior (dbMigrationStrategy: service) and hardcoded TLS settings (QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL=true, ..._SSL_MODE=allow) when jobServiceImage is set. This should be validated for security and portability, and ideally made configurable via values.yaml (or omitted by default) to match existing chart configurability patterns. (Ref 2, Ref 6)

        dbMigrationStrategy: service
        postgresql:
{{- if .Values.upstream.postgresql.enabled }}
          secretRef:
            name: {{ .Release.Name }}-postgresql-svcbind-postgres
            userKey: username
            passwordKey: password
          serviceRef:
            name: {{ .Release.Name }}-postgresql{{- if eq .Values.upstream.postgresql.architecture "replication" }}-primary{{- end }}
            namespace: {{ .Release.Namespace }}
            databaseName: sonataflow
{{- else }}
          secretRef:
            name: {{ .Values.orchestrator.sonataflowPlatform.externalDBsecretRef }}
            userKey: POSTGRES_USER
            passwordKey: POSTGRES_PASSWORD
          jdbcUrl: jdbc:postgresql://{{ .Values.orchestrator.sonataflowPlatform.externalDBHost }}:{{ .Values.orchestrator.sonataflowPlatform.externalDBPort }}/sonataflow?currentSchema=jobs-service
{{- end }}
{{- if .Values.orchestrator.sonataflowPlatform.jobServiceImage }}
      podTemplate:
        container:
          image: {{ .Values.orchestrator.sonataflowPlatform.jobServiceImage }}
          env:
            - name: QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE
              value: allow
            - name: QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL
              value: 'true'

Reference reasoning: Existing SonataFlowPlatform configuration in the codebase focuses on declarative persistence wiring (secretRef/serviceRef) without embedding trust-all/SSL overrides in the manifest, and the chart’s values.yaml pattern exposes SonataFlowPlatform knobs as user-configurable fields rather than hardcoding operational/security-sensitive settings in templates.
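
One way to follow the values.yaml pattern described above is sketched here. It assumes a hypothetical `orchestrator.sonataflowPlatform.jobServiceEnv` list that does not exist in the chart today and defaults to empty, so no TLS overrides are rendered unless a user opts in. The indentation mirrors the snippet above.

```yaml
{{- if .Values.orchestrator.sonataflowPlatform.jobServiceImage }}
      podTemplate:
        container:
          image: {{ .Values.orchestrator.sonataflowPlatform.jobServiceImage }}
          # jobServiceEnv is a hypothetical values key (default: []).
          # The env block is rendered only when the list is non-empty.
          {{- with .Values.orchestrator.sonataflowPlatform.jobServiceEnv }}
          env:
            {{- toYaml . | nindent 12 }}
          {{- end }}
{{- end }}
```

A user who needs the relaxed settings could then opt in through a values override, for example:

```yaml
orchestrator:
  sonataflowPlatform:
    jobServiceEnv:   # hypothetical key, see above
      - name: QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE
        value: allow
      - name: QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL
        value: 'true'
```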

📄 References
  1. redhat-developer/rhdh-chart/charts/orchestrator-infra/values.yaml [1-18]
  2. redhat-developer/rhdh-chart/charts/backstage/values.yaml [379-408]
  3. redhat-developer/rhdh-chart/charts/backstage/templates/sonataflows.yaml [0-2]
  4. redhat-developer/rhdh-chart/charts/orchestrator-infra/templates/serverless-logic/subscription.yaml [0-2]
  5. redhat-developer/rhdh-operator/config/profile/rhdh/plugin-infra/serverless-logic.yaml [1-25]
  6. redhat-developer/rhdh-operator/config/profile/rhdh/plugin-deps/sonataflow.yaml [107-139]
  7. redhat-developer/rhdh-chart/charts/orchestrator-infra/ci/upstream-olm-values.yaml [1-13]
  8. redhat-developer/rhdh-chart/charts/backstage/values.yaml [371-378]

@rhdh-qodo-merge rhdh-qodo-merge Bot added the enhancement (New feature or request) label Jan 7, 2026
@rhdh-qodo-merge

PR Type

Enhancement


Description

  • Update Backstage chart version to 4.9.1 for OSL 1.37 compatibility

  • Add database migration strategy configuration to job service

  • Configure PostgreSQL SSL settings for job service container


File Walkthrough

Relevant files
Configuration changes
Chart.yaml
Increment chart version for release                                           

charts/backstage/Chart.yaml

  • Bump chart version from 4.9.0 to 4.9.1 following semantic versioning
+1/-1     
Documentation
README.md
Update documentation with new version                                       

charts/backstage/README.md

  • Update version badge from 4.9.0 to 4.9.1
  • Update Helm install command example with new chart version
+2/-2     
Enhancement
sonataflows.yaml
Configure job service database and SSL settings                   

charts/backstage/templates/sonataflows.yaml

  • Add dbMigrationStrategy: service configuration to job service
    persistence settings
  • Add PostgreSQL SSL environment variables
    (QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE and
    QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL) to job service container
+6/-0     

@rhdh-qodo-merge

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Security
Enforce SSL certificate verification

To prevent Man-in-the-Middle attacks, enforce SSL certificate verification by
setting QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE to verify-full and
QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL to false.

charts/backstage/templates/sonataflows.yaml [84-88]

 env:
   - name: QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE
-    value: allow
+    value: verify-full
   - name: QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL
-    value: 'true'
+    value: 'false'
Suggestion importance[1-10]: 9

Why: The suggestion correctly identifies and offers a fix for a significant security vulnerability (MITM risk) introduced by disabling database connection certificate validation.

High
Possible issue
Move migration strategy out of persistence

Correct the configuration by moving dbMigrationStrategy to be a direct child of
jobService and a sibling of persistence, ensuring the database migration runs
correctly.

charts/backstage/templates/sonataflows.yaml [61-63]

+dbMigrationStrategy: service
 persistence:
-  dbMigrationStrategy: service
   postgresql:
     secretRef:
       name: {{ .Release.Name }}-postgresql-svcbind-postgres
     jdbcUrl: jdbc:postgresql://...

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies a misconfiguration where dbMigrationStrategy is placed incorrectly, which will likely cause the database migration to fail.

Medium

@elai-shalev
Author

I'm debugging the CI failures

@elai-shalev
Author

@rm3l could we rerun the CI? OSL 1.37 with this config passed the QE's CI

@rm3l
Member

rm3l commented Jan 8, 2026

> @rm3l could we rerun the CI? OSL 1.37 with this config passed the QE's CI

Sure. https://github.com/redhat-developer/rhdh-chart/actions/runs/20776737022/job/59770519308 🤞🏿

@rm3l
Member

rm3l commented Jan 8, 2026

> could we rerun the CI? OSL 1.37 with this config passed the QE's CI

Did you test an upgrade from the previous version of the Chart? This is what the CI here tries to do.

> Sure. redhat-developer/rhdh-chart/actions/runs/20776737022/job/59770519308 🤞🏿

I can see a lot of errors like this in the RHDH container logs:

2026-01-08T08:37:51.855Z orchestrator error Error fetching workflow service urls operation=
{"key":-10725952659,"query":{"kind":"Document","definitions":[{"kind":"OperationDefinition","operation":"query","
selectionSet":{"kind":"SelectionSet","selections":[{"kind":"Field","name":{"kind":"Name","value":"ProcessDefinitions"},
"selectionSet":{"kind":"SelectionSet","selections":[{"kind":"Field","name":{"kind":"Name","value":"id"}},{"kind":"Field","name":{"kind":"Name","value":"serviceUrl"}}]}}]}}],"loc":{"start":0,"end":52,"source":{"body":"{\n  ProcessDefinitions {\n    id\n    serviceUrl\n  }\n}","name":"gql","locationOffset":{"line":1,"column":1}}},"__key":-10725952659},"variables":{},"kind":"query","
context":{"url":"http://sonataflow-platform-data-index-service.backstage-tv33n0q00z/graphql","requestPolicy":"cache-first","suspense":false}} data=undefined error={"name":"CombinedError","graphQLErrors":[],"networkError":{}} extensions=undefined hasNext=false stale=false trace_id="e3cb2f9bc7305ce4936d9d6f7fd62057" span_id="78c5aa6b56e34db8" trace_flags="01"
2026-01-08T08:37:46.844Z orchestrator error Error running task__Orchestrator__WorkflowCacheService: 
Network Error: [Network] fetch failed. 
getaddrinfo ENOTFOUND sonataflow-platform-data-index-service.backstage-tv33n0q00z 
trace_id="f7f947a7d3cb4e59b485f921203dde8e" span_id="2c921c390889cd4f" trace_flags="01"

I am realizing that the CI workflow here also tries to install the upstream serverless operator: https://github.com/elai-shalev/rhdh-chart/blob/osl-1.37-fixes/.github/workflows/test.yaml#L135-L149
Shouldn't this also be updated to reflect the OSL version bump here?

@elai-shalev
Author

> > could we rerun the CI? OSL 1.37 with this config passed the QE's CI
>
> Did you test an upgrade from the previous version of the Chart? This is what the CI here tries to do.
>
> > Sure. redhat-developer/rhdh-chart/actions/runs/20776737022/job/59770519308 🤞🏿
>
> I can see a lot of errors like this in the RHDH container logs:

2026-01-08T08:37:51.855Z orchestrator error Error fetching workflow service urls operation=
{"key":-10725952659,"query":{"kind":"Document","definitions":[{"kind":"OperationDefinition","operation":"query","
selectionSet":{"kind":"SelectionSet","selections":[{"kind":"Field","name":{"kind":"Name","value":"ProcessDefinitions"},
"selectionSet":{"kind":"SelectionSet","selections":[{"kind":"Field","name":{"kind":"Name","value":"id"}},{"kind":"Field","name":{"kind":"Name","value":"serviceUrl"}}]}}]}}],"loc":{"start":0,"end":52,"source":{"body":"{\n  ProcessDefinitions {\n    id\n    serviceUrl\n  }\n}","name":"gql","locationOffset":{"line":1,"column":1}}},"__key":-10725952659},"variables":{},"kind":"query","
context":{"url":"http://sonataflow-platform-data-index-service.backstage-tv33n0q00z/graphql","requestPolicy":"cache-first","suspense":false}} data=undefined error={"name":"CombinedError","graphQLErrors":[],"networkError":{}} extensions=undefined hasNext=false stale=false trace_id="e3cb2f9bc7305ce4936d9d6f7fd62057" span_id="78c5aa6b56e34db8" trace_flags="01"
2026-01-08T08:37:46.844Z orchestrator error Error running task__Orchestrator__WorkflowCacheService: 
Network Error: [Network] fetch failed. 
getaddrinfo ENOTFOUND sonataflow-platform-data-index-service.backstage-tv33n0q00z 
trace_id="f7f947a7d3cb4e59b485f921203dde8e" span_id="2c921c390889cd4f" trace_flags="01"

> I am realizing that the CI workflow here also tries to install the upstream serverless operator: https://github.com/elai-shalev/rhdh-chart/blob/osl-1.37-fixes/.github/workflows/test.yaml#L135-L149 Shouldn't this also be updated to reflect the OSL version bump here?

The CI installs the SonataFlow CRDs from the upstream project so that it works on k8s.
It looks like the latest version is from 2023? https://github.com/apache/incubator-kie-kogito-serverless-operator/releases

      podTemplate:
        container:
          image: {{ .Values.orchestrator.sonataflowPlatform.jobServiceImage }}
          env:
Member

@elai-shalev when testing this on an OCP cluster, I only changed the env variables; I did not add the dbMigrationStrategy: service section. It seemed to work regardless.

@elai-shalev
Author

I think the new CRDs are just not compatible with the upstream version.
Even this version, https://github.com/apache/incubator-kie-tools/releases/tag/10.1.0, is out of date (July); there is no recent upstream release we can use yet. @rm3l

> I am realizing that the CI workflow here also tries to install the upstream serverless operator: https://github.com/elai-shalev/rhdh-chart/blob/osl-1.37-fixes/.github/workflows/test.yaml#L135-L149

@elai-shalev
Author

After conversing with @y-first offline, it looks like this PR is not really needed. The extra env vars should only be added in the specific case of deploying Orchestrator with an external DB over SSL, and they were needed as a workaround in the QE setup. This could be handled in documentation. Closing this PR for now.
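
For the documentation route, the external-DB-with-SSL case might be described with a fragment like the one below. This is a sketch under assumptions: the platform name `sonataflow-platform` is inferred from the `sonataflow-platform-data-index-service` hostname in the logs above, the `spec.services.jobService` path is assumed from the template indentation, and the `sonataflow.org/v1alpha08` API version should be checked against the installed CRDs. The env values are the ones this PR proposed, which relax certificate validation and so suit only controlled environments.

```yaml
# Sketch only: job service overrides for an external PostgreSQL reached over SSL.
apiVersion: sonataflow.org/v1alpha08   # assumption: verify against the installed CRD
kind: SonataFlowPlatform
metadata:
  name: sonataflow-platform            # assumption: inferred from the service hostname in the logs
spec:
  services:
    jobService:
      podTemplate:
        container:
          env:
            - name: QUARKUS_DATASOURCE_REACTIVE_POSTGRESQL_SSL_MODE
              value: allow
            - name: QUARKUS_DATASOURCE_REACTIVE_TRUST_ALL
              value: 'true'
```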
