COO-1597: Raise memory limit of observability-operator deploy by muellerfabi · Pull Request #994 · rhobs/observability-operator

muellerfabi · 2026-02-10T12:37:01Z

On a larger OCP 4.18 cluster with 66 nodes, observability-operator 1.3.0 gets OOMKilled right after startup.

Issue is reported in https://access.redhat.com/support/cases/#/case/04368491
Issue is tracked in https://issues.redhat.com/browse/COO-1597

Update, adding missing information about the actual issue:
We are aware that resource requests and limits can be set in the Subscription. We use the following as a workaround:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-observability-operator
  namespace: openshift-operators-redhat
spec:
  channel: stable
  config:
    resources:
      limits:
        memory: 3Gi
      requests:
        memory: 500Mi

The problem is that this config is applied equally to all COO components:

obo-prometheus-operator
obo-prometheus-operator-admission-webhook
observability-operator
perses-operator

Because perses-operator uses far more memory than the other components, we were forced to set the memory limit to 3Gi.
As a result, every component now has a needlessly high memory limit:

$ oc get deploy -o custom-columns=NAME:.metadata.name,RESOURCES:.spec.template.spec.containers[0].resources
NAME                                        RESOURCES
logging                                     map[]
loki-operator-controller-manager            map[]
obo-prometheus-operator                     map[limits:map[memory:3Gi] requests:map[memory:500Mi]]
obo-prometheus-operator-admission-webhook   map[limits:map[memory:3Gi] requests:map[memory:500Mi]]
observability-operator                      map[limits:map[memory:3Gi] requests:map[memory:500Mi]]
perses-operator                             map[limits:map[memory:3Gi] requests:map[memory:500Mi]]


$ oc adm top po
NAME                                                        CPU(cores)   MEMORY(bytes)   
logging-7c8b5bfdf4-v5p67                                    1m           23Mi            
loki-operator-controller-manager-7c5b4ffbfb-wvt9b           49m          5622Mi          
obo-prometheus-operator-545cdc864f-wxmzh                    61m          317Mi           
obo-prometheus-operator-admission-webhook-57d54bf6d-4p6j6   1m           11Mi            
obo-prometheus-operator-admission-webhook-57d54bf6d-x5pfh   1m           12Mi            
observability-operator-595c984dfb-24lsc                     3m           536Mi           
perses-operator-5fc9687477-8g9jc                            11m          1701Mi
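To put a number on "needlessly high", here is a small Python sketch that divides each COO pod's memory usage, copied from the `oc adm top po` output above, by the uniform 3Gi limit. The helpers `parse_top` and `utilization` are hypothetical names, the parser assumes every value uses the `Mi` suffix (true for this output only), and the figures are a single snapshot rather than peak usage.

```python
# Uniform limit applied to every COO deployment through the Subscription workaround.
LIMIT_MI = 3 * 1024  # 3Gi expressed in MiB

# COO pod rows copied from the `oc adm top po` output above
# (logging and loki-operator are left out because they are not COO components).
TOP_OUTPUT = """\
NAME                                                        CPU(cores)   MEMORY(bytes)
obo-prometheus-operator-545cdc864f-wxmzh                    61m          317Mi
obo-prometheus-operator-admission-webhook-57d54bf6d-4p6j6   1m           11Mi
obo-prometheus-operator-admission-webhook-57d54bf6d-x5pfh   1m           12Mi
observability-operator-595c984dfb-24lsc                     3m           536Mi
perses-operator-5fc9687477-8g9jc                            11m          1701Mi
"""


def parse_top(text: str) -> dict[str, int]:
    """Return {pod name: memory in MiB}; assumes every value ends in 'Mi'."""
    usage = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 3:
            continue
        name, _cpu, mem = fields
        if not mem.endswith("Mi"):
            raise ValueError(f"unexpected unit in {mem!r}")
        usage[name] = int(mem[:-2])
    return usage


def utilization(usage: dict[str, int], limit_mi: int) -> dict[str, float]:
    """Fraction of the memory limit that each pod currently uses."""
    return {name: mem / limit_mi for name, mem in usage.items()}


if __name__ == "__main__":
    for name, frac in utilization(parse_top(TOP_OUTPUT), LIMIT_MI).items():
        print(f"{name:<60} {frac:6.1%} of {LIMIT_MI}Mi")
```

With these numbers, the admission-webhook pods use well under 1% of their 3Gi limit and observability-operator about 17%, while perses-operator sits at roughly 55%.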

openshift-ci · 2026-02-10T12:37:07Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: muellerfabi
Once this PR has been reviewed and has the lgtm label, please assign simonpasquier for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2026-02-10T12:37:13Z

Hi @muellerfabi. Thanks for your PR.

I'm waiting for a rhobs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

simonpasquier · 2026-02-16T13:49:58Z

/retitle COO-1597: Raise memory limit of observability-operator deploy

While it might help the reported case, I think we should also document how users can customize the out-of-the-box limits and where the current limits fit, because I'm sure we'll get other reports in the future (https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources).

openshift-ci-robot · 2026-02-16T13:50:04Z

@muellerfabi: This pull request references COO-1597 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jan--f · 2026-04-24T11:33:55Z

@jgbernalp wdyt? It's a small adjustment, so I'm not sure how many users this will help out of the box.

jgbernalp · 2026-04-24T16:09:13Z

> @jgbernalp wdyt? It's a small adjustment, so I'm not sure how many users this will help out of the box.

Is there a way for users to configure these limits? This might apply to this case. @muellerfabi, do you have a pprof profile from this cluster that we can analyze?

simonpasquier · 2026-04-27T08:06:51Z

@jgbernalp Yes, it's possible (https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources), but the original report complained that it "should" just work out of the box.

duritong · 2026-04-27T12:58:50Z

@simonpasquier The main issue with https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources is that it overrides the resource constraints globally. For example, if a deployment runs the actual operator plus a kube-rbac-proxy sidecar (a very typical setup), the large requirements also apply to the sidecar; see redhat-cop/patch-operator#76 for an example from another operator.
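A rough Python model of the behaviour described above, assuming (as the `oc get deploy` output in the PR description and this comment indicate) that `spec.config.resources` is copied onto every container of every deployment the bundle installs. The classes, the `apply_subscription_override` helper, the `example-operator` deployment and the `webhook`/`manager` container names are hypothetical; this is a sketch of the effect, not OLM's implementation.

```python
from dataclasses import dataclass, field

# Simplified model, NOT OLM's code: resources look like a Kubernetes
# {"limits": {...}, "requests": {...}} block.
Resources = dict[str, dict[str, str]]


@dataclass
class Container:
    name: str
    resources: Resources = field(default_factory=dict)


@dataclass
class Deployment:
    name: str
    containers: list[Container]


def apply_subscription_override(deployments: list[Deployment], override: Resources) -> None:
    """Stamp one Subscription-level resources block onto every container.

    There is no way to target a single deployment or container, which is
    the limitation described above.
    """
    for deploy in deployments:
        for container in deploy.containers:
            # Copy the inner dicts so containers do not share mutable state.
            container.resources = {kind: dict(values) for kind, values in override.items()}


# Hypothetical inventory: two COO deployments from this thread plus a generic
# operator deployment with a kube-rbac-proxy sidecar (names are illustrative).
deployments = [
    Deployment("perses-operator", [Container("perses-operator")]),
    Deployment("obo-prometheus-operator-admission-webhook", [Container("webhook")]),
    Deployment("example-operator", [Container("manager"), Container("kube-rbac-proxy")]),
]

# The workaround from the PR description, sized for perses-operator.
override = {"limits": {"memory": "3Gi"}, "requests": {"memory": "500Mi"}}
apply_subscription_override(deployments, override)

for deploy in deployments:
    for container in deploy.containers:
        print(f"{deploy.name}/{container.name}: {container.resources}")
```

Running it prints the same 3Gi limit and 500Mi request for the small webhook and for the sidecar, which is the over-provisioning reported in the PR description.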

openshift-ci-robot · 2026-06-19T10:22:27Z

@muellerfabi: This pull request references COO-1597 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.


muellerfabi · 2026-06-19T10:35:25Z

I updated the initial comment with further information on why overriding resources in the Subscription is not ideal.

@jgbernalp
> Is there a way for users to configure these limits? This might apply to this case. @muellerfabi, do you have a pprof profile from this cluster that we can analyze?

A must-gather from the affected cluster could be uploaded to the support case. Would that help?

Raising the memory limit to 550Mi is sufficient until the cluster grows to 90 nodes or so.
Maybe it would be better to drop the limit entirely. Cluster operators usually do not have limits.
WDYT?

simonpasquier · 2026-06-19T12:41:14Z

> Raising the memory limit to 550Mi is sufficient until the cluster grows to 90 nodes or so.

I'm worried that while it works for this cluster, we'll hear about other clusters still hitting the limit.
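As a rough check on that concern: the `oc adm top po` snapshot in the PR description shows observability-operator at 536Mi, which leaves little room under the 550Mi limit mentioned above. The snapshot is a single reading rather than a measured peak, so the real margin may be smaller.

```latex
\text{headroom} = \frac{550\,\text{Mi} - 536\,\text{Mi}}{536\,\text{Mi}} = \frac{14}{536} \approx 2.6\,\%
```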

> Maybe it would be better to drop the limit entirely. Cluster operators usually do not have limits.
> WDYT?

My initial assumption was that resource limits were a strong requirement, but apparently they are not: https://sdk.operatorframework.io/docs/best-practices/managing-resources/#general-guidelines
We'd need to discuss the downsides of removing the limits, but it might be the best course of action.

raise mem limit in observability-operator deploy

15ea476

openshift-ci Bot requested review from danielmellado and lihongyan1 February 10, 2026 12:37

openshift-ci Bot added the needs-ok-to-test label Feb 10, 2026

openshift-ci Bot changed the title ~~COO-1597 Raise memory limit of observability-operator deploy~~ COO-1597: Raise memory limit of observability-operator deploy Feb 16, 2026

openshift-ci-robot added the jira/valid-reference label Feb 16, 2026

COO-1597: Raise memory limit of observability-operator deploy#994

muellerfabi wants to merge 1 commit into rhobs:main from muellerfabi:coo-1597

6 participants
