COO-1597: Raise memory limit of observability-operator deploy #994
muellerfabi wants to merge 1 commit into
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: muellerfabi. |
|
Hi @muellerfabi. Thanks for your PR. I'm waiting for a rhobs member to verify that this patch is reasonable to test. |
|
/retitle COO-1597: Raise memory limit of observability-operator deploy
While it might help the reported case, I think we should also document how users can customize the out-of-the-box limits and where the current limits fit, because I'm sure we'll get other reports in the future (https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources). |
|
@muellerfabi: This pull request references COO-1597 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. |
|
@jgbernalp wdyt? It's a small adjustment, so I'm not sure how many users this will help out of the box. |
Is there a way for users to configure these limits? That might apply in this case. @muellerfabi do you have a pprof from this cluster that we can analyze? |
|
@jgbernalp yes it's possible (https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources) but the original report complained that it "should" just work out of the box. |
|
@simonpasquier the main issue with https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md#resources is that it overwrites the resource constraints globally. For example, if a deployment contains the actual operator plus a kube-rbac-proxy sidecar (very typical), the large requirements also apply to the sidecar; see redhat-cop/patch-operator#76 for an example from another operator. |
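For readers unfamiliar with the mechanism discussed here, a minimal sketch of a Subscription that overrides resources through `spec.config.resources` might look like the following. The metadata, channel, and catalog source values are assumptions for illustration and are not taken from this PR; only the 3Gi memory limit comes from the report. The key point is that the block has no per-container selector, so OLM applies the same values to every container it deploys for the operator.

```yaml
# Illustrative sketch only: name, namespace, channel, and source are assumed values.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-observability-operator          # assumed
  namespace: openshift-cluster-observability-operator  # assumed
spec:
  name: cluster-observability-operator          # assumed package name
  channel: stable                               # assumed
  source: redhat-operators                      # assumed catalog source
  sourceNamespace: openshift-marketplace        # assumed
  config:
    # Applied by OLM to all containers of the operator's deployments;
    # there is no way to target a single container here.
    resources:
      limits:
        memory: 3Gi                             # value mentioned in the report
```

Because the override is global, sizing it for the hungriest component means every other container receives the same limit.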
|
@muellerfabi: This pull request references COO-1597 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set. |
|
I updated the initial comment with further information on why overwriting resources in the Subscription kind is not ideal.
Raising the memory limit to 550Mi is sufficient until the cluster grows to 90 nodes or so. |
I'm worried that while it works for this cluster, we'll hear about other clusters still hitting the limit.
My initial assumption was that resource limits were a strong requirement but apparently not: https://sdk.operatorframework.io/docs/best-practices/managing-resources/#general-guidelines |
On a larger OCP 4.18 cluster with 66 nodes, observability-operator 1.3.0 gets OOMKilled right after start.
Issue is reported in https://access.redhat.com/support/cases/#/case/04368491
Issue is tracked in https://issues.redhat.com/browse/COO-1597
Update due to missing information about the actual issue:
We are aware of the fact that it is possible to set resource requests and limits in the subscription. We use the following as a workaround:
The problem is that the given config is applied equally to all COO components:
Because perses uses considerably more memory than the other components, we were forced to set the memory limit to 3Gi.
Now all the components have a needlessly high memory limit: