OBSDOCS-2755: Document when to resize vs scale individual Loki components #114433

Open
johnwilkins wants to merge 4 commits into openshift:standalone-logging-docs-main from johnwilkins:OBSDOCS-2755

@johnwilkins johnwilkins commented Jun 30, 2026

Summary

Completes OBSDOCS-2755 by documenting when to change the LokiStack t-shirt size and when to scale individual component replicas instead. Also improves the sizing calculation method, following the best practices in the KCS articles.

Releases: 6.6, 6.5, 6.4

Problem

Administrators needed:

  • Guidance on when to change overall size vs scale individual components
  • More accurate method for measuring log ingestion rate
  • Links to KCS articles for additional sizing guidance

Solution

1. New Module: Sizing vs Component Scaling

Created: modules/loki-sizing-vs-component-scaling.adoc

Provides:

  • Clear explanation of when to use each approach
  • Workload-specific recommendations (read-heavy vs write-heavy)
  • Decision guide table with 7 scenarios
  • Component replica override syntax and examples
  • Important warnings about override persistence

2. Improved Sizing Calculation

Updated: modules/understanding-loki-sizing.adoc

Old Query (Collector-side):

sum by (component_name) (rate(vector_component_sent_bytes_total{component_type="loki", component_kind="sink"}[5m]))

Required manual conversion from bytes/sec → GB/day
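
For reference, the manual conversion that the old query required can be sketched as below. This is an illustrative helper, not part of the PR: the function name is hypothetical, and it assumes binary gigabytes (1024^3 bytes) to match the divisor used in the new query.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
BYTES_PER_GB = 1024 ** 3        # assumption: binary gigabytes, as in the new query's divisor


def bytes_per_sec_to_gb_per_day(rate_bytes_per_sec: float) -> float:
    """Convert a sustained ingestion rate in bytes/sec to GB/day."""
    return rate_bytes_per_sec * SECONDS_PER_DAY / BYTES_PER_GB


# A sustained 1 MiB/s works out to 84.375 GB/day.
one_mib_per_sec = bytes_per_sec_to_gb_per_day(1024 ** 2)
```

The result is only as representative as the rate fed into it, which is why a single 5-minute sample can be misleading.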

New Query (Loki-side):

sum(increase(loki_distributor_bytes_received_total[24h])) / 1024 / 1024 / 1024

Returns GB/day directly (binary gigabytes, given the 1024-based divisor) and is more accurate

Benefits:

  • Simpler (no conversion math)
  • More accurate (measures what Loki receives, not what collector sends)
  • 24h window smooths daily variations
  • Aligned with KCS article 7092615
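
The smoothing benefit can be illustrated with a small sketch. The hourly rates below are hypothetical values chosen for illustration, not measurements, and the functions only approximate what a 24h increase() and an extrapolated short-window rate would report.

```python
SECONDS_PER_HOUR = 3600
BYTES_PER_GB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical hourly average ingestion rates in bytes/sec:
# a quiet night, a busy working day, and a moderate evening.
hourly_rates = [1 * MIB] * 8 + [4 * MIB] * 8 + [2 * MIB] * 8


def daily_total_gb(rates_bytes_per_sec):
    """Sum the bytes received in each hour, roughly what a 24h increase() captures."""
    total_bytes = sum(rate * SECONDS_PER_HOUR for rate in rates_bytes_per_sec)
    return total_bytes / BYTES_PER_GB


def extrapolated_gb(rate_bytes_per_sec):
    """Extrapolate a single short-window rate sample to a full day."""
    return rate_bytes_per_sec * 24 * SECONDS_PER_HOUR / BYTES_PER_GB


actual = daily_total_gb(hourly_rates)               # 196.875
from_peak = extrapolated_gb(max(hourly_rates))      # 337.5
from_trough = extrapolated_gb(min(hourly_rates))    # 84.375
```

With these made-up numbers, a sample taken at the peak overestimates the daily volume and one taken at the trough underestimates it, while the 24h total sits between the two.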

3. KCS Article References

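Added to Additional Resources:

  • How to estimate LokiStack deployment sizing (KCS 7092615)
  • How to estimate LokiStack storage size (KCS 7092616)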

Ticket Coverage

OBSDOCS-2755 requested four things:

Requirement                       | Addressed By        | Status
How to estimate Loki size         | OBSDOCS-3312        | ✅ Merged
How to calculate data transfer    | THIS PR (improved)  | ✅ Better method
When to resize based on metrics   | OBSDOCS-3312        | ✅ Merged
When to scale components vs size  | THIS PR             | ✅ New

Decision Guide Example

The new guide helps users choose:

  • Log volume increased 100→500 GB/day? → Change t-shirt size
  • High query latency but ingestion fine? → Scale querier/queryFrontend
  • High ingestion latency but queries fine? → Scale distributor/ingester
  • Unsure what's bottleneck? → Change t-shirt size (safer)
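
The four example scenarios above can be read as a simple decision function. The sketch below is only an illustration of that reading: the function and parameter names are hypothetical, and treating "both slow" or "nothing obviously slow" as the "unsure" case is an assumption, not something the new module states.

```python
def recommend_action(volume_changed: bool, slow_queries: bool, slow_ingestion: bool) -> str:
    """Map observed symptoms to the recommendation from the example scenarios."""
    if volume_changed:
        # Overall log volume grew: resize everything proportionally.
        return "change t-shirt size"
    if slow_queries and not slow_ingestion:
        # Read path is the bottleneck.
        return "scale querier/queryFrontend"
    if slow_ingestion and not slow_queries:
        # Write path is the bottleneck.
        return "scale distributor/ingester"
    # Assumption: anything ambiguous falls back to the safer option.
    return "change t-shirt size"
```

For example, `recommend_action(False, True, False)` returns `"scale querier/queryFrontend"`.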

Technical Details

Documents spec.template.<component>.replicas API for per-component scaling:

spec:
  size: 1x.small
  template:
    querier:
      replicas: 4  # Override default

Warning included: Overrides persist across size changes.
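
Why the warning matters can be seen by modelling the spec as plain data. The sketch below is an assumption-laden illustration: it only models the contents of the spec shown above (the helper name is hypothetical) and does not reproduce how the operator reconciles the resource.

```python
import copy
import json


def with_size(spec: dict, new_size: str) -> dict:
    """Return a copy of a LokiStack spec with only `size` changed."""
    updated = copy.deepcopy(spec)
    updated["size"] = new_size
    return updated


# The spec from the example above, expressed as a dict.
spec = {"size": "1x.small", "template": {"querier": {"replicas": 4}}}

resized = with_size(spec, "1x.medium")
print(json.dumps(resized, indent=2))
```

After the size change, `template.querier.replicas` is still 4 in the spec, so the explicit value stays in effect until an administrator removes it.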

Related

Signed-off-by: John Wilkins <jowilkin@redhat.com>

…ents

Adds comprehensive guidance on choosing between changing the LokiStack
t-shirt size vs scaling individual component replicas.

New Module: modules/loki-sizing-vs-component-scaling.adoc
- Explains when to change overall size (log volume changes, proportional scaling)
- Explains when to scale individual components (workload-specific optimization)
- Provides decision guide table with scenarios and recommendations
- Documents component replica override syntax with examples
- Warns that replica overrides persist across size changes

This addresses the final gap from OBSDOCS-2755. The ticket's other requirements
were already covered by OBSDOCS-3312 (baseline sizing) and OBSDOCS-3530
(per-component resource tables).

Coverage:
✅ How to estimate initial size (OBSDOCS-3312)
✅ How to calculate data transfer/ingestion rate (OBSDOCS-3312)
✅ When to resize based on metrics (OBSDOCS-3312)
✅ When to scale components vs changing size (THIS PR)

Related: https://redhat.atlassian.net/browse/OBSDOCS-2755

Signed-off-by: John Wilkins <jowilkin@redhat.com>

@openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Jun 30, 2026
openshift-ci-robot commented Jun 30, 2026

@johnwilkins: This pull request references OBSDOCS-2755 which is a valid jira issue.

In response to this:

Summary

Completes OBSDOCS-2755 by documenting when to change the LokiStack t-shirt size vs scaling individual component replicas. This was the final missing piece after OBSDOCS-3312 and OBSDOCS-3530 addressed the other requirements from this ticket.

Problem

Administrators needed guidance on:

  • When to change the overall LokiStack size (1x.small → 1x.medium)
  • When to scale individual components (increase querier replicas)
  • Trade-offs between the two approaches

Solution

New Module: modules/loki-sizing-vs-component-scaling.adoc

Provides:

  • Clear explanation of when to use each approach
  • Workload-specific recommendations (read-heavy vs write-heavy)
  • Decision guide table with scenarios
  • Component replica override syntax and examples
  • Important warnings about override persistence

Added to: installing/loki-deployment-sizing.adoc

Ticket Coverage

OBSDOCS-2755 requested four things:

Requirement                       | Addressed By  | Status
How to estimate Loki size         | OBSDOCS-3312  | ✅ Merged
How to calculate data transfer    | OBSDOCS-3312  | ✅ Merged
When to resize based on metrics   | OBSDOCS-3312  | ✅ Merged
When to scale components vs size  | THIS PR       | ✅ New

Decision Guide Example

The new guide helps users choose:

  • Log volume increased 100→500 GB/day? → Change t-shirt size
  • High query latency but ingestion fine? → Scale querier/queryFrontend
  • High ingestion latency but queries fine? → Scale distributor/ingester
  • Unsure what's bottleneck? → Change t-shirt size (safer)

Technical Details

Documents spec.template.<component>.replicas API for per-component scaling:

spec:
 size: 1x.small
 template:
   querier:
     replicas: 4  # Override default

Warning included: Overrides persist across size changes.

Related

Signed-off-by: John Wilkins <jowilkin@redhat.com>

@openshift-ci Bot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Jun 30, 2026

Changes:
1. Updated Prometheus query to use loki_distributor_bytes_received_total
   - More accurate: measures at Loki entry point (not collector)
   - Returns GB/day directly (no conversion needed)
   - Uses 24h window for better daily average

2. Added KCS article references to Additional Resources
   - How to estimate LokiStack deployment sizing (7092615)
   - How to estimate LokiStack storage size (7092616)

Old query (collector-side):
  sum by (component_name) (rate(vector_component_sent_bytes_total...))
  Required manual conversion from bytes/sec to GB/day

New query (Loki-side):
  sum(increase(loki_distributor_bytes_received_total[24h])) / 1024^3
  Returns GB/day directly, more accurate for sizing decisions

Benefits:
- Simpler for administrators (no conversion math)
- More accurate (measures what Loki actually receives)
- Aligned with KCS article guidance
- 24h window smooths out daily variations

Signed-off-by: John Wilkins <jowilkin@redhat.com>
Vale flagged: "Content other than links cannot be mapped to DITA related-links"

Removed "(Red Hat Knowledgebase)" text from KCS article links to comply
with DITA conversion requirements.

Before: link:...[Title] (Red Hat Knowledgebase)
After:  link:...[Title]

Vale now shows 0 errors, 0 warnings.

Signed-off-by: John Wilkins <jowilkin@redhat.com>
ocpdocs-previewbot commented Jun 30, 2026

🤖 Tue Jun 30 22:40:56 - Prow CI generated the docs preview:

https://114433--ocpdocs-pr.netlify.app/openshift-logging/latest/installing/loki-deployment-sizing.html

The xref used the wrong context suffix. The section ID expands to
calculating-sizing-after-installation_loki-deployment-sizing (the
assembly context), not _understanding-loki-sizing (the module name).

Error: "Unknown ID or title calculating-sizing-after-installation_understanding-loki-sizing"

Fixed: Changed xref to use correct context _loki-deployment-sizing
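
The ID expansion behind this error can be sketched as follows. This assumes the module declares its section ID with a `{context}` placeholder, as is common in modular AsciiDoc; the exact ID template is an assumption, and the helper name is hypothetical.

```python
def expand_id(id_template: str, context: str) -> str:
    """Substitute the including assembly's context into a section ID template."""
    return id_template.replace("{context}", context)


# Assumed ID template for the section in the module.
section_id = "calculating-sizing-after-installation_{context}"

# The context comes from the assembly that includes the module,
# not from the module's own file name.
resolved = expand_id(section_id, "loki-deployment-sizing")
broken = expand_id(section_id, "understanding-loki-sizing")
```

Here `resolved` matches the ID that the build generates, while `broken` is the ID the failing xref pointed at.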

Signed-off-by: John Wilkins <jowilkin@redhat.com>
openshift-ci Bot commented Jun 30, 2026

@johnwilkins: all tests passed!

