
News data is extremely large / raw -> causing 6–7k token usage even with top-5 articles #291

@kaushik-yadav

Description

Hi, I noticed that the news data block is extremely large. Even when selecting only the top 5 news articles, the input to the LLM ends up being 6–7k tokens, mainly because the full raw article text is passed through.

Other data sources (fundamentals, technicals) remain compact; only news is causing heavy token usage.

Problem

  • Very high token cost per run
  • Slower inference
  • News text contains many irrelevant sections (ads, disclaimers, long paragraphs)

Suggestion

It might help to:

  • Use only the headline + first paragraph, or
  • Add a built-in summarization step, or
  • Add a max_chars/max_tokens limit per article.

I’m currently using a workaround that keeps just the title + first paragraph, which reduces token usage significantly while preserving signal quality.
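For reference, here is a minimal sketch of that workaround. The field names (`title`, `content`) and the `max_chars` cap are assumptions about the article dict shape, not the project’s actual schema:

```python
def compact_article(article: dict, max_chars: int = 500) -> str:
    """Reduce a raw article to title + first paragraph, capped at max_chars.

    Assumes the article dict exposes 'title' and 'content' keys; adjust
    to whatever schema the news fetcher actually returns.
    """
    title = (article.get("title") or "").strip()
    body = (article.get("content") or "").strip()
    # Take only the first paragraph (text up to the first blank line).
    first_para = body.split("\n\n", 1)[0]
    compact = f"{title}\n{first_para}" if title else first_para
    return compact[:max_chars]
```

Applied before the prompt is assembled, this keeps each article to a predictable size instead of passing the full raw text through.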

Thanks!
