Hi, I noticed that the news data block is extremely large. Even when selecting only the top 5 news articles, the input to the LLM ends up being 6–7k tokens, mainly because the full raw article text is passed through.
Other data sources (fundamentals, technicals) remain compact; only news is causing heavy token usage.
Problem
- Very high token cost per run
- Slower inference
- Raw news text contains many irrelevant sections (ads, disclaimers, long boilerplate paragraphs)
Suggestion
It might help to:
- Use only the headline + first paragraph, or
- Add a built-in summarization step, or
- Add a max_chars / max_tokens limit per article (see the sketch after this list).
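For illustration, a minimal sketch of a per-article character cap, assuming articles are plain dicts with a "text" field (the field name, function name, and default limit are hypothetical, not the project's actual API):

```python
# Hypothetical sketch: cap each article's body before it is added to the
# LLM prompt. The "text" field and function name are assumptions about
# the article schema, not this project's actual API.

MAX_CHARS_PER_ARTICLE = 1500  # tunable; ~1500 chars is very roughly 400 tokens

def truncate_article(article: dict, max_chars: int = MAX_CHARS_PER_ARTICLE) -> dict:
    """Return a copy of the article with its body capped at max_chars."""
    text = article.get("text", "")
    if len(text) > max_chars:
        # Prefer cutting at the last sentence boundary before the limit
        # so the prompt does not end mid-sentence.
        cut = text.rfind(". ", 0, max_chars)
        text = text[: cut + 1] if cut != -1 else text[:max_chars]
    return {**article, "text": text}
```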
I’m currently using a workaround that keeps just the title + first paragraph, which reduces token usage significantly while preserving most of the signal.
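As a rough sketch of that workaround, under the same assumed article schema ("title" and "text" keys):

```python
# Rough sketch of the title + first-paragraph workaround described above.
# "title" and "text" are assumed field names, not the project's schema.

def headline_plus_lead(article: dict) -> str:
    """Compress an article to its headline plus the first paragraph."""
    title = article.get("title", "").strip()
    body = article.get("text", "").strip()
    # Treat a blank line as the paragraph break; if none exists, this
    # keeps the whole body, so a char cap could still be applied on top.
    first_para = body.split("\n\n", 1)[0]
    return f"{title}\n{first_para}"
```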
Thanks!