Open
Labels
api: spanner (Issues related to the googleapis/langchain-google-spanner-python API.)
Description
Environment details
- OS type and version: Linux
- Python version: 3.10
- langchain-google-spanner version: 0.82
Problem
top_k is defined as one of the init params of the SpannerGraphQAChain class:
    top_k: int = 10
    """Restricts the number of results returned in the graph query."""
It is implemented as a list slice in the method execute_query:
    return self.graph.query(gql_query)[: self.top_k]
When a query matches millions of results (e.g. "show me all the companies in Spain with more than 10 employees"), the full result set is fetched into memory before slicing, which is wasteful in both memory and time.
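To illustrate the cost, here is a minimal sketch that does not use Spanner at all. fake_rows is a hypothetical stand-in for a result stream that counts how many rows are actually produced. It compares slicing a fully materialized list with stopping after top_k rows.

```python
from itertools import islice


def fake_rows(n, counter):
    """Hypothetical stand-in for a query result stream; counts rows produced."""
    for i in range(n):
        counter["fetched"] += 1
        yield {"id": i}


top_k = 10

# Slice after materializing: every row is produced before the slice is taken.
sliced_counter = {"fetched": 0}
sliced = list(fake_rows(100_000, sliced_counter))[:top_k]

# Stop early: only top_k rows are ever produced.
lazy_counter = {"fetched": 0}
lazy = list(islice(fake_rows(100_000, lazy_counter), top_k))
```

Both variants return the same rows, but the first one produces all 100,000 of them first. Pushing the limit into the query itself has the same effect as the second variant, with the added benefit that the database does not have to send the extra rows.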
We use ORDER BY and LIMIT in our samples and queries, so this issue is not currently affecting us.
Suggestions
Some ideas on how to solve it:
- If you keep this as it is, it would be worth improving the docstring ( """Restricts the number of re....) to explain that this is a list slice applied after the query has returned all results, or similar.
- In my view, a more scalable way to implement this is to use GQL LIMIT.
- It is important to note that any solution (LIMIT, list slice, etc.) should be based on the premise that the results are ordered. It may be a good idea to add this to the top_k documentation.
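As a rough sketch of the LIMIT-based suggestion, the helper below appends a LIMIT clause to a generated GQL string. apply_limit is a hypothetical name, not part of langchain-google-spanner, and the approach assumes the query is a single statement whose final clause is RETURN, optionally followed by ORDER BY. It leaves a query alone if it already ends with a literal LIMIT, and it does not handle parameterized limits or multi-statement queries.

```python
import re


def apply_limit(gql_query: str, top_k: int) -> str:
    """Append `LIMIT top_k` unless the query already ends with a literal LIMIT.

    Hypothetical helper for illustration only; it does not parse GQL.
    """
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    # Drop trailing whitespace and an optional trailing semicolon.
    stripped = gql_query.rstrip().rstrip(";").rstrip()
    # Respect a LIMIT that the query author (or the LLM) already wrote.
    if re.search(r"\bLIMIT\s+\d+\s*$", stripped, flags=re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {top_k}"


limited = apply_limit(
    "MATCH (c:Company) RETURN c.name ORDER BY c.name", top_k=10
)
```

The existing slice could still be kept after the query as a safety net, for example when the query already carries a LIMIT larger than top_k. Either way, the result is only well defined if the query orders its results.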