Skip to content

Conversation

@ioo0s
Copy link
Contributor

@ioo0s ioo0s commented Jul 10, 2025

Description

Summary: This PR introduces support for VolcEngine embeddings. The main changes include:

  1. Added a new embedding option to integrate with the VolcEngine service.

  2. Implemented the necessary client logic to handle API requests and responses from VolcEngine.

  3. Included a comprehensive test suite to ensure the functionality and reliability of the new VolcEngine embedding integration.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have mentioned the person who will review this PR

@Ki-Seki Ki-Seki requested review from Ki-Seki and Copilot July 10, 2025 12:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for VolcEngine embeddings by introducing an ArkEmbedder implementation, integrating it into the existing factory and configuration system, and providing basic unit tests.

  • Implemented ArkEmbedder with text, image, and multimodal embedding support.
  • Registered ArkEmbedder in the factory and added its config class.
  • Added initial tests for single and batch text embedding and updated dependencies.

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/embedders/test_ark.py Added tests for single and batch text embedding
src/memos/memories/textual/general.py Updated imports and type annotation to include ArkEmbedder
src/memos/embedders/factory.py Registered ark backend in EmbedderFactory
src/memos/embedders/ark.py New ArkEmbedder implementation
src/memos/configs/embedder.py Added ArkEmbedderConfig and mapped it to "ark"
pyproject.toml Added volcengine-python-sdk dependency
Comments suppressed due to low confidence (4)

src/memos/embedders/ark.py:38

  • This comment refers to initializing the Ollama client but you are creating an Ark client here. Please update the comment accordingly.
        # Initialize ollama client

src/memos/embedders/ark.py:59

  • [nitpick] There are no unit tests covering embed_images. Consider adding tests to verify image embedding functionality.
    def embed_images(self, urls: list[str], chunk_size: int | None = None) -> list[list[float]]:

src/memos/embedders/ark.py:56

  • [nitpick] The embed_query method is not exercised by existing tests. Consider adding a test to confirm its behavior.
    def embed_query(self, text: str) -> list[float]:

src/memos/embedders/ark.py:57

  • The method embed_documents is not defined on ArkEmbedder. You likely meant to call self.embed([text]) or implement embed_documents.
        return self.embed_documents([text])[0]



class ArkEmbedder(BaseEmbedder):
"""Arl Embedder class."""
Copy link

Copilot AI Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring on the next line spells "Arl Embedder" instead of "Ark Embedder". Please correct the typo.

Suggested change
"""Arl Embedder class."""
"""Ark Embedder class."""

Copilot uses AI. Check for mistakes.
Copy link
Member

@Ki-Seki Ki-Seki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the project continues to expand, its dependencies have become increasingly cumbersome, leading to numerous dependency conflicts. Would you consider implementing VolcEngine's Embedding support by directly using their API documentation (https://www.volcengine.com/docs/84313/1254554) instead of their Python SDK? This approach could significantly reduce dependencies.

Additionally, GitHub Copilot suggested a minor typo fix.

Lastly, please merge the dev branch or perform a rebase, as the current branch has fallen behind.


随着项目规模不断扩大,其依赖项变得日益臃肿,导致大量依赖冲突。您是否考虑通过直接调用火山引擎提供的API文档 https://www.volcengine.com/docs/84313/1254554 来替代他们的Python SDK以实现Embedding功能?这种方式可以大幅降低依赖复杂度。

另外GitHub Copilot提示了一个拼写错误需要修正。

最后请合并dev分支或进行变基操作,当前分支已落后主分支。

@ioo0s ioo0s force-pushed the feat/ark_embed_support branch from ef6f854 to 45d8fea Compare July 11, 2025 03:24
@ioo0s
Copy link
Contributor Author

ioo0s commented Jul 11, 2025

As the project continues to expand, its dependencies have become increasingly cumbersome, leading to numerous dependency conflicts. Would you consider implementing VolcEngine's Embedding support by directly using their API documentation (https://www.volcengine.com/docs/84313/1254554) instead of their Python SDK? This approach could significantly reduce dependencies.

Additionally, GitHub Copilot suggested a minor typo fix.

Lastly, please merge the dev branch or perform a rebase, as the current branch has fallen behind.

随着项目规模不断扩大,其依赖项变得日益臃肿,导致大量依赖冲突。您是否考虑通过直接调用火山引擎提供的API文档 https://www.volcengine.com/docs/84313/1254554 来替代他们的Python SDK以实现Embedding功能?这种方式可以大幅降低依赖复杂度。

另外GitHub Copilot提示了一个拼写错误需要修正。

最后请合并dev分支或进行变基操作,当前分支已落后主分支。

您好,我已按照要求修改拼写错误,并且添加image embedding test case。
这里考虑使用Python SDK的原因如下:

  1. seed embedding模型响应格式与openai embed存在差异,无法优雅的统一响应结果,若后续统一提供openai embed后,可以考虑通过添加if else的形式扩展火山embed的支持。
  2. 方便后续对LLM Backend扩展、以及涉及不同厂商对多模态模型的响应差异(我认为这部分不应该由MemOS维护)

@Ki-Seki
Copy link
Member

Ki-Seki commented Jul 11, 2025

ok。我可以同意使用Python SDK。

这个PR还需要修复下CI checks的bug;

另外,希望能先删除image和多模态embedding的部分,因为这个地方需要我们先定义好base类中的接口才可以。并且目前仓库还使用不到多模态的内容。

@ioo0s
Copy link
Contributor Author

ioo0s commented Jul 11, 2025

ok。我可以同意使用Python SDK。

这个PR还需要修复下CI checks的bug;

另外,希望能先删除image和多模态embedding的部分,因为这个地方需要我们先定义好base类中的接口才可以。并且目前仓库还使用不到多模态的内容。

非常感谢您的回复,我预计今晚更新PR

@Ki-Seki Ki-Seki merged commit c90f241 into MemTensor:dev Jul 12, 2025
16 checks passed
@Ki-Seki
Copy link
Member

Ki-Seki commented Jul 12, 2025

已合并,建议您可以同时在https://github.com/MemTensor/MemOS-Docs中增加相应的文档。

ioo0s added a commit to ioo0s/MemOS that referenced this pull request Jul 14, 2025
* feat: add volcengine embedding support

* test: add volcengine embedding test

* feat(embedder): add support for ArkEmbedder

* fixed: minor typo fix and remove unnecessary function

* test: add image embedding test

* fixed: remove images embed support and test

* fixed: change chunk_size default value
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants