GitHub - tensorlakeai/tensorlake: Tensorlake is a Document Ingestion API and a serverless platform for building data processing and orchestration APIs

Get high-quality data from documents fast, and deploy scalable, serverless Data Processor APIs

Tensorlake is the platform for agentic applications. Build and deploy high-throughput, durable agentic applications and workflows in minutes, using our best-in-class Document Ingestion API and compute platform.

Features

Document Ingestion - Parse documents (PDFs, DOCX, spreadsheets, presentations, images, and raw text) to markdown or extract structured data with schemas. This is powered by Tensorlake's state-of-the-art layout detection and table recognition models. Review our benchmarks here.
Agentic Applications - Deploy Agentic Applications and AI Workflows using durable functions, with sandboxed and managed compute infrastructure that scales your agents with usage.

Document Ingestion Quickstart

Installation

Install the SDK and get an API Key.

pip install tensorlake

Sign up at cloud.tensorlake.ai and get your API key.

Parse Documents

from tensorlake.documentai import DocumentAI, ParseStatus

doc_ai = DocumentAI(api_key="your-api-key")

# Upload the document to Tensorlake
file_id = doc_ai.upload("/path/to/document.pdf")

# Start parsing; this returns a parse ID
parse_id = doc_ai.parse(file_id)

# Wait for completion and get results
result = doc_ai.wait_for_completion(parse_id)

if result.status == ParseStatus.SUCCESSFUL:
    for chunk in result.chunks:
        print(chunk.content)  # Clean markdown output
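Parsing is asynchronous: parse() hands back an ID and wait_for_completion() blocks until the job reaches a final status. The SDK already handles that wait for you; the sketch below only illustrates the general poll-until-done pattern, in case you want to check on a job from your own loop. poll_until and its parameters are hypothetical names, not part of the Tensorlake SDK, and this is not how the SDK implements waiting.

```python
import time


def poll_until(fetch, is_done, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until is_done(result) is true, then return that result.

    fetch and is_done are caller-supplied callables (hypothetical helpers).
    Raises TimeoutError if the deadline passes before the job finishes.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up once the deadline has passed instead of polling forever
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)
```

Injecting sleep keeps the helper easy to test without real delays.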

Customize Parsing

Various aspects of document parsing can be customized, such as strikethrough detection, the table output mode, and figure and table summarization. The API is documented here.

from tensorlake.documentai import DocumentAI, ParsingOptions, EnrichmentOptions, ParseStatus, ChunkingStrategy, TableOutputMode

doc_ai = DocumentAI(api_key="your-api-key")

# Skip the upload step if you are passing pre-signed URLs or HTTPS-accessible files.
file_id = doc_ai.upload("/path/to/document.pdf")

# Configure parsing options
parsing_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.SECTION,
    table_output_mode=TableOutputMode.HTML,
    signature_detection=True
)

# Configure enrichment options
enrichment_options = EnrichmentOptions(
    figure_summarization=True,
    table_summarization=True
)

# Parse and wait for completion
result = doc_ai.parse_and_wait(
    file_id,
    parsing_options=parsing_options,
    enrichment_options=enrichment_options
)

if result.status == ParseStatus.SUCCESSFUL:
    for chunk in result.chunks:
        print(chunk.content)
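With table_output_mode=TableOutputMode.HTML, tables come back as HTML markup inside the chunk content. If you want the cell values as Python lists, the standard-library html.parser is enough for simple tables. This is a sketch that assumes the markup uses plain tr/td/th elements; the exact HTML Tensorlake emits isn't shown here, and html_table_rows is a hypothetical helper.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the open cell
        if self._cell is not None:
            self._cell.append(data)


def html_table_rows(html: str) -> list:
    """Return the table's rows as lists of cell strings (hypothetical helper)."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, html_table_rows(chunk.content) on a chunk containing a single table returns one list per row, header row first.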

Structured Extraction

Extract specific data fields from documents using JSON schemas or Pydantic models:

Using Pydantic Models

from tensorlake.documentai import DocumentAI, StructuredExtractionOptions, ParseStatus
from pydantic import BaseModel, Field

# Define Pydantic model
class InvoiceData(BaseModel):
    invoice_number: str = Field(description="Invoice number")
    total_amount: float = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")
    vendor_name: str = Field(description="Vendor company name")

doc_ai = DocumentAI(api_key="your-api-key")

# Pass an HTTPS-accessible file directly (no need to upload it to Tensorlake)
file_id = "https://...."   # publicly available URL of the invoice data file

# Configure structured extraction using Pydantic model
structured_extraction_options = StructuredExtractionOptions(
    schema_name="Invoice Data",
    json_schema=InvoiceData  # Can pass Pydantic model directly
)

# Parse and wait for completion
result = doc_ai.parse_and_wait(
    file_id,
    structured_extraction_options=[structured_extraction_options]
)

if result.status == ParseStatus.SUCCESSFUL:
    print(result.structured_data)

Using JSON Schema

from tensorlake.documentai import StructuredExtractionOptions

# Define JSON schema directly
invoice_schema = {
    "title": "InvoiceData",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice number"},
        "total_amount": {"type": "number", "description": "Total amount due"},
        "due_date": {"type": "string", "description": "Payment due date"},
        "vendor_name": {"type": "string", "description": "Vendor company name"}
    }
}

structured_extraction_options = StructuredExtractionOptions(
    schema_name="Invoice Data",
    json_schema=invoice_schema
)

Structured extraction is guided by the provided schema. We support Pydantic models as well as JSON Schema. All the levers for structured extraction are documented here.
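Extracted values can still come back in an unexpected form (for example, an amount as a formatted string), so it can be worth sanity-checking them against the same schema before using them downstream. Below is a minimal, stdlib-only sketch for flat schemas like invoice_schema. It assumes you already have the extracted fields as a plain dict (the exact shape of result.structured_data isn't covered here), treats every listed property as expected because the schema has no "required" list, and uses a hypothetical helper name, schema_problems. For nested schemas, a full validator such as the jsonschema package is the better choice.

```python
# Same flat schema as above, repeated so this block runs on its own
invoice_schema = {
    "title": "InvoiceData",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
        "due_date": {"type": "string"},
        "vendor_name": {"type": "string"},
    },
}

# Checks for JSON Schema primitive types (bool is excluded from the numeric types)
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def schema_problems(data: dict, schema: dict) -> list:
    """List properties that are missing or don't match their declared primitive type."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if data.get(name) is None:
            problems.append(f"{name}: missing")
            continue
        check = TYPE_CHECKS.get(spec.get("type"))
        if check is not None and not check(data[name]):
            problems.append(f"{name}: expected {spec['type']}, got {type(data[name]).__name__}")
    return problems
```

With made-up data such as {"invoice_number": "INV-001", "total_amount": "1,200.00", "due_date": "2024-07-01"}, this reports a string where a number was expected for total_amount, and a missing vendor_name.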

Learn More

Build Durable Agentic Applications in Python

Tensorlake's Agentic Runtime lets you deploy agentic applications built with any framework on a distributed runtime that scales them as requests arrive. The platform has built-in durable execution, so applications automatically restart from where they crashed.

No Queues: We manage internal state of applications and orchestration - no need for queues, background jobs and brittle retry logic.

Zero Infra: Write Python, deploy to Tensorlake.
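To make the idea of durable execution concrete, here is a toy checkpointer: each completed step's result is saved, so a rerun after a crash reuses finished steps instead of repeating them. This only illustrates the concept; it is not how Tensorlake implements durability, which this README doesn't describe. The Checkpointer class, the checkpoint file, and the step names are all hypothetical, and step results must be JSON-serializable.

```python
import json
from pathlib import Path


class Checkpointer:
    """Persists each completed step's result so a rerun can skip finished work."""

    def __init__(self, path):
        self.path = Path(path)
        # Load results saved by an earlier (possibly crashed) run, if any
        self.done = json.loads(self.path.read_text()) if self.path.exists() else {}

    def step(self, name, fn, *args):
        if name in self.done:
            # Finished in an earlier run: reuse the saved result
            return self.done[name]
        result = fn(*args)
        self.done[name] = result
        # Checkpoint before moving on, so a later crash doesn't lose this step
        self.path.write_text(json.dumps(self.done))
        return result
```

If a run crashes partway through, constructing a new Checkpointer on the same file and calling the same steps again skips everything that already completed.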

Agentic Applications Quickstart

Write your application in Python. Decorate its entrypoint with @application(), and decorate a function with @function() if you want its state checkpointed or want it to run in a sandbox. Each Tensorlake function runs in its own isolated sandbox, so you can safely execute code and use different dependencies per function.

The example below creates a city guide application using OpenAI Agents with tool calls. It demonstrates:

Tool Calls: Using OpenAI Agents with WebSearchTool to search the web and function_tool to execute Python code, including Tensorlake Functions.
Sandboxed Execution: Each @function runs in its own isolated environment with specified dependencies.
Code Execution: Agents can run Python code via function_tool within the sandbox.

import os
from agents import Agent, Runner
from agents.tool import WebSearchTool, function_tool
from tensorlake.applications import application, function, run_local_application, Image

# Define the image with necessary dependencies
FUNCTION_CONTAINER_IMAGE = Image(base_image="python:3.11-slim", name="city_guide_image").run(
    "pip install openai openai-agents"
)

@function_tool
@function(
    description="Gets the weather for a city using an OpenAI Agent with web search",
    secrets=["OPENAI_API_KEY"],
    image=FUNCTION_CONTAINER_IMAGE,
)
def get_weather_tool(city: str) -> str:
    """Uses an OpenAI Agent with WebSearchTool to find current weather."""
    agent = Agent(
        name="Weather Reporter",
        instructions="Use web search to find current weather in Fahrenheit for the city.",
        tools=[WebSearchTool()],  # Agent can search the web
    )
    result = Runner.run_sync(agent, f"City: {city}")
    return result.final_output.strip()

@application(tags={"type": "example", "use_case": "city_guide"})
@function(
    description="Creates a guide with temperature conversion using function_tool",
    secrets=["OPENAI_API_KEY"],
    image=FUNCTION_CONTAINER_IMAGE,
)
def city_guide_app(city: str) -> str:
    """Uses an OpenAI Agent with function_tool to run Python code for conversion."""
    
    @function_tool
    def convert_to_celsius_tool(python_code: str) -> float:
        """Converts Fahrenheit to Celsius - runs as Python code via Agent."""
        return float(eval(python_code))
    
    agent = Agent(
        name="Guide Creator",
        instructions="Using the appropriate tools, get the weather for the purposes of the guide. If the city uses Celsius, call convert_to_celsius_tool to convert the temperature, passing in the code needed to convert the temperature to Celsius. Create a friendly guide that references the temperature of the city in Celsius if the city typically uses Celsius, otherwise reference the temperature in Fahrenheit. Only reference Celsius or Fahrenheit, not both.",
        tools=[get_weather_tool, convert_to_celsius_tool],  # Agent can execute this Python function
    )
    result = Runner.run_sync(agent, f"City: {city}")
    return result.final_output.strip()

Note: This is a simplified version. See the complete example at examples/readme_example/city_guide.py for the full implementation including activity suggestions and agent orchestration.
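In the example, convert_to_celsius_tool passes model-generated code to eval(), which will run any Python it is given. The sandbox limits the damage, but if the tool only ever needs arithmetic, a restricted evaluator is a safer swap. The sketch below is a swapped-in technique, not part of the Tensorlake example: eval_arithmetic is a hypothetical helper that walks the expression with Python's ast module and accepts only numbers, parentheses, + - * /, and unary signs.

```python
import ast
import operator

# Only these operators are allowed; everything else (calls, names, **) is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def eval_arithmetic(expression: str) -> float:
    """Evaluate a numeric expression like "(68 - 32) * 5 / 9" without running arbitrary code."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return float(_eval(ast.parse(expression, mode="eval")))
```

Inside the tool, return eval_arithmetic(python_code) would replace return float(eval(python_code)). The trade-off is that the agent must send a plain arithmetic expression; anything else, such as a call to round(), raises ValueError. Leaving out ** also avoids huge exponentiations.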

Running locally

The complete application code is available at examples/readme_example/city_guide.py. It includes the following block, which runs the application locally on your computer:

if __name__ == "__main__":
    CITY = "Paris"
    
    print(f"Generating city guide for: {CITY}\n")
    
    if not os.environ.get("OPENAI_API_KEY"):
        # SystemExit with a message prints it to stderr and exits with status 1
        raise SystemExit("Error: OPENAI_API_KEY environment variable is not set.")

    # Run locally using Tensorlake's local runner
    request = run_local_application("city_guide_app", CITY)
    response = request.output()
    
    print("\n" + "="*50)
    print("CITY GUIDE")
    print("="*50 + "\n")
    print(response)

Run the application locally:

python examples/readme_example/city_guide.py

The application orchestrates multiple OpenAI Agents with tool calls to generate a personalized city guide. Each Tensorlake function runs in its own sandbox, and the agents can execute code (like the temperature conversion) and search the web.

Here is some example output from the simplified version:

==================================================
CITY GUIDE
==================================================

Welcome to Paris! Today, the weather is cloudy with a current temperature of about 8°C. As you explore the city, you can expect evening and nighttime temperatures to stay between 5°C and 6°C.

Don’t forget your jacket as you stroll along the Seine or visit the Eiffel Tower! Paris can feel especially charming under a cloudy sky, so embrace the cozy atmosphere and maybe stop by a café for a warm drink.

If you need tips for what to do on a cloudy day in Paris, just let me know—enjoy your stay!

Testing your applications locally is convenient during development. There's no need to wait until the application is deployed to see how it works.

Deploying and running on Tensorlake Cloud

To run the application on Tensorlake Cloud, it first needs to be deployed.

Set the TENSORLAKE_API_KEY environment variable in your shell session:

export TENSORLAKE_API_KEY="Paste your API key here"

Store OPENAI_API_KEY as a Tensorlake secret so that your application can make calls to OpenAI:

tensorlake secrets set OPENAI_API_KEY "Paste your API key here"

Deploy the application to Tensorlake Cloud:

tensorlake deploy examples/readme_example/city_guide.py

Run the remote test script, found in examples/readme_example/test_remote_app.py:

from tensorlake.applications import run_remote_application

city = "San Francisco"

# Run the application remotely
request = run_remote_application("city_guide_app", city)
print(f"Request ID: {request.id}")

# Get the output
response = request.output()
print(response)

The application will execute on Tensorlake Cloud, with each function running in its own isolated sandbox.

Updating your application

Any time you update your application, just re-deploy it to Tensorlake Cloud:

tensorlake deploy examples/readme_example/city_guide.py

And run the remote test script again:

python examples/readme_example/test_remote_app.py

