Python library for working with table schemas in Universal Metadata Format (UMF). Provides type-safe models, validation, profiling integration, and schema generation tools.
- Type-Safe UMF Models: Pydantic-based models with runtime validation
- Schema Generation: Generate SQL DDL, PySpark schemas, and JSON schemas from UMF
- Great Expectations Integration: Baseline expectation generation and constraint extraction
- Profiling: Native Spark profiles feed GX validation; schema reflection handles UMF creation for bootstrap
- Validation: Table validation against UMF specifications with Great Expectations
- Type Mappings: Convert between UMF, PySpark, JSON, and Great Expectations types
- LLM Prompt Generation: Generate structured prompts for documentation, validation rules, relationships, and survivorship logic
- CLI: Typer-based command-line interface with Rich output for schema management and conversion
- Excel Conversion: Bidirectional Excel export/import for domain expert collaboration
- Split-Format UMF: Git-friendly directory-based storage with automatic format detection
- Sample Data Generation: Healthcare-specific, constraint-aware sample data from UMF specs
- Domain Type Inference: Automatic detection of domain types (SSN, NPI, phone, state codes, etc.)
- Change Management: UMF diffing, atomic change application, and git-based changelogs
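To make the schema-generation feature above concrete, here is a minimal, self-contained sketch of how a UMF-style dictionary could be turned into SQL DDL. It is an illustration only and not tablespec's implementation: `render_column_type` and `sketch_sql_ddl` are hypothetical helpers, the dictionary keys are assumed to mirror the `UMFColumn` fields used in this README, and nullability is left out.

```python
from typing import Any


def render_column_type(column: dict[str, Any]) -> str:
    """Render a SQL type string from a UMF-style column dict (hypothetical helper)."""
    data_type = column["data_type"].upper()
    if data_type == "VARCHAR" and column.get("length") is not None:
        return f"VARCHAR({column['length']})"
    if data_type == "DECIMAL" and column.get("precision") is not None:
        scale = column.get("scale") or 0
        return f"DECIMAL({column['precision']}, {scale})"
    # Types without size parameters are passed through unchanged.
    return data_type


def sketch_sql_ddl(umf_dict: dict[str, Any]) -> str:
    """Build a CREATE TABLE statement from a UMF-style dict (illustration only)."""
    column_lines = [
        f"  {col['name']} {render_column_type(col)}" for col in umf_dict["columns"]
    ]
    body = ",\n".join(column_lines)
    return f"CREATE TABLE {umf_dict['table_name']} (\n{body}\n);"


# Assumed input shape: a plain dict, not a real UMF model.
example = {
    "table_name": "Medical_Claims",
    "columns": [
        {"name": "claim_id", "data_type": "VARCHAR", "length": 50},
        {"name": "claim_amount", "data_type": "DECIMAL", "precision": 10, "scale": 2},
    ],
}
print(sketch_sql_ddl(example))
```

The library's own `generate_sql_ddl` may handle more types, dialects, and constraints than this sketch does.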
The demo walks through loading a UMF schema, generating SQL/PySpark/JSON schemas, type mappings, domain type inference, Great Expectations baseline generation, LLM prompt generation, UMF diffing, and PySpark validation with sample data generation.
Play the asciinema recording in your terminal:
# Install asciinema, then play the recording
asciinema play examples/tablespec-demo.cast
# Or run the demo live (requires tablespec[spark])
uv run python examples/demo.py
# Add to your uv project
uv add tablespec --index-url https://easel.github.io/tablespec/simple/
# With Spark support (for profiling and validation)
uv add "tablespec[spark]" --index-url https://easel.github.io/tablespec/simple/
# Install from GitHub Pages index
pip install tablespec --index-url https://easel.github.io/tablespec/simple/
# With Spark support
pip install "tablespec[spark]" --index-url https://easel.github.io/tablespec/simple/
Note: This package is distributed via GitHub Pages. The --index-url flag is required.
Optional extras:
- tablespec[spark]: Adds PySpark support for SparkToUmfMapper, TableValidator, SampleDataGenerator (with Spark FK seeding), BaselineService, and table merge. Install this extra only if you need Spark-dependent features.
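Because the Spark features are optional, calling code may want to check whether PySpark is importable before taking a Spark-dependent path. A minimal sketch using only the standard library; `has_module` is a hypothetical helper and not part of tablespec:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


# Hypothetical guard: only take the Spark code path when PySpark is installed.
if has_module("pyspark"):
    print("PySpark found: Spark-dependent features can be used.")
else:
    print("PySpark not found: install the spark extra to enable them.")
```

This keeps non-Spark workflows free of an import error when the extra has not been installed.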
from tablespec import load_umf_from_yaml, save_umf_to_yaml, UMF
# Load UMF from YAML (see examples/ for sample files)
umf = load_umf_from_yaml("examples/schema.yaml")
# Access metadata
print(f"Table: {umf.table_name}")
print(f"Columns: {len(umf.columns)}")
# Modify and save
umf.description = "Updated description"
save_umf_to_yaml(umf, "updated_schema.yaml")
from tablespec import UMF, UMFColumn, Nullable, save_umf_to_yaml
umf = UMF(
    version="1.0",
    table_name="Medical_Claims",
    description="Healthcare claims data",
    columns=[
        UMFColumn(
            name="claim_id",
            data_type="VARCHAR",
            length=50,
            description="Unique claim identifier",
            nullable=Nullable(MD=False, MP=False, ME=False),
        ),
        UMFColumn(
            name="claim_amount",
            data_type="DECIMAL",
            precision=10,
            scale=2,
            description="Claim amount in USD",
            nullable=Nullable(MD=True, MP=True, ME=True),
        ),
    ],
)
save_umf_to_yaml(umf, "medical_claims.yaml")
from tablespec import load_umf_from_yaml, generate_sql_ddl, generate_pyspark_schema, generate_json_schema
# Load UMF
umf = load_umf_from_yaml("examples/schema.yaml")
umf_dict = umf.model_dump()
# Generate SQL DDL
ddl = generate_sql_ddl(umf_dict)
print(ddl)
# Generate PySpark schema code
spark_schema = generate_pyspark_schema(umf_dict)
print(spark_schema)
# Generate JSON Schema
json_schema = generate_json_schema(umf_dict)
print(json_schema)
from tablespec import bootstrap_from_tables
# Assumes `spark` is an active SparkSession created elsewhere
artifacts = bootstrap_from_tables(
    spark,
    ["member"],
    "/tmp/tablespec-bootstrap",
    profile=True,
    dialect="databricks",
)
print(artifacts.manifest_path)
print(artifacts.table("member").suite_json)
bootstrap_from_tables reflects each Spark table into UMF, optionally profiles
the data to enrich the compiled validation suite, and persists the full
artifact tree in one call. The profiler enriches validation; it does not create
UMF.
The Databricks-facing compile interface accepts dialect="databricks" for the
Spark-family SQL that this bootstrap path emits. Internal emitters may normalize
the dialect to spark when the rendered SQL is identical.
This is the development bootstrap path. A production environment installs the
published tablespec wheel and consumes the committed artifact tree through the
manifest; it does not re-run the bootstrap orchestration from source-time Python.
Full documentation is available at easel.github.io/tablespec:
- User Guide -- UMF format, schema generation, GX integration, CLI, Excel, and more
- API Reference -- Complete API documentation
- Development -- Setup, testing, and project structure
Apache License 2.0 - see LICENSE file for details.
