chore: update HF card citation copy and add library version to builder config #303

Merged
johnnygreco merged 3 commits into main from johnny/chore/update-copy-in-hf-card on Feb 6, 2026

@johnnygreco
Contributor

📋 Summary

Updates the Hugging Face dataset card template with citation guidance and adds a library_version computed field to BuilderConfig so generated configs include the library version for traceability.

🔄 Changes

✨ Added

  • library_version computed field on BuilderConfig using importlib.metadata (06f8afb, 92555a6)
  • Test for library_version field serialization and correctness

🔧 Changed

  • Added citation instruction copy ("If you use Data Designer in your work, please cite the project as follows:") above the bibtex block in the HF dataset card template (06f8afb)

🤖 Generated with AI

Made with Cursor

@johnnygreco johnnygreco requested a review from a team as a code owner February 6, 2026 03:20
@greptile-apps
Contributor

greptile-apps Bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

Adds library_version computed field to BuilderConfig for config traceability and updates HF dataset card citation copy. The implementation properly centralizes version lookup logic in a new utility function (get_library_version()) with fallback handling for environments without package metadata, addressing concerns from previous review threads.

  • Created version.py utility with try/except to return "unknown" when package metadata isn't available
  • Added library_version as a Pydantic computed field on BuilderConfig, automatically included in serialization
  • Refactored column_wise_builder.py to use the centralized utility instead of direct importlib.metadata calls
  • Added comprehensive tests for version utility and BuilderConfig field
  • Updated HF dataset card template with clear citation instruction text
  • Properly integrated into lazy loading system via __init__.py
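The tests mentioned in the summary are not shown in this thread. A minimal sketch of how the fallback could be exercised, assuming the helper has the shape described above and using `unittest.mock` to stand in for the installed metadata (the test function names are illustrative, not taken from the PR):

```python
import importlib.metadata
from unittest import mock


def get_library_version() -> str:
    """Return the installed version, or "unknown" if distribution metadata is missing."""
    try:
        return importlib.metadata.version("data-designer-config")
    except importlib.metadata.PackageNotFoundError:
        return "unknown"


def test_returns_installed_version() -> None:
    # Patch the stdlib lookup so the test does not depend on what is installed.
    with mock.patch("importlib.metadata.version", return_value="1.2.3"):
        assert get_library_version() == "1.2.3"


def test_falls_back_when_metadata_missing() -> None:
    # Simulate a source checkout with no dist-info for the distribution.
    with mock.patch(
        "importlib.metadata.version",
        side_effect=importlib.metadata.PackageNotFoundError("data-designer-config"),
    ):
        assert get_library_version() == "unknown"
```

Patching `importlib.metadata.version` works here because the helper looks the function up on the module at call time rather than importing it by name.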

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-implemented with proper error handling, comprehensive test coverage, and follow established patterns. All concerns from previous review threads have been addressed by centralizing the version lookup logic with appropriate fallback handling. The implementation is non-breaking and adds valuable traceability to exported configs.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-config/src/data_designer/config/version.py | New utility function centralizes version lookup with fallback handling |
| packages/data-designer-config/src/data_designer/config/config_builder.py | Added library_version computed field using centralized utility |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py | Refactored to use centralized get_library_version() utility for telemetry |

Sequence Diagram

sequenceDiagram
    participant User
    participant ConfigBuilder as DataDesignerConfigBuilder
    participant BuilderConfig
    participant VersionUtil as get_library_version()
    participant ImportLib as importlib.metadata
    participant ColumnWiseBuilder as ColumnWiseDatasetBuilder
    participant Telemetry as TelemetryHandler

    Note over User,Telemetry: Configuration Export Flow
    User->>ConfigBuilder: get_builder_config()
    ConfigBuilder->>BuilderConfig: instantiate BuilderConfig
    BuilderConfig->>VersionUtil: @computed_field library_version()
    VersionUtil->>ImportLib: version("data-designer-config")
    alt Package metadata available
        ImportLib-->>VersionUtil: "1.2.3"
    else PackageNotFoundError
        ImportLib-->>VersionUtil: PackageNotFoundError
        VersionUtil->>VersionUtil: return "unknown"
    end
    VersionUtil-->>BuilderConfig: version string
    BuilderConfig-->>ConfigBuilder: BuilderConfig with library_version
    ConfigBuilder-->>User: BuilderConfig (includes library_version in serialization)

    Note over User,Telemetry: Telemetry Flow
    User->>ColumnWiseBuilder: initialize with config
    ColumnWiseBuilder->>VersionUtil: _CLIENT_VERSION = get_library_version()
    VersionUtil->>ImportLib: version("data-designer-config")
    alt Package metadata available
        ImportLib-->>VersionUtil: "1.2.3"
    else PackageNotFoundError
        ImportLib-->>VersionUtil: PackageNotFoundError
        VersionUtil->>VersionUtil: return "unknown"
    end
    VersionUtil-->>ColumnWiseBuilder: version string
    ColumnWiseBuilder->>Telemetry: send events with _CLIENT_VERSION
    Telemetry-->>ColumnWiseBuilder: telemetry recorded

Contributor

@greptile-apps greptile-apps Bot left a comment

3 files reviewed, 2 comments

Comment on lines +69 to +71
@computed_field
def library_version(self) -> str:
    return importlib.metadata.version("data-designer-config")
Contributor

Uncaught PackageNotFoundError

library_version calls importlib.metadata.version("data-designer-config") directly. If this code is executed from a source checkout or any environment where the data-designer-config distribution metadata isn’t present (no installed wheel / missing dist-info / different distribution name), accessing builder_config.library_version (and therefore BuilderConfig.model_dump() / to_json() / to_yaml()) will raise PackageNotFoundError and break config export/serialization.
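The failure mode described here can be reproduced with the standard library alone. A small sketch, assuming a made-up distribution name that is not installed (the name and the `lookup_fails` helper are hypothetical, chosen only for illustration):

```python
import importlib.metadata

# Hypothetical distribution name, chosen only because it should not be installed.
MISSING_DIST = "definitely-not-an-installed-distribution-xyz"


def lookup_fails(dist_name: str) -> bool:
    """Return True if the unguarded version lookup raises PackageNotFoundError."""
    try:
        importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return True
    return False


print(lookup_fails(MISSING_DIST))
```

Any code path that calls the unguarded lookup, including serialization of a computed field, would propagate the same exception.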

Contributor

This is a good call out. Should we wrap this in a try-catch?

Contributor

Yes, wrapping in a try/except is the right approach. Return a fallback value (e.g., "unknown" or "dev") when the package metadata isn't available:

Suggested change
- @computed_field
- def library_version(self) -> str:
-     return importlib.metadata.version("data-designer-config")
+ @computed_field
+ def library_version(self) -> str:
+     try:
+         return importlib.metadata.version("data-designer-config")
+     except importlib.metadata.PackageNotFoundError:
+         return "unknown"

This ensures serialization won't break in development/source installations while still providing version info in production environments.
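To see the effect on serialization, here is a minimal sketch assuming Pydantic v2. `ExampleConfig` is a hypothetical stand-in for `BuilderConfig`, whose real fields are not shown in this thread; only the `library_version` field mirrors the suggestion. The explicit `@property` is added for type checkers, since Pydantic wraps a bare function in a property automatically.

```python
import importlib.metadata

from pydantic import BaseModel, computed_field


class ExampleConfig(BaseModel):
    """Hypothetical stand-in for BuilderConfig; only the version field mirrors the PR."""

    name: str

    @computed_field
    @property
    def library_version(self) -> str:
        # Guarded lookup: fall back instead of raising during model_dump().
        try:
            return importlib.metadata.version("data-designer-config")
        except importlib.metadata.PackageNotFoundError:
            return "unknown"


# The computed field is included in the dumped dict alongside regular fields.
dumped = ExampleConfig(name="demo").model_dump()
print(dumped)
```

With the guard in place, `model_dump()` succeeds whether or not the distribution is installed; the value is either the installed version string or the fallback.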

Contributor

I also wonder if it's worth centralizing this in a utility. We have telemetry depending on the same functionality.

Contributor

Good idea. Centralizing avoids duplication and ensures consistent version resolution across telemetry, config exports, and any future use cases.

Suggested approach:

  1. Create a utility in a shared module (e.g., packages/data-designer-config/src/data_designer/config/version.py):
import importlib.metadata

def get_library_version() -> str:
    """Get the installed library version, or 'dev' if not available."""
    try:
        return importlib.metadata.version("data-designer-config")
    except importlib.metadata.PackageNotFoundError:
        return "dev"
  2. Use it in BuilderConfig:
@computed_field
def library_version(self) -> str:
    return get_library_version()
  3. Update telemetry to use the same utility instead of its own version lookup.

This keeps the fallback logic in one place and makes it easy to change the fallback value or package name if needed.

Contributor Author

Great points – updated in 386df11

Comment thread packages/data-designer-config/tests/config/test_config_builder.py
@johnnygreco johnnygreco requested a review from nabinchha February 6, 2026 16:05
johnnygreco and others added 3 commits February 6, 2026 11:45
Extract version retrieval into a dedicated version.py module with
graceful error handling (returns "unknown" if package not found).
Replace direct importlib.metadata.version() calls in config_builder
and column_wise_builder with the new helper.

Co-authored-by: Cursor <cursoragent@cursor.com>
@johnnygreco johnnygreco force-pushed the johnny/chore/update-copy-in-hf-card branch from 72b0d57 to 386df11 February 6, 2026 16:45
Contributor

@nabinchha nabinchha left a comment

🚢

@johnnygreco johnnygreco merged commit f69ec87 into main Feb 6, 2026
47 checks passed

2 participants