Skip to content

Implement Provider Defined Types (PDT)#3433

Merged
Cole-Greer merged 15 commits into
masterfrom
PDT
Jun 12, 2026
Merged

Implement Provider Defined Types (PDT)#3433
Cole-Greer merged 15 commits into
masterfrom
PDT

Conversation

@Cole-Greer

Copy link
Copy Markdown
Contributor

Add CompositePDT (0xF0) support enabling graph providers to define custom types that serialize/deserialize seamlessly across all GLVs without driver-side configuration. Replaces the TP3 CustomTypeSerializer mechanism.

Core (gremlin-core):

  • Add @ProviderDefined annotation and immutable ProviderDefinedType POJO
  • Add ProviderDefinedTypeSerializer for GraphBinary with wire format: fully-qualified type string + fully-qualified fields map
  • Add PdtGraphSONSerializersV4 with g:CompositePdt type tag
  • Add ProviderDefinedTypeAdapter and ProviderDefinedTypeRegistry with ServiceLoader discovery, recursive hydration, and graceful degradation on adapter failure
  • Integrate auto-hydration into GraphBinaryReader and GraphSONMapper
  • Add GraphBinaryWriter auto-conversion for @ProviderDefined objects
  • Cache reflection metadata per class for performance
  • Support inherited fields via superclass walking
  • Remove legacy CUSTOM(0x00) type mechanism entirely

Gremlin-lang:

  • Add PDT("name", ["key":value]) literal to ANTLR grammar
  • Server-side parser constructs ProviderDefinedType from PDT literals
  • All GLV translators emit PDT literal syntax
  • Registry-based and annotation-based auto-dehydration in translators
  • All TranslateVisitors handle PDT for cross-language translation

GLV support (all languages):

  • Python: ProviderDefinedType, serializer, registry with @provider_defined decorator, entry_points auto-discovery, registry wired through Client/DriverRemoteConnection
  • JavaScript: ProviderDefinedType, CompositePDTSerializer, registry with explicit function pair registration, client options wiring
  • Go: ProviderDefinedType struct, serializer/deserializer, PDTRegistry with struct tags, RegisterFuncs, PDTProvider interface, client wiring
  • .NET: ProviderDefinedType, CompositePDTSerializer, registry with IProviderDefinedTypeAdapter, [ProviderDefined] attribute with IncludedFields/ExcludedFields, assembly scanning, IMessageSerializer .SetPdtRegistry() interface method, client/connection wiring

Server and testing:

  • PDT flows end-to-end through gremlin-server with TinkerGraph storing original objects and conversion at serialization boundary
  • Test-jar with Point, Address, Person test types for Docker server
  • Integration tests in all GLVs using gremlin-lang PDT literals
  • Traversal API tests covering raw PDT, registry hydration, and annotation-based auto-dehydration round-trips

Comment thread docs/src/dev/provider/pdt.asciidoc Outdated
Comment thread docs/src/dev/provider/pdt.asciidoc Outdated
Comment thread docs/src/dev/provider/pdt.asciidoc Outdated
Comment thread docs/src/dev/provider/pdt.asciidoc Outdated
Comment thread docs/src/dev/provider/pdt.asciidoc Outdated
Comment thread docs/src/dev/provider/pdt.asciidoc Outdated
}
}

@Test

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see we don't have a test outside of Groovy for using Point directly in Gremlin. In practice, can you not just do that? Is it some test environment limitations that prevents this? i've forgotten how this works maybe...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restructured the java tests, there are now cases which demonstrate using PDT's via Traversals in all 3 tiers of support (Raw PDT only, registered types, and annotated types).

@Cole-Greer Cole-Greer marked this pull request as ready for review May 29, 2026 17:36
@Cole-Greer

Copy link
Copy Markdown
Contributor Author

VOTE +1

return name;
}

public Map<String, Object> getProperties() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres naming inconsistency here. This method is getProperties but the annotation is "(included/excluded)fields". It should be called one or the other but not both. I prefer fields as it makes it clear that its not the same thing as properties on Elements which overloads the term. I know fields is probably too Java-specific, but I'd rather the disambiguation upfront.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated to use fields consistently throughout all GLVs.

Comment thread gremlin-server/pom.xml
Comment thread docs/src/upgrade/release-4.x.x.asciidoc Outdated

@GumpacG GumpacG left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the scope of this PR, but it may be nice to have a function for users to call to get a list of registered PDTs.

Can you also add an integration test for nested PDTs where the outer is not registered and the inner is registered and vice versa?

VOTE +1

if not name:
raise ValueError("name cannot be null or empty")
self._name = name
self._properties = dict(properties) if properties else {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that keys are Strings here? Other GLVs enforce String only keys.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I've added a runtime check inProviderDefinedType.__init__ to validate keys are strings.


For driver users consuming PDTs, see the <<gremlin-variants,Gremlin Variants>> reference documentation for
each language driver.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add contraints and error behaviour section of some sort? I think something for property map keys must be String and the type name must be non-empty.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've folded details about the String keys and non-empty field names into the basic usage section.

public Builder addRegistry(final IoRegistry registry) {
if (null == registry) throw new IllegalArgumentException("The registry cannot be null");

final List<Pair<Class, CustomTypeSerializer>> classSerializers = registry.find(GraphBinaryIo.class, CustomTypeSerializer.class);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also remove GraphBinaryIo.java since all references to it are removed now?

@GumpacG

GumpacG commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Could you also look into this issue that Kiro found?

.NET annotation hydration is unguarded and breaks the whole response (ProviderDefinedAttribute.HydrateIfRegistered + GraphBinaryReader.cs)

prop.SetValue(obj, Convert.ChangeType(value, prop.PropertyType));

This has no try/catch and no recursion into nested PDTs, yet it is called directly from GraphBinaryReader.ReadAsync:

return ProviderDefinedAttribute.HydrateIfRegistered(pdt);

Convert.ChangeType throws InvalidCastException/FormatException for any non-IConvertible or nested-PDT property value, which propagates up and fails deserialization of the entire result stream. Every other hydration path in the PR (Java registry, Python, JS, Go, .NET ProviderDefinedTypeRegistry.Hydrate) deliberately catches and returns the raw PDT. This one path violates that contract. For the nested Person→Address annotation case, Convert.ChangeType(ProviderDefinedType, typeof(Address)) will throw. Recommend wrapping in try/catch returning the raw pdt, and recursing for nested PDT-valued properties.

@kenhuuu

kenhuuu commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

VOTE +1

1 similar comment
@xiazcy

xiazcy commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

VOTE +1

Add CompositePDT (0xF0) support enabling graph providers to define
custom types that serialize/deserialize seamlessly across all GLVs
without driver-side configuration. Replaces the TP3 CustomTypeSerializer
mechanism.

Core (gremlin-core):
- Add @ProviderDefined annotation and immutable ProviderDefinedType POJO
- Add ProviderDefinedTypeSerializer for GraphBinary with wire format:
  fully-qualified type string + fully-qualified fields map
- Add PdtGraphSONSerializersV4 with g:CompositePdt type tag
- Add ProviderDefinedTypeAdapter<T> and ProviderDefinedTypeRegistry
  with ServiceLoader discovery, recursive hydration, and graceful
  degradation on adapter failure
- Integrate auto-hydration into GraphBinaryReader and GraphSONMapper
- Add GraphBinaryWriter auto-conversion for @ProviderDefined objects
- Cache reflection metadata per class for performance
- Support inherited fields via superclass walking
- Remove legacy CUSTOM(0x00) type mechanism entirely

Gremlin-lang:
- Add PDT("name", ["key":value]) literal to ANTLR grammar
- Server-side parser constructs ProviderDefinedType from PDT literals
- All GLV translators emit PDT literal syntax
- Registry-based and annotation-based auto-dehydration in translators
- All TranslateVisitors handle PDT for cross-language translation

GLV support (all languages):
- Python: ProviderDefinedType, serializer, registry with
  @provider_defined decorator, entry_points auto-discovery,
  registry wired through Client/DriverRemoteConnection
- JavaScript: ProviderDefinedType, CompositePDTSerializer, registry
  with explicit function pair registration, client options wiring
- Go: ProviderDefinedType struct, serializer/deserializer, PDTRegistry
  with struct tags, RegisterFuncs, client wiring
- .NET: ProviderDefinedType, CompositePDTSerializer, registry with
  IProviderDefinedTypeAdapter<T>, [ProviderDefined] attribute with
  IncludedFields/ExcludedFields, assembly scanning, IMessageSerializer
  .SetPdtRegistry() interface method, client/connection wiring

Server and testing:
- PDT flows end-to-end through gremlin-server with TinkerGraph storing
  original objects and conversion at serialization boundary
- Test-jar with Point, Address, Person test types for Docker server
- Integration tests in all GLVs using gremlin-lang PDT literals
- Traversal API tests covering raw PDT, registry hydration, and
  annotation-based auto-dehydration round-trips

Also: documentation (docs/src/dev/provider/pdt.asciidoc), CHANGELOG
entry, .dockerignore update for test-jar.
Annotated provider-defined types previously dehydrated automatically but
came back as raw ProviderDefinedType on the response path, requiring a
hand-written adapter to reconstruct the original object.

- Java: add ProviderDefinedTypeRegistry.register(Class<?>...) which
  synthesizes an adapter from @ProviderDefined metadata, reusing
  ProviderDefinedType's validated field/name resolution. Fail fast when a
  no-arg constructor is missing and make the constructor accessible during
  hydration.
- .NET: ProviderDefinedTypeRegistry.Build() now also scans loaded
  assemblies for [ProviderDefined] types and registers them for hydration.
- Redistribute PDT integration tests into GremlinDriverIntegrateTest and
  add unit coverage for register(Class<?>...).
- Document per-language round-trip behavior and the no-arg/settable-fields
  requirement in the provider docs.

Assisted-by: Kiro:claude-opus-4.8
- Python: enforce String-only keys on ProviderDefinedType property maps to
  match the other GLVs, with a unit test.
- Docs: document the non-empty type name and String property-key constraints
  inline in the provider PDT section (covering all GLVs); add a JPMS note
  explaining annotation-based hydration needs the provider module to open its
  package to gremlin-core; integrate the CustomTypeSerializer breaking-change
  migration guidance into the existing 4.x upgrade entry.
- All GLVs: add a maintenance comment and a lightweight test verifying that a
  registered adapter takes precedence over the @ProviderDefined annotation
  during dehydration (Java, Python, Go, .NET).
- Remove the now-orphaned GraphBinaryIo class (all references were removed
  earlier in this PR).
- Remove the unused maven-jar-plugin test-jar execution from gremlin-server.

Assisted-by: Kiro:claude-opus-4.8
- Auto-wire an SPI-discovered ProviderDefinedTypeRegistry by default on the
  Java driver path so registered adapters hydrate/dehydrate with zero config:
  the default GraphBinaryMessageSerializerV4 and DriverRemoteConnection now use
  ProviderDefinedTypeRegistry.create(). setPdtRegistry and explicit serializer
  registries remain as overrides.
- Rename the registry factory build() -> create() across Java, Python, and
  .NET (Go's NewPDTRegistry and JS's constructor are already idiomatic). The
  old name misleadingly implied the Builder pattern.
- Rename the PDT field map terminology properties -> fields across all GLVs
  (getFields/Fields, toFields/fromFields) to match the @ProviderDefined
  annotation and the GraphSON wire format, and disambiguate from Element
  properties. Wire formats and binary layouts are unchanged.
- Document that equals()/hashCode() intentionally exclude the transient
  hydrated field.
- Move the PDT GraphBinary serializer tests from gremlin-core to gremlin-util
  and delete the bespoke HeapBuffer, reusing the netty-backed buffer pattern.
- Update provider and reference docs for the renames.

Assisted-by: Kiro:claude-opus-4.8
…ne section

Fold the JPMS opens guidance into the existing Java hydration subsection as a
concise NOTE rather than a separate section, and align the provider factory
example with the build()->create() rename.

Assisted-by: Kiro:claude-opus-4.8
… types

ProviderDefinedAttribute.HydrateIfRegistered is called directly from
GraphBinaryReader.ReadAsync but lacked a try/catch and did not recurse into
nested ProviderDefinedType values. An unconvertible field (including a nested
PDT value such as Person.address) caused Convert.ChangeType to throw, which
propagated out of the reader and failed deserialization of the entire response
stream.

Wrap construction/population in try/catch returning the raw PDT on failure, and
recursively resolve nested PDT-valued fields before assignment, matching the
fallback contract already used by ProviderDefinedTypeRegistry.Hydrate. Adds
unit tests for nested hydration, conversion failure, unregistered nested types,
and reader-level no-throw behavior.

Assisted-by: Kiro:claude-opus-4.8
Previously every GLV's PDT hydration short-circuited when the outer
ProviderDefinedType had no registered adapter, so a registered/annotated inner
type nested inside an unregistered outer was never hydrated (it came back as a
raw ProviderDefinedType). Dehydration already handled this correctly via
per-value recursion in the translators.

Fix the hydration path in all GLVs (Java, Python, Go, .NET, JavaScript) to
always recurse into the outer's fields and hydrate nested registered types,
returning the outer raw (identity preserved when nothing nested changed) with
its inner registered types hydrated. In .NET this applies to both the registry
and the @ProviderDefined annotation hydration paths.

Adds nested-registration contract tests (registered inner inside unregistered
outer, both hydration and dehydration directions) across all GLVs.

Also completes the properties->fields rename in the JavaScript GLV, which was
missed when the other GLVs were updated, keeping the PDT field-map terminology
consistent across all languages.

Assisted-by: Kiro:claude-opus-4.8
…h properties->fields rename

Two issues surfaced only in a full 'mvn install' that builds the test-server
Docker image and runs the GLV integration tests:

- Restore the maven-jar-plugin test-jar execution in gremlin-server. It had been
  removed as 'vestigial', but docker/gremlin-test-server/Dockerfile copies
  gremlin-server-*-tests.jar to put the PDT test fixtures (Point/Person/Address)
  on the test server's classpath. Without it the image build failed, breaking
  every GLV's integration tests. Added a comment documenting the dependency.

- Complete the PDT properties->fields rename in the JavaScript and Python
  integration tests (client/traversal), which still read pdt.properties and were
  missed when the rename was applied to the unit tests. Graph Element .properties
  references are unchanged.

Verified with a full reactor build from gremlin-javascript onward: JS, Python,
.NET, and Go integration suites all pass.

Assisted-by: Kiro:claude-opus-4.8
Two corrections after rebasing onto master:

- Go: master refactored GraphTraversalSource.remoteConnection from the concrete
  *DriverRemoteConnection to a remoteConnection interface. Access the PDT
  registry via a type assertion to *DriverRemoteConnection instead of the
  no-longer-valid .settings field access on the interface.
- .NET: master removed IRemoteConnection.IsSessionBound; the conflict
  resolution had wrongly reintroduced it, breaking the implementing classes.
  Drop IsSessionBound, keeping only the PdtRegistry member (defaulted to null).

Assisted-by: Kiro:claude-opus-4.8
…T access

The rebase took master's 'protected readonly options', but PDT code in
anonymous-traversal.ts reads connection.options?.pdtRegistry from outside the
class, which requires public access. Restore 'public readonly options' while
keeping master's ConnectionOptions type (which already includes pdtRegistry).
… tests

The PDT public member is 'fields' in all five GLVs (getFields/Fields/.fields),
but several 'properties' references were missed in the rename:

- Fix a real codegen bug: GoTranslateVisitor emitted ProviderDefinedType
  with a 'Properties:' field, but the Go struct field is 'Fields' -- the
  generated Go did not compile. Update the visitor and its translator test.
- Rename the literal-visitor validation message 'PDT properties map must
  have String keys' to 'PDT fields map ...' (and the matching assertion).
- Align AsciiDoc descriptions (gremlin-variants, release-4.x.x, provider
  index) that described the PDT data as a 'properties' map/dict/dictionary
  with the 'fields' member they already show in adjacent code examples.
- Align .NET IncludedFields/ExcludedFields XML-doc wording to 'field names'.
- Rename PDT test methods/descriptions/comments referencing the member in
  Java, .NET, JS, and Python.

Element.properties / .properties() traversal steps, CLR PropertyInfo
reflection, and local variable names were intentionally left unchanged.

Assisted-by: Kiro:claude-opus-4.8
Follow-up to the properties->fields rename: the public PDT member is 'fields'
in all GLVs, but local variables, lambda/function parameters, doc example
parameters, and a few test descriptions still used 'props'/'properties'.
Rename them for consistency across:

- Production: ProviderDefinedType.from() local map (Java); the reflection-based
  RegisterType FromFields parameter (Go).
- Docs: Go hydrate/dehydrate func parameter (props -> fields) and the JS
  deserialize lambda parameter in gremlin-variants and provider/index examples.
- Tests: PDT adapter fromFields parameters and PDT field-map locals across
  Java (gremlin-core, gremlin-util, gremlin-server IT), .NET, Python, and JS,
  plus PDT-related test descriptions.

Element.properties / .properties() traversal steps and unrelated local names
were left unchanged. All affected unit tests pass.

Assisted-by: Kiro:claude-opus-4.8
The dehydration logic in ArgAsString had inverted naming: the CLR
PropertyInfo[] was bound to a variable named 'fields' while the PDT field
map (passed to ProviderDefinedType) was named 'props'. Swap them so the PDT
field map is 'fields' (matching the public member) and the CLR reflection
objects are 'properties'/'property'. GetPdtInfo's internal PropertyInfo
locals are left as-is since they accurately describe CLR properties.

Assisted-by: Kiro:claude-opus-4.8
The grep audit surfaced PDT-context 'props' identifiers missed by earlier
passes (which focused on Java/.NET/docs/JS-unit). Rename the PDT field maps
and adapter parameters to 'fields' in:

- Java core GremlinLang dehydration adapter path.
- Go: gremlinlang.go (PDT literal translate + adapter dehydrate),
  graphBinaryDeserializer.go (PDT reader), pdtRegistry.go (fromProps/toProps
  -> fromFields/toFields), and the Go PDT test lambdas.
- Python: traversal.py adapter dehydrate, graphbinaryV4.py _hydrate_decorated,
  and the PDT unit-test/conftest lambdas.
- JS integration test deserialize lambda.

Element/Edge/Vertex property reads in graphBinaryDeserializer.go and
graphbinaryV4.py, and CLR PropertyInfo reflection, were left unchanged.

Assisted-by: Kiro:claude-opus-4.8
@Cole-Greer Cole-Greer merged commit 6c34b26 into master Jun 12, 2026
68 of 69 checks passed
@Cole-Greer Cole-Greer deleted the PDT branch June 12, 2026 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants