Skip to content

Add type annotations for primary key types by model [FC-0117]#534

Open
bradenmacdonald wants to merge 24 commits intomainfrom
braden/typed-pks
Open

Add type annotations for primary key types by model [FC-0117]#534
bradenmacdonald wants to merge 24 commits intomainfrom
braden/typed-pks

Conversation

@bradenmacdonald
Copy link
Copy Markdown
Contributor

@bradenmacdonald bradenmacdonald commented Apr 7, 2026

This PR has some experimental implementations of #436 as well as moving closer toward the consistent terms preferred in OEP-68.

Second Approach: fully-typed .id, avoid .pk

(Revised from my initial attempt)

  • PublishableEntity, LearningPackage, Container, and Component all get a fully-typed .id field.
    • PublishableEntity.id is of type PublishableEntity.PK which is a sub-type of int.
    • Container.id is of type Container.PK which is a sub-type of PublishableEntity.PK.
    • Component.id is of type Component.PK which is a sub-type of PublishableEntity.PK.
    • LearningPackage.id is of type LearningPackage.PK which is a sub-type of int.
    • Thanks to django-stubs, these types generally get propagated and used correctly pretty much everywhere.
    • Example:
      Screenshot 2026-04-07 at 2 18 09 PM
  • Likewise, CatalogCourse and CourseRun get typed.
  • Unfortunately, it is impossible to change the type of .pk on models that inherit PublishableEntityMixin (or any other models that use a OneToOneField as their primary_key, at least without modifying django-stubs or implementing a mypy plugin. It will always have type Any, circumventing all type definitions. However, although we cannot override its type, we can mark it as deprecated. As a result, and for consistency, accessing .pk on any of these will raise a deprecation warning in pytest/console and show a warning in your IDE:
    Screenshot 2026-04-07 at 2 11 26 PM
  • I did not (yet?) add a deprecation warning for CatalogCourse.pk and CourseRun.pk, because (a) they are typed correctly, and (b) mypy crashes when I do 😬.

Prior Approach: fully-typed .pk, avoid .id

Expand for pros/cons/details of this earlier approach In this earlier approach:
  • PublishableEntity's primary key is the .pk field, and it has type PublishableEntity.PK which is a sub-type of int.

    • Example: Screenshot 2026-04-07 at 12 59 30 PM
    • In the approach I took, accessing some_entity.id will show a deprecation warning, telling you to use pk instead. And since this is an actual field called pk, any DB lookups need to be changed from e.g. entity__id=... to entity__pk=...
      pk
    • A slightly less disruptive approach would be to keep the ID field called id and make an alias called entity_pk, then use deprecation warnings to encourage the use of entity_pk over id and pk; this would be more consistent with what I did for containers and components, but it's slightly annoying to type entity_pk instead of pk everywhere.
    • Keeping the ID field called id and making pk a fully-typed alias is unfortunately not possible, as django-stubs forcibly overrides the type of pk unless you have a concrete field named pk as I did here.
  • Container and Component get Container.PK and Component.PK types which are sub-types of PublishableEntity.PK

  • Container and Component cannot have correctly-typed .pk accessors due to django-stubs limitations, so I created a .container_pk and .component_pk accessor for them:
    container_pk

    • Details: because Container and Component have a OneToOneField(PublishsableEntity, primary_key=True), django-stubs will always override their pk attribute to have type Any, and it's impossible to change this without modifying django-stubs or writing our own mypy plugin. But, we can use the @deprecated decorator to annotate the attribute with a deprecation warning which does show up in the IDE and in the pytest logs wherever .pk is used.
    • I think this is clear and explicit, but would something shorter be better (.pk_ ?)

General Notes

One downside of using deprecation warnings as I have here is that Django itself uses .pk, so in the case of Container and Component (but not PublishableEntity) where we've marked .pk as deprecated, even if our code doesn't use .pk itself, you'll see warnings like this when running pytest: django/db/models/fields/related.py:812: DeprecationWarning: Use component_pk instead. It is possible to suppress these deprecation warnings, and I have done so (see tox.ini) but it's an extra step required to use this library well and I'm not a big fan of it.

@openedx-webhooks openedx-webhooks added open-source-contribution PR author is not from Axim or 2U core contributor PR author is a Core Contributor (who may or may not have write access to this repo). labels Apr 7, 2026
@openedx-webhooks
Copy link
Copy Markdown

Thanks for the pull request, @bradenmacdonald!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@bradenmacdonald
Copy link
Copy Markdown
Contributor Author

@ormsbee @kdmccormick Can I get your thoughts on this? It also has implications for OEP-68.

@kdmccormick kdmccormick self-requested a review April 7, 2026 20:12
Comment on lines +146 to +150
# Note: Django-Stubs forces mypy to identify the `.pk` attribute of this model as having 'Any' type (due to our
# use of a OneToOneField primary key), and this is impossible for us to override, so we prefer to use
# `.id` which we can control fully.
# Since Django uses '.pk' internally, we have to make sure it still works, however. So the best we can do is
# override this with a deprecated marker, so it shows a warning in developer's IDEs like VS Code.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch. Very timely to know this given the OEP in flight on identifiers, where we recommend using .pk.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as you can read above in the edited description, I actually changed my approach here, as it turns out it's much easier to override the type of .id than .pk in general. So I'm wondering if we should just recommend .id / _id more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, good find. I've opened a thread here: openedx/openedx-proposals#773 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we standardize on .id, then I presume we can drop this deprecation warning? I think most developers naturally use .id anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that .pk was being used in quite a few places in this codebase, and the deprecation warnings made it easy for me to track them down and replace them with .id.

@bradenmacdonald
Copy link
Copy Markdown
Contributor Author

If we want to also introduce typed keys for Unit/Section/Subsection subclasses, that's easy to do: #535 . Would welcome thoughts on this as I'm a bit on the fence.

entities for different learners or at different times.
"""

PK: TypeAlias = ContainerPK
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we standardize on id, I think it'd make sense to call the types Container.ID, ContainerID, etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated to Container.ID/ContainerID etc.

@bradenmacdonald
Copy link
Copy Markdown
Contributor Author

bradenmacdonald commented Apr 8, 2026

OK, good news! I found a nicer way to do this. It still requires a custom field type per model, but we can put the definitions entirely within the model class itself, and there is now one global id_fields module in openedx_django_lib rather than needing id_fields.py and id_fields.pyi alongside each app[let]'s models.

e.g. within the class LearningPackage(models.Model): definition:

LearningPackageID = NewType("LearningPackageID", int)
type ID = LearningPackageID
class IDField(TypedBigAutoField[ID]):
pass
id = IDField(primary_key=True)

Note:

This could be made even more concise, and copy-pastable without changes:

ID = NewType("ID", int) 
class IDField(TypedBigAutoField[ID]): ...
id = IDField(primary_key=True) 

But: this creates ambiguous type names (what is "ID", without context?) in the mypy output, and the use of an ellipsis instead of pass implies that something is incomplete. So I'm recommending the more verbose version as our new standard boilerplate.

Edit: actually this works great, so I think I prefer the second version now. If you have several types called ID, mypy distinguished them in the output like so:

error: Argument 1 to "update_learning_package" has incompatible type
"openedx_content.applets.publishing.models.learning_package.LearningPackage.ID"; expected
"openedx_content.applets.publishing.models.publishable_entity.PublishableEntity.ID"  [arg-type]

@bradenmacdonald bradenmacdonald added the FC Relates to an Axim Funded Contribution project label Apr 9, 2026
@bradenmacdonald bradenmacdonald changed the title Add type annotations for primary key types by model Add type annotations for primary key types by model [FC-0117] Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core contributor PR author is a Core Contributor (who may or may not have write access to this repo). FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U

Projects

Status: In Eng Review

Development

Successfully merging this pull request may close these issues.

5 participants