gh-150228: Improve the PEP 829 batch processing APIs #150542

Open
warsaw wants to merge 10 commits into python:main from warsaw:gh150228

warsaw commented May 28, 2026

As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk, this exposes the batch processing APIs for `addsitedir()` and friends. We remove the `defer_processing_start_files` flag, which required implicit module-global state, and promote `StartupState` to the public, documented API. Callers can now control when accumulated `.start` and `.pth` file state is processed, if they want to.

This also fixes the interleaving regression identified by @ncoghlan in the same issue. Path extensions from a `.pth` file are now added to `sys.path` after the sitedir in which that `.pth` file is found, restoring the legacy behavior.

Along the way, I've made a lot of improvements to function docstrings, site.rst documentation, and comments in the code explaining what's going on.
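
As a rough illustration of the batching and interleaving described above, here is a minimal toy model. `ToyStartupState`, `add_sitedir()`, and the example paths are hypothetical stand-ins invented for this sketch; they are not the real `site.StartupState` API, whose exact signatures are not shown in this conversation.

```python
class ToyStartupState:
    """Hypothetical accumulator; NOT the real site.StartupState."""

    def __init__(self):
        # Path entries, in the order they should end up on the path.
        self._pending = []

    def add_sitedir(self, sitedir, pth_extensions=()):
        # Record the site directory first, then the entries contributed by
        # its .pth files, so the extensions follow their own sitedir.
        self._pending.append(sitedir)
        self._pending.extend(pth_extensions)

    def process(self, path):
        # Apply everything in one batch, skipping entries already present.
        for entry in self._pending:
            if entry not in path:
                path.append(entry)


path = ["/usr/lib/python"]
state = ToyStartupState()
state.add_sitedir("/site1", ["/ext1"])
state.add_sitedir("/site2", ["/ext2"])
# Nothing has touched `path` yet; the caller decides when to apply the batch.
state.process(path)
```

In this sketch the caller chooses when `process()` runs, and each extension lands after the site directory that contributed it rather than after all site directories.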

@warsaw warsaw self-assigned this May 28, 2026
@warsaw warsaw requested a review from FFY00 as a code owner May 28, 2026 01:35
@warsaw warsaw added the 3.15 pre-release feature fixes, bugs and security fixes label May 28, 2026
@warsaw warsaw requested a review from AA-Turner as a code owner May 28, 2026 01:35
@warsaw warsaw added 3.16 new features, bugs and security fixes needs backport to 3.15 pre-release feature fixes, bugs and security fixes labels May 28, 2026
@warsaw warsaw requested review from hugovk and ncoghlan May 28, 2026 01:35
read-the-docs-community Bot commented May 28, 2026

Comment thread Misc/NEWS.d/next/Library/2026-05-27-11-18-36.gh-issue-150228.pNPiO-.rst Outdated
Comment thread Misc/NEWS.d/next/Library/2026-05-27-11-18-36.gh-issue-150228.pNPiO-.rst Outdated
warsaw and others added 2 commits May 27, 2026 22:51
…NPiO-.rst

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
…NPiO-.rst

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@warsaw warsaw requested a review from hugovk May 28, 2026 05:52
warsaw added 4 commits May 29, 2026 12:22
* Add a note that if known_paths is provided to StartupState.__init__(), it
  will get mutated in place.
* Improve some conditional flows.
* Improve some comments.
* Improve the what's new entry.
Comment thread Lib/test/test_site.py
Comment thread Doc/library/site.rst
Comment on lines +371 to +373
Apply the accumulated state by first adding the path extensions to
:data:`sys.path`, then executing the :file:`.start` file entry points
and :file:`.pth` file ``import`` lines (:ref:`deprecated
Member

I think the ordering here does not match the code in site.py's process():

    self._extend_syspath()
    self._exec_imports()
    self._execute_start_entrypoints()

Make sure this, the docstring, and the code all agree. (claude /review flagged this)

Member Author

I worded it this way because a) I want to consistently emphasize the .start files over the .pth file import lines, and b) because it read better with the parenthesized deprecation note. I've tried to be consistent about a) but maybe I can find a better way to phrase it.
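
For reference, the ordering the reviewer quotes from `process()` can be modeled with a small toy function. This is only a sketch: `process_in_quoted_order()` and its parameters are hypothetical names, and the callables stand in for whatever the real private methods do.

```python
def process_in_quoted_order(path, extensions, pth_imports, entry_points):
    """Toy model of the three steps, in the order quoted from process()."""
    # Step 1: extend the path first, so the later steps can see the new entries.
    for entry in extensions:
        if entry not in path:
            path.append(entry)
    # Step 2: run the (deprecated) .pth file import lines.
    for run_import in pth_imports:
        run_import()
    # Step 3: run the .start file entry points last.
    for entry_point in entry_points:
        entry_point()


path = []
log = []
process_in_quoted_order(
    path,
    extensions=["/ext"],
    # Each callable records its name and a snapshot of the path it observed.
    pth_imports=[lambda: log.append(("pth import", tuple(path)))],
    entry_points=[lambda: log.append(("start entry point", tuple(path)))],
)
```

In this model both kinds of code run after the path is extended, and the `.pth` import lines run before the `.start` entry points, which is the order the documentation wording would need to reflect.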

Comment thread Doc/library/site.rst


.. function:: addsitedir(sitedir, known_paths=None, *, defer_processing_start_files=False)
.. class:: StartupState(known_paths=None)
Member

Should `StartupState.read_pth_file()` and `StartupState.read_start_file()` be documented? They're public by API name. If not, why not give them `_private` names?

Comment thread Lib/site.py
Comment on lines 244 to 246
The internal data is intentionally private; the public methods
(read_pth_file, read_start_file, process) are the only supported write
APIs.
Member

... these are documented as public here, but not as methods in the site.rst docs?

Comment thread Doc/library/site.rst
Instances of this class are used as an accumulator for interpreter startup
configuration data, such as ``.pth`` and ``.start`` files, from one or more
site directories. These are used to batch the processing of these startup
files. The optional *known_paths* argument is a set of case-normalized
Member

How does one get "a set of case-normalized paths" if not using the default? As a public API that feels like a footgun. Accepting an iterable or sequence of paths and doing that normalization for the user would be easier to use.
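
One way the suggested convenience could look is sketched below. `normalize_known_paths()` is a hypothetical helper written for this illustration, not part of the PR; it only uses `os.path.abspath()` and `os.path.normcase()` from the standard library, and it assumes that "case-normalized" means the result of `os.path.normcase()` on an absolute path.

```python
import os


def normalize_known_paths(paths):
    """Hypothetical helper: turn any iterable of paths into a normalized set.

    Each path is made absolute and then passed through os.path.normcase(),
    so callers do not have to do the normalization themselves.
    """
    return {os.path.normcase(os.path.abspath(p)) for p in paths}


# Two spellings of the same relative directory collapse to a single entry.
known = normalize_known_paths(["pkgs", "./pkgs"])
```

Accepting an iterable and normalizing internally would let callers pass `sys.path`-style lists directly instead of building the set by hand.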

Comment thread Doc/library/site.rst
(the default), this set is built from the current :data:`sys.path`.
:func:`main` implicitly uses an instance of this class.

.. method:: process()
Member

API question: calling this twice will execute the imports and entry point code twice. At a minimum, document that this should not be done and is undefined behavior.

Ideally protect against it. Can we have it consume the internal state (draining the internal path entries, import execs, and entrypoints) so that a repeat call is a no-op?
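
The draining idea can be sketched as follows. `DrainingState` and its attributes are hypothetical names for this illustration only; the real class's internal layout is not shown in this conversation.

```python
class DrainingState:
    """Hypothetical accumulator whose process() consumes its own state."""

    def __init__(self):
        self._path_entries = []
        self._entry_points = []

    def add(self, path_entry, entry_point):
        self._path_entries.append(path_entry)
        self._entry_points.append(entry_point)

    def process(self, path):
        # Swap the pending work out before running it, so a repeat call
        # (or a re-entrant one) finds nothing left to do.
        entries, self._path_entries = self._path_entries, []
        entry_points, self._entry_points = self._entry_points, []
        for entry in entries:
            if entry not in path:
                path.append(entry)
        for entry_point in entry_points:
            entry_point()


calls = []
path = []
state = DrainingState()
state.add("/ext", lambda: calls.append("ran"))
state.process(path)
state.process(path)  # Second call is a no-op: the state was drained.
```

With this shape, new data can still be accumulated after a `process()` call and applied by a later one, while previously processed entries are never executed again.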

Comment thread Lib/site.py
# (callers can pass an empty set), and multiple StartupState
# instances against the same sys.path don't share state, so always
# do a final anti-duplication check.
if dir_ in sys.path:
Member

Could the lack of case normalization, combined with case-insensitive filesystems, trip up this backstop check?
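
The concern can be demonstrated with a small sketch. `already_on_path()` is a hypothetical helper, and `ntpath.normcase()` is used explicitly so the example behaves the same on every platform; the real code would presumably use `os.path.normcase()`, which is an assumption here.

```python
import ntpath


def already_on_path(dir_, path, normcase):
    """Hypothetical backstop: compare entries after case normalization."""
    target = normcase(dir_)
    return any(normcase(entry) == target for entry in path)


path = ["C:\\Python\\Lib\\site-packages"]
candidate = "c:\\python\\lib\\SITE-PACKAGES"

# A plain membership test treats the two spellings as different entries.
plain_check = candidate in path
# Comparing normalized forms recognizes them as the same directory.
normalized_check = already_on_path(candidate, path, ntpath.normcase)
```

Note that `posixpath.normcase()` returns its argument unchanged, so normalization alone would not cover case-insensitive filesystems on POSIX platforms.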

Comment thread Lib/site.py
    if startup_state is not None:
        # Explicit batch mode: accumulate startup data in the caller's state.
        # The caller is responsible for calling startup_state.process().
        known_paths = startup_state._known_paths
Member

Technically this is an unused assignment, given `flush_now = False` and what gets returned below. Maybe `del known_paths` instead, so it becomes an obvious error if that logic changes? Or consider whether `flush_now` and `known_paths` need to co-exist.

Comment thread Doc/library/site.rst
and :file:`.pth` file ``import`` lines (:ref:`deprecated
<site-pth-files>`).

.. versionadded:: 3.15
Member

This `versionadded` directive should be moved up to apply to the whole class, not just the method.

Comment thread Lib/test/test_site.py
Comment on lines +1533 to +1536
indexes = [
sys.path.index(path) for path in (
self.sitedir, extdir1, sitedir2, extdir2
)]
Member

Suggested change
    - indexes = [
    -     sys.path.index(path) for path in (
    -         self.sitedir, extdir1, sitedir2, extdir2
    -     )]
    + indexes = [
    +     sys.path.index(path) for path in (
    +         self.sitedir, extdir1, sitedir2, extdir2)
    + ]

bedevere-app Bot commented May 29, 2026

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Contributor

@ncoghlan ncoghlan left a comment

My approval is for the updated API design - much tidier without the implicit global state. Thanks @warsaw!

For the exact implementation and docs details, +1 to @gpshead's comments and questions (I don't have any strong opinions on how the open questions should be resolved, I just agree there are some details still to be tweaked for consistency)

Labels

3.15 pre-release feature fixes, bugs and security fixes 3.16 new features, bugs and security fixes awaiting merge needs backport to 3.15 pre-release feature fixes, bugs and security fixes

4 participants