gh-150228: Improve the PEP 829 batch processing APIs #150542
As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk, this exposes the batch processing APIs for addsitedir() and friends. We remove the `defer_processing_start_files` flag which required some implicit module global state, and promote StartupState to the public documented API. This removes the need for module global implicit state and allows callers to control when accumulated .start and .pth file state is processed if they want. This also fixes the interleaving regression identified by @ncoghlan in the same issue. Now, .pth file sys.path extensions are added to sys.path after the sitedir that the .pth file is found in, restoring the legacy behavior. Along the way, I've made a lot of improvements to function docstrings, site.rst documentation, and comments in the code explaining what's going on.
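The restored interleaving can be illustrated with a small, self-contained sketch. This is not the code in site.py: `interleave_site_paths` and the directory names are hypothetical, and the sketch only models the ordering described above, where each site directory is followed by the path extensions contributed by its own .pth files.

```python
def interleave_site_paths(site_dirs, path):
    """Append each site dir, then its .pth extension dirs, skipping duplicates.

    `site_dirs` is a list of (sitedir, [extension dirs]) pairs. This is a
    hypothetical model of the ordering only, not the site.py implementation.
    """
    for sitedir, extensions in site_dirs:
        for entry in (sitedir, *extensions):
            if entry not in path:
                path.append(entry)
    return path


path = interleave_site_paths(
    [("/site1", ["/ext1"]), ("/site2", ["/ext2"])],
    ["/stdlib"],
)
print(path)  # ['/stdlib', '/site1', '/ext1', '/site2', '/ext2']
```

Without interleaving, both site directories would land on the path before either extension directory, which is the regression described above.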
…NPiO-.rst Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
* Add a note that if known_paths is provided to StartupState.__init__(), it will get mutated in place. * Improve some conditional flows. * Improve some comments. * Improve the what's new entry.
| Apply the accumulated state by first adding the path extensions to | ||
| :data:`sys.path`, then executing the :file:`.start` file entry points | ||
| and :file:`.pth` file ``import`` lines (:ref:`deprecated |
I think the ordering here does not match the code in site.py's process():
self._extend_syspath()
self._exec_imports()
self._execute_start_entrypoints()
Make sure this, the docstring, and the code all agree. (claude /review flagged this)
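To make the mismatch concrete, here is a toy stand-in that records the order in which the three steps quoted above run. `ToyStartupState` is hypothetical and does no real work. Only the three private method names and their order are taken from the quoted code.

```python
class ToyStartupState:
    """Hypothetical stand-in that only records the order of the steps."""

    def __init__(self):
        self.calls = []

    def _extend_syspath(self):
        self.calls.append("extend sys.path")

    def _exec_imports(self):
        self.calls.append("exec .pth import lines")

    def _execute_start_entrypoints(self):
        self.calls.append("run .start entry points")

    def process(self):
        # Same order as the three calls quoted from site.py above.
        self._extend_syspath()
        self._exec_imports()
        self._execute_start_entrypoints()


state = ToyStartupState()
state.process()
print(state.calls)
# ['extend sys.path', 'exec .pth import lines', 'run .start entry points']
```

In this order the .pth import lines run before the .start entry points, whereas the documentation text lists the entry points first.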
I worded it this way because a) I want to consistently emphasize the .start files over the .pth file import lines, and b) because it read better with the parenthesized deprecation note. I've tried to be consistent about a) but maybe I can find a better way to phrase it.
| Instances of this class are used as an accumulator for interpreter startup | ||
| configuration data, such as ``.pth`` and ``.start`` files, from one or more | ||
| site directories. These are used to batch the processing of these startup | ||
| files. The optional *known_paths* argument is a set of case-normalized |
How does one get "a set of case-normalized paths" if not using the default? As a public API that feels like a footgun. Accepting an iterable or sequence of paths and doing that normalization for the user would be easier to use.
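A minimal sketch of what doing the normalization for the user could look like, assuming "case-normalized" means `os.path.abspath` followed by `os.path.normcase`. The helper name `normalize_known_paths` is hypothetical and is not part of the proposed API.

```python
import os


def normalize_known_paths(paths):
    """Build a set of absolute, case-normalized paths from any iterable.

    Hypothetical helper: assumes normalization is abspath + normcase.
    """
    return {os.path.normcase(os.path.abspath(p)) for p in paths}


# Different spellings of the same directory collapse to a single entry.
known = normalize_known_paths(["pkg", os.path.join("pkg", "..", "pkg")])
print(len(known))  # 1
```

With something like this applied inside the constructor, callers could pass a list, tuple, or generator of ordinary path strings instead of a pre-normalized set.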
| (the default), this set is built from the current :data:`sys.path`. | ||
| :func:`main` implicitly uses an instance of this class. | ||
|
|
||
| .. method:: process() |
API question: calling this twice will re-exec imports and entry point code twice. At a minimum, document that this should not be done and is undefined behavior.
Ideally protect against it. Can we have it consume the internal state (draining the internal path entries, import execs, and entrypoints) so that a repeat call is a no-op?
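One way the draining idea could work, as a sketch under stated assumptions: `DrainingState`, `add_entrypoint`, and the `"pkg.start:main"` string are all hypothetical, and only entry points are modeled. Path entries and import lines would be drained the same way.

```python
class DrainingState:
    """Hypothetical accumulator whose process() consumes its pending work."""

    def __init__(self):
        self._entrypoints = []
        self.ran = []  # record of what was executed, for illustration only

    def add_entrypoint(self, name):
        self._entrypoints.append(name)

    def process(self):
        # Swap the pending list out before running anything, so a repeat
        # call finds nothing left to do.
        pending, self._entrypoints = self._entrypoints, []
        for name in pending:
            self.ran.append(name)


state = DrainingState()
state.add_entrypoint("pkg.start:main")
state.process()
state.process()  # no-op: the pending list was drained by the first call
print(state.ran)  # ['pkg.start:main']
```

Swapping the list out before iterating also means entries added while processing is underway are kept for the next call rather than lost.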
| # (callers can pass an empty set), and multiple StartupState | ||
| # instances against the same sys.path don't share state, so always | ||
| # do a final anti-duplication check. | ||
| if dir_ in sys.path: |
Could the lack of case normalization trip up this backstop check on case-insensitive filesystems?
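For illustration, a normalized version of the membership test might look like the following. `already_on_path` is a hypothetical helper, not something in site.py. Note that `os.path.normcase` only folds case on Windows, so this sketch would not cover case-insensitive filesystems on other platforms.

```python
import os


def already_on_path(dir_, path):
    """Return True if dir_ matches an entry of path after normalization.

    Hypothetical helper: compares abspath + normcase forms on both sides.
    """
    target = os.path.normcase(os.path.abspath(dir_))
    return any(
        os.path.normcase(os.path.abspath(entry)) == target for entry in path
    )


search_path = [os.path.abspath("pkg")]
print(already_on_path(os.path.join(".", "pkg"), search_path))  # True
print(already_on_path("other", search_path))  # False
```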
| if startup_state is not None: | ||
| # Explicit batch mode: accumulate startup data in the caller's state. | ||
| # The caller is responsible for calling startup_state.process(). | ||
| known_paths = startup_state._known_paths |
Technically this is an unused assignment, given flush_now = False and what gets returned below. Maybe del known_paths instead, so it becomes an obvious error if that logic changes? Or consider whether flush_now and known_paths need to co-exist.
| and :file:`.pth` file ``import`` lines (:ref:`deprecated | ||
| <site-pth-files>`). | ||
|
|
||
| .. versionadded:: 3.15 |
This should be moved up to apply to the whole class, not just the method.
| indexes = [ | ||
| sys.path.index(path) for path in ( | ||
| self.sitedir, extdir1, sitedir2, extdir2 | ||
| )] |
| indexes = [ | |
| sys.path.index(path) for path in ( | |
| self.sitedir, extdir1, sitedir2, extdir2 | |
| )] | |
| indexes = [ | |
| sys.path.index(path) for path in ( | |
| self.sitedir, extdir1, sitedir2, extdir2) | |
| ] |
ncoghlan left a comment
My approval is for the updated API design - much tidier without the implicit global state. Thanks @warsaw!
For the exact implementation and docs details, +1 to @gpshead's comments and questions. (I don't have any strong opinions on how the open questions should be resolved; I just agree that some details still need tweaking for consistency.)