[mypyc] Document librt.vecs #21437
Merged, +256 −0, 13 commits
d139fff Start documenting librt.vecs (JukkaL)
dcdeffc Improve vec docs (JukkaL)
3cca60d Updates (JukkaL)
82bc37c Various updates (JukkaL)
79a45f8 More doc updates (JukkaL)
d779643 Minor tweaks (JukkaL)
2550a65 More updates (JukkaL)
6f9bd02 Updates (JukkaL)
443edc4 Small updates (JukkaL)
e75651c More small tweaks (JukkaL)
71b9df1 More discussion about construction (JukkaL)
5760191 Explain buffer sharing (JukkaL)
20c4544 Update mypyc/doc/librt_vecs.rst (JukkaL)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,6 +35,7 @@ generate fast code. | |
| librt_base64 | ||
| librt_strings | ||
| librt_time | ||
| librt_vecs | ||
|
|
||
| .. toctree:: | ||
| :maxdepth: 2 | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,253 @@ | ||
| librt.vecs | ||
| ========== | ||
|
|
||
| The ``librt.vecs`` module defines the ``vec`` type, a low-level, uniform growable array type. | ||
| It's part of the ``librt`` package on PyPI. | ||
|
|
||
| When constructing a ``vec``, the item type ``T`` is always explicitly given via ``vec[T]``:: | ||
|
|
||
| from librt.vecs import append, vec | ||
|
|
||
| v = vec[float]([1.0, 2.5]) # Construct vec[float] with two items | ||
|
|
||
| ``vec`` supports many sequence operations, though it's not a full sequence type:: | ||
|
|
||
| len(v) # 2 | ||
| v[0] # 1.0 | ||
| v[-1] # 2.5 | ||
| for x in v: | ||
|     print(x) | ||
|
|
||
| The length of each ``vec`` value is immutable. Appending an item is still a fast operation, | ||
| but it returns a new ``vec`` value:: | ||
|
|
||
| v = append(v, -0.5) | ||
| print(v) # vec[float]([1.0, 2.5, -0.5]) | ||
|
|
||
| ``vec`` only supports simple, uniform item types. It uses an efficient packed binary encoding | ||
| for these *value item types*: | ||
|
|
||
| * ``mypy_extensions.i64`` (signed 64-bit integer) | ||
| * ``mypy_extensions.i32`` (signed 32-bit integer) | ||
| * ``mypy_extensions.i16`` (signed 16-bit integer) | ||
| * ``mypy_extensions.u8`` (unsigned byte) | ||
| * ``float`` (64-bit float) | ||
| * ``bool`` | ||
|
|
||
| ``int`` is not a valid item type, since it has arbitrary precision, and ``vec`` is an | ||
| efficiency-focused type. Use one of the fixed-width integer types instead. | ||
|
|
||
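To make the fixed-width restriction above concrete, here is a small sketch that uses the standard-library ``struct`` module as a stand-in for a packed encoding. It does not show how ``librt.vecs`` stores items; the ``FORMATS`` table and the ``fits`` helper are hypothetical names introduced only for this illustration.

```python
import struct

# Standard struct format codes used here as stand-ins for the vec item types.
# The "<" prefix selects standard sizes, independent of the platform.
FORMATS = {"i64": "<q", "i32": "<i", "i16": "<h", "u8": "<B", "float": "<d"}


def fits(type_name: str, value: int) -> bool:
    """Return True if value can be packed using the given fixed-width format."""
    try:
        struct.pack(FORMATS[type_name], value)
    except (struct.error, OverflowError):
        return False
    return True


for name, fmt in FORMATS.items():
    print(name, struct.calcsize(fmt), "bytes per item")

print(fits("i32", 2**31 - 1))  # True: largest signed 32-bit value
print(fits("i32", 2**31))      # False: one past the range
print(fits("i64", 2**100))     # False: a Python int can exceed 64 bits
```

A Python ``int`` has no fixed size, so it can't be stored in a slot of a fixed number of bytes without a range restriction like the ones shown here.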
| Class item types (e.g. ``str`` or ``MyNativeClass``) are represented as regular object references. | ||
| Optional class item types (e.g. ``str | None``) are supported for convenience, but arbitrary | ||
| union types are not supported as item types. Nested vecs are supported, e.g. ``vec[vec[i64]]``. | ||
|
|
||
| A vec value is often used as an efficient alternative to ``list`` or ``array.array`` in code | ||
| compiled using mypyc. Its primary advantages are an efficient packed memory representation | ||
| for value item types and very fast inlined get and set item operations. | ||
|
|
||
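As a rough illustration of why a packed representation saves memory, the sketch below compares a ``list`` of floats with the standard-library ``array.array``, which is used here only as a stand-in for a packed buffer. The numbers assume CPython and say nothing about the actual size of a ``vec``.

```python
import sys
from array import array

values = [float(i) for i in range(1000)]

# A list stores one pointer per item, and each float is a separate object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(x) for x in values)

# array.array stores raw 8-byte doubles back to back.
packed = array("d", values)
packed_bytes = packed.itemsize * len(packed)

print(packed_bytes)               # 8000
print(list_bytes > packed_bytes)  # True on CPython
```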
| Vec instances perform runtime checking of item types. Since values of type variables are | ||
| not available at runtime (they are *erased*), type variables can't be used as item types. | ||
|
|
||
| A vec value is effectively an immutable (length, buffer) pair. This means that any operation | ||
| that changes the length of a vec, including ``append`` as we saw above, returns a modified | ||
| value. | ||
|
|
||
| .. note:: | ||
| An immutable length allows more efficient code to be generated by mypyc, and vec values | ||
| can be allocated to machine registers effectively. However, vec values must be boxed | ||
| if used in a non-native context, such as if added to a list or dict. | ||
|
|
||
| Here are some examples of valid vec types: | ||
|
|
||
| .. list-table:: | ||
| :header-rows: 1 | ||
|
|
||
| * - Type | ||
| - Item representation | ||
| * - ``vec[i32]`` | ||
| - Packed 32-bit integers | ||
| * - ``vec[float]`` | ||
| - Packed 64-bit floats | ||
| * - ``vec[str]`` | ||
| - Object references | ||
| * - ``vec[vec[u8]]`` | ||
| - Packed vec values | ||
|
|
||
| The ``vec`` class | ||
| ----------------- | ||
|
|
||
| .. class:: vec[T](items: Iterable[T] = ..., *, capacity: i64 = ...) | ||
|
|
||
| A generic growable array type. The runtime type parameter ``T`` used when | ||
| calling ``vec[T](...)`` determines the element type. | ||
|
|
||
| The ``capacity`` parameter allows defining the minimum initial | ||
| capacity of the buffer, some of which may be unused after | ||
| construction. Unused capacity allows fast ``append`` and ``extend`` | ||
| operations that don't need to reallocate the buffer. Actual capacity | ||
| will be larger than ``capacity`` if ``items`` has more than ``capacity`` | ||
| items. | ||
|
|
||
| Construction from ``list`` and ``tuple`` objects is optimized. | ||
| Also, for value item types, construction from an object that implements | ||
| the buffer protocol (such as ``bytes``) is optimized, if the format | ||
| is compatible with the vec item type. | ||
|
|
||
| Mypyc treats ``vec[T]([x] * n)`` as a special form. For example, | ||
| ``vec[u8]([0] * n)`` constructs a zero-initialized vec object | ||
| efficiently, without building an intermediate list. There are | ||
| also other constructor-related special forms -- see `Special | ||
| forms`_ below. | ||
|
|
||
| It's an error to construct a ``vec`` object without providing an | ||
| item type: ``vec()`` raises an exception. | ||
|
|
||
| .. describe:: len(v) → i64 | ||
|
|
||
| Return the length of ``v``. | ||
|
|
||
| .. describe:: v[i] → T | ||
|
|
||
| Return item at index ``i`` (index may be negative). | ||
|
|
||
| .. describe:: v[i:j] → vec[T] | ||
|
|
||
| Return a slice. This constructs a new ``vec`` object. ``i`` and ``j`` may be negative. | ||
|
|
||
| .. describe:: v[i] = o | ||
|
|
||
| Assign to an item (index may be negative). | ||
|
|
||
| .. describe:: o in v → bool | ||
|
|
||
| Return ``True`` if ``v`` contains ``o``. | ||
|
|
||
| .. describe:: for o in v | ||
|
|
||
| Iterate over items. | ||
|
|
||
| .. describe:: memoryview(v) | ||
|
|
||
| ``vec`` implements the buffer protocol, but only for value item types that use a | ||
| packed representation. | ||
|
|
||
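To show what the buffer protocol exposes for packed data, the sketch below uses the standard-library ``array.array`` as a stand-in for a vec with a value item type. Whether ``memoryview(v)`` on a ``vec[float]`` reports the same format code and item size is an assumption, not something the text above states.

```python
from array import array

# array.array stores packed C doubles, similar in spirit to vec[float].
a = array("d", [1.0, 2.5, -0.5])

view = memoryview(a)
print(view.format)    # d (each item is a C double)
print(view.itemsize)  # 8
print(view.nbytes)    # 24 (3 items * 8 bytes)

# The view shares memory with the underlying object, so a write through
# the view is visible in the array.
view[0] = 10.0
print(a[0])           # 10.0

# A list holds object references, not packed values, so it has no buffer.
try:
    memoryview([1.0, 2.5])
except TypeError as exc:
    print("list:", exc)
```

The failing ``list`` case mirrors the restriction above: only a packed representation can be exposed as a flat block of memory.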
| Functions | ||
| --------- | ||
|
|
||
| Since the following operations return a modified value, they are module-level functions | ||
| instead of methods. | ||
|
|
||
| .. function:: append(v: vec[T], o: T) -> vec[T] | ||
|
|
||
| Return ``v`` with item ``o`` appended to it. If ``v`` has unused capacity, reuse | ||
| the existing buffer. The time complexity is amortized O(1). Example:: | ||
|
|
||
| v = vec[i32]() | ||
| v = append(v, 1) | ||
|
|
||
| .. function:: extend(v: vec[T], it: Iterable[T]) -> vec[T] | ||
|
|
||
| Return ``v`` with all items from iterable ``it`` appended to it. If ``v`` has sufficient | ||
| unused capacity, reuse the existing buffer. The time complexity is amortized O(n), | ||
| where n is the length of ``it``. Example:: | ||
|
|
||
| v = vec[u8]() | ||
| v = extend(v, b"foo") | ||
|
|
||
| .. function:: remove(v: vec[T], o: T) -> vec[T] | ||
|
|
||
| Return ``v`` with the first instance of item ``o`` removed. Reuse the buffer | ||
| from ``v``. Raise ``ValueError`` if ``o`` is not in ``v``. Example:: | ||
|
|
||
| v = vec[i32]([1, 2, 3]) | ||
| v = remove(v, 2) | ||
| # v has items [1, 3] | ||
|
|
||
| .. function:: pop(v: vec[T], i: i64 = -1) -> tuple[vec[T], T] | ||
|
|
||
| Return ``(new_v, item)``, where ``item`` is the value at index ``i`` and | ||
| ``new_v`` is ``v`` with that item removed. Reuse the buffer from ``v``. | ||
| Example:: | ||
|
|
||
| v = vec[i32]([1, 2, 3]) | ||
| v, x = pop(v) | ||
| # x is 3; v has items [1, 2] | ||
|
|
||
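Because these functions return the updated value instead of mutating in place, callers rebind the variable on every call. The sketch below mimics that calling convention in pure Python, with tuples as a stand-in for vec values. ``toy_append`` and ``toy_pop`` are hypothetical helpers, not part of ``librt.vecs``, and they copy on every call, whereas the real functions reuse the buffer where possible.

```python
from typing import TypeVar

T = TypeVar("T")


def toy_append(v: tuple[T, ...], o: T) -> tuple[T, ...]:
    """Return v with o added at the end (always copies in this toy version)."""
    return v + (o,)


def toy_pop(v: tuple[T, ...], i: int = -1) -> tuple[tuple[T, ...], T]:
    """Return (new_v, item), mirroring the (vec, item) result shape of pop."""
    item = v[i]  # raises IndexError for an out-of-range index
    if i < 0:
        i += len(v)
    return v[:i] + v[i + 1:], item


# Use the value as a stack: always rebind the variable to the result.
stack: tuple[int, ...] = ()
for n in (1, 2, 3):
    stack = toy_append(stack, n)

popped = []
while len(stack) > 0:
    stack, top = toy_pop(stack)
    popped.append(top)

print(popped)  # [3, 2, 1]

# Forgetting to rebind discards the update: v keeps its old length.
v = (1, 2)
toy_append(v, 3)
print(v)  # (1, 2)
```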
| Special forms | ||
| ------------- | ||
|
|
||
| Certain combinations of operations that would be multiple separate operations in | ||
| regular Python are guaranteed to be compiled by mypyc to direct operations | ||
| with no unnecessary temporary objects. | ||
|
|
||
| .. list-table:: | ||
| :header-rows: 1 | ||
|
|
||
| * - Special form | ||
| - Description | ||
| * - ``vec[T]()`` | ||
| - Construct empty vec with no buffer. This doesn't perform any dynamic allocation | ||
| (at least for non-nested vecs). | ||
| * - ``vec[T]([element1, ...])`` | ||
| - Directly construct a vec object with given items, without a temporary list. | ||
| * - ``vec[T]([element1] * n)`` | ||
| - Directly construct a vec with length n, without any temporary list. | ||
| * - ``vec[T]([<expr> for ... in <expr>])`` | ||
| - Vec comprehension creates no temporary list. | ||
|
|
||
| Thread safety | ||
| ------------- | ||
|
|
||
| In free-threaded Python builds, it's unsafe to write or modify an item if other | ||
| threads might be concurrently accessing *the same item*. For example, writing ``v[4]`` | ||
| is not safe if another thread might be reading ``v[4]``. Similarly, it's not safe for two | ||
| threads to concurrently call ``append`` or ``remove`` on the same vec object. | ||
|
|
||
| This is different from list objects, since vec is a lower-level type where implicit | ||
| synchronization would have a significant performance cost. However, since vec lengths | ||
| are immutable, some race conditions that lists can be susceptible to are not possible | ||
| with vecs. | ||
|
|
||
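One way to avoid such races is external locking around every access to a shared value. The following is a minimal sketch using ``threading.Lock``, with a tuple as a stand-in for a shared vec value; the ``SharedLog`` class is hypothetical. The read, append and store-back steps sit inside one critical section, because an update made by another thread between them would otherwise be lost.

```python
import threading


class SharedLog:
    """Hypothetical wrapper that serializes access to a shared value."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: tuple[int, ...] = ()  # stand-in for a vec value

    def add(self, item: int) -> None:
        # Read, build the new value and store it back under the lock.
        with self._lock:
            self._items = self._items + (item,)

    def snapshot(self) -> tuple[int, ...]:
        with self._lock:
            return self._items


log = SharedLog()


def worker(start: int) -> None:
    for i in range(100):
        log.add(start + i)


threads = [threading.Thread(target=worker, args=(k * 100,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(log.snapshot()))  # 400
```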
| Implementation details | ||
| ---------------------- | ||
|
|
||
| In a native context, such as in a local variable or a parameter in a native function, | ||
| or in an attribute of a native class, vec values are implemented as value objects with two | ||
| fields: length and buffer. The buffer is a normal Python object, but it's not directly | ||
| accessible to users. If a vec object is empty, no buffer object is required. This means that | ||
| empty vecs are particularly efficient in a native context (usually 16 bytes). | ||
|
|
||
| A packed representation is used for buffers with supported value item types, including for | ||
| nested vecs. The packed representation is much more efficient than a Python list object, and | ||
| it's also significantly more efficient than ``array.array`` for small sequences. | ||
|
|
||
| Multiple vec values can share the same underlying buffer. For example, assigning a vec | ||
| to another variable creates an alias that refers to the same buffer:: | ||
|
|
||
| v = vec[i32]([1, 2, 3], capacity=3) | ||
| w = v # v and w share the same buffer | ||
|
|
||
| w[0] = 99 | ||
| print(v[0]) # 99 -- both see the change | ||
|
|
||
| However, this sharing is not guaranteed to persist if there are operations that change | ||
| the length (such as ``append``). These may reallocate the buffer, breaking the sharing | ||
| silently:: | ||
|
|
||
| v = append(v, 4) # reallocates the buffer since there is no free capacity | ||
| v[0] = 0 | ||
| print(w[0]) # still 99 -- v and w no longer share a buffer | ||
|
|
||
|
|
||
| If you need independent copies, use slicing (``v[:]``) to explicitly create a vec with | ||
| its own buffer. It's not recommended to rely on the details of buffer reallocation, | ||
| as these might change between ``librt`` releases. | ||
|
|
||
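The sharing behavior described above follows from treating a vec as an immutable (length, buffer) pair. The pure-Python model below is only a sketch of that idea: ``ToyVec``, ``toy_append`` and the doubling growth policy are assumptions made for illustration, and they say nothing about the actual ``librt`` implementation or its reallocation strategy.

```python
from typing import NamedTuple


class ToyVec(NamedTuple):
    """Immutable (length, buffer) pair; the buffer list is shared by aliases."""
    length: int
    buffer: list[int]  # len(buffer) plays the role of the capacity


def toy_append(v: ToyVec, item: int) -> ToyVec:
    """Return a new pair with item appended, reusing the buffer if it has room."""
    if v.length < len(v.buffer):
        buffer = v.buffer  # unused capacity: write into the shared buffer
    else:
        # No free capacity: copy into a larger buffer (doubling is an assumption).
        new_capacity = max(1, 2 * len(v.buffer))
        buffer = v.buffer + [0] * (new_capacity - len(v.buffer))
    buffer[v.length] = item
    return ToyVec(v.length + 1, buffer)


v = ToyVec(3, [1, 2, 3])  # full buffer, like capacity=3 with three items
w = v                     # alias: same buffer object

w.buffer[0] = 99
print(v.buffer[0])         # 99, both see the change

v = toy_append(v, 4)       # buffer is full, so a new one is allocated
v.buffer[0] = 0
print(w.buffer[0])         # 99, v and w no longer share a buffer
print(v.length, w.length)  # 4 3

# With spare capacity the buffer is reused and sharing persists.
a = ToyVec(1, [7, 0, 0, 0])
b = toy_append(a, 8)
print(a.buffer is b.buffer)  # True
print(a.length, b.length)    # 1 2
```

In this model, whether two values still share a buffer after an append depends only on the free capacity at that moment, which is why the text above advises against relying on it.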
| Using vecs outside compiled code | ||
| -------------------------------- | ||
|
|
||
| ``vec`` is fully supported in non-compiled code, but ``vec`` values will be boxed in such | ||
| non-native contexts. There will always be two objects, a boxed vec object and a buffer object, | ||
| whereas in native contexts usually only the buffer is a dynamically allocated object. | ||
| ``vec`` is primarily useful in code compiled using mypyc, and it's been heavily optimized | ||
| for this use case. There may be no performance benefit in interpreted code over using | ||
| ``list`` or ``array.array``. | ||
maybe mention some of the potential surprises related to sharing the same buffer among multiple vecs?
like an example where after creating an alias of a vec object, modifying either the original or the alias initially affects both, but that's not guaranteed to continue because one of the vecs might reallocate its buffer.
Added a discussion of this with some examples.