Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions mypyc/doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ generate fast code.
librt_base64
librt_strings
librt_time
librt_vecs

.. toctree::
:maxdepth: 2
Expand Down
2 changes: 2 additions & 0 deletions mypyc/doc/librt.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ Follow submodule links in the table to a detailed description of each submodule.
- String and bytes utilities
* - :doc:`librt.time <librt_time>`
- Time utilities
* - :doc:`librt.vecs <librt_vecs>`
- Fast growable array type ``vec``

Installing librt
----------------
Expand Down
253 changes: 253 additions & 0 deletions mypyc/doc/librt_vecs.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,253 @@
librt.vecs
==========

The ``librt.vecs`` module defines the ``vec`` type, a low-level, uniform growable array type.
It's part of the ``librt`` package on PyPI.

When constructing a ``vec``, the item type ``T`` is always explicitly given via ``vec[T]``::

from librt.vecs import append, vec

v = vec[float]([1.0, 2.5]) # Construct vec[float] with two items

``vec`` supports many sequence operations, though it's not a full sequence type::

len(v) # 2
v[0] # 1.0
v[-1] # 2.5
for x in v:
print(x)

The length of each ``vec`` value is immutable. Appending an item is still a fast operation,
but it returns a new ``vec`` value::

v = append(v, -0.5)
print(v) # vec[float]([1.0, 2.5, -0.5])

``vec`` only supports simple, uniform item types. It uses an efficient packed binary encoding
for these *value item types*:

* ``mypy_extensions.i64`` (signed 64-bit integer)
* ``mypy_extensions.i32`` (signed 32-bit integer)
* ``mypy_extensions.i16`` (signed 16-bit integer)
* ``mypy_extensions.u8`` (unsigned byte)
* ``float`` (64-bit float)
* ``bool``

``int`` is not a valid item type, since it has an arbitrary precision, and vec is an
efficiency-focused type. Use one of the fixed-length integer types instead.

Class item types (e.g. ``str`` or ``MyNativeClass``) are represented as regular object references.
Optional class item types (e.g. ``str | None``) are supported for convenience, but arbitrary
union types are not supported as item types. Nested vecs are supported, e.g. ``vec[vec[i64]]``.

A vec value is often used as an efficient alternative to ``list`` or ``array.array`` in code
compiled using mypyc. Its primary advantages are an efficient packed memory representation
for value item types and very fast inlined get and set item operations.

Vec instances perform runtime checking of item types. Since values of type variables are
not available at runtime (they are *erased*), type variables can't be used as item types.

A vec value is effectively an immutable (length, buffer) pair. This means that any operation
that changes the length of a vec, including ``append`` as we saw above, returns a modified
value.

.. note::
An immutable length allows more efficient code to be generated by mypyc, and vec values
can be allocated to machine registers effectively. However, vec values must be boxed
if used in a non-native context, such as if added to a list or dict.

Here are some examples of valid vec types:

.. list-table::
:header-rows: 1

* - Type
- Item representation
* - ``vec[i32]``
- Packed 32-bit integers
* - ``vec[float]``
- Packed 64-bit floats
* - ``vec[str]``
- Object references
* - ``vec[vec[u8]]``
- Packed vec values

The ``vec`` class
-----------------

.. class:: vec[T](items: Iterable[T] = ..., *, capacity: i64 = ...)

A generic growable array type. The runtime type parameter ``T`` used when
calling ``vec[T](...)`` determines the element type.

The ``capacity`` parameter allows defining the minimum initial
capacity of the buffer, some of which may be unused after
construction. Unused capacity allows fast ``append`` and ``extend``
operations that don't need to reallocate the buffer. Actual capacity
will be larger than ``capacity`` if ``items`` has more than ``capacity``
items.

Construction from ``list`` and ``tuple`` objects is optimized.
Also, for value item types, construction from an object that implements
the buffer protocol is optimized (such as ``bytes``), if the format
is compatible with the vec item type.

Mypyc treats ``vec[T]([x] * n)`` as a special form. For example,
``vec[u8]([0] * n)`` constructs a zero-initialized vec object
efficiently, without building an intermediate list. There are
also other constructor-related special forms -- see `Special
forms`_ below.

It's an error to construct a ``vec`` object without providing an
item type: ``vec()`` raises an exception.

.. describe:: len(v) → i64

Return the length of ``v``.

.. describe:: v[i] → T

Return item at index ``i`` (index may be negative).

.. describe:: v[i:j] → vec[T]

Return a slice. This constructs a new ``vec`` object. ``i`` and ``j`` may be negative.

.. describe:: v[i] = o

Assign to an item (index may be negative).

.. describe:: o in v → bool

Return True if ``v`` contains ``o``.

.. describe:: for o in v

Iterate over items.

.. describe:: memoryview(v)

``vec`` implements the buffer protocol, but only for value item types that use a
packed representation.

Functions
---------

Since the following operations return a modified value, they are module-level functions
instead of methods.

.. function:: append(v: vec[T], o: T) -> vec[T]

Return ``v`` with item ``o`` appended to it. If ``v`` has unused capacity, reuse
the existing buffer. The time complexity is O(1) on average. Example::

v = vec[i32]()
v = append(v, 1)

.. function:: extend(v: vec[T], it: Iterable[T]) -> vec[T]

Return ``v`` with all items from iterable ``it`` appended to it. If ``v`` has sufficient
unused capacity, reuse the existing buffer. The time complexity is O(n) on average,
where n is the length of ``it``. Example::

v = vec[u8]()
v = extend(v, b"foo")

.. function:: remove(v: vec[T], o: T) -> vec[T]

Return ``v`` with the first instance of item ``o`` removed. Reuse the buffer
from ``v``. Raise ``ValueError`` if value doesn't exist. Example::

v = vec[i32]([1, 2, 3])
v = remove(v, 2)
# v has items [1, 3]

.. function:: pop(v: vec[T], i: i64 = -1) -> tuple[vec[T], T]

Return ``(new_v, item)``, where ``item`` is the value at index ``i`` and
``new_v`` is ``v`` with that item removed. Reuse the buffer from ``v``.
Example::

v = vec[i32]([1, 2, 3])
v, x = pop(v)
# x is 3; v has items [1, 2]

Special forms
--------------

Certain combinations of operations that would be multiple separate operations in
regular Python are guaranteed to be compiled by mypyc to direct operations
with no unnecessary temporary objects.

.. list-table::
:header-rows: 1

* - Special form
- Description
* - ``vec[T]()``
- Construct empty vec with no buffer. This doesn't perform any dynamic allocation
(at least for non-nested vecs).
* - ``vec[T]([element1, ...])``
- Directly construct a vec object with given items, without a temporary list.
* - ``vec[T]([element1] * n)``
- Directly construct a vec with length n, without any temporary list.
* - ``vec[T]([<expr> for ... in <expr>])``
- Vec comprehension creates no temporary list.

Thread safety
-------------

In free-threaded Python builds, it's unsafe to write or modify an item if other
threads might be concurrently accessing *the same item*. For example, writing ``v[4]``
is not safe to do if another thread might be reading ``v[4]``. Similarly, two
threads concurrently calling ``append`` or ``remove`` on the same vec object is not safe.

This is different from list objects, since vec is a lower-level type where implicit
synchronization would have a significant performance cost. However, since vec lengths
are immutable, some race conditions that lists can be susceptible to are not possible
with vecs.

Implementation details
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mention some of the potential surprises related to sharing the same buffer among multiple vecs?
like an example where after creating an alias of a vec object, modifying either the original or the alias initially affects both, but that's not guaranteed to continue because one of the vecs might reallocate its buffer.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a discussion of this with some examples.

----------------------

In a native context, such as in a local variable or a parameter in a native function,
or in an attribute of a native class, vec values are implemented as value objects with two
fields: length and buffer. The buffer is a normal Python object, but it's not directly
accessible to users. If a vec object is empty, no buffer object is required. This means that
empty vecs are particularly efficient in a native context (usually 16 bytes).

A packed representation is used for buffers with supported value item types, including for
nested vecs. The packed representation is much more efficient than a Python list object, and
it's also significantly more efficient than ``array.array`` for small sequences.

Multiple vec values can share the same underlying buffer. For example, assigning a vec
to another variable creates an alias that refers to the same buffer::

v = vec[i32]([1, 2, 3], capacity=3)
w = v # v and w share the same buffer

w[0] = 99
print(v[0]) # 99 -- both see the change

However, this sharing is not guaranteed to persist if there are operations that change
the length (such as ``append``). These may reallocate the buffer, breaking the sharing
silently::

v = append(v, 4) # reallocates the buffer since there is no free capacity
v[0] = 0
print(w[0]) # still 99 -- v and w no longer share a buffer

Comment thread
JukkaL marked this conversation as resolved.
If you need independent copies, use slicing (``v[:]``) to explicitly create a vec with
its own buffer. It's not recommended to rely on the details of buffer reallocation,
as these might change between ``librt`` releases.

Using vecs outside compiled code
--------------------------------

``vec`` is fully supported in non-compiled code, but ``vec`` values will be boxed in such
non-native contexts. There will be always two objects, a boxed vec object and a buffer object,
whereas in native contexts usually only the buffer is a dynamically allocated object.
``vec`` is primarily useful in code compiled using mypyc, and it's been heavily optimized
for this use case. There may be no performance benefit in interpreted code over using
``list`` or ``array.array``.
Loading