Feature Request
Description of Problem:
Perspective expects Arrow data to be loaded as an ArrayBuffer in JavaScript and as a binary string (bytes) in Python. In PyArrow, at least, converting an Arrow Table into that binary form takes several lines:
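import pyarrow as pa
import perspective

# arrow_table is an existing pyarrow.Table
stream = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(stream, arrow_table.schema)
writer.write_table(arrow_table)
writer.close()  # finalizes the Arrow IPC stream
perspective_table = perspective.Table(stream.getvalue().to_pybytes())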
When loading Arrow Tables, I expect Perspective to accept them directly, without any conversion step. The Python user guide briefly mentions that an Arrow binary must be passed as bytes, but it does not explain how to convert an Arrow Table to bytes.
Potential Solutions:
Write an Arrow Table to binary conversion layer solely in the binding layer (using PyArrow or the Arrow TypeScript library). This would be simple but incomplete, since it would have to be reimplemented for every future binding language, and it would be less performant. Alternatively, if there is a way to pass a pointer to an Arrow Table from the binding layer into C++, we should be able to write the conversion entirely in C++. We already malloc and memcpy the Arrow binary from JS/Python into C++, so there may be existing code worth looking into.