What would you like to happen?
Currently, SparkReceiverIO reads data using a single worker because the Read transform initializes with Impulse.create(), which produces a single initial element. This creates a scalability bottleneck as all data ingestion is constrained to one machine, regardless of the available worker pool.
I would like to implement a parallel reading mechanism in SparkReceiverIO. This involves:
- Adding a withNumReaders(int) configuration method to the builder.
- Refactoring the implementation to use Create.of(shards) followed by Reshuffle when numReaders > 1 is specified.
- Ensuring backward compatibility by defaulting to the single-reader behavior when numReaders is not specified.
This enhancement will allow SparkReceiverIO to scale horizontally, significantly increasing throughput for high-volume use cases.
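To illustrate the fan-out idea, here is a minimal, self-contained sketch of how shard ids could be distributed across readers. The ShardAssignment class, the round-robin policy, and integer shard ids are all illustrative assumptions; in the actual proposal, Create.of(shards) would emit one element per shard and Reshuffle would let the runner redistribute those elements across workers.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: models the shard fan-out that Create.of(shards)
// plus Reshuffle would achieve inside the transform. Shard ids and the
// round-robin policy are assumptions for demonstration.
public class ShardAssignment {

    // Assigns numShards shard ids to numReaders readers, round-robin.
    public static List<List<Integer>> assignShards(int numShards, int numReaders) {
        List<List<Integer>> perReader = new ArrayList<>();
        for (int r = 0; r < numReaders; r++) {
            perReader.add(new ArrayList<>());
        }
        // Shard i goes to reader i % numReaders, spreading load evenly.
        for (int shard = 0; shard < numShards; shard++) {
            perReader.get(shard % numReaders).add(shard);
        }
        return perReader;
    }

    public static void main(String[] args) {
        // 8 shards spread over 3 readers instead of all on one worker.
        System.out.println(assignShards(8, 3)); // [[0, 3, 6], [1, 4, 7], [2, 5]]
    }
}
```

With numReaders = 1 (the proposed default), all shards land on a single reader, preserving today's single-worker behavior.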
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components