Description
Users may want to serialize OTAP batches and distribute them in a way where deserialization is not guaranteed to be handled by the same reader. For example, pushing messages onto a remote queue from which several readers opportunistically pull.
This is a problem because each payload in the OTAP stream uses an Arrow IPC stream, and readers that don't see the first batch are missing the IPC header, which contains the schemas, dictionaries, etc.
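To make the failure mode concrete, here is a toy model in Go using only the standard library. Nothing here is the real Arrow or OTAP API: `message`, `streamWriter`, and `streamReader` are hypothetical stand-ins, and the "header" is just a string standing in for the schemas and dictionaries carried at the start of an IPC stream.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a toy stand-in for one payload of an IPC-style stream.
// Only the first message written by a writer carries the header.
type message struct {
	header string // empty unless this is the first message of the stream
	body   string
}

// streamWriter (hypothetical) emits the header once, then only bodies.
type streamWriter struct {
	schema     string
	headerSent bool
}

func (w *streamWriter) write(body string) message {
	m := message{body: body}
	if !w.headerSent {
		m.header = w.schema
		w.headerSent = true
	}
	return m
}

var errNoSchema = errors.New("reader never saw the stream header")

// streamReader (hypothetical) can decode a body only after seeing a header.
type streamReader struct {
	schema string
}

func (r *streamReader) read(m message) (string, error) {
	if m.header != "" {
		r.schema = m.header
	}
	if r.schema == "" {
		return "", errNoSchema
	}
	return r.schema + ":" + m.body, nil
}

func main() {
	w := &streamWriter{schema: "logs-v1"}
	first, second := w.write("batch-0"), w.write("batch-1")

	// Two independent readers pull from the same queue.
	readerA, readerB := &streamReader{}, &streamReader{}

	outA, errA := readerA.read(first) // got the header, decodes fine
	fmt.Println(outA, errA)

	_, errB := readerB.read(second) // never saw the header, fails
	fmt.Println(errB)
}
```

The second reader fails only because of which message it happened to pull, which is the situation described above.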
For these cases, it would be nice if we offered a mode where the Producer could produce a single-batch stream for each BatchArrowRecord it produces.
The workaround today is to create a new Producer for each OTAP batch, but some of the state on its StreamWriters is expensive to instantiate, notably the zstd context.
My proposal is to expose a method on the Producer that resets its internal StreamWriters between batches, for users who want to produce these single-record streams.
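A rough sketch of the reset idea, again as a toy rather than the actual Producer. Assumptions: `toyProducer`, `Reset`, `ProduceStandalone`, and `decode` are hypothetical names, and the standard library's `compress/flate` writer is swapped in for the zstd context because it has the same "allocate once, `Reset` to reuse" shape. Each payload is written as a complete stream that starts with the header, so any consumer can decode it alone.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// header stands in for the schema/dictionary data at the start of a stream.
const header = "SCHEMA logs-v1\n"

// toyProducer is a hypothetical stand-in for the Producer. The compressor
// is the expensive state: it is allocated once and reused across batches.
type toyProducer struct {
	buf  bytes.Buffer
	comp *flate.Writer
}

func newToyProducer() (*toyProducer, error) {
	p := &toyProducer{}
	comp, err := flate.NewWriter(&p.buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	p.comp = comp
	return p, nil
}

// Reset (hypothetical) discards per-stream state but keeps the compressor.
func (p *toyProducer) Reset() {
	p.buf.Reset()
	p.comp.Reset(&p.buf)
}

// ProduceStandalone resets, then writes header + body as one complete stream.
func (p *toyProducer) ProduceStandalone(body string) ([]byte, error) {
	p.Reset()
	if _, err := io.WriteString(p.comp, header+body); err != nil {
		return nil, err
	}
	if err := p.comp.Close(); err != nil { // finishes the stream, keeps p.comp reusable via Reset
		return nil, err
	}
	return append([]byte(nil), p.buf.Bytes()...), nil // copy: buf is reused
}

// decode is what an independent consumer would do with a single payload.
func decode(payload []byte) (string, error) {
	r := flate.NewReader(bytes.NewReader(payload))
	defer r.Close()
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	p, err := newToyProducer()
	if err != nil {
		panic(err)
	}
	first, _ := p.ProduceStandalone("batch-0")
	second, _ := p.ProduceStandalone("batch-1")

	// A consumer that only ever sees the second payload can still decode it.
	out, err := decode(second)
	fmt.Printf("%q %v\n", out, err)
	_ = first
}
```

The trade-off is that every payload repeats the header, so this mode costs extra bytes per batch in exchange for payloads that are independent of each other.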
When this is implemented, we should write a test using multiple consumers to ensure the solution solves the stated problem.