
Can batched Streaming Inference be used in real-time streaming? #15273


Description

@azziko

I'm trying to adapt this script for real-time streaming as a PoC, but after processing a couple of sentences the decoder stops processing input and outputs three dots (even before max_generation_length is reached).

The only conceptual change I make to the original script is reading the input audio in chunks from a server. I initialize the batched computer and the buffer once per client. I suspect the batched computer used in the original script is not designed to be used this way.
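For concreteness, here is a minimal sketch of that per-client adaptation, with the network side filled in. The `append`/`step` interface on the computer and buffer is a hypothetical stand-in for however the original script actually drives them, and the 16-bit PCM wire format is likewise an assumption; only the chunk-by-chunk driving loop is the point.

```python
import socket
import numpy as np

SAMPLE_RATE = 16000                     # Canary models take 16 kHz mono audio
CHUNK_SAMPLES = int(1.0 * SAMPLE_RATE)  # chunk_secs=1 from the parameters below
BYTES_PER_CHUNK = CHUNK_SAMPLES * 2     # assuming 16-bit PCM on the wire

def audio_chunks(conn: socket.socket):
    """Yield 1-second float32 chunks read from one client connection."""
    pending = b""
    while True:
        data = conn.recv(4096)
        if not data:                    # client disconnected
            break
        pending += data
        while len(pending) >= BYTES_PER_CHUNK:
            raw, pending = pending[:BYTES_PER_CHUNK], pending[BYTES_PER_CHUNK:]
            pcm = np.frombuffer(raw, dtype=np.int16)
            yield pcm.astype(np.float32) / 32768.0  # normalize to [-1, 1]

def serve_client(conn, computer, buffer):
    # `computer` and `buffer` are the batched computer and audio buffer
    # initialized once for this client, as described above; their
    # append/step interface here is hypothetical.
    for chunk in audio_chunks(conn):
        buffer.append(chunk)
        print(computer.step(buffer))    # emit the partial transcript so far
```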

Is there another decoder well suited to this use case that supports AlignAtt?
If not, what would be a reasonable workaround?

My testing parameters are:
pretrained_name=nvidia/canary-1b-v2
left_context_secs=10
chunk_secs=1
right_context_secs=0.5
batch_size=1
decoding.streaming_policy=alignatt
decoding.alignatt_thr=8
decoding.exclude_sink_frames=8
decoding.xatt_scores_layer=-2
+prompt.task=asr
+prompt.source_lang=en
+prompt.target_lang=en
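For reference, this is the full invocation those overrides correspond to, assembled as a Hydra command line. The script name is a placeholder for the batched streaming inference script in the NeMo checkout; the overrides themselves are verbatim from the list above.

```python
import shlex

# Placeholder name -- substitute the actual batched streaming inference
# script from your NeMo checkout.
SCRIPT = "speech_to_text_streaming_infer.py"

overrides = [
    "pretrained_name=nvidia/canary-1b-v2",
    "left_context_secs=10",
    "chunk_secs=1",
    "right_context_secs=0.5",
    "batch_size=1",
    "decoding.streaming_policy=alignatt",
    "decoding.alignatt_thr=8",
    "decoding.exclude_sink_frames=8",
    "decoding.xatt_scores_layer=-2",
    "+prompt.task=asr",
    "+prompt.source_lang=en",
    "+prompt.target_lang=en",
]

# Print the equivalent shell command; pass the same list to subprocess.run
# to launch it directly.
print(shlex.join(["python", SCRIPT, *overrides]))
```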
