PocketTTS Smoke Test ✅
Runtime: 0m42s
Note: PocketTTS uses CoreML MLState (macOS 15) KV cache + Mimi streaming state. CI VM lacks physical GPU — audio quality and performance may differ from Apple Silicon.
Parakeet EOU Benchmark Results ✅
Status: Benchmark passed

Performance Metrics
Streaming Metrics

Test runtime: 1m26s • 04/22/2026, 02:13 PM EST
RTFx = Real-Time Factor (higher is better) • Processing includes: Model inference, audio preprocessing, state management, and file I/O
Kokoro TTS Smoke Test ✅
Runtime: 0m27s
Note: Kokoro TTS uses CoreML flow matching + Vocos vocoder. CI VM lacks physical ANE — performance may differ from Apple Silicon.
Qwen3-ASR int8 Smoke Test ✅
Performance Metrics
Runtime: 4m24s
Note: CI VM lacks physical GPU — CoreML MLState (macOS 15) KV cache produces degraded results on virtualized runners. On Apple Silicon: ~1.3% WER / 2.5x RTFx.
Sortformer High-Latency Benchmark Results
ES2004a Performance (30.4s latency config)
Sortformer High-Latency • ES2004a • Runtime: 3m 18s • 2026-04-22T18:14:52.353Z
Speaker Diarization Benchmark Results
Speaker Diarization Performance
Evaluating "who spoke when" detection accuracy
Diarization Pipeline Timing Breakdown
Time spent in each stage of speaker diarization
Speaker Diarization Research Comparison
Research baselines typically achieve 18-30% DER on standard datasets
Note: RTFx shown above is from GitHub Actions runner. On Apple Silicon with ANE:
🎯 Speaker Diarization Test • AMI Corpus ES2004a • 1049.0s meeting audio • 59.6s diarization time • Test runtime: 2m 23s • 04/22/2026, 02:16 PM EST
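For reference, the DER figures cited in these reports combine three error durations over total reference speech time. A minimal sketch of that formula (the function and parameter names are illustrative, not FluidAudio API):

```swift
// Diarization Error Rate: (missed speech + false alarm + speaker confusion)
// divided by total reference speech duration. All values in seconds.
func diarizationErrorRate(missedSeconds: Double,
                          falseAlarmSeconds: Double,
                          confusionSeconds: Double,
                          totalSpeechSeconds: Double) -> Double {
    (missedSeconds + falseAlarmSeconds + confusionSeconds) / totalSpeechSeconds
}
```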
VAD Benchmark Results
Performance Comparison
Dataset Details
✅: Average F1-Score above 70%
Offline VBx Pipeline Results
Speaker Diarization Performance (VBx Batch Mode)
Optimal clustering with Hungarian algorithm for maximum accuracy
Offline VBx Pipeline Timing Breakdown
Time spent in each stage of batch diarization
Speaker Diarization Research Comparison
Offline VBx achieves competitive accuracy with batch processing
Pipeline Details:
🎯 Offline VBx Test • AMI Corpus ES2004a • 1049.0s meeting audio • 125.8s processing • Test runtime: 2m 10s • 04/22/2026, 02:07 PM EST
ASR Benchmark Results ✅
Status: All benchmarks passed

Parakeet v3 (multilingual)
Parakeet v2 (English-optimized)
Streaming (v3)
Streaming (v2)
Streaming tests use 5 files with 0.5s chunks to simulate real-time audio streaming
25 files per dataset • Test runtime: 6m53s • 04/22/2026, 02:10 PM EST
RTFx = Real-Time Factor (higher is better) • Calculated as: Total audio duration ÷ Total processing time
Expected RTFx Performance on Physical M1 Hardware: M1 Mac: ~28x (clean), ~25x (other)
Testing methodology follows HuggingFace Open ASR Leaderboard
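The RTFx definition quoted above can be stated directly as a function; the numbers in the usage comment are illustrative, not benchmark output:

```swift
// RTFx = total audio duration ÷ total processing time (higher is better).
func realTimeFactor(audioDurationSeconds: Double,
                    processingTimeSeconds: Double) -> Double {
    audioDurationSeconds / processingTimeSeconds
}

// Example: 100s of audio processed in 4s gives an RTFx of 25x.
```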
Do not push this. I have an even more optimized version of LS-EEND coming up but without preview frames.
```diff
 public var finalizedPredictions: [Float] = []

 /// Tentative predictions.
 /// Flat array of shape [numTentative, numSpeakers].
-public var tentativePredictions: [Float] {
-    queue.sync { _tentativePredictions }
-}
+public var tentativePredictions: [Float] = []
```
🔴 finalizedPredictions and tentativePredictions exposed as unsynchronized public vars, creating data races
These two properties were previously behind synchronized computed properties (queue.sync { _finalizedPredictions }) but are now public var with no locking. Internal methods (_addChunkUnlocked, _finalizeUnlocked, _resetUnlocked, rebuild) still mutate them under self.lock, so any external concurrent reader (e.g., LSEENDBenchmark.swift:589 reading timeline.finalizedPredictions, or LSEENDCommand.swift:212) races against the lock-protected writes. This violates the repository's AGENTS.md rule: "implement proper thread safety with actors/MainActor" and removes thread safety that the old code provided.
Prompt for agents
The properties `finalizedPredictions` and `tentativePredictions` on `DiarizerTimeline` (lines 670-674) are now `public var` but are mutated internally under `self.lock` (e.g. in `_addChunkUnlocked`, `_finalizeUnlocked`, `rebuild`). External callers read them without any synchronization, creating a data race. The old code used computed properties that acquired the dispatch queue before returning the backing store. To fix: either make these properties private and expose them through lock-protected computed properties (matching the old pattern), or ensure all reads and writes go through the lock. Affected callers include `LSEENDBenchmark.swift:589` and `LSEENDCommand.swift:212`.
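A minimal sketch of the first suggested fix, assuming an NSLock-based backing store (the real DiarizerTimeline holds additional state and mutators not shown here):

```swift
import Foundation

// Hypothetical, simplified DiarizerTimeline illustrating the pattern:
// private backing stores, public computed properties that read under the
// same lock the internal writers hold.
final class DiarizerTimeline {
    private let lock = NSLock()
    private var _finalizedPredictions: [Float] = []
    private var _tentativePredictions: [Float] = []

    /// Finalized predictions, read under the lock.
    public var finalizedPredictions: [Float] {
        lock.lock(); defer { lock.unlock() }
        return _finalizedPredictions
    }

    /// Tentative predictions, read under the lock.
    public var tentativePredictions: [Float] {
        lock.lock(); defer { lock.unlock() }
        return _tentativePredictions
    }

    /// Writers mutate the backing stores only while holding the lock,
    /// matching the _addChunkUnlocked/_finalizeUnlocked convention.
    func appendFinalized(_ preds: [Float]) {
        lock.lock(); defer { lock.unlock() }
        _finalizedPredictions.append(contentsOf: preds)
    }
}
```

External readers such as LSEENDBenchmark then go through the lock automatically, restoring the guarantee the old `queue.sync` accessors provided.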
```swift
public var name: String?

/// Diarizer output slot
public var index: Int

/// Finalized speech segments
public var finalizedSegments: [DiarizerSegment] = []

/// Preview speech segments
public var tentativeSegments: [DiarizerSegment] = []
```
🟡 DiarizerSpeaker mutable properties exposed without synchronization, breaking prior thread safety
The refactoring changed DiarizerSpeaker.name, index, finalizedSegments, and tentativeSegments from private properties with queue-synchronized public accessors to plain public var. The class has a private NSLock and locking methods like rename(to:), but external code can bypass the lock by accessing properties directly. For example, LSEENDDiarizer.swift:352 writes enrolledSpeaker.name = name directly instead of using rename(to:). This violates the AGENTS.md rule requiring proper thread safety.
Prompt for agents
DiarizerSpeaker now exposes `name`, `index`, `finalizedSegments`, and `tentativeSegments` as `public var` (lines 225-234) while also having a private NSLock and lock-protected methods (`rename`, `reassign`, `append`, `clearTentative`, etc.). This means any external caller can read/write these properties without acquiring the lock, creating data races when accessed concurrently. The old code kept these properties private with queue-synchronized accessors. Fix options: (1) Make these properties private again and use computed properties that acquire the lock. (2) At minimum, change direct property writes like `enrolledSpeaker.name = name` (LSEENDDiarizer.swift:352) to use `enrolledSpeaker.rename(to: name)`.
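A sketch of option (2), routing the write through a lock-protected method instead of assigning the property directly; DiarizerSpeaker is heavily simplified here and the real class has more fields and methods:

```swift
import Foundation

// Hypothetical, reduced DiarizerSpeaker showing the lock-protected
// read/rename pair the review recommends callers use.
final class DiarizerSpeaker {
    private let lock = NSLock()
    private var _name: String?

    /// Read under the same lock the mutators hold.
    public var name: String? {
        lock.lock(); defer { lock.unlock() }
        return _name
    }

    /// Lock-protected rename, mirroring the existing `rename(to:)` API.
    public func rename(to newName: String) {
        lock.lock(); defer { lock.unlock() }
        _name = newName
    }
}

// Callers then write `enrolledSpeaker.rename(to: name)`
// instead of `enrolledSpeaker.name = name`.
```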
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
```swift
}
newPreds.append(contentsOf: try model.predict(from: input))
processed += 1
progressCallback?(processed, totalChunks, 1)
```
🟡 Progress callback receives chunk counts instead of sample counts, violating documented API contract
The Diarizer protocol documents progressCallback as (processedSamples, totalSamples, chunksProcessed), but the new LSEENDDiarizer.flush implementation at Sources/FluidAudio/Diarizer/LS-EEND/LSEENDDiarizer.swift:249 passes (processed, totalChunks, 1) where processed is the number of chunks processed (not samples), totalChunks is the initial ready-chunk count (not total samples), and the third parameter is hardcoded to 1 (not the accumulated chunk count). Any caller that interprets the callback per the documented contract (e.g., computing a percent-complete from processedSamples / totalSamples) will get nonsensical values.
Callback site in flush()
Line 249: progressCallback?(processed, totalChunks, 1) — processed increments per chunk, totalChunks is a snapshot from before the drain loop, and 1 is always literal 1.
The protocol comment at Sources/FluidAudio/Diarizer/DiarizerProtocol.swift:60 says: progressCallback: Optional callback receiving (processedSamples, totalSamples, chunksProcessed).
Prompt for agents
The flush() method at LSEENDDiarizer.swift:225-258 passes chunk-level counts to the progressCallback, but the Diarizer protocol documents this callback as (processedSamples, totalSamples, chunksProcessed). Either update the callback invocation to pass actual sample counts (tracking cumulative samples processed and total audio samples enqueued), or update the protocol documentation to reflect the new semantics. The SortformerDiarizer still uses the sample-count convention, so consistency across implementations matters.
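A sketch of the sample-count convention the protocol documents; the chunk type, drain loop, and function signature below are simplified assumptions, not the real flush() body:

```swift
// Hypothetical progress reporting that matches the documented
// (processedSamples, totalSamples, chunksProcessed) contract.
typealias ProgressCallback = (Int, Int, Int) -> Void

func flush(chunks: [[Float]], progressCallback: ProgressCallback?) {
    // Total audio samples across all queued chunks, computed up front.
    let totalSamples = chunks.reduce(0) { $0 + $1.count }
    var processedSamples = 0
    var chunksProcessed = 0
    for chunk in chunks {
        // ... model.predict(from:) would run on `chunk` here ...
        processedSamples += chunk.count
        chunksProcessed += 1
        progressCallback?(processedSamples, totalSamples, chunksProcessed)
    }
}
```

With this shape, a caller computing percent-complete from the first two arguments gets a meaningful ratio, and the third argument accumulates instead of staying at a literal 1.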