Speex

NSpeexEncoder is the HoloMIT SDK's Speex audio encoder, intended for legacy or bandwidth-constrained setups.

Implementation Details

The encoder inherits from VoiceEncoder (and ultimately BaseWorker) and operates asynchronously using thread-safe queues. It pulls raw audio frames (FloatMemoryChunk) from an input queue, compresses them, and pushes the resulting ByteMemoryChunk to an output queue.
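The queue-driven flow described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the SDK's C# code: `encoder_worker`, `fake_compress`, `run_pipeline`, and the `STOP` sentinel are hypothetical names, and zlib stands in for the Speex codec only so the example runs on its own.

```python
import queue
import threading
import zlib
from array import array

STOP = None  # hypothetical sentinel that tells the worker to shut down


def encoder_worker(in_queue, out_queue, compress):
    """Pull raw frames, compress each one, push the result downstream."""
    while True:
        frame = in_queue.get()       # blocks until a frame is available
        if frame is STOP:
            out_queue.put(STOP)      # propagate shutdown to the consumer
            break
        out_queue.put(compress(frame))


def fake_compress(samples):
    # Placeholder for the Speex codec: pack floats to bytes, then deflate.
    return zlib.compress(array("f", samples).tobytes())


def run_pipeline(frames):
    """Feed frames through a worker thread and collect the encoded output."""
    in_q, out_q = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=encoder_worker, args=(in_q, out_q, fake_compress)
    )
    worker.start()
    for frame in frames:
        in_q.put(frame)
    in_q.put(STOP)
    worker.join()

    results = []
    while True:
        item = out_q.get()
        if item is STOP:
            break
        results.append(item)
    return results
```

The producer and consumer only ever touch the queues, which is what lets the encoder run on its own thread without additional locking.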

  • Implementation: Wraps the NSpeex library.
  • Location: Assets\UserRepresentation\Voice\Scripts\Workers\Codecs\NSpeexEncoder.cs
  • Configuration:
    • Band Mode: Wide
    • Quality: 5
  • Buffer Size: Dynamically assigned based on the Speex encoder's FrameSize.
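To illustrate why the buffer size follows the codec's frame size, the sketch below cuts incoming samples into fixed-size frames and keeps the remainder for the next call. It rests on assumptions: the frame size of 320 samples is the usual value for Speex wideband (20 ms at 16 kHz) rather than a value read from the SDK, the float-to-16-bit conversion is assumed because Speex operates on 16-bit PCM, and both function names are hypothetical.

```python
FRAME_SIZE = 320  # assumed: 20 ms at 16 kHz, the usual Speex wideband frame


def float_to_pcm16(samples):
    """Clamp floats to [-1, 1] and scale them to signed 16-bit integers."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Return (complete frames, leftover samples that do not fill a frame)."""
    usable = len(samples) - len(samples) % frame_size
    frames = [samples[i:i + frame_size] for i in range(0, usable, frame_size)]
    return frames, samples[usable:]
```

Sizing the buffer to the frame size means each encode call receives exactly one codec frame, with no per-call reallocation.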

Metrics Tracking

The encoder uses an internal Stats class that records, per frame, the processing time (via System.Diagnostics.Stopwatch) and the incoming and outgoing byte counts. These metrics are flushed periodically to Prometheus or to local log files, depending on the global configuration (Config.Instance.Stats.DebugMetrics).
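A minimal sketch of this kind of accumulator follows, again in Python rather than the SDK's C#. The class name `EncoderStats`, the flush interval, the report field names, and the `sink` callback (standing in for the Prometheus or log-file output) are all hypothetical.

```python
import time


class EncoderStats:
    """Accumulates per-frame timing and byte counts, flushed on an interval."""

    def __init__(self, flush_interval_s=10.0, sink=print,
                 clock=time.perf_counter):
        self.flush_interval_s = flush_interval_s
        self.sink = sink      # receives one summary dict per flush
        self.clock = clock    # injectable so the interval can be tested
        self._last_flush = clock()
        self._reset()

    def _reset(self):
        self.frames = 0
        self.encode_seconds = 0.0
        self.bytes_in = 0
        self.bytes_out = 0

    def record(self, encode_seconds, bytes_in, bytes_out):
        """Add one frame's measurements and flush if the interval elapsed."""
        self.frames += 1
        self.encode_seconds += encode_seconds
        self.bytes_in += bytes_in
        self.bytes_out += bytes_out
        if self.clock() - self._last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        """Emit a summary of the frames seen since the last flush."""
        if self.frames:
            self.sink({
                "frames": self.frames,
                "avg_encode_ms": 1000.0 * self.encode_seconds / self.frames,
                "bytes_in": self.bytes_in,
                "bytes_out": self.bytes_out,
            })
        self._last_flush = self.clock()
        self._reset()
```

In use, the encode call would be timed (for example with `time.perf_counter()` before and after) and the elapsed time passed to `record` together with the input and output sizes.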

Decoder (VoiceDecoder)

There is no dedicated Speex decoder class. Instead, the unified VoiceDecoder script handles Speex decoding:

  • When instantiated, it prepares a SpeexDecoder in Wide band mode.
  • It reads the format from the first packet's metadata (_chunk.Info.Dsi).
  • Because the incoming stream is mono, while Unity's spatial audio or multi-channel setups may require more channels, the decoder copies the decoded mono signal to all 6 channels of the final FloatMemoryChunk before queueing it for the AudioPreparer.
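The channel duplication in the last step can be sketched as follows. The example assumes an interleaved output layout (all channels of sample 0, then all channels of sample 1, and so on), which is how Unity typically hands audio buffers to scripts. The function name `duplicate_mono` is hypothetical and the code is an illustration, not the VoiceDecoder source.

```python
def duplicate_mono(mono, channels=6):
    """Interleave a mono signal so every output channel carries the same sample."""
    out = []
    for sample in mono:
        out.extend([sample] * channels)
    return out
```

With the assumed wideband frame of 320 samples, one decoded frame expands to 320 × 6 = 1920 floats in the output chunk.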