Architecture Overview
The HoloMIT SDK is composed of modular systems that work together to enable real-time holoportation, streaming, interaction, and networking. Each system is isolated, extensible, and can be selectively enabled based on the needs of your application.
The SDK follows a stream-oriented architecture that separates media capture, processing, distribution, and rendering into clearly defined components. This separation enables real-time performance, scalability, and platform flexibility across Local, LAN, and Cloud deployments.
Core System Modules
| Module | Purpose |
|---|---|
| Core | Foundation for threading, diagnostics, logging, and plugin extensibility |
| Volumetric Video | Real-time 3D capture, encoding, distribution, and rendering |
| 2D Video | Webcam capture and video streaming using traditional codecs |
| Audio | Microphone capture, loopback preview, and compressed streaming |
| Cloud | Session management, media orchestration, networking, and remote rendering |
| Interactions | XR-based interaction logic: raycasting, grabbing, teleportation, etc. |
| Advanced Environments | Support for Gaussian Splatting, 360° viewers, and special rendering modes |
| Metrics | System performance tracking and telemetry, including Grafana integration |
Each of these modules may internally consist of multiple components and submodules (e.g., Distributors inside Cloud, or Codecs inside Volumetric/Audio).
Extending Modules
HoloMIT SDK modules are designed to be extended or replaced. Developers can:
- Implement or inherit from core interfaces (e.g., `IUserRepresentationHandler`) to override default behavior.
- Inject custom logic via plugin entry points.
- Add additional pipeline stages (e.g., for AI post-processing, filtering, etc.).
- Create your own objects and register them through registrators.
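For instance, a custom handler could look like the snippet below. This is a minimal sketch: `IUserRepresentationHandler` is an SDK interface, but the member shown and the registration call are illustrative assumptions, not the SDK's actual API.

```csharp
using UnityEngine;

// Minimal sketch: the interface name comes from the SDK, but this member
// signature and the registration call below are assumptions for illustration.
public class LoggingRepresentationHandler : IUserRepresentationHandler
{
    // Assumed callback invoked when a remote user's representation is spawned.
    public void OnRepresentationCreated(GameObject representation)
    {
        Debug.Log($"Representation spawned: {representation.name}");
    }
}

// Registration through a registrator (type and method name hypothetical):
// Registrators.Register<IUserRepresentationHandler>(new LoggingRepresentationHandler());
```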
This architecture encourages clean, testable, and scalable extensions — making the SDK adaptable for both rapid prototyping and production-grade applications.
Threading by Default
Most system modules offload heavy operations to background threads:
- Capture
- Compression and decompression
- Network I/O and stream buffering
- Some rendering pre-processing
This ensures minimal impact on Unity’s main thread and improves runtime stability in XR environments.
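The pattern behind these workers can be pictured as a queue-fed background thread. The class below is an illustrative sketch, not an SDK type: Unity's main thread only enqueues raw frames and dequeues finished results, while compression happens off-thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative worker pattern, not an SDK type. Heavy work (compression here)
// runs on a dedicated background thread; the main thread never blocks on it.
public sealed class CompressionWorker
{
    private readonly ConcurrentQueue<byte[]> _input = new ConcurrentQueue<byte[]>();
    private readonly ConcurrentQueue<byte[]> _output = new ConcurrentQueue<byte[]>();
    private readonly Thread _thread;
    private volatile bool _running = true;

    public CompressionWorker(Func<byte[], byte[]> compress)
    {
        _thread = new Thread(() =>
        {
            while (_running)
            {
                if (_input.TryDequeue(out var frame))
                    _output.Enqueue(compress(frame));
                else
                    Thread.Sleep(1); // idle briefly instead of busy-waiting
            }
        })
        { IsBackground = true };
        _thread.Start();
    }

    public void Enqueue(byte[] rawFrame) => _input.Enqueue(rawFrame);
    public bool TryGetResult(out byte[] compressed) => _output.TryDequeue(out compressed);
    public void Stop() { _running = false; _thread.Join(); }
}
```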
High-Level Layers
The architecture can be viewed as a stack of logical layers:
1. Capture Layer
- Depth cameras, webcams, and microphones serve as capture devices.
- Volumetric capture includes both raw data acquisition and 3D reconstruction before proceeding to compression.
2. Compression Layer
- Specialized codecs encode and decode the captured data for efficient transport:
  - `HoloCodec` for volumetric video
  - `H264/H265` for 2D video
  - `Opus/Speex` for audio
- This step often includes adaptive bitrate management and LOD reduction for scalability.
- Clients receive and decode media streams for real-time playback.
- The encoding/decoding process is optimized for parallel execution and real-time frame delivery.
3. Transport Layer (Sending/Receiving)
- Encoded data is transmitted over the network using MediaWriters.
- Encoded data is received from the network through MediaReaders (see the interface sketch after this list).
- Depending on configuration, this may occur:
- Locally (Local)
- Via a local server (LAN)
- Via a cloud backend (Cloud)
4. Rendering Layer
- Media is rendered in Unity using dedicated renderers:
- Volumetric renderer with spatial alignment
- 2D video planes or texture streaming
- Audio sources with spatialization
- Anchors and session state determine media positioning in XR space.
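The MediaWriter/MediaReader roles from the transport layer can be pictured as a pair of interfaces like the ones below. These shapes are assumptions for illustration; the SDK's actual types may differ.

```csharp
// Illustrative shapes only; the SDK's real MediaWriter/MediaReader APIs may differ.
public interface IMediaWriter
{
    // Push one encoded frame toward its destination (Local, LAN, or Cloud route).
    void Write(byte[] encodedFrame, long timestampMs);
}

public interface IMediaReader
{
    // Pull the next received frame, if any, for decoding and rendering.
    bool TryRead(out byte[] encodedFrame, out long timestampMs);
}
```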
Media Pipeline Flow
[Capture (with Reconstruction)]
→ [Compression]
→ [Sending/Receiving]
→ [Decoding]
→ [Rendering]
This pipeline applies independently to each media type:
- Volumetric Video
- 2D Video
- Audio
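Conceptually, each per-media pipeline chains these stages. The sketch below uses placeholder delegates rather than SDK types, to show how the send and receive sides tick independently.

```csharp
using System;

// Conceptual wiring of one media pipeline; every delegate here is a
// placeholder standing in for an SDK component, not a real SDK type.
public sealed class MediaPipeline
{
    public Func<byte[]> Capture;           // capture (+ reconstruction for volumetric)
    public Func<byte[], byte[]> Encode;    // compression
    public Action<byte[]> Send;            // transport out (MediaWriter role)
    public Func<byte[]> Receive;           // transport in (MediaReader role)
    public Func<byte[], byte[]> Decode;    // decompression
    public Action<byte[]> Render;          // rendering

    public void TickSender() => Send(Encode(Capture()));
    public void TickReceiver() => Render(Decode(Receive()));
}
```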
Session-Centric Runtime
At runtime, everything is coordinated by a Session entity, which acts as the central context for:
- Media pipelines
- Connected users and clients
- Networked interactions
- Lifecycle events
Multiple sessions can coexist and be dynamically created or destroyed.
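A session can be modeled conceptually as below. This is not the SDK's actual class, only a sketch of what the central context owns and signals.

```csharp
using System;
using System.Collections.Generic;

// Conceptual model of a session (not the SDK's actual class): one central
// context that tracks connected users and raises lifecycle events.
public sealed class Session
{
    public string Id { get; }
    public List<string> ConnectedUsers { get; } = new List<string>();

    public event Action<string> UserJoined;
    public event Action Destroyed;

    public Session(string id) => Id = id;

    public void Join(string userId)
    {
        ConnectedUsers.Add(userId);
        UserJoined?.Invoke(userId);
    }

    public void Destroy() => Destroyed?.Invoke();
}
```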
Pipeline Extensibility
HoloMIT SDK is designed for extension at multiple levels:
- Workers: Add custom threaded logic to any processing stage.
- Codecs: Integrate new compression algorithms.
- Writers/Readers: Swap out the default transport layer with WebRTC or proprietary systems.
- Renderers: Implement your own rendering methodology.
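As a concrete (hypothetical) example of the codec extension point, a new codec only needs to satisfy an encode/decode contract. The `ICodec` shape below is assumed for illustration.

```csharp
// Assumed codec contract for illustration; the SDK's real interface may differ.
public interface ICodec
{
    byte[] Encode(byte[] raw);
    byte[] Decode(byte[] encoded);
}

// A trivial codec useful for testing the pipeline without compression overhead.
public sealed class PassthroughCodec : ICodec
{
    public byte[] Encode(byte[] raw) => raw;         // no-op "compression"
    public byte[] Decode(byte[] encoded) => encoded; // no-op "decompression"
}
```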
Deployment Modes
The architecture supports several operational configurations:
| Mode | Description |
|---|---|
| Local | Everything runs on a single machine (ideal for testing or single-user experiences) |
| LAN | Multi-user experience using a locally deployed server over a LAN |
| Cloud | Full multi-user streaming with cloud orchestration and scaling |
Deployment mode is configured at runtime with no need to change project structure or scenes.
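For example, mode selection could be resolved from a launch argument at startup. The enum mirrors the table above; the resolution helper is a hypothetical sketch, not an SDK API.

```csharp
// Hypothetical runtime mode selection; only the three modes come from the docs.
public enum DeploymentMode { Local, Lan, Cloud }

public static class DeploymentConfig
{
    // Resolve the mode from, e.g., a command-line argument or settings file,
    // leaving scenes and project structure untouched.
    public static DeploymentMode Resolve(string arg) => arg?.ToLowerInvariant() switch
    {
        "lan"   => DeploymentMode.Lan,
        "cloud" => DeploymentMode.Cloud,
        _       => DeploymentMode.Local,
    };
}
```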