Architecture Overview
The HoloMIT SDK is composed of modular systems that work together to enable real-time holoportation, streaming, interaction, and networking. Each system is isolated, extensible, and can be selectively enabled based on the needs of your application.
The SDK follows a stream-oriented architecture that separates media capture, processing, distribution, and rendering into clearly defined components. This separation enables real-time performance, scalability, and platform flexibility across Local, LAN, and Cloud deployments.
Core System Modules
| Module | Purpose |
|---|---|
| Core | Foundation for threading, diagnostics, logging, and plugin extensibility |
| Volumetric Video | Real-time 3D capture, encoding, distribution, and rendering |
| 2D Video | Webcam capture and video streaming using traditional codecs |
| Audio | Microphone capture, loopback preview, and compressed streaming |
| Cloud | Session management, media orchestration, networking, and remote rendering |
| Interactions | XR-based interaction logic: raycasting, grabbing, teleportation, etc. |
| Advanced Environments | Support for Gaussian Splatting, 360° viewers, and special rendering modes |
| Metrics | System performance tracking and telemetry, including Grafana integration |
Each of these modules may internally consist of multiple components and submodules (e.g., Distributors inside Cloud, or Codecs inside Volumetric/Audio).
Extending Modules
HoloMIT SDK modules are designed to be extended or replaced. Developers can:
- Implement or inherit from core interfaces (e.g., `IUserRepresentationHandler`) to override default behavior.
- Inject custom logic via plugin entry points.
- Add additional pipeline stages (e.g., for AI post-processing, filtering, etc.).
- Create your own objects and register them through registrators.
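For instance, a custom handler could look like the snippet below. This is a minimal sketch: `IUserRepresentationHandler` is an SDK interface, but the member shown and the registration call are illustrative assumptions, not the SDK's actual API.

```csharp
using UnityEngine;

// Minimal sketch: the interface name comes from the SDK, but this member
// signature and the registration call below are assumptions for illustration.
public class LoggingRepresentationHandler : IUserRepresentationHandler
{
    // Assumed callback invoked when a remote user's representation is spawned.
    public void OnRepresentationCreated(GameObject representation)
    {
        Debug.Log($"Representation spawned: {representation.name}");
    }
}

// Registration through a registrator (type and method name hypothetical):
// Registrators.Register<IUserRepresentationHandler>(new LoggingRepresentationHandler());
```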
This architecture encourages clean, testable, and scalable extensions — making the SDK adaptable for both rapid prototyping and production-grade applications.
Threading by Default
Most system modules offload heavy operations to background threads:
- Capture
- Compression and decompression
- Network I/O and stream buffering
- Some rendering pre-processing
This ensures minimal impact on Unity’s main thread and improves runtime stability in XR environments.
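The pattern behind these workers can be pictured as a queue-fed background thread. The class below is an illustrative sketch, not an SDK type: Unity's main thread only enqueues raw frames and dequeues finished results, while compression happens off-thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative worker pattern, not an SDK type. Heavy work (compression here)
// runs on a dedicated background thread; the main thread never blocks on it.
public sealed class CompressionWorker
{
    private readonly ConcurrentQueue<byte[]> _input = new ConcurrentQueue<byte[]>();
    private readonly ConcurrentQueue<byte[]> _output = new ConcurrentQueue<byte[]>();
    private readonly Thread _thread;
    private volatile bool _running = true;

    public CompressionWorker(Func<byte[], byte[]> compress)
    {
        _thread = new Thread(() =>
        {
            while (_running)
            {
                if (_input.TryDequeue(out var frame))
                    _output.Enqueue(compress(frame));
                else
                    Thread.Sleep(1); // idle briefly instead of busy-waiting
            }
        })
        { IsBackground = true };
        _thread.Start();
    }

    public void Enqueue(byte[] rawFrame) => _input.Enqueue(rawFrame);
    public bool TryGetResult(out byte[] compressed) => _output.TryDequeue(out compressed);
    public void Stop() { _running = false; _thread.Join(); }
}
```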
High-Level Layers
The architecture can be viewed as a stack of logical layers:
1. Capture Layer
- Depth cameras, webcams, and microphones serve as capture devices.
- Volumetric capture includes both raw data acquisition and 3D reconstruction before proceeding to compression.
2. Compression Layer
- Specialized codecs encode and decode the captured data for efficient transport:
  - `HoloCodec` for volumetric video
  - `H264/H265` for 2D video
  - `Opus/Speex` for audio
- This step often includes adaptive bitrate management and LOD reduction for scalability.
- Clients receive and decode media streams for real-time playback.
- The encoding/decoding process is optimized for parallel execution and real-time frame delivery.
3. Transport Layer (Sending/Receiving)
- Encoded data is transmitted over the network using MediaWriters.
- Encoded data is received from the network through MediaReaders (see the interface sketch after this list).
- Depending on configuration, this may occur:
- Locally (Local)
- Via a local server (LAN)
- Via a cloud backend (Cloud)
4. Rendering Layer
- Media is rendered in Unity using dedicated renderers:
- Volumetric renderer with spatial alignment
- 2D video planes or texture streaming
- Audio sources with spatialization
- Anchors and session state determine media positioning in XR space.
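The MediaWriter/MediaReader roles from the transport layer can be pictured as a pair of interfaces like the ones below. These shapes are assumptions for illustration; the SDK's actual types may differ.

```csharp
// Illustrative shapes only; the SDK's real MediaWriter/MediaReader APIs may differ.
public interface IMediaWriter
{
    // Push one encoded frame toward its destination (Local, LAN, or Cloud route).
    void Write(byte[] encodedFrame, long timestampMs);
}

public interface IMediaReader
{
    // Pull the next received frame, if any, for decoding and rendering.
    bool TryRead(out byte[] encodedFrame, out long timestampMs);
}
```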
Media Pipeline Flow
[Capture (with Reconstruction)]
→ [Compression]
→ [Sending/Receiving]
→ [Decoding]
→ [Rendering]
This pipeline applies independently to each media type:
- Volumetric Video
- 2D Video
- Audio
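Conceptually, each per-media pipeline chains these stages. The sketch below uses placeholder delegates rather than SDK types, to show how the send and receive sides tick independently.

```csharp
using System;

// Conceptual wiring of one media pipeline; every delegate here is a
// placeholder standing in for an SDK component, not a real SDK type.
public sealed class MediaPipeline
{
    public Func<byte[]> Capture;           // capture (+ reconstruction for volumetric)
    public Func<byte[], byte[]> Encode;    // compression
    public Action<byte[]> Send;            // transport out (MediaWriter role)
    public Func<byte[]> Receive;           // transport in (MediaReader role)
    public Func<byte[], byte[]> Decode;    // decompression
    public Action<byte[]> Render;          // rendering

    public void TickSender() => Send(Encode(Capture()));
    public void TickReceiver() => Render(Decode(Receive()));
}
```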
Session-Centric Runtime
At runtime, everything is coordinated by a Session entity, which acts as the central context for:
- Media pipelines
- Connected users and clients
- Networked interactions
- Lifecycle events
Multiple sessions can coexist and be dynamically created or destroyed.
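A session can be modeled conceptually as below. This is not the SDK's actual class, only a sketch of what the central context owns and signals.

```csharp
using System;
using System.Collections.Generic;

// Conceptual model of a session (not the SDK's actual class): one central
// context that tracks connected users and raises lifecycle events.
public sealed class Session
{
    public string Id { get; }
    public List<string> ConnectedUsers { get; } = new List<string>();

    public event Action<string> UserJoined;
    public event Action Destroyed;

    public Session(string id) => Id = id;

    public void Join(string userId)
    {
        ConnectedUsers.Add(userId);
        UserJoined?.Invoke(userId);
    }

    public void Destroy() => Destroyed?.Invoke();
}
```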
Pipeline Extensibility
HoloMIT SDK is designed for extension at multiple levels:
- Workers: Add custom threaded logic to any processing stage.
- Codecs: Integrate new compression algorithms.
- Writers/Readers: Swap out the default transport layer with WebRTC or proprietary systems.
- Renderers: Implement your own rendering methodology.
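As a concrete (hypothetical) example of the codec extension point, a new codec only needs to satisfy an encode/decode contract. The `ICodec` shape below is assumed for illustration.

```csharp
// Assumed codec contract for illustration; the SDK's real interface may differ.
public interface ICodec
{
    byte[] Encode(byte[] raw);
    byte[] Decode(byte[] encoded);
}

// A trivial codec useful for testing the pipeline without compression overhead.
public sealed class PassthroughCodec : ICodec
{
    public byte[] Encode(byte[] raw) => raw;         // no-op "compression"
    public byte[] Decode(byte[] encoded) => encoded; // no-op "decompression"
}
```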
Deployment Modes
The architecture supports several operational configurations:
| Mode | Description |
|---|---|
| Local | Everything runs on a single machine (ideal for testing or single-user experiences) |
| LAN | Multi-user experience using a locally deployed server over a LAN |
| Cloud | Full multi-user streaming with cloud orchestration and scaling |
Deployment mode is configured at runtime with no need to change project structure or scenes.
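For example, mode selection could be resolved from a launch argument at startup. The enum mirrors the table above; the resolution helper is a hypothetical sketch, not an SDK API.

```csharp
// Hypothetical runtime mode selection; only the three modes come from the docs.
public enum DeploymentMode { Local, Lan, Cloud }

public static class DeploymentConfig
{
    // Resolve the mode from, e.g., a command-line argument or settings file,
    // leaving scenes and project structure untouched.
    public static DeploymentMode Resolve(string arg) => arg?.ToLowerInvariant() switch
    {
        "lan"   => DeploymentMode.Lan,
        "cloud" => DeploymentMode.Cloud,
        _       => DeploymentMode.Local,
    };
}
```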