# Sensor Stream Server

![High-level overview of what Sensor Stream Server does](https://3654700078-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MT1_EXYmVY_RUqopPfe%2F-MTU8YwD48Eq3vllfnyU%2F-MTWivUTRrt8o0JZGAK7%2Fimage.png?alt=media\&token=fb47d8bb-67a3-4750-91bd-4476e045b97c)

1. Sensor Stream Server is started with the path to a YAML configuration file as a command line argument
2. The configuration file defines:
   * Where the data should be streamed to (destination)
   * Frame Source (where the data is coming from - frame list/video/kinect/iphone)
   * What frame types will be streamed (color/depth/ir/confidence)
   * Encoders for each of the frame types (null/libav/nvenc/zdepth)
3. `IReader` is constructed with parameters defined in the configuration file
   * The instantiated `IReader`'s frame types are defined by a vector of integers returned by its `GetType()` method
     * 0 - if present in the vector, **color** is being sent (encoders available: null/libav/nvenc)
     * 1 - if present in the vector, **depth** is being sent (encoders available: null/libav/nvenc/zdepth)
     * 2 - if present in the vector, **ir** is being sent (encoders available: null/libav/nvenc)
     * 3 - if present in the vector, **confidence** is being sent (encoders available: null)
   * `IReader` will use `GetCurrentFrame()` in Step #5 to grab data from the frame source and create a `FrameStruct` for each defined frame type
4. Server creates a vector of `IEncoder`s, one for each frame type returned by `IReader`'s `GetType()`
   * The `IEncoder`s will be used to encode `FrameStruct`s in Step #5
   * `NullEncoder` - passes raw data through
   * `LibAvEncoder` - uses [LibAv](https://libav.org/) to encode video frames
   * `NvEncoder` - uses [NVPipe](https://github.com/NVIDIA/NvPipe) to encode video frames
   * `ZDepthEncoder` - uses [Zdepth](https://github.com/catid/Zdepth) to encode depth frames
5. Server creates a vector of encoded `FrameStruct`s
   * Server is paced to transmit frames only as fast as the fps allows
   * `IReader` creates a vector of `FrameStruct`s
     * One for each frame type returned by `GetType()`
     * `IReader.GetCurrentFrame()` - Fills the `FrameStruct`s
     * `IReader.HasNextFrame()` - Checks whether another frame is available (used to iterate)
     * `IReader.NextFrame()` - Advances to the next frame
   * Server iterates over the vector of `FrameStruct`s, passing each to its matching `IEncoder`, which returns an encoded `FrameStruct` (see the sketch after this list)
     * `IEncoder.AddFrameStruct()` - Attempts to encode the `FrameStruct`
     * `IEncoder.HasNextPacket()` - Checks if an encoded frame is available
     * `IEncoder.CurrentFrameEncoded()` - Returns the encoded frame
6. The vector of encoded `FrameStruct`s is sent to the destination zmq socket as a zmq message
   * Call `CerealStructToString()` on the vector of encoded frames to create a string message
   * Create a zmq message out of the string message
   * Call `socket.send()`, passing the zmq message
   * Log timings and the current frame rate
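
Putting steps 5 and 6 together, the core of the server loop looks roughly like the sketch below. The method names come from the description above; the return types, the `frame_type` field, and the container choices are assumptions, so treat this as an outline of `ssp_server.cc` rather than its exact code.

```
// Sketch of the per-frame encode-and-send loop (steps 5 and 6).
// Method names follow the description above; signatures and the
// frame_type field are assumptions.
while (reader->HasNextFrame()) {
  // One FrameStruct per frame type returned by GetType()
  std::vector<FrameStruct> frames = reader->GetCurrentFrame();

  std::vector<FrameStruct> encoded_frames;
  for (FrameStruct &fs : frames) {
    IEncoder &encoder = *encoders[fs.frame_type]; // encoder matching this frame type
    encoder.AddFrameStruct(fs);                   // attempt encoding
    if (encoder.HasNextPacket())                  // encoded frame available?
      encoded_frames.push_back(encoder.CurrentFrameEncoded());
  }

  // Step 6: serialize the vector with cereal and send it as one zmq message
  std::string message = CerealStructToString(encoded_frames);
  zmq::message_t zmq_message(message.data(), message.size());
  socket.send(zmq_message, zmq::send_flags::none);

  reader->NextFrame(); // advance to the next frame in the source
}
```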

![Detailed overview of Sensor Stream Server data flow](https://3654700078-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MT1_EXYmVY_RUqopPfe%2F-MeviRZkinlzhfT54Z0l%2F-Mevl9GV6qvNXYAC7yfA%2Fimage.png?alt=media\&token=1a6d1701-08ea-4a7c-b64b-645493b2ded0)

The Sensor Stream Server can stream from 4 frame (data) sources:

* **Images**: frames from images stored on the disk.
* **Video**: encoded frames that have been captured using a color/depth camera, such as the Kinect. All video types supported by FFmpeg/Libav can be processed.
* **Kinect**: Live Kinect DK frame data.
* **iPhone**: Live iPhone RGBD data (requires a LiDAR-equipped device)

Each of these data sources has its own implementation of `IReader` (defined in `image_reader.h`) that reads data from the source and writes it to `FrameStruct`s.

* **ImageReader:** Can be used to stream public datasets and can be composed into a `MultiImageReader`
* **VideoFileReader:** Reads .mkv files like those created by [Azure Kinect DK recorder](https://docs.microsoft.com/en-us/azure/kinect-dk/azure-kinect-recorder)
* **KinectReader:** Reads Azure Kinect DK stream
* **IPhoneReader:** Reads ARFrame data
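
To make the mapping from configuration to reader concrete, selection could look roughly like the sketch below. The reader class names come from the list above; the yaml-cpp-based factory function, the exact type strings, and the constructor arguments are illustrative assumptions (see `ssp_server.cc` for the real selection logic).

```
#include <memory>
#include <stdexcept>
#include <string>
#include <yaml-cpp/yaml.h>

// Hypothetical factory choosing an IReader from the frame_source "type"
// field; type strings and constructor arguments are assumptions.
std::unique_ptr<IReader> CreateReader(const YAML::Node &frame_source) {
  const std::string type = frame_source["type"].as<std::string>();
  if (type == "image")
    return std::make_unique<ImageReader>(frame_source["parameters"]);
  if (type == "video")
    return std::make_unique<VideoFileReader>(frame_source["parameters"]);
  if (type == "kinect")
    return std::make_unique<KinectReader>(frame_source["parameters"]);
  if (type == "iphone")
    return std::make_unique<IPhoneReader>(frame_source["parameters"]);
  throw std::invalid_argument("unknown frame source type: " + type);
}
```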

For all four data sources, the data can be sent losslessly (very high bandwidth requirements) or compressed (20-50x lower bandwidth requirements) using Libav, or the NVIDIA codecs through NVPipe.
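
For a sense of scale: raw 720p BGRA32 color at 30 fps is about 1280 × 720 × 4 bytes × 30 ≈ 110 MB/s before compression, so even a single uncompressed color stream can saturate a typical network link; the exact savings depend on the encoder and bit rate chosen for each stream.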

As any compression will affect quality, we recommend first experimenting with the Sensor Stream Tester to figure out the optimal levels for your use case.

Server File: <https://github.com/moetsi/Sensor-Stream-Pipe/blob/master/servers/ssp_server.cc>

## Building Sensor Stream Server

Building Sensor Stream Server also builds Sensor Stream Client and Sensor Stream Tester.

```
git clone git@github.com:moetsi/Sensor-Stream-Pipe.git
cd Sensor-Stream-Pipe
mkdir build
cd build
cmake .. -DSSP_WITH_KINECT_SUPPORT=ON -DSSP_WITH_K4A_BODYTRACK=ON -DSSP_WITH_NVPIPE_SUPPORT=ON
make ssp_server
```

You can turn off Kinect, Body Tracking and NVPipe support by setting the following `cmake ..` options to OFF:

```
-DSSP_WITH_KINECT_SUPPORT=OFF
-DSSP_WITH_K4A_BODYTRACK=OFF
-DSSP_WITH_NVPIPE_SUPPORT=OFF
```

## Starting the Sensor Stream Server

```
./bin/ssp_server <configuration file>
```

The Sensor Stream Server will start streaming frame data by default, but **it will not keep any frames if it is not connected to a client**. This is a zmq setting. From ssp\_server.cc:

```
// Do not accumulate packets if no client is connected
socket.set(zmq::sockopt::immediate, true);
```

Once it connects to a Sensor Stream Client, any packets already buffered are sent first. After the buffer has emptied, the Sensor Stream Server resumes reading frames from the selected input in order, ensuring that no frames are dropped.
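
For context, the socket setup around that snippet looks roughly like the sketch below, using the cppzmq API. The PUSH socket type and the connect-to-client direction follow from the configuration semantics described next (the server is given the client's host and port), but these details are assumptions rather than verbatim source.

```
#include <zmq.hpp>

zmq::context_t context(1);
zmq::socket_t socket(context, zmq::socket_type::push);

// Do not accumulate packets if no client is connected
socket.set(zmq::sockopt::immediate, true);

// host and port come from the "general" section of the YAML configuration
socket.connect("tcp://192.168.1.64:9999");
```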

The Sensor Stream Server configuration is stored in a YAML file. It includes the Sensor Stream Client host and port, the input data configuration, and the encoding configuration.

The format of the file (encoding Kinect DK frame data with the Nvidia encoder) is as follows:

```
general:
  host: "192.168.1.64"
  port: 9999
  log_level: "debug"
  log_file: "ssp_server.log"
  frame_source: 
    type: "kinect"
    parameters:
        stream_color_video: True
        stream_depth_video: True
        stream_ir_video: True
        streaming_color_format: "K4A_IMAGE_FORMAT_COLOR_BGRA32"
        streaming_color_resolution: "K4A_COLOR_RESOLUTION_720P"
        streaming_depth_mode: "K4A_DEPTH_MODE_NFOV_UNBINNED"
        wired_sync_mode: "K4A_WIRED_SYNC_MODE_STANDALONE"
        streaming_rate: "K4A_FRAMES_PER_SECOND_30"
        absoluteExposureValue: 0
video_encoder:
  0: #color
    type: "nvenc"
    codec_name: "NVPIPE_HEVC"
    input_format: "NVPIPE_RGBA32"
    bit_rate: 4000000
  1: #depth
    type: "zdepth"
  2: #ir
    type: "nvenc"
    codec_name: "NVPIPE_HEVC"
    input_format: "NVPIPE_UINT16"
    bit_rate: 15000000
```

The `config/` folder includes a set of examples for all types of data using multiple encoders, codecs and parameters.
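
As a starting point, a stripped-down configuration that streams a prerecorded video with the pass-through `null` encoder might look like the sketch below. Only the keys shown in the Kinect example above are confirmed; the `video` source type string and its `path` parameter name are assumptions, so check the files in `config/` for the exact keys.

```
general:
  host: "127.0.0.1"
  port: 9999
  frame_source:
    type: "video"
    parameters:
      path: "recording.mkv" # parameter name is an assumption; see config/
video_encoder:
  0: #color
    type: "null"
  1: #depth
    type: "null"
```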
