IEncoders
Overview
IEncoders encode data before it is serialized and sent. Once an IReader has pulled the FrameStructs, Sensor Stream Server calls the IEncoders on each FrameStruct to encode the frames.
The encoder converts each raw frame into an encoded frame (represented as a packet in the code). Some encoders need several raw frames before they can produce a packet. The code accounts for that and keeps sending frames to the encoder until the encoder returns a packet.
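The feed-until-packet behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the SSP C++ code: `BufferingEncoder`, `add_frame`, `has_packet`, `next_packet`, and `encode_stream` are hypothetical names, and the toy encoder simply concatenates a fixed number of raw frames to stand in for real compression.

```python
from typing import Iterable, Iterator, List


class BufferingEncoder:
    """Toy encoder (hypothetical) that needs several raw frames per packet."""

    def __init__(self, frames_per_packet: int = 2) -> None:
        self.frames_per_packet = frames_per_packet
        self._pending: List[bytes] = []

    def add_frame(self, raw_frame: bytes) -> None:
        # Buffer the raw frame; a packet may or may not be ready afterwards.
        self._pending.append(raw_frame)

    def has_packet(self) -> bool:
        return len(self._pending) >= self.frames_per_packet

    def next_packet(self) -> bytes:
        # Stand-in for compression: join the buffered frames into one packet.
        packet = b"".join(self._pending[: self.frames_per_packet])
        del self._pending[: self.frames_per_packet]
        return packet


def encode_stream(frames: Iterable[bytes], encoder: BufferingEncoder) -> Iterator[bytes]:
    """Feed raw frames to the encoder and yield a packet whenever one is ready."""
    for raw_frame in frames:
        encoder.add_frame(raw_frame)
        while encoder.has_packet():
            yield encoder.next_packet()


packets = list(encode_stream([b"a", b"b", b"c", b"d", b"e"], BufferingEncoder(2)))
```

With five input frames and two frames per packet, two packets are produced and the last frame stays buffered until more input arrives.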
The encoders know what type of frame they are encoding from the frame_type field of the FrameStruct.
Table of What Encoders to Use For a Frame Type
Frame Type | Encoders to Use
0 (color) | Null, LibAv, or Nvenc. LibAv and Nvenc need a suitable codec (explained below); H264 and H265 are recommended.
1 (depth) | Null, ZDepth
2 (ir) | Null (ZDepth is not tuned well for IR)
3 (confidence) | Null
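The recommendations for each frame type can also be written as a lookup keyed by frame_type. The sketch below is only an illustration in Python: the `FrameType` enum, the `RECOMMENDED_ENCODERS` table, and `is_recommended` are hypothetical names, while the numeric frame types and the encoder type strings are the ones used on this page.

```python
from enum import IntEnum


class FrameType(IntEnum):
    COLOR = 0
    DEPTH = 1
    IR = 2
    CONFIDENCE = 3


# Encoder "type" values recommended on this page for each frame type.
RECOMMENDED_ENCODERS = {
    FrameType.COLOR: ("null", "libav", "nvenc"),
    FrameType.DEPTH: ("null", "zdepth"),
    FrameType.IR: ("null",),
    FrameType.CONFIDENCE: ("null",),
}


def is_recommended(frame_type: int, encoder_type: str) -> bool:
    """Return True if the encoder type is recommended for the frame type."""
    return encoder_type in RECOMMENDED_ENCODERS[FrameType(frame_type)]
```

For example, `is_recommended(1, "zdepth")` is true, while `is_recommended(2, "zdepth")` is false because ZDepth is not tuned for IR.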
NullEncoder
In config.yaml:
type: "null"
There are no parameters other than “type”. The Null encoder does no processing and simply returns the original uncompressed frame.
Performance on Frame Types
Color/Depth/IR - Pixels/voxels are sent exactly as captured, at the cost of very high bandwidth. The encoder adds almost no processing time, so the stream is fast unless it is limited by network bandwidth.
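To get a feel for how high that bandwidth is, the uncompressed bit rate can be estimated as width x height x bytes per pixel x 8 x frames per second. The resolutions, pixel sizes, and frame rate below are illustrative assumptions, not values measured with SSP.

```python
def raw_bandwidth_mbps(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Estimate the bit rate of an uncompressed stream in megabits per second."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1_000_000


# Assumed example streams (hypothetical sizes):
# 1280x720 color at 4 bytes per pixel, 640x576 depth at 2 bytes per pixel, both at 30 fps.
color_mbps = raw_bandwidth_mbps(1280, 720, 4, 30)
depth_mbps = raw_bandwidth_mbps(640, 576, 2, 30)
```

Under these assumptions the color stream alone needs roughly 885 Mbit/s and the depth stream roughly 177 Mbit/s, which is why the Null encoder is mostly practical on fast local networks.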
LibAvEncoder
LibAv supports hardware-accelerated encoding on many platforms. Check whether hardware encoding is available through LibAv on your target platform and, for best results, use a platform-specific codec. For example, iPhones provide VideoToolbox to accelerate encoding; see ios.yaml.
In config.yaml:
type: "libav"
codec_name: <string>
pix_fmt: <string>
bit_rate: <int>
Descriptions of Codec Fields
codec_name - name of the codec to use. SSP supports all codecs that LibAv supports. The simplest way to see the options is to run ffmpeg -encoders. The name to use is the abbreviated codec name (second column). Check that the encoder is listed with the V (video) flag.
pix_fmt - pixel format to be used by the codec. Common values include yuv420p for color or gray12le for infrared data. You can list the pixel formats a codec supports with ffmpeg -h encoder=<codec_name>, for example ffmpeg -h encoder=libx265
bit_rate - target (average) bit rate for the encoder to use, in bits per second.
options - codec specific parameters, see below for examples of common codecs
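To pick a codec_name, the output of ffmpeg -encoders can be filtered for entries with the V (video) flag. The sketch below parses a hand-written, abbreviated sample of that output; the exact layout may differ between FFmpeg versions, and `video_encoder_names` is a hypothetical helper, not part of SSP or FFmpeg.

```python
from typing import List

# Abbreviated, hand-written sample in the style of `ffmpeg -encoders` output.
# Real output can be captured with:
#   subprocess.run(["ffmpeg", "-hide_banner", "-encoders"],
#                  capture_output=True, text=True).stdout
SAMPLE_OUTPUT = """\
Encoders:
 V..... = Video
 A..... = Audio
 S..... = Subtitle
 ------
 V....D libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V....D libx265              libx265 H.265 / HEVC (codec hevc)
 A....D aac                  AAC (Advanced Audio Coding)
"""


def video_encoder_names(encoders_output: str) -> List[str]:
    """Return the names (second column) of encoders flagged as video."""
    names = []
    in_table = False
    for line in encoders_output.splitlines():
        if line.strip().startswith("---"):
            in_table = True  # the encoder list starts after the separator line
            continue
        if not in_table:
            continue  # skip the header and the flag legend
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("V"):
            names.append(parts[1])
    return names


names = video_encoder_names(SAMPLE_OUTPUT)
```

Any name returned this way is a candidate for codec_name; the supported pixel formats still need to be checked per encoder.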
NvEncoder
The Nvidia encoder has been deprecated, but a fork is available at https://github.com/udnaan/NvPipe
It is possible to run nvenc through LibAv if LibAv was built with the appropriate options. We have not tested this in SSP. Check the options using ffmpeg -h encoder=nvenc_hevc (h265) or ffmpeg -h encoder=nvenc (h264); recent FFmpeg releases name these encoders hevc_nvenc and h264_nvenc.
In config.yaml:
type: "nvenc"
codec_name: "NVPIPE_HEVC"
input_format: "NVPIPE_RGBA32"
bit_rate: 2000000
Descriptions of Codec Fields
codec_name - name of the codec to use (NVPIPE_HEVC or NVPIPE_H264).
input_format - input format of the data to be streamed: NVPIPE_RGBA32 (for color data), NVPIPE_UINT4, NVPIPE_UINT8, NVPIPE_UINT16 (for IR data), NVPIPE_UINT32
bit_rate - target (average) bit rate for the encoder to use, in bits per second.
ZDepthEncoder
Zdepth is from this github repo: https://github.com/catid/Zdepth
In config.yaml:
type: "zdepth"
send_I_frame_interval: "30"
Descriptions of Codec Fields
send_I_frame_interval - Interval at which full (“I”) frames are sent. Omit it to send an I frame only once, at the start of the stream. The rationale for this parameter is discussed below.
The ZDepth encoder can send both full (“I”) and partial (“P”) frames. In the best case, you only need to send the first frame of the stream as a full frame; all other frames can be compressed and sent as partial frames. This is the default behaviour, as ZeroMQ guarantees that all frames are delivered. If ZeroMQ is compiled to not block when waiting for a frame, that guarantee no longer holds.
Thus, we’ve added the send_I_frame_interval parameter to the ZDepth YAML config. This value defines how often to send full frames. This is implemented by setting a counter in the FrameServer’s ZDepth encoder, and setting the encoder to send a full frame when the counter is a multiple of the interval.
This logic is implemented here: https://github.com/moetsi/Sensor-Stream-Pipe/blob/ad751d07301da4d988a3295bcbc1b67c2187112e/encoders/zdepth_encoder.cc#L116
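The counter logic described above can be sketched as follows. This is an illustrative Python version, not the code in zdepth_encoder.cc: `KeyframeScheduler` and `next_is_keyframe` are hypothetical names, and starting the counter at zero (so that the first frame is always a full frame) is an assumption.

```python
from typing import Optional


class KeyframeScheduler:
    """Decide whether each frame is sent as a full ("I") or partial ("P") frame."""

    def __init__(self, interval: Optional[int] = None) -> None:
        self.interval = interval  # None means: full frame only once
        self.frame_counter = 0

    def next_is_keyframe(self) -> bool:
        if self.interval:
            # Full frame whenever the counter is a multiple of the interval.
            keyframe = self.frame_counter % self.interval == 0
        else:
            # No interval configured: only the very first frame is full.
            keyframe = self.frame_counter == 0
        self.frame_counter += 1
        return keyframe


with_interval = KeyframeScheduler(interval=3)
pattern_interval = "".join("I" if with_interval.next_is_keyframe() else "P" for _ in range(7))

without_interval = KeyframeScheduler()
pattern_default = "".join("I" if without_interval.next_is_keyframe() else "P" for _ in range(4))
```

A shorter interval lets a receiver that missed frames recover sooner, at the cost of sending more full frames.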
Codecs (to be used in LibAv or Nvenc)
H265
Descriptions of Codec Fields
This is a subset of common parameters. You can get the full list by running ffmpeg -h encoder=libx265 or accessing https://trac.ffmpeg.org/wiki/Encode/H.265
preset - Sets the tradeoff between encoding speed and compression efficiency.
crf - Sets video quality for variable bitrate video. The default is 28, and it should visually correspond to libx264 video at CRF 23. Setting both CRF and bitrate results in the bitrate being ignored.
Performance on Frame Types
Color - The yuv420p pixel format (the default for most cameras, including the Azure Kinect) and a CRF value of 28 achieve good results.
Depth - Although you can encode 12-bit grayscale video, this is not recommended, as it limits depth data to 4096 distinct values (0 to 4095 millimeters). We recommend the zdepth codec instead.
IR - We haven’t tested many encoding options for IR data. As with depth data, you can encode 12-bit grayscale video, but this is not recommended, as it limits IR values to 4096 levels. ZDepth is not ideal for IR data.
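The 12-bit limit mentioned above follows from 2^12 = 4096 representable values. The sketch below shows what happens to depth readings in millimeters if they are clipped to that range. Clipping is an assumption made for illustration (a real conversion could also wrap or rescale), and `to_12_bit` is a hypothetical helper.

```python
from typing import List

BIT_DEPTH = 12
MAX_VALUE = (1 << BIT_DEPTH) - 1  # largest value a 12-bit sample can hold


def to_12_bit(depth_mm: List[int]) -> List[int]:
    """Clip depth readings (in millimeters) to the 12-bit range."""
    return [min(d, MAX_VALUE) for d in depth_mm]


clipped = to_12_bit([500, 4095, 4096, 6000])
```

Everything beyond about 4.1 meters collapses to the same value, so scene geometry past that distance is lost.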
On iOS you can use the VideoToolbox hardware encoder as the codec name (FFmpeg names it hevc_videotoolbox); see ios.yaml.
H264
Descriptions of Codec Fields
This is a subset of common parameters. You can get the full list by running ffmpeg -h encoder=libx264 or accessing https://trac.ffmpeg.org/wiki/Encode/H.264
preset - Sets the tradeoff between encoding speed and compression efficiency.
crf - Sets video quality for variable bitrate video. The libx264 default is 23, which should visually correspond to libx265 video at CRF 28. Setting both CRF and bitrate results in the bitrate being ignored.
Performance on Frame Types
Color - As with h265, the default yuv420p pixel format and CRF value of 28 achieve good results, at the cost of a higher bitrate.
Depth - No adequate settings. We recommend you use the zdepth codec.
IR - No adequate settings.
On iOS you can use h264_videotoolbox as the codec name; see ios.yaml.
Decoding
The decoder works “in reverse” compared with the encoder. For each frame type, the first frame sent carries all the information required to decode the frames that follow (the codec type and any other codec information needed; recall the data and extra data fields).
The client's decoder keeps a hashmap that tracks this decoder information for every received frame stream. The hashmap keys are the randomly generated stream ids, and the correct decoder class is chosen according to the frame type. For FFmpeg, this information is used to build a full decoder, as you would when decoding a video locally.
When a frame arrives, the decoder that was previously set up is retrieved and the encoded frame data is passed to it. The decoder transforms this frame into a raw OpenCV image with the same number of channels, width, height, and so on as the original.
The flow of calling the decoder is the same as with the encoder, but in reverse: you pass an encoded FrameStruct and get a decoded cv::Mat frame. The decoder is meant to be called as part of a network reader's while hasNext() / next() frame loop.
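The decoding flow can be sketched as a reader loop plus a decoder map keyed by stream id. This is an illustrative Python outline, not the SSP client code: `EncodedFrame`, `StubDecoder`, `DecoderRegistry`, `ListReader`, and the frame-type-to-decoder table are hypothetical stand-ins, and the stub decoder only tags the payload instead of producing an image.

```python
from typing import Dict, List, NamedTuple


class EncodedFrame(NamedTuple):
    stream_id: str   # randomly generated per stream in SSP; fixed here for clarity
    frame_type: int
    payload: bytes


class StubDecoder:
    """Stand-in for a real decoder; it just labels the payload."""

    def __init__(self, name: str) -> None:
        self.name = name

    def decode(self, payload: bytes) -> str:
        return f"{self.name}:{payload.decode()}"


# Assumed mapping from frame type to decoder kind (hypothetical).
DECODER_BY_FRAME_TYPE = {0: "libav", 1: "zdepth"}


class DecoderRegistry:
    """Keep one decoder per stream id, created when the stream is first seen."""

    def __init__(self) -> None:
        self._decoders: Dict[str, StubDecoder] = {}

    def decode(self, frame: EncodedFrame) -> str:
        decoder = self._decoders.get(frame.stream_id)
        if decoder is None:
            decoder = StubDecoder(DECODER_BY_FRAME_TYPE[frame.frame_type])
            self._decoders[frame.stream_id] = decoder
        return decoder.decode(frame.payload)

    def stream_count(self) -> int:
        return len(self._decoders)


class ListReader:
    """Toy network reader exposing a has_next() / next() interface."""

    def __init__(self, frames: List[EncodedFrame]) -> None:
        self._frames = list(frames)
        self._index = 0

    def has_next(self) -> bool:
        return self._index < len(self._frames)

    def next(self) -> EncodedFrame:
        frame = self._frames[self._index]
        self._index += 1
        return frame


registry = DecoderRegistry()
reader = ListReader([
    EncodedFrame("s1", 0, b"c0"),
    EncodedFrame("s2", 1, b"d0"),
    EncodedFrame("s1", 0, b"c1"),
])
decoded = []
while reader.has_next():
    decoded.append(registry.decode(reader.next()))
```

The second frame of stream s1 reuses the decoder created for its first frame, which mirrors how the hashmap avoids rebuilding a decoder for every packet.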