FrameStruct

IReader pulls data from the frame source into a FrameStruct on GetCurrentFrame(). FrameStruct contains all the information necessary to be received and decoded by a Sensor Stream Client.

FrameStructs are a sample of sensor data of a certain data type. Sensors that collect different data can send different frame types.

Azure Kinect can stream: RGB (color), depth, and IR data. This means that there are 3 frame types that Sensor Stream Server can send when streaming Azure Kinect data. Each frame type gets its own FrameStruct when sampling data.

Sensor Stream Server reads the config yaml and depending on its configuration will create an IReader that pulls data from the frame source and generates FrameStructs for each frame type.

FrameStruct contains additional information to binary sensor data. It also contains information like the CameraCalibrationStruct, which provides the intrinsics of the sensor. This can be used in downstream computer vision/spatial computing pipelines.

FrameStruct also contains information for how it has been encoded in CodecParamsStruct. This is so the receiving end, Sensor Stream Pipe, knows how to decode the frame struct.

FrameStructs are sent by the Sensor Stream Pipe by being converted to string through CerealStructToString method. The string is then packaged as a zmq message and sent through a zmq socket.

FrameStruct Interface (frame_struct.hpp)

#pragma once

#include <iterator>
#include <vector>

#include <cereal/archives/binary.hpp>
#include <cereal/cereal.hpp>
#include <cereal/types/base_class.hpp>
#include <cereal/types/memory.hpp>
#include <cereal/types/vector.hpp>

#include "../utils/utils.h"

struct CameraCalibrationStruct {
  // 0: Kinect parameters
  short type = -1;
  std::vector<unsigned char> data;
  std::vector<unsigned char> extra_data;

  CameraCalibrationStruct() {}

  CameraCalibrationStruct(unsigned int t, std::vector<unsigned char> d,
                          std::vector<unsigned char> ed)
      : type(t), data(d), extra_data(ed) {}

  template <class Archive> void serialize(Archive &ar) {
    ar(type, data, extra_data);
  }
};

struct CodecParamsStruct {
  // 0: av parameters, 1: nvPipe parameters, 2: zDepth parameters
  short type = -1;
  std::vector<unsigned char> data;
  std::vector<unsigned char> extra_data;

  CodecParamsStruct() {}

  CodecParamsStruct(unsigned int t, std::vector<unsigned char> d,
                    std::vector<unsigned char> ed)
      : type(t), data(d), extra_data(ed) {}

  void SetData(std::vector<unsigned char> &d) { data = d; }

  void SetExtraData(std::vector<unsigned char> &ed) { extra_data = ed; }

  template <class Archive> void serialize(Archive &ar) {
    ar(type, data, extra_data);
  }
};

struct FrameStruct {

  // message id, currenly set to 0
  //This is to be used as "versioning", so if how messages are updated so that Sensor Stream Client
  //must interpret different "versions" of messages then this field will indicate the message version
  unsigned short message_type;

  // 0 for color, 1 for depth, 2 for ir, 3 for confidence
  unsigned short frame_type;

  // 0 for image frames, 1 for libav packets, 2 for raw RGBA data, 3 for raw
  // GRAY16LE data, 4 for NvPipe packets, 5 for raw 32FC1 data, 6 for YUV data
  // 7 for raw U8C1 data
  //This is used to select the decoder on the "receiving" side of the Pipe
  //Not all frame_type + frame_data_type combinations "make sense" or will be used
  unsigned short frame_data_type;

  // random 16 char string that uniquely ids the frame stream
  //Some decoders (like video) are stateful and so must keep track of streams
  //This is automatically generated
  std::string stream_id;

  // frame binary data
  //We use a vector to know the size, basically a vector of bytes to store binary data
  std::vector<unsigned char> frame;

  // codec info for video frames, null for image frames
  //Video decoder needs to know about the last receive frame
  //Requires to know the codec as well as additional parameters
  CodecParamsStruct codec_data;

  // codec info for video frames, null for image frames
  CameraCalibrationStruct camera_calibration_data;

  // optional: scene description
  std::string scene_desc;

  // 0 for color, 1 for depth: currently redundant with frameType, but
  // distinction may be needed in the future
  unsigned int sensor_id;

  // integer device id: distingish between devices in the same scene
  //Can be set by user
  unsigned int device_id;

  // current frame number (increases over time)
  //Increases by 1 for each frame automatically when SSP server starts
  unsigned int frame_id;

  //Use for logging and timing to understand processing speeds
  std::vector<unsigned long> timestamps;

  //Serialize method (not used by Server but is available)
  template <class Archive> void serialize(Archive &ar) {
    ar(message_type, frame_type, frame_data_type, stream_id, frame, codec_data,
       camera_calibration_data, scene_desc, sensor_id, device_id, frame_id,
       timestamps);
  }
};

template <typename T>
static const std::string CerealStructToString(const T &t) {
  std::ostringstream os(std::ios::binary);
  {
    cereal::BinaryOutputArchive oarchive(os);
    oarchive(t);
  }

  return os.str();
}

template <typename T> static const std::string FrameStructToString(const T *t) {
  std::ostringstream os(std::ios::binary);
  {
    cereal::BinaryOutputArchive oarchive(os);
    oarchive(*t);
  }

  return os.str();
}

template <typename T> static T ParseCerealStructFromString(std::string &data) {
  T frame_in;
  std::istringstream is(data, std::ios::binary);
  {
    cereal::BinaryInputArchive iarchive(is);
    iarchive(frame_in);
  }
  return frame_in;
}

Last updated