Zed Camera - Get data in c++ with openCV - c++

I used with Zed-Camera for get depth (https://www.stereolabs.com/).
I want to get the data in c++ (using the OpenCV library).
I took the code from here:
https://www.stereolabs.com/blog/index.php/2015/06/28/zed-with-opencv/
The code on the website is not working, because one line does not compile:
sl::zed::ERRCODE err = zed->init(sl::zed::MODE::PERFORMANCE, 0, true);
I get 2 errors:
initial value of reference to non-const must be an lvalue
too many arguments in function call.
I looked in the function, the function get:
ERRCODE init(InitParams &parameters);
I would appreciate your help

Yes as you can see the parameters was changed to InitParams.
sl::zed::InitParams params;
params.verbose = true;
sl::zed::ERRCODE err = camera->init(params);

You can do something similar to this.
m_pZed = new sl::Camera();
sl::InitParameters zedInit;
zedInit.camera_buffer_count_linux = 4;
zedInit.camera_disable_self_calib = false;
zedInit.camera_fps = m_zedFPS;
zedInit.camera_image_flip = m_bZedFlip;
zedInit.camera_linux_id = 0;
zedInit.camera_resolution = (sl::RESOLUTION) m_zedResolution;
zedInit.coordinate_system = sl::COORDINATE_SYSTEM::COORDINATE_SYSTEM_IMAGE;
zedInit.coordinate_units = sl::UNIT::UNIT_METER;
zedInit.depth_minimum_distance = m_zedMinDist;
zedInit.depth_mode = (sl::DEPTH_MODE) m_zedDepthMode;
zedInit.sdk_gpu_id = -1;
zedInit.sdk_verbose = true;
sl::ERROR_CODE err = m_pZed->open(zedInit);
if (err != sl::SUCCESS)
{
LOG(ERROR)<< "ZED Error code: " << sl::errorCode2str(err) << std::endl;
return false;
}
m_pZed->setConfidenceThreshold(m_zedConfidence);
m_pZed->setDepthMaxRangeValue((float) m_zedMaxDist);
// Set runtime parameters after opening the camera
m_zedRuntime.sensing_mode = (sl::SENSING_MODE) m_zedSenseMode;
m_zedRuntime.enable_depth = true;
m_zedRuntime.enable_point_cloud = false;
m_zedRuntime.move_point_cloud_to_world_frame = false;
// Create sl and cv Mat to get ZED left image and depth image
sl::Resolution zedImgSize = m_pZed->getResolution();
// Initialize color image and depth
m_width = zedImgSize.width;
m_height = zedImgSize.height;
m_centerH = m_width / 2;
m_centerV = m_height / 2;
// Best way of sharing sl::Mat and cv::Mat :
// Create a sl::Mat and then construct a cv::Mat using the ptr to sl::Mat data.
m_pzDepth = new sl::Mat(zedImgSize, sl::MAT_TYPE_32F_C1, sl::MEM_GPU);
m_gDepth = slMat2cvGpuMat(*m_pzDepth);
m_gDepth2 = GpuMat(m_gDepth.size(), m_gDepth.type());
More details can be found here( https://github.com/yankailab/OpenKAI/blob/master/src/Vision/_ZED.cpp )

Related

Clip Raster with Polygon with GDAL C++

I am trying to clip a raster using a polygon an GDAL. At the moment i get an error that there is a read access violation when initializing the WarpOperation. I can access my Shapefile and check the num of features so the access is fine i think. Also i can access my Raster Data (GetProjectionRef).. All files are in the same CRS. Is there a way to use GdalWarp with Cutline?
const char* inputPath = "input.tif";
const char* outputPath = "output.tif";
//clipper Polygon
auto w_read_filenamePoly = "Polygon.shp";
char* read_filenamePoly = new char[w_read_filenamePoly.length() + 1];
wcstombs(read_filenamePoly, w_read_filenamePoly.c_str(), w_read_filenamePoly.length() + 1);
GDALDataset* hSrcDS;
GDALDataset* hDstDS;
GDALAllRegister();
hSrcDS =(GDALDataset *) GDALOpen(inputPath, GA_Update);
hDstDS = (GDALDataset*)GDALOpen(outputPath, GA_Update);
const char* proj = hSrcDS->GetProjectionRef();
const char* proj2 = hDstDS->GetProjectionRef();
//clipper Layer
GDALDataset* poDSClipper;
poDSClipper = (GDALDataset*)GDALOpenEx(read_filenamePoly, GDAL_OF_UPDATE, NULL, NULL, NULL);
Assert::IsNotNull(poDSClipper);
delete[]read_filenamePoly;
OGRLayer* poLayerClipper;
poLayerClipper = poDSClipper->GetLayerByName("Polygon");
int numClip = poLayerClipper->GetFeatureCount();
//setup warp options
GDALWarpOptions* psWarpOptions = GDALCreateWarpOptions();
psWarpOptions->hSrcDS = hSrcDS;
psWarpOptions->hDstDS = hDstDS;
psWarpOptions->nBandCount = 1;
psWarpOptions->panSrcBands = (int *) CPLMalloc(sizeof(int) * psWarpOptions->nBandCount);
psWarpOptions->panSrcBands[0] = 1;
psWarpOptions->panDstBands = (int*)CPLMalloc(sizeof(int) * psWarpOptions->nBandCount);
psWarpOptions->panDstBands[0] = 1;
psWarpOptions->pfnProgress = GDALTermProgress;
psWarpOptions->hCutline = poLayerClipper;
// Establish reprojection transformer.
psWarpOptions->pTransformerArg = GDALCreateGenImgProjTransformer(hSrcDS,proj, hDstDS, proj2, FALSE, 0.0, 1);
psWarpOptions->pfnTransformer = GDALGenImgProjTransform;
GDALWarpOperation oOperation;
oOperation.Initialize(psWarpOptions);
oOperation.ChunkAndWarpImage(0, 0, GDALGetRasterXSize(hDstDS), GDALGetRasterYSize(hDstDS));
GDALDestroyGenImgProjTransformer(psWarpOptions->pTransformerArg);
GDALDestroyWarpOptions(psWarpOptions);
GDALClose(hDstDS);
GDALClose(hSrcDS);
Your psWarpOptions->hCutline should be a polygon, not a layer.
Also the cutline should be in source pixel/line coordinates.
Check TransformCutlineToSource from gdalwarp_lib.cpp, you can probably simply get the code from there.
This particular GDAL operation, when called from C++, is so full of pitfalls - and there are so many open questions about it here - that I am reproducing a full working example:
Warping (reprojecting) a raster image with a polygon mask (cutline):
#include <gdal/gdal.h>
#include <gdal/gdal_priv.h>
#include <gdal/gdalwarper.h>
#include <gdal/ogrsf_frmts.h>
int main() {
const char *inputPath = "input.tif";
const char *outputPath = "output.tif";
// clipper Polygon
// THIS FILE MUST BE IN PIXEL/LINE COORDINATES or otherwise one should
// copy the function gdalwarp_lib.cpp:TransformCutlineToSource()
// from GDAL's sources
// It is expected that it contains a single polygon feature
const char *read_filenamePoly = "cutline.json";
GDALDataset *hSrcDS;
GDALDataset *hDstDS;
GDALAllRegister();
auto poDriver = GetGDALDriverManager()->GetDriverByName("GTiff");
hSrcDS = (GDALDataset *)GDALOpen(inputPath, GA_ReadOnly);
hDstDS = (GDALDataset *)poDriver->CreateCopy(
outputPath, hSrcDS, 0, nullptr, nullptr, nullptr);
// Without this step the cutline is useless - because the background
// will be carried over from the original image
CPLErr e = hDstDS->GetRasterBand(1)->Fill(0);
const char *src_srs = hSrcDS->GetProjectionRef();
const char *dst_srs = hDstDS->GetProjectionRef();
// clipper Layer
GDALDataset *poDSClipper;
poDSClipper = (GDALDataset *)GDALOpenEx(
read_filenamePoly, GDAL_OF_UPDATE, NULL, NULL, NULL);
auto poLayerClipper = poDSClipper->GetLayer(0);
auto geom = poLayerClipper->GetNextFeature()->GetGeometryRef();
// setup warp options
GDALWarpOptions *psWarpOptions = GDALCreateWarpOptions();
psWarpOptions->hSrcDS = hSrcDS;
psWarpOptions->hDstDS = hDstDS;
psWarpOptions->nBandCount = 1;
psWarpOptions->panSrcBands =
(int *)CPLMalloc(sizeof(int) * psWarpOptions->nBandCount);
psWarpOptions->panSrcBands[0] = 1;
psWarpOptions->panDstBands =
(int *)CPLMalloc(sizeof(int) * psWarpOptions->nBandCount);
psWarpOptions->panDstBands[0] = 1;
psWarpOptions->pfnProgress = GDALTermProgress;
psWarpOptions->hCutline = geom;
// Establish reprojection transformer.
psWarpOptions->pTransformerArg = GDALCreateGenImgProjTransformer(
hSrcDS, src_srs, hDstDS, dst_srs, TRUE, 1000, 1);
psWarpOptions->pfnTransformer = GDALGenImgProjTransform;
GDALWarpOperation oOperation;
oOperation.Initialize(psWarpOptions);
oOperation.ChunkAndWarpImage(
0, 0, GDALGetRasterXSize(hDstDS), GDALGetRasterYSize(hDstDS));
GDALDestroyGenImgProjTransformer(psWarpOptions->pTransformerArg);
GDALDestroyWarpOptions(psWarpOptions);
GDALClose(hDstDS);
GDALClose(hSrcDS);
}

GDALWarpRegionToBuffer & Tiling when Dst Frame not strictly contained in Src Frame

I'm currently working with gdal api C/C++ and I'm facing an issue with gdal warp region to buffer functionality (WarpRegionToBuffer).
When my destination dataset is not strictly contained in the frame of my source dataset, the area where there should be no data values is filled with random data (see out_code.tif enclosed). However gdalwarp command line functionality, which also uses WarpRegionToBuffer function, does not seem to have this problem.
1/ Here is the code I use:
#include <iostream>
#include <string>
#include <vector>
#include "gdal.h"
#include "gdalwarper.h"
#include "cpl_conv.h"
int main(void)
{
std::string pathSrc = "in.dt1";
//these datas will be provided by command line
std::string pathDst = "out_code.tif";
double resolutionx = 0.000833333;
double resolutiony = 0.000833333;
//destination corner coordinates: top left (tl) bottom right (br)
float_t xtl = -1;
float_t ytl = 45;
float_t xbr = 2;
float_t ybr = 41;
//tile size defined by user
int tilesizex = 256;
int tilesizey = 256;
float width = std::ceil((xbr - xtl)/resolutionx);
float height = std::ceil((ytl - ybr)/resolutiony);
double adfDstGeoTransform[6] = {xtl, resolutionx, 0, ytl, 0, -resolutiony};
GDALDatasetH hSrcDS, hDstDS;
// Open input file
GDALAllRegister();
hSrcDS = GDALOpen(pathSrc.c_str(), GA_ReadOnly);
GDALDataType eDT = GDALGetRasterDataType(GDALGetRasterBand(hSrcDS,1));
// Create output file, using same spatial reference as input image, but new geotransform
GDALDriverH hDriver = GDALGetDriverByName( "GTiff" );
hDstDS = GDALCreate( hDriver, pathDst.c_str(), width, height, GDALGetRasterCount(hSrcDS), eDT, NULL );
OGRSpatialReference oSRS;
char *pszWKT = NULL;
//force geo projection
oSRS.SetWellKnownGeogCS( "WGS84" );
oSRS.exportToWkt( &pszWKT );
GDALSetProjection( hDstDS, pszWKT );
//Fetches the coefficients for transforming between pixel/line (P,L) raster space,
//and projection coordinates (Xp,Yp) space.
GDALSetGeoTransform( hDstDS, adfDstGeoTransform );
// Setup warp options
GDALWarpOptions *psWarpOptions = GDALCreateWarpOptions();
psWarpOptions->hSrcDS = hSrcDS;
psWarpOptions->hDstDS = hDstDS;
psWarpOptions->nBandCount = 1;
psWarpOptions->panSrcBands = (int *) CPLMalloc(sizeof(int) * psWarpOptions->nBandCount );
psWarpOptions->panSrcBands[0] = 1;
psWarpOptions->panDstBands = (int *) CPLMalloc(sizeof(int) * psWarpOptions->nBandCount );
psWarpOptions->panDstBands[0] = 1;
psWarpOptions->pfnProgress = GDALTermProgress;
//these datas will be calculated in order to warp tile by tile
//current tile size
int cursizex = 0;
int cursizey = 0;
double nbtilex = std::ceil(width/tilesizex);
double nbtiley = std::ceil(height/tilesizey);
int starttilex = 0;
int starttiley = 0;
// Establish reprojection transformer
psWarpOptions->pTransformerArg =
GDALCreateGenImgProjTransformer(hSrcDS,
GDALGetProjectionRef(hSrcDS),
hDstDS,
GDALGetProjectionRef(hDstDS),
FALSE, 0.0, 1);
psWarpOptions->pfnTransformer = GDALGenImgProjTransform;
// Initialize and execute the warp operation on region
GDALWarpOperation oOperation;
oOperation.Initialize(psWarpOptions);
for (int ty = 0; ty < nbtiley; ty++) {
//handle last tile size
//if it last tile change size otherwise keep tilesize
for (int tx = 0; tx < nbtilex; tx++) {
//if it last tile change size otherwise keep tilesize
starttiley = ty * tilesizey;
starttilex = tx * tilesizex;
cursizex = std::min(starttilex + tilesizex, (int)width) - starttilex;
cursizey = std::min(starttiley + tilesizey, (int)height) - starttiley;
float * buffer = new float[cursizex*cursizey];
memset(buffer, 0, cursizex*cursizey);
//warp source
CPLErr ret = oOperation.WarpRegionToBuffer(
starttilex, starttiley, cursizex, cursizey,
buffer,
eDT);
if (ret != 0) {
CEA_SIMONE_ERROR(CPLGetLastErrorMsg());
throw std::runtime_error("warp error");
}
//write the fuzed tile in dest
ret = GDALRasterIO(GDALGetRasterBand(hDstDS,1),
GF_Write,
starttilex, starttiley, cursizex, cursizey,
buffer, cursizex, cursizey,
eDT,
0, 0);
if (ret != 0) {
CEA_SIMONE_ERROR("raster io write error");
throw std::runtime_error("raster io write error");
}
delete(buffer);
}
}
// Clean memory
GDALDestroyGenImgProjTransformer( psWarpOptions->pTransformerArg );
GDALDestroyWarpOptions( psWarpOptions );
GDALClose( hDstDS );
GDALClose( hSrcDS );
return 0;
}
The result:
output image of previous sample of code (as png, as I can't enclose TIF img)
The GdalWarp command line:
gdalwarp -te -1 41 2 45 -tr 0.000833333 0.000833333 in.dt1 out_cmd_line.tif
The command line result:
output image of previous command line (as png, as I can't enclose TIF img)
Can you please help me find what is wrong with my use of GDAL C/C++ API in order to have a similar behaviour as gdalwarp command line? There is probably an algorithm in gdalwarp that computes a mask of useful pixels in destination frame before calling WarpRegionToBuffer, but I didn't find it.
I would really appreciate help on this problem!
Best regards

Setting multiple descriptors in a descriptor range in Direct3D 12

First of all, my understanding of a descriptor range is that I can specify multiple buffers (constant buffers in my case) that a shader may use, is that correct? If not, then this is where my misunderstanding is, and the rest of the question will make no sense.
Lets say I want to pass a couple of constant buffers in my vertex shader
// Vertex.hlsl
float value0 : register(b0)
float value1 : register(b1)
...
And for whatever reason I want to use a descriptor range to specify b0 and b1. I fill out a D3D12_DESCRIPTOR_RANGE:
D3D12_DESCRIPTOR_RANGE range;
range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_CBV;
range.NumDescriptors = 2;
range.BaseShaderRegister = 0;
range.RegisterSpace = 0;
range.OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;
I then go on to shove this into a root parameter
D3D12_ROOT_PARAMETER param;
param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
param.DescriptorTable.NumDescriptorRanges = 1;
param.DescriptorTable.pDescriptorRanges = &range;
param.ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;
Root parameter goes into my signature description
D3D12_ROOT_SIGNATURE_DESC1 signatureDesc;
signatureDesc.NumParameters = 1;
signatureDesc.pParameters = &param;
signatureDesc.NumStaticSamplers = 0;
signatureDesc.pStaticSamplers = nullptr;
D3D12_ROOT_SIGNATURE_FLAGS = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;
After this is create my root signature and so on. I also created a heap for 2 descriptors
D3D12_DESCRIPTOR_HEAP_DESC heapDescCbv;
heapDescCbv.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
heapDescCbv.NumDescriptors = 2;
heapDescCbv.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
heapDescCbv.NodeMask = 0;
ThrowIfFailed(m_device->CreateDescriptorHeap(&heapDescCbv, IID_PPV_ARGS(&m_cbvHeap)));
I then mapped the respective ID3D12Resource's to get two pointers so I can memcpy my values to them.
void D3D12App::AllocateConstantBuffer(SIZE_T index, size_t dataSize, ID3D12Resource** buffer, void** mappedPtr)
{
D3D12_HEAP_PROPERTIES heapProp;
heapProp.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
heapProp.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
heapProp.CreationNodeMask = 1;
heapProp.VisibleNodeMask = 1;
heapProp.Type = D3D12_HEAP_TYPE_UPLOAD;
D3D12_RESOURCE_DESC resDesc;
resDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
resDesc.Alignment = 0;
resDesc.Width = (dataSize + 255) & ~255;
resDesc.Height = 1;
resDesc.DepthOrArraySize = 1;
resDesc.MipLevels = 1;
resDesc.Format = DXGI_FORMAT_UNKNOWN;
resDesc.SampleDesc.Count = 1;
resDesc.SampleDesc.Quality = 0;
resDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
resDesc.Flags = D3D12_RESOURCE_FLAG_NONE;
ThrowIfFailed(m_device->CreateCommittedResource(&heapProp, D3D12_HEAP_FLAG_NONE,
&resDesc, D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(buffer)));
D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc;
cbvDesc.BufferLocation = (*buffer)->GetGPUVirtualAddress();
cbvDesc.SizeInBytes = (dataSize + 255) & ~255;
auto cbvHandle = m_cbvHeap->GetCPUDescriptorHandleForHeapStart();
cbvHandle.ptr += index * m_device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
m_device->CreateConstantBufferView(&cbvDesc, cbvHandle);
D3D12_RANGE readRange;
readRange.Begin = 0;
readRange.End = 0;
ThrowIfFailed((*buffer)->Map(0, &readRange, mappedPtr));
}
AllocateConstantBuffer(0, sizeof(m_value0), &m_value0Resource, reinterpret_cast<void**>&m_constPtrvalue0));
AllocateConstantBuffer(1, sizeof(m_value1), &m_value1Resource, reinterpret_cast<void**>&m_constPtrvalue1));
The problem is when I want to feed them to the pipeline. When rendering, I used
auto cbvHandle = m_cbvHeap->GetGPUDescriptorHandleForHeapStart();
m_commandList->SetGraphicsRootDescriptorTable(0, cbvHandle);
The result I get is only register(b0) got the the correct value, and register(b1) remains uninitialized. What did I do wrong?
OK I got it to work. Turned out I need to change the shader a bit:
cbuffer a : register(b0) { float value0; }
cbuffer b : register(b1) { float value1; };
This gave me another question though, according to this link:
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-constants
the buffer names a and b should be optional, but when I tried that the shaders cannot compile. I guess that is a different question.

YCbCr Sampler in Vulkan

I've been trying to sample a YCbCr image in Vulkan but I keep getting incorrect results, and I was hoping someone might be able to spot my mistake.
I have a NV12 YCbCr image which I want to render onto two triangles forming a quad. If i understand correctly, the VkFormat that corresponds to NV12 is VK_FORMAT_G8_B8R8_2PLANE_420_UNORM. Below is the code that I would expect to work, but I'll try to explain what I'm trying to do as well:
Create a VkSampler with a VkSamplerYcbcrConversion (with the correct format) in pNext
Read NV12 data into staging buffer
Create VkImage with the correct format and specify that the planes are disjoint
Get memory requirements (and offset for plane 1) for each plane (0 and 1)
Allocate device local memory for the image data
Bind each plane to the correct location in memory
Copy staging buffer to image memory
Create VkImageView with the same format as the VkImage and the same VkSamplerYcbcrConversionInfo as the VkSampler in pNext.
Code:
VkSamplerYcbcrConversion ycbcr_sampler_conversion;
VkSamplerYcbcrConversionInfo ycbcr_info;
VkSampler ycbcr_sampler;
VkImage image;
VkDeviceMemory image_memory;
VkDeviceSize memory_offset_plane0, memory_offset_plane1;
VkImageView image_view;
enum YCbCrStorageFormat
{
NV12
};
unsigned char* ReadYCbCrFile(const std::string& filename, YCbCrStorageFormat storage_format, VkFormat vulkan_format, uint32_t* buffer_size, uint32_t* buffer_offset_plane1, uint32_t* buffer_offset_plane2)
{
std::ifstream file;
file.open(filename.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
if (!file.is_open()) { ELOG("Failed to open YCbCr image"); }
*buffer_size = file.tellg();
file.seekg(0);
unsigned char* data;
switch (storage_format)
{
case NV12:
{
if (vulkan_format != VK_FORMAT_G8_B8R8_2PLANE_420_UNORM)
{
ILOG("A 1:1 relationship doesn't exist between NV12 and 420, exiting");
exit(1);
}
*buffer_offset_plane1 = (*buffer_size / 3) * 2;
*buffer_offset_plane2 = 0; //Not used
data = new unsigned char[*buffer_size];
file.read((char*)(data), *buffer_size);
break;
}
default:
ELOG("A YCbCr storage format is required");
break;
}
file.close();
return data;
}
VkFormatProperties format_properties;
vkGetPhysicalDeviceFormatProperties(physical_device, VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, &format_properties);
bool cosited = false, midpoint = false;
if (format_properties.optimalTilingFeatures & VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT)
{
cosited = true;
}
else if (format_properties.optimalTilingFeatures & VK_FORMAT_FEATURE_MIDPOINT_CHROMA_SAMPLES_BIT)
{
midpoint = true;
}
if (!cosited && !midpoint)
{
ELOG("Nither VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT nor VK_FORMAT_FEATURE_MIDPOINT_CHROMA_SAMPLES_BIT is supported for VK_FORMAT_G8_B8R8_2PLANE_420_UNORM");
}
VkSamplerYcbcrConversionCreateInfo conversion_info = {};
conversion_info.sType = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_CREATE_INFO;
conversion_info.pNext = NULL;
conversion_info.format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM;
conversion_info.ycbcrModel = VK_SAMPLER_YCBCR_MODEL_CONVERSION_YCBCR_709;
conversion_info.ycbcrRange = VK_SAMPLER_YCBCR_RANGE_ITU_FULL;
conversion_info.components.r = VK_COMPONENT_SWIZZLE_IDENTITY;
conversion_info.components.g = VK_COMPONENT_SWIZZLE_IDENTITY;
conversion_info.components.b = VK_COMPONENT_SWIZZLE_IDENTITY;
conversion_info.components.a = VK_COMPONENT_SWIZZLE_IDENTITY;
if (cosited)
{
conversion_info.xChromaOffset = VK_CHROMA_LOCATION_COSITED_EVEN;
conversion_info.yChromaOffset = VK_CHROMA_LOCATION_COSITED_EVEN;
}
else
{
conversion_info.xChromaOffset = VK_CHROMA_LOCATION_MIDPOINT;
conversion_info.yChromaOffset = VK_CHROMA_LOCATION_MIDPOINT;
}
conversion_info.chromaFilter = VK_FILTER_LINEAR;
conversion_info.forceExplicitReconstruction = VK_FALSE;
VkResult res = vkCreateSamplerYcbcrConversion(logical_device, &conversion_info, NULL, &ycbcr_sampler_conversion);
CHECK_VK_RESULT(res, "Failed to create YCbCr conversion sampler");
ILOG("Successfully created YCbCr conversion");
ycbcr_info.sType = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_INFO;
ycbcr_info.pNext = NULL;
ycbcr_info.conversion = ycbcr_sampler_conversion;
VkSamplerCreateInfo sampler_info = {};
sampler_info.sType = VK_STRUCTURE_TYPE_SAMPLER_CREATE_INFO;
sampler_info.pNext = &ycbcr_info;
sampler_info.flags = 0;
sampler_info.magFilter = VK_FILTER_LINEAR;
sampler_info.minFilter = VK_FILTER_LINEAR;
sampler_info.mipmapMode = VK_SAMPLER_MIPMAP_MODE_LINEAR;
sampler_info.addressModeU = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
sampler_info.addressModeV = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
sampler_info.addressModeW = VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE;
sampler_info.mipLodBias = 0.0f;
sampler_info.anisotropyEnable = VK_FALSE;
//sampler_info.maxAnisotropy IGNORED
sampler_info.compareEnable = VK_FALSE;
//sampler_info.compareOp = IGNORED
sampler_info.minLod = 0.0f;
sampler_info.maxLod = 1.0f;
sampler_info.borderColor = VK_BORDER_COLOR_FLOAT_OPAQUE_BLACK;
sampler_info.unnormalizedCoordinates = VK_FALSE;
res = vkCreateSampler(logical_device, &sampler_info, NULL, &ycbcr_sampler);
CHECK_VK_RESULT(res, "Failed to create YUV sampler");
ILOG("Successfully created sampler with YCbCr in pNext");
std::string filename = "tree_nv12_1920x1080.yuv";
uint32_t width = 1920, height = 1080;
VkFormat format = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM;
uint32_t buffer_size, buffer_offset_plane1, buffer_offset_plane2;
unsigned char* ycbcr_data = ReadYCbCrFile(filename, NV12, VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, &buffer_size, &buffer_offset_plane1, &buffer_offset_plane2);
//Load image into staging buffer
VkDeviceMemory stage_buffer_memory;
VkBuffer stage_buffer = create_vk_buffer(buffer_size, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stage_buffer_memory);
void* stage_memory_ptr;
vkMapMemory(logical_device, stage_buffer_memory, 0, buffer_size, 0, &stage_memory_ptr);
memcpy(stage_memory_ptr, ycbcr_data, buffer_size);
vkUnmapMemory(logical_device, stage_buffer_memory);
delete[] ycbcr_data;
//Create image
VkImageCreateInfo img_info = {};
img_info.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
img_info.flags = VK_IMAGE_CREATE_DISJOINT_BIT;
img_info.imageType = VK_IMAGE_TYPE_2D;
img_info.extent.width = width;
img_info.extent.height = height;
img_info.extent.depth = 1;
img_info.mipLevels = 1;
img_info.arrayLayers = 1;
img_info.format = format;
img_info.tiling = VK_IMAGE_TILING_LINEAR;//VK_IMAGE_TILING_OPTIMAL;
img_info.initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED;
img_info.usage = VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;
img_info.samples = VK_SAMPLE_COUNT_1_BIT;
img_info.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
VkResult result = vkCreateImage(logical_device, &img_info, NULL, &image);
CHECK_VK_RESULT(result, "vkCreateImage failed to create image handle");
ILOG("Image created!");
//Get memory requirements for each plane and combine
//Plane 0
VkImagePlaneMemoryRequirementsInfo image_plane_info = {};
image_plane_info.sType = VK_STRUCTURE_TYPE_IMAGE_PLANE_MEMORY_REQUIREMENTS_INFO;
image_plane_info.pNext = NULL;
image_plane_info.planeAspect = VK_IMAGE_ASPECT_PLANE_0_BIT;
VkImageMemoryRequirementsInfo2 image_info2 = {};
image_info2.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_REQUIREMENTS_INFO_2;
image_info2.pNext = &image_plane_info;
image_info2.image = image;
VkImagePlaneMemoryRequirementsInfo memory_plane_requirements = {};
memory_plane_requirements.sType = VK_STRUCTURE_TYPE_IMAGE_PLANE_MEMORY_REQUIREMENTS_INFO;
memory_plane_requirements.pNext = NULL;
memory_plane_requirements.planeAspect = VK_IMAGE_ASPECT_PLANE_0_BIT;
VkMemoryRequirements2 memory_requirements2 = {};
memory_requirements2.sType = VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2;
memory_requirements2.pNext = &memory_plane_requirements;
vkGetImageMemoryRequirements2(logical_device, &image_info2, &memory_requirements2);
VkDeviceSize image_size = memory_requirements2.memoryRequirements.size;
uint32_t image_bits = memory_requirements2.memoryRequirements.memoryTypeBits;
//Set offsets
memory_offset_plane0 = 0;
memory_offset_plane1 = image_size;
//Plane 1
image_plane_info.planeAspect = VK_IMAGE_ASPECT_PLANE_1_BIT;
memory_plane_requirements.planeAspect = VK_IMAGE_ASPECT_PLANE_1_BIT;
vkGetImageMemoryRequirements2(logical_device, &image_info2, &memory_requirements2);
image_size += memory_requirements2.memoryRequirements.size;
image_bits = image_bits | memory_requirements2.memoryRequirements.memoryTypeBits;
//Allocate image memory
VkMemoryAllocateInfo allocate_info = {};
allocate_info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
allocate_info.allocationSize = image_size;
allocate_info.memoryTypeIndex = get_device_memory_type(image_bits, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT);
result = vkAllocateMemory(logical_device, &allocate_info, NULL, &image_memory);
CHECK_VK_RESULT(result, "vkAllocateMemory failed to allocate image memory");
//Bind each image plane to memory
std::vector<VkBindImageMemoryInfo> bind_image_memory_infos(2);
//Plane 0
VkBindImagePlaneMemoryInfo bind_image_plane0_info = {};
bind_image_plane0_info.sType = VK_STRUCTURE_TYPE_BIND_IMAGE_PLANE_MEMORY_INFO;
bind_image_plane0_info.pNext = NULL;
bind_image_plane0_info.planeAspect = VK_IMAGE_ASPECT_PLANE_0_BIT;
VkBindImageMemoryInfo& bind_image_memory_plane0_info = bind_image_memory_infos[0];
bind_image_memory_plane0_info.sType = VK_STRUCTURE_TYPE_BIND_IMAGE_MEMORY_INFO;
bind_image_memory_plane0_info.pNext = &bind_image_plane0_info;
bind_image_memory_plane0_info.image = image;
bind_image_memory_plane0_info.memory = image_memory;
bind_image_memory_plane0_info.memoryOffset = memory_offset_plane0;
//Plane 1
VkBindImagePlaneMemoryInfo bind_image_plane1_info = {};
bind_image_plane1_info.sType = VK_STRUCTURE_TYPE_BIND_IMAGE_PLANE_MEMORY_INFO;
bind_image_plane1_info.pNext = NULL;
bind_image_plane1_info.planeAspect = VK_IMAGE_ASPECT_PLANE_1_BIT;
VkBindImageMemoryInfo& bind_image_memory_plane1_info = bind_image_memory_infos[1];
bind_image_memory_plane1_info.sType = VK_STRUCTURE_TYPE_BIND_IMAGE_MEMORY_INFO;
bind_image_memory_plane1_info.pNext = &bind_image_plane1_info;
bind_image_memory_plane1_info.image = image;
bind_image_memory_plane1_info.memory = image_memory;
bind_image_memory_plane1_info.memoryOffset = memory_offset_plane1;
vkBindImageMemory2(logical_device, bind_image_memory_infos.size(), bind_image_memory_infos.data());
context.transition_vk_image_layout(image, format, VK_IMAGE_ASPECT_COLOR_BIT, VK_IMAGE_LAYOUT_PREINITIALIZED, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);
//Copy staging buffer to device local buffer
VkCommandBuffer tmp_cmd_buffer = begin_tmp_vk_cmd_buffer();
std::vector<VkBufferImageCopy> plane_regions(2);
plane_regions[0].bufferOffset = 0;
plane_regions[0].bufferRowLength = 0;
plane_regions[0].bufferImageHeight = 0;
plane_regions[0].imageSubresource.aspectMask = VK_IMAGE_ASPECT_PLANE_0_BIT;
plane_regions[0].imageSubresource.mipLevel = 0;
plane_regions[0].imageSubresource.baseArrayLayer = 0;
plane_regions[0].imageSubresource.layerCount = 1;
plane_regions[0].imageOffset = { 0, 0, 0 };
plane_regions[0].imageExtent = { width, height, 1 };
plane_regions[1].bufferOffset = buffer_offset_plane1;
plane_regions[1].bufferRowLength = 0;
plane_regions[1].bufferImageHeight = 0;
plane_regions[1].imageSubresource.aspectMask = VK_IMAGE_ASPECT_PLANE_1_BIT;
plane_regions[1].imageSubresource.mipLevel = 0;
plane_regions[1].imageSubresource.baseArrayLayer = 0;
plane_regions[1].imageSubresource.layerCount = 1;
plane_regions[1].imageOffset = { 0, 0, 0 };
plane_regions[1].imageExtent = { width / 2, height / 2, 1 };
vkCmdCopyBufferToImage(tmp_cmd_buffer, stage_buffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, plane_regions.size(), plane_regions.data());
end_tmp_vk_cmd_buffer(tmp_cmd_buffer); //Submit and waits
vkFreeMemory(logical_device, stage_buffer_memory, NULL);
vkDestroyBuffer(logical_device, stage_buffer, NULL);
transition_vk_image_layout(image, format, VK_IMAGE_ASPECT_COLOR_BIT, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL);
VkImageViewCreateInfo image_view_info = {};
image_view_info.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
image_view_info.pNext = &ycbcr_info;
image_view_info.flags = 0;
image_view_info.image = image;
image_view_info.viewType = VK_IMAGE_VIEW_TYPE_2D;
image_view_info.format = format;
image_view_info.components.r = VK_COMPONENT_SWIZZLE_IDENTITY;
image_view_info.components.b = VK_COMPONENT_SWIZZLE_IDENTITY;
image_view_info.components.g = VK_COMPONENT_SWIZZLE_IDENTITY;
image_view_info.components.a = VK_COMPONENT_SWIZZLE_IDENTITY;
image_view_info.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
image_view_info.subresourceRange.baseMipLevel = 0;
image_view_info.subresourceRange.levelCount = 1;
image_view_info.subresourceRange.baseArrayLayer = 0;
image_view_info.subresourceRange.layerCount = 1;
VkResult res = vkCreateImageView(logical_device, &image_view_info, NULL, &.image_view);
CHECK_VK_RESULT(res, "Failed to create image view");
ILOG("Successfully created image, allocated image memory and created image view");
I receive one validation error: vkCmdCopyBufferToImage() parameter, VkImageAspect pRegions->imageSubresource.aspectMask, is an unrecognized enumerator, but from inspecting the validation code, it seems that it's just a bit outdated and this shouldn't be an issue.
The rest of the code just sets up regular descriptor layouts/pools and allocated and updates accordingly (I've verified with a regular RGB texture).
The fragment shader is as follows:
vec2 uv = vec2(gl_FragCoord.x / 1024.0, 1.0 - (gl_FragCoord.y / 1024.0));
out_color = vec4(texture(ycbcr_image, uv).rgb, 1.0f);
When I run my program I only get a red components (the image is essentially a greyscale image). from a little testing, it seems that the VkSamplerYcbcrconversion setup as removing it from both the VkSamplerCreateInfo.pNext and VkImageViewCreateInfo.pNext doesn't change anything.
I've also looked here, Khronos YCbCr tests, but I can't find any real mistake.
Solution: according to the spec, sec. 12.1, Conversion must be fixed at pipeline creation time, through use of a combined image sampler with an immutable sampler in VkDescriptorSetLayoutBinding.
By adding the ycbcr_sampler to pImmutableSamplers when setting up the descriptor set layout binding it now works:
VkDescriptorSetLayoutBinding image_binding = {};
image_binding.binding = 0;
image_binding.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
image_binding.descriptorCount = 1;
image_binding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
image_binding.pImmutableSamplers = &ycbcr_sampler;

CNTK Evaluate model has two inputs C++

I have a project based on CNTK 2.3. I used the code from the integration tests to train MNIST classifier like this:
auto device = DeviceDescriptor::GPUDevice(0);
const size_t inputDim = sizeBlob * sizeBlob;
const size_t numOutputClasses = numberOfClasses;
const size_t hiddenLayerDim = 200;
auto input = InputVariable({ inputDim }, CNTK::DataType::Float, L"features");
auto scaledInput = ElementTimes(Constant::Scalar(0.00390625f, device), input);
auto classifierOutput = FullyConnectedDNNLayer(scaledInput, hiddenLayerDim, device, std::bind(Sigmoid, _1, L""));
auto outputTimesParam = Parameter(NDArrayView::RandomUniform<float>({ numOutputClasses, hiddenLayerDim }, -0.05, 0.05, 1, device));
auto outputBiasParam = Parameter(NDArrayView::RandomUniform<float>({ numOutputClasses }, -0.05, 0.05, 1, device));
classifierOutput = Plus(outputBiasParam, Times(outputTimesParam, classifierOutput), L"classifierOutput");
auto labels = InputVariable({ numOutputClasses }, CNTK::DataType::Float, L"labels");
auto trainingLoss = CNTK::CrossEntropyWithSoftmax(classifierOutput, labels, L"lossFunction");;
auto prediction = CNTK::ClassificationError(classifierOutput, labels, L"classificationError");
// Test save and reload of model
Variable classifierOutputVar = classifierOutput;
Variable trainingLossVar = trainingLoss;
Variable predictionVar = prediction;
auto combinedNet = Combine({ trainingLoss, prediction, classifierOutput }, L"MNISTClassifier");
//SaveAndReloadModel<float>(combinedNet, { &input, &labels, &trainingLossVar, &predictionVar, &classifierOutputVar }, device);
classifierOutput = classifierOutputVar;
trainingLoss = trainingLossVar;
prediction = predictionVar;
const size_t minibatchSize = 64;
const size_t numSamplesPerSweep = 60000;
const size_t numSweepsToTrainWith = 2;
const size_t numMinibatchesToTrain = (numSamplesPerSweep * numSweepsToTrainWith) / minibatchSize;
auto featureStreamName = L"features";
auto labelsStreamName = L"labels";
auto minibatchSource = TextFormatMinibatchSource(trainingSet, { { featureStreamName, inputDim },{ labelsStreamName, numOutputClasses } });
auto featureStreamInfo = minibatchSource->StreamInfo(featureStreamName);
auto labelStreamInfo = minibatchSource->StreamInfo(labelsStreamName);
LearningRateSchedule learningRatePerSample = TrainingParameterPerSampleSchedule<double>(0.003125);
auto trainer = CreateTrainer(classifierOutput, trainingLoss, prediction, { SGDLearner(classifierOutput->Parameters(), learningRatePerSample) });
size_t outputFrequencyInMinibatches = 20;
for (size_t i = 0; i < numMinibatchesToTrain; ++i)
{
auto minibatchData = minibatchSource->GetNextMinibatch(minibatchSize, device);
trainer->TrainMinibatch({ { input, minibatchData[featureStreamInfo] },{ labels, minibatchData[labelStreamInfo] } }, device);
PrintTrainingProgress(trainer, i, outputFrequencyInMinibatches);
size_t trainingCheckpointFrequency = 100;
if ((i % trainingCheckpointFrequency) == (trainingCheckpointFrequency - 1))
{
const wchar_t* ckpName = L"feedForward.net";
//trainer->SaveCheckpoint(ckpName);
//trainer->RestoreFromCheckpoint(ckpName);
}
}
combinedNet->Save(g_dnnFile);
That part works fine and I train the model then save to a model file. But when I try to evaluate a simple image to test the model it looks like something is wrong in the model.
// Load the model.
// The model is trained by <CNTK>/Examples/Image/Classification/ResNet/Python/TrainResNet_CIFAR10.py
// Please see README.md in <CNTK>/Examples/Image/Classification/ResNet about how to train the model.
FunctionPtr modelFunc = Function::Load(modelFile, device);
// Get input variable. The model has only one single input.
std::vector<Variable> inputs = modelFunc->Arguments();
Variable inputVar = modelFunc->Arguments()[0];
// The model has only one output.
// If the model has more than one output, use modelFunc->Outputs to get the list of output variables.
std::vector<Variable> outputs = modelFunc->Outputs();
Variable outputVar = outputs[0];
// Prepare input data.
// For evaluating an image, you first need to perform some image preprocessing to make sure that the input image has the correct size and layout
// that match the model inputs.
// Please note that the model used by this example expects the CHW image layout.
// inputVar.Shape[0] is image width, inputVar.Shape[1] is image height, and inputVar.Shape[2] is channels.
// For simplicity and avoiding external dependencies, we skip the preprocessing step here, and just use some artificially created data as input.
Mat image = imread(".....");
uint8_t* imagePtr = (uint8_t*)(image).data;
auto width = image.cols;
auto heigth = image.rows;
std::vector<float> inputData(inputVar.Shape().TotalSize());
for (size_t i = 0; i < inputData.size(); ++i)
{
auto curChVal = imagePtr[(i)];
inputData[i] = curChVal;
}
// Create input value and input data map
ValuePtr inputVal = Value::CreateBatch(inputVar.Shape(), inputData, device);
std::unordered_map<Variable, ValuePtr> inputDataMap = { { inputVar, inputVal } };
// Create output data map. Using null as Value to indicate using system allocated memory.
// Alternatively, create a Value object and add it to the data map.
std::unordered_map<Variable, ValuePtr> outputDataMap = { { outputVar, nullptr } };
// Start evaluation on the device
modelFunc->Evaluate(inputDataMap, outputDataMap, device);
// Get evaluate result as dense output
ValuePtr outputVal = outputDataMap[outputVar];
std::vector<std::vector<float>> outputData;
outputVal->CopyVariableValueTo(outputVar, outputData);
PrintOutput<float>(outputVar.Shape().TotalSize(), outputData);
I run the same code on c# and it works fine. What I found as a difference is that modelFunc->Arguments() should have one argument but it has two - it finds features and labels as two inputs but I need to have only feature as an input and it throws the following error:
Find input and output variables by name, instead of modelFunc->Arguments()[0].
Variable inputVar;
GetInputVariableByName(modelFunc, L"features", inputVar);
Variable outputVar;
GetOutputVaraiableByName(modelFunc, L"classifierOutput", outputVar);
GetInputVariableByName and GetOutputVaraiableByName() come from
https://github.com/Microsoft/CNTK/blob/v2.3.1/Tests/EndToEndTests/EvalClientTests/CNTKLibraryCPPEvalExamplesTest/EvalMultithreads.cpp#L316