problem with sending a float number in a stream in vivado_hls


I am trying to implement a simple image processing filter in hardware that halves each pixel value to reduce the intensity, and I am using Vivado HLS to generate the IP. As explained here https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/Float-numbers-with-hls-stream/m-p/942747, a union is needed to send floating-point numbers through an hls::stream, so I did the same. However, the results match only for the blue component of the image; the red and green components do not match.
I have been trying to resolve this but cannot see where the problem is. All the files are below. Can someone help me resolve it?
////header file
#include "ap_fixed.h"
#include "hls_stream.h"
typedef union {
unsigned int i;
float r;
float g;
float b;
} conv;
typedef hls::stream <unsigned int> Stream_t;
void ftest(Stream_t& Sin,Stream_t& Sout);
////testbench
#include "stream_check_h.hpp"
int main()
{
Mat img_rev = imread("C:/Users/20181217/Desktop/images/imgs/output_fwd_v3.png");//(256x512)
Mat final_img(img_rev.rows,img_rev.cols,CV_8UC3);
Mat ref_img(img_rev.rows,img_rev.cols,CV_8UC3);
Stream_t S1,S2;
int err_r = 0;
int err_g = 0;
int err_b = 0;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
conv c;
c.r = (float)img_rev.at<Vec3b>(i,j)[0];
c.g = (float)img_rev.at<Vec3b>(i,j)[1];
c.b = (float)img_rev.at<Vec3b>(i,j)[2];
S1 << c.i;
}
}
ftest(S1,S2);
conv c;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
S2 >> c.i;
final_img.at<Vec3b>(i,j)[0]=(unsigned char)c.r;
final_img.at<Vec3b>(i,j)[1]=(unsigned char)c.g;
final_img.at<Vec3b>(i,j)[2]=(unsigned char)c.b;
ref_img.at<Vec3b>(i,j)[0] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[0])/2.0);
ref_img.at<Vec3b>(i,j)[1] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[1])/2.0);
ref_img.at<Vec3b>(i,j)[2] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[2])/2.0);
}
}
Mat diff;
cout<<diff;
diff= abs(final_img-ref_img);
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
if((int)diff.at<Vec3b>(i,j)[0] > 0)
{
err_r++;
cout<<"expected value: "<<(int)ref_img.at<Vec3b>(i,j)[0]<<", final_value: "<<(int)final_img.at<Vec3b>(i,j)[0]<<", actual value:"<<(int)img_rev.at<Vec3b>(i,j)[0]<<endl;
}
if((int)diff.at<Vec3b>(i,j)[1] > 0)
err_g++;
if((int)diff.at<Vec3b>(i,j)[2] > 0)
err_b++;
}
}
cout<<"number of errors: "<<err_r<<", "<<err_g<<", "<<err_b;
return 0;
}
////core
#include "stream_check_h.hpp"
void ftest(Stream_t& Sin,Stream_t& Sout)
{
conv cin,cout;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
Sin >> cin.i;
cout.r = cin.r/2.0 ;
cout.g = cin.g/2.0 ;
cout.b = cin.b/2.0 ;
Sout << cout.i;
}
}
}
When I debugged, the blue components of the pixels matched. For one red pixel it printed the following:
expected value: 22, final_value: 14, actual value:45
and the total errors for red, green, and blue are:
number of errors: 126773, 131072, 0
I cannot see why it goes wrong for red and green. I am posting here hoping a fresh set of eyes will help.
Thanks in advance

I'm assuming you want a 32-bit-wide stream carrying three 8-bit unsigned channels per pixel (CV_8UC3). I believe the problem with the union in your case is that its three float members overlap (unlike the single float value in the example you cite). All of r, g, and b share the same 32 bits as i, so each assignment overwrites the previous one and only the last value written (b) goes into the stream. The division in the core is then applied to that one value, which is why only blue matches.
A possible workaround I quickly came up with is to pack the three 8-bit channels into the unsigned int, convert what you get from the stream into an ap_uint<32>, chop it into the R, G, B chunks (with the range() method), and divide. Finally, assemble the result and stream it back.
unsigned int packet;
Sin >> packet;
ap_uint<32> packet_uint32 = packet; // ap_uint<32> can be initialized directly from an unsigned int
ap_uint<8> b = packet_uint32.range(7, 0); // unsigned, so channel values above 127 are not read as negative
ap_uint<8> g = packet_uint32.range(15, 8);
ap_uint<8> r = packet_uint32.range(23, 16); // If they are in the wrong bit range/order, just swap the r, g, b assignments
b /= 2;
g /= 2;
r /= 2;
packet_uint32.range(7, 0) = b;
packet_uint32.range(15, 8) = g;
packet_uint32.range(23, 16) = r;
packet = packet_uint32.to_uint();
Sout << packet;
NOTE: I've reused the same variables in the code above. HLS shouldn't complain about it and should still produce good RTL. If it does complain, just create new ones.
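The packing idea can be tried in plain C++ before bringing in the HLS types. The following is only a host-side sketch, not HLS code: the helper names (pack_rgb, halve_rgb, red, green, blue) are made up for illustration, and the bit layout (B in bits 7..0, G in bits 15..8, R in bits 23..16) is an assumption chosen to mirror the range() calls above.

```cpp
#include <cstdint>

// Hypothetical helpers. Assumed layout: B in bits 7..0, G in bits 15..8,
// R in bits 23..16, top byte unused.
inline std::uint32_t pack_rgb(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) |
           static_cast<std::uint32_t>(b);
}

inline std::uint8_t red(std::uint32_t p) {
    return static_cast<std::uint8_t>((p >> 16) & 0xFFu);
}

inline std::uint8_t green(std::uint32_t p) {
    return static_cast<std::uint8_t>((p >> 8) & 0xFFu);
}

inline std::uint8_t blue(std::uint32_t p) {
    return static_cast<std::uint8_t>(p & 0xFFu);
}

// Halve each channel independently and repack into one 32-bit word.
inline std::uint32_t halve_rgb(std::uint32_t p) {
    return pack_rgb(static_cast<std::uint8_t>(red(p) / 2),
                    static_cast<std::uint8_t>(green(p) / 2),
                    static_cast<std::uint8_t>(blue(p) / 2));
}
```

In a testbench, pack_rgb would take the place of the three union assignments before the stream write, and red, green and blue would unpack the word read back from the output stream.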

Related

What's the correct way to assign one GPU memory buffer value from another GPU memory buffer with some arithmetic on each source buffer's element?

I'm new to GPU programming with the CUDA toolkit, and I have to write some code that does what the title describes.
Here is the code, to show exactly what I want to do.
void CTrtModelWrapper::forward(void **bindings,
unsigned height,
unsigned width,
short channel,
ColorSpaceFmt colorFmt,
PixelDataType pixelType) {
uint16_t *devInRawBuffer_ptr = (uint16_t *) bindings[0];
uint16_t *devOutRawBuffer_ptr = (uint16_t *) bindings[1];
const unsigned short bit = 16;
float *devInputBuffer_ptr = nullptr;
float *devOutputBuffer_ptr = nullptr;
unsigned volume = height * width * channel;
common::cudaCheck(cudaMalloc((void **) &devInputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
common::cudaCheck(cudaMalloc((void **) &devOutputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
unsigned short npos = 0;
switch (pixelType) {
case PixelDataType::PDT_INT8: // high 8bit
npos = bit - 8;
break;
case PixelDataType::PDT_INT10: // high 10bit
npos = bit - 10;
break;
default:
break;
}
switch (colorFmt) {
case CFMT_RGB: {
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); // SEGMENTATION Fault at this line
}
}
break;
default:
break;
}
void *rtBindings[2] = {devInputBuffer_ptr, devOutputBuffer_ptr};
// forward
this->_forward(rtBindings);
// convert output
unsigned short ef_bit = bit - npos;
switch (colorFmt) {
case CFMT_RGB: {
for (unsigned i = 0; i < volume; ++i) {
devOutRawBuffer_ptr[i] = clip< uint16_t >((uint16_t) devOutputBuffer_ptr[i],
0,
(uint16_t) pow(2, ef_bit)) << npos;
}
}
break;
default:
break;
}
}
bindings is a pointer to an array. The first element is a device pointer to a buffer allocated with cudaMalloc on the GPU, where each element is a 16-bit integer. The second element points to the same kind of buffer and is used to store the output data.
height, width, channel, colorFmt (RGB here), and pixelType (PDT_INT8, i.e. 8 bit) correspond to the image height, width, number of channels, color space, and number of bits used to store one pixel value.
The _forward function requires a pointer to an array similar to bindings, except that each element in the buffers must be a 32-bit float.
So I convert the data with a loop:
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); // SEGMENTATION Fault at this line
}
The >> operation is there because the actual 8-bit data is stored in the high 8 bits.
A segmentation fault occurs at the line devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); when i equals 0.
I tried splitting this code into several lines:
uint16_t value = devInRawBuffer_ptr[i];
float transferd = float(value >> npos);
devInputBuffer_ptr[i] = transferd;
and the segmentation fault occurs at the line uint16_t value = devInRawBuffer_ptr[i];
Is this a valid way to assign values to an allocated GPU memory buffer?
PS: the buffers passed in bindings are fine. They are filled from host memory with cudaMemcpy before the call to forward, but I paste that code below anyway:
nvinfer1::DataType type = nvinfer1::DataType::kHALF;
HostBuffer hostInputBuffer(volume, type);
DeviceBuffer deviceInputBuffer(volume, type);
HostBuffer hostOutputBuffer(volume, type);
DeviceBuffer deviceOutputBuffer(volume, type);
// HxWxC --> WxHxC
auto *hostInputDataBuffer = static_cast<unsigned short *>(hostInputBuffer.data());
for (unsigned w = 0; w < W; ++w) {
for (unsigned h = 0; h < H; ++h) {
for (unsigned c = 0; c < C; ++c) {
hostInputDataBuffer[w * H * C + h * C + c] = (unsigned short )(*(ppm.buffer.get() + h * W * C + w * C + c));
}
}
}
auto ret = cudaMemcpy(deviceInputBuffer.data(), hostInputBuffer.data(), volume * getElementSize(type),
cudaMemcpyHostToDevice);
if (ret != 0) {
std::cout << "CUDA failure: " << ret << std::endl;
return EXIT_FAILURE;
}
void *bindings[2] = {deviceInputBuffer.data(), deviceOutputBuffer.data()};
model->forward(bindings, H, W, C, sbsisr::ColorSpaceFmt::CFMT_RGB, sbsisr::PixelDataType::PDT_INT8);

In CUDA, it's generally not advisable to dereference a device pointer in host code. For example, you are creating a "device pointer" when you use cudaMalloc:
common::cudaCheck(cudaMalloc((void **) &devInputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
From the code you have posted, it's not possible to deduce that for devInRawBuffer_ptr but I'll assume it also is a device pointer.
In that case, to perform this operation:
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos);
}
You would launch a CUDA kernel, something like this:
// put this function definition at file scope
__global__ void shift_kernel(float *dst, uint16_t *src, size_t sz, unsigned short npos){
// grid-stride loop: each thread handles idx, idx + stride, idx + 2*stride, ...
for (size_t idx = blockIdx.x*blockDim.x+threadIdx.x; idx < sz; idx += gridDim.x*blockDim.x)
dst[idx] = (float)((src[idx]) >> npos);
}
// call it like this in your code:
shift_kernel<<<160, 1024>>>(devInputBuffer_ptr, devInRawBuffer_ptr, volume, npos);
(coded in browser, not tested)
If you'd like to learn more about what's going on here, you may wish to study CUDA. For example, you can get most of the basic concepts here and by studying the CUDA sample code vectorAdd. The grid-stride loop is discussed here.
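One way to check the kernel's output is to copy the device result back with cudaMemcpy and compare it against a CPU reference. Below is a minimal host-only C++ sketch of such a reference. The function name shift_reference is hypothetical, and it assumes the input samples are available in a host-side std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference for the element-wise conversion performed by the kernel:
// dst[i] = float(src[i] >> npos). Works on host memory only.
inline std::vector<float> shift_reference(const std::vector<std::uint16_t>& src,
                                          unsigned npos) {
    assert(npos < 16 && "shift must be smaller than the 16-bit sample width");
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = static_cast<float>(src[i] >> npos);
    }
    return dst;
}
```

Because the loop dereferences only host memory, it is safe to run on the CPU, unlike the original loop over device pointers.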

Using saturate_cast or not

This is a simple program to change the contrast and brightness of an image. I noticed another version of this program with one small difference: saturate_cast is added to the code.
I don't understand the reason for this, or why a conversion to unsigned char (uchar) is needed at all, since both versions (with and without saturate_cast<uchar>) produce the same result. I would appreciate any help.
Here is the code:
#include "opencv2/imgcodecs.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include "Source.h"
using namespace cv;
double alpha;
int beta;
int main(int, char** argv)
{
/// Read image given by user
Mat image = imread(argv[1]);
Mat image2 = Mat::zeros(image.size(), image.type());
/// Initialize values
std::cout << " Basic Linear Transforms " << std::endl;
std::cout << "-------------------------" << std::endl;
std::cout << "* Enter the alpha value [1.0-3.0]: ";std::cin >> alpha;
std::cout << "* Enter the beta value [0-100]: "; std::cin >> beta;
for (int x = 0; x < image.rows; x++)
{
for (int y = 0; y < image.cols; y++)
{
for (int c = 0; c < 3; c++)
{
image2.at<Vec3b>(x, y)[c] =
saturate_cast<uchar>(alpha*(image.at<Vec3b>(x, y)[c]) + beta);
}
}
/// Create Windows
namedWindow("Original Image", 1);
namedWindow("New Image", 1);
/// Show stuff
imshow("Original Image", image);
imshow("New Image", image2);
/// Wait until user press some key
waitKey();
return 0;
}

Since the result of your expression may go outside the valid range for uchar, i.e. [0,255], you'd better always use saturate_cast.
In your case, the result of the expression: alpha*(image.at<Vec3b>(x, y)[c]) + beta is a double, so it's safer to use saturate_cast<uchar> to clamp values correctly.
Also, this improves readability, since it's easy to see that you want a uchar out of an expression.
Without using saturate_cast you may have unexpected values:
uchar u1 = 257; // u1 = 1, why a very bright value is set to almost black?
uchar u2 = saturate_cast<uchar>(257); // u2 = 255, a very bright value is set to white
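The difference shows up as soon as alpha*pixel + beta leaves [0, 255]. The sketch below reproduces both behaviors without OpenCV. clamp_to_uchar is a hypothetical stand-in for cv::saturate_cast<uchar>, not OpenCV's implementation, and its handling of exact .5 ties may differ from the library's.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for cv::saturate_cast<uchar>: round to the nearest
// integer, then clamp the result to [0, 255].
inline unsigned char clamp_to_uchar(double v) {
    const long rounded = std::lround(v);
    return static_cast<unsigned char>(std::min(255L, std::max(0L, rounded)));
}

// Plain integer-to-uchar conversion: the value wraps modulo 256.
inline unsigned char wrap_to_uchar(int v) {
    return static_cast<unsigned char>(v);
}
```

With alpha = 2.0, a pixel value of 200, and beta = 50, the expression evaluates to 450: clamping yields 255 (white), while the wrapping conversion yields 194. For inputs that never leave [0, 255], the two produce the same output, which is why both versions of the program can look identical on some images and parameter choices.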

inline unsigned char saturate_cast_uchar(double val) {
val += 0.5; // to round the value
return static_cast<unsigned char>(val < 0 ? 0 : (val > 0xff ? 0xff : val));
}
If val lies between 0 and 255, this function returns the rounded value;
if val lies outside the range [0, 255], it returns the lower or upper boundary value.

why does this function keep crashing?

Forethought: If it is needed then I can add the class definition.
Problem: I get a STATUS_ACCESS_VIOLATION whenever I try to run this function in my program, and I was wondering what was going on. Am I out of bounds somewhere? If I could reason it out myself I would, but I cannot figure it out alone. I'm very close to just hiring someone to do the debugging for me; it's worth my while. So anyway, this needs to be looked over and given a little TLC. Thanks in advance!
int S_Rend::count(bitset<8> alpha, bitset<8> spec) {
int bn;
vector< bitset<8> > cnt;
bitset<8> curr;
int chmp;
eta = (alpha & spec);
theta = (alpha | spec);
cnt[0] = eta & alpha;
cnt[1] = eta | alpha;
cnt[2] = eta & spec;
cnt[3] = eta | spec;
cnt[4] = theta & alpha;
cnt[5] = theta | alpha;
cnt[6] = theta & spec;
cnt[7] = theta | spec;
cnt[8] = cnt[0] & cnt[5];
cnt[9] = cnt[6] | cnt[1];
cnt[10] = cnt[2] & cnt[7];
cnt[11] = cnt[4] | cnt[3];
for (int i=0;i<11;i++)
for (int j=i;j<=11;j++) {
curr = cnt[i];
if (cnt[j] == curr)
bn++;
if (bn>chmp)
chmp=bn;
}
return chmp;
}
int S_Rend::s_render(ifstream& in, ofstream& out) {
int i, n;
int t;
int chk;
in >> lambda;
in >> size;
in >> delta;
in >> chk;
t=(int&)beta;
int bn=0;
while (size-1>=bn) {
t=s_nop((int&)t,0);
cred.push_back(t);
bn++;
}
if (cred[bn-1]==chk)
cout << "\nValidity Pass... Success!" << endl;
else {
printf("\nValidity Pass...Fail! %u != %u",cred[cred.size()-1],chk);
return 1;
}
cout << "\nWriting to Buffer..." << endl;
i=0;
spec = lambda;
int f;
while (bn-1>=0) {
alpha = (int&)cred[bn-1];
f=count(alpha, spec);
eta = (int&)f;
spec ^= alpha ^ eta;
btrace.push_back(f);
cout << f << " ";
bn--;
}
cout << "One more second..\n";
while (i<=bn-1) {
delta = (int&)btrace[bn];
out << (const char)(int&)delta;
i++;
}
cout << "\nBuffer Written... Exiting..\n";
in.close();
out.close();
printf("*");
return 0;
}

At this point :
while (bn) {
alpha = (int&)cred[bn];
f=count(alpha, spec);
bn == size and size == cred.size(), so you read outside the vector (assuming cred is empty at the start).

The most glaring issue is that you're writing (and reading) from a vector with out-of-bounds indices:
int S_Rend::count(bitset<8> alpha, bitset<8> spec)
{
int bn;
vector< bitset<8> > cnt;
bitset<8> curr;
int chmp;
eta = (alpha & spec);
theta = (alpha | spec);
cnt[0] = eta & alpha; // <-- Illegal access
...
}
Well, we can stop right there.
The std::vector is not sized, so it cannot hold any elements. The last line assumes that vector can hold at least one item (item 0). Thus this is undefined behavior.
Size the vector appropriately before accessing elements by calling push_back, vector::resize, vector::insert, vector::emplace_back, or construct the vector by issuing one of the constructors that sizes the vector in addition to constructing it.
std::vector description
Since it seems you want 12 items, then you can construct it with 12 items.
int S_Rend::count(bitset<8> alpha, bitset<8> spec)
{
int bn;
vector< bitset<8> > cnt(12);
bitset<8> curr;
int chmp;
eta = (alpha & spec);
theta = (alpha | spec);
cnt[0] = eta & alpha; // <-- ok
...
}
Also, as a debugging aid, you could have used std::vector::at instead of operator [ ] to access the vector items. Using at() would have immediately thrown a std::out_of_range exception indicating that you were accessing the vector with an invalid index (instead of the program continuing as if nothing is wrong).
The second thing that seems dubious is you casting a std::bitset to an int reference, or vice-versa. As you observed, without the cast, you get an error. Using a C-style cast to "shut the compiler up" is not a good idea, as all you're doing is bypassing the type-safety mechanism that C++ has in place at compile-time.
If the compiler says to you "this shouldn't be done", and then you override it by issuing a C-style cast, prepare to suffer the consequences if / when your program shows erratic behavior.
To properly convert between a std::bitset and an int there are dedicated methods, namely std::bitset::to_ulong (the std::bitset constructor handles the other direction).
How to convert from bitset to an int
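Both points can be checked in isolation. The sketch below uses hypothetical helper names (at_throws, to_bits, to_int) and is not part of the S_Rend class. It shows at() reporting a bad index and a bitset/int round trip without any reference casts.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if at() rejects the index by throwing std::out_of_range.
inline bool at_throws(const std::vector<std::bitset<8> >& v, std::size_t index) {
    try {
        (void)v.at(index);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// int -> bitset<8>: keep only the low 8 bits of the value.
inline std::bitset<8> to_bits(int value) {
    return std::bitset<8>(static_cast<unsigned long>(value) & 0xFFul);
}

// bitset<8> -> int: to_ulong() cannot overflow an int for 8 bits.
inline int to_int(const std::bitset<8>& bits) {
    return static_cast<int>(bits.to_ulong());
}
```

A default-constructed vector makes at_throws(cnt, 0) return true, while a vector constructed with 12 elements accepts indices 0 through 11 and rejects 12.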

OpenCV: how to read .pfm files?

Is there a way to read .pfm files in OpenCV?
Thank you very much for any suggestions!

PFM is an uncommon image format and I don't know why the Middlebury dataset chose to use that, probably because it uses floating point values.
Anyway I was able to read the images with OpenCV:
import numpy as np
import cv2
groundtruth = cv2.imread('disp0.pfm', cv2.IMREAD_UNCHANGED)
Note the IMREAD_UNCHANGED flag. Somehow it is able to read all the correct values even if OpenCV does not support it.
But wait a minute: inf values are commonly used to set INVALID pixel disparity, so to properly display the image you should do:
# Remove infinite value to display
groundtruth[groundtruth==np.inf] = 0
# Normalize and convert to uint8
groundtruth = cv2.normalize(groundtruth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
# Show
cv2.imshow("groundtruth", groundtruth)
cv2.waitKey(0)
cv2.destroyAllWindows()

Based on the description of the ".pfm" file format (see http://netpbm.sourceforge.net/doc/pfm.html), I wrote the following read/write functions, which depend only on the standard C/C++ library. They have worked well for reading and writing PFM files such as the ground-truth disparity ".pfm" files from the Middlebury Computer Vision pages (see http://vision.middlebury.edu/stereo/submit3/).
#ifndef _PGM_H_
#define _PGM_H_
#include <fstream>
#include <iostream>
#include <algorithm>
#include <string>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <bitset> /*std::bitset<32>*/
#include <cstdio>
enum PFM_endianness { BIG, LITTLE, ERROR};
class PFM {
public:
PFM() : height(0), width(0), endianess(0.f) {} // endianess == 0 means "not assigned yet"
inline bool is_little_big_endianness_swap(){
if (this->endianess == 0.f) {
std::cerr << "this-> endianness is not assigned yet!\n";
exit(0);
}
else {
uint32_t endianness = 0xdeadbeef;
//std::cout << "\n" << std::bitset<32>(endianness) << std::endl;
unsigned char * temp = (unsigned char *)&endianness;
//std::cout << std::bitset<8>(*temp) << std::endl;
// The first byte in memory is 0xef on a little-endian host and 0xde on a big-endian one.
PFM_endianness endianType_ = ((*temp == 0xef) ?
LITTLE : ((*temp == 0xde) ? BIG : ERROR));
// ".pfm" format file specifies that:
// positive scale means big endianess;
// negative scale means little endianess.
return ((BIG == endianType_) && (this->endianess < 0.f))
|| ((LITTLE == endianType_) && (this->endianess > 0.f));
}
}
template<typename T>
T * read_pfm(const std::string & filename) {
FILE * pFile;
pFile = fopen(filename.c_str(), "rb");
char c[100];
if (pFile != NULL) {
fscanf(pFile, "%s", c);
// strcmp() returns 0 if they are equal.
if (!strcmp(c, "Pf")) {
fscanf(pFile, "%s", c);
// atoi: ASCII to integer.
// itoa: integer to ASCII.
this->width = atoi(c);
fscanf(pFile, "%s", c);
this->height = atoi(c);
int length_ = this->width * this->height;
fscanf(pFile, "%s", c);
this->endianess = atof(c);
fseek(pFile, 0, SEEK_END);
long lSize = ftell(pFile);
long pos = lSize - this->width*this->height * sizeof(T);
fseek(pFile, pos, SEEK_SET);
T* img = new T[length_];
//cout << "sizeof(T) = " << sizeof(T);
fread(img, sizeof(T), length_, pFile);
fclose(pFile);
/* The raster is a sequence of pixels, packed one after another,
* with no delimiters of any kind. They are grouped by row,
* with the pixels in each row ordered left to right and
* the rows ordered bottom to top.
*/
T* tbimg = (T *)malloc(length_ * sizeof(T));// top-to-bottom.
//PFM SPEC image stored bottom -> top reversing image
for (int i = 0; i < this->height; i++) {
memcpy(&tbimg[(this->height - i - 1)*(this->width)],
&img[(i*(this->width))],
(this->width) * sizeof(T));
}
if (this->is_little_big_endianness_swap()){
std::cout << "little-big endianness transformation is needed.\n";
// little-big endianness transformation is needed.
union {
T f;
unsigned char u8[sizeof(T)];
} source, dest;
for (int i = 0; i < length_; ++i) {
source.f = tbimg[i];
for (unsigned int k = 0, s_T = sizeof(T); k < s_T; k++)
dest.u8[k] = source.u8[s_T - k - 1];
tbimg[i] = dest.f;
//cout << dest.f << ", ";
}
}
delete[] img;
return tbimg;
}
else {
std::cout << "Invalid magic number!"
<< " Expected Pf (grayscale PFM).\n";
fclose(pFile);
exit(0);
}
}
else {
std::cout << "Cannot open file " << filename
<< ", or it does not exist!\n";
// pFile is NULL here, so there is nothing to close.
exit(0);
}
}
template<typename T>
void write_pfm(const std::string & filename, const T* imgbuffer,
const float & endianess_) {
std::ofstream ofs(filename.c_str(), std::ifstream::binary);
// ** 1) Identifier Line: The identifier line contains the characters
// "PF" or "Pf". PF means it's a color PFM.
// Pf means it's a grayscale PFM.
// ** 2) Dimensions Line:
// The dimensions line contains two positive decimal integers,
// separated by a blank. The first is the width of the image;
// the second is the height. Both are in pixels.
// ** 3) Scale Factor / Endianness:
// The Scale Factor / Endianness line is a queer line that jams
// endianness information into an otherwise sane description
// of a scale. The line consists of a nonzero decimal number,
// not necessarily an integer. If the number is negative, that
// means the PFM raster is little endian. Otherwise, it is big
// endian. The absolute value of the number is the scale
// factor for the image.
// The scale factor tells the units of the samples in the raster.
// You use somehow it along with some separately understood unit
// information to turn a sample value into something meaningful,
// such as watts per square meter.
ofs << "Pf\n"
<< this->width << " " << this->height << "\n"
<< endianess_ << "\n";
/* PFM raster:
* The raster is a sequence of pixels, packed one after another,
* with no delimiters of any kind. They are grouped by row,
* with the pixels in each row ordered left to right and
* the rows ordered bottom to top.
* Each pixel consists of 1 or 3 samples, packed one after another,
* with no delimiters of any kind. 1 sample for a grayscale PFM
* and 3 for a color PFM (see the Identifier Line of the PFM header).
* Each sample consists of 4 consecutive bytes. The bytes represent
* a 32 bit string, in either big endian or little endian format,
* as determined by the Scale Factor / Endianness line of the PFM
* header. That string is an IEEE 32 bit floating point number code.
* Since that's the same format that most CPUs and compiler use,
* you can usually just make a program use the bytes directly
* as a floating point number, after taking care of the
* endianness variation.
*/
int length_ = this->width*this->height;
this->endianess = endianess_;
T* tbimg = (T *)malloc(length_ * sizeof(T));
// PFM SPEC image stored bottom -> top reversing image
for (int i = 0; i < this->height; i++) {
memcpy(&tbimg[(this->height - i - 1)*this->width],
&imgbuffer[(i*this->width)],
this->width * sizeof(T));
}
if (this->is_little_big_endianness_swap()) {
std::cout << "little-big endianness transformation is needed.\n";
// little-big endianness transformation is needed.
union {
T f;
unsigned char u8[sizeof(T)];
} source, dest;
for (int i = 0; i < length_; ++i) {
source.f = tbimg[i];
for (size_t k = 0, s_T = sizeof(T); k < s_T; k++)
dest.u8[k] = source.u8[s_T - k - 1];
tbimg[i] = dest.f;
//cout << dest.f << ", ";
}
}
ofs.write((char *)tbimg, this->width*this->height * sizeof(T));
ofs.close();
free(tbimg);
}
inline float getEndianess(){return endianess;}
inline int getHeight(void){return height;}
inline int getWidth(void){return width;}
inline void setHeight(const int & h){height = h;}
inline void setWidth(const int & w){width = w;}
private:
int height;
int width;
float endianess;
};
#endif /* PGM_H_ */
Forgive me for leaving lots of useless comments in the code.
A simple example showing the read/write:
int main(){
PFM pfm_rw;
std::string temp = "img/Motorcycle/disp0GT.pfm";
float * p_disp_gt = pfm_rw.read_pfm<float>(temp);
//int imgH = pfm_rw.getHeight();
//int imgW = pfm_rw.getWidth();
//float scale = pfm_rw.getEndianess();
std::string temp2 = "result/Motorcycle/disp0GT_n1.pfm";
pfm_rw.write_pfm<float>(temp2, p_disp_gt, -1.0f);
free(p_disp_gt); // read_pfm returns a buffer allocated with malloc
return 0;
}
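The endianness handling in the class above can also be written without the 0xdeadbeef probe and the union. The following is a minimal sketch under that assumption. The function names are hypothetical, and it relies only on the PFM rule quoted in the comments above: a negative scale means a little-endian raster, otherwise big-endian.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// True when this machine stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// PFM rule: negative scale means a little-endian raster, otherwise big-endian.
// A swap is needed whenever the file and the host disagree.
inline bool needs_byte_swap(float scale) {
    const bool file_is_little_endian = scale < 0.0f;
    return file_is_little_endian != host_is_little_endian();
}

// Reverse the bytes of one sample by copying through a byte array.
inline float swap_bytes(float value) {
    unsigned char bytes[sizeof(float)];
    std::memcpy(bytes, &value, sizeof(float));
    std::reverse(bytes, bytes + sizeof(float));
    std::memcpy(&value, bytes, sizeof(float));
    return value;
}
```

Copying through memcpy avoids reading a union member other than the one last written, which the C++ standard does not define.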

As far as I know, OpenCV doesn't support reading PFM files directly.
You can refer to the code snippet here for a simple PFM reader, which reads PFM files into COLOR *data, with COLOR defined as follows:
typedef struct {
float r;
float g;
float b;
} COLOR;

How to speed up my .bmp class?

Well greetings to you all :)
A few days ago I finally managed to create a functional C++ class to make .bmp images. Even though it works (no errors yet), it isn't efficient in terms of speed, in my opinion. Running a few tests to see how long it took to write images of different sizes, I ended up with these results:
Image dimensions   Time taken (s)   Comparison to the 1000x1000 image
10x100             0.0491           x 1000 = 49.1 seconds
100x100            0.2471           x 100  = 24.7 seconds
100x1000           2.3276           x 10   = 23.3 seconds
1000x1000          22.515           x 1    = 22.5 seconds
1000x10000         224.76           / 10   = 22.4 seconds
For example, the 10x100 image has 1000 pixels (each with ARGB channels, 32 bits or 4 bytes) plus the 54 bytes for the header, so it took 0.05 seconds to write 4054 bytes (char).
I feel this is super slow, because my computer can copy a ~85 MB file in a second or two. I'm using fstream to write to disk, and any help making the class faster is appreciated. Thank you!
My class is called SimpleBMP and here it is (I only included the relevant functions):
#include <fstream>
class SimpleBMP{
struct PIXEL{
unsigned char A, R, G, B;
}*PixelArray;
unsigned char *BMPHEADER, *BMPINFOHEADER;
std::string DATA;
unsigned int Size_Of_BMP, Size_Of_PixelArray;
int BMP_Width, BMP_Height;
public:
void SetPixel(int Column, int Row, unsigned char A, unsigned char R, unsigned char G, unsigned char B){
PixelArray[(Row*BMP_Width)+Column].A = A;
PixelArray[(Row*BMP_Width)+Column].R = R;
PixelArray[(Row*BMP_Width)+Column].G = G;
PixelArray[(Row*BMP_Width)+Column].B = B;
};
bool MakeImage(std::string Name){
Name.append(".bmp");
std::ofstream OffFile(Name, std::ios::out|std::ios::binary);
if(OffFile.is_open()){
DATA.clear();
for(int temp = 0; temp < 14; temp++){
BMPHEADER[temp] = 0x00;
};
BMPHEADER[0] = 'B';
BMPHEADER[1] = 'M';
BMPHEADER[2] = Size_Of_BMP;
BMPHEADER[3] = (Size_Of_BMP >> 8);
BMPHEADER[4] = (Size_Of_BMP >> 16);
BMPHEADER[5] = (Size_Of_BMP >> 24);
BMPHEADER[10] = 0x36;
for(int temp = 0; temp < 40; temp++){
BMPINFOHEADER[temp] = 0x00;
};
BMPINFOHEADER[0] = 0x28;
for(int temp = 0; temp < 4; temp++){
BMPINFOHEADER[temp+4] = (BMP_Width >> (temp*8));
};
for(int temp = 0; temp < 4; temp++){
BMPINFOHEADER[temp+8] = (BMP_Height >> (temp*8));
};
BMPINFOHEADER[12] = 0x01;
BMPINFOHEADER[14] = 0x20;
for(int temp = 0; temp < 4; temp++){
BMPINFOHEADER[temp+20] = (Size_Of_PixelArray >> (temp*8));
};
BMPINFOHEADER[24] = 0x13;
BMPINFOHEADER[25] = 0x0b;
BMPINFOHEADER[28] = 0x13;
BMPINFOHEADER[29] = 0x0b;
for(int temp = 0; temp < 14; temp++){
DATA.push_back(BMPHEADER[temp]);
};
for(int temp = 0; temp < 40; temp++){
DATA.push_back(BMPINFOHEADER[temp]);
};
for(int temp = 0; temp < (Size_Of_PixelArray/4); temp++){
DATA.push_back(PixelArray[temp].B);
DATA.push_back(PixelArray[temp].G);
DATA.push_back(PixelArray[temp].R);
DATA.push_back(PixelArray[temp].A);
};
OffFile.write(DATA.c_str(), Size_Of_BMP);
OffFile.close();
return true;
}
else
return false;
};
};

When running tests you should compile your project in release mode. Debug mode in most environments introduces additional checks and code. The debug libraries linked can also include additional checks such as bounds checking and validation of iterators that are not present in release mode. All of this can introduce performance hits that are not present in release mode.
There are other optimizations that you can apply such as reserving memory in DATA before loading the data. This will reduce the number of copies that need to be made when the buffer is expanded. Although the performance gain may not be significant it can definitely help. I suggest running your code through a profiler to see where all of the bottlenecks are and optimize accordingly.
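As an illustration of the reserve suggestion, here is a minimal sketch that builds the whole file in one pre-sized buffer and writes it with a single call. It is not the asker's class: build_bmp, write_file, put_u16 and put_u32 are hypothetical names, and the pixel data is assumed to already be in B, G, R, A byte order with the bottom row first.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Append integers in little-endian byte order, as BMP headers require.
inline void put_u16(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v & 0xFFu));
    out.push_back(static_cast<unsigned char>((v >> 8) & 0xFFu));
}

inline void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
    }
}

// Build a complete 32-bit-per-pixel BMP in memory.
// Assumption: bgra holds width*height*4 bytes, B, G, R, A order, bottom row first.
inline std::vector<unsigned char> build_bmp(std::uint32_t width, std::uint32_t height,
                                            const std::vector<unsigned char>& bgra) {
    const std::uint32_t header_size = 14 + 40;
    const std::uint32_t pixel_bytes = width * height * 4;
    assert(bgra.size() == pixel_bytes);

    std::vector<unsigned char> out;
    out.reserve(header_size + pixel_bytes);  // one allocation, no regrowth

    // File header (14 bytes)
    out.push_back('B');
    out.push_back('M');
    put_u32(out, header_size + pixel_bytes);  // total file size
    put_u32(out, 0);                          // reserved
    put_u32(out, header_size);                // offset of the pixel data

    // Info header (40 bytes)
    put_u32(out, 40);           // size of this header
    put_u32(out, width);
    put_u32(out, height);       // positive height: rows are stored bottom-up
    put_u16(out, 1);            // colour planes
    put_u16(out, 32);           // bits per pixel
    put_u32(out, 0);            // no compression
    put_u32(out, pixel_bytes);  // size of the pixel data
    put_u32(out, 2835);         // horizontal resolution (0x0B13, as in the question)
    put_u32(out, 2835);         // vertical resolution
    put_u32(out, 0);            // colours in palette
    put_u32(out, 0);            // important colours

    out.insert(out.end(), bgra.begin(), bgra.end());
    return out;
}

// Write the finished buffer with a single call.
inline bool write_file(const std::string& name, const std::vector<unsigned char>& bytes) {
    std::ofstream file(name.c_str(), std::ios::out | std::ios::binary);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(bytes.data()),
               static_cast<std::streamsize>(bytes.size()));
    return file.good();
}
```

Whether this removes the slowdown depends on where the time actually goes, so measuring a release build with a profiler remains the first step.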

If the in-memory layout of PIXEL matches the byte order the file expects (B, G, R, A), you can completely skip the re-packing of the data and, after writing the two headers, store the pixel array data directly.
OffFile.write((const char *)PixelArray, Size_Of_PixelArray);
It may not be quite as portable (it relies on PIXEL being exactly four bytes with no padding), but it will certainly speed up the saving to file.
For this to produce the right colors, declare the members in file order:
struct PIXEL{
unsigned char B, G, R, A;
};
PIXEL *PixelArray;
Endianness does not come into it here, because each channel is a single byte.
