I want to apply a simple derivative/gradient filter, [-1, 0, 1], to an image from a .ppm file.
The raw binary data from the .ppm file is read into a one-dimensional array:
uint8_t* raw_image_data;
size_t n_rows, n_cols, depth;
// Open the file as an input binary file
std::ifstream file;
file.open("test_image.ppm", std::ios::in | std::ios::binary);
if (!file.is_open()) { /* error */ }
std::string temp_line;
// Check that it's a valid P6 file
if (!(std::getline(file, temp_line) && temp_line == "P6")) {}
// Then skip all the comments (lines that begin with a #)
while (std::getline(file, temp_line) && temp_line.at(0) == '#');
// Try read in the info about the number of rows and columns
try {
n_rows = std::stoi(temp_line.substr(0, temp_line.find(' ')));
n_cols = std::stoi(temp_line.substr(temp_line.find(' ')+1,temp_line.size()));
std::getline(file, temp_line);
depth = std::stoi(temp_line);
} catch (const std::invalid_argument & e) { /* stoi has failed */}
// Allocate memory and read in all image data from ppm
raw_image_data = new uint8_t[n_rows*n_cols*3];
file.read((char*)raw_image_data, n_rows*n_cols*3);
file.close();
I then read a grayscale image from the data into a two-dimensional array, called image_grayscale:
uint8_t** image_grayscale;
image_grayscale = new uint8_t*[n_rows];
for (size_t i = 0; i < n_rows; ++i) {
image_grayscale[i] = new uint8_t[n_cols];
}
// Convert linear array of raw image data to 2d grayscale image
size_t counter = 0;
for (size_t r = 0; r < n_rows; ++r) {
for (size_t c = 0; c < n_cols; ++c) {
image_grayscale[r][c] = 0.21*raw_image_data[counter]
+ 0.72*raw_image_data[counter+1]
+ 0.07*raw_image_data[counter+2];
counter += 3;
}
}
I want to write the resulting filtered image to another two-dimensional array, gradient_magnitude:
uint32_t** gradient_magnitude;
// Allocate memory
gradient_magnitude = new uint32_t*[n_rows];
for (size_t i = 0; i < n_rows; ++i) {
gradient_magnitude[i] = new uint32_t[n_cols];
}
// Filtering operation
int32_t grad_h, grad_v;
for (int r = 1; r < n_rows-1; ++r) {
for (int c = 1; c < n_cols-1; ++c) {
grad_h = image_grayscale[r][c+1] - image_grayscale[r][c-1];
grad_v = image_grayscale[r+1][c] - image_grayscale[r-1][c];
gradient_magnitude[r][c] = std::sqrt(pow(grad_h, 2) + pow(grad_v, 2));
}
}
Finally, I write the filtered image to a .ppm output.
std::ofstream out;
out.open("output.ppm", std::ios::out | std::ios::binary);
// ppm header
out << "P6\n" << n_rows << " " << n_cols << "\n" << "255\n";
// Write data to file
for (int r = 0; r < n_rows; ++r) {
for (int c = 0; c < n_cols; ++c) {
for (int i = 0; i < 3; ++i) {
out.write((char*) &gradient_magnitude[r][c],1);
}
}
}
out.close();
The output image, however, is a mess.
When I simply set grad_v = 0; in the loop (i.e. solely calculate the horizontal gradient), the output is seemingly correct:
When I instead set grad_h = 0; (i.e. solely calculate the vertical gradient), the output is strange:
It seems like part of the image has been circularly shifted, but I cannot understand why. Moreover, I have tried with many images and the same issue occurs.
Can anyone see any issues? Thanks so much!
Ok, first clue is that the image looks circularly shifted. This hints that strides are wrong. The core of your problem is simple:
n_rows = std::stoi(temp_line.substr(0, temp_line.find(' ')));
n_cols = std::stoi(temp_line.substr(temp_line.find(' ')+1,temp_line.size()));
but in the documentation you can read:
Each PPM image consists of the following:
A "magic number" for identifying the file type. A ppm image's magic number is the two
characters "P6".
Whitespace (blanks, TABs, CRs, LFs).
A width, formatted as ASCII characters in decimal.
Whitespace.
A height, again in ASCII decimal.
[...]
Width is columns, height is rows. So that's the classical error that you get when implementing image processing stuff: swapping rows and columns.
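As a minimal sketch of reading the header in the documented order (width first, then height), stream extraction can replace the substr/stoi parsing. The names PpmHeader, skip_ws_and_comments and read_ppm_header are hypothetical helpers introduced for illustration; they are not part of the original code.

```cpp
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper type: the header fields of a P6 file.
struct PpmHeader {
    std::size_t n_cols = 0;  // width
    std::size_t n_rows = 0;  // height
    unsigned max_value = 0;  // e.g. 255
};

// Skips whitespace and '#' comment lines before the next header token.
inline void skip_ws_and_comments(std::istream& in) {
    while (in >> std::ws && in.peek() == '#') {
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
}

// Reads "P6 <width> <height> <maxval>" plus the single whitespace byte that
// precedes the pixel data. Returns false if the header is malformed.
inline bool read_ppm_header(std::istream& in, PpmHeader& h) {
    std::string magic;
    if (!(in >> magic) || magic != "P6") return false;
    skip_ws_and_comments(in);
    if (!(in >> h.n_cols)) return false;  // width comes first
    skip_ws_and_comments(in);
    if (!(in >> h.n_rows)) return false;  // then height
    skip_ws_and_comments(in);
    if (!(in >> h.max_value)) return false;
    in.get();  // consume the one whitespace character before the raster
    return h.n_cols > 0 && h.n_rows > 0 && h.max_value > 0;
}

// Usage sketch (works with std::ifstream opened in binary mode as well):
//   std::istringstream in("P6\n# comment\n4 3\n255\n");
//   PpmHeader h;
//   bool ok = read_ppm_header(in, h);
```

After a successful call the stream is positioned at the first pixel byte, so a single read of n_rows*n_cols*3 bytes can follow.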
From a didactic point of view, why did you make this mistake? My guess: poor debugging tools. After making a working example from your question (an effort I would have been spared if you had provided an MCVE), I ran to the end of image loading and used Image Watch to view the content of your image with #mem(raw_image_data, UINT8, 3, n_cols, n_rows, n_cols*3). Result:
Ok, let's try to swap them: #mem(raw_image_data, UINT8, 3, n_rows, n_cols, n_rows*3). Result:
Much better. Unfortunately I don't know how to specify RGB instead of BGR in the Image Watch #mem pseudo-command, hence the wrong colors.
Then we come back to your code: please compile with all warnings on. Then I'd use more of the std::stream features for parsing your input and fewer std::stoi() or find() calls. Avoid manual memory allocation by using std::vector, and make a (possibly template) class for images. Even if you stick to your pointer to pointer, don't call new once per row: make a single allocation for the pointer at row 0, and have the other pointers point into it:
uint8_t** image_grayscale = new uint8_t*[n_rows];
image_grayscale[0] = new uint8_t[n_rows*n_cols];
for (size_t i = 1; i < n_rows; ++i) {
image_grayscale[i] = image_grayscale[i - 1] + n_cols;
}
Same effect, but easier to deallocate and to manage as a single piece of memory. For example, saving as a PGM becomes:
{
std::ofstream out("output.pgm", std::ios::binary);
out << "P5\n" << n_cols << " " << n_rows << "\n" << "255\n"; // width (columns) first, then height (rows)
out.write(reinterpret_cast<char*>(image_grayscale[0]), n_rows*n_cols);
}
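The std::vector suggestion could look roughly like the sketch below. The class name Image and its interface are hypothetical, chosen only to illustrate a single contiguous allocation with (row, column) indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type (an illustration, not the answer's code):
// one contiguous std::vector, indexed as (row, column).
template <typename T>
class Image {
public:
    Image(std::size_t n_rows, std::size_t n_cols, T fill = T{})
        : n_rows_(n_rows), n_cols_(n_cols), data_(n_rows * n_cols, fill) {}

    std::size_t rows() const { return n_rows_; }
    std::size_t cols() const { return n_cols_; }

    T& operator()(std::size_t r, std::size_t c) {
        assert(r < n_rows_ && c < n_cols_);
        return data_[r * n_cols_ + c];
    }
    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < n_rows_ && c < n_cols_);
        return data_[r * n_cols_ + c];
    }

    // Contiguous storage, convenient for a single ostream::write call.
    T* data() { return data_.data(); }
    const T* data() const { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t n_rows_, n_cols_;
    std::vector<T> data_;
};
```

With this, Image<std::uint8_t> gray(n_rows, n_cols) replaces the new/delete pairs, the elements start out zero-filled, and the memory is released automatically.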
Fill your borders! Using the single allocation style I showed you you can do it as:
uint32_t** gradient_magnitude = new uint32_t*[n_rows];
gradient_magnitude[0] = new uint32_t[n_rows*n_cols];
for (size_t i = 1; i < n_rows; ++i) {
gradient_magnitude[i] = gradient_magnitude[i - 1] + n_cols;
}
std::fill_n(gradient_magnitude[0], n_rows*n_cols, 0);
Finally the gradient magnitude is an integer value between 0 and 360 (you used a uint32_t). Then you save only the least significant byte of it! Of course it's wrong. You need to map from [0,360] to [0,255]. How? You can saturate (if greater than 255 set to 255) or apply a linear scaling (*255/360). Of course you can do also other things, but it's not important.
Here you can see the result on a zoomed version of the three cases: saturate, scale, only LSB (wrong):
With the wrong version you see dark pixels where the value should be higher than 255.
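The two mappings mentioned above (saturation and linear scaling) can be sketched as follows. The function names are hypothetical. The clamp in the scaling variant is there because the true maximum, sqrt(255^2 + 255^2), is slightly above 360.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Gradient magnitude as a double, from the two central differences.
inline double gradient_magnitude_of(int grad_h, int grad_v) {
    return std::sqrt(static_cast<double>(grad_h) * grad_h +
                     static_cast<double>(grad_v) * grad_v);
}

// Option 1: saturate, i.e. clamp everything above 255 to 255.
inline std::uint8_t saturate_to_u8(double magnitude) {
    return static_cast<std::uint8_t>(std::min(magnitude, 255.0));
}

// Option 2: linear scaling from [0, 360] to [0, 255], clamped because the
// largest possible magnitude (about 360.6) would otherwise exceed 255.
inline std::uint8_t scale_to_u8(double magnitude) {
    return static_cast<std::uint8_t>(std::min(magnitude * 255.0 / 360.0, 255.0));
}
```

Either result can then be written as a single byte per channel, instead of taking the least significant byte of a uint32_t.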
I'm trying to convert an HLS file to JPEG. First, I used openh264 to convert the HLS file to YUV. I got a two-dimensional array containing the Y, U, and V buffers (*pData[3]). After that, I try to combine the three arrays into one to pass it to CompressYUYV2JPEG.
Here is how I convert:
for(i = 0; i < l; i++) {
inbuf.push_back(yuvData[0][i]);
}
l = bufferInfo.UsrData.sSystemBuffer.iWidth*bufferInfo.UsrData.sSystemBuffer.iHeight/4;
for(i = 0; i < l; i++) {
inbuf.push_back(yuvData[1][i]);
}
l = bufferInfo.UsrData.sSystemBuffer.iWidth*bufferInfo.UsrData.sSystemBuffer.iHeight/4;
for(i = 0; i < l; i++) {
inbuf.push_back(yuvData[2][i]);
}
Unfortunately, it doesn't produce the expected result. What is the proper way to convert a two-dimensional YUV array into a one-dimensional array?
You need YUV422, that is, the packed YUYV layout. The size of inbuf must be divisible by 4, because every four bytes (Y0, U, Y1, V) describe two pixels. You can use
for(i = 0; i < l/2; i++) {
inbuf.push_back(yuvData[0][2*i]);
inbuf.push_back((yuvData[1][2*i] + yuvData[1][2*i + 1])/2);
inbuf.push_back(yuvData[0][2*i + 1]);
inbuf.push_back((yuvData[2][2*i] + yuvData[2][2*i + 1])/2);
}
This snippet uses every Y value, but only the average of each pair of adjacent Cb (U) and Cr (V) values. The number of elements in each yuvData channel must therefore be even; otherwise you have to handle the last element separately.
I just now saw that you use YUV420. Then you can use this snippet
for(i = 0; i < l/2; i++) {
inbuf.push_back(yuvData[0][2*i]);
inbuf.push_back(yuvData[1][i/2]);
inbuf.push_back(yuvData[0][2*i + 1]);
inbuf.push_back(yuvData[2][i/2]);
}
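In this code each Y value is used once and each Cb and Cr value is used twice. Note that indexing the chroma planes with i/2 only accounts for horizontal subsampling; in planar 4:2:0 data the chroma planes are also subsampled vertically, so the chroma sample for the pixel at (row, col) sits at (row/2)*(width/2) + col/2 when the planes are tightly packed.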
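For planar 4:2:0 input, a row- and stride-aware packing might look like the sketch below. The function name i420_to_yuyv and its parameters are hypothetical. It assumes even width and height, and that the decoder reports the row pitch of the luma and chroma planes separately from the width (I believe openh264 exposes this as a stride field next to iWidth and iHeight, but treat that as an assumption to check).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs planar 4:2:0 (I420) into interleaved YUYV (4:2:2).
// Assumptions: width and height are even; y_stride and uv_stride are the row
// pitches in bytes of the luma and chroma planes (they may exceed the width).
std::vector<std::uint8_t> i420_to_yuyv(const std::uint8_t* y_plane,
                                       const std::uint8_t* u_plane,
                                       const std::uint8_t* v_plane,
                                       std::size_t width, std::size_t height,
                                       std::size_t y_stride,
                                       std::size_t uv_stride) {
    assert(width % 2 == 0 && height % 2 == 0);
    std::vector<std::uint8_t> out;
    out.reserve(width * height * 2);
    for (std::size_t row = 0; row < height; ++row) {
        const std::uint8_t* y = y_plane + row * y_stride;
        // Two luma rows share one chroma row in 4:2:0.
        const std::uint8_t* u = u_plane + (row / 2) * uv_stride;
        const std::uint8_t* v = v_plane + (row / 2) * uv_stride;
        for (std::size_t col = 0; col < width; col += 2) {
            out.push_back(y[col]);      // Y0
            out.push_back(u[col / 2]);  // U (Cb), shared by both pixels
            out.push_back(y[col + 1]);  // Y1
            out.push_back(v[col / 2]);  // V (Cr), shared by both pixels
        }
    }
    return out;
}
```

The output holds width*height*2 bytes, which is always divisible by 4 when the width is even.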
I'm writing a simple PGM file reader for a basic CV idea, and I'm having a weird issue. My method seems to work alright for symmetric files (255 x 255, for example), but when I try to read an asymmetric file (300 x 246), I get some weird input. One file reads to a certain point and then dumps ESCAPE characters (ASCII 27) into the remainder of the image (see below), and others just won't read. I think this might be some flawed logic or a memory issue. Any help would be appreciated.
// Process files of binary type (P5)
else if(holdString[1] == '5') {
// Assign fileType value
fileType = 5;
// Read in comments and discard
getline(fileIN, holdString);
// Read in image Width value
fileIN >> width;
// Read in image Height value
fileIN >> height;
// Read in Maximum Grayscale Value
fileIN >> max;
// Determine byte size if Maximum value is over 256 (1 byte)
if(max < 256) {
// Collection variable for bytes
char readChar;
// Assign image dynamic memory
*image = new int*[height];
for(int index = 0; index < height; index++) {
(*image)[index] = new int[width];
}
// Read in 1 byte at a time
for(int row = 0; row < height; row++) {
for(int column = 0; column < width; column++) {
fileIN.get(readChar);
(*image)[row][column] = (int) readChar;
}
}
// Close the file
fileIN.close();
} else {
// Assign image dynamic memory
// Read in 2 bytes at a time
// Close the file
}
}
Tinkered with it a bit, and came up with at least most of a solution. Using the .read() function, I was able to draw the whole file in and then read it piece by piece into the int array. I kept the dynamic memory because I wanted to draw in files of different sizes, but I did pay more attention to how it was read into the array, so thank you for the suggestion, Mark. The edits seem to work well on files up to 1000 pixels wide or tall, which is fine for what I'm using it for. After, it distorts, but I'll still take that over not reading the file.
if(max < 256) {
// Collection variable for bytes
int size = height * width;
unsigned char* data = new unsigned char[size];
// Assign image dynamic memory
*image = new int*[height];
for(int index = 0; index < height; index++) {
(*image)[index] = new int[width];
}
// Read in 1 byte at a time
fileIN.read(reinterpret_cast<char*>(data), size * sizeof(unsigned char));
// Close the file
fileIN.close();
// Set data to the image
for(int row = 0; row < height; row++) {
for(int column = 0; column < width; column++) {
(*image)[row][column] = (int) data[row*width+column];
}
}
// Delete temporary memory
delete[] data;
}
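For comparison, here is a compact sketch of an 8-bit P5 reader that stores pixels in a std::vector. The function name read_pgm_p5 is hypothetical. It assumes the stream was opened with std::ios::binary; a text-mode stream on some platforms translates or stops at certain byte values, which is one plausible cause of the truncated reads described in the question. Storing bytes as unsigned char also avoids negative values for pixels above 127.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads an 8-bit binary PGM (P5) from a stream that was
// opened with std::ios::binary. Header comments are not handled in this sketch.
bool read_pgm_p5(std::istream& in, std::size_t& width, std::size_t& height,
                 std::vector<unsigned char>& pixels) {
    std::string magic;
    unsigned max_value = 0;
    if (!(in >> magic >> width >> height >> max_value)) return false;
    if (magic != "P5" || max_value == 0 || max_value > 255) return false;
    in.get();  // exactly one whitespace byte separates header and raster
    pixels.resize(width * height);
    in.read(reinterpret_cast<char*>(pixels.data()),
            static_cast<std::streamsize>(pixels.size()));
    // gcount() reveals a short read (truncated file or text-mode translation).
    return static_cast<std::size_t>(in.gcount()) == pixels.size();
}

// Usage sketch:
//   std::ifstream file("image.pgm", std::ios::binary);
//   std::size_t w = 0, h = 0;
//   std::vector<unsigned char> px;
//   bool ok = read_pgm_p5(file, w, h, px);  // pixel (r, c) is px[r * w + c]
```

Checking the byte count after read() makes a short read visible immediately instead of leaving stale values in the rest of the image.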
I am trying to make a fast image threshold function. Currently what I do is:
void threshold(const cv::Mat &input, cv::Mat &output, uchar threshold) {
int rows = input.rows;
int cols = input.cols;
// cv::Mat for result
output.create(rows, cols, CV_8U);
if(input.isContinuous()) { // we have to make sure that we are dealing with a continuous memory chunk
const uchar* p;
for (int r = 0; r < rows; ++r) {
p = input.ptr<uchar>(r);
for (int c = 0; c < cols; ++c) {
if(p[c] >= threshold)
//how to access output faster??
output.at<uchar>(r,c) = 255;
else
output.at<uchar>(r,c) = 0;
}
}
}
}
I know that the at() function is quite slow. How can I set the output faster, or in other words how to relate the pointer which I get from the input to the output?
You are thinking of at() as the C++ standard library defines it for some containers: a member that performs a range check and throws if the index is out of bounds. This, however, is OpenCV, not the standard library.
According to the cv::Mat::at documentation:
The template methods return a reference to the specified array element. For the sake of higher performance, the index range checks are only performed in the Debug configuration.
So there's no range check as you may be thinking.
Comparing both cv::Mat::at and cv::Mat::ptr in the source code we can see they are almost identical.
So cv::Mat::ptr<>(row) is as expensive as
return (_Tp*)(data + step.p[0] * y);
While cv::Mat::at<>(row, column) is as expensive as:
return ((_Tp*)(data + step.p[0] * i0))[i1];
You might want to use cv::Mat::ptr directly instead of calling cv::Mat::at for every column; that avoids repeating the data + step.p[0] * i0 computation, and you do the [i1] indexing yourself.
So you would do:
/* output.create and stuff */
const uchar* p; // read-only pointer to the current input row
uchar* o;       // writable pointer to the current output row
for (int r = 0; r < rows; ++r) {
p = input.ptr<uchar>(r);
o = output.ptr<uchar>(r); // <-----
for (int c = 0; c < cols; ++c) {
if(p[c] >= threshold)
o[c] = 255;
else
o[c] = 0;
}
}
As a side note, you don't need to (and shouldn't) check cv::Mat::isContinuous here: the gaps, if any, lie between rows, and you are taking a pointer to each row separately, so you never have to deal with them.
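The same row-pointer pattern, sketched without OpenCV types so that it compiles on its own. The function threshold_rows is hypothetical. Its in_step and out_step parameters stand in for cv::Mat::step, and the two pointers inside the loop correspond to input.ptr<uchar>(r) and output.ptr<uchar>(r).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Thresholds a strided 8-bit image row by row. in_step and out_step are row
// pitches in bytes (hypothetical stand-ins for cv::Mat::step).
void threshold_rows(const std::uint8_t* in, std::size_t in_step,
                    std::uint8_t* out, std::size_t out_step,
                    std::size_t rows, std::size_t cols, std::uint8_t thresh) {
    assert(in_step >= cols && out_step >= cols);
    for (std::size_t r = 0; r < rows; ++r) {
        const std::uint8_t* p = in + r * in_step;  // like input.ptr<uchar>(r)
        std::uint8_t* o = out + r * out_step;      // like output.ptr<uchar>(r)
        for (std::size_t c = 0; c < cols; ++c) {
            o[c] = (p[c] >= thresh) ? 255 : 0;
        }
    }
}
```

Because the row start is computed once per row, padding between rows never enters the inner loop, which is why no continuity check is required.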
I need to read an image with OpenCV, get its size, and send it to a server, which processes the image and sends the extracted features back to me.
I have been thinking of using a vector<byte>, but I don't understand how to copy the data from a cv::Mat into it. I want it to be fast, so I am trying to access the data with a pointer, but I get a runtime exception. I have something like this:
Mat image = imread((path + "name.jpg"), 0);
vector<byte> v_char;
for(int i = 0; i < image.rows; i++)
{
for(int j = 0; j < image.cols; j++)
{
v_char.push_back(*(uchar*)(image.data+ i + j));
}
}
Which is the best approach for this task?
Direct access is a good idea as it is the fastest for OpenCV, but you are missing the step and that is probably the reason why your program breaks. The next line is wrong:
v_char.push_back(*(uchar*)(image.data+ i + j));
You must not add i directly; you have to add i*image.step, because each row starts image.step bytes after the previous one. It will look like this:
Mat image = imread((path + "name.jpg"), 0);
vector<byte> v_char;
for(int i = 0; i < image.rows; i++)
{
for(int j = 0; j < image.cols; j++)
{
v_char.push_back(*(uchar*)(image.data+ i*image.step + j));
}
}
You have received great answers so far, but this is not your main problem. What you probably want to do before sending an image to a server is to compress it.
So, take a look at cv::imencode() to compress it, and cv::imdecode() to turn it back into an OpenCV matrix on the server. Just push the imencode output to a socket and you're done.
Improving on Jav_Rock's answer here's how I would do it.
Mat image = ...;
vector<byte> v_char(image.rows * image.cols);
for (int i = 0; i < image.rows; i++)
memcpy(&v_char[i * image.cols], image.data + i * image.step, image.cols);
EDIT: Initialization by constructor will allocate enough space to avoid extra reallocation, but it will also set all items in the vector to default value (0). The following code avoids this extra initialization.
Mat image = ...;
vector<byte> v_char;
v_char.reserve(image.rows * image.cols);
for (int i = 0; i < image.rows; i++)
{
const uchar* segment_start = image.data + i * image.step; // pointer to the first byte of row i
v_char.insert(v_char.end(), segment_start, segment_start + image.cols);
}
I don't fully understand why you need a vector, but if it's really necessary I recommend a simple memcpy. Note that a single copy is only valid when image.isContinuous() is true, i.e. when there is no padding between rows:
vector<byte> v_char(image.rows * image.cols); // Allocate the vector with the same size as the (single-channel, 8-bit) matrix
memcpy(v_char.data(), image.data, v_char.size() * sizeof(byte));
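The difference between the one-shot copy and the row-by-row copy can be sketched on a plain buffer. The function pack_rows is hypothetical, and its step parameter is a stand-in for cv::Mat::step. When step equals cols there is no padding and one bulk copy suffices; otherwise each row has to be copied on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies a rows x cols 8-bit image with row pitch `step` (bytes) into a
// tightly packed vector. `step` is a stand-in for cv::Mat::step.
std::vector<std::uint8_t> pack_rows(const std::uint8_t* data, std::size_t rows,
                                    std::size_t cols, std::size_t step) {
    std::vector<std::uint8_t> packed;
    packed.reserve(rows * cols);
    if (step == cols) {
        // No padding between rows: one bulk copy is enough.
        packed.assign(data, data + rows * cols);
    } else {
        // Padded rows: copy only the first `cols` bytes of each row.
        for (std::size_t r = 0; r < rows; ++r) {
            const std::uint8_t* row = data + r * step;
            packed.insert(packed.end(), row, row + cols);
        }
    }
    return packed;
}
```

The resulting vector always holds exactly rows*cols bytes, which is the size the receiving side would expect for a single-channel 8-bit image.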
I have to get information about the scalar value of a lot of pixels on a gray-scale image using OpenCV. It will be traversing hundreds of thousands of pixels so I need the fastest possible method. Every other source I've found online has been very cryptic and hard to understand. Is there a simple line of code that should just hand a simple integer value representing the scalar value of the first channel (brightness) of the image?
for (int row=0;row<image.rows;row++) {
unsigned char *data = image.ptr(row);
for (int col=0;col<image.cols;col++) {
// then use *data for the pixel value, assuming you know the order, RGB etc
// Note 'rgb' is actually stored B,G,R
blue= *data++;
green = *data++;
red = *data++;
}
}
You need to fetch the data pointer again for each new row because OpenCV may pad the end of each row (for example to a 32-bit boundary), so consecutive rows are not guaranteed to be contiguous in memory.
With regard to Martin's post, you can actually check whether the memory is allocated contiguously using the isContinuous() method of OpenCV's Mat object. The following is a common idiom for ensuring the outer loop runs only once when possible:
#include <opencv2/opencv.hpp> // imread is not declared in core.hpp
using namespace cv;
int main(void)
{
Mat img = imread("test.jpg");
int rows = img.rows;
int cols = img.cols;
if (img.isContinuous())
{
cols = rows * cols; // Loop over all pixels as 1D array.
rows = 1;
}
for (int i = 0; i < rows; i++)
{
Vec3b *ptr = img.ptr<Vec3b>(i);
for (int j = 0; j < cols; j++)
{
Vec3b pixel = ptr[j];
}
}
return 0;
}