Performance reading binary files - c++

I have a program that reads from a really big binary file (48 MB) and then passes the data to a matrix of custom structs named pixel:
struct pixel {
    int r;
    int g;
    int b;
};
Opening the file:
ifstream myFile(inputPath, ios::binary);
pixel **matrixPixel;
The read of the file is done this way:
int position = 0;
for (int i = 0; i < HEIGHT; ++i) {
    for (int j = 0; j < WIDTH; ++j) {
        if (!myFile.eof()) {
            myFile.seekg(position, ios::beg);
            myFile.read((char *) &matrixPixel[i][j].r, 1); // red byte
            myFile.seekg(position + HEIGHT * WIDTH, ios::beg);
            myFile.read((char *) &matrixPixel[i][j].g, 1); // green byte
            myFile.seekg(position + HEIGHT * WIDTH * 2, ios::beg);
            myFile.read((char *) &matrixPixel[i][j].b, 1); // blue byte
            ++position;
        }
    }
}
myFile.close();
The problem is that for a big file like this one, the read takes a long time (~7 min), and it is supposed to be optimized. How can I read the file in less time?

So, the structure of the data you're storing in memory looks like this:
rgbrgbrgbrgbrgbrgbrgbrgbrgbrgb..............rgb
But the structure of the file you're reading looks like this (assuming your code's logic is correct):
rrrrrrrrrrrrrrrrrrrrrrrrrrr....
ggggggggggggggggggggggggggg....
bbbbbbbbbbbbbbbbbbbbbbbbbbb....
And in your code, you're translating between the two. Fundamentally, that's going to be slow. And what's more, you've chosen to read the file by making manual seeks to arbitrary points in the file. That's going to slow things down even more.
The first thing you can do is streamline the Hard Disk reads:
for (int channel = 0; channel < 3; channel++) {
    for (int i = 0; i < HEIGHT; ++i) {
        for (int j = 0; j < WIDTH; ++j) {
            if (!myFile.eof()) {
                switch (channel) {
                    case 0: myFile.read((char *) &matrixPixel[i][j].r, 1); break;
                    case 1: myFile.read((char *) &matrixPixel[i][j].g, 1); break;
                    case 2: myFile.read((char *) &matrixPixel[i][j].b, 1); break;
                }
            }
        }
    }
}
That requires the fewest changes to your code, and will speed up your code, but the code will probably still be slow.
A better approach, which increases CPU use but dramatically reduces Hard Disk use (which, in the vast majority of applications, will result in a speed-up), would be to store the data like so:
std::vector<unsigned char> reds(WIDTH * HEIGHT);
std::vector<unsigned char> greens(WIDTH * HEIGHT);
std::vector<unsigned char> blues(WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(reds.data()), WIDTH * HEIGHT); // The stream can be checked for errors resulting from EOF or other issues.
myFile.read(reinterpret_cast<char*>(greens.data()), WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(blues.data()), WIDTH * HEIGHT);
std::vector<pixel> pixels(WIDTH * HEIGHT);
for (size_t index = 0; index < WIDTH * HEIGHT; index++) {
    pixels[index].r = reds[index];
    pixels[index].g = greens[index];
    pixels[index].b = blues[index];
}
The final, best approach, is to change how the binary file is formatted, because the way it appears to be formatted is insane (from a performance perspective). If the file is reformatted to the rgbrgbrgbrgbrgb style (which is far more standard in the industry), your code simply becomes this:
struct pixel {
    unsigned char red, green, blue;
}; // You'll never read values above 255 when doing byte-length color values.
std::vector<pixel> pixels(WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(pixels.data()), WIDTH * HEIGHT * 3);
This is extremely short, and is probably going to outperform all the other methods. But of course, that may not be an option for you.
I haven't tested any of these methods (and there may be a typo or two) but all of these methods should be faster than what you're currently doing.

A faster method would be to read the bitmap into a buffer:
uint8_t buffer[HEIGHT][WIDTH];
const unsigned int bitmap_size_in_bytes = sizeof(buffer);
myFile.read(reinterpret_cast<char*>(buffer), bitmap_size_in_bytes);
An even faster method is to read more than one bitmap into memory.
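For instance, a minimal sketch, assuming several equally-sized bitmaps stored back to back in the file (N_FRAMES is an illustrative name, not from the original code):
// Hypothetical sketch: read N_FRAMES bitmaps with a single read() call.
const int N_FRAMES = 4;
std::vector<uint8_t> frames(static_cast<size_t>(N_FRAMES) * HEIGHT * WIDTH);
myFile.read(reinterpret_cast<char*>(frames.data()), frames.size());
// frames[k * HEIGHT * WIDTH] is the first byte of the k-th bitmap.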

Related

Calculating the vertical gradient of 2D image causes strange output

I want to apply a simple derivative/gradient filter, [-1, 0, 1], to an image from a .ppm file.
The raw binary data from the .ppm file is read into a one-dimensional array:
uint8_t* raw_image_data;
size_t n_rows, n_cols, depth;
// Open the file as an input binary file
std::ifstream file;
file.open("test_image.ppm", std::ios::in | std::ios::binary);
if (!file.is_open()) { /* error */ }
std::string temp_line;
// Check that it's a valid P6 file
if (!(std::getline(file, temp_line) && temp_line == "P6")) {}
// Then skip all the comments (lines that begin with a #)
while (std::getline(file, temp_line) && temp_line.at(0) == '#');
// Try to read in the info about the number of rows and columns
try {
    n_rows = std::stoi(temp_line.substr(0, temp_line.find(' ')));
    n_cols = std::stoi(temp_line.substr(temp_line.find(' ')+1, temp_line.size()));
    std::getline(file, temp_line);
    depth = std::stoi(temp_line);
} catch (const std::invalid_argument & e) { /* stoi has failed */ }
// Allocate memory and read in all image data from the ppm
raw_image_data = new uint8_t[n_rows*n_cols*3];
file.read((char*)raw_image_data, n_rows*n_cols*3);
file.close();
I then read a grayscale image from the data into a two-dimensional array, called image_grayscale:
uint8_t** image_grayscale;
image_grayscale = new uint8_t*[n_rows];
for (size_t i = 0; i < n_rows; ++i) {
    image_grayscale[i] = new uint8_t[n_cols];
}
// Convert linear array of raw image data to 2d grayscale image
size_t counter = 0;
for (size_t r = 0; r < n_rows; ++r) {
    for (size_t c = 0; c < n_cols; ++c) {
        image_grayscale[r][c] = 0.21*raw_image_data[counter]
                              + 0.72*raw_image_data[counter+1]
                              + 0.07*raw_image_data[counter+2];
        counter += 3;
    }
}
I want to write the resulting filtered image to another two-dimensional array, gradient_magnitude:
uint32_t** gradient_magnitude;
// Allocate memory
gradient_magnitude = new uint32_t*[n_rows];
for (size_t i = 0; i < n_rows; ++i) {
    gradient_magnitude[i] = new uint32_t[n_cols];
}
// Filtering operation
int32_t grad_h, grad_v;
for (int r = 1; r < n_rows-1; ++r) {
    for (int c = 1; c < n_cols-1; ++c) {
        grad_h = image_grayscale[r][c+1] - image_grayscale[r][c-1];
        grad_v = image_grayscale[r+1][c] - image_grayscale[r-1][c];
        gradient_magnitude[r][c] = std::sqrt(pow(grad_h, 2) + pow(grad_v, 2));
    }
}
Finally, I write the filtered image to a .ppm output.
std::ofstream out;
out.open("output.ppm", std::ios::out | std::ios::binary);
// ppm header
out << "P6\n" << n_rows << " " << n_cols << "\n" << "255\n";
// Write data to file
for (int r = 0; r < n_rows; ++r) {
    for (int c = 0; c < n_cols; ++c) {
        for (int i = 0; i < 3; ++i) {
            out.write((char*) &gradient_magnitude[r][c], 1);
        }
    }
}
out.close();
The output image, however, is a mess.
When I simply set grad_v = 0; in the loop (i.e. solely calculate the horizontal gradient), the output is seemingly correct.
When I instead set grad_h = 0; (i.e. solely calculate the vertical gradient), the output is strange.
It seems like part of the image has been circularly shifted, but I cannot understand why. Moreover, I have tried with many images and the same issue occurs.
Can anyone see any issues? Thanks so much!
OK, the first clue is that the image looks circularly shifted. This hints that the strides are wrong. The core of your problem is simple:
n_rows = std::stoi(temp_line.substr(0, temp_line.find(' ')));
n_cols = std::stoi(temp_line.substr(temp_line.find(' ')+1,temp_line.size()));
but in the documentation you can read:
Each PPM image consists of the following:
A "magic number" for identifying the file type. A ppm image's magic number is the two
characters "P6".
Whitespace (blanks, TABs, CRs, LFs).
A width, formatted as ASCII characters in decimal.
Whitespace.
A height, again in ASCII decimal.
[...]
Width is columns, height is rows. So that's the classical error that you get when implementing image processing stuff: swapping rows and columns.
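In the parsing code that means the width belongs in n_cols and the height in n_rows; a minimal sketch of the fix, reusing the question's own substr/stoi style:
n_cols = std::stoi(temp_line.substr(0, temp_line.find(' ')));  // width comes first
n_rows = std::stoi(temp_line.substr(temp_line.find(' ') + 1)); // then height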
From a didactic point of view, why did you make this mistake? My guess: poor debugging tools. After making a working example from your question (effort I would have been spared had you provided an MCVE), I ran to the end of image loading and used Image Watch to see the content of your image with #mem(raw_image_data, UINT8, 3, n_cols, n_rows, n_cols*3). Result:
Ok, let's try to swap them: #mem(raw_image_data, UINT8, 3, n_rows, n_cols, n_rows*3). Result:
Much better. Unfortunately I don't know how to specify RGB instead of BGR in the Image Watch #mem pseudo-command, hence the wrong colors.
Then we come back to your code: please compile with all warnings on. Then I'd use more of the std::stream features for parsing your input and fewer std::stoi() and find() calls. Avoid manual memory allocation by using std::vector, and make a (possibly template) class for images. Even if you stick with your pointer-to-pointer layout, don't do a separate new for each row: do a single new for row 0, and have the other pointers point into it:
uint8_t** image_grayscale = new uint8_t*[n_rows];
image_grayscale[0] = new uint8_t[n_rows*n_cols];
for (size_t i = 1; i < n_rows; ++i) {
    image_grayscale[i] = image_grayscale[i - 1] + n_cols;
}
Same effect, but easier to deallocate and to manage as a single piece of memory. For example, saving as a PGM becomes:
{
    std::ofstream out("output.pgm", std::ios::binary);
    out << "P5\n" << n_cols << " " << n_rows << "\n" << "255\n"; // width first, then height
    out.write(reinterpret_cast<char*>(image_grayscale[0]), n_rows*n_cols);
}
Fill your borders! Using the single-allocation style shown above, you can do it as:
uint32_t** gradient_magnitude = new uint32_t*[n_rows];
gradient_magnitude[0] = new uint32_t[n_rows*n_cols];
for (size_t i = 1; i < n_rows; ++i) {
    gradient_magnitude[i] = gradient_magnitude[i - 1] + n_cols;
}
std::fill_n(gradient_magnitude[0], n_rows*n_cols, 0);
Finally, the gradient magnitude is an integer value between 0 and 360 (you used a uint32_t), but you save only its least significant byte! Of course that's wrong. You need to map from [0,360] to [0,255]. How? You can saturate (set anything greater than 255 to 255) or apply a linear scaling (*255/360). You can of course do other things too, but that's not important here.
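A minimal sketch of the two mappings, applied where each output byte is written (names taken from the question's code):
uint32_t m = gradient_magnitude[r][c];
uint8_t saturated = (uint8_t)(m > 255 ? 255 : m); // clamp everything above 255
uint8_t scaled = (uint8_t)(m * 255 / 360);        // linear map from [0,360] to [0,255]
out.write((char*) &saturated, 1);                 // or &scaled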
Here you can see the result on a zoomed version of the three cases: saturate, scale, only LSB (wrong):
With the wrong version you see dark pixels where the value should be higher than 255.

C++ External bubble sort

For my programming project, I am supposed to write a program that sorts integers on a disk (i.e. an offline sort). I first generate some random integers and write all of them to a file; then I repeatedly read two of those integers, swap them if needed, and write them back to the disk, until the numbers are sorted. I can generate the random numbers just fine, and I have no problem opening the file, but the program crashes when an attempt is made to write to the file. Here is the fragment of the code that implements the sort, containing the sorting algorithm and the segment that crashes my program:
void ReadAndWrite(int & rand_ints, int & pos_ints)
{
    int rand_ints2 = 0;
    GetNumber(pos_ints);
    srand(time(0));
    fstream finout;
    finout.open(SORTFILE, ios::binary | ios::in | ios::out);
    if (finout.is_open())
    {
        for (int i = 0; i < pos_ints; ++i)
        {
            rand_ints = rand() % 5;
            finout.write(reinterpret_cast <char *>(rand_ints), sizeof(int) * 1);
        }
        for (int i = 0; i < pos_ints; ++i)
        {
            finout.seekg(ios::beg);
            finout.read(reinterpret_cast <char *>(rand_ints), sizeof(int) * 2);
            bubbleSort(&rand_ints, 2, sizeof(int), compare_ints);
            finout.seekp(ios::app);
            finout.write(reinterpret_cast <char *>(rand_ints), sizeof(int) * 2);
        }
        finout.close();
    }
    else
    {
        cout << "File not opened!" << endl;
    }
}
void GetNumber(int & pos_ints)
{
    cout << "Enter a positive number: ";
    cin >> pos_ints;
}
void bubbleSort(void * base, size_t num, size_t width, int (*compar)(const void *, const void *))
{
    bool done = false; // Allows us to enter loop first time
    int hi = num - 1;  // largest index is 1 less than the size of the array
    while (!done)
    {
        done = true; // assume the list is sorted
        for (int i = 0; i < hi; i++)
        {
            // pass thru the array up to 'hi'
            // if (arr[i+1] < arr[i])
            if (compar((void *)(((char *)base) + width*(i + 1)), (void *)(((char *)base) + width*(i))) < 0)
            {
                // if any pair are out of order
                done = false; // the list is not sorted
                // int temp = arr[i]; // swap them
                void * tempp = (void *) new char[width];
                memcpy_s(tempp, width, (((char *)base) + width*(i)), width);
                // arr[i] = arr[i+1];
                memcpy_s((((char *)base) + width*(i)), width, ((char *)base) + width*(i + 1), width);
                // arr[i+1] = temp;
                memcpy_s(((char *)base) + width*(i + 1), width, tempp, width);
                delete[] tempp;
            }
        }
        hi--; // at the end of a pass, the largest item in that pass is in its proper place; no need to go this far next time
    }
}
int compare_ints(const void * arg1, const void * arg2)
{
    int return_value = 0;
    if (*(int *)arg1 < *(int *)arg2)
        return_value = -1;
    else if (*(int *)arg1 > *(int *)arg2)
        return_value = 1;
    return return_value;
}
It crashes on the line finout.write(reinterpret_cast <char *>(rand_ints), sizeof(int) * 1); within the first for loop (line 52), with the following error: Exception thrown at 0x55916D16 (msvcp140d.dll) in ExternalSort.exe: 0xC0000005: Access violation reading location 0x00000001.
Is there a way to fix this error and make the sorting program work? I have tried everything I can think of, and I cannot see which line of my code is causing the crash.
Since rand_ints is of type int, you probably mean reinterpret_cast<char *>(&rand_ints) (note the &). Otherwise, you'll be making a pointer out of an integral value.
OTOH, trying to read two adjacent integers into the address of a single integer variable is very likely to cause problems.
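A minimal corrected sketch of the write and read calls, assuming the intent really is to process two adjacent integers at a time (pair_buf is an illustrative name; note the seek calls also need an explicit offset, since passing ios::beg or ios::app alone treats the flag as the offset):
finout.write(reinterpret_cast<char *>(&rand_ints), sizeof(int)); // note the &
int pair_buf[2];
finout.seekg(0, ios::beg);
finout.read(reinterpret_cast<char *>(pair_buf), sizeof(int) * 2);
bubbleSort(pair_buf, 2, sizeof(int), compare_ints);
finout.seekp(0, ios::beg);
finout.write(reinterpret_cast<char *>(pair_buf), sizeof(int) * 2);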
Looking more deeply into your sorting algorithm, it seems to me that you attempted to generalize it for data elements of any size, not just ints. However, it is still clearly array-oriented; if you want it to deal with files, you probably have to pass the function either a filename or an fstream reference.
Also, unless you're required to use Bubble Sort, I'd strongly advise you against it, especially for on-disk sorting, unless you make sure your data set is very, very small (say, no more than a hundred numbers). For in-place sorting, I'd advise you to use Quick Sort.
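If the data set is small enough to fit in memory anyway, here is a sketch of that simpler route (reusing finout and pos_ints from the question; assumes <vector> and <algorithm> are included):
std::vector<int> v(pos_ints);
finout.seekg(0, ios::beg);
finout.read(reinterpret_cast<char *>(v.data()), v.size() * sizeof(int));
std::sort(v.begin(), v.end()); // typically introsort, i.e. quicksort-based
finout.seekp(0, ios::beg);
finout.write(reinterpret_cast<char *>(v.data()), v.size() * sizeof(int));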

PGM File Reader Doesn't Read Asymmetric Files

I'm writing a simple PGM file reader for a basic CV idea, and I'm having a weird issue. My method seems to work alright for symmetric files (255 x 255, for example), but when I try to read an asymmetric file (300 x 246), I get some weird input. One file reads to a certain point and then dumps ESCAPE characters (ASCII 27) into the remainder of the image (see below), and others just won't read. I think this might be some flawed logic or a memory issue. Any help would be appreciated.
// Process files of binary type (P5)
else if(holdString[1] == '5') {
// Assign fileType value
fileType = 5;
// Read in comments and discard
getline(fileIN, holdString);
// Read in image Width value
fileIN >> width;
// Read in image Height value
fileIN >> height;
// Read in Maximum Grayscale Value
fileIN >> max;
// Determine byte size if Maximum value is over 256 (1 byte)
if(max < 256) {
// Collection variable for bytes
char readChar;
// Assign image dynamic memory
*image = new int*[height];
for(int index = 0; index < height; index++) {
(*image)[index] = new int[width];
}
// Read in 1 byte at a time
for(int row = 0; row < height; row++) {
for(int column = 0; column < width; column++) {
fileIN.get(readChar);
(*image)[row][column] = (int) readChar;
}
}
// Close the file
fileIN.close();
} else {
// Assign image dynamic memory
// Read in 2 bytes at a time
// Close the file
}
}
Tinkered with it a bit and came up with at least most of a solution. Using the .read() function, I was able to pull the whole file in and then copy it piece by piece into the int array. I kept the dynamic memory because I want to read in files of different sizes, but I paid more attention to how the data is read into the array, so thank you for the suggestion, Mark. The edits seem to work well on files up to 1000 pixels wide or tall, which is fine for what I'm using it for. Beyond that it distorts, but I'll still take that over not reading the file at all.
if (max < 256) {
    // Collection variable for bytes
    int size = height * width;
    unsigned char* data = new unsigned char[size];
    // Assign image dynamic memory
    *image = new int*[height];
    for (int index = 0; index < height; index++) {
        (*image)[index] = new int[width];
    }
    // Read the whole block of bytes in at once
    fileIN.read(reinterpret_cast<char*>(data), size * sizeof(unsigned char));
    // Close the file
    fileIN.close();
    // Set data to the image
    for (int row = 0; row < height; row++) {
        for (int column = 0; column < width; column++) {
            (*image)[row][column] = (int) data[row*width+column];
        }
    }
    // Delete temporary memory
    delete[] data;
}

Efficient way to loop through pixels of 16-bit Mat in OpenCV

I'm trying to make very simple (LUT-like) operations on a 16-bit gray-scale OpenCV Mat, which is efficient and doesn't slow down the debugger.
While there is a very detailed page in the documentation addressing exactly this issue, it fails to point out that most of those methods are only available on 8-bit images (including the perfect, optimized LUT function).
I tried the following methods:
uchar* p = mat_depth.data;
for (unsigned int i = 0; i < depth_width * depth_height * sizeof(unsigned short); ++i)
{
    *p = ...;
    *p++;
}
Really fast; unfortunately it only supports uchar (just like LUT).
int i = 0;
for (int row = 0; row < depth_height; row++)
{
    for (int col = 0; col < depth_width; col++)
    {
        i = mat_depth.at<short>(row, col);
        i = ..
        mat_depth.at<short>(row, col) = i;
    }
}
Adapted from this answer: https://stackoverflow.com/a/27225293/518169. Didn't work for me, and it was very slow.
cv::MatIterator_<ushort> it, end;
for (it = mat_depth.begin<ushort>(), end = mat_depth.end<ushort>(); it != end; ++it)
{
    *it = ...;
}
Works well, however it uses a lot of CPU and makes the debugger super slow.
This answer https://stackoverflow.com/a/27099697/518169 points to the source code of the built-in LUT function; however, it only mentions advanced optimization techniques, like IPP and OpenCL.
What I'm looking for is a very simple loop like the first code, but for ushorts.
What method do you recommend for solving this problem? I'm not looking for extreme optimization, just something on par with the performance of the single-for-loop on .data.
I implemented Michael's and Kornel's suggestions and benchmarked them in both release and debug modes.
code:
cv::Mat LUT_16(cv::Mat &mat, ushort table[])
{
    int limit = mat.rows * mat.cols;
    ushort* p = mat.ptr<ushort>(0);
    for (int i = 0; i < limit; ++i)
    {
        p[i] = table[p[i]];
    }
    return mat;
}
cv::Mat LUT_16_reinterpret_cast(cv::Mat &mat, ushort table[])
{
    int limit = mat.rows * mat.cols;
    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        *ptr = table[*ptr];
    }
    return mat;
}
cv::Mat LUT_16_if(cv::Mat &mat)
{
    int limit = mat.rows * mat.cols;
    ushort* ptr = reinterpret_cast<ushort*>(mat.data);
    for (int i = 0; i < limit; i++, ptr++)
    {
        if (*ptr == 0) {
            *ptr = 65535;
        }
        else {
            *ptr *= 100;
        }
    }
    return mat;
}
ushort* tablegen_zero()
{
    static ushort table[65536];
    for (int i = 0; i < 65536; ++i)
    {
        if (i == 0)
        {
            table[i] = 65535;
        }
        else
        {
            table[i] = i;
        }
    }
    return table;
}
The results are the following (release/debug):
LUT_16: 0.202 ms / 0.773 ms
LUT_16_reinterpret_cast: 0.184 ms / 0.801 ms
LUT_16_if: 0.249 ms / 0.860 ms
So the conclusion is that the reinterpret_cast version is faster by 9% in release mode, while the ptr<ushort>(0) version is faster by 4% in debug mode.
It's also interesting to see that applying the if logic directly instead of going through a LUT only makes it slower by 0.065 ms.
Specs: streaming 640x480x16-bit grayscale image, Visual Studio 2013, i7 4750HQ.
The OpenCV implementation is based on polymorphism and runtime dispatching over templates. In OpenCV, the use of templates is limited to a fixed set of primitive data types. That is, array elements should have one of the following types:
8-bit unsigned integer (uchar)
8-bit signed integer (schar)
16-bit unsigned integer (ushort)
16-bit signed integer (short)
32-bit signed integer (int)
32-bit floating-point number (float)
64-bit floating-point number (double)
a tuple of several elements where all elements have the same type (one of the above).
If your cv::Mat is continuous, you can use pointer arithmetic to walk through the whole data pointer; you just need to use the pointer type appropriate to your cv::Mat's element type.
Furthermore, keep in mind that cv::Mats are not always continuous (a Mat can be a ROI, padded, or created from an external pixel pointer), and iterating over such a Mat with raw pointers will read the wrong data or crash.
An example loop:
cv::Mat cvmat16sc1 = cv::Mat::eye(10, 10, CV_16SC1);
if (cvmat16sc1.data)
{
    if (!cvmat16sc1.isContinuous())
    {
        cvmat16sc1 = cvmat16sc1.clone();
    }
    short* ptr = reinterpret_cast<short*>(cvmat16sc1.data);
    for (int i = 0; i < cvmat16sc1.cols * cvmat16sc1.rows; i++, ptr++)
    {
        if (*ptr == 1)
            std::cout << i << ": " << *ptr << std::endl;
    }
}
The best solution to your problem is already written in the tutorial that you mentioned, in the section named "The efficient way". All you need is to replace every instance of uchar with ushort; no other changes are needed (see the sketch below).
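For reference, a sketch of what that looks like, adapted from the tutorial's ScanImageAndReduceC (assuming a single-channel CV_16U Mat and a 65536-entry table; the function name is illustrative):
cv::Mat& ScanImageAndReduce16(cv::Mat& I, const ushort* const table)
{
    CV_Assert(I.depth() == CV_16U && I.channels() == 1);
    int nRows = I.rows;
    int nCols = I.cols;
    if (I.isContinuous())
    {
        nCols *= nRows;
        nRows = 1; // treat the whole image as a single long row
    }
    for (int i = 0; i < nRows; ++i)
    {
        ushort* p = I.ptr<ushort>(i);
        for (int j = 0; j < nCols; ++j)
            p[j] = table[p[j]];
    }
    return I;
}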

Store a cv::Mat in a byte array for data transfer to a server

I need to read an image with OpenCV, get its size, and send it to a server, which processes the image and sends me back the extracted features.
I have been thinking of using a vector<byte>, but I don't understand how to copy the data out of a cv::Mat. I want it to be fast, so I am trying to access the data with a pointer, but I get a runtime exception. I have something like this.
Mat image = imread((path + "name.jpg"), 0);
vector<byte> v_char;
for (int i = 0; i < image.rows; i++)
{
    for (int j = 0; j < image.cols; j++)
    {
        v_char.push_back(*(uchar*)(image.data + i + j));
    }
}
Which is the best approach for this task?
Direct access is a good idea as it is the fastest for OpenCV, but you are missing the step, and that is probably the reason why your program breaks. The following line is wrong:
v_char.push_back(*(uchar*)(image.data + i + j));
You can't address a row with i alone; each row starts i * image.step bytes into the data. It should be this way:
Mat image = imread((path + "name.jpg"), 0);
vector<byte> v_char;
for (int i = 0; i < image.rows; i++)
{
    for (int j = 0; j < image.cols; j++)
    {
        v_char.push_back(*(uchar*)(image.data + i*image.step + j));
    }
}
You have received great answers so far, but this is not your main problem. What you probably want to do before sending an image to a server is to compress it.
So, take a look at cv::imencode() to see how to compress an image, and cv::imdecode() to transform it back into an OpenCV matrix on the server. Just push the imencode output to a socket and you're done.
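A minimal sketch of that round trip, assuming a grayscale image as in the rest of the thread:
std::vector<uchar> buf;
cv::imencode(".jpg", image, buf); // compress into an in-memory JPEG
// ... send buf.data() / buf.size() through the socket ...
cv::Mat decoded = cv::imdecode(buf, 0); // 0 = force grayscale on load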
Improving on Jav_Rock's answer, here's how I would do it.
Mat image = ...;
vector<byte> v_char(image.rows * image.cols);
for (int i = 0; i < image.rows; i++)
    memcpy(&v_char[i * image.cols], image.data + i * image.step, image.cols);
EDIT: Initialization by constructor will allocate enough space to avoid extra reallocation, but it will also set all items in the vector to default value (0). The following code avoids this extra initialization.
Mat image = ...;
vector<byte> v_char;
v_char.reserve(image.rows * image.cols);
for (int i = 0; i < image.rows; i++)
{
    const uchar* segment_start = image.data + i * image.step;
    v_char.insert(v_char.end(), segment_start, segment_start + image.cols);
}
I don't completely understand why you need to use a vector, but if it's really necessary I recommend a simple memcpy:
vector<byte> v_char(image.rows * image.cols); // Allocating the vector with the same size as the matrix
memcpy(v_char.data(), image.data, v_char.size() * sizeof(byte)); // assumes a continuous, single-channel Mat