I am trying to read a large file (~5GB) using ifstream in C++.
Since I'm on a 64bit OS, I thought this shouldn't be a problem.
Still, I get a segfault. Everything runs fine with smaller files,
so I'm pretty sure that is where the problem is.
I'm using g++ (4.4.5-8) and libstdc++6 (4.4.5-8).
Thanks.
The code looks like this:
void load (const std::string &path, int _dim, int skip = 0, int gap = 0) {
std::ifstream is(path.c_str(), std::ios::binary);
BOOST_VERIFY(is);
is.seekg(0, std::ios::end);
size_t size = is.tellg();
size -= skip;
long int line = sizeof(float) * _dim + gap;
BOOST_VERIFY(size % line == 0);
long int _N = size / line;
reset(_dim, _N);
is.seekg(skip, std::ios::beg);
char *off = dims;
for (long int i = 0; i < N; ++i) {
is.read(off, sizeof(T) * dim);
is.seekg(gap, std::ios::cur);
off += stride;
}
BOOST_VERIFY(is);
}
The segfault is in the is.read line for i=187664.
T is float and I'm reading dim=1000 floats at a time.
When the segfault occurs, i * stride is far smaller than size, so I'm not running past the end of the file.
dims is allocated here
void reset (int _dim, int _N)
{
BOOST_ASSERT((ALIGN % sizeof(T)) == 0);
dim = _dim;
N = _N;
stride = dim * sizeof(T) + ALIGN - 1;
stride = stride / ALIGN * ALIGN;
if (dims != NULL) delete[] dims;
dims = (char *)memalign(ALIGN, N * stride);
std::fill(dims, dims + N * stride, 0);
}
I don't know whether this is the bug, but this code looks very C-like and leaves plenty of opportunity to leak. Anyway, try changing
void reset (int _dim, int _N)
to
void reset (size_t _dim, size_t _N)
// I would avoid leading underscores; they are usually used for identifiers of the implementation and the standard library.
When you are dealing with the size or index of something in memory, ALWAYS use size_t: it is guaranteed to be able to hold the maximum size of any object, including arrays. If N and stride are 32-bit ints, an expression such as N * stride overflows once the buffer exceeds 2 GB.
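To illustrate the point about size_t, here is a minimal sketch of the stride and buffer-size arithmetic done entirely in size_t, with an explicit overflow check. The helper names padded_stride and total_bytes are hypothetical (they are not part of the original code), and the figures in the note below assume a 64-bit size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Round row_bytes up to the next multiple of align (align must be > 0).
// Hypothetical helper mirroring the stride computation in reset().
std::size_t padded_stride(std::size_t row_bytes, std::size_t align) {
    return (row_bytes + align - 1) / align * align;
}

// Total buffer size for `rows` rows of `stride` bytes.
// Throws instead of silently wrapping if the product does not fit in size_t.
std::size_t total_bytes(std::size_t rows, std::size_t stride) {
    if (stride != 0 && rows > std::numeric_limits<std::size_t>::max() / stride)
        throw std::overflow_error("buffer size does not fit in size_t");
    return rows * stride;
}
```

For example, 1,342,177 rows of 4,000 bytes give 5,368,708,000 bytes, which fits in a 64-bit size_t but not in a 32-bit int.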
I think you have to use _ftelli64 etc. to get the right size of your file, and use long long (or __int64) variables to hold it. But that's the C library. I haven't found out how to use ifstream with such a big file (actually > 2 GB). Did you find a way?
PS: In your case size_t is fine, but I'm not sure it's OK in 32-bit software. I'm sure it's OK in 64-bit.
int main()
{
string name="tstFile.bin";
FILE *inFile;
fopen_s(&inFile,name.c_str(),"rb"); // MSVC-specific
if (!inFile)
{
cout<<"\r\n***error -> File not found\r\n";
return 1;
}
_fseeki64 (inFile,0L,SEEK_END); // 64-bit seek (MSVC-specific)
long long fileLength = _ftelli64(inFile); // 64-bit tell (MSVC-specific)
_fseeki64 (inFile,0L,SEEK_SET);
cout<<"file lg : "<<fileLength<<endl;
fclose(inFile);
return 0;
}
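For the ifstream side of the question, a minimal sketch might look like the following. It assumes that std::streamoff is a 64-bit type on the target platform, which is common on 64-bit toolchains but is not guaranteed by the standard; stream_file_size is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Returns the size of the file in bytes, or -1 if it cannot be opened.
// Assumption: std::streamoff is wide enough (64-bit) for files over 2 GB.
long long stream_file_size(const std::string& path) {
    // std::ios::ate opens the stream positioned at the end of the file.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return -1;
    const std::streamoff end = in.tellg(); // signed offset, not int or size_t
    return static_cast<long long>(end);
}
```

Keeping the result in a signed 64-bit variable, rather than int, is what avoids truncation for files larger than 2 GB.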
I'm new to GPU programming with the CUDA toolkit, and I have to write some code that offers the functionality mentioned in the title.
I'd like to paste the code to show what exactly I want to do.
void CTrtModelWrapper::forward(void **bindings,
unsigned height,
unsigned width,
short channel,
ColorSpaceFmt colorFmt,
PixelDataType pixelType) {
uint16_t *devInRawBuffer_ptr = (uint16_t *) bindings[0];
uint16_t *devOutRawBuffer_ptr = (uint16_t *) bindings[1];
const unsigned short bit = 16;
float *devInputBuffer_ptr = nullptr;
float *devOutputBuffer_ptr = nullptr;
unsigned volume = height * width * channel;
common::cudaCheck(cudaMalloc((void **) &devInputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
common::cudaCheck(cudaMalloc((void **) &devOutputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
unsigned short npos = 0;
switch (pixelType) {
case PixelDataType::PDT_INT8: // high 8bit
npos = bit - 8;
break;
case PixelDataType::PDT_INT10: // high 10bit
npos = bit - 10;
break;
default:
break;
}
switch (colorFmt) {
case CFMT_RGB: {
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); // SEGMENTATION Fault at this line
}
}
break;
default:
break;
}
void *rtBindings[2] = {devInputBuffer_ptr, devOutputBuffer_ptr};
// forward
this->_forward(rtBindings);
// convert output
unsigned short ef_bit = bit - npos;
switch (colorFmt) {
case CFMT_RGB: {
for (unsigned i = 0; i < volume; ++i) {
devOutRawBuffer_ptr[i] = clip< uint16_t >((uint16_t) devOutputBuffer_ptr[i],
0,
(uint16_t) pow(2, ef_bit)) << npos;
}
}
break;
default:
break;
}
}
bindings is a pointer to an array. The first element of the array is a device pointer to a buffer allocated with cudaMalloc on the GPU; each element of that buffer is a 16-bit integer. The second element is the same, and is used to store the output data.
height, width, channel, colorFmt (RGB here) and pixelType (PDT_INT8, i.e. 8-bit) correspond to the image height, width, number of channels, color space, and number of bits used to store one pixel value.
The _forward function requires a pointer to an array similar to bindings, except that each element of the buffers must be a 32-bit float.
So I do the conversion in a loop:
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); // SEGMENTATION Fault at this line
}
The >> operation is there because the actual 8-bit data is stored in the high 8 bits.
The segmentation fault occurs at the line devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); with i equal to 0.
I tried splitting this code into several lines:
uint16_t value = devInRawBuffer_ptr[i];
float transferd = float(value >> npos);
devInputBuffer_ptr[i] = transferd;
and the segmentation fault occurs at the line uint16_t value = devInRawBuffer_ptr[i];
Is this a valid way to assign values to an allocated GPU memory buffer?
PS: the buffers given in bindings are fine. They are filled from host memory using cudaMemcpy before the call to the forward function, but I paste that code below anyway:
nvinfer1::DataType type = nvinfer1::DataType::kHALF;
HostBuffer hostInputBuffer(volume, type);
DeviceBuffer deviceInputBuffer(volume, type);
HostBuffer hostOutputBuffer(volume, type);
DeviceBuffer deviceOutputBuffer(volume, type);
// HxWxC --> WxHxC
auto *hostInputDataBuffer = static_cast<unsigned short *>(hostInputBuffer.data());
for (unsigned w = 0; w < W; ++w) {
for (unsigned h = 0; h < H; ++h) {
for (unsigned c = 0; c < C; ++c) {
hostInputDataBuffer[w * H * C + h * C + c] = (unsigned short )(*(ppm.buffer.get() + h * W * C + w * C + c));
}
}
}
auto ret = cudaMemcpy(deviceInputBuffer.data(), hostInputBuffer.data(), volume * getElementSize(type),
cudaMemcpyHostToDevice);
if (ret != 0) {
std::cout << "CUDA failure: " << ret << std::endl;
return EXIT_FAILURE;
}
void *bindings[2] = {deviceInputBuffer.data(), deviceOutputBuffer.data()};
model->forward(bindings, H, W, C, sbsisr::ColorSpaceFmt::CFMT_RGB, sbsisr::PixelDataType::PDT_INT8);
In CUDA, it's generally not advisable to dereference a device pointer in host code. For example, you are creating a "device pointer" when you use cudaMalloc:
common::cudaCheck(cudaMalloc((void **) &devInputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
From the code you have posted, it's not possible to deduce that for devInRawBuffer_ptr but I'll assume it also is a device pointer.
In that case, to perform this operation:
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos);
}
You would launch a CUDA kernel, something like this:
// put this function definition at file scope
__global__ void shift_kernel(float *dst, uint16_t *src, size_t sz, unsigned short npos){
for (size_t idx = blockIdx.x*blockDim.x+threadIdx.x; idx < sz; idx += gridDim.x*blockDim.x) dst[idx] = (float)((src[idx]) >> npos);
}
// call it like this in your code:
shift_kernel<<<160, 1024>>>(devInputBuffer_ptr, devInRawBuffer_ptr, volume, npos);
(coded in browser, not tested)
If you'd like to learn more about what's going on here, you may wish to study CUDA, for example through an introductory tutorial and the CUDA sample code vectorAdd. The loop in the kernel above is known as a grid-stride loop.
I have the strangest problem here. I'm using the same code (copy-pasted) on Windows as on Linux to read and write a BMP image. For some reason everything works perfectly fine on Linux, but on Windows 10 I can't open the resulting images, and I get an error message that says something like this:
"It looks like we don't support this file format."
Do you have any idea what I should do? I will put the code below.
EDIT:
I've solved the padding problem and now it writes the images, but they are completely white. Any idea why? I've updated the code as well.
struct BMP {
int width;
int height;
unsigned char header[54];
unsigned char *pixels;
int size;
int row_padded;
};
void writeBMP(string filename, BMP image) {
string fileName = "Output Files\\" + filename;
FILE *out = fopen(fileName.c_str(), "wb");
fwrite(image.header, sizeof(unsigned char), 54, out);
unsigned char tmp;
for (int i = 0; i < image.height; i++) {
for (int j = 0; j < image.width * 3; j += 3) {
// Convert (B, G, R) to (R, G, B)
tmp = image.pixels[j];
image.pixels[j] = image.pixels[j + 2];
image.pixels[j + 2] = tmp;
}
fwrite(image.pixels, sizeof(unsigned char), image.row_padded, out);
}
fclose(out);
}
BMP readBMP(string filename) {
BMP image;
string fileName = "Input Files\\" + filename;
FILE *f = fopen(fileName.c_str(), "rb");
if (f == NULL)
throw "Argument Exception";
fread(image.header, sizeof(unsigned char), 54, f); // read the 54-byte header
// extract image height and width from header
image.width = *(int *) &image.header[18];
image.height = *(int *) &image.header[22];
image.row_padded = (image.width * 3 + 3) & (~3);
image.pixels = new unsigned char[image.row_padded];
unsigned char tmp;
for (int i = 0; i < image.height; i++) {
fread(image.pixels, sizeof(unsigned char), image.row_padded, f);
for (int j = 0; j < image.width * 3; j += 3) {
// Convert (B, G, R) to (R, G, B)
tmp = image.pixels[j];
image.pixels[j] = image.pixels[j + 2];
image.pixels[j + 2] = tmp;
}
}
fclose(f);
return image;
}
In my view this code should be cross-platform, but it's not. Why?
Thanks for help
Check the header
The header must start with the following two signature bytes: 0x42 0x4D. If it's something different, a third-party application will think that this file doesn't contain a BMP picture despite the .bmp file extension.
The size and the way pixels are stored are also a little more complex than you expect: you assume that the number of bits per pixel is 24 and that no compression is used. This is not guaranteed. If it's not the case, you might read more data than is available, and corrupt the file when writing it back.
Furthermore, the size of the header also depends on the BMP version you are using, which you can detect using the 4-byte integer at offset 14.
Improve your code
When you load a file, check the signature, the BMP version, the number of bits per pixel and the compression. For debugging purposes, consider dumping the header to check it manually:
for (int i=0; i<54; i++)
cout << hex << (int)image.header[i] << " "; // cast so the byte prints as a number, not a character
cout << dec << endl;
Furthermore, when you fread(), check that the number of bytes read corresponds to the size you wanted to read, to be sure that you're not working with uninitialized buffer data.
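The checks described above could be sketched roughly as follows. This assumes the common layout in which a 14-byte file header is followed by a 40-byte BITMAPINFOHEADER, so that the bits-per-pixel field sits at offset 28 and the compression field at offset 30; read_le and is_plain_24bit_bmp are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a little-endian unsigned integer of `bytes` bytes starting at `offset`.
static std::uint32_t read_le(const unsigned char* h, std::size_t offset, std::size_t bytes) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= static_cast<std::uint32_t>(h[offset + i]) << (8 * i);
    return v;
}

// True if the 54-byte header describes an uncompressed 24-bit BMP
// with a 40-byte info header, which is what the posted code assumes.
bool is_plain_24bit_bmp(const unsigned char (&header)[54]) {
    return header[0] == 0x42 && header[1] == 0x4D  // "BM" signature
        && read_le(header, 14, 4) == 40            // info header size (BMP version)
        && read_le(header, 28, 2) == 24            // bits per pixel
        && read_le(header, 30, 4) == 0;            // 0 = no compression
}
```

Reading the fields byte by byte also avoids the *(int *) casts, which depend on the host's alignment and endianness.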
Edit:
Having checked the dump, it appears that the format is as expected. But comparing the padded size in the header with the padded size that you have calculated, it appears that the error is here:
image.row_padded = (image.width * 3 + 3) & (~3); // ok size of a single row rounded up to multiple of 4
image.pixels = new unsigned char[image.row_padded]; // oops ! A little short ?
In fact you read row by row, but you keep only the last row in memory! This is different from your first version, where you read all the pixels of the picture.
Similarly, you write the last row repeated height times.
Reconsider your padding, working with the total padded size.
image.row_padded = (image.width * 3 + 3) & (~3); // ok size of a single row rounded up to multiple of 4
image.size_padded = image.row_padded * image.height; // padded full size
image.pixels = new unsigned char[image.size_padded]; // yeah !
if (fread(image.pixels, sizeof(unsigned char), image.size_padded, f) != image.size_padded) {
cout << "Error: all bytes couldn't be read"<<endl;
}
else {
... // process the pixels as expected
}
...
I have a binary data file that contains 2d and 3d coordinates in such order:
uint32 numberOfUVvectors;
2Dvec uv[numberOfUVvectors];
uint32 numberOfPositionVectors;
3Dvec position[numberOfPositionVectors];
uint32 numberOfNormalVectors;
3Dvec normal[numberOfNormalVectors];
2Dvec and 3Dvec are structs composed from 2 and 3 floats respectively.
At first, I read all these values using the "usual" way:
in.read(reinterpret_cast<char *>(&num2d), sizeof(uint32));
2Dvectors.reserve(num2d); // It's for an std::vector<2DVec> 2Dvectors();
for (int i = 0; i < num2d; i++){
2Dvec 2Dvector;
in.read(reinterpret_cast<char *>(&2Dvector), sizeof(2DVec));
2Dvectors.push_back(2Dvector);
}
It worked fine, but it was painfully slow (there can be more than 200k entries in a file and with so many read calls, the hdd access became a bottleneck). I decided to read the entire file into a buffer at once:
in.seekg (0, in.end);
int length = in.tellg();
in.seekg (0, in.beg);
char * buffer = new char [length];
in.read (buffer,length);
The reading is way faster now, but here's the question: how to parse that char buffer back into integers and structs?
To answer your specific question:
unsigned char * pbuffer = (unsigned char *)buffer;
uint32 num2d = *((uint32 *)pbuffer);
pbuffer += sizeof(uint32);
if(num2d)
{
2Dvec * p2Dvec = (2Dvec *)pbuffer;
2Dvectors.assign(p2Dvec, p2Dvec + num2d);
pbuffer += (num2d * sizeof(2Dvec));
}
uint32 numpos = *((uint32 *)pbuffer);
pbuffer += sizeof(uint32);
if(numpos)
{
3Dvec * p3Dvec = (3Dvec *)pbuffer;
Pos3Dvectors.assign(p3Dvec, p3Dvec + numpos);
pbuffer += (numpos * sizeof(3Dvec));
}
uint32 numnorm = *((uint32 *)pbuffer);
pbuffer += sizeof(uint32);
if(numnorm)
{
3Dvec * p3Dvec = (3Dvec *)pbuffer;
Normal3Dvectors.assign(p3Dvec, p3Dvec + numnorm);
pbuffer += (numnorm * sizeof(3Dvec));
}
// do not forget to release the allocated buffer
An even faster way would be:
in.read(reinterpret_cast<char *>(&num2d), sizeof(uint32));
if(num2d)
{
2Dvectors.resize(num2d);
2Dvec * p2Dvec = &2Dvectors[0];
in.read(reinterpret_cast<char *>(p2Dvec), num2d * sizeof(2Dvec)); // pass the pointer itself, not its address
}
//repeat for position & normal vectors
Use memcpy with the appropriate sizes and start values
or cast the values (example):
#include <cstddef>
#include <iostream>
void copy_array(void *a, void const *b, std::size_t size, std::size_t amount)
{
std::size_t bytes = size * amount;
for (std::size_t i = 0; i < bytes; ++i)
reinterpret_cast<char *>(a)[i] = static_cast<char const *>(b)[i];
}
int main()
{
int a[10], b[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
copy_array(a, b, sizeof(b[0]), 10);
for (int i = 0; i < 10; ++i)
std::cout << a[i] << ' ';
}
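As a sketch of the memcpy approach applied to the file layout in the question: copying each value out of the char buffer avoids dereferencing a pointer of a different type that may be misaligned. Vec2 is a hypothetical stand-in for the 2Dvec struct (two floats), read_value and read_array are illustrative helpers, and the in-memory layout is assumed to match the file exactly (same endianness, no padding).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct Vec2 { float u, v; }; // hypothetical stand-in for 2Dvec

// Copy one trivially copyable value out of the buffer and advance the cursor.
template <typename T>
T read_value(const char*& cur, const char* end) {
    if (static_cast<std::size_t>(end - cur) < sizeof(T))
        throw std::runtime_error("truncated buffer");
    T value;
    std::memcpy(&value, cur, sizeof(T));
    cur += sizeof(T);
    return value;
}

// Read a uint32 count followed by that many elements of type T.
template <typename T>
std::vector<T> read_array(const char*& cur, const char* end) {
    const std::uint32_t count = read_value<std::uint32_t>(cur, end);
    const std::size_t bytes = static_cast<std::size_t>(count) * sizeof(T);
    if (static_cast<std::size_t>(end - cur) < bytes)
        throw std::runtime_error("truncated buffer");
    std::vector<T> out(count);
    if (count != 0) std::memcpy(out.data(), cur, bytes);
    cur += bytes;
    return out;
}
```

Calling read_array three times in a row on the same cursor (once for the UV vectors, then positions, then normals, each with its own struct type) would walk the whole buffer in the order the file defines.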
I have a block of memory with elements of fixed size, say 100 bytes, put into it one after another, all with the same fixed length, so memory looks like this
<element1(100 bytes)><element2(100 bytes)><element3(100 bytes)>...
In some situations I need to determine whether all bytes of a certain element are set to the 0-byte because that has a special meaning (I didn't say it was a good idea, but that is the situation I am in).
The question is how to do that efficiently. Further, is there a simple function to do it? For setting bytes to zero I can use memset or bzero, but I don't know of any function for checking for zero.
At the moment I am using a loop for the check
char *elementStart = memoryBlock + elementNr*fixedElementSize;
bool special = true;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; ++curByteNr )
{
special &= (*(elementStart+curByteNr)) == 0;
}
Of course, I could loop with a bigger offset and check several bytes at once with a mword or some other suited bigger type. And I guess that would be rather efficient, but I would like to know whether there is a function to take that burden from me.
Suggested functions:
!memcmp (compareBlock, myBlock, fixedElementSize)
You could perhaps actually use memcmp without having to allocate a zero-valued array, like this:
static int memvcmp(void *memory, unsigned char val, unsigned int size)
{
unsigned char *mm = (unsigned char*)memory;
return (*mm == val) && memcmp(mm, mm + 1, size - 1) == 0;
}
The standard for memcmp does not say anything about overlapping memory regions.
The obvious portable, high efficiency method is:
char testblock [fixedElementSize];
memset (testblock, 0, sizeof testblock);
if (!memcmp (testblock, memoryBlock + elementNr*fixedElementSize, fixedElementSize))
// block is all zero
else // a byte is non-zero
The library function memcmp() in most implementations will use the largest, most efficient unit size it can for the majority of comparisons.
For more efficiency, don't set testblock at runtime:
static const char testblock [100];
By definition, static variables are automatically initialized to zero unless there is an initializer.
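Put together in C++, the static zero block and the memcmp call might look like the sketch below. The element size of 100 is taken from the question; kElementSize, kZeroBlock and element_is_zero are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kElementSize = 100;                 // fixed element size from the question
static const unsigned char kZeroBlock[kElementSize] = {}; // zero-initialized once, never written

// True if every byte of element number `index` in `block` is zero.
bool element_is_zero(const unsigned char* block, std::size_t index) {
    return std::memcmp(block + index * kElementSize, kZeroBlock, kElementSize) == 0;
}
```

The reference block costs 100 bytes of read-only storage and needs no setup at runtime.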
I can't believe no one posted this yet... a solution that actually looks like C++ and isn't UB for breaking aliasing rules:
#include <algorithm> // std::all_of
#include <cstddef> // std::size_t
// You might only need this
bool
memory_is_all_zeroes(unsigned char const* const begin,
std::size_t const bytes)
{
return std::all_of( begin, begin + bytes,
[](unsigned char const byte) { return byte == 0; } );
}
// but here's this as a bonus
template<typename T_Element, std::size_t T_count>
bool
array_is_all_zeroes( T_Element const (& array)[T_count] )
{
auto const begin = reinterpret_cast<unsigned char const*>(array);
auto const bytes = T_count * sizeof(T_Element);
return memory_is_all_zeroes(begin, bytes);
}
int
main()
{
int const blah[1000]{0};
return !array_is_all_zeroes(blah);
}
This might not satisfy some people's assumptions about efficiency (which are just that, assumptions, until profiled), but I think being valid and idiomatic code are much in its favour.
AFAIK there is no ready-made function to check memory.
You could use | to speed up the for loop; there is no need for "==":
char *elementStart = memoryBlock + elementNr*fixedElementSize;
char special = 0;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; ++curByteNr )
{
special |= (*(elementStart+curByteNr));
}
You can also use long for even more speed:
char *elementStart = memoryBlock + elementNr*fixedElementSize;
long special = 0;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; curByteNr += sizeof(long) )
{
special |= *(long*)(elementStart+curByteNr);
}
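WARNING: the above code is not tested. Please test it first to make sure the sizeof and the cast work as intended. The long version also assumes that fixedElementSize is a multiple of sizeof(long); otherwise it reads past the end of the element.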
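A variant of the word-at-a-time idea that handles sizes which are not a multiple of the word size could be sketched as follows. It uses memcpy to load each word, which sidesteps the alignment and aliasing concerns of casting to long*; all_zero_wordwise is a hypothetical name, and whether this beats a plain byte loop or memcmp would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// OR all bytes together, eight at a time where possible, then the remaining tail.
bool all_zero_wordwise(const unsigned char* p, std::size_t n) {
    std::uint64_t acc = 0;
    std::size_t i = 0;
    for (; i + sizeof(std::uint64_t) <= n; i += sizeof(std::uint64_t)) {
        std::uint64_t w;
        std::memcpy(&w, p + i, sizeof w); // safe for any alignment
        acc |= w;
    }
    for (; i < n; ++i) acc |= p[i];      // tail bytes (4 of them for n == 100)
    return acc == 0;
}
```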
I have tested some of the solutions proposed here and checked the memcmp source code, which is not optimized for the OP's needs: it has the additional requirement of producing an ordering, which leads it to compare unsigned chars one by one.
In the following, I propose an optimized function, check_memory_zeroed, which performs most of the check on the biggest aligned int available, making it portable, and I compare it with the other solutions proposed in this thread. Execution time is measured and the results are printed.
It shows that the proposed solution is nearly twice as fast as wallyk's obvious portable high-efficiency method and does not need to create an additional array, and six times faster than the char-by-char comparison or mihaif's shifted array, which saves RAM compared to wallyk's.
I have also tested my solution without aligning the words (check_memory_zeroed_bigestint_not_aligned) and, surprisingly, it performs even better. If someone has an explanation, it is welcome.
Here is the code with functional and performance tests on a 1 GB table (the proposed optimized function is the first one, check_memory_zeroed):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <assert.h>
#include <time.h>
#define BIG_TAB_SIZE 1000000000
typedef intmax_t biggestint;
int check_memory_zeroed (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
int bis = sizeof(biggestint);
char* pc = (char*) ptr;
biggestint* pbi0 = (biggestint*) pc;
if ((size_t) pc % bis) /* is aligned ? */
pbi0 = (biggestint*) (pc + (bis - ((size_t) pc % bis))); /* minimal pointer larger than ptr but aligned */
assert ((size_t) pbi0 % bis == 0); /* check that pbi0 is aligned */
for (char* p = pc; p < (char*) pbi0; p++)
if(*p) return 0; /* check beginning of non aligned array */
biggestint* pbi = pbi0;
biggestint* pbiUpper = ((biggestint*) (pc + size)) - 1;
for (;pbi <= pbiUpper; pbi++)
if(*pbi) return 0; /* check with the biggest int available most of the array : its aligned part */
for (char* p = (char*) pbi; p < pc + size; p++)
if(*p) return 0; /* check end of non aligned array */
return 1;
}
int check_memory_zeroed_bigestint_not_aligned (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
biggestint* pbi = (biggestint*) ptr;
biggestint* pbiUpper = ((biggestint*) (((char*) ptr) + size)) - 1;
for (;pbi <= pbiUpper; pbi++)
if(*pbi) return 0; /* check with the biggest int available most of the array, but without aligning it */
for (char* p = (char*) pbi; p < ((char*) ptr) + size; p++)
if(*p) return 0; /* check end of non aligned array */
return 1;
}
int check_memory_zeroed_by_char (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
for (char* p = (char*) ptr; p < ((char*) ptr) + size; p++)
if(*p) return 0;
return 1;
}
/* variant of wallyk solution */
int check_memory_zeroed_by_memcmp_and_testblock (void* ptr, size_t size)
{
void* testblock = malloc(size);
if (ptr == NULL || testblock == NULL) return -1;
memset (testblock, 0, size); /* size of the block, not sizeof the pointer */
int res = ! memcmp (testblock, ptr, size);
free (testblock);
return res;
}
/* variant of mihaif solution */
int check_memory_zeroed_by_memcmp_with_shifted_array (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
char* pc = (char*) ptr;
return !(*pc) && !memcmp(pc, pc + 1, size - 1); /* 1 if all bytes are zero, like the other functions */
}
int test() {
/* check_memory_zeroed (void* ptr, size_t size) */
char tab[16];
for (int i = 0; i < 8; i++)
for (int j = 0; j < 8; j++) {
for (int k = 0; k < 16; k++) tab[k] = (k >= i && k < 16 - j) ? 0 : 100 + k;
assert(check_memory_zeroed(tab + i, 16 - j - i));
if (i > 0) assert(tab[i-1] == 100 + i - 1);
if (j > 0) assert(tab[16 - j] == 100 + 16 - j);
for (int k = i; k < 16 - j; k++) {
tab[k] = 200+k;
assert(check_memory_zeroed(tab + i, 16 - j - i) == 0);
tab[k] = 0;
}
}
char* bigtab = malloc(BIG_TAB_SIZE);
clock_t t = clock();
printf ("Comparison of different solutions execution time for checking an array has all its values null\n");
assert(check_memory_zeroed(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed optimized : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_bigestint_not_aligned(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_bigestint_not_aligned : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_by_char(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_by_char : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_by_memcmp_and_testblock(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_by_memcmp_and_testblock by wallyk : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_by_memcmp_with_shifted_array(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
free (bigtab);
return 0;
}
int main(void) {
printf("Size of intmax_t = %zu\n", sizeof(intmax_t));
test();
return 0;
}
And the results for comparison of different solutions execution time for checking an array has all its values null:
Size of intmax_t = 8
check_memory_zeroed optimized : 0.331238 seconds
check_memory_zeroed_bigestint_not_aligned : 0.260504 seconds
check_memory_zeroed_by_char : 1.958392 seconds
check_memory_zeroed_by_memcmp_and_testblock by wallyk : 0.503189 seconds
check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : 2.012257 seconds
It is not possible to check all 100 bytes at the same time. So, you (or any utility functions) have to iterate through the data in any case. But, besides having a step size bigger than 1 byte, you could do some more optimizations: For example, you could break as soon as you find a non-zero value. Well, the time complexity would still be O(n), I know.
I can't recall a standard library function which could do this for you. If you are not sure this causes any performance issues I'd just use the loop, maybe replace char* with int* as already suggested.
If you do have to optimize you could unroll the loop:
bool allZeroes(char* buffer)
{
int* p = (int*)buffer; // you better make sure your block starts on int boundary
int acc = *p;
acc |= *++p;
acc |= *++p;
...
acc |= *++p; // as many times as needed
return acc == 0;
}
You may need to add special handling for the end of the buffer if its size is not a multiple of sizeof(int), but it could be more efficient to allocate a slightly larger block with some padding bytes set to 0.
If your blocks are large you could treat them as a sequence of smaller blocks and loop over them, using the code above for each small block.
I would be curious to know how this solution compares with std::upper_bound(begin,end,0) and memcmp.
EDIT
Did a quick check how a home-grown implementation compares with memcmp, used VS2010 for that.
In short:
1) in debug mode home-grown can be twice as fast as memcmp
2) in release with full optimization memcmp has an edge on the blocks which start with non-0s. As the length of the zero-filled preamble increases it starts losing, then somehow magically gets almost as fast as homegrown, about only 10% slower.
So depending on your data patterns and need/desire to optimize you could get some extra performance from rolling out your own method, but memcmp is a rather reasonable solution.
Will put the code and results on github in case you could use them.
The following will iterate through the memory of a structure.
Only disadvantage is that it does a bytewise check.
#include <iostream>
struct Data { int i; bool b; };
template<typename T>
bool IsAllZero(T const& data)
{
auto pStart = reinterpret_cast<const char*>(&data);
for (auto pData = pStart; pData < pStart+sizeof(T); ++pData)
{
if (*pData)
return false;
}
return true;
}
int main()
{
Data data1;// = {0}; // will most probably have some content
Data data2 = {0}; // all zeroes
std::cout << "data1: " << IsAllZero(data1) << "\ndata2: " << IsAllZero(data2);
return 0;
}
What about using unsigned long long and the bitwise OR operator?
unsigned long long int *start, *current, *end, value = 0;
// set start,end
for(current = start; current!=end; current++) {
value |= *current;
}
bool AllZeros = !value;
Well, if you just want to decide whether a single element is all 0s, you can create a 100-byte element with all bits set to 1. When you want to check whether an element is all 0s, binary AND (&) the content of the element with the element you created (all 1s). If the result of the binary AND is zero, the element you checked was all 0s; otherwise it was not.
Creating a 100-byte element with all 1s seems costly, but if you have a large number of elements to check it is actually better.
You can create the 100-byte element with all 1s as void *elem; elem=malloc(100);
then set all bits to 1 (for example with memset(elem, 0xFF, 100)).
How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it? I really need it to save a lot of true/false values.
Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That would save a lot of space.
At the beginning of the file, you can write the number of boolean values you want to store; that number will help when reading the bytes back from the file and converting them into boolean values.
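A minimal sketch of this approach, assuming a std::vector<bool> as the source and an 8-byte little-endian count as the header (both are choices made for this example, as are the names write_bits and read_bits):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Write the bit count as an 8-byte little-endian header, then 8 bits per byte.
void write_bits(std::ostream& out, const std::vector<bool>& bits) {
    const std::uint64_t count = bits.size();
    for (int i = 0; i < 8; ++i)
        out.put(static_cast<char>((count >> (8 * i)) & 0xFF));
    unsigned char byte = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) byte |= static_cast<unsigned char>(1u << (i % 8));
        if (i % 8 == 7 || i + 1 == bits.size()) { // byte full, or last (partial) byte
            out.put(static_cast<char>(byte));
            byte = 0;
        }
    }
}

// Read back what write_bits produced; returns an empty vector if the header is missing.
// Sketch only: the count is trusted and not validated against the stream length.
std::vector<bool> read_bits(std::istream& in) {
    unsigned char header[8];
    if (!in.read(reinterpret_cast<char*>(header), 8)) return {};
    std::uint64_t count = 0;
    for (int i = 0; i < 8; ++i)
        count |= static_cast<std::uint64_t>(header[i]) << (8 * i);
    std::vector<bool> bits(static_cast<std::size_t>(count));
    unsigned char byte = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (i % 8 == 0) byte = static_cast<unsigned char>(in.get());
        bits[i] = ((byte >> (i % 8)) & 1u) != 0;
    }
    return bits;
}
```

When used with files, the streams should be opened with std::ios::binary so that no byte is translated on the way in or out.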
If you want the bitset class that best supports converting to binary, and your bitset is more than the size of unsigned long, then the best option to use is boost::dynamic_bitset. (I presume it is more than 32 and even 64 bits if you are that concerned about saving space).
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now that you have the bytes in their native format (Block), you still have the issue of writing them to a stream and reading them back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (e.g. ntohl, but ideally you will use a macro that is a no-op on your most common platform, so if that is little-endian you probably want to store the data that way and convert only on big-endian systems).
(Note: I am assuming that boost::dynamic_bitset converts integral types the same way regardless of the underlying endianness. Their documentation does not say.)
To write the numbers in binary to a stream, use os.write( reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks ), and to read them use is.read( reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks ), where data is assumed to be a vector<Block>; before the read you must call data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator, but resize() is probably better.)
Here is a try with two functions that will use a minimal number of bytes, without compressing the bitset.
template<std::size_t I> // std::bitset's parameter is a std::size_t, so deduction needs the same type
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
// export a bitset consisting of I bits to an output stream.
// Eight bits are stored to a single stream byte.
unsigned int i = 0; // the current bit index
unsigned char c = 0; // the current byte
short bits = 0; // to process next byte
while(i < in.size())
{
c = c << 1; //
if(in[i]) ++c; // adding 1 if bit is true (std::bitset has no at())
++bits;
if(bits == 8)
{
out.put((char)c);
c = 0;
bits = 0;
}
++i;
}
// dump remaining
if(bits != 0) {
// pad the byte so that first bits are in the most significant positions.
while(bits != 8)
{
c = c << 1;
++bits;
}
out.put((char)c);
}
return;
}
template<std::size_t I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
// read bytes from the input stream to a bitset of size I.
/* for debug */ //for(std::size_t n = 0; n < I; ++n) out[n] = false;
unsigned int i = 0; // current bit index
unsigned char mask = 0x80; // current byte mask
unsigned char c = 0; // current byte in stream
while(in.good() && (i < I))
{
if((i%8) == 0) // retrieve next character
{ c = in.get();
mask = 0x80;
}
else mask = mask >> 1; // shift mask
out[i] = (c & mask) != 0;
++i;
}
}
Note that using a reinterpret_cast of the memory used by the bitset as an array of chars could probably also work, but it may not be portable across systems, because you don't know what the representation of the bitset is (endianness?).
How about this
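// Note: relies on libstdc++ internals (_S_word_bit, _Bit_type, _M_p), so it is not portable.
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>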
...
{
std::srand(std::time(nullptr));
std::vector<bool> vct1, vct2;
vct1.resize(20000000, false);
vct2.resize(20000000, false);
// insert some data
for (size_t i = 0; i < 1000000; i++) {
vct1[std::rand() % 20000000] = true;
}
// serialize to file
std::ofstream ofs("bitset", std::ios::out | std::ios::trunc | std::ios::binary);
for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
auto vct1_iter = vct1.begin();
vct1_iter += i;
uint32_t block_num = i / std::_S_word_bit;
std::_Bit_type block_val = *(vct1_iter._M_p);
if (block_val != 0) {
// only write not-zero block
ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
}
}
ofs.close();
// deserialize
std::ifstream ifs("bitset", std::ios::in | std::ios::binary);
ifs.seekg(0, std::ios::end);
uint64_t file_size = ifs.tellg();
ifs.seekg(0);
uint64_t load_size = 0;
while (load_size < file_size) {
uint32_t block_num;
ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
std::_Bit_type block_value;
ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
auto offset = block_num * std::_S_word_bit;
if (offset >= vct2.size()) {
std::cout << "error! already touch end" << std::endl;
break;
}
auto iter = vct2.begin();
iter += offset;
*(iter._M_p) = block_value;
}
ifs.close();
// check result
int count_true1 = std::count(vct1.begin(), vct1.end(), true);
int count_true2 = std::count(vct2.begin(), vct2.end(), true);
std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
std::vector<bool> data = /* obtain bits somehow */
// Reserve an appropriate number of byte-sized buckets (CHAR_BIT comes from <climits>).
std::vector<char> bytes((int)std::ceil((float)data.size() / CHAR_BIT));
for(std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
for(int bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
std::size_t index = byteIndex * CHAR_BIT + bitIndex;
if(index >= data.size()) break; // the last byte may be only partially filled
int bit = data[index];
bytes[byteIndex] |= bit << bitIndex;
}
}
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize the number of bits that were actually stored (to cover cases where the bit count isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector as you had originally.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
#include <cstdio>
#include <bitset>
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
bitset<size> bitbuffer;
...
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.