I'm currently working on a project using the Genuino 101 where I need to read large amounts of data through I2C to fill an arbitrarily sized buffer. From my timing measurements I can see that the read requests themselves only take about 3 milliseconds and the write request about 200 nanoseconds.
However, there is a very large gap (750+ ms) between read transactions in the same block.
#define BLCK_SIZE 32

void i2cRead(unsigned char device, unsigned char memory, int len, unsigned char *rdBuf)
{
    ushort bytesRead = 0;
    ushort _memstart = memory;
    while (bytesRead < len)
    {
        Wire.beginTransmission((int)device);
        Wire.write(_memstart);
        Wire.endTransmission();
        Wire.requestFrom((int)device, BLCK_SIZE);
        int i = 0;
        while (Wire.available())
        {
            rdBuf[bytesRead + i] = Wire.read();
            i++;
        }
        bytesRead += BLCK_SIZE;
        _memstart += BLCK_SIZE;
    }
}
From my understanding this shouldn't take that long, unless adding to _memstart and bytesRead is extremely slow. By my, arguably limited, understanding of time complexity, this function is O(n) and should, in the best case, only take about 12 ms for a 128-byte query.
Am I missing something?
Those 700 ms are not caused by the execution time of the few instructions in your function; those should complete in microseconds. You may have a buffer overflow, the other device might be delaying transfers, or there may be another bug not related to a buffer overflow.
This is about how I'd do it:
void i2cRead(unsigned char device, unsigned char memory, int len, unsigned char *rdBuf, int bufLen)
{
    ushort _memstart = memory;
    if (bufLen < len) {
        len = bufLen;
    }
    while (len > 0)
    {
        Wire.beginTransmission((int)device);
        Wire.write(_memstart);
        Wire.endTransmission();
        int reqSize = 32;
        if (len < reqSize) {
            reqSize = len;
        }
        Wire.requestFrom((int)device, reqSize);
        while (Wire.available() && (len != 0))
        {
            *(rdBuf++) = Wire.read();
            _memstart++;
            len--;
        }
    }
}
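For example, a hypothetical call reading 128 bytes from a device at address 0x50, starting at memory offset 0, into a caller-owned buffer (the device address and offset are assumptions, not from the question):

unsigned char data[128];
i2cRead(0x50, 0, sizeof(data), data, sizeof(data));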
I'm currently experimenting with writing to EEPROMs over I2C. Reading goes fine and I get excellent throughput. However, when I try to write to the device, the Arduino stops responding and I have to reset it in order to get it to work again.
The I2C write also doesn't show up in my I2C debugger.
void i2cWrite(unsigned char device, unsigned char memory, const char *wrBuf, unsigned short len) {
    int i = 0;
    ushort bytesWritten = 0;
    ushort _memstart = memory;
    ushort blockSize = 4;
#ifdef DEBUG_MSGS
    char serialBuf[255] = { '\0' };
    Serial.print("Writing To i2c: ");
    sprintf(serialBuf, "%p", wrBuf);
    Serial.println(serialBuf);
#endif //DEBUG_MSGS
    while (bytesWritten < len) {
        Wire.beginTransmission((int)device);
        Wire.write((unsigned char)_memstart);
        for (int j = 0; i < blockSize; j++) {
            Wire.write(wrBuf[bytesWritten + j]);
        }
        Wire.endTransmission();
        bytesWritten += blockSize;
        _memstart += blockSize;
        delay(25);
    }
#ifdef DEBUG_MSGS
    Serial.println("\nDone writing.");
#endif //DEBUG_MSGS
}
I'm quite unsure as to what I'm doing wrong. I'm getting the following output over the serial connection:
Write Request Received: Andy
Writing To i2c: 0xa800fd98
"Writing to i2c" always gives the same value, and it always seems to crash straight after.
The error seemed to be located in the loop, as the output is:
Write Request Received: Andy
Writing To i2c: 0xa800fd98
I'm working here
I wrote the memory adress
I wrote a byte of data
I wrote a byte of data
I wrote a byte of data
I wrote a byte of data
I wrote a byte of data
....
This seems to go on ad infinitum.
After adding a few more debug statements and fixing the points Some programmer dude noticed:
{
    Wire.beginTransmission((int)device);
    Serial.println("I'm working here");
    Wire.write((unsigned char)_memstart);
    Serial.println("I wrote the memory adress");
    for (int j = 0; j < blockSize; j++) {
        Wire.write(wrBuf[bytesWritten + j]);
        Serial.println("I wrote a byte of data");
        //Serial.write(wrBuf[bytesWritten + j]);
    }
    Wire.endTransmission();
    Serial.println("I ended the transmission");
    bytesWritten += blockSize;
    _memstart += blockSize;
    delay(25);
}
I noticed that I was checking for i < blockSize instead of j < blockSize (copied from the reading code). Now I'm running into some other (small) issues, but this solved the problem I was having.
I need to encrypt my data, so I encrypt it using AES. I can encrypt short data, but when I try to encrypt long data it doesn't work. What can I do to fix this problem? This is my code.
#include "cooloi_aes.h"
CooloiAES::CooloiAES()
: MSG_LEN(0)
{
for(int i = 0; i < AES_BLOCK_SIZE; i++)
{
key[i] = 32 + i;
}
}
CooloiAES::~CooloiAES()
{
}
std::string CooloiAES::aes_encrypt(std::string msg)
{
int i = msg.size() / 1024;
MSG_LEN = ( i + 1 ) * 1024;
char in[MSG_LEN];
char out[MSG_LEN];
memset((char*)in,0,MSG_LEN);
memset((char*)out,0,MSG_LEN);
strncpy((char*)in,msg.c_str(),msg.size());
unsigned char iv[AES_BLOCK_SIZE];
for(int j = 0; j < AES_BLOCK_SIZE; ++j)
{
iv[j] = 0;
}
AES_KEY aes;
if(AES_set_encrypt_key((unsigned char*)key, 128, &aes) < 0)
{
return NULL;
}
int len = msg.size();
AES_cbc_encrypt((unsigned char*)in,(unsigned char*)out,len,&aes,iv,AES_ENCRYPT);
std::string encrypt_msg(&out[0],&out[MSG_LEN+16]);
std::cout << std::endl;
return encrypt_msg;
}
std::string CooloiAES::aes_decrypt(std::string msg)
{
MSG_LEN = msg.size();
char in[MSG_LEN];
char out[MSG_LEN+16];
memset((char*)in,0,MSG_LEN);
memset((char*)out,0,MSG_LEN+16);
strncpy((char*)in,msg.c_str(),msg.size());
std::cout << std::endl;
unsigned char iv[AES_BLOCK_SIZE];
for(int j = 0; j < AES_BLOCK_SIZE; ++j)
{
iv[j] = 0;
}
AES_KEY aes;
if(AES_set_decrypt_key((unsigned char*)key, 128, &aes) < 0)
{
return NULL;
}
int len = msg.size();
AES_cbc_encrypt((unsigned char*)in,(unsigned char*)out,len,&aes,iv,AES_DECRYPT);
std::string decrypt_msg = out;
return decrypt_msg;
}
When I encrypt data which is 96 bytes long, it fails. I get this error:
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
But I don't think this string is longer than the maximum length, and I don't know where the problem is.
You have nothing wrong in your encryption/decryption except for the padding issues and the use of strncpy and the (char *) constructor when dealing with binary data. You shouldn't encrypt the last block of data if it doesn't fill all 16 bytes. So you should implement your own padding, or not encrypt the last small block at all; your code then simplifies to this:
string aes_encrypt/decrypt(string msg)
{
    unsigned char out[msg.size()];
    memcpy((char*)out, msg.data(), msg.size());
    AES_cbc_encrypt((unsigned char *)msg.data(), out, msg.size() / 16 * 16, &aes, iv,
                    AES_ENCRYPT /* or AES_DECRYPT */);
    return string((char *)out, msg.size());
}
To summarize:
don't use strncpy() with binary data
don't use the string s = binary_char_array; constructor with binary data
don't encrypt the last portion of data if it doesn't fill the block size, or pad it yourself
use the EVP_* OpenSSL API if the algorithm might change in the future (a sketch follows below)
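To illustrate that last point, here is a minimal sketch of AES-128-CBC encryption through the EVP interface (the function name and error handling are mine, not from the original code; EVP applies PKCS#7 padding automatically, which also removes the short-last-block problem):

#include <openssl/evp.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: AES-128-CBC via the EVP API, padding handled by OpenSSL.
std::string evp_aes128_cbc_encrypt(const std::string& msg,
                                   const unsigned char key[16],
                                   const unsigned char iv[16])
{
    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    if (!ctx) throw std::runtime_error("EVP_CIPHER_CTX_new failed");
    std::vector<unsigned char> out(msg.size() + 16); // room for one padding block
    int len1 = 0, len2 = 0;
    if (EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv) != 1 ||
        EVP_EncryptUpdate(ctx, out.data(), &len1,
                          (const unsigned char*)msg.data(), (int)msg.size()) != 1 ||
        EVP_EncryptFinal_ex(ctx, out.data() + len1, &len2) != 1) {
        EVP_CIPHER_CTX_free(ctx);
        throw std::runtime_error("encryption failed");
    }
    EVP_CIPHER_CTX_free(ctx);
    return std::string((const char*)out.data(), len1 + len2);
}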
AES normally encrypts data by breaking it up into 16-byte blocks. If the last block is not 16 bytes long, it's padded to 16 bytes. Wiki articles:
http://en.wikipedia.org/wiki/Advanced_Encryption_Standard
http://en.wikipedia.org/wiki/AES_implementations
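If you do pad yourself, PKCS#7 is the common scheme: append N copies of the byte N so the total length becomes a multiple of the block size. A minimal sketch (the helper name is mine):

#include <string>

std::string pkcs7_pad(const std::string& msg, std::size_t block = 16)
{
    std::size_t pad = block - (msg.size() % block); // always 1..block, never 0
    return msg + std::string(pad, (char)pad);
}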
I stored a file size in a binary file and I am able to get this file size into a char[8] buffer. I would like to convert this char[] into an off_t type in order to be able to pass it as an argument to truncate(const char *path, off_t length).
I tried this naive approach and it seems to work most of the time, but sometimes it fails and gives me a weird sequence of bits.
off_t pchar_2_off_t(char* str, size_t size)
{
    off_t ret = 0;
    size_t i;
    for (i = 0; i < size; ++i)
    {
        ret <<= 8;
        ret |= str[i];
    }
    return ret;
}
ret |= str[i]; is the problem, as str[i] may sign-extend upon conversion to int, setting many bits in ret. Implied by @pmg and commented by @mafso.
off_t pchar_2_off_t(const char* str, size_t size) {
    off_t ret = 0;
    size_t i;
    for (i = 0; i < size; ++i) {
        ret <<= 8;
        ret |= (unsigned char) str[i];
    }
    return ret;
}
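A hypothetical call site (the file name and the reading of the 8 bytes are placeholders):

#include <unistd.h> /* truncate() */

char str[8];
/* ... read 8 bytes of big-endian size data into str here ... */
off_t length = pchar_2_off_t(str, sizeof str);
truncate("myfile.bin", length);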
Just bulk-copy the data in question:
#include <string.h> /* for memcpy() */
...
char str[8];
/* Read 8 bytes binary data into str here. */
off_t off_file;
memcpy(&off_file, str, sizeof off_file);
To get around any endianness issues just do:
off_t off = ntohll(off_file); /* Assuming ntohll is the 64-bit version of ntohl(). */
As ntohll() is non-standard, please see some possible ways to implement it here: 64 bit ntohl() in C++?
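One possible sketch, assuming the name my_ntohll (real implementations often use compiler byte-swap intrinsics instead):

#include <stdint.h>

static uint64_t my_ntohll(uint64_t v)
{
    const uint16_t probe = 1;
    if (*(const unsigned char*)&probe == 1) { /* little-endian host */
        uint64_t r = 0;
        for (int i = 0; i < 8; i++)
            r = (r << 8) | ((v >> (8 * i)) & 0xFF); /* reverse the bytes */
        return r;
    }
    return v; /* big-endian host: already in network order */
}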
unsigned const char blah[8] = {0xdd,0xee,0xaa,0xdd,0xbb,0xee,0xee,0xff};
off_t * scalar = (off_t *) malloc(8);
memcpy(scalar, blah, 8);
printf("%llx\n",*scalar);
outputs (on my Intel machine): ffeeeebbddaaeedd
What the wha?! you say... There is a problem with this approach: it isn't portable. It is a problem with endianness.
So if you want to do this portably, you need to either be aware of endianness and special-case it, or just convert with a loop:
*scalar = 0;
for (int i = 0; i < 8; i++)
{
    *scalar += (uint64_t)blah[i] << (8 * (7 - i));
}
printf("%llx\n", *scalar);
outputs (on all machines that have 64bit off_t's): ddeeaaddbbeeeeff
Assuming the file that contains the filesize was created on the EXACT same machine AND that it was originally written with an off_t type, you can just cast the char[] to an off_t. E.g.:
off_t filesize = *((off_t*)str);
I need to read a huge 35 GB file from disk line by line in C++. Currently I do it the following way:
ifstream infile("myfile.txt");
string line;
while (true) {
    if (!getline(infile, line)) break;
    long linepos = infile.tellg();
    process(line, linepos);
}
But it gives me about 2 MB/s performance, though the file manager copies the file at 100 MB/s. I guess that getline() is not doing buffering correctly. Please propose some sort of buffered line-by-line reading approach.
UPD: process() is not the bottleneck; code without process() runs at the same speed.
You won't get anywhere close to line speed with the standard IO streams. Buffering or not, pretty much ANY parsing will kill your speed by orders of magnitude. I did experiments on datafiles composed of two ints and a double per line (Ivy Bridge chip, SSD):
IO streams in various combinations: ~10 MB/s. Pure parsing (f >> i1 >> i2 >> d) is faster than a getline into a string followed by a stringstream parse.
C file operations like fscanf get about 40 MB/s.
getline with no parsing: 180 MB/s.
fread: 500-800 MB/s (depending on whether or not the file was cached by the OS).
I/O is not the bottleneck, parsing is. In other words, your process is likely your slow point.
So I wrote a parallel parser. It's composed of tasks (using a TBB pipeline):
fread large chunks (one such task at a time)
re-arrange chunks such that a line is not split between chunks (one such task at a time)
parse chunk (many such tasks)
I can have unlimited parsing tasks because my data is unordered anyway. If yours isn't then this might not be worth it to you.
This approach gets me about 100 MB/s on a 4-core Ivy Bridge chip.
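For reference, here is a hedged serial sketch of the first two stages (fread in big chunks, then re-arrange so no line is split); all names are mine and the parallel TBB plumbing is omitted:

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

void read_in_chunks(const char* path, void (*parse)(const char*, std::size_t))
{
    const std::size_t CHUNK = 1 << 20;  // 1 MiB per fread
    std::vector<char> buf(CHUNK);
    std::string carry;                  // partial line carried to the next chunk
    FILE* f = std::fopen(path, "rb");
    if (!f) return;
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, CHUNK, f)) > 0) {
        std::size_t last = n;           // find the last newline in this chunk
        while (last > 0 && buf[last - 1] != '\n') --last;
        if (last == 0) {                // no newline at all: keep accumulating
            carry.append(buf.data(), n);
            continue;
        }
        carry.append(buf.data(), last); // whole lines only: hand off to parsing
        parse(carry.data(), carry.size());
        carry.assign(buf.data() + last, n - last);
    }
    if (!carry.empty()) parse(carry.data(), carry.size()); // last line without '\n'
    std::fclose(f);
}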
I've translated my own buffering code from my Java project and it does what I need. I had to add defines to overcome problems with the MSVC 2010 compiler's tellg, which always gives wrong negative values on huge files. This algorithm gives the desired speed of ~100 MB/s, though it does some useless new[]s.
void readFileFast(ifstream &file, void(*lineHandler)(char* str, int length, __int64 absPos)) {
    int BUF_SIZE = 40000;
    file.seekg(0, ios::end);
    ifstream::pos_type p = file.tellg();
#ifdef WIN32
    __int64 fileSize = *(__int64*)(((char*)&p) + 8);
#else
    __int64 fileSize = p;
#endif
    file.seekg(0, ios::beg);
    BUF_SIZE = min(BUF_SIZE, fileSize);
    char* buf = new char[BUF_SIZE];
    int bufLength = BUF_SIZE;
    file.read(buf, bufLength);
    int strEnd = -1;
    int strStart;
    __int64 bufPosInFile = 0;
    while (bufLength > 0) {
        int i = strEnd + 1;
        strStart = strEnd;
        strEnd = -1;
        for (; i < bufLength && i + bufPosInFile < fileSize; i++) {
            if (buf[i] == '\n') {
                strEnd = i;
                break;
            }
        }
        if (strEnd == -1) { // scroll buffer
            if (strStart == -1) {
                lineHandler(buf + strStart + 1, bufLength, bufPosInFile + strStart + 1);
                bufPosInFile += bufLength;
                bufLength = min(bufLength, fileSize - bufPosInFile);
                delete[] buf;
                buf = new char[bufLength];
                file.read(buf, bufLength);
            } else {
                int movedLength = bufLength - strStart - 1;
                memmove(buf, buf + strStart + 1, movedLength);
                bufPosInFile += strStart + 1;
                int readSize = min(bufLength - movedLength, fileSize - bufPosInFile - movedLength);
                if (readSize != 0)
                    file.read(buf + movedLength, readSize);
                if (movedLength + readSize < bufLength) {
                    char* tmpbuf = new char[movedLength + readSize];
                    memmove(tmpbuf, buf, movedLength + readSize);
                    delete[] buf;
                    buf = tmpbuf;
                    bufLength = movedLength + readSize;
                }
                strEnd = -1;
            }
        } else {
            lineHandler(buf + strStart + 1, strEnd - strStart, bufPosInFile + strStart + 1);
        }
    }
    lineHandler(0, 0, 0); // eof
}
void lineHandler(char* buf, int l, __int64 pos) {
    if (buf == 0) return;
    string s = string(buf, l);
    printf("%s", s.c_str());
}

void loadFile() {
    ifstream infile("file");
    readFileFast(infile, lineHandler);
}
Use a line parser or write your own. Here is a sample on SourceForge (http://tclap.sourceforge.net/), and use a buffer if necessary.
I have a block of memory with elements of fixed size, say 100 bytes, put into it one after another, all with the same fixed length, so memory looks like this
<element1(100 bytes)><element2(100 bytes)><element3(100 bytes)>...
In some situations I need to determine whether all bytes of a certain element are set to the 0-byte because that has a special meaning (I didn't say it was a good idea, but that is the situation I am in).
The question is: how do I do that efficiently? Further, is there a simple function to do it? For setting bytes to zero I can use memset or bzero, but I don't know of any function for checking for zero.
At the moment I am using a loop for the check
char *elementStart = memoryBlock + elementNr * fixedElementSize;
bool special = true;
for (size_t curByteNr = 0; curByteNr < fixedElementSize; ++curByteNr)
{
    special &= (*(elementStart + curByteNr)) == 0;
}
Of course, I could loop with a bigger offset and check several bytes at once with a mword or some other suitable bigger type. And I guess that would be rather efficient, but I would like to know whether there is a function to take that burden from me.
Suggested functions:
!memcmp (compareBlock, myBlock, fixedElementSize)
You could perhaps actually use memcmp without having to allocate a zero-valued array, like this:
static int memvcmp(void *memory, unsigned char val, unsigned int size)
{
    unsigned char *mm = (unsigned char*)memory;
    return (*mm == val) && memcmp(mm, mm + 1, size - 1) == 0;
}
The standard for memcmp does not say anything about overlapping memory regions.
The obvious portable, high efficiency method is:
char testblock[fixedElementSize];
memset(testblock, 0, sizeof testblock);
if (!memcmp(testblock, memoryBlock + elementNr * fixedElementSize, fixedElementSize))
    // block is all zero
else
    // a byte is non-zero
The library function memcmp() in most implementations will use the largest, most efficient unit size it can for the majority of comparisons.
For more efficiency, don't set testblock at runtime:
static const char testblock [100];
By definition, static variables are automatically initialized to zero unless there is an initializer.
I can't believe no one posted this yet... a solution that actually looks like C++ and isn't UB for breaking aliasing rules:
#include <algorithm> // std::all_of
#include <cstddef>   // std::size_t

// You might only need this
bool
memory_is_all_zeroes(unsigned char const* const begin,
                     std::size_t const bytes)
{
    return std::all_of(begin, begin + bytes,
                       [](unsigned char const byte) { return byte == 0; });
}

// but here's this as a bonus
template<typename T_Element, std::size_t T_count>
bool
array_is_all_zeroes(T_Element const (&array)[T_count])
{
    auto const begin = reinterpret_cast<unsigned char const*>(array);
    auto const bytes = T_count * sizeof(T_Element);
    return memory_is_all_zeroes(begin, bytes);
}

int
main()
{
    int const blah[1000]{0};
    return !array_is_all_zeroes(blah);
}
This might not satisfy some people's assumptions about efficiency (which are just that, assumptions, until profiled), but I think being valid and idiomatic code are much in its favour.
AFAIK there is no ready-made function to check memory.
You could use | to speed up the for loop; there is no need for ==:
char *elementStart = memoryBlock + elementNr * fixedElementSize;
char special = 0;
for (size_t curByteNr = 0; curByteNr < fixedElementSize; ++curByteNr)
{
    special |= (*(elementStart + curByteNr));
}
You can also use long for even more speed:
char *elementStart = memoryBlock + elementNr * fixedElementSize;
long special = 0;
for (size_t curByteNr = 0; curByteNr < fixedElementSize; curByteNr += sizeof(long))
{
    special |= *(long*)(elementStart + curByteNr);
}
WARNING: the above code is not tested. Please test it first to make sure the sizeof and casting operators work.
I have tested some of the solutions proposed here and checked the memcmp source code, which is not optimized for the OP's needs, since it has the additional requirement of producing an ordering, leading it to compare unsigned chars one by one.
In the following, I propose an optimized function, check_memory_zeroed, which performs most of the check on the biggest aligned integer type available, making it portable, and I compare it with the other solutions proposed in this thread. Time measurements are performed and the results printed.
It shows that the proposed solution is nearly twice as fast as wallyk's obvious portable high-efficiency method and does not need an additional array, and six times faster than char-by-char comparison or mihaif's shifted array (which saves RAM compared to wallyk's).
I have also tested my solution without aligning the words (check_memory_zeroed_bigestint_not_aligned) and, surprisingly, it performs even better. If anyone has an explanation, it is welcome.
Here is the code with functional and performance tests on a 1 GB buffer (the proposed optimized function is the first one, check_memory_zeroed):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <assert.h>
#include <time.h>

#define BIG_TAB_SIZE 1000000000

typedef intmax_t biggestint;

int check_memory_zeroed (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    int bis = sizeof(biggestint);
    char* pc = (char*) ptr;
    biggestint* pbi0 = (biggestint*) pc;
    if ((size_t) pc % bis) /* not aligned? */
        pbi0 = (biggestint*) (pc + (bis - ((size_t) pc % bis))); /* minimal pointer larger than ptr but aligned */
    assert ((size_t) pbi0 % bis == 0); /* check that pbi0 is aligned */
    for (char* p = pc; p < (char*) pbi0; p++)
        if (*p) return 0; /* check beginning of non aligned array */
    biggestint* pbi = pbi0;
    biggestint* pbiUpper = ((biggestint*) (pc + size)) - 1;
    for (; pbi <= pbiUpper; pbi++)
        if (*pbi) return 0; /* check most of the array, its aligned part, with the biggest int available */
    for (char* p = (char*) pbi; p < pc + size; p++)
        if (*p) return 0; /* check end of non aligned array */
    return 1;
}

int check_memory_zeroed_bigestint_not_aligned (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    biggestint* pbi = (biggestint*) ptr;
    biggestint* pbiUpper = ((biggestint*) (((char*) ptr) + size)) - 1;
    for (; pbi <= pbiUpper; pbi++)
        if (*pbi) return 0; /* check most of the array with the biggest int available, but without aligning it */
    for (char* p = (char*) pbi; p < ((char*) ptr) + size; p++)
        if (*p) return 0; /* check end of non aligned array */
    return 1;
}

int check_memory_zeroed_by_char (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    for (char* p = (char*) ptr; p < ((char*) ptr) + size; p++)
        if (*p) return 0;
    return 1;
}

/* variant of wallyk solution */
int check_memory_zeroed_by_memcmp_and_testblock (void* ptr, size_t size)
{
    void* testblock = malloc(size);
    if (ptr == NULL || testblock == NULL) return -1;
    memset (testblock, 0, size);
    int res = ! memcmp (testblock, ptr, size);
    free (testblock);
    return res;
}

/* variant of mihaif solution */
int check_memory_zeroed_by_memcmp_with_shifted_array (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    char* pc = (char*) ptr;
    return (*pc) || memcmp(pc, pc + 1, size - 1);
}

int test() {
    /* check_memory_zeroed (void* ptr, size_t size) */
    char tab[16];
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            for (int k = 0; k < 16; k++) tab[k] = (k >= i && k < 16 - j) ? 0 : 100 + k;
            assert(check_memory_zeroed(tab + i, 16 - j - i));
            if (i > 0) assert(tab[i-1] == 100 + i - 1);
            if (j > 0) assert(tab[16 - j] == 100 + 16 - j);
            for (int k = i; k < 16 - j; k++) {
                tab[k] = 200 + k;
                assert(check_memory_zeroed(tab + i, 16 - j - i) == 0);
                tab[k] = 0;
            }
        }
    char* bigtab = malloc(BIG_TAB_SIZE);
    clock_t t = clock();
    printf ("Comparison of different solutions execution time for checking an array has all its values null\n");
    assert(check_memory_zeroed(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed optimized : %f seconds\n", ((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_bigestint_not_aligned(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_bigestint_not_aligned : %f seconds\n", ((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_by_char(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_by_char : %f seconds\n", ((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_by_memcmp_and_testblock(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_by_memcmp_and_testblock by wallyk : %f seconds\n", ((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_by_memcmp_with_shifted_array(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : %f seconds\n", ((float)t)/CLOCKS_PER_SEC);
    free (bigtab);
    return 0;
}

int main(void) {
    printf("Size of intmax_t = %zu\n", sizeof(intmax_t));
    test();
    return 0;
}
And the results of the comparison of the different solutions' execution times for checking that an array has all its values null:
Size of intmax_t = 8
check_memory_zeroed optimized : 0.331238 seconds
check_memory_zeroed_bigestint_not_aligned : 0.260504 seconds
check_memory_zeroed_by_char : 1.958392 seconds
check_memory_zeroed_by_memcmp_and_testblock by wallyk : 0.503189 seconds
check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : 2.012257 seconds
It is not possible to check all 100 bytes at the same time, so you (or any utility function) have to iterate through the data in any case. But besides using a step size bigger than one byte, you can do some more optimization: for example, you can break as soon as you find a non-zero value, as the sketch below shows. The time complexity would still be O(n), I know.
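A minimal sketch of that early-exit loop (the helper name is mine):

#include <cstddef>

bool is_all_zero(const unsigned char* p, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] != 0)
            return false; // stop at the first non-zero byte
    return true;
}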
I can't recall a standard library function which could do this for you. If you are not sure this causes any performance issues I'd just use the loop, maybe replace char* with int* as already suggested.
If you do have to optimize you could unroll the loop:
bool allZeroes(char* buffer)
{
    int* p = (int*)buffer; // you better make sure your block starts on int boundary
    int acc = *p;
    acc |= *++p;
    acc |= *++p;
    ...
    acc |= *++p; // as many times as needed
    return acc == 0;
}
You may need to add special handling for the end of the buffer if its size is not a multiple of sizeof(int), but it could be more efficient to allocate a slightly larger block with some padding bytes set to 0, as sketched below.
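A sketch of that padding idea (sizes and names are my own assumptions): round the allocation up to a multiple of sizeof(int) and zero the tail, so the unrolled int reads never see garbage.

#include <cstddef>
#include <cstring>

std::size_t padded = (size + sizeof(int) - 1) / sizeof(int) * sizeof(int);
char* block = new char[padded];
// ... fill block[0..size) with the element's bytes ...
std::memset(block + size, 0, padded - size); // zeroed padding can never report non-zero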
If your blocks are large you could treat them as a sequence of smaller blocks and loop over them, using the code above for each small block.
I would be curious to know how this solution compares with std::upper_bound(begin,end,0) and memcmp.
EDIT
Did a quick check of how a home-grown implementation compares with memcmp; used VS2010 for that.
In short:
1) in debug mode home-grown can be twice as fast as memcmp
2) in release with full optimization memcmp has an edge on blocks which start with non-zeros. As the length of the zero-filled preamble increases, it starts losing, then somehow magically gets almost as fast as home-grown, only about 10% slower.
So depending on your data patterns and your need/desire to optimize, you could get some extra performance from rolling your own method, but memcmp is a rather reasonable solution.
Will put the code and results on GitHub in case you could use them.
The following will iterate through the memory of a structure.
Only disadvantage is that it does a bytewise check.
#include <iostream>

struct Data { int i; bool b; };

template<typename T>
bool IsAllZero(T const& data)
{
    auto pStart = reinterpret_cast<const char*>(&data);
    for (auto pData = pStart; pData < pStart + sizeof(T); ++pData)
    {
        if (*pData)
            return false;
    }
    return true;
}

int main()
{
    Data data1;// = {0}; // will most probably have some content
    Data data2 = {0}; // all zeroes
    std::cout << "data1: " << IsAllZero(data1) << "\ndata2: " << IsAllZero(data2);
    return 0;
};
What about using long int and the binary OR operator?
unsigned long long int *start, *current, *end, value = 0;
// set start, end
for (current = start; current != end; current++) {
    value |= *current;
}
bool AllZeros = !value;
Well, if you just want to decide whether a single element is all 0s, you can create a 100-byte element with all 1s. Now when you want to check whether an element is all 0s, just binary AND (&) the content of the element with the element you created (all 1s). If the result of the binary AND is zero, the element you checked was all 0s; otherwise it was not.
The creation of a 100-byte element with all 1s seems costly, but if you have a large number of elements to check it is actually better.
You can create the 100-byte element with all 1s as void *elem; elem = malloc(100); and then set all bits to 1 (use ~(elem & 0)).