logical operations between chunks of memory? - c++

I want to or two big chunks of memory... but it doesn't work
Consider I have three char * bm, bm_old, and bm_res.
#define to_uint64(buffer,n) {(uint64_t)buffer[n] << 56 | (uint64_t)buffer[n+1] << 48 | (uint64_t)buffer[n+2] << 40 | (uint64_t)buffer[n+3] << 32 | (uint64_t) buffer[n+4] << 24 | (uint64_t)buffer[n+5] << 16 | (uint64_t)buffer[n+6] << 8 | (uint64_t)buffer[n+7];}
...
for (unsigned int i=0; i<bitmapsize(size)/8; i++){
uint64_t or_res = (to_uint64(bm_old,i*8)) | (to_uint64(bm,i*8));
memcpy(bm_res+i*sizeof(uint64_t), &or_res, sizeof(uint64_t));
}
bm_res is not correct!
Have any clue?
Thanks,
Amir.

Enclose the definition of to_uint64 in parentheses () instead of braces {} and get rid of the semicolon at the end. Using #define creates a macro whose text is inserted verbatim wherever it's used, not an actual function, so you were attempting to |-together two blocks rather than those blocks' "return values."
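A minimal sketch of what the macro might look like after that change. The cast through unsigned char is an addition that is not part of the answer above; it assumes plain char may be signed on your platform, in which case a byte above 0x7F would otherwise sign-extend. The helper or_at is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Expression macro: wrapped in parentheses, no braces, no trailing semicolon.
// Each byte goes through unsigned char first (an assumption added here) so a
// signed char holding a value above 0x7F does not sign-extend into the result.
#define to_uint64(buffer, n)                                  \
    ( ((uint64_t)(unsigned char)(buffer)[(n)]     << 56)      \
    | ((uint64_t)(unsigned char)(buffer)[(n) + 1] << 48)      \
    | ((uint64_t)(unsigned char)(buffer)[(n) + 2] << 40)      \
    | ((uint64_t)(unsigned char)(buffer)[(n) + 3] << 32)      \
    | ((uint64_t)(unsigned char)(buffer)[(n) + 4] << 24)      \
    | ((uint64_t)(unsigned char)(buffer)[(n) + 5] << 16)      \
    | ((uint64_t)(unsigned char)(buffer)[(n) + 6] << 8)       \
    | ((uint64_t)(unsigned char)(buffer)[(n) + 7]) )

// Hypothetical helper: OR the two 8-byte groups that start at offset n.
inline uint64_t or_at(const char* a, const char* b, unsigned n) {
    return to_uint64(a, n) | to_uint64(b, n);
}
```

Because the macro is now a single parenthesized expression, it can sit on either side of | without changing meaning. Note that it still assembles the bytes most-significant first, so copying the result back with memcpy is byte-order sensitive.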

I think you need to advance your output pointer by the correct size:
memcpy(bm_res + i * sizeof(uint64_t), &or_res, sizeof(uint64_t));
^^^^^^^^^^^^^^^^^^^^
Since bm_res is a char-pointer, + 1 advances by just one byte.

You're incrementing bm_res by one for every eight-byte block you move. Further, you never increment bm or bm_old at all. So you're basically tiling the first byte of or_res over bm_res, which is probably not what you want.
More importantly, your code is byte-order sensitive - whether or_res is represented in memory as least-order-byte first or highest-order-byte first matters.
I would recommend you just do a byte-by-byte or first, and only try to optimize it if that is too slow. When you do optimize it, don't use your crazy to_uint64 macro there - it'll be slower than just going byte-by-byte. Instead, cast to uint64_t * directly. While this is, strictly speaking, undefined behavior, it works on every platform I've ever seen, and should be byteorder agnostic.
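Here is a rough sketch of both approaches. The function names or_bytes and or_words are made up for this example. The word-at-a-time version swaps in memcpy for the uint64_t * cast mentioned above: it is a different technique, chosen here because it avoids the undefined behavior and compilers typically turn the small fixed-size copies into plain loads and stores.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte-by-byte OR: simple, and independent of the machine's byte order.
void or_bytes(const char* a, const char* b, char* out, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        out[i] = static_cast<char>(a[i] | b[i]);
    }
}

// Word-at-a-time OR. Each 8-byte group is copied into a uint64_t, OR-ed, and
// copied back in the same native order, so the byte order never matters.
void or_words(const char* a, const char* b, char* out, std::size_t len) {
    std::size_t i = 0;
    for (; i + sizeof(std::uint64_t) <= len; i += sizeof(std::uint64_t)) {
        std::uint64_t x, y;
        std::memcpy(&x, a + i, sizeof x);
        std::memcpy(&y, b + i, sizeof y);
        x |= y;
        std::memcpy(out + i, &x, sizeof x);
    }
    for (; i < len; ++i) {  // leftover bytes when len is not a multiple of 8
        out[i] = static_cast<char>(a[i] | b[i]);
    }
}
```

Both functions should produce identical output for any input; measuring is the only way to know whether the second one is actually faster for your data.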

Related

How does the '<<' operator work with #define in C++

I have two operations, and I am assuming both perform a bitwise left shift.
#define TACH_MAX_OWN_ORDERS 1<<6
int myVal = 1<<6;
cout<<"Value after operation|"<<myVal <<"|"<<TACH_MAX_OWN_ORDERS<<endl;
The output for TACH_MAX_OWN_ORDERS always surprises me.
Value after operation|64|16
Does anyone have any clue how this comes about?
Thanks
Macros replace text as is, so it will result in
cout<<"Value after operation|"<<myVal <<"|"<<1<<6<<endl;
the << won't result in (int)1<<6 but rather ([...] << 1) << 6 where [...] will have std::cout at the deepest level. This means your macro will always result in 16 when used in std::cout, because 1 and 6 are shifted into the out stream ("1" + "6") instead of the actual numerical value 64.
You should put parentheses around the expression to avoid this:
#define TACH_MAX_OWN_ORDERS (1<<6)
or even better, since you should avoid macros, if available try to use compile time constants:
constexpr int TACH_MAX_OWN_ORDERS = 1 << 6;
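To see the difference side by side, here is a small sketch that streams all three forms into a string. The names SHIFT_BARE, SHIFT_PAREN, SHIFT_CONST and the print_* helpers are invented for this illustration and do not come from the question.

```cpp
#include <sstream>
#include <string>

#define SHIFT_BARE  1<<6      // pasted as-is: os << 1<<6 parses as (os << 1) << 6
#define SHIFT_PAREN (1<<6)    // pasted as-is: os << (1<<6)
constexpr int SHIFT_CONST = 1 << 6;  // a real int with the value 64

// Stream the unparenthesized macro; the stream receives 1 and then 6.
inline std::string print_bare() {
    std::ostringstream os;
    os << SHIFT_BARE;
    return os.str();
}

// Stream the parenthesized macro; the shift happens before streaming.
inline std::string print_paren() {
    std::ostringstream os;
    os << SHIFT_PAREN;
    return os.str();
}

// Stream the compile-time constant.
inline std::string print_const() {
    std::ostringstream os;
    os << SHIFT_CONST;
    return os.str();
}
```

print_bare() yields the text 16, while the other two yield 64, which matches the output seen in the question.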

Typecast four bytes to a 4 byte int

If the string is:
char message[] = "HI THERE";
How would I take the first four bytes ("HI T") and typecast them to a 4 byte int?
Total value should equal 1411402056.
The safe way to do it which will always work is to just shift each byte individually:
uint32_t i = (uint32_t(uint8_t(message[0])) << 24) | (uint32_t(uint8_t(message[1])) << 16) | (uint32_t(uint8_t(message[2])) << 8) | uint32_t(uint8_t(message[3]));
You might need to reverse the order of the bytes depending on the endianness of your string.
You may also be able to simply reinterpret cast to an integer, depending on whether the endianness of the string matches the endianness of your processor, whether the string is aligned to the correct byte boundary, etc.:
int i = *reinterpret_cast<int*>(message);
The simplest way is:
int32_t num = *reinterpret_cast<int32_t*>(message);
But this is technically a violation of Strict Aliasing. A safer way is:
int32_t num;
memcpy(&num, message, sizeof(num));
Though, to be really safe, you should use the bit-shifting approach described in Alan Birtles's answer.
You might have to swap the order of the int's bytes afterwards, depending on the endianness of your system.
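Putting the answers above together, a sketch of the three options might look like this. The function names load_le32, load_be32 and load_native32 are hypothetical. The value 1411402056 from the question corresponds to treating the first byte as the least significant one (little-endian order).

```cpp
#include <cstdint>
#include <cstring>

// Widen one char to an unsigned 32-bit value without sign extension.
inline std::uint32_t byte_at(const char* p, int i) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(p[i]));
}

// First byte is the least significant (little-endian interpretation).
inline std::uint32_t load_le32(const char* p) {
    return byte_at(p, 0) | (byte_at(p, 1) << 8) |
           (byte_at(p, 2) << 16) | (byte_at(p, 3) << 24);
}

// First byte is the most significant (big-endian interpretation).
inline std::uint32_t load_be32(const char* p) {
    return (byte_at(p, 0) << 24) | (byte_at(p, 1) << 16) |
           (byte_at(p, 2) << 8) | byte_at(p, 3);
}

// Copy the bytes as they are; the result depends on the host's byte order.
inline std::uint32_t load_native32(const char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

The shifting versions give the same answer on every machine, whereas load_native32 matches load_le32 on a little-endian host and load_be32 on a big-endian one.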

Reading consecutive bytes as one integer

I am new here, and would like to ask this question.
I am working with a binary file that each byte, multiple bytes or even parts of a byte have a different meaning.
What I have been trying so far is to read a number of bytes (4 in my example) as a one block.
I have them in Hexadecimal representation like: 00 1D FB C8.
Using the following code, I read them separately:
for (int j = 36; j < 40;j++)
{
cout << dec << (bitset<8>(fileBuf[j])).to_ulong();
}
where j is the position of the byte in the file. The previous code gives me 029251200, which is wrong. What I want is to read the 4 bytes at once and get the answer 1965000.
I appreciate any help.
Thank you.
DWORD final = (fileBuf[j] << 24) + (fileBuf[j+1] << 16) + (fileBuf[j+2] << 8) + (fileBuf[j+3]);
Also depends what kind of endian you want (ABCD / DCBA / CDAB)
EDIT (can't reply due to low rep, just joined today)
I tried to extend the bitset, however it gave the value of the first byte only
It will not work because fileBuf is almost certainly a byte array; extending the bitset from 8 bits to 32 bits (int) won't make any difference because each element is still an 8-bit byte. You have to mathematically combine the 4 array elements into the original integer representation. See the code above the edit.
The answer isn't "wrong"; this is a logic error. You're not storing the values and combining them in a computation.
C8 is 200 in decimal form, so you're printing each byte on its own rather than appending its value to the ones before it.
The answer it spat out was in fact what you programmed it to do.
You need to either extend the bitset to a larger size to append the other hex numbers or provide some other means of outputting.
Keeping the format of the function from the question, you could do:
//little-endian
{
int i = (fileBuf[j]<<0) | (fileBuf[j+1]<<8) | (fileBuf[j+2]<<16) | (fileBuf[j+3]<<24);
cout << dec << i;
}
// big-endian
{
int i = (fileBuf[j+3]<<0) | (fileBuf[j+2]<<8) | (fileBuf[j+1]<<16) | (fileBuf[j]<<24);
cout << dec << i;
}
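One thing the snippets above do not show is the element type of fileBuf. Assuming it may be a plain char array, bytes such as FB and C8 could be negative on platforms where char is signed, so this sketch converts each byte to unsigned char before combining. The function name read_be and its parameters are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Combine `count` consecutive bytes (1 to 4) starting at `offset`,
// treating the first byte as the most significant (big-endian).
inline std::uint32_t read_be(const char* buf, std::size_t offset, std::size_t count) {
    std::uint32_t value = 0;
    for (std::size_t k = 0; k < count; ++k) {
        // Shift what we have so far up by one byte, then append the next byte.
        value = (value << 8) | static_cast<unsigned char>(buf[offset + k]);
    }
    return value;
}
```

For the bytes 00 1D FB C8 from the question, read_be(fileBuf, 36, 4) would give 1965000, provided those bytes really sit at positions 36 to 39.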

Combining 2 Hex Values Into 1 Hex Value

I have a coordinate pair of values that each range from [0,15]. For now I can use an unsigned, however since 16 x 16 = 256 total possible coordinate locations, this also represents all the binary and hex values of 1 byte. So to keep memory compact I'm starting to prefer the idea of using a BYTE or an unsigned char. What I want to do with this coordinate pair is this:
Let's say we have a coordinate pair with the hex value [0x05,0x0C], I would like the final value to be 0x5C. I would also like to do the reverse as well, but I think I've already found an answer with a solution to the reverse. I was thinking on the lines of using & or | however, I'm missing something for I'm not getting the correct values.
However as I was typing this and looking at the reverse of this: this is what I came up with and it appears to be working.
byte a = 0x04;
byte b = 0x0C;
byte c = (a << 4) | b;
std::cout << +c;
And the value that is printing is 76; which converted to hex is 0x4C.
Since I have figured out the calculation for this, is there a more efficient way?
EDIT
After doing some testing the operation to combine the initial two is giving me the correct value, however when I'm doing the reverse operation as such:
byte example = c;
byte nibble1 = 0x0F & example;
byte nibble2 = (0xF0 & example) >> 4;
std::cout << +nibble1 << " " << +nibble2 << std::endl;
It prints 12 4. Is this correct or should this be a concern? If worst comes to worst I can rename the values to indicate which coordinate value they are.
EDIT
After thinking about this for a little bit and from some of the suggestions I had to modify the reverse operation to this:
byte example = c;
byte nibble1 = (0xF0 & example) >> 4;
byte nibble2 = (0x0F & example);
std::cout << +nibble1 << " " << +nibble2 << std::endl;
And this prints out 4 12 which is the correct order of what I am looking for!
First of all, be careful: there are in fact 17 values in the range 0..16. Your values are probably 0..15, because if they actually both range from 0 to 16, you won't be able to uniquely store every possible coordinate pair in a single byte.
The code extract you submitted is pretty efficient, you are using bit operators, which are the quickest thing you can ask a processor to do.
For the "reverse" (splitting your byte into two 4-bit values), you are right when thinking about using &. Just apply a 4-bit shift at the right time.
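As a compact sketch of both directions, assuming each coordinate is in 0..15: the names pack_xy, unpack_x and unpack_y are hypothetical, and the assert is just one way to catch out-of-range inputs in debug builds.

```cpp
#include <cassert>
#include <cstdint>

// Pack two values in 0..15 into one byte: x in the high nibble, y in the low.
inline std::uint8_t pack_xy(std::uint8_t x, std::uint8_t y) {
    assert(x <= 15 && y <= 15);  // anything larger would not fit in 4 bits
    return static_cast<std::uint8_t>((x << 4) | y);
}

// Recover the high nibble (the first coordinate).
inline std::uint8_t unpack_x(std::uint8_t packed) {
    return static_cast<std::uint8_t>(packed >> 4);
}

// Recover the low nibble (the second coordinate).
inline std::uint8_t unpack_y(std::uint8_t packed) {
    return static_cast<std::uint8_t>(packed & 0x0F);
}
```

With the values from the question, pack_xy(0x04, 0x0C) gives 0x4C (76 in decimal), and unpacking that byte returns 4 and 12 in that order.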

C/C++ pointer trick fix

I'm trying this pointer trick and I can't figure out how to fix it, I'm running g++ 4.6 on ubuntu 12.04 64-bit. Check out this code below:
int arr[5];
arr[3] = 50;
((short*) arr)[6] = 2;
cout << arr[3] << endl;
The logic is: since short is 2 bytes, int is 4 bytes, I want to change the first 2 bytes in arr[3], while keeping the value of the second 2 bytes as 50. So I'm just messing with the bit pattern. Unfortunately, sizeof(int*) and sizeof(short*) are both 8 bytes. Is there a type cast that returns a pointer of size 2 bytes?
Update:
I realized that the question is poorly written, so I'll fix that:
The output from cout << arr[3] << endl; I'm getting is 2. The output I would like to get is neither 2 nor 50, but rather a large number, indicating that the left part of the int bit pattern has been changed, while the right part (the second 2 bytes) of the int stored in arr[3] is still unchanged.
sizeof(int*) and sizeof(short*) are both going to be the same -- as will sizeof(void*) -- you're asking for the size of a pointer, not the size of the thing the pointer points to.
Use sizeof(int) or sizeof(short) instead.
Now, as for your code snippet, you are making assumptions about the endianness of the machine on which you're running. The "first" part of the int on a given platform may be the bytes with higher address, or the bytes with lower address.
For instance, your memory block may be laid out like this. Let's say the least significant byte has index zero, and the most significant byte has the highest index (3 for an int, 1 for a short). On a big endian architecture an int may look like this:
<--------------- 4 bytes --------------->
+---------+---------+---------+---------+
|  int:3  |  int:2  |  int:1  |  int:0  |
| short:1 | short:0 | short:1 | short:0 |
+---------+---------+---------+---------+
Notice how the first short in the int -- which in your case would have been ((short*) arr)[6] -- contains the most significant bits of the int, not the least significant. So if you overwrite ((short*) arr)[6], you are overwriting the most significant bits of arr[3], which appears to be what you wanted. But x64 is not a big endian machine.
On a little endian architecture, you would see this instead:
<--------------- 4 bytes --------------->
+---------+---------+---------+---------+
|  int:0  |  int:1  |  int:2  |  int:3  |
| short:0 | short:1 | short:0 | short:1 |
+---------+---------+---------+---------+
leading to the opposite behavior -- ((short*) arr)[6] would be the least significant bits of arr[3], and ((short*) arr)[7] would be the most significant.
Here's what my machine happens to do -- your machine may be different:
C:\Users\Billy\Desktop>type example.cpp
#include <iostream>
int main()
{
std::cout << "Size of int is " << sizeof(int) << " and size of short is "
<< sizeof(short) << std::endl;
int arr[5];
arr[3] = 50;
((short*) arr)[6] = 2;
std::cout << arr[3] << std::endl;
((short*) arr)[7] = 2;
std::cout << arr[3] << std::endl;
}
C:\Users\Billy\Desktop>cl /W4 /EHsc /nologo example.cpp && example.exe
example.cpp
Size of int is 4 and size of short is 2
2
131074
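If you want to check which layout your own machine uses without aliasing through a short *, something like the following sketch could work. It inspects the bytes with memcpy, which is well defined. The function names host_is_little_endian and low_address_half are invented for this example.

```cpp
#include <cstdint>
#include <cstring>

// True when the least significant byte of an integer sits at the lowest address.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);  // look at the byte stored first in memory
    return first == 1;
}

// Return the 16-bit half of v that is stored at the lower address,
// i.e. the half that ((short*) arr)[6] refers to for arr[3].
inline std::uint16_t low_address_half(std::uint32_t v) {
    std::uint16_t half;
    std::memcpy(&half, &v, sizeof half);
    return half;
}
```

On a little-endian machine low_address_half returns the least significant 16 bits, and on a big-endian machine the most significant 16 bits.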
Your problem is due to endianness. Intel CPUs are little endian, meaning that the least significant byte of an int is stored at the lowest address. Let me show you an example:
Let's assume that arr[3] is at address 10:
Then arr[3] = 50; Writes the following to memory
10: 0x32
11: 0x00
12: 0x00
13: 0x00
And ((short*) arr)[6] = 2; writes the following to memory
10: 0x02
11: 0x00
When you index a pointer, it adds the index multiplied by the size of the pointed-to type. So, you don't need a 2-byte pointer.
You are making a lot of assumptions which might not hold water. Also what does the sizeof pointers have to do with the problem at hand?
Why not just use bit masks:
arr[3] |= (top_2_bytes << 16);
This should set the upper 16 bits without disturbing the lower 16. (You may get into signed/unsigned dramas)
Having said all the above, the standard prohibits doing such things: setting a variable through a pointer to another type invokes undefined behaviour. If you know how your machine works (sizes of int and short, endianness, ...) and you know how your compiler (is likely to) translate your code, then you might get away with it. It makes for neat parlor tricks, and spectacular explosions when the machine/compiler/phase of the moon changes.
If it wins any performance, the win will be minimal, and it could even be a net loss (with one compiler I fiddled with long ago, playing the "I can implement this loop better than the compiler, I know exactly what goes on here" game generated much worse code for my label: ... if() goto label; than the exact same thing written natively as a loop: my "smart" code confused the compiler, and its "pattern for loops" did not apply).
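For the bit-mask route, here is a small sketch that works the same on any byte order. It uses unsigned 32-bit values to sidestep the signed/unsigned issues mentioned above, and it clears the target half before OR-ing, since a bare |= only gives the intended result when those bits are already zero. The names set_upper16 and set_lower16 are hypothetical.

```cpp
#include <cstdint>

// Replace the most significant 16 bits of value, keeping the lower 16 bits.
inline std::uint32_t set_upper16(std::uint32_t value, std::uint16_t upper) {
    return (value & 0x0000FFFFu) | (static_cast<std::uint32_t>(upper) << 16);
}

// Replace the least significant 16 bits of value, keeping the upper 16 bits.
inline std::uint32_t set_lower16(std::uint32_t value, std::uint16_t lower) {
    return (value & 0xFFFF0000u) | lower;
}
```

Starting from 50, set_upper16(50, 2) gives 131122: the upper half becomes 2 and the low half still holds 50, which is the "large number" effect the question was after.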
You wouldn't want your actual pointer to be of size two bytes; this would mean that it could only address 65,536 (64K) distinct memory locations. However, using a cast to a short *, as you are doing, will let you access the memory two bytes at a time (since the compiler will view the array as an array of shorts, not of ints).