Reinterpret casted value varies by compiler - c++

For the same program:
const char* s = "abcd";
auto x1 = reinterpret_cast<const int64_t*>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << *x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcd"
In gcc5: 139639660962401
In gcc8: 1684234849
Why does the value vary according to different compiler versions?
What is then a compiler safe way to move from const char* to int64_t and backward(just like in this problem - not for actual integer strings but one with other chars as well)?

Why does the value vary according to different compiler versions?
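The behaviour is undefined: the string literal "abcd" occupies only five bytes (including the terminator), so reading an int64_t through x1 reads past the end of the array, and it also accesses char objects through an incompatible type.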
What is then a compiler safe way to move from const char* to int64_t and backward
It is somewhat unclear what you mean by "move from const char* to int64_t". Based on the example, I assume you mean to create a mapping from a character sequence (of no greater length than fits) into a 64 bit integer in a way that can be converted back using another process - possibly compiled by another (version of) compiler.
First, create an int64_t object, initialised to zero:
int64_t i = 0;
Get length of the string
auto len = strlen(s);
Check that it fits
assert(len < sizeof i);
Copy the bytes of the character sequence onto the integer
memcpy(&i, s, len);
(As long as the integer type doesn't have trap representations) The behaviour is well defined, and the generated integer will be the same across compiler versions as long as the CPU endianness (and negative number representation) remains the same.
Reading the character string back doesn't require copying because char is exceptionally allowed to alias all other types:
auto back = reinterpret_cast<char*>(&i);
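Put together, the steps above might look like the following minimal sketch. The helper names pack_chars and unpack_chars are made up for illustration and do not come from the question; the sketch assumes a string of at most 7 characters so that a terminating zero byte always remains.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copy up to 7 characters into a zero-initialised int64_t.
inline std::int64_t pack_chars(const char* s) {
    std::int64_t i = 0;
    const std::size_t len = std::strlen(s);
    assert(len < sizeof i);   // leave room for a terminating zero byte
    std::memcpy(&i, s, len);  // copies bytes; no aliasing violation
    return i;
}

// Hypothetical helper: read the bytes of the integer back as a string.
inline std::string unpack_chars(const std::int64_t& i) {
    // char may alias any object, and the last byte in memory is still zero,
    // so the object representation is a valid null-terminated string.
    return std::string(reinterpret_cast<const char*>(&i));
}
```

For example, unpack_chars(pack_chars("abcd")) gives back "abcd". The integer value in between still depends on the byte order of the machine.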
Note the qualification above regarding endianness. This method does not work if the integer is passed (across the network, for example) to a process running on a CPU with a different byte order. That case can be handled as well, by using bit shifting and masking to copy each octet to a fixed position of significance.
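A sketch of the shifting approach, under a few assumptions of mine: the helper names are hypothetical, character k is placed in bits 8k to 8k+7, and an unsigned 64-bit type is used instead of int64_t so that no shift touches the sign bit. The resulting value no longer depends on the CPU's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: character k lands in bits [8k, 8k+8) on every platform.
inline std::uint64_t pack_portable(const char* s) {
    const std::size_t len = std::strlen(s);
    assert(len <= sizeof(std::uint64_t));
    std::uint64_t v = 0;
    for (std::size_t k = 0; k < len; ++k) {
        // Go through unsigned char first so negative char values do not sign-extend.
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[k])) << (8 * k);
    }
    return v;
}

// Hypothetical inverse: extract octets until a zero octet or all 8 are read.
inline std::string unpack_portable(std::uint64_t v) {
    std::string out;
    for (std::size_t k = 0; k < sizeof v; ++k) {
        const unsigned char c = static_cast<unsigned char>((v >> (8 * k)) & 0xFF);
        if (c == 0) break;  // a zero octet marks the end of the sequence
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

With an ASCII execution character set, pack_portable("abcd") is 1684234849 (0x64636261), the same number the question reported for gcc8 on a little-endian machine.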

When you dereference the int64_t pointer, it reads past the end of the memory allocated for the string you cast from. If you changed the length of the string to at least 8 bytes, the integer value would become stable in practice, although the access would still break the aliasing rules.
const char* s = "abcdefg"; // plus null terminator
auto x1 = reinterpret_cast<const int64_t*>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << *x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcdefg"
If you wanted to store the pointer in an integer instead, you should use intptr_t and leave out the * like:
const char* s = "abcd";
auto x1 = reinterpret_cast<intptr_t>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcd"

Based on what RemyLebeau pointed out in the comments of your post,
const uint64_t five_byte_mask = 0xFFFFFFFFFF; std::cout << (*x1 & five_byte_mask) << std::endl;
Should be a reasonable way to get the same value on a little endian machine with whatever compiler. It may be UB by one specification or another, but from a compiler's perspective, you're dereferencing eight bytes at a valid address that you have initialized five bytes of, and masking off the remaining bytes that are uninitialized / junk data.

Related

How to use different pointer arithmetic semantics

For academic purposes, I'm trying to purposefully overwrite data allocated on the free store. Here's what I've got working so far:
//main
int* val = new int(-1);
print_bits(*val);
short* i = new(val) short(0);
print_bits(*val);
std::cout << "\nval loc: " << val <<
"\ni loc: " << i << std::endl;
delete val;
//end main
As expected, this produces something similar to:
11111111111111111111111111111111
00000000000000001111111111111111
val loc: 0x27d5580
i loc: 0x27d5580
My next intention was to overwrite the second byte in val, so I simply changed the short allocation to:
short* i = new(val+1) short(0);
However, after making this change, I got output similar to:
11111111111111111111111111111111
11111111111111111111111111111111
val loc: 0x27d5580
i loc: 0x27d5584
As you can see, val+1 moves the pointer a full sizeof(int) bytes forward rather than just one byte forward. I understand why this happens (and am thankful for it). However, if my intention was to move only a single byte forward in memory, how could I accomplish that?
EDIT:
One solution I've come up with is to do something like
char* ch = reinterpret_cast<char*>(val);
short* i = new(ch+1) short(0);
I'm actually not entirely sure yet whether this will work, since char*'s have a habit of being interpreted as C-style strings.
Solution:
The simplest solution that has been given is to write
short* i = new(reinterpret_cast<char*>(val)+1) short(0);
Just write
short* i = new(( char * )val + sizeof( short )) short(0);
or
short* i = new(reinterpret_cast<char *>( val ) + sizeof( short )) short(0);
A short won't be a single byte, and storing via one at a single byte offset potentially both violates alignment requirements and potentially stores to space beyond the length of the original allocation (depending on the size of short vs int).
To "move" by single bytes, you want to use a char * type. Simply:
*(reinterpret_cast<char*>(val)+1) = 0;
... should work.
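As a small illustration of the byte-wise approach, here is a sketch with two hypothetical helpers (the names are mine). It uses unsigned char rather than char, which is equally allowed to alias any object, and it assumes the offset is smaller than sizeof(int).

```cpp
#include <cstddef>

// Hypothetical helper: return `value` with the byte at `offset` set to zero.
// Precondition: offset < sizeof(int).
inline int with_byte_zeroed(int value, std::size_t offset) {
    // Stepping through an unsigned char* moves one byte at a time.
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&value);
    bytes[offset] = 0;
    return value;
}

// Hypothetical helper: inspect a single byte of an int's object representation.
inline unsigned char byte_at(const int& value, std::size_t offset) {
    return reinterpret_cast<const unsigned char*>(&value)[offset];
}
```

Which bits of the integer change depends on the byte order of the machine, but the byte at the given memory offset is always the one that is cleared.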

C++ converting string containing non human readable data to 200 double

I have a string whose length is 1600 and I know that it contains 200 doubles. When I print out the string I get the following: Y���Vz'#��y'#��!U�}'#�-...
I would like to convert this string to a vector containing the 200 doubles.
Here is the code I tried (blobString is a string 1600 characters long):
string first_eight = blobString.substr(0, sizeof(double)); // I get the first 8 values of the string which should represent the first double
double double_value1;
memcpy(&double_value1, &first_eight, sizeof(double)); // First thing I tried
double* double_value2 = (double*)first_eight.c_str(); // Second thing I tried
cout << double_value1 << endl;
cout << double_value2 << endl;
This outputs the following:
6.95285e-310
0x7ffd9b93e320
--- Edit solution---
The second method works; all I had to do was look at what double_value2 was pointing to.
cout << *double_value2 << endl;
Here's an example that might get you closer to what you need. Bear in mind that unless the numbers in your blob are in the exact format that your particular C++ compiler expects, this isn't going to work like you expect. In my example I'm building up the buffer of doubles myself.
Let's start with our array of doubles.
double doubles[] = { 0.1, 5.0, 0.7, 8.6 };
Now I'll build an std::string that should look like your blob. Notice that I can't simply initialize a string with a (char *) that points to the base of my list of doubles, as it will stop when it hits the first zero byte!
std::string double_buf_str;
double_buf_str.append((char *)doubles, 4 * sizeof(double));
// A quick sanity check, should be 32
std::cout << "Length of double_buf_str "
<< double_buf_str.length()
<< std::endl;
Now I'll reinterpret the c_str() pointer as a (double *) and iterate through the four doubles.
for (auto i = 0; i < 4; i++) {
std::cout << ((double*)double_buf_str.c_str())[i] << std::endl;
}
Depending on your circumstances you might consider using a std::vector<uint8_t> for your blob, instead of an std::string. C++11 gives you a data() function that would be the equivalent of c_str() here. Turning your blob directly into a vector of doubles would give you something even easier to work with--but to get there you'd potentially have to get dirty with a resize followed by a memcpy directly into the internal array.
I'll give you an example for completeness. Note that this is of course not how you would normally initialize a vector of doubles...I'm imagining that my double_blob is just a pointer to a blob containing a known number of doubles in the correct format.
const int count = 200; // 200 doubles incoming
std::vector<double> double_vec;
double_vec.resize(count);
memcpy(double_vec.data(), double_blob, sizeof(double) * count);
for (double& d : double_vec) {
std::cout << d << std::endl;
}
@Mooning Duck brought up the great point that the result of c_str() is not necessarily aligned to an appropriate boundary--which is another good reason not to use std::string as a general purpose blob (or at least don't interpret the internals until they are copied somewhere that guarantees a valid alignment for the type you are interested in). The impact of trying to read a double from a non-aligned location in memory will vary depending on architecture, giving you a portability concern. In x86-based machines there will only be a performance impact AFAIK as it will read across alignment boundaries and assemble the double correctly (you can test this on a x86 machine by writing then reading back a double from successive locations in a buffer with an increasing 1-byte offset--it'll just work). In other architectures you'll get a fault.
The std::vector<double> solution will not suffer from this issue due to guarantees about the alignment of newed memory built into the standard.
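Applied to a std::string blob, the copy-first approach could be sketched as follows. The function names doubles_from_blob and blob_from_doubles are hypothetical, and the sketch assumes the blob holds doubles in the same binary format and byte order that the running program uses.

```cpp
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copy the raw bytes of a blob into properly aligned doubles.
inline std::vector<double> doubles_from_blob(const std::string& blob) {
    if (blob.size() % sizeof(double) != 0) {
        throw std::invalid_argument("blob size is not a multiple of sizeof(double)");
    }
    std::vector<double> out(blob.size() / sizeof(double));
    if (!out.empty()) {
        // memcpy has no alignment requirement on the source bytes.
        std::memcpy(out.data(), blob.data(), blob.size());
    }
    return out;
}

// Hypothetical inverse, handy for building a test blob.
inline std::string blob_from_doubles(const std::vector<double>& values) {
    if (values.empty()) return std::string();
    return std::string(reinterpret_cast<const char*>(values.data()),
                       values.size() * sizeof(double));
}
```

With a 1600-byte blobString as in the question, doubles_from_blob(blobString) would return a vector of 200 elements, provided sizeof(double) is 8.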

Bus error with allocated memory on a heap

I get a Bus Error in code like this:
char* mem_original;
int int_var = 987411;
mem_original = new char [250];
memcpy(&mem_original[250-sizeof(int)], &int_var, sizeof(int));
...
const unsigned char* mem_u_const = (unsigned char*)mem_original;
...
const unsigned char *location = mem_u_const + 250 - sizeof(int);
std::cout << "sizeof(int) = " << sizeof(int) << std::endl;//it's printed out as 4
std::cout << "byte 0 = " << int(*location) << std::endl;
std::cout << "byte 1 = " << int(*(location+1)) << std::endl;
std::cout << "byte 2 = " << int(*(location+2)) << std::endl;
std::cout << "byte 3 = " << int(*(location+3)) << std::endl;
int original_var = *((const int*)location);
std::cout << "original_var = " << original_var << std::endl;
That works well a few times, printing out:
sizeof(int) = 4
byte 0 = 0
byte 1 = 15
byte 2 = 17
byte 3 = 19
original_var = 987411
And then it fails with:
sizeof(int) = 4
byte 0 = 0
byte 1 = 15
byte 2 = 17
byte 3 = 19
Bus Error
It's built & run on Solaris OS (C++ 5.12)
Same code on Linux (gcc 4.12) & Windows (msvc-9.0) is working well.
We can see:
memory was allocated on the heap by new[].
memory is accessible (we can read it byte by byte)
memory contains exactly what there should be, not corrupted.
So what may be the reason for the Bus Error? Where should I look?
UPD:
If I memcpy(...) from location into original_var at the end, it works. But what is the problem with *((const int*)location)?
This is a common issue for developers with no experience on hardware that has alignment restrictions - such as SPARC. x86 hardware is very forgiving of misaligned access, albeit with performance impacts. Other types of hardware? SIGBUS.
This line of code:
int original_var = *((const int*)location);
invokes undefined behavior. You're taking an unsigned char * and interpreting what it points to as an int. You can't do that safely. Period. It's undefined behavior - for the very reason you're experiencing.
You're violating the strict aliasing rule. See What is the strict aliasing rule? Put simply, you can't refer to an object of one type as another type. A char * does not and can not refer to an int.
Oracle's Solaris Studio compilers actually provide a command-line argument that will let you get away with that on SPARC hardware - -xmemalign=1i (see https://docs.oracle.com/cd/E19205-01/819-5265/bjavc/index.html). Although to be fair to GCC, without that option, the forcing you do in your code will still SIGBUS under the Studio compiler.
Or, as you've already noted, you can use memcpy() to copy bytes around no matter what they are - as long as you know the source object is safe to copy into the target object - yes, there are cases when that's not true.
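A minimal sketch of the memcpy() route, with hypothetical helper names (load_int, store_int, round_trip) and the 250-byte buffer size borrowed from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: read an int from a possibly misaligned byte address.
inline int load_int(const unsigned char* location) {
    int value;
    std::memcpy(&value, location, sizeof value);  // byte copy, no alignment requirement
    return value;
}

// Hypothetical helper: write an int to a possibly misaligned byte address.
inline void store_int(unsigned char* location, int value) {
    std::memcpy(location, &value, sizeof value);
}

// Demonstration: store `value` at `offset` in a 250-byte buffer and read it back.
inline int round_trip(int value, std::size_t offset) {
    unsigned char buffer[250] = {};
    assert(offset + sizeof(int) <= sizeof buffer);
    store_int(buffer + offset, value);
    return load_int(buffer + offset);
}
```

round_trip(987411, 246) mirrors the offset used in the question (250 - sizeof(int) when int is 4 bytes) and does not depend on how the buffer happens to be aligned.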
I get the following warning when I compile your code:
main.cpp:19:26: warning: cast from 'const unsigned char *' to 'const int *' increases required alignment from 1 to 4 [-Wcast-align]
int original_var = *((const int*)location);
^~~~~~~~~~~~~~~~~~~~
This seems to be the cause of the bus error, because improperly aligned access can cause a bus error.
Although I don’t have access to a SPARC right now to test this, I’m pretty sure from my experiences on that platform that this line is your problem:
const unsigned char *location = mem_u_const + 250 - sizeof(int);
The mem_u_const block was originally allocated by new for an array of characters. Since sizeof(unsigned char) is 1 and sizeof(int) is 4, you are adding 246 bytes. This is not a multiple of 4.
On SPARC, the CPU can only read 4-byte words if they are aligned to 4-byte boundaries. Your attempt to read a misaligned word is what causes the bus error.
I recommend allocating a struct with an array of unsigned char followed by an int, rather than a bunch of pointer math and casts like the one that caused this bug.
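One possible shape for such a struct is sketched below. The name Record and the 246-byte payload size are assumptions chosen to match the 250-byte buffer in the question.

```cpp
#include <cstddef>

// Hypothetical layout: raw bytes followed by an int member.
struct Record {
    unsigned char payload[246];
    int value;  // the compiler adds padding before this member if needed
};

// The int member is always placed at an offset suitable for its alignment.
static_assert(offsetof(Record, value) % alignof(int) == 0,
              "value is suitably aligned for int");
```

Because of the padding, sizeof(Record) may be larger than 250, so this layout fits best when the exact byte positions are not dictated by an external format.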

Unknown behavior of uint64_t datatype

I am trying to store a very big number in a uint64_t like:
int main(int argc, char** argv) {
uint64_t ml = sizeof(void*)*(1<<63);
cout << "ml=" << ml << "\n";
const char* r;
const char* mR=r+ml;
return 0;
}
But I do not know why I am getting 0 as the output, despite storing it in a uint64_t datatype.
EDIT: char* mR is my memory buffer and I can increase my memory buffer to at most ml. I want to make use of a machine with 64GB of RAM. So, can you suggest how far I should increment mR, given that I want to use all the available RAM? That is, what value should I set ml to?
Try
uint64_t ml = ((uint64_t)1)<<63;
or just
uint64_t ml = 0x8000000000000000;
Plain 1 << 63 is evaluated as an int, and if the shift count is too big for the type, it is undefined behavior. In your case, it happened to result in 0.
Please note that if you multiply 0x8000000000000000 by sizeof(void*), you'll likely get overflow too.
If you want to allocate 64G of memory, that would be:
char* buffer = new char[64ULL * 1024 * 1024 * 1024];
or simply:
char* buffer = new char[1ULL << 36];
Note that 64G is 2^36 bytes, which is far, far less than the 2^63 number that you're trying to use. Although, typically when you use that much memory, it's because your program organically uses it through various operations... not by just allocating it in one large chunk.
Just use:
uint64_t ml = sizeof(void*) * (1ULL << 63);
Because, as AlexD already said, 1 << 63 is evaluated as an int, so the shift is undefined behavior; here it happened to produce 0.
Even after you correct the shift to (uint64_t)1 << 63, if sizeof(void*) is any even number (and it assuredly is), then your product will be divisible by 2^64, and thus be zero when stored in a uint64_t.
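The arithmetic can be checked at compile time. The sketch below uses constant names of my own choosing and relies only on the fact that unsigned arithmetic wraps modulo 2^64.

```cpp
#include <cstdint>

// 2^63, computed in a 64-bit unsigned type so the shift is well defined.
constexpr std::uint64_t top_bit = std::uint64_t{1} << 63;

// Multiplying by any even number (sizeof(void*) is typically 4 or 8) wraps to 0.
constexpr std::uint64_t wrapped = top_bit * sizeof(void*);

// 64 GiB is only 2^36 bytes.
constexpr std::uint64_t sixty_four_gib = std::uint64_t{1} << 36;

static_assert(top_bit == 0x8000000000000000ULL, "2^63");
static_assert(wrapped == 0, "product wraps modulo 2^64");
static_assert(sixty_four_gib == 64ULL * 1024 * 1024 * 1024, "2^36 bytes");
```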

Input same value in few sequential array elements

so I found this:
std::fill_n(array, 100, value);
but I suspect it might not be what I'm looking for. I have an array *pointer and need to put the same value in a few sequential elements fast, because they are pixels, and there are lots of them.
So I use:
*(pointer)=*(pointer+1)=value;
sometimes there's
*(pointer)=*(pointer+1)=*(pointer+2)=value;
but the first case is most crucial. One additional "+" is not a problem, I know, but when I use SDL's function to fill screen black (or other), it works kind of fast, and I don't know how it is optimized.
So if I need to put the same value in a few neighbouring elements of an array in a custom way, is there some cool trick?
Maybe some cast to (Uint64) and <<32 to place 2 same values in 2 integers trick?
Okay, sorry I didn't explain what this is for from the start.
So I render a voxel object, and sometimes after rotation there are spots on screen inside the object where no pixel is drawn, because I draw only a kind of outer layer of the object. And I want to do smoothing by basically stretching the object by one pixel to the right. So while I'm putting a pixel, I need to put one just like it to its right.
If you want to fill 100 (or even 1000) unsigned int elements, then you can choose any method you want, be it std::fill_n, or for loop - the number is so small you won't see the difference, even if you do this operation very often.
However, if you want to set values for a bigger array, say, 8k x 8k texture with pixels composed of 4 unsigned color components, then there is a short comparison of the methods you can use:
#include <algorithm> // std::fill_n
#include <cstdint>
#include <cstring> // memset
#include <ctime>
#include <iostream>
int main(){
long unsigned const size = 8192 * 8192 * 4;
unsigned* arr = new unsigned[size];
clock_t t1 = clock();
memset(arr, 0, size*sizeof(unsigned));
clock_t t2 = clock();
std::fill_n(arr, size, 123);
clock_t t3 = clock();
for(long unsigned i = 0; i < size; ++i)
*(arr + i) = 123;
clock_t t4 = clock();
// Pack the same 32-bit value into both halves of a 64-bit word.
int64_t val = 123;
val = val << 32 | 123;
// Note: storing through an int64_t* technically breaks the strict aliasing rule.
for(long unsigned i = 0; i < size / 2; ++i)
*(int64_t*)(arr + i * 2) = val;
clock_t t5 = clock();
std::cout << "memset = " << t2 - t1 << std::endl;
std::cout << "std::fill_n = " << t3 - t2 << std::endl;
std::cout << "for 32 = " << t4 - t3 << std::endl;
std::cout << "for 64 = " << t5 - t4 << std::endl;
delete[] arr; // memory from new[] must be released with delete[]
return 0;
}
1. memset
This function is used here only to show you how fast zeroing your array could be, in comparison to other methods. It's the fastest solution, but only usable when you want to set every byte to the same value (especially useful with 0 and 0xFF in your case, I guess).
2. std::fill_n and for loop with 32-bit value
std::fill_n looks to be the slowest of the solutions, and it is even slightly slower than the for solution with 32-bit values.
3. for loop with 64-bit value ON 64-bit SYSTEM
I guess this is the solution you could go for, since it wins this competition. However, if your machine were 32-bit, then I would expect the results to be comparable to the loop with 32-bit values (depends on the compiler and processor), since processor will handle one 64-bit value as two 32-bit values.
Yes, you can use one 64-bit variable to put a value into two (or more) 32-bit (or smaller) consecutive elements. There are many ifs. Obviously you should be on a 64-bit platform, and you should know how your platform handles alignment.
Something like this:
uint32_t val = ...;
uint64_t val2 = val;
(val2 <<= 32) |= val;
for (uint32_t* p = ...; ...)
*(uint64_t*) p = val2;
You can use similar techniques with greater effect, if you use SSE.
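If the alignment and aliasing caveats are a concern, the same two-at-a-time idea can be written with memcpy, which compilers commonly turn into a single 64-bit store. This is only a sketch: fill_pairs and filled are hypothetical names, and whether it is actually faster than std::fill_n would need to be measured on the target machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: write `value` into pixels[0..count) two elements at a time.
inline void fill_pairs(std::uint32_t* pixels, std::size_t count, std::uint32_t value) {
    // Both halves hold the same value, so byte order does not matter here.
    const std::uint64_t pair = (static_cast<std::uint64_t>(value) << 32) | value;
    std::size_t i = 0;
    for (; i + 2 <= count; i += 2) {
        std::memcpy(pixels + i, &pair, sizeof pair);  // no alignment or aliasing issue
    }
    if (i < count) {
        pixels[i] = value;  // odd element left over at the end
    }
}

// Hypothetical convenience wrapper: build a buffer of `count` pixels set to `value`.
inline std::vector<std::uint32_t> filled(std::size_t count, std::uint32_t value) {
    std::vector<std::uint32_t> pixels(count);
    fill_pairs(pixels.data(), pixels.size(), value);
    return pixels;
}
```

For the two-pixel case in the question, fill_pairs(pointer, 2, value) has the same effect as *(pointer)=*(pointer+1)=value.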