How to use different pointer arithmetic semantics - c++

For academic purposes, I'm trying to purposefully overwrite data allocated on the free store. Here's what I've got working so far:
//main
int* val = new int(-1);
print_bits(*val);
short* i = new(val) short(0);
print_bits(*val);
std::cout << "\nval loc: " << val <<
"\ni loc: " << i << std::endl;
delete val;
//end main
As expected, this produces something similar to:
11111111111111111111111111111111
00000000000000001111111111111111
val loc: 0x27d5580
i loc: 0x27d5580
My next intention was to override the second byte in val, so I simply change short allocation to:
short* i = new(val+1) short(0);
However, after making this change, I got output something like:
11111111111111111111111111111111
11111111111111111111111111111111
val loc: 0x27d5580
i loc: 0x27d5584
As you can see, val+1 moved the pointer a full sizeof(int) bytes forward rather than just one byte forward. I understand why this happens (and am thankful for it). However, if my intention is to move only a single byte forward in memory, how can I accomplish that?
EDIT:
One solution I've come up with is to do something like
char* ch = reinterpret_cast<char*>(val);
short* i = new(ch+1) short(0);
I'm actually not entirely sure yet whether this will work, since char*'s have a habit of being interpreted as C-style strings.
Solution:
The simplest solution that has been given is to write
short* i = new(reinterpret_cast<char*>(val)+1) short(0);

Just write
short* i = new((char*)val + sizeof(short)) short(0);
or
short* i = new(reinterpret_cast<char*>(val) + sizeof(short)) short(0);

A short won't be a single byte, and storing one at a single-byte offset potentially violates alignment requirements and potentially stores past the end of the original allocation (depending on sizeof(short) vs sizeof(int)).
To "move" by single bytes, you want to use a char * type. Simply:
*(reinterpret_cast<char*>(val)+1) = 0;
... should work.

Related

Reinterpret casted value varies by compiler

For the same program:
const char* s = "abcd";
auto x1 = reinterpret_cast<const int64_t*>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << *x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcd"
In gcc5(link): 139639660962401
In gcc8(link): 1684234849
Why does the value vary according to different compiler versions?
What is then a compiler-safe way to move from const char* to int64_t and backward (just like in this problem: not for actual integer strings, but ones with other chars as well)?
Why does the value vary according to different compiler versions?
Behaviour is undefined.
What is then a compiler safe way to move from const char* to int64_t and backward
It is somewhat unclear what you mean by "move from const char* to int64_t". Based on the example, I assume you mean to create a mapping from a character sequence (of no greater length than fits) into a 64 bit integer in a way that can be converted back using another process - possibly compiled by another (version of) compiler.
First, create an int64_t object, initialised to zero:
int64_t i = 0;
Get length of the string
auto len = strlen(s);
Check that it fits
assert(len < sizeof i);
Copy the bytes of the character sequence onto the integer
memcpy(&i, s, len);
(As long as the integer type doesn't have trap representations) The behaviour is well defined, and the generated integer will be the same across compiler versions as long as the CPU endianness (and negative number representation) remains the same.
Reading the character string back doesn't require copying because char is exceptionally allowed to alias all other types:
auto back = reinterpret_cast<char*>(&i);
Note the qualification in the last section. This method does not work if the integer is passed (across the network, for example) to a process running on a CPU with different endianness. Portability can be achieved as well, by copying octets to fixed positions of significance using bit shifting and masking instead.
When you dereference the int64_t pointer, it reads past the end of the memory allocated for the string you cast from. If you change the string to at least 8 bytes long (including the null terminator), the integer value becomes stable.
const char* s = "abcdefg"; // plus null terminator
auto x1 = reinterpret_cast<const int64_t*>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << *x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcdefg"
If you wanted to store the pointer in an integer instead, you should use intptr_t and leave out the * like:
const char* s = "abcd";
auto x1 = reinterpret_cast<intptr_t>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcd"
Based on what RemyLebeau pointed out in the comments of your post,
uint64_t five_byte_mask = 0xFFFFFFFFFF;
std::cout << (*x1 & five_byte_mask) << std::endl;
should be a reasonable way to get the same value on a little-endian machine with whatever compiler. It may be UB by one specification or another, but from a compiler's perspective, you're dereferencing eight bytes at a valid address, five of which you have initialized, and masking off the remaining uninitialized/junk bytes.

C++ converting string containing non human readable data to 200 double

I have a string whose length is 1600 and I know that it contains 200 doubles. When I print out the string I get the following: Y���Vz'#��y'#��!U�}'#�-...
I would like to convert this string to a vector containing the 200 doubles.
Here is the code I tried (blobString is a string 1600 characters long):
string first_eight = blobString.substr(0, sizeof(double)); // the first 8 chars of the string, which should represent the first double
double double_value1;
memcpy(&double_value1, &first_eight, sizeof(double)); // First thing I tried
double* double_value2 = (double*)first_eight.c_str(); // Second thing I tried
cout << double_value1 << endl;
cout << double_value2 << endl;
This outputs the following:
6.95285e-310
0x7ffd9b93e320
--- Edit solution---
The second method works; all I had to do was look at where double_value2 was pointing.
cout << *double_value2 << endl;
Here's an example that might get you closer to what you need. Bear in mind that unless the numbers in your blob are in the exact format that your particular C++ compiler expects, this isn't going to work like you expect. In my example I'm building up the buffer of doubles myself.
Let's start with our array of doubles.
double doubles[] = { 0.1, 5.0, 0.7, 8.6 };
Now I'll build an std::string that should look like your blob. Notice that I can't simply initialize a string with a (char *) that points to the base of my list of doubles, as it will stop when it hits the first zero byte!
std::string double_buf_str;
double_buf_str.append((char *)doubles, 4 * sizeof(double));
// A quick sanity check, should be 32
std::cout << "Length of double_buf_str "
<< double_buf_str.length()
<< std::endl;
Now I'll reinterpret the c_str() pointer as a (double *) and iterate through the four doubles.
for (auto i = 0; i < 4; i++) {
std::cout << ((double*)double_buf_str.c_str())[i] << std::endl;
}
Depending on your circumstances you might consider using a std::vector<uint8_t> for your blob, instead of an std::string. C++11 gives you a data() function that would be the equivalent of c_str() here. Turning your blob directly into a vector of doubles would give you something even easier to work with--but to get there you'd potentially have to get dirty with a resize followed by a memcpy directly into the internal array.
I'll give you an example for completeness. Note that this is of course not how you would normally initialize a vector of doubles...I'm imagining that my double_blob is just a pointer to a blob containing a known number of doubles in the correct format.
const int count = 200; // 200 doubles incoming
std::vector<double> double_vec;
double_vec.resize(count);
memcpy(double_vec.data(), double_blob, sizeof(double) * count);
for (double& d : double_vec) {
std::cout << d << std::endl;
}
@Mooning Duck brought up the great point that the result of c_str() is not necessarily aligned to an appropriate boundary, which is another good reason not to use std::string as a general-purpose blob (or at least not to interpret its internals until they are copied somewhere that guarantees a valid alignment for the type you are interested in). The impact of trying to read a double from a non-aligned location in memory varies by architecture, giving you a portability concern. On x86-based machines there will only be a performance impact, AFAIK, as the hardware reads across alignment boundaries and assembles the double correctly (you can test this on an x86 machine by writing and then reading back a double at successive 1-byte offsets into a buffer; it'll just work). On other architectures you'll get a fault.
The std::vector<double> solution will not suffer from this issue due to guarantees about the alignment of newed memory built into the standard.

Unknown behavior of uint64_t datatype

I am trying to store a very big number in a uint64_t like:
int main(int argc, char** argv) {
uint64_t ml = sizeof(void*)*(1<<63);
cout << "ml=" << ml << "\n";
const char* r;
const char* mR=r+ml;
return 0;
}
But I do not know why I am getting the output as 0, despite storing it in a uint64_t.
EDIT: char* mR is my memory buffer, and I can increase my memory buffer to at most ml. I want to make use of a 64GB RAM machine. So, can you suggest how much I should increment mR by, i.e. what value I should set ml to, if I want to use all the available RAM?
Try
uint64_t ml = ((uint64_t)1)<<63;
or just
uint64_t ml = 0x8000000000000000;
Just 1 << 63 uses int, and if the shift amount is too big for the type, it is undefined behavior. In your case, it may result in 0.
Please note that if you multiply 0x8000000000000000 by sizeof(void*), you'll likely get overflow too.
If you want to allocate 64G of memory, that would be:
char* buffer = new char[64ULL * 1024 * 1024 * 1024];
or simply:
char* buffer = new char[1ULL << 36];
Note that 64G is 2^36 bytes, which is far, far less than the 2^63 number that you're trying to use. Although, typically when you use that much memory, it's because your program organically uses it through various operations... not by just allocating it in one large chunk.
Just use:
uint64_t ml = sizeof(void*) * (1ULL << 63);
Because, as AlexD already said, 1 << 63 uses int, and shifting a 32-bit int by 63 is undefined behavior; here it happened to evaluate to 0.
Even after you correct the shift to (uint64_t)1 << 63, if sizeof(void*) is any even number (and it assuredly is), then your product will be divisible by 2^64, and thus be zero when stored in a uint64_t.

4 chars to int in c++

I have to read 10 bytes from a file and the last 4 bytes are an unsigned integer. But what I've got is an 11-byte-long char array / pointer. How do I convert the last 4 bytes (without the zero-terminating character at the end) to an unsigned integer?
//pesudo code
char *p = readBytesFromFile();
unsigned int myInt = 0;
for( int i = 6; i < 10; i++ )
myInt += (int)p[i];
Is that correct? Doesn't seem correct to me.
The following code might work:
myInt = *(reinterpret_cast<unsigned int*>(p + 6));
iff:
There are no alignment problems (e.g. in a GPU memory space this is very likely to blow up if certain guarantees aren't provided).
You can guarantee that the system endianness is the same as that used to store the data.
You can be sure that sizeof(int) == 4; this is not guaranteed everywhere.
If not, as Dietmar suggested, you should loop over your data (forward or reverse according to the endianness) and do something like
myInt = myInt << 8 | static_cast<unsigned char>(p[i]);
this is alignment-safe (it should be on every system). Still pay attention to points 1 and 3.
I agree with the previous answer, but I just want to add that this solution may not work 100% if the file was created with a different endianness.
I do not want to confuse you with extra information, but keep in mind that endianness may cause you problems when you cast directly from a file.
Here's a tutorial on endianness : http://www.codeproject.com/Articles/4804/Basic-concepts-on-Endianness
Try myInt = *(reinterpret_cast<unsigned int*>(p + 6));.
This takes the address of the 6th character, reinterprets as a pointer to an unsigned int, and then returns the (unsigned int) value it points to.
Maybe using a union is an option? I think this might work;
UPDATE: Yes, it works in practice, though note that reading a union member other than the one last written is technically undefined behaviour in C++ (most compilers support it as an extension).
union intc32 {
char c[4];
int v;
};
int charsToInt(char a, char b, char c, char d) {
intc32 r = { { a, b, c, d } };
return r.v;
}

C/C++: Bitwise operators on dynamically allocated memory

In C/C++, is there an easy way to apply bitwise operators (specifically left/right shifts) to dynamically allocated memory?
For example, let's say I did this:
unsigned char * bytes=new unsigned char[3];
bytes[0]=1;
bytes[1]=1;
bytes[2]=1;
I would like a way to do this:
bytes>>=2;
(then the 'bytes' would have the following values):
bytes[0]==0
bytes[1]==64
bytes[2]==64
Why the values should be that way:
After allocation, the bytes look like this:
[00000001][00000001][00000001]
But I'm looking to treat the bytes as one long string of bits, like this:
[000000010000000100000001]
A right shift by two would cause the bits to look like this:
[000000000100000001000000]
Which finally looks like this when separated back into the 3 bytes (thus the 0, 64, 64):
[00000000][01000000][01000000]
Any ideas? Should I maybe make a struct/class and overload the appropriate operators? Edit: If so, any tips on how to proceed? Note: I'm looking for a way to implement this myself (with some guidance) as a learning experience.
I'm going to assume you want bits carried from one byte to the next, as John Knoeller suggests.
The requirements here are insufficient: you need to specify the order of the bits relative to the order of the bytes. When the least significant bit falls out of one byte, does it go to the next higher or next lower byte?
What you are describing, though, used to be very common for graphics programming. You have basically described a monochrome bitmap horizontal scrolling algorithm.
Assuming that "right" means higher addresses but less significant bits (i.e. matching the normal writing conventions for both), a single-bit shift will be something like...
void scroll_right (unsigned char* p_Array, int p_Size)
{
unsigned char orig_l = 0;
unsigned char orig_r;
unsigned char* dest = p_Array;
while (p_Size > 0)
{
p_Size--;
orig_r = *p_Array++;
*dest++ = (orig_l << 7) + (orig_r >> 1);
orig_l = orig_r;
}
}
Adapting the code for variable shift sizes shouldn't be a big problem. There are obvious opportunities for optimisation (e.g. doing 2, 4 or 8 bytes at a time), but I'll leave that to you.
To shift left, though, you should use a separate loop which should start at the highest address and work downwards.
If you want to expand "on demand", note that the orig_l variable contains the last byte above. To check for an overflow, check if (orig_l << 7) is non-zero. If your bytes are in an std::vector, inserting at either end should be no problem.
EDIT I should have said - optimising to handle 2, 4 or 8 bytes at a time will create alignment issues. When reading 2-byte words from an unaligned char array, for instance, it's best to do the odd byte read first so that later word reads are all at even addresses up until the end of the loop.
On x86 this isn't necessary, but it is a lot faster. On some processors it's necessary. Just do a switch based on the base (address & 1), (address & 3) or (address & 7) to handle the first few bytes at the start, before the loop. You also need to special case the trailing bytes after the main loop.
Decouple the allocation from the accessor/mutators
Next, see if a standard container like bitset can do the job for you
Otherwise check out boost::dynamic_bitset
If all fails, roll your own class
Rough example:
typedef unsigned char byte;
/* Extract `bitcount` bits of `value`, starting at bit `startbit` (0 = MSB). */
byte extract(byte value, int startbit, int bitcount)
{
byte result;
result = (byte)(value << startbit);
result = (byte)(result >> (CHAR_BIT - bitcount));
return result;
}
byte *right_shift(byte *bytes, size_t nbytes, size_t n) {
byte rollover = 0;
for (size_t i = 0; i < nbytes; ++i) {
/* save the low n bits before they are shifted out */
byte next_rollover = extract(bytes[ i ], CHAR_BIT - n, n);
bytes[ i ] = (bytes[ i ] >> n) | (rollover << (CHAR_BIT - n));
rollover = next_rollover;
}
return &bytes[ 0 ];
}
Here's how I would do it for two bytes:
unsigned int rollover = byte[0] & 0x3;
byte[0] >>= 2;
byte[1] = byte[1] >> 2 | (rollover << 6);
From there, you can generalize this into a loop for n bytes. For flexibility, you will want to compute the magic numbers (0x3 and 6) from the shift amount rather than hardcode them.
I'd look into something similar to this:
#define number_of_bytes 3
template<size_t num_bytes>
union MyUnion
{
char bytes[num_bytes];
__int64 ints[num_bytes / sizeof(__int64) + 1];
};
int main()
{
MyUnion<number_of_bytes> mu;
mu.bytes[0] = 1;
mu.bytes[1] = 1;
mu.bytes[2] = 1;
mu.ints[0] >>= 2;
}
Just play with it. You'll get the idea I believe.
Operator overloading is syntactic sugar. It's really just a way of calling a function and passing your byte array without having it look like you are calling a function.
So I would start by writing this function
unsigned char * ShiftBytes(unsigned char * bytes, size_t count_of_bytes, int shift);
Then if you want to wrap this up in an operator overload in order to make it easier to use or because you just prefer that syntax, you can do that as well. Or you can just call the function.