C++ How to directly access memory - c++

Say I have manually allocated a large portion of memory in C++, say 10 MB.
Say for the heck of it I want to store a few bits around the middle of this region.
How would I get at the memory at that location?
The only way I know of accessing raw memory is using array notation.

And array notation works well for that, as the allocated memory can be seen as a large array.
// Set the byte in the middle to `123`
((char *) memory_ptr)[5 * 1024 * 1024] = 123;
I typecast to a char pointer in case the pointer is of another type. If it's already a char pointer then the typecast isn't needed.
If you only want to set a single bit, see the memory as a giant bit field with 80 million separate bits. To find the bit you want, say bit number 40000000, you must first find the byte it's in and then the bit. This is done with normal division (to find the char) and modulo (to find the bit):
int wanted_bit = 40000000;
int char_index = wanted_bit / 8; // 8 bits to a byte
int bit_number = wanted_bit % 8;
((char *) memory_ptr)[char_index] |= 1 << bit_number; // Set the bit

Array notation is just another way of writing pointers. You can use that, or use pointers directly like so:
char *the_memory_block = // your allocated block.
char b = *(the_memory_block + 10); // get the 11th byte, *-operator is a dereference.
*(the_memory_block + 20) = b; // set the 21st byte to b, same operator.
memcpy, memzero, memmove, memcmp and others may also be very useful, like this:
char *the_memory_block = // your allocated block.
memcpy(the_memory_block + 20, the_memory_block + 10, 1);
Of course this code is also the same:
char *the_memory_block = // your allocated block.
char b = the_memory_block[10];
the_memory_block[20] = b;
And so is this:
char *the_memory_block = // your allocated block.
memcpy(&the_memory_block[20], &the_memory_block[10], 1);
Also, one is not safer then the other, they are completely equivalent.

I think the array notation would be your answer... You can use the bitshift operators << and >> with AND and OR bitmasks to access specific bits.

You can use array notation, or you can use pointer arithmetic:
char* buffer = new char[1024 * 1024 * 10];
// copy 3 bytes to the middle of the memory region using pointer arithmetic
//
std::memcpy(buffer + (1024 * 1024 * 5), "XXX", 3);

C/C++, arrays are treated as pointers to their first elements.
So, an array name is nothing but an alias to its first element:
*pName is equivalent pName[0]
And then:
*(pName+1) == pName[1];
*(pName+2) == pName[2];
And so on. Parenthesis are used to avoid precedence issues. Never forget using them.
After compilation, both ways will behave the same.
I do prefer brackets notation for readability.

Related

Can we allocate any number of byte we want in c++ ? (allocate 6 bytes for example for a type)

I'm trying to make a program for chess and in most cases I would need only 4, 6 or 8(here I can use char) bytes.
So can I create a type that use 4 bytes, or an array with 4 bytes per cases ? It would lead to an significant gain in memory (and in efficiency ?).
Thanks all.
If you like, you can allocate a big buffer and manage the memory yourself.
If you do so, make sure not to use structs, or if you do, learn about "byte padding" or "byte alignment", since it will make each struct occupy more space than it "needs" to.
Once you have raw allocated memory (with malloc, std::vector or using an array, for example), you need to "tightly store" your data.
Here's some code:
char array[5000] = { 0 }; // or char* array = malloc(5000);
// Let's say you want to store several "char, int" structs
const int structSize = sizeof(char) + sizeof(int);
*(char*)&array[0 * structSize] = 'a';
*(int*)&array[0 * structSize + sizeof(char)] = 1;
*(char*)&array[1 * structSize] = 'b';
*(int*)&array[1 * structSize + sizeof(char)] = 2;
*(char*)&array[2 * structSize] = 'c';
*(int*)&array[2 * structSize + sizeof(char)] = 3;
WARNING: In the code above, if you are storing structs/classes rather than raw types (char, int...), make sure to use "placement new" on the memory location.
If, instead, you use a "struct { char, int }", its size will be 8 bytes (on most computers, not all), 3 of which are padding (unused memory).
Please note that, while you gain in memory usage, you lose in cpu efficiency (that is what the point of padding is).
Alternatively, see this article for alternatives that tell the compiler not to do the padding (using "#pragma pack").
#pragma pack(1)
struct nopadding
{
char first;
int second;
};
sizeof(nopadding); // <- is now equal to 5 instead of 8

C++ adding 4 bytes to pointer address

I have a question about pointers, and memory addresses:
Supposing I have the following code:
int * array = (int *) malloc(sizeof(int) * 4);
Now in array im storing a memory address, I know that c++ takes already care when adding +1 to this pointer it will add 4 bytes, but what If I want to add manually 4 bytes?
array + 0x004
If im correct this will lead to add 4*4 (16) bytes, but my Idea is to add manually those 4 bytes.
Why? Just playing around, i've tried this and I got a totally different result from what I expected, then i've researched and i've seen that c++ takes already care when you add +1 to a pointer (it sums 4 bytes in this case).
Any idea?
For a pointer p to a type T with value v, the expression p+n will (on most systems anyway) result in a pointer to the address v+n*sizeof(T). To get a fixed-byte offset to the pointer, you can first cast it to a character pointer, like this:
reinterpret_cast<T*>(reinterpret_cast<char*>(p) + n)
In c++, sizeof(char) is defined to be equal to 1.
Do note that accessing improperly aligned values can have large performance penalties.
Another thing to note is that, in general, casting pointers to different types is not allowed (called the strict aliasing rule), but an exception is explicitly made for casting any pointer type to char* and back.
The trick is convert the type of array into any pointer-type with a size of 1 Byte, or store the pointer value in an integer.
#include <stdint.h>
int* increment_1(int* ptr) {
//C-Style
return (int*)(((char*)ptr) + 4);
}
int* increment_2(int* ptr) {
//C++-Style
char* result = reinterpret_cast<char*>(ptr);
result += 4;
return reinterpret_cast<int*>(result);
}
int* increment_3(int* ptr) {
//Store in integer
intptr_t result = reinterpret_cast<intptr_t>(ptr);
result += 4;
return reinterpret_cast<int*>(result);
}
Consider that if you add an arbitrary number of bytes to an address of an object of type T, it no longer makes sense to use a pointer of type T, since there might not be an object of type T at the incremented memory address.
If you want to access a particular byte of an object, you can do so using a pointer to a char, unsigned char or std::byte. Such objects are the size of a byte, so incrementing behaves just as you would like. Furthermore, while rules of C++ disallow accessing objects using incompatible pointers, these three types are excempt of that rule and are allowed to access objects of any type.
So, given
int * array = ....
You can access the byte at index 4 like this:
auto ptr = reinterpret_cast<unsigned char*>(array);
auto byte_at_index_4 = ptr + 4;
array + 0x004
If im correct this will lead to add 4*4 (16) bytes
Assuming sizeof(int) happens to be 4, then yes. But size of int is not guaranteed to be 4.

C++ Vector data access

I've got an array of bytes, declared like so:
typedef unsigned char byte;
vector<byte> myBytes = {255, 0 , 76 ...} //individual bytes no larger in value than 255
The problem I have is I need to access the raw data of the vector (without any copying of course), but I need to assign an arbitrary amount of bits to any given pointer to an element.
In other words, I need to assign, say an unsigned int to a certain position in the vector.
So given the example above, I am looking to do something like below:
myBytes[0] = static_cast<unsigned int>(76535); //assign n-bit (here 32-bit) value to any index in the vector
So that the vector data would now look like:
{2, 247, 42, 1} //raw representation of a 32-bit int (76535)
Is this possible? I kind of need to use a vector and am just wondering whether the raw data can be accessed in this way, or does how the vector stores raw data make this impossible or worse - unsafe?
Thanks in advance!
EDIT
I didn't want to add complication, but I'm constructing variously sized integer as follows:
//**N_TYPES
u16& VMTypes::u8sto16(u8& first, u8& last) {
return *new u16((first << 8) | last & 0xffff);
}
u8* VMTypes::u16to8s(u16& orig) {
u8 first = (u8)orig;
u8 last = (u8)(orig >> 8);
return new u8[2]{ first, last };
}
What's terrible about this, is I'm not sure of the endianness of the numbers generated. But I know that I am constructing and destructing them the same everywhere (I'm writing a stack machine), so if I'm not mistaken, endianness is not effected with what I'm trying to do.
EDIT 2
I am constructing ints in the following horrible way:
u32 a = 76535;
u16* b = VMTypes::u32to16s(a);
u8 aa[4] = { VMTypes::u16to8s(b[0])[0], VMTypes::u16to8s(b[0])[1], VMTypes::u16to8s(b[1])[0], VMTypes::u16to8s(b[1])[1] };
Could this then work?:
memcpy(&_stack[0], aa, sizeof(u32));
Yes, it is possible. You take the starting address by &myVector[n] and memcpy your int to that location. Make sure that you stay in the bounds of your vector.
The other way around works too. Take the location and memcpy out of it to your int.
As suggested: by using memcpy you will copy the byte representation of your integer into the vector. That byte representation or byte order may be different from your expectation. Keywords are big and little endian.
As knivil says, memcpy will work if you know the endianess of your system. However, if you want to be safe, you can do this with bitwise arithmetic:
unsigned int myInt = 76535;
const int ratio = sizeof(int) / sizeof(byte);
for(int b = 0; b < ratio; b++)
{
myBytes[b] = byte(myInt >> (8*sizeof(byte)*(ratio - b)));
}
The int can be read out of the vector using a similar pattern, if you want me to show you how let me know.

Performing arithmetic on a void pointer in C++

I am trying to perform arithmetic on an address. I need to make sure that it is on a 4 byte word boundary. This requires me to get an unsigned integer representation of the pointer to perform some math on. I have done this:
//memStartAddr is of type void* (from an external API function)
data32 tempAddr = reinterpret_cast<data32>(memStartAddr);
//adjust pointer to the next word boundary if needed
tempAddr = wordAlign(tempAddr);
//memStartAddress is a class variable of type unsigned char*
memStartAddress = reinterpret_cast<data8*>(tempAddr);
//calculate bytes lost due to above adjustment
data32 delta = (tempAddr - reinterpret_cast<data32>(memStartAddress));
//Take advantage of integer arithmetic to ensure usable size is a multiple
//of 4 bytes. Remainders are lost. Shrinks usable size if necessary.
memSize = ((size - delta) / 4) * 4;
This all works in my tests, however, using reinterpret_cast is considered a forbidden practice. Is there another way to do this? Or is an exception to the rule warranted here?
You're looking for memStartAddress = std::align(4, objSize, memStartAddr, objSize+3);. The standard function is a bit more flexible than you need, so you need to tell it that you want at most a +3 change in address.
a tricky approach might be to have pointer to initial address
then on the pointer perform + 1 --> this will automatically result in the next address according to the size in bytes of the variable type it's pointing to.
Note: make sure you perform the +1 on the pointer not on the address (int* initialAddress --> perform the operation on initialAddress not on *initialAddress)
then you can perform a subtraction between the two and you'll have the number of bytes.
I suggest you avoid division.
Try this:
unsigned int pointer_value = (unsigned int) pointer;
if ((pointer_value & 0x3U) == 0U)
{
// Pointer is on 4 byte boundary.
}
Some compilers may only optimize division by 4 and multiplication by 4 when the optimization levels are set on high. The embedded compiler I'm using will call the division function when dividing by 4 rather than right shifting, unless the optimization setting is cranked up to high.

Can we have operations between int* and unsigned int?

If I declare
int x = 5 ;
int* p = &x;
unsigned int y = 10 ;
cout << p+y ;
Is this a valid thing to do in C++, and if not, why?
It has no practical use, but is it possible?
The math is valid; the resulting pointer isn't.
When you say ptr + i (where ptr is an int*), that evaluates to the address of an int that's i * sizeof(int) bytes past ptr. In this case, since your pointer points to a single int rather than an array of them, you have no idea (and C++ doesn't say) what's at p+10.
If, however, you had something like
int ii[20] = { 0 };
int *p = ii;
unsigned int y = 10;
cout << p + y;
Then you'd have a pointer you could actually use, because it still points to some location within the array it originally pointed into.
What you are doing in your code snippet is not converting unsigned int to pointer. Instead you are incrementing a pointer by an integer offset, which is a perfectly valid thing to do. When you access the index of an array, you basically take the pointer to the first element and increase it by the integer index value. The result of this operation is another pointer.
If p is a pointer/array, the following two lines are equivalent and valid (supposing the pointed-to-array is large enough)
p[5] = 1;
*(p + 5) = 1;
To convert unsigned int to pointer, you must use a cast
unsigned int i = 5;
char *p = reinterpret_cast<char *>(i);
However this is dangerous. How do you know 5 is a valid address?
A pointer is represented in memory as an unsigned integer type, the address. You CAN store a pointer in an integer. However you must be careful that the integer data type is large enough to hold all the bits in a pointer. If unsigned int is 32-bits and pointers are 64-bits, some of the address information will be lost.
C++11 introduces a new type uintptr_t which is guaranteed to be big enough to hold a pointer. Thus it is safe to cast a pointer to uintptr_t and back again.
It is very rare (should be never in run-of-the-mill programming) that you need to store pointers in integers.
However, modifying pointers by integer offsets is totally valid and common.
Is this a valid thing to do in c++, and if not why?
Yes. cout << p+y; is valid as you can see trying to compile it. Actually p+y is so valid that *(p+y) can be translated to p[y] which is used in C-style arrays (not that I'm suggesting its use in C++).
Valid doesn't mean it actually make sense or that the resulting pointer is valid. Since p points to an int the resulting pointer will be an offset of sizeof(int) * 10 from the location of x. And you are not certain about what's in there.
A variable of type int is a variable capable of containing an integer value. A variable of type int* is a pointer to a variable copable of containing an integer value.
Every pointer type has the same size and contains the same stuff: A memory address, which the size is 4 bytes for 32-bit arquitectures and 8 bytes for 64-bit arquitectures. What distinguish them is the type of the variable they are poiting to.
Pointers are useful to address buffers and structures allocated dynamically at run time or any sort of variable that is to be used but is stored somewhere else and you have to tell where.
Arithmetic operations with pointers are possible, but they won't do what you think. For instance, summing + 1 to a pointer of type int will increase its value by sizeof(int), not by literally 1, because its a pointer, and the logic here is that you want the next object of this array.
For instance:
int a[] = { 10, 20, 30, 40 };
int *b = a;
printf("%d\n", *b);
b = b + 1;
printf("%d\n", *b);
It will output:
10
20
Because b is pointing to the integer value 10, and when you sum 1 to it, or any variable containing an integer, its then poiting to the next value, 20.
If you want to perform operations with the variable stored at b, you can use:
*b = *b + 3;
Now b is the same pointer, the address has not changed. But the array 10, 20, 30, 40 now contains the values 13, 20, 30, 40, because you increased the element b was poiting to by 3.