Storing address of buffer in an unsigned integer - c++

I have a memory buffer whose address I want to store in an unsigned integer value.
uint8_t* _buff = new uint8_t[1024];
uint64_t* _base_addr = (uint64_t *)_buff;
I want the address of the location pointed to by _buff or _base_addr (it is the same location either way) to be stored in, say, a uint32_t value.
So that when I read the value of the integer it gives me the address.
How can this be done ?

You cannot reliably store an address in, say, a uint32_t variable, as the address might not fit: on 64-bit systems pointers require 64 bits of storage. Instead of fixing the size, use uintptr_t from C99 (<stdint.h>) or C++11 (<cstdint>).
To store the address in such an integer variable, use
uintptr_t variable = (uintptr_t)pointer;
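A minimal C++ sketch of the same idea, using reinterpret_cast instead of the C-style cast and converting back again. The helper names address_of, pointer_from and round_trips are made up for illustration, and the sketch assumes the platform provides uintptr_t (the type is optional in the standard, but mainstream implementations have it).

```cpp
#include <cstdint>

// Hypothetical helper: turn a buffer pointer into an integer wide enough to hold it.
std::uintptr_t address_of(const std::uint8_t* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Hypothetical helper: turn the stored integer back into a pointer.
std::uint8_t* pointer_from(std::uintptr_t addr) {
    return reinterpret_cast<std::uint8_t*>(addr);
}

// Converting to uintptr_t and back to the same pointer type yields the original pointer.
bool round_trips(std::uint8_t* p) {
    return pointer_from(address_of(p)) == p;
}
```

With the buffer from the question, address_of(_buff) would give the integer value to store, and pointer_from would recover _buff from it.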

Just cast it: uint32_t addr = (uint32_t)_buff;
You should have a very good reason for doing it though, for 2 reasons:
It's not portable and might even be wrong: the size of an address differs between systems; normally (but not always) it will be either 32 or 64 bits. On a 64-bit system a uint32_t cannot hold the full address.
It harms the readability of your code. Pointers exist for precisely this reason - to store (and manipulate) addresses.
You might want to store an address in an integer when you need to manipulate HW devices mapped into memory. In this case you need to have a very good idea of what you're doing.

Related

Accessing memory mapped register

Assume there is a memory mapped device at address 0x1ffff670. The device register has only 8-bits. I need to get the value in that register and increment by one and write back.
Following is my approach to do that,
In the memory I think this is how the scenario looks like.
void increment_reg(){
int c;//to save the value read from the register
char *control_register_ptr= (char*) 0x1ffff670;//memory mapped address. using char because it is 8 bits
c=(int) *control_register_ptr;// reading the register and save that to c as an integer
c++;//increment by one
*control_register_ptr=c;//write the new bit pattern to the control register
}
Is this approach correct? Many thanks.
Your approach is almost correct. The only missing part - as pointed out in the comments on the question - is adding a volatile to the pointer type like so:
volatile unsigned char * control_register_ptr = ...
I would also make it unsigned char, since that is usually a better fit, but that's basically not that much different (the only meaningful difference would be when shifting the value down.)
The volatile keyword signals to the compiler the fact that the value at that address might change from outside the program (i.e. by code that the compiler doesn't see and know about.) This will make the compiler more conservative in optimizing loads and stores away, for example.
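Putting the two suggestions together, the function might look roughly like the sketch below. Passing the register address as a parameter is an assumption made here so that the code can also be exercised against ordinary memory; on the real target the caller would pass the address from the question.

```cpp
#include <cstdint>

// Read-modify-write of an 8-bit register through a volatile pointer.
// Because the pointee is volatile, the compiler must perform the read and the write.
void increment_reg(volatile std::uint8_t* reg) {
    std::uint8_t value = *reg;  // one read of the register
    ++value;                    // unsigned 8-bit arithmetic: 0xFF wraps to 0x00
    *reg = value;               // one write of the new bit pattern
}

// On the hardware from the question the call would be (only valid on that device):
// increment_reg(reinterpret_cast<volatile std::uint8_t*>(0x1ffff670));
```

Note that this read-modify-write is not atomic; if an interrupt handler or the device itself can change the register between the read and the write, extra care is needed.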

does the size of char * (character pointer) in C/C++ vary? - use for database column fixed size

per the following code, I get the size of a character pointer is 8 bytes. Yet this site has a size of 1 byte for the char pointer.
#include <stdio.h>
int main(void ){
char *a = "saher asd asd asldasdas;daksd ahwal";
printf("\nSize = %zu \n", sizeof(a));
return 0;
}
Is this always the case? I am writing a connector for a simple database I am implementing and want to read TEXT field of mysql into my database. Since TEXT has variable size, I was wondering if my column Type/metadata can have a fixed size of 8 bytes where I store the pointer in memory to the string (char *)?
per the following code, I get the size of a character pointer is 8 bytes. Yet this site has a size of 1 byte for the char pointer.
It's implementation-defined. It's usually 8 on a 64-bit Intel system and 4 on a 32-bit Intel system. Don't rely on it being any particular size.
I am writing a connector for a simple database I am implementing and want to read TEXT field of mysql into my database. Since TEXT has variable size, I was wondering if my column can have a fixed size of 8 bytes where I store the pointer in memory to the string (char *)?
It makes no sense at all to store pointers into memory in a database. A database is for persistent data. On the other hand, data stored in memory is liable to disappear whenever a process exits (or the system is restarted).
No, it is not. The size of a pointer depends on the CPU architecture and the ABI. Some architectures even have different sizes depending on the "type" of the pointer. On x86_64, pointers occupy 64 bits, but common implementations use only the low 48 bits for virtual addresses; the upper bits must be copies of bit 47. One could therefore use pointer packing to serialize/deserialize pointers into 48-bit chunks, at the cost of portability.
A variable can be different sizes based on the computer that you are using. This is causing the discrepancy between your results and the results you see online.
However, the variable will always be the same size on the same machine.
On a given platform, object pointers usually all have the same size, regardless of the pointed-to type (char, string, object, etc.).
On a PC with a 64-bit operating system (and a compiler that targets 64-bit), a pointer is 8 bytes (64-bit address space).
Other platforms may use 4 bytes, 2 bytes, or even 1 byte (on some 8-bit microcontrollers).
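The "1 byte" figure seen online is most likely the size of char, not of char *. A small sketch of the difference, where pointer_size_of is a hypothetical helper name:

```cpp
#include <cstddef>

// sizeof(char) is 1 by definition; sizeof(char*) is whatever the platform
// uses to represent an address (commonly 4 or 8).
static_assert(sizeof(char) == 1, "guaranteed by the language");

// sizeof applied to a pointer measures the pointer itself,
// not the length of the string it points to.
std::size_t pointer_size_of(const char* text) {
    return sizeof(text);
}
```

Calling pointer_size_of with a short string and with a long one gives the same number on a given platform, which is why the result says nothing about how much space the TEXT value itself needs.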

Storing hexadecimal addresses in a file

I have a pintool application which stores the memory addresses accessed by an application in a file. These addresses are in hexadecimal form. If I write these addresses as strings, it will take a huge amount of storage (nearly 300GB). Writing such a large file will also take a large amount of time. So I am thinking of an alternative way to reduce the amount of storage used.
Each character of hexadecimal address represent 4 bits and each ASCII character is of 8 bits. So I am thinking of representing two hexadecimal characters by one ASCII character.
For example :
if my hexadecimal address is 0x26234B
then corresponding converted ASCII address will be &#K (0x is ignored as I know all address will be hexadecimal).
I want to know whether there is any other, more efficient method for doing this that takes less storage.
NOTE : I am working in c++
This is a good start. If you really want to go further, you can consider compressing the data using something like a zip library or Huffman encoding.
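The packing described in the question (two hexadecimal digits per output byte) could be sketched as follows. The function names hex_value and pack_hex are illustrative, and the digit conversion assumes an ASCII-compatible character set.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Value of one hexadecimal digit; throws on any other character.
static std::uint8_t hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Pack a hex string (without the "0x" prefix) into bytes, two digits per byte.
// An odd-length input is treated as if it had a leading zero.
std::vector<std::uint8_t> pack_hex(const std::string& hex) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    if (hex.size() % 2 != 0) {
        out.push_back(hex_value(hex[0]));
        i = 1;
    }
    for (; i < hex.size(); i += 2) {
        out.push_back(static_cast<std::uint8_t>(
            (hex_value(hex[i]) << 4) | hex_value(hex[i + 1])));
    }
    return out;
}
```

For the address from the question, pack_hex("26234B") yields the bytes 0x26 0x23 0x4B, which are the characters &, # and K. The output is binary data, so the file should be opened in binary mode and each record needs a fixed width or a length prefix to stay parseable.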
Assuming your addresses are 64-bit pointers, and that such a representation is sensible for your platform, you can just store them as 64-bit ints. For example, the address 0x1234567890abcdef could be stored as the eight bytes:
12 34 56 78 90 ab cd ef
(your pointer, stored in 8 bytes.)
or the same, but backwards, depending on what endianness you choose. Specifically, you should read this.
We can even do this somewhat platform-independently: uintptr_t is an unsigned integer type wide enough to hold a pointer (assuming one exists, which it usually does, but it's not a sure thing), and sizeof(our_pointer) gives us the size in bytes of a pointer. We can arrive at the above bytes with:
Convert the pointer to an integer representation (i.e., 0x0026234b)
Shift the bytes around to pick out the one we want.
Stick it somewhere.
In code:
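unsigned char buffer[sizeof(YourPointerType)];
// Big-endian: the most significant byte goes first.
for(unsigned int i = 0; i < sizeof(YourPointerType); ++i) {
    buffer[i] = (
        // Shift amounts are in bits, so multiply the byte index by 8.
        (reinterpret_cast<uintptr_t>(your_pointer) >> (8 * (sizeof(YourPointerType) - i - 1)))
        & 0xff
    );
}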
Some notes:
That'll do a >> 0 on the last loop iteration. A shift by zero is well-defined and leaves the value unchanged, so no special case is needed.
This will write out pointers of the size of your platform, and requires that they can be converted sensibly to integers. (I think uintptr_t won't exist if this isn't the case.) It won't do the same thing on 64- as it will on 32-bit platforms, as they have different pointer sizes. (Or any other pointer-sized platform you run across.)
A program's pointers aren't valid once the program dies, and might not even remain valid when the program is still running. (If the pointer points to memory that the program decides to free, then the pointer is invalid.)
There's likely a library that'll do this for you. (struct, in Python, does this.)
The above is a big-endian encoder. Alternatively, you can write out little endian — the Wikipedia article details the difference.
Last, you can just cast a pointer to the pointer to an unsigned char *, and write that. (I.e., dump the actual memory of the pointer to a file.) That's way more platform dependent though.
If you need to save even more space, I'd run it through gzip.
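As a concrete instance of the big-endian encoding discussed above, here is a sketch that fixes the width at 64 bits so the file format does not depend on the platform's pointer size. The names encode_be and decode_be are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Encode a 64-bit address as 8 bytes, most significant byte first.
std::array<std::uint8_t, 8> encode_be(std::uint64_t address) {
    std::array<std::uint8_t, 8> bytes{};
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Byte i sits (7 - i) bytes above the least significant byte.
        const unsigned shift = 8 * static_cast<unsigned>(bytes.size() - 1 - i);
        bytes[i] = static_cast<std::uint8_t>((address >> shift) & 0xff);
    }
    return bytes;
}

// Rebuild the address from its big-endian byte sequence.
std::uint64_t decode_be(const std::array<std::uint8_t, 8>& bytes) {
    std::uint64_t address = 0;
    for (std::uint8_t b : bytes) {
        address = (address << 8) | b;
    }
    return address;
}
```

Encoding 0x1234567890abcdef gives the bytes 12 34 56 78 90 ab cd ef, and decoding them returns the original value. Each record is 8 bytes instead of up to 16 hexadecimal characters plus a separator.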

Shared Memory Interface between Windows 64 bits and 32 bits

I need to write code in Windows 7 (64 bits) that executes a 32-bits program that has a Shared Memory Interface (SMI). More precisely, the program I am coding writes into the SMI and the 32-bits program reads from this SMI.
The first problem that I have is that I don't have access to the source code of the 32-bit program, a problem that can't be solved. The second problem is that the SMI stores the address of the information that is written. This pointer is stored as a based pointer using the following code:
gpSharedBlock->m_pData[uiDataPointer] = (char __based(gpSharedBlock)*)pData;
Where pData is a pointer to the data we are writing, and gpSharedBlock->m_pData[i] points to the i^th element stored.
Probably from here you have already noticed the problem; a pointer in W32 is 4 bytes while a pointer in W64 is 8 bytes. Then, since the value stored is a 64 bit pointer, the value finally read by the 32-bits program is not the desired one.
My question is: is there a way to do a translation of the 64-bit address to a 32-bit address such that the program that is running reads the correct information?
I have read about WOW64, and I suppose that the W32 program is running under it, but I don't know how to take advantage of that. Any ideas?
A __based pointer is a numeric offset from another pointer. It is effectively a virtual pointer interpreted at runtime.
A pointer is 8 bytes in 64-bit, so to be compatible with the 32-bit program, you will have to declare the pointer members of the SharedBlock type in your 64-bit code to use 4-byte (32-bit) integers instead of pointers, e.g.:
struct sSharedBlock
{
int32_t m_pData[...];
};
pData is __based on gpSharedBlock, so the value of pData is a relative offset from the value of gpSharedBlock. Use that fact to determine the actual byte offset of your data block relative to the gpSharedBlock memory block, and then store that offset value into m_pData[] as an integer. That is what the SMI memory block is actually expecting anyway - an offset, not a real pointer. The __based keyword is just a fancy way of handling offsets using pointers without doing the offset calculations manually in code.
The original code is effectively the same as the following without needing the __based keyword:
gpSharedBlock->m_pData[uiDataPointer] = (int32_t) ( ((char*)pData) - ((char*)gpSharedBlock) );
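The offset calculation can be tried out in isolation with a sketch like the one below. The SharedBlock layout here is a hypothetical stand-in (the real layout is dictated by the 32-bit program), and to_offset and from_offset are illustrative helper names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the shared block; the real layout is whatever
// the 32-bit program expects.
struct SharedBlock {
    std::int32_t m_pData[4];  // offsets from the start of the block, not pointers
    char payload[64];         // data area inside the same block
};

// Byte offset of `data` from the start of the block, as a 32-bit value.
// Assumes `data` lies inside the block, so the offset fits in 32 bits.
std::int32_t to_offset(const SharedBlock* block, const void* data) {
    return static_cast<std::int32_t>(
        static_cast<const char*>(data) - reinterpret_cast<const char*>(block));
}

// What a reader does with the stored value: add it to its own base address.
const char* from_offset(const SharedBlock* block, std::int32_t offset) {
    return reinterpret_cast<const char*>(block) + offset;
}
```

Because only the offset is stored, the 32-bit and 64-bit processes can map the block at different base addresses and still agree on where the data lives.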

Why the size of a pointer is 4bytes in C++

On a 32-bit machine, why is the size of a pointer 32 bits? Why not 16 bits or 64 bits? What are the pros and cons?
Because it mimics the size of the actual "pointers" in assembler. On a machine with a 64 bit address bus, it will be 64 bits. The old 6502 was an 8 bit machine, but it had a 16 bit address bus so that it could address 64K of memory. On most 32 bit machines, 32 bits were enough to address all the memory, so that's what the pointer size was in C++. I know that some of the early M68000 series chips only had a 24 bit memory address space, but it was addressed from a 32 bit register so even on those the pointer would be 32 bits.
In the bad old days of the 80286, it was worse - there was a 16 bit address register, and a 16 bit segment register. Some C++ compilers didn't hide that from you, and made you declare your pointers as near or far depending on whether you wanted to change the segment register. Mercifully, I've recycled most of those brain cells, so I forget if near pointers were 16 bits - but at the machine level, they would be.
The size of a pointer in C++ is implementation-defined. C++ might run on anything from your toaster's chip up to huge mainframes. Different architectures require different sizes of the data types.
If on your implementation a pointer is 32bit, then that's very likely an architecture which can address 2^32 bytes. (Note that even the size of bytes might be different depending on the implementation.) 64bit architectures generally can address 2^64 bytes, so implementations on these architectures will likely have a pointer size of 64bit.
16 bit would obviously be insufficient - you could only address 64K then.
Why not emulate 64 bit on 32 bit systems - I guess because the performance of pointer arithmetic would degrade.
As mentioned in many other answers, the size of a pointer need not be 32-bits - the implementation will set the size of a pointer to be whatever the architecture of the platform dictates. On a system with 64-bit addressing, the size of a pointer will generally be 64-bits.
However, you should also note that even on a single implementation, different types of pointers might have different sizes. In particular, pointer-to-member types (which I'll grant are odd-ball pointers) may have different sizes than plain-old pointers to objects.
The same is true about pointers to plain old functions - they might have a different size than pointers to objects (this applies to C as well as C++). However on modern desktop systems you'll usually find that pointers to functions are the same size as pointers to objects.
Here's a short example of fun with pointer-to-member-functions:
#include <stdio.h>
class A {};
class B {};
class VirtD: public virtual A, public virtual B {
public:
virtual int Dfunc() { return 5; };
};
typedef int (VirtD::* Derived_mfp)();
int main()
{
VirtD virtd;
Derived_mfp mfp = &VirtD::Dfunc;
printf( "sizeof( mfp) == %u\n", (unsigned int) sizeof( mfp));
}
Displays: sizeof( mfp) == 12 on MSVC.
The size of the pointer has little to do with the architecture label (32-bit, 64-bit) as such. "32-bit" usually refers to the fact that the register size is 32 bits. As a result, the maximum number of addresses that you can address using one register is 2^32. So it boils down to the efficiency of addressing memory locations using a register.
With a 32-bit pointer you can point to a wider range of memory than with 16-bit pointers. When 32-bit pointers were standardized, 64-bit CPUs were not very popular (or even existent?). Therefore a 64-bit pointer would not have fit inside a CPU register, which is a very important factor for speed.
Why not 16-bit? Because, presuming a flat 32-bit address space, you cannot address every byte. Far from it: you can only address 2^16 unique locations with a 16-bit pointer. Even if your pointers only point to dwords and not bytes, this still leaves 1073676288 dwords unaddressable.
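The arithmetic behind that figure can be checked at compile time. The constant names below are invented for the example; a dword is taken to be 4 bytes.

```cpp
#include <cstdint>

// Number of distinct values a 16-bit pointer can hold.
constexpr std::uint64_t kAddressable16 = std::uint64_t{1} << 16;

// A flat 32-bit address space has 2^32 bytes, i.e. 2^30 four-byte dwords.
constexpr std::uint64_t kBytes32 = std::uint64_t{1} << 32;
constexpr std::uint64_t kDwords32 = kBytes32 / 4;

// Dwords that a 16-bit dword pointer cannot reach: 2^30 - 2^16.
constexpr std::uint64_t kUnreachableDwords = kDwords32 - kAddressable16;

static_assert(kUnreachableDwords == 1073676288ULL,
              "matches the figure quoted in the answer");
```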
Assuming a flat 32-bit address space, you can already address every single byte with a 32-bit pointer. At this point, 64-bit pointers are just wasting space, unless you want to add additional information to each pointer. For example, on 32-bit PowerPC, a function descriptor is actually a 96-bit entity, with one third pointing to the executable code and the rest being data that helps make relocating modules easier.
In a segmented address space, having larger-than-32-bit pointers to data could be useful. Windows NT on the DEC Alpha was a 32-bit operating system, but the Alpha hardware was 64-bit capable. Your ordinary address space was still 32-bit, but there were special APIs to allow 32-bit programs to access 64-bit addresses, as if they were in otherwise-inaccessible segments.
To answer your question: C++ itself says very little about the size of a pointer, and certainly not that it has to be 32 bits or anything. The size of a pointer should be the natural one for the machine architecture.