Do memory addresses contain implicit hex digits? - gdb

What is the value of a memory address that is less than 12 hex digits on a 64-bit computer?
For instance, when I run gdb on a simple assembly program and run (gdb) info frame I get:
Stack level 0, frame at 0x7fffffffd970:
rip = 0x40052f in main (file.s:11); saved rip = 0x7ffff7a2d830
source language asm.
Arglist at 0x7fffffffd960, args:
Locals at 0x7fffffffd960, Previous frame's sp is 0x7fffffffd970
Saved registers:
rbp at 0x7fffffffd960, rip at 0x7fffffffd968
The first part of the second line rip = 0x40052f in main (file.s:11) I believe states the value of the instruction pointer when I called info frame. But why is the memory address it holds not 12 hex digits?
Also, if I type (gdb) x 0x7fffffffd968 (which I expect to be 0x7ffff7a2d830) I get:
0x7fffffffd968: 0xf7a2d830
Does this mean that any memory address with less than 12 hex digits contains an implicit 7ff...?

No. On x86 or x86_64, a memory address is simply a number, but is commonly displayed using hexadecimal. And like most number notation systems, a shorter number just means a much smaller value, or if you like, there are implicit zeros before it.
So just like the decimal string "12" is much smaller than "12654321", the address 0x40052f is much smaller than the address 0x7ffff7a2d830. The two addresses are almost certainly in different virtual memory maps. (On Linux, you can view virtual memory maps by cat /proc/{pid}/maps.)
When you used the gdb x command, you didn't see the value you expected because gdb took a guess at what kind of data your address points at. The first time you use x in a gdb session, it defaults to showing 4 bytes (32 bits) per element, as though the address points at an array of uint32_t. Since addresses on x86_64 are 8 bytes (64 bits), you need x/g to tell gdb the element size is 8 bytes.
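The truncated value can be reproduced outside gdb. The following is a minimal sketch, assuming a little-endian machine such as x86_64; the helper names read_word and read_giant are made up for illustration and are not gdb functions.

```c
#include <stdint.h>
#include <string.h>

/* Read only the first 4 bytes at p, like gdb's default word-sized x. */
static uint32_t read_word(const void *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Read all 8 bytes at p, like x/g (giant-sized elements). */
static uint64_t read_giant(const void *p)
{
    uint64_t g;
    memcpy(&g, p, sizeof g);
    return g;
}
```

If a uint64_t holding 0x7ffff7a2d830 is passed to both helpers, read_giant returns the full value, while read_word on a little-endian machine returns only the low half, 0xf7a2d830, which matches what x printed in the question.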

Related

What can I know about data if I know where they are, e.g. 0xffffffff7fffd9d8 vs. 0x10019c1e0?

I am wondering if the numeric value of a pointer tells me something of use during debugging.
For example I have the following on my call stack:
std::basic_ostringstream<char, std, char_traits<char>, std::allocator, <char>void>::str(
0xffffffff7fffd9d8,
0x10019c1e0,
0x100446710,
0x0,
0xffffffff7fffd9d8,
0xffffffff7b331688),
at 0xffffffff7b1b28ec
There seem to be pointers of the form 0xfff and 0x100. Is there a meaning to this difference?
On 64-bit platforms, in theory you could address 2^64 bytes, or approximately 16 exabytes. Since most applications don’t need such a large address space, hardware vendors define smaller virtual address spaces to reduce the cost of address translation. Therefore, on AMD and Intel chips, only the least significant 48 bits of the address are significant, and bits 48 through 63 must be copies of bit 47. These are called canonical form addresses, and they span the following ranges:
0000000000000000 — 00007FFFFFFFFFFF
FFFF800000000000 — FFFFFFFFFFFFFFFF
The former are called canonical lower half addresses, and the latter canonical upper half. It’s the decision of the kernel, but typically upper-half addresses refer to the stack and static program data area, while lower-half addresses refer to heap memory.
Source: Wikipedia
There is a hint you should take with a grain of salt. Stack addresses generally sit near the top of the address range a process can use, while heap addresses sit much lower. So your 0xffff form addresses are probably from the stack, and 0x100 from the heap.
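The canonical-form rule quoted above can be checked mechanically. This is a minimal sketch assuming the 48-bit scheme described in the quote; is_canonical48 and is_upper_half are hypothetical helper names, and the check says nothing about how a particular kernel lays out stack and heap.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 48..63 are all copies of bit 47 (48-bit canonical form). */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;           /* bits 47..63, 17 bits in total */
    return top == 0 || top == 0x1FFFF;   /* all zeros or all ones */
}

/* True if addr is canonical and lies in the upper half (bit 63 set). */
static bool is_upper_half(uint64_t addr)
{
    return is_canonical48(addr) && (addr >> 63) != 0;
}
```

Under this rule, 0xffffffff7fffd9d8 from the call stack in the question is an upper-half address and 0x10019c1e0 is a lower-half address, while a value such as 0x0000800000000000 is not canonical at all.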

Why does my compiler use an 8-bit char when I'm running on a 64-bit machine?

I am using the Microsoft Visual Studio 2013 IDE. When I compile a program in C++ while using the header <climits>, I output the macro constant CHAR_BIT to the screen. It tells me there are 8-bits in my char data type (which is 1-byte in C++). However, Visual Studio is a 32-bit application and I am running it on a 64-bit machine (i.e. a machine whose processor has a 64-bit instruction set and operating system is 64-bit Windows 7).
I don't understand why my char data type uses only 8-bits. Shouldn't it be using at least 32-bits (since my IDE is a 32-bit application), let alone 64-bits (since I'm compiling on a 64-bit machine)?
I am told that the number of bits used in a memory address (1-byte) depends on the hardware and implementation. If that's the case, why does my memory address still only use 8-bits and not more?
I think you are confusing memory address bit-width with data value bit-width. Memory addresses (pointers) are 32 bits for 32-bit programs and 64 bits for 64-bit programs. But data types have different widths for their values depending on type (as governed by the standard). So a char is 8-bits, but a char* will be 32-bits if you are compiling as a 32-bit application (also note here it depends on how you compile the application and not what type of processor or OS you are running on).
Edit for questions:
However, what is the relationship between these two?
Memory addresses will always have the same bit width regardless of what data value is stored there.
For example, if I have a 32-bit address and I assign an 8-bit value to that address, does that mean there are 24-bits of unused address space?
Some code (assume 32-bit compilation):
char i_am_1_byte = 0x00; // an 8-bit data value that lives in memory
char* i_am_a_ptr = &i_am_1_byte; // pointer is 32-bits and points to an 8-bit data value
*i_am_a_ptr = 0xFF; // writes 0xFF to the location pointed to by the pointer
// that is, to i_am_1_byte
So we have i_am_1_byte, which is a char and takes up 8 bits somewhere in memory. We can get this memory location using the address-of operator & and store it in the pointer variable i_am_a_ptr, which is your 32-bit address. We can write 8 bits of data to the location pointed to by i_am_a_ptr by dereferencing it.
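The two widths can also be measured directly. The following is a small sketch; the function names are invented for illustration, and the pointer width it reports depends on whether the code is compiled as a 32-bit or 64-bit program.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width in bits of a char value: CHAR_BIT, which is 8 on mainstream platforms. */
static size_t char_value_bits(void)
{
    return sizeof(char) * CHAR_BIT;
}

/* Width in bits of a pointer to char: set by the compilation target. */
static size_t char_pointer_bits(void)
{
    return sizeof(char *) * CHAR_BIT;
}

/* Numeric distance between the addresses of two adjacent chars. */
static uintptr_t adjacent_char_step(void)
{
    char pair[2] = {0, 0};
    return (uintptr_t)&pair[1] - (uintptr_t)&pair[0];
}
```

On common flat-memory platforms adjacent_char_step returns 1: consecutive addresses name consecutive bytes, so storing an 8-bit value at a 32-bit address does not leave 24 bits of anything unused. The address width only limits how many distinct bytes can be named.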
If not, what is the bit-width for memory address actually used for
All the data that your program uses must be located somewhere in memory and each location has an address. Most programs probably will not use most of the memory available for them to use, but we need a way to address every possible location.
how can having more memory address bit-width be helpful?
That depends on how much data you need to work with. A 32-bit program, at most, can address 4GB of memory space (and this may be smaller depending on your OS). That used to be a very, very large amount of memory, but these days it is conceivable a program could run out. It is also a lot easier for the CPU to address more than 4GB of RAM if it is 64-bit (this gets into the difference between physical memory and virtual memory). Of course, 64-bit architecture means a lot more than just bigger addresses and brings many benefits that may be more useful to programs than the bigger memory space.
An interesting fact is that on some processors, such as 32-bit ARM, most instructions are word aligned. That is, compilers tend to allocate 32 bits (4 bytes) to a data item even when its type needs fewer than 4 bytes, unless the source code states otherwise. This happens because ARM architectures are optimized for word-aligned memory access.

Cannot Set 4 Byte Hardware Breakpoint Windbg

I cannot set 4 byte read / write access hardware breakpoint using windbg.
0:000> dd 02e80dcf
02e80dcf 13121110 17161514 1a191800 1e1d1c1b
02e80ddf 011c171f c7be7df1 00000066 4e454900
I need to find out when the value 0x13121110 (at address 0x02e80dcf) is changed or overwritten by the program.
When I try to set a 4-byte write access hardware breakpoint at 0x02e80dcf, I get a "Data breakpoint must be aligned" error.
0:000> ba w 4 02e80dcf
Data breakpoint must be aligned
^ Syntax error in 'ba w 4 02e80dcf'
0:000> ba r 4 02e80dcf
Data breakpoint must be aligned
^ Syntax error in 'ba r 4 02e80dcf'
0:000> ba w 1 02e80dcf
breakpoint 0 redefined
I'm able to set a 1-byte write access breakpoint at the address, but it is not triggered when the value at address 0x02e80dcf is overwritten.
If anyone could suggest another way to detect when the address is overwritten, that would be really helpful.
Note: I only face this problem with one particular program. I'm able to set 4-byte hardware breakpoints for other programs in the same debugging environment.
As a side note, this particular behavior is from the CPU architecture itself (not from the system or the debugger).
The x86 and x86-64 (IA32 and IA32-e in Intel lingo) architectures use the Drx registers (Debug Registers) to handle hardware breakpoints.
The LENn fields in Dr7 set the length of each breakpoint, and Dr0 to Dr3 hold the breakpoint addresses.
From Intel Manual 3B - Chapter 18.2.5. "Breakpoint Field Recognition":
The LENn fields permit specification of a 1-, 2-, 4-, or 8-byte range,
beginning at the linear address specified in the corresponding debug
register (DRn).
In the same chapter it is explicitly stated:
Two-byte ranges must be aligned on word boundaries; 4-byte ranges must
be aligned on doubleword boundaries.
If you cover the desired address with a data breakpoint with a big enough length, then it will trap (breakpoint will be hit):
A data breakpoint for reading or writing data is triggered if any of
the bytes participating in an access is within the range defined by a
breakpoint address register and its LENn field.
The manual then goes on to give a tip for trapping on an unaligned address, along with an example table:
A data breakpoint for an unaligned operand can be constructed using
two breakpoints, where each breakpoint is byte-aligned and the two
breakpoints together cover the operand.
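One simple way to apply that tip is to cover the operand with the aligned blocks that overlap it. The following is a sketch under that assumption, not the manual's own table; bp_is_aligned and bp_cover are hypothetical helpers, and len is assumed to be 1, 2, 4, or 8.

```c
#include <stdbool.h>
#include <stdint.h>

/* A len-byte breakpoint (len = 1, 2, 4 or 8) must start on a multiple of len. */
static bool bp_is_aligned(uint64_t addr, unsigned len)
{
    return (addr & (uint64_t)(len - 1)) == 0;
}

/*
 * Fill out[] with the start addresses of the aligned len-byte blocks that
 * overlap [addr, addr + len). Returns 1 if addr is already aligned, else 2.
 */
static int bp_cover(uint64_t addr, unsigned len, uint64_t out[2])
{
    uint64_t first = addr & ~(uint64_t)(len - 1);  /* round down to alignment */
    out[0] = first;
    if (first == addr)
        return 1;
    out[1] = first + len;                          /* next aligned block */
    return 2;
}
```

For the address in the question, bp_cover(0x02e80dcf, 4, out) yields 0x02e80dcc and 0x02e80dd0, so two aligned 4-byte breakpoints at those addresses would cover all four bytes of the operand. The trade-off is that they also cover neighboring bytes (0x02e80dcc to 0x02e80dce and 0x02e80dd3), so unrelated writes next to the operand will trigger them as well.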
The address of a 4-byte data breakpoint must be aligned on a 4-byte boundary (8-byte breakpoints on 64-bit systems need an 8-byte boundary).
Any hex address ending in f is not aligned to a 4-byte boundary.
WinDbg may also require data breakpoints to be aligned to 4- or 8-byte boundaries. You may need to use a conditional breakpoint so that only the one byte is checked.

does the size of char * (character pointer) in C/C++ vary? - use for database column fixed size

per the following code, I get the size of a character pointer is 8 bytes. Yet this site has a size of 1 byte for the char pointer.
#include <stdio.h>
int main(void ){
char *a = "saher asd asd asldasdas;daksd ahwal";
printf(" nSize = %zu \n", sizeof(a)); /* sizeof yields a size_t, so print it with %zu, not %d */
return 0;
}
Is this always the case? I am writing a connector for a simple database I am implementing and want to read TEXT field of mysql into my database. Since TEXT has variable size, I was wondering if my column Type/metadata can have a fixed size of 8 bytes where I store the pointer in memory to the string (char *)?
per the following code, I get the size of a character pointer is 8 bytes. Yet this site has a size of 1 byte for the char pointer.
It's implementation-defined. It's usually 8 on a 64-bit Intel system and 4 on a 32-bit Intel system. Don't rely on it being any particular size.
I am writing a connector for a simple database I am implementing and want to read TEXT field of mysql into my database. Since TEXT has variable size, I was wondering if my column can have a fixed size of 8 bytes where I store the pointer in memory to the string (char *)?
It makes no sense at all to store pointers into memory in a database. A database is for persistent data. On the other hand, data stored in memory is liable to disappear whenever a process exits (or the system is restarted).
No, it is not. The size of a pointer depends on the CPU architecture and on how the program is compiled. Some architectures even have different sizes depending on the "type" of the pointer. On x86_64, a pointer occupies 64 bits, but typical implementations use only the low 48 bits for virtual addressing, and the upper 16 bits must be copies of bit 47. One could, therefore, use pointer packing to serialize/deserialize pointers into 48-bit chunks.
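Pointer packing of that kind could look like the following. This is only a sketch, assuming 48-bit canonical addresses (systems using wider virtual addresses would break it); pack48 and unpack48 are made-up names. Note that a packed pointer is still only meaningful inside the process that produced it.

```c
#include <stdint.h>

/* Store the low 48 bits of addr in 6 bytes, least significant byte first. */
static void pack48(uint64_t addr, uint8_t out[6])
{
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(addr >> (8 * i));
}

/* Rebuild the 64-bit value, copying bit 47 into bits 48..63. */
static uint64_t unpack48(const uint8_t in[6])
{
    uint64_t v = 0;
    for (int i = 0; i < 6; i++)
        v |= (uint64_t)in[i] << (8 * i);
    if (v & (UINT64_C(1) << 47))
        v |= UINT64_C(0xFFFF) << 48;   /* sign-extend to canonical form */
    return v;
}
```

A value obtained from a real pointer via (uint64_t)(uintptr_t)ptr round-trips through these two functions as long as it is in 48-bit canonical form.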
A variable can be different sizes based on the computer that you are using. This is causing the discrepancy between your results and the results you see online.
However, the variable will always be the same size on the same machine.
On a given platform, the size of a data pointer is generally the same regardless of the type it points to (char, string, object, etc.).
On a PC with a 64-bit operating system (and a compiler that targets 64 bits), the size of a pointer is 8 bytes (64-bit address space).
Other platforms may use 4 bytes, 2 bytes, or 1 byte (like an 8-bit microcontroller).

Shared Memory Interface between Windows 64 bits and 32 bits

I need to write code in Windows 7 (64 bits) that executes a 32-bits program that has a Shared Memory Interface (SMI). More precisely, the program I am coding writes into the SMI and the 32-bits program reads from this SMI.
The first problem is that I don't have access to the source code of the 32-bit program, and that can't be solved. The second problem is that the SMI stores the address of the information that is written. This pointer is stored as a based pointer using the following code:
gpSharedBlock->m_pData[uiDataPointer] = (char __based(gpSharedBlock)*)pData;
Where pData is a pointer to the data we are writing, and gpSharedBlock->m_pData[i] points to the i^th element stored.
Probably from here you have already noticed the problem; a pointer in W32 is 4 bytes while a pointer in W64 is 8 bytes. Then, since the value stored is a 64 bit pointer, the value finally read by the 32-bits program is not the desired one.
My question is: is there a way to do a translation of the 64-bit address to a 32-bit address such that the program that is running reads the correct information?
I have read about WOW64, and I suppose that the W32 program is running under it, but I don't know how to take advantage of that. Any ideas?
A __based pointer is a numeric offset from another pointer. It is effectively a virtual pointer interpreted at runtime.
A pointer is 8 bytes in 64-bit, so to be compatible with the 32-bit program, you will have to declare the pointer members of the SharedBlock type in your 64-bit code to use 4-byte (32-bit) integers instead of pointers, eg:
struct sSharedBlock
{
int32_t m_pData[...];
};
pData is __based on gpSharedBlock, so the value of pData is a relative offset from the value of gpSharedBlock. Use that fact to determine the actual byte offset of your data block relative to the gpSharedBlock memory block, and then store that offset value into m_pData[] as an integer. That is what the SMI memory block is actually expecting anyway - an offset, not a real pointer. The __based keyword is just a fancy way of handling offsets using pointers without doing the offset calculations manually in code.
The original code is effectively the same as the following without needing the __based keyword:
gpSharedBlock->m_pData[uiDataPointer] = (int32_t) ( ((char*)pData) - ((char*)gpSharedBlock) );
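The same idea in portable C, as a minimal sketch: the struct layout below is hypothetical (the real block layout is dictated by the 32-bit program and is not shown in the question), and to_offset and from_offset are illustrative helpers.

```c
#include <stdint.h>

/* Hypothetical layout; the real one is defined by the 32-bit program. */
struct shared_block {
    int32_t m_pData[4];   /* byte offsets from the start of the block */
    char    payload[64];  /* data area that the offsets refer to */
};

/* Offset of p from the start of the block, as the 32-bit reader expects. */
static int32_t to_offset(const struct shared_block *base, const void *p)
{
    return (int32_t)((const char *)p - (const char *)base);
}

/* Inverse: turn a stored offset back into a pointer valid in this process. */
static char *from_offset(struct shared_block *base, int32_t off)
{
    return (char *)base + off;
}
```

Because only the offset is stored, it does not matter that the block is mapped at different virtual addresses, or with different pointer widths, in the two processes: each side adds the offset to its own base address.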