I have a PE loader that works fine with 32-bit PEs, but with x64 ones it can't allocate the preferred base address 0x0000000140000000, since it only reads it as 0x40000000.
Here is how I allocate that address:
DWORD hdr_image_base = p_NT_HDR->OptionalHeader.ImageBase;
char* ImageBase = (char*)VirtualAlloc((void*)hdr_image_base, size_of_image, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
but when I hardcode the address, it works just fine:
char* ImageBase = (char*)VirtualAlloc((void*)0x140000000, size_of_image, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
Is DWORD not enough to store the address?
This relocation code also seems to be truncating some variable's width somewhere, since it still works with 32-bit images but not with x64 ones:
// this is how much we shifted the ImageBase
DWORD_PTR delta_VA_reloc = (reinterpret_cast<DWORD_PTR>(ImageBase)) - p_NT_HDR->OptionalHeader.ImageBase;
// if there is a relocation table, and we actually shifted the ImageBase
if (data_directory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress != 0 && delta_VA_reloc != 0) {
    printf("\n[*] The allocated address is not the preferred address, starting relocation\n");
    // calculate the relocation table address
    IMAGE_BASE_RELOCATION* p_reloc = (IMAGE_BASE_RELOCATION*)(ImageBase + data_directory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
    // once again, a null-terminated array
    while (p_reloc->VirtualAddress != 0) {
        // how many relocations in this block,
        // i.e. the total size, minus the size of the "header", divided by 2 (entries are WORDs, 2 bytes each)
        //std::cout << sizeof(WORD) << "\n"; // sizeof(WORD) is 2
        DWORD size = (p_reloc->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD);
        // the first relocation element in the block, right after the header (using pointer arithmetic again)
        WORD* reloc = (WORD*)(p_reloc + 1);
        for (DWORD i = 0; i < size; ++i) {
            // type is the upper 4 bits of the relocation word
            int type = reloc[i] >> 12;
            // offset is the lower 12 bits
            unsigned long long int offset = reloc[i] & 0x0fff;
            //printf("--------------- %#llx\n", offset);
            // this is the address we are going to change
            DWORD* change_addr = (DWORD*)(ImageBase + p_reloc->VirtualAddress + offset);
            // there is only one type used that needs to make a change:
            // look for the HIGHLOW flag in a PE32 and DIR64 in a PE32+
            switch (type) {
            case IMAGE_REL_BASED_HIGHLOW: // for x86
                *change_addr += delta_VA_reloc;
                break;
            case IMAGE_REL_BASED_DIR64: // for x64
                *change_addr += delta_VA_reloc;
                break;
            default:
                break;
            }
        }
        // switch to the next relocation block, based on the size
        p_reloc = (IMAGE_BASE_RELOCATION*)((reinterpret_cast<DWORD_PTR>(p_reloc)) + p_reloc->SizeOfBlock);
    }
}
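A DWORD is 32 bits wide, while ImageBase in IMAGE_OPTIONAL_HEADER64 is a ULONGLONG, which is exactly why 0x0000000140000000 collapses to 0x40000000. The DIR64 case has the same truncation problem, because change_addr points at a 32-bit slot. A minimal 64-bit-safe sketch of both spots, assuming the image parses as PE32+:

// ImageBase is a ULONGLONG in IMAGE_OPTIONAL_HEADER64, so keep all 64 bits:
ULONGLONG hdr_image_base = p_NT_HDR->OptionalHeader.ImageBase;
char* ImageBase = (char*)VirtualAlloc((void*)hdr_image_base, size_of_image,
                                      MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

// ... and inside the relocation loop, patch a field as wide as the type says:
switch (type) {
case IMAGE_REL_BASED_HIGHLOW: // 32-bit field (PE32)
    *(DWORD*)(ImageBase + p_reloc->VirtualAddress + offset) += (DWORD)delta_VA_reloc;
    break;
case IMAGE_REL_BASED_DIR64: // 64-bit field (PE32+)
    *(ULONGLONG*)(ImageBase + p_reloc->VirtualAddress + offset) += delta_VA_reloc;
    break;
default:
    break;
}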
I'm currently attempting to build my own operating system and have run into an issue when trying to test my kernel code using VirtualBox.
The real issue arises when I call the assembly instruction sti, as I'm currently attempting to implement an interrupt descriptor table and communicate with the PICs.
Here is the code that calls it. It's a function called kernel_main that is called from another assembly file. That file simply sets up the stack before executing any code from the OS; there haven't been any issues there, and everything works fine until I add the instruction asm("sti"); to the following code:
/* main function of our kernel
 * accepts the pointer to the multiboot structure and the magic code (no particular reason to take the magic number)
 *
 * use extern "C" to prevent g++ from mangling the name
 */
extern "C" void kernel_main(void *multiboot_structure, uint32_t magic_number)
{
// can't use the standard printf as we're outside an OS currently
// we don't have access to glibc so we write our own printf
printf_boot_message("kernel.....\n");
// create the global descriptor table
GlobalDescriptorTable gdt;
// create the interrupt descriptor table
InterruptHandler interrupt_handler(&gdt);
// enable interrupts (test)
asm("sti"); // <- causes crash
// random debug printf
printf_boot_message("sti called\n");
// kernel never really stops, inf loop
while (1)
;
}
Below is the VirtualBox debug output. I've googled around for VINF_EM_TRIPLE_FAULT but mostly found RAM-related issues that I don't think apply to me. The printf calls in the above code execute as expected, followed by the VM immediately crashing with the following:
Link to output as it's too large to post here: https://pastebin.com/jfPfhJUQ
Here is my interrupt handling code:
/* Implementations of the interrupt handling routines in sys_interrupts.h
*/
#include "sys_interrupts.h"
#include "misc.h"
//handle() is used to take the interrupt number,
//i_number, and the address to the current CPU stack frame.
uint32_t InterruptHandler::handle(uint8_t i_number, uint32_t crnt_stkptr)
{
// debug
printf(" INTERRUPT");
// after the interrupt code has been executed,
// return the stack pointer so that the CPU can resume
// where it left off.
// this works for now as we do not have multiple
// concurrent processes running, so there is no issue
// of handling the thread number.
return crnt_stkptr;
}
// define the interrupt descriptor table
InterruptHandler::_gate_descriptor InterruptHandler::interrupt_desc_table[N_ENTRIES];
// define the constructor. Takes a pointer to the global
// descriptor table
InterruptHandler::InterruptHandler(GlobalDescriptorTable* global_desc_table)
{
// grab the offset of the usable memory within our global segment
uint16_t seg = global_desc_table->CodeSegmentSelector();
// set all the entries in the IDT to block request initially
for (uint16_t i = 0; i < N_ENTRIES; i++)
{
// create a gate for a system-level interrupt, calling the block function (does nothing), using seg as its code segment.
create_entry(i, seg, &block_request, PRIV_LVL_KERNEL, GATE_INTERRUPT);
}
// create a couple interrupts for 0x00 and 0x01, really 0x20 and 0x21 in memory
//create_entry(BASE_I_NUM + 0x00, seg, &isr0x00, PRIV_LVL_KERNEL, GATE_INTERRUPT);
//create_entry(BASE_I_NUM + 0x01, seg, &isr0x01, PRIV_LVL_KERNEL, GATE_INTERRUPT);
// init the PICs
pic_controller.send_master_cmd(PIC_INIT);
pic_controller.send_slave_cmd(PIC_INIT);
// tell master pic to add 0x20 to any interrupt number it sends to CPU, while slave pic sends 0x28 + i_number
pic_controller.send_master_data(PIC_OFFSET_MASTER);
pic_controller.send_slave_data(PIC_OFFSET_SLAVE);
// set the interrupt vectoring to cascade and tell master that there is a slave PIC at IRQ2
pic_controller.send_master_data(ICW1_INTERVAL4);
pic_controller.send_slave_data(ICW1_SINGLE);
// set the PICs to work in 8086 mode
pic_controller.send_master_data(ICW1_8086);
pic_controller.send_slave_data(ICW1_8086);
// send 0s
pic_controller.send_master_data(DEFAULT_MASK);
pic_controller.send_slave_data(DEFAULT_MASK);
// tell the cpu to use the table
interrupt_desc_table_pointerdata idt_ptr;
//set the size
idt_ptr.table_size = N_ENTRIES * sizeof(_gate_descriptor) - 1;
// set the base address
idt_ptr.base_addr = (uint32_t)interrupt_desc_table;
// use lidt instruction to load the table
// the cpu will map interrupts to the table
asm volatile("lidt %0" : : "m" (idt_ptr));
// issue debug print
printf_boot_message(" 2: Created Interrupt Desc Table...\n");
}
// define the destructor of the class
InterruptHandler::~InterruptHandler()
{
}
// function to make entries in the IDT
// takes the interrupt number as an index, the segment offset used to specify which memory segment to use,
// a pointer to the function to call, the privilege level and the gate type.
void InterruptHandler::create_entry(uint8_t i_number, uint16_t segment_desc_offset, void (*isr)(), uint8_t priv_lvl, uint8_t desc_type)
{
// set the i_number'th entry to the given params
// take the lower bits of the pointer
interrupt_desc_table[i_number].handler_lower_bits = ((uint32_t)isr) & 0xFFFF;
// take the upper bits
interrupt_desc_table[i_number].handler_upper_bits = (((uint32_t)isr) >> 16) & 0xFFFF;
// calculate the privilege byte, setting the correct bits
interrupt_desc_table[i_number].priv_lvl = 0x80 | ((priv_lvl & 3) << 5) | desc_type;
interrupt_desc_table[i_number].segment_desc_offset = segment_desc_offset;
// reserved byte is always 0
interrupt_desc_table[i_number].reserved_byte = 0;
}
// need a function to block or ignore any requests
// that we don't want to service. Requests could be caused
// by devices we haven't yet configured when testing the os.
void InterruptHandler::block_request()
{
// do nothing
}
// function to tell the CPU to send interrupts
// to this table
void InterruptHandler::set_active()
{
// call sti to start interrupt polling at the CPU level
asm volatile("sti"); // <- calling this crashes the kernel
// issue debug print
printf_boot_message(" 4: Activated sys interrupts...\n");
}
And here is the code for my GDT; I followed the OSDev wiki guide for this:
#include "global_desc_table.h"
/**
* A code segment is identified by flag 0x9A, cannot write to a code segment
* while a data segment is identified by flag 0x92
*
* Based on the C code present on OSDEV Wiki
*/
GlobalDescriptorTable::GlobalDescriptorTable() : nullSegmentSelector(0, 0, 0),
unusedSegmentSelector(0, 0, 0),
codeSegmentSelector(0, 64*1024*1024, 0x9A),
dataSegmentSelector(0, 64*1024*1024, 0x92)
{
//8 bytes defined, but processor expects 6 bytes only
uint32_t i[2];
//first 4 bytes is address of table
i[0] = (uint32_t)this;
//second 4 bytes, the high bytes, are size of global desc table
i[1] = sizeof(GlobalDescriptorTable) << 16;
// tell the processor to use this table using its lgdt instruction
asm volatile("lgdt (%0)" : : "p" (((uint8_t *) i) + 2));
// issue debug print
printf_boot_message(" 1: Created Global Desc Table...\n");
}
// function to get the offset of the datasegment selector
uint16_t GlobalDescriptorTable::DataSegmentSelector()
{
// calculate the offset by subtracting the table's address from the datasegment's address
return (uint8_t *) &dataSegmentSelector - (uint8_t*)this;
}
// function to get the offset of the code segment
uint16_t GlobalDescriptorTable::CodeSegmentSelector()
{
// calculate the offset by subtracting the table's address from the code segment's address
return (uint8_t *) &codeSegmentSelector - (uint8_t*)this;
}
// default destructor
GlobalDescriptorTable::~GlobalDescriptorTable()
{
}
/**
* The constructor to create a new entry segment, set the flags, determine the formatting for the limit, and set the base
*/
GlobalDescriptorTable::SegmentDescriptor::SegmentDescriptor(uint32_t base, uint32_t limit, uint8_t flags)
{
uint8_t* target = (uint8_t*)this;
//if 16 bit limit
if (limit <= 65536)
{
// tell processor that this is a 16bit entry
target[6] = 0x40;
} else {
// if the last 12 bits of limit are not 1s
if ((limit & 0xFFF) != 0xFFF)
{
limit = (limit >> 12) - 1;
} else {
limit >>= 12;
}
// indicate that there was a shift of 12 done
target[6] = 0xC0;
}
// set the two lowest bytes of limit
target[0] = limit & 0xFF;
target[1] = (limit >> 8) & 0xFF;
//the rest of limit must go in the lower 4 bits of byte 6
target[6] |= (limit >> 16) & 0xF;
//encode the pointer
target[2] = base & 0xFF;
target[3] = (base >> 8) & 0xFF;
target[4] = (base >> 16) & 0xFF;
target[7] = (base >> 24) & 0xFF;
// set the flags
target[5] = flags;
}
/**
* Define the methods to get the base pointer from an segment and
* the limit for a segment, taken from os wiki
*/
uint32_t GlobalDescriptorTable::SegmentDescriptor::Base()
{
// simply do the reverse of what was done to place the pointer in
uint8_t* target = (uint8_t*) this;
uint32_t result = target[7];
result = (result << 8) + target[4];
result = (result << 8) + target[3];
result = (result << 8) + target[2];
return result;
}
uint32_t GlobalDescriptorTable::SegmentDescriptor::Limit()
{
uint8_t* target = (uint8_t *)this;
uint32_t result = target[6] & 0xF;
result = (result << 8) + target[1];
result = (result << 8) + target[0];
//check if there was a shift of 12
if (target[6] & 0xC0 == 0xC0)
{
result = (result << 12) & 0xFFF;
}
return result;
}
i[0] = (uint32_t)this;
//second 4 bytes, the high bytes, are size of global desc table
i[1] = sizeof(GlobalDescriptorTable) << 16;
I've had the same problem; just swap the 0 and 1 indices:
i[1] = (uint32_t)this;
//second 4 bytes, the high bytes, are size of global desc table
i[0] = sizeof(GlobalDescriptorTable) << 16;
That's the problem if you are following the same tutorial, and I think you are if you ended up here.
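A related cleanup: a packed struct makes the 6-byte GDTR operand explicit and avoids the index juggling entirely. A sketch, assuming GCC/Clang's __attribute__((packed)) (MSVC would need #pragma pack); note the limit field is the table size minus one:

#include <cstdint>

struct __attribute__((packed)) GdtPointer {
    uint16_t size; // sizeof(table) - 1, the "limit"
    uint32_t base; // linear address of the table
};

GdtPointer gdtr;
gdtr.size = sizeof(GlobalDescriptorTable) - 1;
gdtr.base = (uint32_t)this;
asm volatile("lgdt %0" : : "m"(gdtr));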
A crash like this can also be caused by a wrong IDTR value (an invalid pointer).
Check the IDTR register value in the VirtualBox log.
If you load the IDT while in protected mode, the IDT address can show some weird changes (shifted left 16 bits, or stray values in the lower 16 bits).
Try adjusting the pointer to account for that (that's how I did it), or load the IDT with lidt before entering protected mode (this is also tested).
There was a bug in my GDT that forced the kernel to read an invalid pointer from the segment. This caused a seg fault.
I am wondering why a pointer-like value (324502) ends up in the variable signalLengthDebugVar1 instead of the expected integer value (2).
struct ShmLengthOfSignalName {
int signalLength;
};
//...
BYTE* pBuf = NULL;
//...
int main(void){
//...
pBuf = (BYTE*) MapViewOfFile(hMapFile, FILE_MAP_ALL_ACCESS, 0, 0, BUF_SIZE);
//...
JobaSignal sig1;
printf("Value SignalLength: %d \r\n", pBuf[30]); // 2
const ShmLengthOfSignalName * signalNameLengthPtr = (const ShmLengthOfSignalName *)(pBuf + 30);
int signalLengthDebugVar1 = signalNameLengthPtr->signalLength; // content: 324502 maybe pointer?
int signalLengthDebugVar2 = (int) pBuf[30]; // content 2
sig1.setNameLength(signalLengthDebugVar2);
}
When you print the value, you're reading only the single byte at pBuf + 30:
// takes pBuf[30], converts that byte's value to int, and prints it
printf("Value SignalLength: %d \r\n", pBuf[30]); // 2
Later, when you cast the pointer and dereference it, you're accessing a full int, which is sizeof(int) bytes (likely 4). This occupies not just the byte at pBuf + 30 but also the subsequent bytes at pBuf + 31, etc., up to sizeof(int) on your platform. It also interprets those bytes according to your platform's byte order (little-endian on x86, big-endian on some other platforms).
// the signalLength struct member is an int
int signalLengthDebugVar1 = signalNameLengthPtr->signalLength; // content: 324502 maybe pointer?
Note also that the compiler is permitted to add padding between and after struct members, so with a multi-member struct you can't assume a field sits at the offset you expect unless you pin the layout down with a compiler-specific #pragma pack. And even then, you can't control the endianness interpretation, so if the data was encoded as big-endian and you're on a little-endian machine like x86, the value you see will be wrong.
The bottom line is that in C++ this is not a safe way to decode binary data.
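A safer pattern is to assemble the bytes explicitly, which removes the padding, alignment, and byte-order guesswork. A minimal sketch (the little-endian wire format here is an assumption):

#include <cstdint>

// Build the 32-bit value byte by byte, assuming little-endian encoding in the buffer.
int32_t read_le32(const unsigned char* p) {
    return (int32_t)((uint32_t)p[0]
                   | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16)
                   | ((uint32_t)p[3] << 24));
}

// usage: int signalLength = read_le32(pBuf + 30);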
I'm trying to change the value of an address in solitaire which provides the time.
Given the code below, the base address + offset 0x97074 should point to another address; that address + offset 0x50 points to a third; and finally that one + offset 0x0C should point to the final address holding the time value.
However, solitaire crashes when I'm executing this operation.
HMODULE hModule = GetModuleHandle(nullptr);
sstream << std::hex << reinterpret_cast<unsigned int>(hModule);
str = sstream.str();
BaseAddress = reinterpret_cast<DWORD>(str.c_str());
//MessageBox(NULL, (LPCSTR) BaseAddress, "Adress", MB_OK); just some reminder
*(*(*(*(DWORD *) BaseAddress + (DWORD *) BASE_OFS_DEF ) + (DWORD *)TIME_OFS1_DEF ) + (DWORD *)TIME_OFS2_DEF) = 500;
The logic is wrong: you are dereferencing your pointers before you add the offsets, and you are casting your offsets to pointers! I think this is what you want:
*(DWORD*)(*(DWORD*)(*(DWORD*)(BaseAddress + BASE_OFS_DEF) + TIME_OFS1_DEF) + TIME_OFS2_DEF) = 500;
But you should really break that down a bit to help understand what's going on, e.g.
DWORD temp1 = *(DWORD*)(BaseAddress + BASE_OFS_DEF);
DWORD temp2 = *(DWORD*)(temp1 + TIME_OFS1_DEF);
*(DWORD*)(temp2 + TIME_OFS2_DEF) = 500;
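If the chain grows, a small helper keeps the casts in one place. A sketch for an in-process 32-bit patch like the one above (the helper name is mine):

#include <windows.h>
#include <vector>

// Dereference every level except the last; the last offset addresses the value itself.
DWORD* FollowPointerChain(DWORD base, const std::vector<DWORD>& offsets) {
    DWORD addr = base;
    for (size_t i = 0; i + 1 < offsets.size(); ++i)
        addr = *(DWORD*)(addr + offsets[i]);
    return (DWORD*)(addr + offsets.back());
}

// usage:
// *FollowPointerChain(BaseAddress, {BASE_OFS_DEF, TIME_OFS1_DEF, TIME_OFS2_DEF}) = 500;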
I have a very large file and I need to read it in small pieces and then process each piece. I'm using the MapViewOfFile function to map a piece into memory, but after reading the first part I can't read the second: it throws when I try to map it.
char *tmp_buffer = new char[bufferSize];
LPCWSTR input = L"input";
OFSTRUCT tOfStr;
tOfStr.cBytes = sizeof tOfStr;
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
HANDLE fileMap = CreateFileMapping(inputFile, NULL, PAGE_READONLY, 0, 0, input);
while (offset < fileSize)
{
long k = 0;
bool cutted = false;
offset -= tempBufferSize;
if (fileSize - offset <= bufferSize)
{
bufferSize = fileSize - offset;
}
char *buffer = new char[bufferSize + tempBufferSize];
for(int i = 0; i < tempBufferSize; i++)
{
buffer[i] = tempBuffer[i];
}
char *tmp_buffer = new char[bufferSize];
LPCWSTR input = L"input";
HANDLE inputFile;
OFSTRUCT tOfStr;
tOfStr.cBytes = sizeof tOfStr;
long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
long long offsetLow = (offset & 0xFFFFFFFF);
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
memcpy(&buffer[tempBufferSize], &tmp_buffer[0], bufferSize);
UnmapViewOfFile(tmp_buffer);
offset += bufferSize;
offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
offsetLow = (offset & 0xFFFFFFFF);
if (offset < fileSize)
{
char *next;
next = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, 1);
if (next[0] >= '0' && next[0] <= '9')
{
cutted = true;
}
UnmapViewOfFile(next);
}
ostringstream path_stream;
path_stream << tempPath << splitNum;
ProcessChunk(buffer, path_stream.str(), cutted, bufferSize);
delete buffer;
cout << (splitNum + 1) << " file(s) sorted" << endl;
splitNum++;
}
One possibility is that you're not using an offset that's a multiple of the allocation granularity. From MSDN:
The combination of the high and low offsets must specify an offset within the file mapping. They must also match the memory allocation granularity of the system. That is, the offset must be a multiple of the allocation granularity. To obtain the memory allocation granularity of the system, use the GetSystemInfo function, which fills in the members of a SYSTEM_INFO structure.
If you try to map at something other than a multiple of the allocation granularity, the mapping will fail and GetLastError will return ERROR_MAPPED_ALIGNMENT.
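If you do need a view that starts at an arbitrary offset, round the offset down to the granularity and skip the slack in the returned pointer. A sketch, reusing fileMap, offset, and bufferSize from the question:

SYSTEM_INFO si = {0};
GetSystemInfo(&si);
unsigned long long gran = si.dwAllocationGranularity; // typically 64 KB

unsigned long long mapOffset = offset - (offset % gran); // aligned start of the view
size_t slack = (size_t)(offset - mapOffset);             // bytes before the data you want

char* view = (char*)MapViewOfFile(fileMap, FILE_MAP_READ,
                                  (DWORD)(mapOffset >> 32), (DWORD)mapOffset,
                                  bufferSize + slack);
char* data = (view != NULL) ? view + slack : NULL;       // your bytes start here
// remember to UnmapViewOfFile(view) when done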
Other than that, there are many problems in the code sample that make it very difficult to see what you're trying to do and where it's going wrong. At a minimum, you need to solve the memory leaks. You seem to be allocating and then leaking completely unnecessary buffers. Giving them better names can make it clear what they are actually used for.
Then I suggest putting a breakpoint on the calls to MapViewOfFile, and then checking all of the parameter values you're passing in to make sure they look right. As a start, on the second call, you'd expect offsetHigh to be 0 and offsetLow to be bufferSize.
A few suspicious things off the bat:
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
Every cast should make you suspicious. Sometimes they are necessary, but make sure you understand why. At this point you should ask yourself why every other file API you're using requires a HANDLE and this function returns an HFILE. If you check OpenFile documentation, you'll see, "This function has limited capabilities and is not recommended. For new application development, use the CreateFile function." I know that sounds confusing because you want to open an existing file, but CreateFile can do exactly that, and it returns the right type.
long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
What type is offset? You probably want to make sure it's an unsigned long long or equivalent. When bitshifting, especially to the right, you almost always want an unsigned type to avoid sign-extension. You also have to make sure that it's a type that has more bits than the amount you're shifting by--shifting a 32-bit value by 32 (or more) bits is actually undefined in C and C++, which allows the compilers to do certain types of optimizations.
long long offsetLow = (offset & 0xFFFFFFFF);
In both of these statements, you have to be careful about the 0xFFFFFFFF value. Since you didn't cast it or give it a suffix, it can be hard to predict whether the compiler will treat it as an int or unsigned int. In this case, it'll be an unsigned int, but that won't be obvious to many people. In fact, I got this wrong when I first wrote this answer. [This paragraph corrected 16-MAY-2017] With bitwise operations, you almost always want to make sure you're using unsigned values.
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
You're casting offsetHigh and offsetLow to ints, which are signed values. The API actually wants DWORDs, which are unsigned values. Rather than casting in the call, I would declare offsetHigh and offsetLow as DWORDs and do the casting in the initialization, like this:
DWORD offsetHigh = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
DWORD offsetLow = static_cast<DWORD>( offset & 0xFFFFFFFFul);
tmp_buffer = static_cast<char *>(MapViewOfFile(fileMap, FILE_MAP_READ, offsetHigh, offsetLow, bufferSize));
Those fixes may or may not resolve your problem. It's hard to tell what's going on from the incomplete code sample.
Here's a working sample you can compare to:
// Calls ProcessChunk with each chunk of the file.
void ReadInChunks(const WCHAR *pszFileName) {
// Offsets must be a multiple of the system's allocation granularity. We
// guarantee this by making our view size equal to the allocation granularity.
SYSTEM_INFO sysinfo = {0};
::GetSystemInfo(&sysinfo);
DWORD cbView = sysinfo.dwAllocationGranularity;
HANDLE hfile = ::CreateFileW(pszFileName, GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, 0, NULL);
if (hfile != INVALID_HANDLE_VALUE) {
LARGE_INTEGER file_size = {0};
::GetFileSizeEx(hfile, &file_size);
const unsigned long long cbFile =
static_cast<unsigned long long>(file_size.QuadPart);
HANDLE hmap = ::CreateFileMappingW(hfile, NULL, PAGE_READONLY, 0, 0, NULL);
if (hmap != NULL) {
for (unsigned long long offset = 0; offset < cbFile; offset += cbView) {
DWORD high = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
DWORD low = static_cast<DWORD>( offset & 0xFFFFFFFFul);
// The last view may be shorter.
if (offset + cbView > cbFile) {
cbView = static_cast<DWORD>(cbFile - offset);
}
const char *pView = static_cast<const char *>(
::MapViewOfFile(hmap, FILE_MAP_READ, high, low, cbView));
      if (pView != NULL) {
        ProcessChunk(pView, cbView);
        ::UnmapViewOfFile(pView);  // every mapped view must be unmapped
      }
}
::CloseHandle(hmap);
}
::CloseHandle(hfile);
}
}
You have a memory leak in your code:
char *tmp_buffer = new char[bufferSize];
[ ... ]
while (offset < fileSize)
{
[ ... ]
char *tmp_buffer = new char[bufferSize];
[ ... ]
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
[ ... ]
}
You never delete what you allocate via new char[] in each iteration there. If your file is large enough / you do enough iterations of this loop, the memory allocation will eventually fail; that's when you'll see a throw done by the allocator.
Win32 API calls like MapViewOfFile() are not C++ and never throw; they return error codes (MapViewOfFile() returns NULL on failure). Therefore, if you see exceptions, something's wrong in your C++ code. Likely the above.
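So the check belongs on the return value, not in a try/catch. A sketch, reusing the DWORD offsetHigh/offsetLow from above:

#include <cstdio>

void* view = MapViewOfFile(fileMap, FILE_MAP_READ, offsetHigh, offsetLow, bufferSize);
if (view == NULL) {
    // GetLastError() says why, e.g. ERROR_MAPPED_ALIGNMENT for a misaligned offset
    fprintf(stderr, "MapViewOfFile failed: %lu\n", GetLastError());
}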
I also had some trouble with memory-mapped files.
Basically I just wanted to share memory (1 MB) between two apps on the same PC.
- Both apps were written in Delphi
- Using Windows 8 Pro
At first, one application (the first one launched) could read and write the memory-mapped file, but the second one could only read it (error 5: ACCESS_DENIED).
Finally, after a lot of testing, it suddenly worked when both applications were using CreateFileMapping. I even tried to create my own security descriptor; nothing helped.
Before that, my applications were first calling OpenFileMapping and then CreateFileMapping if the first call failed.
Another thing that misled me is that the handles, although visibly referencing the same memory-mapped file, were different in the two applications.
One last thing: after this correction my application seemed to work all right, but after a while I got ERROR_NOT_ENOUGH_MEMORY when calling MapViewOfFile.
It was just a beginner's mistake on my part: I was not always calling UnmapViewOfFile.
How do I align a pointer to a 16 byte boundary?
I found this code, but I'm not sure if it's correct:
char* p= malloc(1024);
if ((((unsigned long) p) % 16) != 0)
{
unsigned char *chpoint = (unsigned char *)p;
chpoint += 16 - (((unsigned long) p) % 16);
p = (char *)chpoint;
}
Would this work?
thanks
C++11 provides std::align, which does just that.
// get some memory
T* const p = ...;
std::size_t const size = ...;
void* start = p;
std::size_t space = size;
void* aligned = std::align(16, 1024, start, space);
if(aligned == nullptr) {
// failed to align
} else {
// here, start is aligned to 16 and points to at least 1024 bytes of memory
// also start == aligned
// size - space is the number of bytes used for alignment
}
which seems very low-level. I think
// also available in Boost flavour
using storage = std::aligned_storage_t<1024, 16>;
auto p = new storage;
also works. You can easily run afoul of aliasing rules, though, if you're not careful. If you have a precise scenario in mind (fit N objects of type T at a 16-byte boundary?), I could recommend something nicer.
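And if the storage can simply be declared rather than malloc'd, C++11's alignas handles it with no pointer arithmetic at all (a minimal sketch):

#include <cassert>
#include <cstdint>

alignas(16) static char buffer[1024];  // the object itself is 16-byte aligned

int main() {
    assert(reinterpret_cast<std::uintptr_t>(buffer) % 16 == 0);
}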
Try this:
It returns aligned memory and can free it again, with virtually no extra memory-management overhead.
#include <malloc.h>
#include <assert.h>
size_t roundUp(size_t a, size_t b) { return (1 + (a - 1) / b) * b; }
// we assume here that size_t and void* can be converted to each other
void *malloc_aligned(size_t size, size_t align = sizeof(void*))
{
assert(align % sizeof(size_t) == 0);
assert(sizeof(void*) == sizeof(size_t)); // not sure if needed, but whatever
void *p = malloc(size + 2 * align); // allocate with enough room to align and store the offset
if (p != NULL)
{
size_t base = (size_t)p;
p = (char*)roundUp(base, align) + align; // align & leave room for storing the offset
((size_t*)p)[-1] = (size_t)p - base; // store the offset just before the returned block
}
return p;
}
void free_aligned(void *p) { free(p != NULL ? (char*)p - ((size_t*)p)[-1] : p); }
Warning:
I'm pretty sure I'm stepping on parts of the C standard here, but who cares. :P
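Usage pairs up like malloc/free (continuing the snippet above, with a quick check of the alignment guarantee):

int main() {
    void* p = malloc_aligned(1000, 16);   // request a 16-byte-aligned block
    assert((size_t)p % 16 == 0);          // holds trivially for p == NULL too
    free_aligned(p);                      // recovers and frees the original pointer
    return 0;
}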
In the glibc library, malloc and realloc always return 8-byte-aligned addresses. If you want to allocate memory with some larger alignment (a higher power of 2), you can use memalign or posix_memalign. Read http://www.gnu.org/s/hello/manual/libc/Aligned-Memory-Blocks.html
posix_memalign is one way: http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_memalign.html as long as your alignment is a power of two (and a multiple of sizeof(void*)).
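For example (a sketch; note the constraint is on the alignment argument, not the size):

#include <stdlib.h>

void* p = NULL;
// 16 is a power of two and a multiple of sizeof(void*), as required
if (posix_memalign(&p, 16, 1024) != 0) {
    p = NULL;  // posix_memalign returns an error code; it does not set errno
}
// ... use p ...
free(p);       // blocks from posix_memalign are released with plain free()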
The problem with the solution you provide is that you run the risk of writing off the end of your allocated memory. An alternative is to allocate the size you want + 16 and use a similar trick to the one you're doing to get a pointer that is aligned but still falls within your allocated region. That said, I'd use posix_memalign as a first solution.
Updated: New Faster Algorithm
Don't use modulo, because it compiles to a division, which can cost many clock cycles on x86 and a lot more on other systems. I came up with a faster version of std::align than GCC's and Visual C++'s. Visual C++ has the slowest implementation, which actually uses an amateurish conditional statement. GCC's is very similar to my algorithm, but I did the opposite of what they did; my algorithm is 13.3% faster because it has 13 as opposed to 15 single-cycle instructions. See the research paper with the disassembly. The algorithm is actually one instruction faster if you use the mask instead of the pow_2.
/* Quickly aligns the given pointer to a power-of-two boundary.
#return An aligned pointer of typename T.
#desc The algorithm is a two's complement trick: it masks off the
low bits of the negated pointer and adds the result to the
pointer, bumping it up to the next aligned address.
#param pointer The pointer to align.
#param mask Mask for the low bits; one less than the power of
2 you wish to align to. */
template <typename T = char>
inline T* AlignUp(void* pointer, uintptr_t mask) {
intptr_t value = reinterpret_cast<intptr_t>(pointer);
value += (-value) & mask;
return reinterpret_cast<T*>(value);
}
Here is how you call it:
enum { kSize = 256 };
char buffer[kSize + 64]; //< slack for the largest alignment used below (64)
char* aligned_to_16_byte_boundary = AlignUp<> (buffer, 15); //< 16 - 1 = 15
char16_t* aligned_to_64_byte_boundary = AlignUp<char16_t> (buffer, 63);
Here is the quick bit-wise proof for 3 bits; it works the same for all bit counts:
~000 = 111 => 000 + 111 + 1 = 0b1000
~001 = 110 => 001 + 110 + 1 = 0b1000
~010 = 101 => 010 + 101 + 1 = 0b1000
~011 = 100 => 011 + 100 + 1 = 0b1000
~100 = 011 => 100 + 011 + 1 = 0b1000
~101 = 010 => 101 + 010 + 1 = 0b1000
~110 = 001 => 110 + 001 + 1 = 0b1000
~111 = 000 => 111 + 000 + 1 = 0b1000
Just in case you're here to learn how to align an object to a cache line in C++11, use placement new:
struct Foo { Foo () {} };
Foo* foo = new (AlignUp<Foo> (buffer, 63)) Foo ();
Here is the std::align implementation; it uses 24 instructions where the GCC implementation uses 31. It can be tweaked to eliminate a decrement instruction by turning (--align) into the mask for the least-significant bits, but that would not operate functionally identically to std::align.
inline void* align(size_t align, size_t size, void*& ptr,
                   size_t& space) noexcept {
  intptr_t int_ptr = reinterpret_cast<intptr_t>(ptr),
           offset = (-int_ptr) & (--align);
  if ((space -= offset) < size) {
    space += offset;
    return nullptr;
  }
  ptr = reinterpret_cast<void*>(int_ptr + offset); // std::align also updates ptr
  return ptr;
}
Faster to Use a Mask Rather than the pow_2
Here is the code for aligning using a mask rather than the pow_2 (the power of 2). This is 20% faster than the GCC algorithm, but it requires you to store the mask rather than the pow_2, so it's not interchangeable.
inline void* AlignMask(size_t mask, size_t size, void*& ptr,
                       size_t& space) noexcept {
  intptr_t int_ptr = reinterpret_cast<intptr_t>(ptr),
           offset = (-int_ptr) & mask;
  if ((space -= offset) < size) {
    space += offset;
    return nullptr;
  }
  ptr = reinterpret_cast<void*>(int_ptr + offset); // update ptr like std::align does
  return ptr;
}
A few things:
don't change the pointer returned by malloc/new: you'll need it later to free the memory;
make sure your buffer is big enough after adjusting for the alignment;
use size_t instead of unsigned long: size_t matches the pointer size on common platforms (strictly speaking, uintptr_t is the type guaranteed to round-trip a pointer).
here's the code:
size_t size = 1024; // this is how many bytes you need in the aligned buffer
size_t align = 16; // this is the alignment boundary
char *p = (char*)malloc(size + align); // see second point above
char *aligned_p = (char*)((size_t)p + (align - (size_t)p % align));
// use the aligned_p here
// ...
// when you're done, call:
free(p); // see first point above