Mapping non-contiguous blocks from a file into contiguous memory addresses - c++

I am interested in the prospect of using memory mapped IO, preferably
exploiting the facilities in boost::interprocess for cross-platform
support, to map non-contiguous system-page-size blocks in a file into
a contiguous address space in memory.
A simplified concrete scenario:
I've a number of 'plain-old-data' structures, each of a fixed length
(less than the system page size.) These structures are concatenated
into a (very long) stream with the type & location of structures
determined by the values of those structures that proceed them in the
stream. I'm aiming to minimize latency and maximize throughput in a
demanding concurrent environment.
I can read this data very effectively by memory-mapping it in blocks
of at least twice the system-page-size... and establishing a new
mapping immediately having read a structure extending beyond the
penultimate system-page-boundary. This allows the code that interacts
with the plain-old-data structures to be blissfully unaware that these
structures are memory mapped... and, for example, could compare two
different structures using memcmp() directly without having to care
about page boundaries.
Where things get interesting is with respect to updating these data
streams... while they're being (concurrently) read. The strategy I'd
like to use is inspired by 'Copy On Write' on a system-page-size
granularity... essentially writing 'overlay-pages' - allowing one
process to read the old data while another reads the updated data.
While managing which overlay pages to use, and when, isn't necessarily
trivial... that's not my main concern. My main concern is that I may
have a structure spanning pages 4 and 5, then update a
structure wholly contained in page 5... writing the new page in
location 6... leaving page 5 to be 'garbage collected' when it is
determined to be no-longer reachable. This means that, if I map page
4 into location M, I need to map page 6 into memory location
M+page_size... in order to be able to reliably process structures that
cross page boundaries using existing (non-memory-mapping-aware) functions.
I'm trying to establish the best strategy, and I'm hampered by
documentation I feel is incomplete. Essentially, I need to decouple
allocation of address space from memory mapping into that address
space. With mmap(), I'm aware that I can use MAP_FIXED - if I wish to
explicitly control the mapping location... but I'm unclear how I
should reserve address space in order to do this safely. Can I map
/dev/zero for two pages without MAP_FIXED, then use MAP_FIXED twice to
map two pages into that allocated space at explicit VM addresses? If
so, should I call munmap() three times too? Will it leak resources
and/or have any other untoward overhead? To make the issue even more
complex, I'd like comparable behaviour on Windows... is there any way
to do this? Are there neat solutions if I were to compromise my
cross-platform ambitions?
--
Thanks for your answer, Mahmoud... I've read, and think I've understood that code... I've compiled it under Linux and it behaves as you suggest.
My main concerns are with line 62 - using MAP_FIXED. It makes some assumptions about mmap, which I've been unable to confirm when I read the documentation I can find. You're mapping the 'update' page into the same address space as mmap() returned initially - I assume that this is 'correct' - i.e. not something that just happens to work on Linux? I'd also need to assume that it works cross-platform for file-mappings as well as anonymous mappings.
The sample definitely moves me forwards... documenting that what I ultimately need is probably achievable with mmap() on Linux - at least. What I'd really like is a pointer to documentation that shows that the MAP_FIXED line will work as the sample demonstrates... and, idealy, a transformation from the Linux/Unix specific mmap() to a platform independent (Boost::interprocess) approach.

Your question is a little confusing. From what I understood, this code will do what you need:
#define PAGESIZE 4096
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <errno.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
struct StoredObject
{
int IntVal;
char StrVal[25];
};
int main(int argc, char **argv)
{
int fd = open("mmapfile", O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600);
//Set the file to the size of our data (2 pages)
lseek(fd, PAGESIZE*2 - 1, SEEK_SET);
write(fd, "", 1); //The final byte
unsigned char *mapPtr = (unsigned char *) mmap(0, PAGESIZE * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
struct StoredObject controlObject;
controlObject.IntVal = 12;
strcpy(controlObject.StrVal, "Mary had a little lamb.\n");
struct StoredObject *mary1;
mary1 = (struct StoredObject *)(mapPtr + PAGESIZE - 4); //Will fall on the boundary between first and second page
memcpy(mary1, &controlObject, sizeof(StoredObject));
printf("%d, %s", mary1->IntVal, mary1->StrVal);
//Should print "12, Mary had a little lamb.\n"
struct StoredObject *john1;
john1 = mary1 + 1; //Comes immediately after mary1 in memory; will start and end in the second page
memcpy(john1, &controlObject, sizeof(StoredObject));
john1->IntVal = 42;
strcpy(john1->StrVal, "John had a little lamb.\n");
printf("%d, %s", john1->IntVal, john1->StrVal);
//Should print "12, Mary had a little lamb.\n"
//Make sure the data's on the disk, as this is the initial, "read-only" data
msync(mapPtr, PAGESIZE * 2, MS_SYNC);
//This is the inital data set, now in memory, loaded across two pages
//At this point, someone could be reading from there. We don't know or care.
//We want to modify john1, but don't want to write over the existing data
//Easy as pie.
//This is the shadow map. COW-like optimization will take place:
//we'll map the entire address space from the shared source, then overlap with a new map to modify
//This is mapped anywhere, letting the system decide what address we'll be using for the new data pointer
unsigned char *mapPtr2 = (unsigned char *) mmap(0, PAGESIZE * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
//Map the second page on top of the first mapping; this is the one that we're modifying. It is *not* backed by disk
unsigned char *temp = (unsigned char *) mmap(mapPtr2 + PAGESIZE, PAGESIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_ANON, 0, 0);
if (temp == MAP_FAILED)
{
printf("Fixed map failed. %s", strerror(errno));
}
assert(temp == mapPtr2 + PAGESIZE);
//Make a copy of the old data that will later be changed
memcpy(mapPtr2 + PAGESIZE, mapPtr + PAGESIZE, PAGESIZE);
//The two address spaces should still be identical until this point
assert(memcmp(mapPtr, mapPtr2, PAGESIZE * 2) == 0);
//We can now make our changes to the second page as needed
struct StoredObject *mary2 = (struct StoredObject *)(((unsigned char *)mary1 - mapPtr) + mapPtr2);
struct StoredObject *john2 = (struct StoredObject *)(((unsigned char *)john1 - mapPtr) + mapPtr2);
john2->IntVal = 52;
strcpy(john2->StrVal, "Mike had a little lamb.\n");
//Test that everything worked OK
assert(memcmp(mary1, mary2, sizeof(struct StoredObject)) == 0);
printf("%d, %s", john2->IntVal, john2->StrVal);
//Should print "52, Mike had a little lamb.\n"
//Now assume our garbage collection routine has detected that no one is using the original copy of the data
munmap(mapPtr, PAGESIZE * 2);
mapPtr = mapPtr2;
//Now we're done with all our work and want to completely clean up
munmap(mapPtr2, PAGESIZE * 2);
close(fd);
return 0;
}
My modified answer should address your safety concerns.
Only use MAP_FIXED on the second mmap call (like I have above). The cool thing about MAP_FIXED is that it lets you overwrite an existing mmap address section. It'll unload the range you're overlapping and replace it with your new mapped content:
MAP_FIXED
[...] If the memory
region specified by addr and len overlaps pages of any existing
mapping(s), then the overlapped part of the existing mapping(s) will be
discarded. [...]
This way, you let the OS take care of finding a contiguous memory block of hundreds of megs for you (never call MAP_FIXED on address you don't know for sure isn't available). Then you call MAP_FIXED on a subsection of that now-mapped huge space with the data that you will be modifying. Tada.
On Windows, something like this should work (I'm on a Mac at the moment, so untested):
int main(int argc, char **argv)
{
HANDLE hFile = CreateFile(L"mmapfile", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
//Set the file to the size of our data (2 pages)
SetFilePointer(hFile, PAGESIZE*2 - 1, 0, FILE_BEGIN);
DWORD bytesWritten = -1;
WriteFile(hFile, "", 1, &bytesWritten, NULL);
HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, PAGESIZE * 2, NULL);
unsigned char *mapPtr = (unsigned char *) MapViewOfFile(hMap, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE * 2);
struct StoredObject controlObject;
controlObject.IntVal = 12;
strcpy(controlObject.StrVal, "Mary had a little lamb.\n");
struct StoredObject *mary1;
mary1 = (struct StoredObject *)(mapPtr + PAGESIZE - 4); //Will fall on the boundary between first and second page
memcpy(mary1, &controlObject, sizeof(StoredObject));
printf("%d, %s", mary1->IntVal, mary1->StrVal);
//Should print "12, Mary had a little lamb.\n"
struct StoredObject *john1;
john1 = mary1 + 1; //Comes immediately after mary1 in memory; will start and end in the second page
memcpy(john1, &controlObject, sizeof(StoredObject));
john1->IntVal = 42;
strcpy(john1->StrVal, "John had a little lamb.\n");
printf("%d, %s", john1->IntVal, john1->StrVal);
//Should print "12, Mary had a little lamb.\n"
//Make sure the data's on the disk, as this is the initial, "read-only" data
//msync(mapPtr, PAGESIZE * 2, MS_SYNC);
//This is the inital data set, now in memory, loaded across two pages
//At this point, someone could be reading from there. We don't know or care.
//We want to modify john1, but don't want to write over the existing data
//Easy as pie.
//This is the shadow map. COW-like optimization will take place:
//we'll map the entire address space from the shared source, then overlap with a new map to modify
//This is mapped anywhere, letting the system decide what address we'll be using for the new data pointer
unsigned char *reservedMem = (unsigned char *) VirtualAlloc(NULL, PAGESIZE * 2, MEM_RESERVE, PAGE_READWRITE);
HANDLE hMap2 = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, PAGESIZE, NULL);
unsigned char *mapPtr2 = (unsigned char *) MapViewOfFileEx(hMap2, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE, reservedMem);
//Map the second page on top of the first mapping; this is the one that we're modifying. It is *not* backed by disk
unsigned char *temp = (unsigned char *) MapViewOfFileEx(hMap2, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE, reservedMem + PAGESIZE);
if (temp == NULL)
{
printf("Fixed map failed. 0x%x\n", GetLastError());
return -1;
}
assert(temp == mapPtr2 + PAGESIZE);
//Make a copy of the old data that will later be changed
memcpy(mapPtr2 + PAGESIZE, mapPtr + PAGESIZE, PAGESIZE);
//The two address spaces should still be identical until this point
assert(memcmp(mapPtr, mapPtr2, PAGESIZE * 2) == 0);
//We can now make our changes to the second page as needed
struct StoredObject *mary2 = (struct StoredObject *)(((unsigned char *)mary1 - mapPtr) + mapPtr2);
struct StoredObject *john2 = (struct StoredObject *)(((unsigned char *)john1 - mapPtr) + mapPtr2);
john2->IntVal = 52;
strcpy(john2->StrVal, "Mike had a little lamb.\n");
//Test that everything worked OK
assert(memcmp(mary1, mary2, sizeof(struct StoredObject)) == 0);
printf("%d, %s", john2->IntVal, john2->StrVal);
//Should print "52, Mike had a little lamb.\n"
//Now assume our garbage collection routine has detected that no one is using the original copy of the data
//munmap(mapPtr, PAGESIZE * 2);
mapPtr = mapPtr2;
//Now we're done with all our work and want to completely clean up
//munmap(mapPtr2, PAGESIZE * 2);
//close(fd);
return 0;
}

but I'm unclear how I should reserve address space in order to do this safely
That's going to vary by OS, but a little digging on msdn for mmap (I started with "xp mmap" on the msdn search) shows Microsoft have their usual VerboseAndHelpfullyCapitalizedNames for (the many) functions that implement pieces of mmap. Both the file- and anonymous- mappers can handle fixed-address requests just the same as any POSIX-2001 system can, i.e. if something else in your address space is talking to the kernel, you get to sort it out. No way I'm going to touch "safely", there's no such thing as "safely" with code you're wanting to port to unspecified platforms. You're going to have to build your own pool of pre-mapped anonymous memory that you can unmap and parcel out later under your own control.

I tested the windows code from #Mahmoud, well actually I tested the following similar code, and it doesn't work (the Linux code works.) If you uncomment VirtualFree, it will work. As mentioned in my comment above, on windows you can reserve the address space with VirtualAlloc, but you can't use MapViewOfFileEx with an already mapped address, so you need to VirtualFree it first. Then there's a race condition where another thread can grab the memory address before you do, so you have to do everything in a loop, e.g. try up to 1000 times and then give up.
package main
import (
"fmt"
"os"
"syscall"
)
func main() {
const size = 1024 * 1024
file, err := os.Create("foo.dat")
if err != nil {
panic(err)
}
if err := file.Truncate(size); err != nil {
panic(err)
}
const MEM_COMMIT = 0x1000
addr, err := virtualAlloc(0, size, MEM_COMMIT, protReadWrite)
if err != nil {
panic(err)
}
fd, err := syscall.CreateFileMapping(
syscall.Handle(file.Fd()),
nil,
uint32(protReadWrite),
0,
uint32(size),
nil,
)
//if err := virtualFree(addr); err != nil {
// panic(err)
//}
base, err := mapViewOfFileEx(fd, syscall.FILE_MAP_READ|syscall.FILE_MAP_WRITE, 0, 0, size, addr)
if base == 0 {
panic("mapViewOfFileEx returned 0")
}
if err != nil {
panic(err)
}
fmt.Println("success!")
}
type memProtect uint32
const (
protReadOnly memProtect = 0x02
protReadWrite memProtect = 0x04
protExecute memProtect = 0x20
protAll memProtect = 0x40
)
var (
modkernel32 = syscall.MustLoadDLL("kernel32.dll")
procMapViewOfFileEx = modkernel32.MustFindProc("MapViewOfFileEx")
procVirtualAlloc = modkernel32.MustFindProc("VirtualAlloc")
procVirtualFree = modkernel32.MustFindProc("VirtualFree")
procVirtualProtect = modkernel32.MustFindProc("VirtualProtect")
)
func mapViewOfFileEx(handle syscall.Handle, prot memProtect, offsetHigh uint32, offsetLow uint32, length uintptr, target uintptr) (addr uintptr, err error) {
r0, _, e1 := syscall.Syscall6(procMapViewOfFileEx.Addr(), 6, uintptr(handle), uintptr(prot), uintptr(offsetHigh), uintptr(offsetLow), length, target)
addr = uintptr(r0)
if addr == 0 {
if e1 != 0 {
err = error(e1)
} else {
err = syscall.EINVAL
}
}
return addr, nil
}
func virtualAlloc(addr, size uintptr, allocType uint32, prot memProtect) (mem uintptr, err error) {
r0, _, e1 := syscall.Syscall6(procVirtualAlloc.Addr(), 4, addr, size, uintptr(allocType), uintptr(prot), 0, 0)
mem = uintptr(r0)
if e1 != 0 {
return 0, error(e1)
}
return mem, nil
}
func virtualFree(addr uintptr) error {
const MEM_RELEASE = 0x8000
_, _, e1 := syscall.Syscall(procVirtualFree.Addr(), 3, addr, 0, MEM_RELEASE)
if e1 != 0 {
return error(e1)
}
return nil
}

Related

How to use ReadFileScatter

I've been attempting to use ReadFileScatter today in my code (which sounds like exactly what I need), so far without a lot of luck. Google'ing the internet for what goes wrong doesn't give me much insight.
The documentation states:
The array must contain enough elements to store nNumberOfBytesToRead bytes of data, plus one element for the terminating NULL. For example, if there are 40 KB to be read and the page size is 4 KB, the array must have 11 elements that includes 10 for the data and one for the NULL.
Each buffer must be at least the size of a system memory page and must be aligned on a system memory page size boundary. The system reads one system memory page of data into each buffer.
The function stores the data in the buffers in sequential order. For example, it stores data into the first buffer, then into the second buffer, and so on until each buffer is filled and all the data is stored, or there are no more buffers.
So far I've been attempting to do just that. I allocated a bunch of bytes using VirtualAlloc (which ensures the page boundary constraint), add a terminator NULL to the list, ensure the data on disk is on the system boundary (and implicitly a disk sector size boundary) as well and issue the call.
Without further due, here's the minimum test case in C++:
// Setup: c:\tmp\test.dat is a file with at least 12K of stuff.
// I attempt to read page 2/3, e.g. offset [4096-4096+8192>
// TEST:
SYSTEM_INFO systemInfo;
GetSystemInfo(&systemInfo);
auto pageSize = systemInfo.dwPageSize;
std::cout << "Page size: "<< pageSize << std::endl;
// Allocate 2 pages that are aligned with one in the middle:
auto buffer = reinterpret_cast<char*>(VirtualAlloc(NULL, pageSize * 3, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
// Create read buffer:
std::vector<FILE_SEGMENT_ELEMENT> elements;
{
FILE_SEGMENT_ELEMENT element1;
element1.Buffer = buffer;
elements.push_back(element1);
}
{
FILE_SEGMENT_ELEMENT element2;
element2.Buffer = buffer + pageSize * 2;
elements.push_back(element2);
}
{
FILE_SEGMENT_ELEMENT terminator;
terminator.Buffer = NULL;
elements.push_back(terminator);
}
// [..] Physical sector size is normally checked as well. In my case it's 512 bytes,
// so I guess that's irrelevant here.
//
// Open file:
auto fileHandle = CreateFile(
"c:\\tmp\\test.dat",
GENERIC_READ | GENERIC_WRITE,
FILE_SHARE_READ,
NULL,
OPEN_ALWAYS,
FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
NULL);
auto err = GetLastError();
if (err != ERROR_ALREADY_EXISTS && err != ERROR_SUCCESS)
{
throw std::exception(); // FIXME.
}
OVERLAPPED overlapped;
memset(&overlapped, 0, sizeof(OVERLAPPED));
LARGE_INTEGER tmp;
tmp.QuadPart = 4096; // Read from disk page 1
overlapped.Offset = tmp.LowPart;
overlapped.OffsetHigh = tmp.HighPart;
overlapped.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
auto succes = ReadFileScatter(fileHandle, elements.data(), pageSize * 2, NULL, &overlapped);
err = GetLastError();
if (!succes && err != ERROR_IO_PENDING && err != ERROR_SUCCESS)
{
throw std::exception(); // The call always ends up here with error 87: Invalid parameter
}
WaitForSingleObject(overlapped.hEvent, INFINITE);
std::cout << "Call succeeded!" << std::endl;
// FIXME: Proper exception handling.
// Clean up:
VirtualFree(buffer, pageSize * 3, MEM_DECOMMIT | MEM_RELEASE);
CloseHandle(overlapped.hEvent);
CloseHandle(fileHandle);
In the code, the error is noted with the comment // The call always ends up here with error 87: Invalid parameter. However, as far as I can see, I check all the boxes that they describe on MSDN... so...
What am I doing wrong here?

Kernel AIO (libaio) write request fails when writing from memory reserved at boot time

I am pretty new to linux aio (libaio) and am hoping someone here with more experience can help me out.
I have a system that performs high-speed DMA from a PCI device to the system RAM. I then want to write the data to disk using libaio directly from the DMA address space. Currently the memory used for DMA is reserved on boot through the use of the "memmap" kernel boot command.
In my application, the memory is mapped with mmap to a virtual userspace address pointer which I believe I should be able to pass as my buffer to the io_prep_pwrite() call. However, when I do this, the write does not seem to succeed. When the request is reaped with io_getevents the "res" field has error code -22 which prints as "Bad Address".
However, if I do a memcpy from the previously mapped memory location to a new buffer allocated by my application and then use the pointer to this local buffer in the call to io_prep_pwrite the request works just fine and the data is written to disk. The problem is that performing the memcpy creates a bottle-neck for the speed at which I need to stream data to disk and I am unable to keep up with the DMA rate.
I do not understand why I have to copy the memory first for the write request to work. I created a minimal example to illustrate the issue below. The bufBaseLoc is the mapped address and the localBuffer is the address the data is copied to. I do not want to have to perform the following line:
memcpy(localBuffer, bufBaseLoc, WRITE_SIZE);
...or have the localBuffer at all. In the "Prepare IOCB" section I want to use:
io_prep_pwrite(iocb, fid, bufBaseLoc, WRITE_SIZE, 0);
...but it does not work. But the local buffer, which is just a copy, does work.
Does anyone have any insight into why? Any help would be greatly appreciated.
Thanks,
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <libaio.h>
#define WRITE_SIZE 0x80000 //4MB buffer
#define DMA_BASE_ADDRESS 0x780000000 //Physical address of memory reserved at boot
//Reaping callback
void writeCallback(io_context_t ctx, struct iocb *iocb, long res, long res2)
{
//Res is number of bytes written by the request. It should match the requested IO size. If negative it is an error code
if(res != (long)iocb->u.c.nbytes)
{
fprintf(stderr, "WRITE_ERROR: %s\n", strerror(-res));
}
else
{
fprintf(stderr, "Success\n");
}
}
int main()
{
//Initialize Kernel AIO
io_context_t ctx = 0;
io_queue_init(256, &ctx);
//Open /dev/mem and map physical address to userspace
int fdmem = open("/dev/mem", O_RDWR);
void *bufBaseLoc = mmap(NULL, WRITE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fdmem, DMA_BASE_ADDRESS);
//Set memory here for test (normally done be DMA)
memset(bufBaseLoc, 1, WRITE_SIZE);
//Initialize Local Memory Buffer (DON’T WANT TO HAVE TO DO THIS)
uint8_t* localBuffer;
posix_memalign((void**)&localBuffer, 4096, WRITE_SIZE);
memset(localBuffer, 1, WRITE_SIZE);
//Open/Allocate file on disk
std::string filename = "tmpData.dat";
int fid = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH);
posix_fallocate(fid, 0, WRITE_SIZE);
//Copy from DMA buffer to local process buffer (THIS IS THE COPY I WANT TO AVOID)
memcpy(localBuffer, bufBaseLoc, WRITE_SIZE);
//Prepare IOCB
struct iocb *iocbs[1];
struct iocb *iocb = (struct iocb*) malloc(sizeof (struct iocb));
io_prep_pwrite(iocb, fid, localBuffer, WRITE_SIZE, 0); //<--THIS WORKS (but is not what I want to do)
//io_prep_pwrite(iocb, fid, bufBaseLoc, WRITE_SIZE, 0); //<--THIS DOES NOT WORK (but is what I want to do)
io_set_callback(iocb, writeCallback);
iocbs[0] = iocb;
//Send Request
int res = io_submit(ctx, 1, iocbs);
if (res !=1)
{
fprintf(stderr, "IO_SUBMIT_ERROR: %s\n", strerror(-res));
}
//Reap Request
struct io_event events[1];
size_t ret = io_getevents(ctx, 1, 1, events, 0);
if (ret==1)
{
io_callback_t cb=(io_callback_t)events[0].data;
struct iocb *iocb_e = events[0].obj;
cb(ctx, iocb_e, events[0].res, events[0].res2);
}
}
It could be that your original buffer is not aligned.
Any buffer that libAIO writes to disk needs to be aligned(to around 4K I think). When you allocate your temporary buffer you use posix_memalign, which will allocate it correctly aligned, meaning the write will succeed. If your original reserved buffer is not aligned it could be causing you issues.
If you can try to allocate that initial buffer with an aligned alloc it could solve the issue.

How to initialize disk on Windows server 2008/2012 through a C++ program

We are trying to initialize disk with the properties of some existing disk on Windows server 2008/2012 through a C++ program.
We are using DeviceIoControl() method and IOCTL_DISK_CREATE_DISK, IOCTL_DISK_SET_DRIVE_LAYOUT_EX, IOCTL_DISK_SET_PARTITION_INFO_EX codes from Disk management control codes to make the disk available for use.
Got the following code snippet by searching a bit
//To open the drive
hDevice = CreateFile( TEXT("\\\\.\\PhysicalDrive7"),
GENERIC_READ | GENERIC_WRITE, // no access to the drive
FILE_SHARE_READ | FILE_SHARE_WRITE, // share mode
NULL, // default security attributes
OPEN_EXISTING, // disposition
0, // file attributes
NULL); // do not copy file attributes
CREATE_DISK dsk;
dsk.PartitionStyle = PARTITION_STYLE_MBR; //It can also be PARTITION_STYLE_GPT
dsk.Mbr.Signature = 1;
// Initialize disk
bResult = DeviceIoControl( hDevice, // device to be queried
IOCTL_DISK_CREATE_DISK, // operation to perform
&dsk, sizeof(dsk),
NULL, 0, // no output buffer
&junk, // # bytes returned
NULL
);
LARGE_INTEGER lgPartitionSize;
lgPartitionSize.QuadPart = (1024 * 1024 * 1024);
DWORD dwDriverLayoutInfoExLen = sizeof (DRIVE_LAYOUT_INFORMATION_EX) + 3 * sizeof(PARTITION_INFORMATION_EX);
DRIVE_LAYOUT_INFORMATION_EX *pdg = (DRIVE_LAYOUT_INFORMATION_EX *)new BYTE[dwDriverLayoutInfoExLen];
SecureZeroMemory(pdg, dwDriverLayoutInfoExLen);
pdg->PartitionStyle = PARTITION_STYLE_MBR;
pdg->PartitionCount = 1;
pdg->Mbr.Signature = 1;
pdg->PartitionEntry[0].PartitionStyle = PARTITION_STYLE_MBR;
pdg->PartitionEntry[0].StartingOffset.QuadPart = 1048576;
pdg->PartitionEntry[0].PartitionLength.QuadPart = lgPartitionSize.QuadPart * 200;
pdg->PartitionEntry[0].PartitionNumber = 1;
pdg->PartitionEntry[0].RewritePartition = TRUE;
pdg->PartitionEntry[0].Mbr.PartitionType = PARTITION_NTFT; // PARTITION_IFS (NTFS partition or logical drive)
pdg->PartitionEntry[0].Mbr.BootIndicator = TRUE;
pdg->PartitionEntry[0].Mbr.RecognizedPartition = 1;
pdg->PartitionEntry[0].Mbr.HiddenSectors = 32256 / 512;
// Partition a disk
bResult = DeviceIoControl( hDevice, // device to be queried
IOCTL_DISK_SET_DRIVE_LAYOUT_EX, // operation to perform
pdg, sizeof DRIVE_LAYOUT_INFORMATION_EX, //output buffer
NULL, 0, // no output buffer
&junk, // # bytes returned
NULL
);
bResult = DeviceIoControl( hDevice,
IOCTL_DISK_UPDATE_PROPERTIES,
NULL, 0, NULL, 0, &junk, NULL);
PARTITION_INFORMATION_EX dskinfo;
PARTITION_INFORMATION_MBR mbrinfo;
mbrinfo.PartitionType = PARTITION_NTFT;
mbrinfo.HiddenSectors = (32256 / 512);
mbrinfo.BootIndicator = 1;
mbrinfo.RecognizedPartition = 1;
dskinfo.PartitionStyle = PARTITION_STYLE_MBR;
dskinfo.StartingOffset.QuadPart = 1048576;//0;
dskinfo.PartitionLength.QuadPart = lgPartitionSize.QuadPart * 200;
dskinfo.PartitionNumber = 1;
dskinfo.RewritePartition = TRUE;
dskinfo.Mbr = mbrinfo;
bResult = DeviceIoControl( hDevice, // device to be queried
IOCTL_DISK_SET_PARTITION_INFO_EX, // operation to perform
&dskinfo, sizeof(dskinfo), // output buffer
NULL, 0, // no output buffer
&junk, // # bytes returned
NULL
);
All the calls to DeviceIoControl() are getting succeeded except the last one with IOCTL_DISK_SET_PARTITION_INFO_EX code with error 1 (i.e Incorrect function). What could be the reason for this?
If we comment out the last call, the disk is being initialized as raw disk, But this won't meet our requirements.
The above sample is for MBR partition style only. We could not find any sample for GPT,... styles. Please give a link if someone is aware of one.
You're using the wrong structure type with IOCTL_DISK_SET_PARTITION_INFO_EX. It takes a SET_PARTITION_INFORMATION_EX structure, not a PARTITION_INFORMATION_EX structure.
You probably don't need to use IOCTL_DISK_SET_PARTITION_INFO_EX, since it just sets the partition type, which should have already been set with IOCTL_DISK_SET_DRIVE_LAYOUT_EX. Unfortunately you've used it to set the wrong partition type. NTFS partitions have the partition type PARTITION_IFS.
Multiplying lgPartitionSize by 200 is almost certainly wrong. If lgPartitionSize is supposed the size in sectors then you need to multiply this by the sector size of the disk. The sector size of hard drives used to be always 512 bytes (0x200 bytes), but modern drives use a 4096 byte sector size.
Correctly creating partition tables is not easy, and mindlessly copying other people's code like you've done isn't going to work. Even after you fix the problems I've mentioned above you're still likely to encounter other issues. You really need to understand the all the restrictions placed on how partitions should be laid out.
You might want to consider using the diskpart command to programically initialize disks instead of C++ code.

C++ using 7zip.dll

I'm developing an app which will need to work with different types of archives. As many of the archive types as possible is good. I have choosen a 7zip.dll as an engine of archive-worker. But there is a problem, does anyone knows how to uncompress a file from archive to memory buffer? As I see, 7zip.dll supports only uncompressing to hard disk. Also, it would be nice to load archive from memory buffer. Has anyone tried to do something like that?
Not sure if I completely understand your needs (for example, don't you need the decompressed file on disk?).
I was looking at LZMA SDK 9.20 and its lzma.txt readme file, and there are plenty of hints that decompression to memory is possible - you may just need to use the C API rather than the C++ interface. Check out, for example, the section called Single-call Decompressing:
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h
Compile defines: no defines
Memory Requirements:
- Input buffer: compressed size
- Output buffer: uncompressed size
- LZMA Internal Structures: state_size (16 KB for default settings)
Also, there is this function:
SRes LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode, ELzmaStatus *status);
You can utilize these by memory-mapping the archive file. To the best of my knowledge, if your process creates a memory-mapped file with exclusive access (so no other process can access it) and does no explicit flushing, all changes to the file will be kept in memory until the mapping is destroyed or the file closed. Alternatively, you could just load the archive contents in memory.
For the sake of completeness, I hacked together several examples into a demo of using memory mapping in Windows.
#include <stdio.h>
#include <time.h>
#include <Windows.h>
#include <WinNT.h>
// This demo will limit the file to 4KiB
#define FILE_SIZE_MAX_LOWER_DW 4096
#define FILE_SIZE_MAX_UPPER_DW 0
#define MAP_OFFSET_LOWER_DW 0
#define MAP_OFFSET_UPPER_DW 0
#define TEST_ITERATIONS 1000
#define INT16_SIZE 2
typedef short int int16;
// NOTE: This will not work for Windows less than XP or 2003 Server!
int main()
{
HANDLE hFile, hFileMapping;
PBYTE mapViewStartAddress;
// Note: with no explicit security attributes, the process needs to have
// the necessary rights (e.g. read, write) to this location.
LPCSTR path = "C:\\Users\\mcmlxxxvi\\Desktop\\test.dat";
// First, open a file handle.
hFile = CreateFile(path,
GENERIC_READ | GENERIC_WRITE, // The file is created with Read/Write permissions
FILE_SHARE_READ, // Set this to 0 for exclusive access
NULL, // Optional security attributes
CREATE_ALWAYS, // File is created if not found, overwritten otherwise
FILE_ATTRIBUTE_TEMPORARY, // This affects the caching behaviour
0); // Attributes template, can be left NULL
if ((hFile) == INVALID_HANDLE_VALUE)
{
fprintf(stderr, "Unable to open file");
return 1;
}
// Then, create a memory mapping for the opened file.
hFileMapping = CreateFileMapping(hFile, // Handle for an opened file
NULL, // Optional security attributes
PAGE_READWRITE, // File can be mapped for Read/Write access
FILE_SIZE_MAX_UPPER_DW, // Maximum file size split in DWORDs.
FILE_SIZE_MAX_LOWER_DW, // NOTE: I may have these two mixed up!
NULL); // Optional name
if (hFileMapping == 0)
{
CloseHandle(hFile);
fprintf(stderr, "Unable to open file for mapping.");
return 1;
}
// Next, map a view (a continuous portion of the file) to a memory region
// The view must start and end at an offset that is a multiple of
// the allocation granularity (roughly speaking, the machine page size).
mapViewStartAddress = (PBYTE)MapViewOfFile(hFileMapping, // Handle to a memory-mapped file
FILE_MAP_READ | FILE_MAP_WRITE, // Maps the view for Read/Write access
MAP_OFFSET_UPPER_DW, // Offset in the file from which
MAP_OFFSET_LOWER_DW, // the view starts, split in DWORDs.
FILE_SIZE_MAX_LOWER_DW); // Size of the view (here, entire file)
if (mapViewStartAddress == 0)
{
CloseHandle(hFileMapping);
CloseHandle(hFile);
fprintf(stderr, "Couldn't map a view of the file.");
return 1;
}
// This is where actual business stuff belongs.
// This example application does iterations of reading and writing
// random numbers for the entire length of the file.
int16 value;
errno_t result = 0;
srand((int)time(NULL));
for (int i = 0; i < TEST_ITERATIONS; i++)
{
// Write
for (int j = 0; j < FILE_SIZE_MAX_LOWER_DW / INT16_SIZE; j++)
{
value = rand();
result = memcpy_s(mapViewStartAddress + j * INT16_SIZE, INT16_SIZE, &value, INT16_SIZE);
if (result != 0)
{
CloseHandle(hFileMapping);
CloseHandle(hFile);
fprintf(stderr, "File write error during iteration #%d, error %d", i, GetLastError());
return 1;
}
}
// Read
SetFilePointer(hFileMapping, 0, 0, FILE_BEGIN);
for (int j = 0; j < FILE_SIZE_MAX_LOWER_DW / sizeof(int); j++)
{
result = memcpy_s(&value, INT16_SIZE, mapViewStartAddress + j * INT16_SIZE, INT16_SIZE);
if (result != 0)
{
CloseHandle(hFileMapping);
CloseHandle(hFile);
fprintf(stderr, "File read error during iteration #%d, error %d", i, GetLastError());
return 1;
}
}
}
// End business stuff
CloseHandle(hFileMapping);
CloseHandle(hFile);
return 0;
}

How can I detect only deleted, changed, and created files on a volume?

I need to know if there is an easy way of detecting only the files that were deleted, modified or created on an NTFS volume.
I have written a program for offsite backup in C++. After the first backup, I check the archive bit of each file to see if there was any change made, and back up only the files that were changed. Also, it backs up from the VSS snapshot in order to prevent file locks.
This seems to work fine on most file systems, but for some with lots of files and directories, this process takes too long and often the backup takes more than a day to finish backing up.
I tried using the change journal to easily detect changes made on an NTFS volume, but the change journal would show a lot of records, most of them relating to small temporary files created and destroyed. Also, I could the file name, file reference number, and the parent file reference number, but I could not get the full file path. The parent file reference number is somehow supposed to give you the parent directory path.
EDIT: This needs to run everyday, so at the beginning of every scan, it should record only the changes that took place since the last scan. Or atleast, there should be a way to say changes since so and so time and date.
You can enumerate all the files on a volume using FSCTL_ENUM_USN_DATA. This is a fast process (my tests returned better than 6000 records per second even on a very old machine, and 20000+ is more typical) and only includes files that currently exist.
The data returned includes the file flags as well as the USNs so you could check for changes whichever way you prefer.
You will still need to work out the full path for the files by matching the parent IDs with the file IDs of the directories. One approach would be to use a buffer large enough to hold all the file records simultaneously, and search through the records to find the matching parent for each file you need to back up. For large volumes you would probably need to process the directory records into a more efficient data structure, perhaps a hash table.
Alternately, you can read/reread the records for the parent directories as needed. This would be less efficient, but the performance might still be satisfactory depending on how many files are being backed up. Windows does appear to cache the data returned by FSCTL_ENUM_USN_DATA.
This program searches the C volume for files named test.txt and returns information about any files found, as well as about their parent directories.
#include <Windows.h>
#include <stdio.h>
#define BUFFER_SIZE (1024 * 1024)
HANDLE drive;
USN maxusn;
void show_record (USN_RECORD * record)
{
void * buffer;
MFT_ENUM_DATA mft_enum_data;
DWORD bytecount = 1;
USN_RECORD * parent_record;
WCHAR * filename;
WCHAR * filenameend;
printf("=================================================================\n");
printf("RecordLength: %u\n", record->RecordLength);
printf("MajorVersion: %u\n", (DWORD)record->MajorVersion);
printf("MinorVersion: %u\n", (DWORD)record->MinorVersion);
printf("FileReferenceNumber: %lu\n", record->FileReferenceNumber);
printf("ParentFRN: %lu\n", record->ParentFileReferenceNumber);
printf("USN: %lu\n", record->Usn);
printf("Timestamp: %lu\n", record->TimeStamp);
printf("Reason: %u\n", record->Reason);
printf("SourceInfo: %u\n", record->SourceInfo);
printf("SecurityId: %u\n", record->SecurityId);
printf("FileAttributes: %x\n", record->FileAttributes);
printf("FileNameLength: %u\n", (DWORD)record->FileNameLength);
filename = (WCHAR *)(((BYTE *)record) + record->FileNameOffset);
filenameend= (WCHAR *)(((BYTE *)record) + record->FileNameOffset + record->FileNameLength);
printf("FileName: %.*ls\n", filenameend - filename, filename);
buffer = VirtualAlloc(NULL, BUFFER_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
if (buffer == NULL)
{
printf("VirtualAlloc: %u\n", GetLastError());
return;
}
mft_enum_data.StartFileReferenceNumber = record->ParentFileReferenceNumber;
mft_enum_data.LowUsn = 0;
mft_enum_data.HighUsn = maxusn;
if (!DeviceIoControl(drive, FSCTL_ENUM_USN_DATA, &mft_enum_data, sizeof(mft_enum_data), buffer, BUFFER_SIZE, &bytecount, NULL))
{
printf("FSCTL_ENUM_USN_DATA (show_record): %u\n", GetLastError());
return;
}
parent_record = (USN_RECORD *)((USN *)buffer + 1);
if (parent_record->FileReferenceNumber != record->ParentFileReferenceNumber)
{
printf("=================================================================\n");
printf("Couldn't retrieve FileReferenceNumber %u\n", record->ParentFileReferenceNumber);
return;
}
show_record(parent_record);
}
void check_record(USN_RECORD * record)
{
WCHAR * filename;
WCHAR * filenameend;
filename = (WCHAR *)(((BYTE *)record) + record->FileNameOffset);
filenameend= (WCHAR *)(((BYTE *)record) + record->FileNameOffset + record->FileNameLength);
if (filenameend - filename != 8) return;
if (wcsncmp(filename, L"test.txt", 8) != 0) return;
show_record(record);
}
int main(int argc, char ** argv)
{
MFT_ENUM_DATA mft_enum_data;
DWORD bytecount = 1;
void * buffer;
USN_RECORD * record;
USN_RECORD * recordend;
USN_JOURNAL_DATA * journal;
DWORDLONG nextid;
DWORDLONG filecount = 0;
DWORD starttick, endtick;
starttick = GetTickCount();
printf("Allocating memory.\n");
buffer = VirtualAlloc(NULL, BUFFER_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
if (buffer == NULL)
{
printf("VirtualAlloc: %u\n", GetLastError());
return 0;
}
printf("Opening volume.\n");
drive = CreateFile(L"\\\\?\\c:", GENERIC_READ, FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, FILE_FLAG_NO_BUFFERING, NULL);
if (drive == INVALID_HANDLE_VALUE)
{
printf("CreateFile: %u\n", GetLastError());
return 0;
}
printf("Calling FSCTL_QUERY_USN_JOURNAL\n");
if (!DeviceIoControl(drive, FSCTL_QUERY_USN_JOURNAL, NULL, 0, buffer, BUFFER_SIZE, &bytecount, NULL))
{
printf("FSCTL_QUERY_USN_JOURNAL: %u\n", GetLastError());
return 0;
}
journal = (USN_JOURNAL_DATA *)buffer;
printf("UsnJournalID: %lu\n", journal->UsnJournalID);
printf("FirstUsn: %lu\n", journal->FirstUsn);
printf("NextUsn: %lu\n", journal->NextUsn);
printf("LowestValidUsn: %lu\n", journal->LowestValidUsn);
printf("MaxUsn: %lu\n", journal->MaxUsn);
printf("MaximumSize: %lu\n", journal->MaximumSize);
printf("AllocationDelta: %lu\n", journal->AllocationDelta);
maxusn = journal->MaxUsn;
mft_enum_data.StartFileReferenceNumber = 0;
mft_enum_data.LowUsn = 0;
mft_enum_data.HighUsn = maxusn;
for (;;)
{
// printf("=================================================================\n");
// printf("Calling FSCTL_ENUM_USN_DATA\n");
if (!DeviceIoControl(drive, FSCTL_ENUM_USN_DATA, &mft_enum_data, sizeof(mft_enum_data), buffer, BUFFER_SIZE, &bytecount, NULL))
{
printf("=================================================================\n");
printf("FSCTL_ENUM_USN_DATA: %u\n", GetLastError());
printf("Final ID: %lu\n", nextid);
printf("File count: %lu\n", filecount);
endtick = GetTickCount();
printf("Ticks: %u\n", endtick - starttick);
return 0;
}
// printf("Bytes returned: %u\n", bytecount);
nextid = *((DWORDLONG *)buffer);
// printf("Next ID: %lu\n", nextid);
record = (USN_RECORD *)((USN *)buffer + 1);
recordend = (USN_RECORD *)(((BYTE *)buffer) + bytecount);
while (record < recordend)
{
filecount++;
check_record(record);
record = (USN_RECORD *)(((BYTE *)record) + record->RecordLength);
}
mft_enum_data.StartFileReferenceNumber = nextid;
}
}
Additional notes
As discussed in the comments, you may need to replace MFT_ENUM_DATA with MFT_ENUM_DATA_V0 on versions of Windows later than Windows 7. (This may also depend on what compiler and SDK you are using.)
I'm printing the 64-bit file reference numbers as if they were 32-bit. That was just a mistake on my part. Probably in production code you won't be printing them anyway, but FYI.
The change journal is your best bet. You can use the file reference numbers to match file creation/deletion pairs and thus ignore temporary files, without having to process them any further.
I think you have to scan the Master File Table to make sense of ParentFileReferenceNumber. Of course you only need to keep track of directories when doing this, and use a data structure that will allow you to quickly lookup the information, so you only need to scan the MFT once.
You can use ReadDirectoryChanges and surrounding windows API.
I know how to achieve this in java. It will help you if you implement Java code inside C++.
In Java you can achieve this using Jnotify API.It looks for changes in sub-directory also.