I have found a few strange things while using C++ threads

I implemented the salsa20 algorithm on my computer (Windows 10) yesterday.
This salsa20 function receives a fileName and a key and encrypts the file in place at very high speed.
The main part of the code is here:
void salsa20(const char *fileName, const char *key) {
    uint8_t temp[BN]; // 40 KB buffer
    int file = open(fileName, O_RDWR | O_BINARY, S_IRUSR | S_IWUSR);
    int readSize = 0;
    int bufsec = 0;
    int i;
    while ((readSize = read(file, temp, BN)) > 0) {
        for (i = 0; i < readSize; i += 64) {
            salsa_encrypt((uint8_t*)key, bufsec++, (uint8_t*)temp + i, readSize - i);
        }
        // Seek back and overwrite the block that was just read; after the
        // write the file offset is already at the start of the next block.
        lseek(file, -readSize, SEEK_CUR);
        write(file, temp, readSize);
    }
    close(file);
}
int main() {
    salsa20("1.iso", "PASSWORDTYPE1___!##$%^&*()!##$%^");
}
This function worked well and its memory usage was very small (< 1 MB).
Today I wanted to convert several files at the same time by running this function in several threads.
int main() {
    thread t1(salsa20, "1.iso", "PASSWORDTYPE1___!##$%^&*()!##$%^");
    thread t2(salsa20, "2.iso", "PASSWORDTYPE2___!##$%^&*()!##$%^");
    thread t3(salsa20, "3.iso", "PASSWORDTYPE2___!##$%^&*()!##$%^");
    t1.join();
    t2.join();
    t3.join();
}
The speed was somewhat faster, but I found that memory usage suddenly increased by more than 700 MB (12.3 GB -> 13.1 GB) and then gradually decreased; even after the program terminated, it took about 30 seconds for the memory to be fully released.
I suspect this is related to the operating system and its file management, but I do not yet understand it precisely.
I would like to know how to prevent this memory growth while using threads.
I need a program that uses threads safely without consuming memory like this.

On Windows, when several threads write faster than the hard disk can keep up, the operating system buffers the data in its file cache. If you do not use threads, this phenomenon is much less visible, and reducing the number of threads to one or two may make it disappear, although other programs running on the system also play a role.
You can try running this program on a Linux system as well.
There, too, you can see that the operating system uses cache memory when reading and writing files.
On Linux, this cached memory can be released with the posix_fadvise function.

On Windows you can change the code like this:
void salsa20(const char* fileName, const char* key) {
    uint8_t temp[BN]; // 40 KB buffer
    // FILE_FLAG_NO_BUFFERING requires transfer sizes that are multiples of the
    // volume sector size (BN = 40 KB satisfies this); the buffer address should
    // also be sector-aligned for unbuffered I/O.
    HANDLE hFile = CreateFileA(fileName, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return;
    DWORD readSize = 0, writeSize;
    int bufsec = 0;
    DWORD i;
    DWORD dwSizeHi;
    DWORD dwSizeLo = GetFileSize(hFile, &dwSizeHi); // remember the original size so it can be restored at the end
    while (true) {
        ReadFile(hFile, temp, BN, &readSize, NULL);
        for (i = 0; i < readSize; i += 64) {
            salsa_encrypt((uint8_t*)key, bufsec++, (uint8_t*)temp + i, readSize - i);
        }
        SetFilePointer(hFile, -(LONG)readSize, NULL, FILE_CURRENT);
        // Unbuffered writes must be whole sectors, so the last (partial) block is written rounded up to BN.
        WriteFile(hFile, temp, BN, &writeSize, NULL);
        if (BN != readSize)
            break;
    }
    CloseHandle(hFile);
    // Reopen without FILE_FLAG_NO_BUFFERING and truncate the file back to its original size.
    hFile = CreateFileA(fileName, GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL);
    SetFilePointer(hFile, dwSizeLo, (LONG*)&dwSizeHi, FILE_BEGIN);
    SetEndOfFile(hFile);
    CloseHandle(hFile);
}
On a Linux system, you can use the posix_fadvise function:
void salsa20_256(const char *fileName, const char *key) {
    uint8_t temp[BN]; // 40 KB buffer
    int file = open(fileName, O_RDWR, S_IRWXU); // note: O_BINARY does not exist on Linux
    fdatasync(file);
    posix_fadvise(file, 0, 0, POSIX_FADV_DONTNEED);
    int readSize = 0;
    int bufsec = 0;
    int i;
    while ((readSize = read(file, temp, BN)) > 0) {
        for (i = 0; i < readSize; i += 64) {
            salsa_encrypt((uint8_t*)key, bufsec++, (uint8_t*)temp + i, readSize - i);
        }
        lseek(file, -readSize, SEEK_CUR);
        write(file, temp, readSize);
        // Ask the kernel to drop the cached pages for this file. Only clean
        // pages are dropped, so flushing (e.g. fdatasync) makes the advice more effective.
        posix_fadvise(file, 0, 0, POSIX_FADV_DONTNEED);
    }
    fdatasync(file);
    posix_fadvise(file, 0, 0, POSIX_FADV_DONTNEED);
    close(file);
}

This program is safe. The operating system has made a decision to keep the data written in memory. This may be to avoid having the program wait until the write completes. This may be in case another program reads the data soon.
Unless you think the operating system's decision is bad -- which it can only be if the memory was needed for some other purpose -- then this is not a problem. This is the system being efficient.
Note that you are not measuring the program's memory usage; you are measuring the operating system's memory usage while the program runs. The operating system may, for example, decide to cache more data from disk because of what the program is doing. You usually benefit from those kinds of decisions, and here it likely lets the program complete much faster and thus makes its results available to other programs sooner.
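If you want to check what the process itself is holding, as opposed to what the operating system caches on its behalf, you can print the process's own working set. A minimal Windows sketch, assuming you only need a rough number (link with psapi.lib):
#include <windows.h>
#include <psapi.h>
#include <cstdio>

// Prints this process's working set and private bytes. These stay small for
// the salsa20 program even while the system's file cache grows by hundreds of MB.
void printOwnMemoryUsage() {
    PROCESS_MEMORY_COUNTERS_EX pmc = {0};
    pmc.cb = sizeof(pmc);
    if (GetProcessMemoryInfo(GetCurrentProcess(),
                             (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc))) {
        printf("working set: %zu KB, private: %zu KB\n",
               pmc.WorkingSetSize / 1024, (size_t)pmc.PrivateUsage / 1024);
    }
}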

Is it possible to send an instance of a class with member functions between two processes?

Suppose I have the following class in process 1, and I run process 2, whose code is written in a completely different file.
class A {
public:
    int data;
    A();
    int f();
};
I want to create an instance a of class A in process 1, send it to process 2, and run a.f() in process 2. I have studied IPC mechanisms including POSIX shared memory and message queues and read many examples, but most of the examples only explain how to send simple data like an integer, or a struct without member functions.
Is it possible?
If it's possible, how can I do it using POSIX shared memory? Can you give me a short example?
It is definitely possible to do it in shared memory.
Take for example the following class:
struct A {
    A(int initial_value) {
        internal_value = initial_value;
    }
    void write(int value) {
        internal_value = value;
    }
    int read() const {
        return internal_value;
    }
    void watch(int value) {
        while (internal_value == value) usleep(1000);
    }
    std::atomic<int> internal_value;
};
The class uses an internal atomic counter that will be shared between the two processes, both of which read and write it.
Let's first create some shared memory space for it.
int mapsize = getpagesize();
int fid = ::open("/tmp/shared.dat", O_CREAT | O_RDWR, S_IRWXU | S_IRWXG);
ftruncate(fid, mapsize);
void* ptr = ::mmap(nullptr, mapsize, PROT_READ | PROT_WRITE, MAP_SHARED, fid, 0);
Notice that I am not checking the return values for brevity, but you should do so in your code.
Now with this memory space allocated, let's initialize an object of class "A" inside that memory with placement new.
A* a = new (ptr) A(0); // placement new
Then we fork another process to simulate the IPC mechanism working
pid_t pid = fork();
For the parent process, we loop from 0 to 10, first waiting for the value to change away from 0, then from 2, 4, and so on. At the end we wait for the child to finish.
if (pid != 0) {
    for (int j = 0; j < 10; j += 2) {
        printf("-->parent %d\n", j);
        a->watch(j);
        a->write(j + 2);
    }
    printf("Finishing parent\n");
    int status;
    wait(&status);
For the child process, we write 1 and wait for 1, then write 3 and wait for 3 and so on like the parent but with odd numbers.
} else {
    for (int j = 1; j < 10; j += 2) {
        printf("-->child %d\n", j);
        a->write(j);
        a->watch(j);
    }
    printf("Finishing child\n");
}
At the end, both processes unmap the memory and close the file:
::munmap(ptr, mapsize);
::close(fid);
It prints:
-->parent 0
-->child 1
-->parent 2
-->child 3
-->parent 4
-->child 5
-->parent 6
-->child 7
-->parent 8
-->child 9
Finishing parent
Finishing child
Compiler Explorer link: https://godbolt.org/z/5orPaEhPM
If you need two independent processes running side by side, you just need to be careful about how you create and synchronize them.
Start the same way, opening a file and memory-mapping it:
int mapsize = getpagesize();
const char* filename = "/tmp/shared.dat";
int fid = ::open(filename, O_CREAT | O_RDWR, S_IRWXU | S_IRWXG);
void* ptr = ::mmap(nullptr, mapsize, PROT_READ | PROT_WRITE, MAP_SHARED, fid, 0);
Notice that at this point, if you write to or read from the pointer ptr, you can get a fault (the mapping is not backed yet if the file is still empty).
Then you need to lock the file
while (flock(fid, LOCK_EX) != 0) {
usleep(10000); // sleep a bit
}
Then you need to check the file: if its size is zero, it is uninitialized, so you size it with ftruncate and construct the object with placement new; if it is already initialized, you do not use placement new, you just cast the raw pointer.
A* a;
if (getfilesize(fid) == 0) {
    ftruncate(fid, mapsize);
    a = new (ptr) A(0);
} else {
    a = reinterpret_cast<A*>(ptr);
}
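Note that getfilesize is not a standard call; it is a small helper this example assumes. A minimal sketch of it using fstat could look like this:
#include <sys/stat.h>

// Returns the current size of an open file descriptor, or 0 on error.
static long long getfilesize(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0) return 0;
    return (long long)st.st_size;
}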
After that you read your value and increment it while the file is still locked.
int value = a->read();
value += 1;
a->write(value);
Then you can unlock the file
flock(fid, LOCK_UN); // unlock
Then it's just the same drill of waiting, incrementing and looping, only a tad different.
// Change the value 5 times
for (int j = 0; j < 5; ++j) {
    printf("-->proc %d %d\n", getpid(), value);
    a->watch(value);
    value += 2;
    a->write(value);
}
In the end, just unmap and close as before
::munmap(ptr, mapsize);
::close(fid);
The whole code is here: https://godbolt.org/z/zK3WKWqj4
Do not use IPC, but RPC (Remote Procedure Calls) for that, e.g. gRPC: https://grpc.io/docs/what-is-grpc/introduction/. Sharing data "manually" with shared memory can be done, but you end up having to write the synchronization mechanisms yourself (locks on the data, signals that data has been written into shared memory, etc.).

Read a 1 TB or larger binary file using the fastest reading method

I want to read a binary file of 1 TB or larger using the fastest reading method in C++.
I am trying file memory mapping: I map a (500 * 1024 * 1024) byte block of the file at a time, sequentially, and then read from that block in 32768-byte pieces.
But it takes twice as long as when I read and parse the file data through a dynamically allocated buffer.
System configuration:
Processor: Intel(R) Core(TM) i5-2310 CPU @ 2.90 GHz
RAM: 8 GB; OS: Windows 10, 64-bit, x64-based processor
I am using following code:
const unsigned __int64 BlockSize = 500 * 1024 * 1024;
HANDLE hFile = CreateFile(FilePath, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN, 0);
if (hFile != INVALID_HANDLE_VALUE)
{
    ULARGE_INTEGER uli;
    uli.LowPart = GetFileSize(hFile, &uli.HighPart);
    LONGLONG FileSize = uli.QuadPart;
    HANDLE hFileMap = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hFileMap != NULL)
    {
        unsigned __int64 iPos = 0, iLeft = 0, iBytesToRead;
        unsigned __int64 curPos = 0;
        while (iPos < FileSize)
        {
            iLeft = FileSize - iPos;
            iBytesToRead = iLeft > BlockSize ? BlockSize : iLeft;
            uli.QuadPart = iPos;
            char* rawBuffer = (char*)MapViewOfFile(hFileMap, FILE_MAP_READ, uli.HighPart, uli.LowPart, iBytesToRead);
            if (rawBuffer) {
                curPos = 0;
                while (curPos < iBytesToRead) {
                    // parse the next 32768 bytes starting at rawBuffer + curPos
                    curPos += 32768;
                }
                UnmapViewOfFile(rawBuffer);
            }
            iPos += iBytesToRead;
        }
        CloseHandle(hFileMap);
    }
    CloseHandle(hFile);
}
First, I want to know: is file memory mapping the fastest way to read a binary file?
If it is the fastest method, why does it take double the time?
How can I improve the above code to get the maximum reading speed?
In the MapViewOfFile() function, how many bytes can I map if the file size is greater than the available RAM?
I have already tried the above method with multiple threads, but there was no speed gain.
If there is a faster method to read a binary file (1 TB or larger), please tell me.

Two-threaded app is slower than single-threaded on C++ (VC++ 2010 Express). How to solve?

I have a program that allocates memory a lot. I hoped to boost its speed by splitting the task across threads, but it only made my program slower.
I made this minimal example, which has nothing to do with my real code apart from the fact that it allocates memory in different threads.
class ThreadStartInfo
{
public:
    unsigned char *arr_of_5m_elems;
    bool TaskDoneFlag;
    ThreadStartInfo()
    {
        this->TaskDoneFlag = false;
        this->arr_of_5m_elems = NULL;
    }
    ~ThreadStartInfo()
    {
        if (this->arr_of_5m_elems)
            free(this->arr_of_5m_elems);
    }
};
unsigned long __stdcall CalcSomething(void *tsi_ptr)
{
    ThreadStartInfo *tsi = (ThreadStartInfo*)tsi_ptr;
    for (int i = 0; i < 5000000; i++)
    {
        double *test_ptr = (double*)malloc(tsi->arr_of_5m_elems[i] * sizeof(double));
        memset(test_ptr, 0, tsi->arr_of_5m_elems[i] * sizeof(double));
        free(test_ptr);
    }
    tsi->TaskDoneFlag = true;
    return 0;
}
void main()
{
    ThreadStartInfo *tsi1 = new ThreadStartInfo();
    tsi1->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
    ThreadStartInfo *tsi2 = new ThreadStartInfo();
    tsi2->arr_of_5m_elems = (unsigned char*)malloc(5000000 * sizeof(unsigned char));
    ThreadStartInfo **tsi_arr = (ThreadStartInfo**)malloc(2 * sizeof(ThreadStartInfo*));
    tsi_arr[0] = tsi1;
    tsi_arr[1] = tsi2;
    time_t start_dt = time(NULL);
    CalcSomething(tsi1);
    CalcSomething(tsi2);
    printf("Task done in %i seconds.\n", (int)(time(NULL) - start_dt));
    //--
    tsi1->TaskDoneFlag = false;
    tsi2->TaskDoneFlag = false;
    //--
    start_dt = time(NULL);
    unsigned long th1_id = 0;
    void *th1h = CreateThread(NULL, 0, CalcSomething, tsi1, 0, &th1_id);
    unsigned long th2_id = 0;
    void *th2h = CreateThread(NULL, 0, CalcSomething, tsi2, 0, &th2_id);
retry:
    for (int i = 0; i < 2; i++)
        if (!tsi_arr[i]->TaskDoneFlag)
        {
            Sleep(100);
            goto retry;
        }
    CloseHandle(th1h);
    CloseHandle(th2h);
    printf("MT Task done in %i seconds.\n", (int)(time(NULL) - start_dt));
}
It prints me such results:
Task done in 16 seconds.
MT Task done in 19 seconds.
And... I didn't expect a slowdown. Is there any way to make memory allocations faster in multiple threads?
Apart from some undefined behavior due to lack of synchronization on TaskDoneFlag, all the threads are doing is calling malloc/free repeatedly.
The Visual C++ CRT heap is effectively serialized [1]: malloc/free delegate to HeapAlloc/HeapFree, which execute in a critical section (only one thread at a time). Calling them from more than one thread at a time will never be faster than a single thread, and it is often slower due to the lock-contention overhead.
Either reduce allocations in the threads (see the sketch after the note below) or switch to another memory allocator, like jemalloc or tcmalloc.
[1] See this note in the documentation for HeapAlloc:
Serialization ensures mutual exclusion when two or more threads attempt to simultaneously allocate or free blocks from the same heap. There is a small performance cost to serialization, but it must be used whenever multiple threads allocate and free memory from the same heap. Setting the HEAP_NO_SERIALIZE value eliminates mutual exclusion on the heap. Without serialization, two or more threads that use the same heap handle might attempt to allocate or free memory simultaneously, likely causing corruption in the heap.
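As a rough sketch of the first option (reducing allocations), each thread can allocate its buffer once and reuse it, so the hot loop never touches the shared heap. This keeps the rest of the example unchanged; the 255 comes from arr_of_5m_elems holding unsigned char values:
unsigned long __stdcall CalcSomethingReuse(void *tsi_ptr)
{
    ThreadStartInfo *tsi = (ThreadStartInfo*)tsi_ptr;
    // One allocation per thread: 255 doubles is the largest block any
    // iteration can request, since the element values are bytes.
    double *test_ptr = (double*)malloc(255 * sizeof(double));
    for (int i = 0; i < 5000000; i++)
    {
        memset(test_ptr, 0, tsi->arr_of_5m_elems[i] * sizeof(double));
    }
    free(test_ptr);
    tsi->TaskDoneFlag = true;
    return 0;
}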

using write() function in C/C++ on Linux to write 70kB via UART Beaglebone Black

I'm trying to write an image via UART on a BeagleBone Black, using the write() function from the C library:
ssize_t write(int fd, const void *buf, size_t count);
Even though the count argument can hold a large value, I cannot transfer 70 kB at once. I printed the number of bytes transferred, and the result was 4111:
length = write(fd, body.c_str(), strlen(body.c_str()));
cout << length << endl;                // result: length = 4111
cout << strlen(body.c_str()) << endl;  // result: strlen(body.c_str()) = 72255
I hope to hear from you!
The write call does not guarantee that it writes the full amount of data supplied; that is why it returns an integer rather than a Boolean. The behavior you see is common across operating systems; the underlying device may simply not have enough buffer space for you to write 70 kB at once. What you need is to write in a loop, where each call writes whatever is still unwritten:
int total = body.length(); // or strlen(body.c_str())
const char *buffer = body.c_str();
int written = 0;
ssize_t ret;
while (written < total) {
    ret = write(fd, buffer + written, total - written);
    if (ret < 0) {
        // error: check errno (you may want to retry on EINTR)
        break;
    }
    written += ret;
}

Mapping large files using MapViewOfFile

I have a very large file and I need to read it in small pieces and then process each piece. I'm using the MapViewOfFile function to map a piece into memory, but after reading the first part I can't read the second; it throws when I'm trying to map it.
char *tmp_buffer = new char[bufferSize];
LPCWSTR input = L"input";
OFSTRUCT tOfStr;
tOfStr.cBytes = sizeof tOfStr;
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
HANDLE fileMap = CreateFileMapping(inputFile, NULL, PAGE_READONLY, 0, 0, input);
while (offset < fileSize)
{
long k = 0;
bool cutted = false;
offset -= tempBufferSize;
if (fileSize - offset <= bufferSize)
{
bufferSize = fileSize - offset;
}
char *buffer = new char[bufferSize + tempBufferSize];
for(int i = 0; i < tempBufferSize; i++)
{
buffer[i] = tempBuffer[i];
}
char *tmp_buffer = new char[bufferSize];
LPCWSTR input = L"input";
HANDLE inputFile;
OFSTRUCT tOfStr;
tOfStr.cBytes = sizeof tOfStr;
long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
long long offsetLow = (offset & 0xFFFFFFFF);
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
memcpy(&buffer[tempBufferSize], &tmp_buffer[0], bufferSize);
UnmapViewOfFile(tmp_buffer);
offset += bufferSize;
offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
offsetLow = (offset & 0xFFFFFFFF);
if (offset < fileSize)
{
char *next;
next = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, 1);
if (next[0] >= '0' && next[0] <= '9')
{
cutted = true;
}
UnmapViewOfFile(next);
}
ostringstream path_stream;
path_stream << tempPath << splitNum;
ProcessChunk(buffer, path_stream.str(), cutted, bufferSize);
delete buffer;
cout << (splitNum + 1) << " file(s) sorted" << endl;
splitNum++;
}
One possibility is that you're not using an offset that's a multiple of the allocation granularity. From MSDN:
The combination of the high and low offsets must specify an offset within the file mapping. They must also match the memory allocation granularity of the system. That is, the offset must be a multiple of the allocation granularity. To obtain the memory allocation granularity of the system, use the GetSystemInfo function, which fills in the members of a SYSTEM_INFO structure.
If you try to map at something other than a multiple of the allocation granularity, the mapping will fail and GetLastError will return ERROR_MAPPED_ALIGNMENT.
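For illustration, a small sketch of querying the granularity and rounding an offset down before mapping (the variable names here are only for the example):
SYSTEM_INFO si = {0};
GetSystemInfo(&si);
DWORD granularity = si.dwAllocationGranularity; // typically 64 KB

// Round the desired offset down to a legal mapping offset and remember how
// far into the view the data you actually want begins.
unsigned long long alignedOffset = offset - (offset % granularity);
DWORD delta = (DWORD)(offset - alignedOffset);
// Map the view at alignedOffset, then read starting at viewPointer + delta.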
Other than that, there are many problems in the code sample that make it very difficult to see what you're trying to do and where it's going wrong. At a minimum, you need to solve the memory leaks. You seem to be allocating and then leaking completely unnecessary buffers. Giving them better names can make it clear what they are actually used for.
Then I suggest putting a breakpoint on the calls to MapViewOfFile, and then checking all of the parameter values you're passing in to make sure they look right. As a start, on the second call, you'd expect offsetHigh to be 0 and offsetLow to be bufferSize.
A few suspicious things off the bat:
HANDLE inputFile = (HANDLE)OpenFile(inputFileName, &tOfStr, OF_READ);
Every cast should make you suspicious. Sometimes they are necessary, but make sure you understand why. At this point you should ask yourself why every other file API you're using requires a HANDLE and this function returns an HFILE. If you check OpenFile documentation, you'll see, "This function has limited capabilities and is not recommended. For new application development, use the CreateFile function." I know that sounds confusing because you want to open an existing file, but CreateFile can do exactly that, and it returns the right type.
long long offsetHigh = ((offset >> 32) & 0xFFFFFFFF);
What type is offset? You probably want to make sure it's an unsigned long long or equivalent. When bitshifting, especially to the right, you almost always want an unsigned type to avoid sign-extension. You also have to make sure that it's a type that has more bits than the amount you're shifting by--shifting a 32-bit value by 32 (or more) bits is actually undefined in C and C++, which allows the compilers to do certain types of optimizations.
long long offsetLow = (offset & 0xFFFFFFFF);
In both of these statements, you have to be careful about the 0xFFFFFFFF value. Since you didn't cast it or give it a suffix, it can be hard to predict whether the compiler will treat it as an int or unsigned int. In this case, it'll be an unsigned int, but that won't be obvious to many people. In fact, I got this wrong when I first wrote this answer. [This paragraph corrected 16-MAY-2017] With bitwise operations, you almost always want to make sure you're using unsigned values.
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
You're casting offsetHigh and offsetLow to ints, which are signed values. The API actually wants DWORDs, which are unsigned values. Rather than casting in the call, I would declare offsetHigh and offsetLow as DWORDs and do the casting in the initialization, like this:
DWORD offsetHigh = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
DWORD offsetLow = static_cast<DWORD>( offset & 0xFFFFFFFFul);
tmp_buffer = reinterpret_cast<char *>(MapViewOfFile(fileMap, FILE_MAP_READ, offsetHigh, offsetLow, bufferSize));
Those fixes may or may not resolve your problem. It's hard to tell what's going on from the incomplete code sample.
Here's a working sample you can compare to:
// Calls ProcessChunk with each chunk of the file.
void ReadInChunks(const WCHAR *pszFileName) {
    // Offsets must be a multiple of the system's allocation granularity. We
    // guarantee this by making our view size equal to the allocation granularity.
    SYSTEM_INFO sysinfo = {0};
    ::GetSystemInfo(&sysinfo);
    DWORD cbView = sysinfo.dwAllocationGranularity;
    HANDLE hfile = ::CreateFileW(pszFileName, GENERIC_READ, FILE_SHARE_READ,
                                 NULL, OPEN_EXISTING, 0, NULL);
    if (hfile != INVALID_HANDLE_VALUE) {
        LARGE_INTEGER file_size = {0};
        ::GetFileSizeEx(hfile, &file_size);
        const unsigned long long cbFile =
            static_cast<unsigned long long>(file_size.QuadPart);
        HANDLE hmap = ::CreateFileMappingW(hfile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (hmap != NULL) {
            for (unsigned long long offset = 0; offset < cbFile; offset += cbView) {
                DWORD high = static_cast<DWORD>((offset >> 32) & 0xFFFFFFFFul);
                DWORD low  = static_cast<DWORD>( offset        & 0xFFFFFFFFul);
                // The last view may be shorter.
                if (offset + cbView > cbFile) {
                    cbView = static_cast<DWORD>(cbFile - offset);
                }
                const char *pView = static_cast<const char *>(
                    ::MapViewOfFile(hmap, FILE_MAP_READ, high, low, cbView));
                if (pView != NULL) {
                    ProcessChunk(pView, cbView);
                    ::UnmapViewOfFile(pView);
                }
            }
            ::CloseHandle(hmap);
        }
        ::CloseHandle(hfile);
    }
}
You have a memory leak in your code:
char *tmp_buffer = new char[bufferSize];
[ ... ]
while (offset < fileSize)
{
[ ... ]
char *tmp_buffer = new char[bufferSize];
[ ... ]
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
[ ... ]
}
You never delete what you allocate via new char[] in each iteration. If your file is large enough / you run enough iterations of this loop, the memory allocation will eventually fail - and that's when you'll see a throw from the allocator.
Win32 API calls like MapViewOfFile() are not C++ and never throw; they return error codes (the latter returns NULL on failure). Therefore, if you see exceptions, something is wrong in your C++ code - most likely the above.
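A minimal sketch of checking the result instead of expecting an exception, reusing the variable names from the code above (GetLastError gives the reason on failure):
tmp_buffer = (char *)MapViewOfFile(fileMap, FILE_MAP_READ, (int)offsetHigh, (int)offsetLow, bufferSize);
if (tmp_buffer == NULL) {
    // MapViewOfFile never throws; it returns NULL and sets the last error code.
    printf("MapViewOfFile failed, error %lu\n", GetLastError());
    // handle the failure here (e.g. break out of the loop)
}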
I also had some troubles with memory-mapped files.
Basically I just wanted to share memory (1 MB) between 2 apps on the same PC.
- Both apps were written in Delphi
- Using Windows 8 Pro
At first one application (the first one launched) could read and write the memory-mapped file, but the second one could only read it (error 5: access denied).
Finally, after a lot of testing, it suddenly worked when both applications used CreateFileMapping. I had even tried to create my own security descriptor; nothing else helped.
Before that, my applications were first calling OpenFileMapping and then CreateFileMapping if the first call failed.
Another thing that misled me is that the handles, although visibly referencing the same memory-mapped file, were different in the two applications.
One last thing: after this correction my application seemed to work all right, but after a while I got ERROR_NOT_ENOUGH_MEMORY when calling MapViewOfFile.
It was just a beginner's mistake on my part: I was not always calling UnmapViewOfFile.
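In C++, one way to make sure UnmapViewOfFile is always called is a tiny RAII wrapper that unmaps in its destructor; a minimal sketch (the class name is just illustrative):
#include <windows.h>

// Unmaps the view automatically when the object goes out of scope,
// even on early returns or exceptions.
struct ScopedView {
    void *p;
    explicit ScopedView(void *view) : p(view) {}
    ~ScopedView() { if (p) UnmapViewOfFile(p); }
    ScopedView(const ScopedView&) = delete;
    ScopedView& operator=(const ScopedView&) = delete;
};

// Usage:
// ScopedView view(MapViewOfFile(hMap, FILE_MAP_READ, high, low, size));
// if (view.p) { /* read through view.p */ }  // unmapped automatically at scope exit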