Visual C++ 6 MFC MapViewOfFile returns error code 8

I have a program that creates a file mapping. The call m_hMap = CreateFileMapping(m_hFile, 0, dwProtect, 0, m_dwMapSize, NULL); succeeds, but the subsequent call MapViewOfFile(m_hMap, dwViewAccess, 0, 0, 0) fails with error code 8, which is ERROR_NOT_ENOUGH_MEMORY, or the error string "Not enough storage is available to process this command".
So I don't fully understand what MapViewOfFile does for me, or how to fix the situation.
Some numbers:
m_dwMapSize = 453427200
dwProtect = PAGE_READWRITE;
dwViewAccess = FILE_MAP_ALL_ACCESS;
I think my page size is 65536

When a file is very large, the recommended approach is to read and process it in small pieces rather than all at once, and MapViewOfFile is the function used to map each piece into memory.
Look at http://msdn.microsoft.com/en-us/library/windows/desktop/aa366761(v=vs.85).aspx : MapViewOfFile takes an offset precisely so that you can view one piece of a very large file at a time. Very large mapping requests frequently fail because of address-space fragmentation: there may be enough free memory in total, but no single contiguous region large enough.
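As a rough illustration (a sketch only: it reuses the question's m_hMap and file size, the 64 MB window is an arbitrary choice, and the offset must stay a multiple of the allocation granularity), processing the file one window at a time might look like this:
#include <windows.h>
// Sketch: process a ~453 MB mapping in 64 MB windows rather than mapping
// the whole file at once. hMap is the handle from CreateFileMapping.
void ProcessInChunks(HANDLE hMap, ULONGLONG fileSize)
{
    const DWORD CHUNK = 64 * 1024 * 1024; // a multiple of the 64 KB allocation granularity
    for (ULONGLONG offset = 0; offset < fileSize; offset += CHUNK) {
        SIZE_T bytes = (SIZE_T)(fileSize - offset < CHUNK ? fileSize - offset : CHUNK);
        LPVOID view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS,
                                    (DWORD)(offset >> 32),        // offset, high 32 bits
                                    (DWORD)(offset & 0xFFFFFFFF), // offset, low 32 bits
                                    bytes);
        if (view == NULL)
            return; // check GetLastError() here
        // ... read/write the 'bytes' bytes at 'view' ...
        UnmapViewOfFile(view);
    }
}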

If you are running a 32-bit program on a 64-bit processor, the system will give the process a 4 GB address space once the large-address-aware bit is set in the executable.
Go to Configuration Properties -> Linker -> System, and set Enable Large Addresses to Yes (/LARGEADDRESSAWARE).
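Either way, you can check whether fragmentation is really the problem by walking the address space with VirtualQuery to find the largest contiguous free region. A diagnostic sketch (not part of the original code): if the largest free block is smaller than m_dwMapSize, a single full-file view cannot succeed.
#include <windows.h>
#include <stdio.h>
// Diagnostic sketch: report the largest contiguous free region in this
// process's address space.
int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    MEMORY_BASIC_INFORMATION mbi;
    SIZE_T largest = 0;
    BYTE* p = (BYTE*)si.lpMinimumApplicationAddress;
    while (p < (BYTE*)si.lpMaximumApplicationAddress &&
           VirtualQuery(p, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        if (mbi.State == MEM_FREE && mbi.RegionSize > largest)
            largest = mbi.RegionSize;
        p += mbi.RegionSize; // step to the next region
    }
    printf("largest free block: %lu bytes\n", (unsigned long)largest);
    return 0;
}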

Parallel Writes to NFS-backed File

UPDATE: I had each node write to a separate file, and when the separate files were concatenated together the result was correct. I also updated the code to attempt a channel flush and file sync after each write of a single record, but now there are still issues between Nodes 0 and 1. If I make Node 0 sleep for a few seconds before it starts its iteration of the coforall loop, the records come out correct. If not, the last few hundred bytes of Node 0's records are reliably overwritten with NULL bytes, up to the start of Node 1's records. The issues between Nodes 1 and 2, and Nodes 2 and 3, no longer seem to show up.
Additionally, if I suppress either Node 0 or Node 1 from writing, the fully-formed records from the un-suppressed node are written correctly to the file. In the case that Node 1 is suppressed, I see 9,997 100-byte records (999,700 correct bytes) followed by NULL bytes in the file where Node 1's suppressed records would go. In the case that Node 0 is suppressed, I see exactly 999,700 NULL bytes in the file, after which Node 1's records begin.
Original Post:
I'm trying to troubleshoot an issue with parallel writes from different nodes to a shared NFS-backed file on disk. At the moment, I suspect that something is wrong with the way writes to the disk happen on the NFS server.
I'm working on adapting MPI+C code that uses pwrite to write to coordinated chunks of a file. If I try to have the equivalent locales in Chapel write to the file inside of a coforall loop, I end up with the bits of the file around the node boundaries messed up - usually the final few hundred bytes of each node's data are garbled. However, if I have just one locale iterate through the data on all locales and write it, the data comes out correctly. That is, I use the same data structures to calculate the offsets, but only Locale 0 seeks to that offset and performs the writes.
I've verified that the offsets into the file that each locale runs do not overlap, and I'm using a single channel per task, defined from within the on loc do block, so that tasks don't share a single channel.
Are there known issues with writing to a file from different locales? A lot of the documentation makes it seem like this is known to be safe, but an unsubstantiated guess seems to indicate that there are issues with caching of file contents; when examining the incorrect data, the bits that are incorrect seem to be the original data from the file in that location at the beginning of the program.
I've included the relevant routine below, in case you easily spot something I missed. To make this serial, I convert the coforall loc in Locales and on loc do block into a for j in 0..numLocales-1 loop, and replace here.id with j. Please let me know what else would help get to the bottom of this. Thanks!
proc write_share_of_data(data_filename: string, ref record_blocks) throws {
  coforall loc in Locales {
    on loc do {
      var data_file: file = open(data_filename, iomode.cwr);
      var data_writer = data_file.writer(kind=ionative, locking=false);
      var line: [1..100] uint(8);
      const indices = record_blocks[here.id].D;
      var local_record_offset = + reduce record_blocks[0..here.id-1].D.size;
      writeln("Loc ", here.id, ": record offset is ", local_record_offset);
      var local_bytes_offset = terarec.terarec_width_disk * local_record_offset;
      data_writer.seek(start=local_bytes_offset);
      for i in indices {
        var write_rec: terarec_t = record_blocks[here.id].records[i];
        line[1..10] = write_rec.key;
        line[11..98] = write_rec.value;
        line[99] = 13; // \r
        line[100] = 10; // \n
        data_writer.write(line);
        lines_written += 1;
      }
      data_file.fsync();
      data_writer.close();
      data_file.close();
    }
  }
  return;
}
Adding an answer here that solved my particular problem, though it doesn't explain the behavior seen. I ended up changing the outer loop from coforall loc in Locales to for loc in Locales. This isn't too big of an issue since it's all writing to one file anyway - I doubt that multiple locales can actually make much headway in all attempting to write concurrently to a single file on an NFS server. As a result, the change still allows nodes to write the data they have locally to NFS, rather than forcing Node 0 to collect and then write the data on behalf of all locales. This amounts to only adding idle time to the write operation commensurate with the time it takes Locale 0 to start the remote task on other nodes when the previous node has finished writing, which for the application at hand is not a concern.
Have you tried specifying start/end in file.writer instead of using seek? Does that change anything? What about specifying the end offset for the channel.seek call? Does it matter if the file is created and has the appropriate size before you start?
Other than that, I wonder if this issue would appear for both NFS and Lustre. If it appears for both it might well be a Chapel bug. It sounds from your description that the C program was using this pattern, which points to it being a bug. But, have you run C code doing this on your setup? If it being a Chapel bug seems most likely after further investigation, we would appreciate a bug report issue with a reproducer.
I know that NFS does not always do what one would like, in terms of data consistency. It's my understanding that it has "close to open" semantics but it's unclear to me what that means in the context of opening a file and writing to a particular region within it, in parallel from different locales.
From Why NFS Sucks by Olaf Kirch:
An NFS client is permitted to cache changes locally and send them to
the server whenever it sees fit. This sort of lazy write-back greatly
helps write performance, but the flip side is that everyone else will
be blissfully unaware of these change before they hit the server. To
make things just a little harder, there is also no requirement for a
client to transmit its cached write in any particular fashion, so
dirty pages can (and often will be) written out in random order.
I read two implications from this paragraph that are relevant to your situation here:
The writes you do on different locales can be observed by the NFS server in an arbitrary order. (However as I understand it, the data should be sent to the server by the time your fsync call returns).
These writes are done at an OS page granularity (usually 4k). (Note that this is more a hypothesis I am making than it is a fact. It should be tested or further investigated).
It would be interesting to check if 2. is a plausible explanation for the behavior you are seeing. For example, you could explore having each locale operate on a multiple of 4096 records (or potentially try writing records of 4096 bytes each) and see if that changes the behavior. If 2 is indeed the explanation, it should be possible to create a C program that demonstrates the behavior as well.
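A reproducer along those lines might look like the following sketch (untested here; the filename is invented, and the region size mirrors the question's 9,997 records of 100 bytes, deliberately not a multiple of 4 KiB). Compile it once and run it with a different writer id on each NFS client against the same mounted directory, then inspect the bytes around offset 999,700 for NULs.
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstdlib>
// Two writers fill adjacent, non-page-aligned regions of one shared file
// via pwrite, mimicking the Chapel program's layout.
int main(int argc, char** argv)
{
    const off_t region = 999700;             // 9,997 records * 100 B, as in the question
    int id = (argc > 1) ? atoi(argv[1]) : 0; // writer id: 0 or 1, one per node
    int fd = open("shared.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return 1;
    char buf[100];
    memset(buf, 'A' + id, sizeof(buf));      // distinct fill byte per writer
    off_t base = id * region;                // this writer's region of the file
    for (off_t off = 0; off < region; off += sizeof(buf))
        pwrite(fd, buf, sizeof(buf), base + off);
    fsync(fd);                               // push cached pages to the NFS server
    close(fd);
    return 0;
}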

Visual Studio C++ : same program using different amounts of RAM under different executable names

This might be a general question, but this problem is really getting me confused.
I have two different C++ applications, compiled with Visual Studio 2012, each needing an instance of the same object. I have put a breakpoint before the creation of each object and measured the RAM usage by stepping through each program. The first one takes approximately 2.5 MiB more RAM after creating the object, while the second one takes 30 MiB more!
Both objects are created using a simple call to new with the same parameters. The code behind the constructors is the same.
As a detail: the first project contains far fewer .cpp files than the second one, so I thought it might be a problem of internal fragmentation of the exe. I also tried breaking BEFORE any code was executed inside the main function, and the memory usage was already very different (6 MiB for the first app, 35 MiB for the second).
Does anyone have any idea what might be going on?
EDIT: The object in question is a DirectX context, with a constructor creating a Direct3D instance and a device. Both the instance AND the device are created the same way, but each accounts for different RAM usage between the two apps.
Here is the code for the D3D device creation:
d3d_presentParams = new D3DPRESENT_PARAMETERS;
ZeroMemory(d3d_presentParams, sizeof(D3DPRESENT_PARAMETERS));
d3d_presentParams->Windowed = !window->isFullscreen();
d3d_presentParams->SwapEffect = D3DSWAPEFFECT_DISCARD;
d3d_presentParams->hDeviceWindow = window->getHwnd();
d3d_presentParams->MultiSampleType = antiAlias;
d3d_presentParams->EnableAutoDepthStencil = true;
d3d_presentParams->AutoDepthStencilFormat = D3DFMT_D32F_LOCKABLE;
d3d_presentParams->PresentationInterval = (info.m_vsync ? D3DPRESENT_INTERVAL_ONE : D3DPRESENT_INTERVAL_IMMEDIATE);
{
    d3d_presentParams->BackBufferCount = 1;
    d3d_presentParams->BackBufferFormat = D3DFMT_A8R8G8B8;
    d3d_presentParams->BackBufferWidth = m_viewportSize.x;
    d3d_presentParams->BackBufferHeight = m_viewportSize.y;
}
d3d->CreateDevice(
    D3DADAPTER_DEFAULT,
    D3DDEVTYPE_HAL,
    window->getHwnd(),
    D3DCREATE_HARDWARE_VERTEXPROCESSING | D3DCREATE_MULTITHREADED,
    d3d_presentParams,
    &d3d_device);
EDIT 2: The problem has been "solved" for now. See the answer below.
Alright, I "solved" it, kind of. You're not going to believe what the problem was.
TL;DR : Renaming the executable lowers the RAM usage from 80 MiB to 28 MiB. It seems that Windows is suspicious of non-verified applications, as I discovered a directory full of logs inside C:/Users/<Me>/AppVerifierLogs/.
It seemed that RAM usage was linked to how many source files the project contained. Thus, I tried copying one of my projects entirely. I had to rename it, because a solution cannot have two projects with the same name... And guess what: memory usage was down to 7.3 MiB at the beginning of main(), instead of 30 MiB previously.
Later, I found out that it was a specific executable name that caused the extra memory usage. I simply renamed the .exe file, and the problem was gone. I even renamed Firefox's exe to my app's name, and that made it crash.
And finally, I discovered in C:/Users/<me>/AppVerifierLogs/ at least 3000 files named after my application. This immediately made me think about the way Windows tries to verify applications (e.g. to check whether they're viruses).
I have no idea where this is coming from, nor how to turn off this verifier, but it's really annoying as a dev to have Windows panicking about an application you're developing, and injecting (possibly useless) stuff into it.
The fix for now is just to rename the executable. If anyone knows which application might have access to this "AppVerifierLogs" directory, please mention it, because even Google searches don't seem to be able to help at this point...
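Follow-up note: Application Verifier stores its per-executable settings in the registry under Image File Execution Options, keyed by image name, which would explain why renaming the .exe makes the behavior disappear. A diagnostic sketch that checks for the verifier flag (FLG_APPLICATION_VERIFIER, 0x100); "myapp.exe" is a placeholder for the affected executable name:
#include <windows.h>
#include <stdio.h>
// Reads the GlobalFlag value for a given image name from the
// Image File Execution Options key and tests the verifier bit.
int main()
{
    DWORD flags = 0, size = sizeof(flags);
    LSTATUS rc = RegGetValueW(HKEY_LOCAL_MACHINE,
        L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
        L"Image File Execution Options\\myapp.exe",
        L"GlobalFlag", RRF_RT_REG_DWORD, NULL, &flags, &size);
    if (rc == ERROR_SUCCESS && (flags & 0x100)) // 0x100 = FLG_APPLICATION_VERIFIER
        printf("Application Verifier is enabled for this image name.\n");
    else
        printf("No verifier flag found (rc=%ld, flags=0x%lx).\n", (long)rc, flags);
    return 0;
}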

Can a process be limited on how much physical memory it uses?

I'm working on an application, which has the tendency to use excessive amounts of memory, so I'd like to reduce this.
I know this is possible for a Java program, by adding a Maximum heap size parameter during startup of the Java program (e.g. java.exe ... -Xmx4g), but here I'm dealing with an executable on a Windows-10 system, so this is not applicable.
The title of this post refers to this URL, which mentions a way to do this, but which also states:
Maximum Working Set. Indicates the maximum amount of working set assigned to the process. However, this number is ignored by Windows unless a hard limit has been configured for the process by a resource management application.
Meanwhile I can confirm that the following lines of code indeed don't have any impact on the memory usage of my program:
HANDLE h_jobObject = CreateJobObject(NULL, L"Jobobject");
if (!AssignProcessToJobObject(h_jobObject,
        OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetCurrentProcessId())))
{
    throw "COULD NOT ASSIGN SELF TO JOB OBJECT!";
}
JOBOBJECT_EXTENDED_LIMIT_INFORMATION tagJobBase = { 0 };
tagJobBase.BasicLimitInformation.MaximumWorkingSetSize = 1; // far too small, just to see what happens
BOOL bSuc = SetInformationJobObject(h_jobObject, JobObjectExtendedLimitInformation,
                                    (LPVOID)&tagJobBase, sizeof(tagJobBase));
=> bSuc is true. Is there anything else I should expect?
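For reference, my reading of the SetInformationJobObject documentation is that MaximumWorkingSetSize is ignored unless LimitFlags contains JOB_OBJECT_LIMIT_WORKINGSET, and that both the minimum and maximum must then be supplied. A sketch of the same call with the flags set (the sizes are arbitrary, chosen for illustration):
// Same job-object approach, but with the limits actually enabled.
JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = { 0 };
limits.BasicLimitInformation.LimitFlags =
    JOB_OBJECT_LIMIT_WORKINGSET | JOB_OBJECT_LIMIT_PROCESS_MEMORY;
limits.BasicLimitInformation.MinimumWorkingSetSize =  1 * 1024 * 1024;  // 1 MiB
limits.BasicLimitInformation.MaximumWorkingSetSize = 64 * 1024 * 1024;  // 64 MiB
limits.ProcessMemoryLimit = 256 * 1024 * 1024;  // caps committed memory, not just working set
BOOL ok = SetInformationJobObject(h_jobObject,
                                  JobObjectExtendedLimitInformation,
                                  &limits, sizeof(limits));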
On top of this, the mentioned tools (resource management applications, like Hyper-V) seem not to work on my Windows-10 system.
Next to this, there seems to be another post about this subject "Is there any way to force the WorkingSet of a process to be 1GB in C++?", but here the results seem to be negative too.
For a good understanding: I'm working in C++, so the solutions proposed in that URL are not applicable.
So now I'm stuck with the simple question: is there a way, implementable in C++, to limit the memory usage of the current process, running on Windows-10?
Does anybody have an idea?
Thanks in advance

C++ memory allocation under Green Hills INTEGRITY

Sorry, I'm new to Green Hills. I'm using MULTI 6.1.6 and my language of choice is C++.
I have a problem when I try to use the simulator to instantiate an object of a class bigger than 1 MB in size using new.
Class_Big* big_obj;
big_obj = new Class_Big();
Class_Small* Small_obj;
Small_obj = new Class_Small();
If sizeof(Class_Big) > 1 MB, new simply never calls the class constructor, returns NULL, and execution moves on to the next instruction (Class_Small* Small_obj;), which creates the next object correctly. If I scope out some variables in Class_Big to make its size < 1 MB, the code works fine and the object is created.
I added both
MemoryPoolSize="0x200000"
HeapSize="0x200000"
to my xml file.
Another error I get in the build phase if I use a library that has a big class:
intex: error: Not enough RAM for request.
intex: fatal: Integrate failed.
Error: build failed
Can you help with it?
Thanks
To specify memory sizes for the Heap and memory pool, in the MULTI GUI go to the .int file (it can be found under the .gpj dropdown when it is expanded) and double click on it to edit it. Then right-click inside the purple box and go to "Edit". Go to the "Attributes" tab and you can modify the memory pool size and heap size to be larger.
Alternatively you can just edit the .int file in a text editor, but if you want to use the gui to set these follow the above steps.
Also from their manual:
"Check the .bsp file in use. The memory declared with the
MinimumAddress/MaximumAddress keywords must match your board's memory.
If it does not, modify these keywords as needed. If the memory
declared in the .bsp file does match the board, you must modify your
application to use less memory."
Additionally, check the default.ld and you can set the values for the RAM limits there. Look at __INTEGRITY_RamLimit and the other values there. Hope this helps!
With INTEGRITY you are in total control of how much memory is used for each partition. It is a static configuration. Everything (code, stack, heap, you name it) comes out of that. So if you have a bunch of code, automatics, etc. in the partition, then a memory allocation may fail if you ask for too much. Try increasing the size.
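Since new here evidently returns NULL on failure rather than throwing, a nothrow allocation makes that failure explicit. A plain standard-C++ sketch (Class_Big's contents are invented for illustration):
#include <new>
#include <cstdio>
struct Class_Big { unsigned char payload[2 * 1024 * 1024]; }; // deliberately > 1 MB
int main()
{
    // new (std::nothrow) returns NULL on failure instead of throwing,
    // matching the behavior described in the question.
    Class_Big* big_obj = new (std::nothrow) Class_Big;
    if (big_obj == NULL) {
        std::printf("allocation failed: increase HeapSize/MemoryPoolSize\n");
        return 1;
    }
    delete big_obj;
    return 0;
}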
For the first part of the problem: basically, I should have modified the VirtualHeapSize in the .ld component file.
For the second part, I'm still trying to figure it out.

NtQuerySystemInformation Hook Failure

After successfully building a trampoline and learning more about process memory space, I tested the trampoline on MessageBoxA. It worked perfectly, so I decided to finally put the code to the use it was intended for in the first place: hiding a process by hooking NtQuerySystemInformation. The redirect function should work fine, but the code I use to write the jmp instruction now causes Task Manager to crash every time.
BYTE JMP[5];
BYTE tmpJMP[5] = {0xE9,0x00,0x00,0x00,0x00}; // E9 = jmp rel32; the 4 displacement bytes follow
memcpy(JMP,tmpJMP,5);
// rel32 displacement = target - (address of the jmp instruction + 5)
DWORD Addr = ((DWORD)func - ((DWORD)oNtQuerySystemInformation + 0x5));
for (int i = 0; i < 4; ++i)
    JMP[i+1] = ((BYTE*)&Addr)[i];
if (VirtualProtect((LPVOID)oNtQuerySystemInformation,5,PAGE_EXECUTE_READWRITE,&oldProtect) == FALSE)
    MessageBox(NULL,L"Error unprotecting memory",L"",MB_OK);
memcpy(oldBytes,(LPVOID)oNtQuerySystemInformation,5); // save the original prologue bytes for unhooking
if (!WriteProcessMemory(GetCurrentProcess(),(LPVOID)oNtQuerySystemInformation,(LPCVOID)JMP,5,NULL))
    MessageBox(NULL,L"Unable to write to process memory space",L"",MB_OK);
DWORD dummy; // VirtualProtect fails if lpflOldProtect is NULL; it needs a valid pointer
VirtualProtect((LPVOID)oNtQuerySystemInformation,5,oldProtect,&dummy);
FlushInstructionCache(GetCurrentProcess(),NULL,0);
This is how I'm writing to the memory. I can't seem to find a problem with the code. I was thinking that maybe the memory layout changes from API to API, but I was told that's incorrect, which leaves me all out confused. Is there anything wrong that you all see? Please be descriptive. I love to learn o3o