C++ _popen() windows leaks paged pool memory - c++

The main application runs as a Windows service, and that process starts other C++ console processes with their console windows hidden, i.e. the parent process is a Windows service and the child processes are non-console applications.
We observed that the system's paged pool memory increases during calls to _popen() on the customer's system (Windows Server 2016). The application runs clean on our lab system with the same OS.
Using the Windows Performance tool xperf, I captured logs and checked the call stack.
A screenshot is attached for reference.
void CMachine::GetJavaVersion()
{
    m_stJavaVersion.m_strName = " Java version";
    CPUChar strVersion[64] = { 0 };
    BOOL bFound = CheckJREVersion(strVersion, 64);
    BYTE bytColorSt = RED;
    string strRemark;
    FILE *fp = NULL;
    char version[130] = { 0 };
    BOOL bFoundVersion = FALSE;
    fp = _popen("java -version 2>&1", "r");
    while (fp && fgets(version, sizeof version, fp))
    {
        string strTmp = version;
        if (strTmp.find("version") != string::npos)
        {
            bFoundVersion = TRUE;
            break;
        }
    }
    if(fp) _pclose(fp);
    ....
PoolMon trace
Memory:33401164K Avail:30057324K PageFlts: 92362 InRam Krnl:20212K P:776328K
Commit:3228052K Limit:37595468K Peak:4747992K Pool N:182820K P:782568K
System pool information
Tag   Type      Allocs            Frees             Diff   Bytes               Per Alloc
Toke  Paged   10546816 ( 390)   10319712 ( 382)   227104   324868080 ( 11392)       1430
CM31  Paged      42886 (   0)      20849 (   0)    22037   101154816 (     0)       4590
SeAt  Paged   44678436 (1662)   43769798 (1630)   908638    87253680 (  3072)         96
QINi  Paged        234 (   0)          1 (   0)      233    60293216 (     0)     258769
MmSt  Paged    2683066 (  79)    2670922 (  83)    12144    27223856 (  3312)       2241

Eric Lippert writes about benchmark mistakes. I think mistake #1 applies to your case:
Mistake #1: Choosing a bad metric.
Why do you measure "paged pool" to determine a memory leak?
Paged pool is kernel memory that is allowed to be swapped out to disk. That happens when the physical RAM is needed for something else. What is the physical RAM needed for? Probably for running the process that you start.
Once the memory is swapped to disk, it may take a while until it is swapped back to RAM. That will happen just when some other application tries to access the memory - and that may be minutes, if ever.
I also tend to say that memory isn't leaked during a method call but after a method call. After the method call, all variables should be destroyed and the related resources should be released.
If you are told that the paged pool is the cause, then ask for proof.
On my Windows 10 system, the paged pool limit is 17 GB. This can be shown by Process Explorer in View/System Information with Symbols configured.
If you're running java -version so often that it leaks 17 GB of kernel memory, then something is seriously wrong. Of course there will be a pipe or something to redirect the output from Java to your application so you can read the stream. There will also be other kernel objects like a process, a thread etc.
Even with 1 kB of kernel memory leak for each call, you would need to call that 17 million times to exhaust the paged pool. If that's the case, maybe you should consider caching the result anyway. It should be unlikely that server admins install and uninstall Java 17 million times in a few days.
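The caching idea could look roughly like the sketch below. CachedValue is a hypothetical helper, not something from the question's code; the probe passed to it would be whatever function runs java -version and extracts the version string.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper (not from the original code): runs an expensive probe,
// e.g. the _popen("java -version 2>&1") call, at most once per instance and
// serves the stored result afterwards.
class CachedValue {
public:
    explicit CachedValue(std::function<std::string()> probe)
        : probe_(std::move(probe)) {
        assert(probe_);  // an empty std::function could not be called later
    }

    const std::string& Get() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!loaded_) {
            value_ = probe_();  // the only place a child process would be started
            loaded_ = true;
        }
        return value_;
    }

private:
    std::function<std::string()> probe_;
    std::mutex mutex_;
    bool loaded_ = false;
    std::string value_;
};
```

With this, GetJavaVersion() would ask a long-lived CachedValue instead of calling _popen() every time. Whether the result may be cached for the whole lifetime of the service is an assumption that depends on how often Java is expected to change on the machine.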
For monitoring the paged pool, you can try Poolmon with /p /P command line parameters. Poolmon is part of the WDK.
Problems in your code:
Your code has at least 2 problems:
if "version" never appears in the output, your code might run in an endless loop. How could that happen? It's unlikely, but if I rename my HelloWorld.exe to java.exe, it could.
if "version" appears in the output but accidentally "ver" is in the first buffer and "sion" is in the second buffer, you'll never find out it actually was there. Your code could run into an endless loop.
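One way to sidestep the split-buffer problem is to collect the complete output first and search afterwards. The following is a minimal sketch under that assumption; ReadAll and OutputContains are hypothetical names, and the chunk size is deliberately tiny to show that chunk boundaries no longer matter.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Collects the whole stream into one string, so a token that is split across
// two fgets() chunks is still found. The loop ends when fgets() returns NULL
// (end of stream or read error).
std::string ReadAll(std::FILE* fp) {
    assert(fp != nullptr);
    std::string out;
    char chunk[8];  // deliberately tiny; a real buffer would be larger
    while (std::fgets(chunk, sizeof chunk, fp)) {
        out += chunk;
    }
    return out;
}

// Hypothetical helper: true if `token` occurs anywhere in the stream's output.
bool OutputContains(std::FILE* fp, const std::string& token) {
    return ReadAll(fp).find(token) != std::string::npos;
}
```

In the question's code this would be used as bFoundVersion = fp && OutputContains(fp, "version"); followed by the existing _pclose(fp) call.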

Related

How to diagnose a visual studio project slowing down as time goes on?

Computer:
Processor: Intel Xeon Silver 4114 CPU @ 2.19 GHz (2 processors)
RAM: 96 GB, 2666 MHz (12 x 8 GB sticks)
OS: Windows 10
GPU: None
Hard drive: Samsung MZVLB512HAJQ-000H2 - 512GB M.2 PCIe NVMe
IDE:
Visual Studio 2019
I am including what I am doing in case it is relevant. I am running a Visual Studio program that reads data off a GSC PCI SIO4B Sync Card 256K. Using the API for this card (Documentation: http://www.generalstandards.com/downloads/GscApi.1.6.10.1.pdf) I read 150 bytes of data at a rate of 100 Hz using the code below. That data is then split according to my device's message structure. I can’t give info on the message structure, but the data is combined into the various words using a union and added to an integer array int Data[100];
Union Example:
union data_set{
    unsigned int integer;
    unsigned char input[2];
} word;
Example of how the data is read:
PLX_PHYSICAL_MEM cpRxBuffer;
#define TEST_BUFFER_SIZE 0x400
//allocates memory for the buffer
cpRxBuffer.Size = TEST_BUFFER_SIZE;
status = GscAllocPhysicalMemory(BoardNum, &cpRxBuffer);
status = GscMapPhysicalMemory(BoardNum, &cpRxBuffer);
memset((unsigned char*)cpRxBuffer.UserAddr, 0xa5, sizeof(cpRxBuffer));
// start data reception:
status = GscSio4ChannelReceivePlxPhysData(BoardNum, iRxChannel, &cpRxBuffer, SetMaxBytes, &messageID);
// wait for Rx operation to complete
status = GscSio4ChannelWaitForTransfer(BoardNum, iRxChannel, 7000, messageID, &amount);
if (status)
{
    // If we have an error, "bytesTransferred" will contain the number of bytes that we
    // actually transmitted.
    DisplayErrorMessage(status);
    printf("\n\t%04X bytes out of %04X transferred", amount, SetMaxBytes);
}
My issue is that this code works fine and keeps up for around 5 minutes, then randomly stops being able to keep up, and the FIFO (first in, first out) register on the PCI card begins to fill up faster than the code can process the data. To me this seems like a memory leak, since the code works fine for a long time and then slows down when nothing has changed; all the code is doing is reading the data off the card. We used to save the data in a really large array, but even after removing that we had the same issue.
I am unsure how to figure out exactly what is happening, and I'm hoping for a way to determine whether there is a memory leak and how to fix it if there is.
A memory leak is only a guess, though, and it could very well be something else, so any out-of-the-box suggestions for diagnosing the problem are also appreciated.
Similar to Paul's answer, but I like to strategically place two (or more) _CrtMemCheckpoint followed by _CrtMemDifference, to cut down the noise.
Memory leaks can be detected and reported on (in Debug builds) by calling the _CrtDumpMemoryLeaks function. When running under the debugger, this will tell you (in the output tab) how many allocations you have at the time that it is called and the file and line number that each was allocated from.
Call this right at the end of your program, after you (think you) have freed all the resources you use. Anything left over is a candidate for being a leak.
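As a rough illustration of the checkpoint approach, the sketch below brackets one suspicious section with two _CrtMemCheckpoint calls and compares them with _CrtMemDifference. LeaksInSection is a hypothetical helper name; the CRT heap tracking is only active with MSVC in Debug builds, so on any other configuration the helper just runs the section.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <crtdbg.h>  // MSVC debug CRT
#endif

// Hypothetical helper: runs `section` between two heap checkpoints and reports
// whether allocations made in between are still alive afterwards.
bool LeaksInSection(void (*section)()) {
    assert(section != nullptr);
#if defined(_MSC_VER) && defined(_DEBUG)
    _CrtMemState before, after, diff;
    _CrtMemCheckpoint(&before);
    section();
    _CrtMemCheckpoint(&after);
    if (_CrtMemDifference(&diff, &before, &after)) {
        _CrtMemDumpStatistics(&diff);  // summary goes to the debug output
        return true;
    }
    return false;
#else
    section();  // no CRT heap tracking outside MSVC Debug builds
    return false;
#endif
}
```

Wrapping one iteration of the read loop in such a helper narrows the report to allocations made by that iteration, instead of everything that is still allocated at program exit.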

Shared memory "Too many open files" but ipcs doesn't show many allocations

I'm writing unit tests for code which creates shared memory.
I only have a couple of tests. I make 4 allocations of shared memory and then it fails on the fifth.
After calling shmat() perror() says Too many open files:
template <typename T>
bool Attach(T** ptr, const key_type& key)
{
    // shmemId was 262151
    int32_t shmemId = shmget( key.key( ), ( size_t )0, 0644 );
    if (shmemId < 0)
    {
        perror("Error: ");
        return false;
    }
    *ptr = ( T * ) shmat(shmemId, 0, 0 );
    if ( ( int64_t ) * ptr < 0 )
    {
        // Problem is here. perror() says 'Too many open files'
        perror( "Error: ");
        return false;
    }
    return true;
}
However, when I check ipcs -m -p I only have a couple of shared memory allocations.
T ID KEY MODE OWNER GROUP CPID LPID
Shared Memory:
m 262151 0x0000a028 --rw-r--r-- 3229 0
m 262152 0x0000a029 --rw-r--r-- 3229 0
In addition, when I check my OS shared memory limits sysctl -A | grep shm I get:
kern.sysv.shmall: 1024
kern.sysv.shmmax: 4194304
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
security.mac.posixshm_enforce: 1
security.mac.sysvshm_enforce: 1
Are these variables large enough/are they the cause/what values should I have?
I'm sure I edited the file to increase them and restarted the machine, but perhaps the change wasn't accepted (this is on Mac/OSX).
Your problem may be elsewhere.
Edit: This may be a shmmni limit of macOS. See below.
When I run your [simplified] code on my system (linux), the shmget fails.
You didn't specify IPC_CREAT to the third argument. If another process has created the segment, this may be okay.
But, it doesn't/shouldn't like a size of 0. The [linux] man page states that it returns an error (errno set to EINVAL) if the size is less than SHMMIN (which is 1).
That is what happened on my system. So, I adjusted the code to use a size of 1.
This was done [as I mentioned] on linux.
macOS may allow a size of 0, even if that doesn't make practical sense. (e.g.) It may round it up to a page size.
For shmat, it returns (void *) -1.
But, some systems can have valid addresses that have the high bit set. (e.g.) 0xFFE0000000000000 is a valid address, but would fail your if test because casting that to int64_t will test negative.
Better to do:
if ((int64_t) *ptr == (int64_t) -1)
Or [possibly better]:
if ((void *) *ptr == (void *) -1)
Note that errno is not set/changed if the call succeeds.
To verify this, do: errno = 0 before the shmat call. If perror says "Success", then the shmat is okay. And, your current test needs to be adjusted as above--I'd do that change regardless.
You could also do (e.g):
printf("ptr=%p\n",*ptr);
Normally, errno starts as 0.
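Putting the two suggestions together (compare against exactly (void *) -1, and clear errno before the call), a sketch could look like this. ShmatFailed and AttachById are hypothetical names, and the segment id is assumed to come from an earlier successful shmget.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <sys/ipc.h>
#include <sys/shm.h>

// shmat() signals failure with exactly (void *) -1. Any other value,
// including an address with the high bit set, is a usable pointer.
inline bool ShmatFailed(const void* p) {
    return p == reinterpret_cast<const void*>(static_cast<std::intptr_t>(-1));
}

// Hypothetical variant of Attach(): attaches an existing segment by id.
template <typename T>
bool AttachById(int shmemId, T** ptr) {
    assert(ptr != nullptr);
    errno = 0;  // so a stale errno from an earlier call cannot mislead perror()
    void* addr = shmat(shmemId, nullptr, 0);
    if (ShmatFailed(addr)) {
        std::perror("shmat");
        return false;
    }
    *ptr = static_cast<T*>(addr);
    return true;
}
```

If perror still reports "Too many open files" with this check in place, the error really does come from shmat and not from a leftover errno or a misread pointer.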
Note that there are some differences between macOS and linux.
So, if errno is ever set to "too many open files", this can be because the process has too many open files (EMFILE).
It might be because the system-wide limit is reached (ENFILE) but that is "file table overflow".
Note that under linux shmat can not generate EMFILE. However, it appears that under macOS it can.
However, if the number of calls to shmat is limited [as you mention], the shmat should succeed.
The macOS man page is a little vague as to what the limit is based on. However, I checked the FreeBSD man page for shmat and that says it is limited by the sysctl parameter: kern.ipc.shmseg. Your grep should have caught that [if applicable].
It is possible some other syscall elsewhere in the code is opening too many files. And, that syscall is not checking the error return.
Again, I realize you're running macOS.
But, if available, you may want to try your program under linux. For example, it has much larger limits from the sysctl:
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
vm.hugetlb_shm_group = 0
Note that shmmni is the system-wide maximum number of shared memory segments.
Note that for macOS, shmmni is 32 (vs. 4096 for linux)!?!?
That means that the entire system can only have 32 open shared memory segments for any/all processes???
That seems very low. You can probably set this to a larger number and see if that helps.
Linux has the strace program and you could use it to monitor the syscalls.
But, macOS has dtruss: How to trace system calls of a program in Mac OS X?

Windows shared memory access time slow

I am currently using shared memory with two mapped files (1.9 GBytes for the first one and 600 MBytes for the second) in a software.
I am using a process that reads data from the first file, processes the data and writes the results to the second file.
I have noticed that there is sometimes a long delay (I do not know the reason) when reading from or writing to the mapped view with the memcpy function.
Mapped files are created this way :
m_hFile = ::CreateFileW(SensorFileName,
                        GENERIC_READ | GENERIC_WRITE,
                        0,
                        NULL,
                        CREATE_ALWAYS,
                        FILE_ATTRIBUTE_NORMAL,
                        NULL);
m_hMappedFile = CreateFileMapping(m_hFile,
                                  NULL,
                                  PAGE_READWRITE,
                                  dwFileMapSizeHigh,
                                  dwFileMapSizeLow,
                                  NULL);
And memory mapping is done this way :
m_lpMapView = MapViewOfFile(m_hMappedFile,
                            FILE_MAP_ALL_ACCESS,
                            dwOffsetHigh,
                            dwOffsetLow,
                            m_i64ViewSize);
The dwOffsetHigh/dwOffsetLow values are aligned to the allocation granularity reported by the system info.
The process is reading about 300KB * N times, storing that in a buffer, processing and then writing 300KB * N times the processed contents of the previous buffer to the second file.
I have two different memory views (created/moved with MapViewOfFile function) with a size of 10 MBytes as default size.
For memory view size, I tested 10kBytes, 100kB, 1MB, 10MB and 100MB. Statistically no difference, 80% of the time reading process is as described below (~200ms) but writing process is really slow.
Normally :
1/ Reading is done in ~200ms.
2/ Process done in 2.9 seconds.
3/ Writing is done in ~200ms.
I can see that 80% of the time, either reading or writing (in the worst case both are slow) will take between 2 and 10 seconds.
Example : For writing, I am using the below code
for (unsigned int i = 0 ; i < N ; i++) // N = 500~3k
{
    // Check the position of the memory view for ponderation
    if (###)
        MoveView(iOffset);
    if (m_lpMapView)
    {
        memcpy((BYTE*)m_lpMapView + iOffset, pANNHeader, uiANNStatus);
        // uiSize = ~300 kBytes
        memcpy((BYTE*)m_lpMapView + iTemp, pLine[i], uiSize);
    }
    else
        return uiANNStatus;
}
After using the GetTickCount function to pinpoint where the delay is, I see that the second memcpy call is always the one taking most of the time.
So far, I am seeing N calls to memcpy (for the test, I used N = 500) taking 10 seconds in the worst case when using those shared memories.
I wrote a temporary program that makes the same number of memcpy calls with the same amount of data, and I couldn't reproduce the problem there.
For tests, I used the following conditions, they all show the same delay :
1/ I can see this on various computers, 32 or 64 bits from windows 7 to windows 10.
2/ Using the main thread or multi-threads (up to 8 with critical sections for synchronization purpose) for reading/writing.
3/ OS on SATA or SSD, memory mapped files of the software physically on a SATA or SSD hard-disk, and if on external hard-disk, tests were done through USB1, USB2 or USB3.
Could you kindly tell me what mistake of mine might be making memcpy this slow?
Best regards.
I found a solution that works for me, but it might not work for others.
Following Thomas Matthews' comments, I checked the MSDN and found two interesting functions, FlushViewOfFile and FlushFileBuffers (but couldn't find anything interesting about locking memory).
Calling both after the for loop forces the mapped file to be updated.
I no longer have the "random" delay, but instead of the expected 200ms I get an average of 400ms, which is good enough for my application.
After doing some tests I saw that calling them too often causes heavy hard-disk access and makes the delay worse (10 seconds for every for loop), so the flush should be used carefully.
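For readers who want to try the same thing, the flush step might be wrapped as in the sketch below. FlushMappedWrites is a hypothetical helper name; the view pointer, its size and the file handle are assumed to be the ones obtained from MapViewOfFile and CreateFileW earlier. The code is Windows-only, hence the guard.

```cpp
#include <cassert>
#if defined(_WIN32)
#include <windows.h>

// Hypothetical helper: call once after the whole write loop, not per memcpy,
// because flushing too often causes heavy disk access.
bool FlushMappedWrites(LPCVOID view, SIZE_T bytes, HANDLE file) {
    assert(view != nullptr);
    // Start writing the view's modified pages to the file ...
    if (!FlushViewOfFile(view, bytes)) {
        return false;
    }
    // ... then ask the OS to push the file's buffered data to disk.
    return FlushFileBuffers(file) != FALSE;
}
#endif
```

In the write loop from the question this would be a single call such as FlushMappedWrites(m_lpMapView, m_i64ViewSize, m_hFile) placed after the closing brace of the for loop.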
Thanks.

_beginthreadex leaking memory

The code below is my entire test program. Each time I press ENTER, the RAM that the process uses increases by 4 KB (it keeps increasing without stopping; I am watching it in Task Manager). What is wrong? The same thing happens with _beginthread.
I am trying to write a server, and I want to process each connection with a thread. (Note that this means that I can't join the thread, because that will pause the main thread from accepting new connections.)
unsigned __stdcall thread_test(void *)
{
    for(int i = 0; i < 10000; i++)
    {
        i+=1;
        i-=1;
    } //simulating processing
    _endthreadex( 0 );
}
int main()
{
    HANDLE hThread;
    while(1)
    {
        getchar();
        hThread = (HANDLE)_beginthreadex( NULL, 0, thread_test, 0, 0, NULL );
        CloseHandle( hThread );
    }
}
Compiled with code blocks and visual studio.
EDIT: I've made some tests, and the memory stops filling up once it reaches around 133.000K (when the program starts, the memory is around 800k); but at this stage, the program runs about 4-5 times slower than it did in the beginning (the higher the memory, the slower the program runs), so it would not be good for my server to run like that.
EDIT 2: I've got Visual Studio 2013 and the problem is gone.
EDIT 3: If I test the code above in Visual Studio 2013, it gives no leaks. But if I use _beginthreadex with a small server code, it gives me leaks like before, each request leaking 4k. Here is the server test code (it does nothing, it is only there to show that it leaks memory) that I use: http://pastebin.com/EDmJXkZU . You can compile it and test it by typing your IP into the address bar of the browser.
Task Manager is not showing RAM used by your program. For a better view use Task Manager's Resource Monitor and observe the private bytes indication. But all memory monitors show only "virtual memory," which is commonly retained by the runtime library instead of being freed back to Windows. You don't have a real problem.

Unexplained Linux System V IPC shared memory segment marked for destruction

I have a Linux System V IPC shared memory segment that is populated by one process and read by many others. All the processes use an interface to the shared memory segment in the form of a class which takes care of looking up, attaching to, and detaching from the segment as part of its constructor/destructor methods.
The problem here is that from time to time I'm seeing that the segment has "split". What I mean here is that looking in the "ipcs -m -s" output I see that I've got two segments listed: one which has been marked for destruction but still has some processes attached to it, and a second which appears to get all new attempts to attach to the segment. However, I'm never actually asking the kernel to destroy the segment. What's happening here?!
One other thing to note is that unfortunately the system this is running on is seriously overcommitted in the memory department. There is 1 GB of physical memory, no swap, and the Committed_AS in /proc/meminfo is reporting about 2.5GB of committed memory. Fortunately the system processes are not actually using this much memory... they're just asking for it (I still have about 660MB "free" memory as reported by vmstat). While I know this is far from ideal, for the time being there is nothing I can do about the overcommitted memory. However, browsing the kernel/libc source I don't see anything in there that would mark a shared memory segment for deletion for any reason other than a user request (but perhaps I've missed it hidden in there somewhere).
For reference here's the shared memory interface class' constructor:
const char* shm_ftok_pathname = "/usr/bin";
int shm_ftok_proj_id = 21;
// creates a key from a file path so different processes will get same key
key_t m_shm_key = ftok(shm_ftok_pathname, shm_ftok_proj_id);
if ( m_shm_key == -1 )
{
    fprintf(stderr,"Couldn't get the key for the shared memory\n%s\n",strerror(errno));
    exit ( status );
}
m_shm_id = shmget(m_shm_key, sizeof(shm_data_s), (IPC_CREAT | 0666));
if (m_shm_id < 0)
{
    fprintf(stderr,"Couldn't get the shared memory ID\nerrno = %s \n",strerror(errno));
    exit ( status );
}
// get a ptr to shared memory, which is a shared mem struct
// second arg of 0 says let OS choose shm address
m_shm_data_ptr = (shm_data_s *)shmat(m_shm_id, 0, 0);
if ( (int)m_shm_data_ptr == -1 )
{
    fprintf(stderr,"Couldn't get the shared memory pointer\n");
    exit ( status );
}
And here's my uname output:
Linux 2.6.18-5-686 #1 SMP Fri Jun 1 00:47:00 UTC 2007 i686 GNU/Linux
My first guess is that you probably are calling shmctl(..., IPC_RMID, ...) somewhere.
Can you show the shared memory interface class' destructor?
The only reason for the kernel to mark the segment for deletion is an explicit user call. Maybe you can try strace (or truss on Solaris) to find out whether there is a user call to the function mentioned above.
Raman Chalotra