When an executable runs on Linux, it creates processes and threads, performs I/O, etc., and uses libraries written in languages like C/C++; sometimes timers are involved as well. Is it possible to monitor all this? How can I take a deep dive into this software and these processes and see what is going on in the background?
I know this stuff is abstracted away from me because, as a regular user, I shouldn't have to worry about it, but I'm curious about what I would see.
What I need to see are:
System calls for this process/thread.
Open/closed sockets.
Memory management and utilization, what block is being accessed.
Memory instructions.
If a process is depending on the results of another one.
If a process/thread terminates, why, and was it successful?
I/O operations and DB read/write if any.
The different things you wanted to monitor may require different tools. All tools I will mention below have extensive manual pages where you can find exactly how to use them.
System calls for this process/thread.
The strace command does exactly this: it lists every system call your program invokes. The ltrace tool is similar, but focuses on calls to library functions, not just system calls (which involve the kernel).
Open/closed sockets.
The strace/ltrace commands will, among other things, list socket creation, but if you want to know which sockets are open right now (connected, listening, and so on), there is the netstat utility, which lists all the connected (or with "-a", also listening) sockets in the system and, with "-p", which process each one belongs to. On newer systems, ss from iproute2 does the same job.
Memory management and utilization, what block is being accessed.
Memory instructions.
Again, ltrace will let you see all malloc()/free() calls, but to see exactly what memory is being accessed where, you'll need a debugger, like gdb. The thing is that almost everything your program does is a "memory instruction", so you'll need to know exactly what you are looking for and use breakpoints, watchpoints, tracepoints, single-stepping, and so on; you usually don't want to see every memory access in your program.
If you don't want to find all memory accesses but are rather searching for bugs in this area, like accessing memory after it's freed and so on, there are tools that help you find those more easily. One of them, ASan (AddressSanitizer), is built into GCC and Clang: build with -fsanitize=address and you get messages about bad access patterns at run time. Another one you can use is valgrind.
Finally, if by "memory utilization" you meant to just check how much memory your process or thread is using, well, both ps and top can tell you that.
If a process is depending on the results of another one.
If a process/thread terminates, why, and was it successful?
Various tools I mentioned, like strace/ltrace, will let you know when the process they follow exits (and with what exit code or signal). A parent process can retrieve the exit status of its own child processes with wait()/waitpid(), but I'm not aware of a tool which can print the exit status of all processes in the system.
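For the parent/child case, the status returned by waitpid() is decoded with the standard wait-status macros. A minimal POSIX sketch (describe_exit is a hypothetical name) that tells a normal exit with a code apart from death by a signal:

```cpp
#include <cassert>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Waits for child `pid` and describes how it ended. Only the parent of a
// process can collect its status this way.
std::string describe_exit(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return "waitpid failed";
    if (WIFEXITED(status))    // normal exit: exit() or return from main
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    if (WIFSIGNALED(status))  // terminated by a signal (crash, kill, ...)
        return "killed by signal " + std::to_string(WTERMSIG(status));
    return "unknown";
}
```

By convention, exit code 0 means success; anything else, or a terminating signal, means the process did not finish successfully.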
I/O operations
There is iostat, which gives you periodic summaries of how much I/O was done to each disk. netstat -s gives you network statistics, so you can see how many network operations were done. vmstat gives you, among other things, statistics on I/O caused by swapping in and out (in case that is a problem for you).
and DB read/write if any.
This depends on your DB, I guess, and how you monitor it.
I have an online service that is single-threaded. Since the whole program shares one memory pool (allocated with new or malloc), one module might corrupt memory, which makes another module work incorrectly. So I want to split the whole program into two parts, each running on its own thread. Is it possible to isolate thread memory the way multiple processes are isolated, so I can find where the problem is? (Splitting it into multiple processes would cost a lot of time and is risky, so I don't want to try that.)
As long as you use threads, memory can easily be corrupted since, BY DESIGN, threads share the same memory. Splitting your program across two threads won't help in ANY manner with protecting memory: it can greatly help with CPU load, latency, performance, etc., but in no way as a mechanism against memory corruption.
So either you make sure your code is developed properly and won't write over memory it must not touch, or you use multiple processes, which the operating system isolates from each other.
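To see why processes help here, consider this minimal POSIX sketch: the suspect module runs in a fork()ed child, so when it overwrites a global it only changes the child's copy of the address space, and the parent's data stays intact. g_config, buggy_module and run_isolated are hypothetical names for illustration; a thread running the same module would share, and corrupt, the parent's copy.

```cpp
#include <cassert>
#include <cstring>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical shared state that a buggy module might overwrite.
static char g_config[32] = "good";

// Runs `module` in a forked child. Whatever it scribbles over, including
// g_config, only affects the child's private copy of memory.
// Returns the child's exit code, or -1 if it crashed or could not start.
int run_isolated(void (*module)()) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {              // child
        module();
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Example of a misbehaving module: overwrites memory it does not own.
void buggy_module() { std::memset(g_config, 'X', sizeof(g_config) - 1); }
```

After run_isolated(buggy_module) returns, the parent's g_config still reads "good".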
You can try to sanitize your code with tools designed for this purpose, but that depends on your platform, which you didn't give in your question.
It can range from a simple Debug build with MSVC under Windows up to a Valgrind analysis under Linux, and so on. The list can be huge, so please give additional information.
Also, if it's your threading code that contains the error, maybe rethink what you're doing: switching to multiple processes may be the cheapest solution in the end. Don't be fooled by sunk cost, especially since threading can NOT protect part 1 against part 2 and vice versa...
Isolation like this is quite often done by using separate processes.
The downsides of doing that are:
harder communication between the processes (but that's kind of the point)
the overhead of starting a process is typically a lot larger than starting a thread, so you would not typically spawn a new process for each request, for example.
A common model is a lead process that starts a child process to do the request serving. The lead process just monitors the health of the worker.
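The lead/worker model can be sketched in a few lines of POSIX C++. This is a minimal sketch, not a production supervisor: supervise and the worker callback are hypothetical names, and it restarts the worker whenever it crashes or exits with a non-zero code, up to max_starts times.

```cpp
#include <cassert>
#include <sys/wait.h>
#include <unistd.h>

// Lead process: fork a worker, wait for it, and start a fresh one if it dies
// abnormally. `worker` is a hypothetical request-serving loop; it runs in the
// child, and its return value becomes the child's exit code.
// Returns how many times the worker was started.
int supervise(int (*worker)(), int max_starts) {
    int starts = 0;
    while (starts < max_starts) {
        ++starts;
        pid_t pid = fork();
        if (pid < 0) break;                 // could not fork: give up
        if (pid == 0) _exit(worker());      // child: do the real work
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) break;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;                          // clean shutdown: stop supervising
        // Crash or failure exit code: loop around and restart the worker.
    }
    return starts;
}
```

For example, supervise(serve_requests, 5) would restart a hypothetical serve_requests loop up to four times after failures. Because each worker is a separate process, a worker that corrupts its memory cannot damage the supervisor.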
Or you could fix your code so that it doesn't get corrupted (an easy thing to say, I know).
Memory editors such as Cheat Engine are able to read the memory of other processes and modify it.
How do they do it? (A code snippet would be interesting!) A process typically does not have the ability to access the memory of another one; the only cases I've heard of involve sub-processes/threading, but memory editors are typically not related to the target process in any way.
Why do they work? In what scenario is this ability useful aside from using it to hack other processes, why wouldn't the operating system simply disallow unrelated processes from reading the memory of each other?
On Windows, the function typically used to alter the memory of another process is called WriteProcessMemory:
https://learn.microsoft.com/en-in/windows/win32/api/memoryapi/nf-memoryapi-writeprocessmemory
If you search the Cheat Engine source code for WriteProcessMemory, you can find it in both their Pascal code and the C kernel code. It needs PROCESS_VM_WRITE and PROCESS_VM_OPERATION access to the process, which basically means you need to run Cheat Engine as admin.
WriteProcessMemory is used any time you want to alter the runtime behavior of another process. There are legitimate uses, such as with Cheat Engine or ModOrganizer, and of course lots of illegitimate ones. It's worth mentioning that anti-virus software is typically trained to look for this API call (among others) so unless your application has been whitelisted it might get flagged because of it.
Operating systems typically expose system calls that let a userspace program inspect another process's memory.
For instance, the Linux kernel supports the ptrace system call. It is primarily used by the well-known debugger gdb and by popular system call tracing utilities such as strace.
The ptrace system call allows for the inspection of memory contents of the target process, code injection, register manipulation and much more. I would suggest this as a resource if you are interested in learning more.
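As a small illustration of ptrace reading another process's memory, the sketch below forks a child that asks to be traced (PTRACE_TRACEME), stores a value in its own copy of a global, and stops itself; the parent then reads that word out of the child's address space with PTRACE_PEEKDATA. It assumes Linux and that ptrace is not blocked (for example by Yama or a container's seccomp policy); g_secret and peek_child_value are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical variable: after fork() each process has its own copy, mapped
// at the same virtual address in both.
static long g_secret = 0;

// Forks a tracee, lets it change its copy of g_secret, then reads that word
// from the child's address space. Returns the word, or -1 on failure.
long peek_child_value(long value_in_child) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: ask to be traced by the parent; bail out if that is refused.
        if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) == -1) _exit(1);
        g_secret = value_in_child;   // modifies only the child's copy
        raise(SIGSTOP);              // stop so the tracer can inspect us
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);        // returns once the tracee has stopped
    if (!WIFSTOPPED(status)) return -1;  // child exited: tracing was refused
    errno = 0;
    // PEEKDATA returns the data itself, so errors are reported through errno.
    long word = ptrace(PTRACE_PEEKDATA, pid, &g_secret, nullptr);
    if (errno != 0) word = -1;
    kill(pid, SIGKILL);              // done inspecting: end the child
    waitpid(pid, &status, 0);        // and reap it
    return word;
}
```

The parent's own g_secret is untouched; the value it returns was read from the child's memory. PTRACE_POKEDATA is the writing counterpart, which is how a Linux memory editor would change values.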
On Linux, you can either run a binary from within gdb, or attach to a process. In case of the latter, you need elevated privileges. There are other protections that try to limit what you can do with ptrace, such as the one mentioned here.
On Windows you can achieve the same effect by using the following functions, in order: OpenProcess, GetProcAddress, VirtualAllocEx, WriteProcessMemory and CreateRemoteThread. I would suggest this as a resource if you are interested in knowing more. You might need elevated privileges to do this on newer Windows versions.
In Unix systems, it is possible to dynamically monitor the system by reading data from /proc. I am hoping to implement this kind of monitoring in my application by dynamically saving its "current status" into a file. However, I do not want I/O to delay my program, so it would be best to make the file virtual, i.e. not stored on disk but actually in memory. Is there a way of doing that? Thanks for the hint!
Why not use shared memory and semaphores? Do a 'man shmget' as a starting point.
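A minimal sketch of that idea with System V shared memory, assuming Linux/POSIX: the application writes a plain-text status into a segment, and a monitor attaches to the same segment and reads it without touching the disk. The segment size, the text format and the helper names are assumptions; as the answer says, you would add a semaphore (or similar) so a reader never sees a half-written status, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/ipc.h>
#include <sys/shm.h>

// Size of the status area; an assumption for this sketch.
constexpr std::size_t kStatusSize = 4096;

// Creates a private segment and returns its id (-1 on error). Unrelated
// monitor processes would instead agree on a key, e.g. one from ftok().
int create_status_segment() {
    return shmget(IPC_PRIVATE, kStatusSize, IPC_CREAT | 0600);
}

// Copies `status` (truncated to fit, always NUL-terminated) into the segment.
bool write_status(int shmid, const std::string& status) {
    void* p = shmat(shmid, nullptr, 0);
    if (p == reinterpret_cast<void*>(-1)) return false;
    char* area = static_cast<char*>(p);
    std::strncpy(area, status.c_str(), kStatusSize - 1);
    area[kStatusSize - 1] = '\0';
    shmdt(p);
    return true;
}

// What a monitor does: attach read-only and copy the current text out.
std::string read_status(int shmid) {
    void* p = shmat(shmid, nullptr, SHM_RDONLY);
    if (p == reinterpret_cast<void*>(-1)) return {};
    std::string s(static_cast<const char*>(p));
    shmdt(p);
    return s;
}
```

A long-running program would normally attach once and keep the pointer rather than attaching on every update; the segment is removed with shmctl(id, IPC_RMID, nullptr) when no longer needed.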
As an alternative, you could make your application a socket server. This way it responds with status information only when asked (there's not even a need to keep updating a memory area with the current status), and you can also control your program from a remote machine. If the status itself is not a huge quantity of data, I think this is the most flexible solution.
If you also make your application respond to HTTP requests (I don't mean handling every possibility of the HTTP protocol, just the requests you want to support), you can avoid writing a client altogether, and if you want to write one anyway, it's probably easier to find libraries and programmers able to do that.
Make it listen on port 80 and you could check your program over the internet, getting through firewalls without effort :-) (well... assuming the program itself can be reached from the internet, but even that is a simple and common thing to ask sysadmins for).
Try FUSE. It is particularly useful for writing virtual file systems, and there are already many filesystems built on top of it.
I have no idea about your exact requirements, so I can only guess, but under Linux every file put into /dev/shm lives in RAM. That doesn't mean it isn't doing I/O, just that the I/O is faster. If you don't want to do I/O via file descriptors or similar, do as someone else suggested and use shared memory segments, but that is a bit harder for everyone to read. Having other programs just open and read a file, which then calls some functions in your program (which is what /proc does in kernel space), is not possible. Maybe a filesystem socket or FIFO suits your needs better (e.g. when you already have a select/(e)poll loop). When you have full control over the system, tmpfs might also be useful to you.
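If the /dev/shm route is enough, a common pattern is to write the status to a temporary file and rename() it over the real name, so a reader using plain cat sees either the old or the new status, never a partial one. This is a minimal sketch assuming a Linux system with tmpfs mounted on /dev/shm; publish_status and any path you pass to it are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Publishes `status` at `path` (e.g. a file under /dev/shm, which is tmpfs and
// therefore held in RAM). The text goes to "<path>.tmp" first and is then
// renamed over `path`; rename() within one filesystem replaces the file
// atomically, so readers never observe a half-written status.
bool publish_status(const std::string& path, const std::string& status) {
    const std::string tmp = path + ".tmp";
    std::FILE* f = std::fopen(tmp.c_str(), "w");
    if (!f) return false;
    bool ok = std::fwrite(status.data(), 1, status.size(), f) == status.size();
    ok = (std::fclose(f) == 0) && ok;
    return ok && std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

Other programs then just read the file, much like a /proc entry, e.g. cat /dev/shm/myapp_status for a hypothetical myapp_status file.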
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
What's the graceful way of handling out of memory situations in C/C++?
Hi,
this seems to be a simple question at first glance, and I don't want to start a huge discussion on what-is-the-best-way-to-do-this....
Context: Windows >= 5, 32 bit, C++, Windows SDK / Win32 API
But after asking a similar question, I read some of the MSDN documentation on Win32 memory management, and now I'm even more confused about what to do if an allocation fails, say with the C++ new operator.
So I'm very interested now in how you implement (and implicitly, if you do implement) an error handling for OOM in your applications.
If, where (main function?), for which operations (allocations), and how you handle an OOM error.
(I don't really mean that subjectively, turning this into a question of preference, I just like to see different approaches that account for different conditions or fit different situations. So feel free to offer answers for GUI apps, services - user-mode stuff ....)
Some exemplary reactions to OOM to show what I mean:
GUI app: Message box, exit process
non-GUI app: Log error, exit process
service: try to recover, e.g. kill the thread that raised an exception, but continue execution
critical app: try again until an allocation succeeds (reducing the requested amount of memory)
hands off OOM, let STL / boost / OS handle it
Thank you for your answers!
The best-explained way will receive the great honour of being the accepted answer :D, even if it only consists of a MessageBox line, as long as it explains why everything else is useless, wrong or unnecessary.
Edit: I appreciate your answers so far, but I'm missing a bit of an actual answer; what I mean is most of you say don't mind OOM since you can't do anything when there's no memory left (system hangs / poor performance). But does that mean to avoid any error handling for OOM? Or only do a simple try-catch in the main showing a MessageBox?
On most modern OSes, OOM will occur long after the system has become completely unusable, since before actually running out, the virtual memory system will start paging physical RAM out to make room for allocating additional virtual memory and in all likelihood the hard disk will begin to thrash like crazy as pages have to be swapped in and out at higher and higher frequencies.
In short, you have much more serious concerns to deal with before you go anywhere near OOM conditions.
Side note: At the moment, the above statement isn't as true as it used to be, since 32-bit machines with loads of physical RAM can exhaust their address space before they start to page. But this is still not common and is only temporary, as 64-bit ramps up and approaches mainstream adoption.
Edit: It seems that 64-bit is already mainstream. While perusing the Dell web site, I couldn't find a single 32-bit system on offer.
You do the exact same thing you do when:
you created 10,000 windows
you allocated 10,000 handles
you created 2,000 threads
you exceeded your quota of kernel pool memory
you filled up the hard disk to capacity.
You send your customer a very humble message in which you apologize for writing such crappy code and promise a delivery date for the bug fix. Anything else is not nearly good enough. How you want to be notified about it is up to you.
Basically, you should do whatever you can to avoid having the user lose important data. If disk space is available, you might write out recovery files. If you want to be super helpful, you might allocate recovery files while your program is open, to ensure that they will be available in case of emergency.
Simply display a message or dialog box (depending on whether you're in a terminal or a window system) saying "Error: Out of memory", possibly with debugging info, and include an option for your user to file a bug report, or a web link to where they can do that.
If you're really out of memory then, in all honesty, there's no point doing anything other than exiting gracefully; trying to handle the error is useless, as there is nothing you can do.
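A graceful exit along those lines can be as small as one catch block at the top of the program. A minimal sketch in standard C++: run_guarded (a hypothetical name) runs the real program body and turns an escaping std::bad_alloc into a short message and a failure exit code, instead of letting the exception reach std::terminate. It writes with fputs to stderr, which is unbuffered and so typically needs no further heap allocation.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <new>

// Runs the program body; if an allocation fails anywhere below and nobody
// handled it, report it and return a failure code.
int run_guarded(int (*body)()) {
    try {
        return body();
    } catch (const std::bad_alloc&) {
        std::fputs("Error: out of memory\n", stderr);
        return EXIT_FAILURE;
    }
}
```

A program would then use int main() { return run_guarded(real_main); }, with real_main standing in for the existing entry point. A GUI application would show its message box in the catch block instead.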
In my case, what happens when you have an app that fragments memory so much that it cannot allocate the contiguous block needed to process a huge number of nodes?
Well, I split the processing up as much as I could.
For OOM, you can do the same thing, chop your processes up into as many pieces as possible and do them sequentially.
Of course, for handling the error until you get to fix it (if you can!), you typically let it crash. Then you determine that those memory allocations are failing (which you never expected) and put an error message directly in front of the user along the lines of "oh dear, it's all gone wrong; log a call with the support department". In all cases, you inform the user however you like. Still, it's established practice to use whatever mechanism the app already uses: if it writes to a log file, do that; if it displays an error dialog, do the same; if it uses the Windows 'send info to Microsoft' dialog, go right ahead and let that be the bearer of bad tidings. Users are expecting it, so don't try to be clever and do something else.
It depends on your app, your skill level, and your time. If it needs to be running 24/7 then obviously you must handle it. It depends on the situation. Perhaps it may be possible to try a slower algorithm but one that requires less heap. Maybe you can add functionality so that if OOM does occur your app is capable of cleaning itself up, and so you can try again.
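The "slower algorithm that needs less heap" idea usually starts with an allocation that is allowed to shrink. A minimal sketch in standard C++, with hypothetical names: ask for the preferred buffer size with non-throwing new, halve the request on failure, and let the caller switch to a chunked algorithm if it got less than it wanted. Note that on Linux with memory overcommit enabled, a large request can succeed here and only fail later, when the pages are actually touched.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

struct Buffer {
    std::unique_ptr<char[]> data;  // empty if nothing could be allocated
    std::size_t size = 0;
};

// Tries `preferred` bytes, then half, then a quarter, ... down to `minimum`.
// Returns an empty Buffer if even `minimum` bytes are unavailable.
Buffer allocate_up_to(std::size_t preferred, std::size_t minimum) {
    for (std::size_t n = preferred; n >= minimum && n > 0; n /= 2) {
        char* p = new (std::nothrow) char[n];  // nullptr instead of throwing
        if (p) return {std::unique_ptr<char[]>(p), n};
    }
    return {};  // caller must report OOM or give up on this operation
}
```

The caller compares the returned size with what it asked for and processes the data in correspondingly smaller pieces.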
So I think the answer is 'ALL OF THE ABOVE!', apart from LET IT CRASH. You take pride in your work, right?
Don't fall into the 'there's loads of memory so it probably won't happen' trap. If every app writer took that attitude, you'd see OOM far more often. And not all apps run on desktop machines: take a mobile phone, for example. It's highly likely that you'll run into OOM on a RAM-starved platform like that, trust me!
If all else fails display a useful message (assuming there's enough memory for a MessageBox!)
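One well-known way to keep enough memory around for that final message, in the same spirit as pre-allocating recovery files, is an emergency reserve released by a new-handler (an addition, not something the answers above describe). When operator new fails, it calls the installed handler and retries; the sketch below frees a block set aside at startup, then uninstalls itself so that a second failure throws std::bad_alloc as usual. The names and the 1 MB default are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

namespace {
char* g_reserve = nullptr;  // memory set aside for the emergency path

// Called by operator new when an allocation fails. Freeing the reserve gives
// the retry (and the shutdown code) something to work with.
void release_reserve() {
    delete[] g_reserve;
    g_reserve = nullptr;
    // Uninstall ourselves: if new still fails on the retry, it throws
    // std::bad_alloc instead of calling this handler again.
    std::set_new_handler(nullptr);
}
}  // namespace

void install_emergency_reserve(std::size_t bytes = 1 << 20) {
    g_reserve = new char[bytes];
    std::set_new_handler(release_reserve);
}

// False once the reserve has been spent: time to save data and shut down.
bool emergency_reserve_available() { return g_reserve != nullptr; }
```

The program would call install_emergency_reserve() early in main, and check emergency_reserve_available() periodically (or after catching std::bad_alloc) to know when to save its data, show the message and exit.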
I have a program that loads a file (anywhere from 10MB to 5GB) a chunk at a time (ReadFile), and for each chunk performs a set of mathematical operations (basically calculates the hash).
After calculating the hash, it stores info about the chunk in an STL map (basically <chunkID, hash>) and then writes the chunk itself to another file (WriteFile).
That's all it does. This program will cause certain PCs to choke and die. The mouse begins to stutter, the task manager takes > 2 min to show, ctrl+alt+del is unresponsive, running programs are slow.... the works.
I've done literally everything I can think of to optimize the program, and have triple-checked all objects.
What I've done:
Tried different (less intensive) hashing algorithms.
Switched all allocations to nedmalloc instead of the default new operator
Switched from std::map to unordered_set, found the performance to still be abysmal, so I switched again to Google's dense_hash_map.
Converted all objects to store pointers to objects instead of the objects themselves.
Caching all Read and Write operations. Instead of reading a 16k chunk of the file and performing the math on it, I read 4MB into a buffer and read 16k chunks from there instead. Same for all write operations - they are coalesced into 4MB blocks before being written to disk.
Run extensive profiling with Visual Studio 2010, AMD Code Analyst, and perfmon.
Set the thread priority to THREAD_MODE_BACKGROUND_BEGIN
Set the thread priority to THREAD_PRIORITY_IDLE
Added a Sleep(100) call after every loop.
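For reference, the loop described above can be sketched in portable C++ (the actual program uses ReadFile/WriteFile; this uses standard streams). It reads 4 MB blocks, hashes each 16 KB chunk, records <chunkID, hash> in a std::unordered_map, and writes each block out with a single call. FNV-1a stands in for the unspecified hash, and the function names are hypothetical; this is a sketch for reproducing the workload, not the poster's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

constexpr std::size_t kBlock = 4 * 1024 * 1024;  // read/write granularity
constexpr std::size_t kChunk = 16 * 1024;        // hashing granularity

// 64-bit FNV-1a, used here only as a placeholder hash.
std::uint64_t fnv1a(const char* p, std::size_t n) {
    std::uint64_t h = 14695981039346656037ull;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= static_cast<unsigned char>(p[i]);
        h *= 1099511628211ull;
    }
    return h;
}

// Copies in_path to out_path block by block and returns <chunkID, hash>.
std::unordered_map<std::uint64_t, std::uint64_t>
hash_and_copy(const std::string& in_path, const std::string& out_path) {
    std::unordered_map<std::uint64_t, std::uint64_t> chunk_hashes;
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary);
    std::vector<char> block(kBlock);
    std::uint64_t chunk_id = 0;
    while (in) {
        in.read(block.data(), block.size());
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;                       // end of file
        for (std::size_t off = 0; off < got; off += kChunk) {
            const std::size_t len = std::min(kChunk, got - off);
            chunk_hashes[chunk_id++] = fnv1a(block.data() + off, len);
        }
        out.write(block.data(), got);              // one large write per block
    }
    return chunk_hashes;
}
```

Running this on the same inputs under each OS gives a baseline: if it also hangs Windows, the problem is in the I/O pattern or the system; if not, the difference lies elsewhere in the real program.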
Even after all this, the application still results in a system-wide hang on certain machines under certain circumstances.
Perfmon and Process Explorer show minimal CPU usage (with the sleep), no constant reads/writes from disk, few hard pagefaults (and only ~30k pagefaults in the lifetime of the application on a 5GB input file), little virtual memory (never more than 150MB), no leaked handles, no memory leaks.
The machines I've tested it on run Windows XP - Windows 7, x86 and x64 versions included. None have less than 2GB RAM, though the problem is always exacerbated under lower memory conditions.
I'm at a loss as to what to do next. I don't know what's causing it; I'm torn between CPU and memory as the culprit. CPU because, without the sleep and under different thread priorities, system performance changes noticeably. Memory because there's a huge difference in how often the issue occurs when using unordered_set vs Google's dense_hash_map.
What's really weird? Obviously, the NT kernel design is supposed to prevent this sort of behavior from ever occurring (a user-mode application driving the system to this sort of extreme poor performance!?)..... but when I compile the code and run it on OS X or Linux (it's fairly standard C++ throughout) it performs excellently even on poor machines with little RAM and weaker CPUs.
What am I supposed to do next? How do I know what the hell it is that Windows is doing behind the scenes that's killing system performance, when all the indicators are that the application itself isn't doing anything extreme?
Any advice would be most welcome.
I know you said you had monitored memory usage and that it seems minimal here, but the symptoms sound very much like the OS thrashing like crazy, which would definitely cause general loss of OS responsiveness like you're seeing.
When you run the application on a file say 1/4 to 1/2 the size of available physical memory, does it seem to work better?
What I suspect may be happening is that Windows is "helpfully" caching your disk reads into memory and not giving up that cache memory to your application for use, forcing it to go to swap. Thus, even though swap use is minimal (150MB), it's going in and out constantly as you calculate the hash. This then brings the system to its knees.
Some things to check:
Antivirus software. These often scan files as they're opened to check for viruses. Is your delay occurring before any data is read by the application?
General system performance. Does copying the file using Explorer also show this problem?
Your code. Break it down into its various stages. Write a program that just reads the file, then one that reads and writes the file, then one that just hashes random blocks of RAM (i.e. with the disk I/O part removed), and see if any particular step is problematic. If you can get a profiler, use it as well to see if there are any slow spots in your code.
EDIT
More ideas. Perhaps your program is holding on to the GDI lock too much. This would explain everything else being slow without high CPU usage. Only one app at a time can have the GDI lock. Is this a GUI app, or just a simple console app?
You also mentioned RtlEnterCriticalSection. This is a costly operation, and it can hang the system quite easily, e.g. with mismatched Enters and Leaves. Are you multi-threading at all? Is the slowdown due to race conditions between threads?
XPerf is your guide here - watch the PDC Video about it, and then take a trace of the misbehaving app. It will tell you exactly what's happening throughout the system, it is extremely powerful.
I like the disk-caching/thrashing suggestions, but if that's not it, here are some scattershot suggestions:
What non-MSVC libraries, if any, are you linking to?
Can your program be modified (#ifdef'd) to run without a GUI? Does the problem occur?
You added ::Sleep(100) after each loop in each thread, right? How many threads are you talking about? A handful or hundreds? How long does each loop take, roughly? What happens if you make that ::Sleep(10000)?
Is your program perhaps doing something else that locks a limited resource? (ProcExp can show you what handles are being acquired... of course, you might have difficulty with ProcExp not responding :-[)
Are you sure CriticalSections are userland-only? I recall that was so back when I worked on Windows (or so I believed), but Microsoft could have modified that. I don't see any guarantee in the MSDN article Critical Section Objects (http://msdn.microsoft.com/en-us/library/ms682530%28VS.85%29.aspx) ... and this leads me to wonder: Anti-convoy locks in Windows Server 2003 SP1 and Windows Vista
Hmmm... presumably we're all multi-processor now, so are you setting the spin count on the CS?
How about running a debugging version of one of these OSes and monitoring the kernel debugging output (using DbgView)... possibly using the kernel debugger from the Platform SDK ... if MS still calls it that?
I wonder whether VMMap (another SysInternal/MS utility) might help with the Disk caching hypothesis.
It turns out that this is a bug in the Visual Studio compiler. Using a different compiler resolves the issue entirely.
In my case, I installed and used the Intel C++ Compiler, and even with all optimizations disabled I did not see the full-system hang that I was experiencing with the Visual Studio 2005 - 2010 compilers on this library.
I'm not certain as to what is causing the compiler to generate such broken code, but it looks like we'll be buying a copy of the Intel compiler.
It sounds like you're poking around fixing things without knowing what the problem is. Take stackshots. They will tell you what your program is doing when the problem occurs. It might not be easy to get the stackshots if the problem occurs on other machines where you cannot use an IDE or a stack sampler. One possibility is to kill the app and get a stack dump when it's acting up. You need to reproduce the problem in an environment where you can get a stack dump.
Added: You say it performs well on OSX and Linux, and poorly on Windows. I assume the ratio of completion time is some fairly large number, like 10 or 100, if you've even had the patience to wait for it. I said this in the comment, but it is a key point. The program is waiting for something, and you need to find out what. It could be any of the things people mentioned, but it is not random.
Every program, all the time while it runs, has a call stack consisting of a hierarchy of call instructions at specific addresses. If at a point in time it is calculating, the last instruction on the stack is a non-call instruction. If it is in I/O the stack may reach into a few levels of library calls that you can't see into. That's OK. Every call instruction on the stack is waiting. It is waiting for the work it requested to finish. If you look at the call stack, and look at where the call instructions are in your code, you will know what your program is waiting for.
Your program, since it is taking so long to complete, is spending nearly all of its time waiting for something to finish, and as I said, that's what you need to find out. Get a stack dump while it's being slow, and it will give you the answer. The chance that it will miss it is 1/the-slowness-ratio.
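On Linux with glibc (not the Windows setup in the question, where you would attach a debugger or use a sampling tool instead), a program can take its own stackshot. The sketch below uses backtrace()/backtrace_symbols_fd() from execinfo.h and a SIGQUIT handler, so pressing Ctrl+\ in the terminal prints the current call stack while the program keeps running. The helper names are hypothetical; link with -rdynamic to see function names, and note that glibc documents that backtrace() may allocate when first used (it loads libgcc), which is why the sketch calls it once before installing the handler.

```cpp
#include <cassert>
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Writes the calling thread's current call stack to `fd`, one frame per line.
// backtrace_symbols_fd() does not call malloc, unlike backtrace_symbols().
// Returns the number of frames captured.
int dump_stack(int fd) {
    void* frames[64];
    const int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

extern "C" void on_sigquit(int) { dump_stack(STDERR_FILENO); }

// After this, Ctrl+\ (SIGQUIT) prints a stackshot to stderr instead of
// terminating the process with a core dump.
void install_stackshot_on_sigquit() {
    void* warmup[1];
    backtrace(warmup, 1);   // load the unwinder now, not inside the handler
    struct sigaction sa = {};
    sa.sa_handler = on_sigquit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGQUIT, &sa, nullptr);
}
```

Taking a few of these while the program is being slow and looking for the call sites that keep showing up is the stackshot method described above.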
Sorry to be so elemental about this, but lots of people (and profiler makers) don't get it. They think they have to measure.