Does anyone know what effect memory usage has on Lambda response time? I understand that memory allocation is directly correlated with CPU allocation, but what about percentage memory utilisation? E.g. 100 MB allocated but 95 MB in use (for dependencies that should be in layers). Will that affect the execution time?
Memory utilisation at runtime does not change the number of virtual CPU cores allocated to your function.
As you already know, the number of cores depends on the amount of memory you configure. But that is a configuration-time setting and has nothing to do with what happens at runtime.
As commenter Anon Coward already mentioned, high memory utilisation can still have an impact on your Lambda's execution time, but it does not have to. It depends on what your code is actually doing.
The great thing is: you can measure all of this and you can therefore find out what the best memory size is for your Lambda function.
Even better, there are already projects that help you do exactly that:
https://github.com/alexcasalboni/aws-lambda-power-tuning
I wrote a program that manipulates files. In the idle state, the memory occupied by the process is ~60 MB. Periodically, say every 2 minutes, the process allocates memory (~40 MB), does some work with the files, then frees the allocated memory. The procedure takes around 10-20 seconds. As a result, the memory usage of my process follows a sawtooth pattern: ~60 MB when idle, rising to ~100 MB during each run and dropping back afterwards.
My question here is: should I reserve some memory in advance and then use it when I need it? That would make the memory-usage trend more stable, and the stability would be better for the system, am I right?
No, any 21st-century OS won't blink an eye at this. Things might get interesting if you try to allocate more than 4 GB per millisecond across 100 CPU cores, but you're not even close to that.
If you allocate the memory in advance (say an additional 50 MB), the graph would be a straight line, i.e. stable, but other programs competing for memory might not get enough and would suffer for it.
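For illustration, here is a minimal sketch of the two patterns being discussed (the ~40 MB size comes from the question; the function names and the std::vector scratch buffer are placeholder assumptions, not anything from the original post):

#include <cstddef>
#include <vector>

constexpr std::size_t kWorkBytes = 40u * 1024 * 1024;   // the ~40 MB working set

// Pattern A (what the question describes): allocate per run, free afterwards.
void runOnceAllocating() {
    std::vector<char> scratch(kWorkBytes);   // allocated for this run only
    // ... do the file work using scratch ...
}   // freed here, so the memory graph drops back to the idle level

// Pattern B (reserving in advance): one long-lived buffer reused on every run.
std::vector<char> g_scratch(kWorkBytes);     // allocated once, kept for the process lifetime

void runOnceReusing() {
    // ... do the file work using g_scratch ...
}   // nothing freed, so the memory graph stays flat at the higher level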
Let us say my system has 4 GB of RAM, and a certain program only consumes 100 MB of memory throughout its execution and also runs for a limited time, like only 30 seconds. Don't you think it's a good idea (to improve performance) not to deallocate memory at all during the execution of that program? Those 100 MB would be freed anyway when the program terminates.
The most important part of deallocating memory in a short-running program is facilitating memory reuse. Your processor has a limited cache, probably only a few MB, and if you continuously allocate new memory you can never take advantage of it.
If however you deallocate some memory, and then reallocate and reuse it, there's a good chance that on reuse the memory will already be "hot" and resident in cache. This means you won't incur the penalty of a cache miss and your program will run faster.
Of course you are trading the deallocation/reallocation cost against cache misses, but continuously allocating more and more memory is guaranteed to incur lowest-level cache misses. Each of those misses costs ~100 cycles on a modern processor; individually that is less than an allocation/deallocation, but you incur vastly more misses than allocations, so in total the misses dominate.
Edit
So I made a little test program to test this.
#include <stdio.h>
#include <stdlib.h>

/* Allocates a 1 MB buffer, touches every byte, then frees it. */
void process1(void) {
    char *ptr = malloc(1 << 20);
    int i = 0;
    while (i < (1 << 20)) ptr[i++] = rand();
    free(ptr);
}

/* Same work, but the buffer is never freed. */
void process2(void) {
    char *ptr = malloc(1 << 20);
    int i = 0;
    while (i < (1 << 20)) ptr[i++] = rand();
}

int main(void) {
    int i = 100;
    while (i--) process1();   /* 100 MB total, freed as we go */
    i = 100;
    while (i--) process2();   /* 100 MB total, never freed */
    return 0;
}
Both processes chew through 100 MB of data; the first one deallocates it as it goes, the second does not. I profiled this with valgrind's cachegrind tool. Here are the results:
fn=process1
0 943720200 2 2 419430900 100 0 209715800 1638400 32767
fn=process2
0 943719900 0 0 419430800 100 0 209715700 1638400 1605784
Okay, not that exciting; I'll translate. By avoiding the free you saved less than 0.1% of instruction cycles, and you incurred about a million and a half LL cache misses. If we use the standard cycle-estimation formula CEst = Ir + 10*L1m + 100*LLm, that's about a 16% degradation in performance from avoiding the free in the function alone.
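Spelling that out with the numbers above (assuming cachegrind's default event order Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw, so L1m = I1mr + D1mr + D1mw and LLm = ILmr + DLmr + DLmw):
process1: 943,720,200 + 10 × (2 + 100 + 1,638,400) + 100 × (2 + 0 + 32,767) ≈ 963 million estimated cycles
process2: 943,719,900 + 10 × (0 + 100 + 1,638,400) + 100 × (0 + 0 + 1,605,784) ≈ 1,121 million estimated cycles
which is roughly 16% more cycles for the version that never frees.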
I reran it under callgrind to get the full story (I'm not going to post the detailed results because they are far more complicated than the cachegrind ones), but when we include the full call to free the result is the same: about 16% more cycles are used when you don't call free.
Feel free to do this same test on your own program, but in this case the results are clear. The cost of managing dynamic memory is truly trivial compared to the cost of continuously refreshing the cache with new memory.
Also, I didn't mention this, but the direct comparison that should be made is the cost of the frees against the cost of the cache misses. The frees cost 38,930 cycles in total, the cache misses 157,332,000: you saved 39 thousand cycles and paid for it with over 150 million.
Don't you think it's a good idea not to deallocate memory at all during the execution of that program?
Cutting corners is not a "good idea". I can think of plenty of reasons to deallocate memory; I can't think of quite so many to leave unused memory allocated. If you don't want to deal with memory issues, don't write in a language that doesn't have automatic garbage collection- or, since you're writing in C++, use smart pointers.
only consumes 100 MB of memory
That's the most egregious misuse of the word "only" I have ever seen. 100 MB is a serious chunk of change. That's ~1/40th of the memory you're using, and if even 40 other programs running on the system were programmed with the same perspective, at best you'd have the system come to a grinding halt because of swap delays, and at worst you'd have a complete system crash. Usually, there are more than 40 applications running on any given (modern) machine while it's turned on.
also runs for a limited time, like only 30 seconds
Again, 30 seconds is a long time for a computer. But, to clarify, do you mean 30 seconds ever, or 30 seconds every day? Because if it's a one-off and you're low on time, maybe that's an acceptable loss, but it certainly isn't good practice.
TL;DR?
If you have more important, immediate considerations (e.g. a product deadline is approaching) and you've got no other reasonable choice, do it. Otherwise, you've got no excuse. Hogging memory is never something you should feel good about; it can seriously slow down processes that really do need that memory. Yes, the OS cleans up resources after the program ends. No, that does not give you carte blanche to write programs that abuse their resources.
No, it is not good programming style. Virtually all modern OSes will clean up the memory afterwards, but it will remain a leak for as long as the program runs.
If you want to avoid the cleanup overhead you can implement an allocator that permits that while still maintaining good practices. For example, allocate a memory pool on program start-up, replace the global operator new with a version that takes memory from the pool, and replace the global operator delete with a no-op. Then free the memory pool before program exit. You could also use more fine-grained pools to cut down on the deallocation overhead but still allow the program to run without continuously growing memory usage.
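A minimal sketch of that idea, assuming a fixed 64 MB pool and ignoring thread safety and static-initialisation order (this shows the shape of the technique, not a production allocator):

#include <cstddef>
#include <cstdlib>
#include <new>

namespace {
    constexpr std::size_t kPoolSize = 64u * 1024 * 1024;               // assumed pool size
    char*       g_pool   = static_cast<char*>(std::malloc(kPoolSize)); // grabbed at start-up
    std::size_t g_offset = 0;                                          // bump-pointer position
}

void* operator new(std::size_t size) {
    // Round the request up so successive allocations stay suitably aligned.
    const std::size_t align = alignof(std::max_align_t);
    size = (size + align - 1) & ~(align - 1);
    if (g_offset + size > kPoolSize) throw std::bad_alloc();
    void* p = g_pool + g_offset;
    g_offset += size;
    return p;
}

void operator delete(void* /*ptr*/) noexcept {
    // Deliberately a no-op: individual objects are never returned; the whole
    // pool is released in one go (std::free(g_pool)) right before program exit.
}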
Another alternative is to simply use a faster implementation of new/delete. The built-in versions are traditionally slow, but they don't have to be.
I want to test the memory-management capabilities of a program (say its name is director), for example:
What happens if some other processes take up too much memory and there is too little memory left for director to run? How does director behave?
What happens if too many of the CPU cycles are used by some other program while director is running?
What happens if the memory used by the other programs is freed after some time? How does director reclaim the memory and start working at full capacity again?
etc.
I'll be doing these experiments on a Unix machine. One way is to limit the amount of memory available to the process using ulimit, but there is no good way to have control over the CPU cycle utilization.
I have another idea. What if I write a program in C or C++ that acts as a dynamic memory and CPU filler, i.e. does nothing useful but eats up memory and/or CPU cycles anyway?
I need some ideas on how such a program should be structured. I need dynamic (runtime) control over how much memory and CPU it uses.
I think that creating a lot of threads would be a good way to clog up the CPU cycles. Is that right?
Is there a better approach that I can use?
Any ideas/suggestions/comments are welcome.
http://weather.ou.edu/~apw/projects/stress/
Stress is a deliberately simple workload generator for POSIX systems. It imposes a configurable amount of CPU, memory, I/O, and disk stress on the system. It is written in C, and is free software licensed under the GPLv2.
The functionality you seek overlaps the feature set of "test tools". So also check out http://ltp.sourceforge.net/tooltable.php.
If you have a single core this is enough to put stress on a CPU:
volatile unsigned long long x = 0;   // volatile so the compiler can't optimise the loop away
while (true) {
    x++;
}
If you have lots of cores then you need a thread per core.
You make it variably hungry by adding a few sleeps.
As for memory, just allocate lots.
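A hedged sketch of such a filler in C++ (the core count is taken from the machine; the 100 ms sleep and the 256 MB figure are arbitrary knobs you would tune, not values from the answer above):

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

int main() {
    // One spinning thread per hardware core; the sleep is the knob that makes
    // the burner variably hungry (longer sleep = less CPU pressure).
    unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> burners;
    for (unsigned c = 0; c < cores; ++c) {
        burners.emplace_back([] {
            volatile unsigned long long x = 0;
            for (;;) {
                for (int i = 0; i < 10000000; ++i) x = x + 1;                 // busy phase
                std::this_thread::sleep_for(std::chrono::milliseconds(100));  // idle phase
            }
        });
    }

    // Memory filler: allocate a chunk and touch every page so it is actually
    // committed rather than just reserved. 256 MB is an arbitrary example size.
    const std::size_t hogBytes = 256u * 1024 * 1024;
    std::vector<char> hog(hogBytes);
    for (std::size_t i = 0; i < hogBytes; i += 4096) hog[i] = 1;

    for (auto& t : burners) t.join();   // never returns; kill the process to stop it
}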
There are several problems with such a design:
In a virtual memory system, memory size is effectively unlimited. (Well, it's limited by your hard disk...) In practice, systems usually run out of address space well before they run out of backing store -- and address space is a per-process resource.
Any reasonable (non realtime) operating system is going to limit how much CPU time and memory your process can use relative to other processes.
It's already been done.
More importantly, I don't see why you would ever want to do this.
For dynamic memory control, you could just allocate or free buffers of a certain size in order to use more or less memory. As for CPU utilisation, you will have to use an OS function to check it, poll it periodically, and decide whether to do more busy work.
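For the CPU-checking part, a rough Linux-only sketch (this assumes the aggregate "cpu" line of /proc/stat; the sampling loop and error handling are left out):

#include <fstream>
#include <string>

// One snapshot of the system-wide CPU counters from /proc/stat.
struct CpuSample { unsigned long long busy = 0, total = 0; };

CpuSample readCpuSample() {
    std::ifstream f("/proc/stat");
    std::string cpu;
    unsigned long long user, nice, system, idle, iowait, irq, softirq;
    f >> cpu >> user >> nice >> system >> idle >> iowait >> irq >> softirq;
    CpuSample s;
    s.busy  = user + nice + system + irq + softirq;
    s.total = s.busy + idle + iowait;
    return s;
}

// Utilisation between two samples: delta(busy) / delta(total).
double cpuUtilisation(const CpuSample& a, const CpuSample& b) {
    return double(b.busy - a.busy) / double(b.total - a.total);
}

Sample twice with a short delay, compare, and then spin or sleep (and allocate or free) depending on whether you are above or below your target.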
I have a C++ program that has a pretty terrible memory leak, about 4 MB/second. I know where it's coming from and can fix it, but that's not my main problem. My program is taking up a very large amount of CPU and it isn't running as fast as I want it to. I have two different threads in the program. One by itself takes ~50% CPU, which is fine, and the other by itself takes ~15% CPU, which is fine. Together, however, CPU usage is 100% and the program cannot run as fast as it needs to.
Can a memory leak by itself cause a problem like this? I know the program will eventually crash due to the leaked memory, but does a memory leak immediately lead to a slower program? By immediately I mean the program is too slow at the very start, not just when the memory footprint is huge.
Thanks!
Regardless of whether or not your memory leak is causing the problem it needs to be fixed. Once you fix the memory leak see if you're still having the problem. You should be able to answer your own question at that point.
Allocations in general can be slow and it sounds like you're doing a lot of them here. After fixing your leak, you will want to consider a pooled memory implementation to avoid thrashing your heap so much, especially in a multi-threaded environment.
It should not be a problem while you still have available memory. However, memory allocation is a relatively slow process, so depending on how often you do it, it might account for your problems.
Don't forget that a profiler is your friend for all and every optimization needs. It knows much better than you what your main bottleneck is.
Oh, and fix your memory leak now, while it's still easy.
Well, allocation is slow and does require at least some CPU effort to search for appropriate blocks to give. Besides that I wouldn't think so. It sounds like your program is in quite a mess so I wouldn't doubt that there's some larger issue at heart.
Have a look at your page faults. In Windows, use Task Manager, and in Unix/Linux try ps. It's likely that the extra processor time is being used to allocate additional memory to your process, and once the free physical memory has been exhausted, the OS has to swap out unused pages to the disk.
Another approach would be to use a profiling tool to see where the bottlenecks are in your code.
There are several angles here:
1) You have memory leaks. This is bound to cause cache misses and, eventually, page faults, so data has to be sourced from RAM/disk. More I/O, more problems. Are you using a memory manager? If not, consider using one. Look into dlmalloc for a start.
2) The 100% CPU usage may not be due to this problem. Obviously the way to go is to first fix the leak and then use a profiler [Quantify is best, but costs money. VTune is not bad, although I don't like the interface. Otherwise gprof is not bad and it's free.] and see which parts of your code are taking up CPU cycles.
3) I see that you are using threads. Syncing threads up is non-trivial. See if you are unnecessarily waiting for mutexes or something similar.
Disposing of previously allocated memory fragments should be somewhat faster than allocating them, and (as long as "memory leak" means "memory lost due to a missing deallocation call") it shouldn't really affect your speed, only overall memory usage.
Although if you allocate huge amounts of memory every second and don't do proper deallocations, this could be the problem. I had the same issue when a wrongly compiled gtkmm + librsvg leaked ~5 megabytes per second on screen redraw, and that, of course, had a notable performance impact.
Of course, this doesn't mean you shouldn't eliminate your memory leaks if you know that they exist. Memory leaks could cause something more serious than performance troubles.
I'm writing a performance-critical application where it's essential to store as much data as possible in physical memory before dumping to disk.
I can use ::GlobalMemoryStatusEx(...) and ::GetProcessMemoryInfo(...) to find out what percentage of physical memory is reserved/free and how much memory my current process is using.
Using this data I can make sure to dump when ~90% of the physical memory is in use, or when ~90% of the 2 GB per-application limit is hit.
However, I would like a method for simply receiving how many bytes are actually left before the system will start using virtual memory, especially as the application will be compiled for both 32-bit and 64-bit, where the 2 GB limit doesn't exist.
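For reference, a minimal sketch of the two checks the question already describes, using GlobalMemoryStatusEx and GetProcessMemoryInfo (the 90% thresholds and the hard-coded 2 GB figure come from the question itself; error handling is omitted):

#include <windows.h>
#include <psapi.h>   // link with psapi.lib

bool shouldDumpToDisk() {
    // System-wide view: how much physical RAM is still available.
    MEMORYSTATUSEX mem{};
    mem.dwLength = sizeof(mem);
    GlobalMemoryStatusEx(&mem);

    // Per-process view: how much this process currently has resident.
    PROCESS_MEMORY_COUNTERS pmc{};
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));

    // Dump when ~90% of physical RAM is in use system-wide...
    bool systemNearlyFull = mem.ullAvailPhys < mem.ullTotalPhys / 10;

    // ...or (32-bit build) when the working set nears ~90% of the 2 GB limit.
    bool processNearLimit =
        pmc.WorkingSetSize > (SIZE_T)(0.9 * 2.0 * 1024 * 1024 * 1024);

    return systemNearlyFull || processNearLimit;
}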
How about this function:
int
bytesLeftUntilVMUsed() {
return 0;
}
it should give the correct result in nearly all cases I think ;)
Imagine running Windows 7 in 256 MB of RAM (MS suggests 1 GB minimum). That's effectively what you're asking the user to do by wanting to reserve 90% of available RAM.
The real question is: Why do you need so much RAM? What is the 'performance critical' criteria exactly?
Usually, this kind of question implies there's something horribly wrong with your design.
Update:
Using top-of-the-range RAM (DDR3) would give you a theoretical transfer speed of 12 GB/s, which equates to reading one 32-bit value every clock cycle (12 GB/s ÷ 4 bytes ≈ 3 billion reads per second, roughly one per cycle on a ~3 GHz core) with some bandwidth to spare. I'm fairly sure that it is not possible to do anything useful with data coming into the CPU at that speed: instruction-processing stalls would interrupt the flow. The extra, unused bandwidth can be used to page data to/from a hard disk. Using RAID this transfer rate can be quite high (about 1/16th of the RAM bandwidth). So it would be feasible to transfer data to/from the disk and process it without any degradation of performance: 16 cycles between reads is all it would take (OK, my maths might be a bit wrong here).
But if you throw Windows into the mix, it all goes to pot. Your memory can go away at any moment, your application can be paused arbitrarily, and so on. Locking memory to RAM would have adverse effects on the whole system, thus defeating the purpose of locking the memory.
If you explain what you're trying to achieve and the performance criteria, there are many people here who will help develop a suitable solution, because if you have to ask about system limits, you really are doing something wrong.
Even if you're able to stop your application from having memory paged out to disk, you'll still run into the problem that the VMM might be paging other programs out to disk, and that might potentially affect your performance as well. Not to mention that another application might start up and consume memory that you're currently occupying, resulting in some of your application's memory being paged out. How are you planning to deal with that?
There is a way to use non-pageable memory via the non-paged pool but (a) this pool is comparatively small and (b) it's used by device drivers and might only be usable from inside the kernel. It's also not really recommended to use large chunks of it unless you want to make sure your system isn't that stable.
You might want to revisit the design of your application and try to work around the possibility of having memory paged to disk before you either try to write your own VMM or turn a Windows machine into essentially a DOS box with more memory.
The standard solution is to not worry about "virtual" and worry about "dynamic".
The "virtual" part of virtual memory has to be looked at as a hardware function that you can only defeat by writing your own OS.
The dynamic allocation of objects, however, is simply your application program's design.
Statically allocate simple arrays of the objects you'll need. Use those arrays of objects. Increase and decrease the size of those statically allocated arrays until you have performance problems.
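As a minimal sketch of what that looks like in practice (the Record type and the 10,000-slot capacity are made-up placeholders to be tuned empirically):

#include <cstddef>

struct Record { double payload[8]; };      // hypothetical object type

// Fixed, statically allocated storage instead of new/delete per object.
constexpr std::size_t kCapacity = 10000;   // grow or shrink this until it hurts
static Record      g_records[kCapacity];
static std::size_t g_count = 0;

Record* acquireRecord() {
    // Hand out the next unused slot: no heap traffic, and the memory footprint
    // is fixed up front rather than growing dynamically.
    return (g_count < kCapacity) ? &g_records[g_count++] : nullptr;
}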
Ouch. Non-paged pool (the amount of RAM which cannot be swapped or allocated to processes) is typically 256 MB. That's 12.5% of RAM on a 2 GB machine. If another 90% of physical RAM were allocated to one process, that would leave -2.5% for all other applications, services, the kernel and drivers. Even if you allocated only 85% for your app, that would still leave only 2.5% = 51 MB.