C++ read/write - RamDisk vs RAM [closed] - c++

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I'm using Ubuntu 32-bit.
- My app needs to store incoming data in RAM (because I need to do a lot of searches on the incoming data and calculate things).
- I need to keep the data for X seconds, so I need to allocate 12 GB of memory (client requirement).
- I don't want to move to Ubuntu 64-bit.
- So I am using a RAM disk to hold the incoming data and search on it (that way I can use 12 GB of RAM on a 32-bit system).
When I test the app with 2 GB of allocated memory (instead of 12 GB), I see that CPU usage is lower with plain RAM than with the RAM disk when I just write data into my DB (15% vs. 17% CPU usage).
But when I test the queries (which read a lot of data, or files if I'm working with the RAM disk), I see a huge difference (20% vs. 80% CPU usage).
I don't understand why the difference is so big. Both RAM and a RAM disk live in RAM, don't they? Is there anything I can do to get better performance?

There are two reasons I can think of why a RAM disk is slower.
With a RAM disk we may use RAM as the storage medium, but we still have the overhead of a filesystem. That means system calls to access the data, plus further layers of indirection and copying. Directly accessing memory is just that: a load or a store.
Memory access also tends to be fast because we can often find what we are looking for in the processor cache, which saves us from reading from slower RAM. A RAM disk will probably not benefit from the processor cache to the same extent, if for no other reason than that every access requires a system call.

Related

CPU utilization degradation over time [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a multi-threaded process. Each thread is CPU-bound (performs calculations) and also uses a lot of memory. The process starts at 100% CPU utilization according to Resource Monitor, but after several hours CPU utilization starts to degrade slowly. After 24 hours it's at 90-95% and falling.
The question is - what should I look for, and what best-known-methods can I use to debug this?
Additional info:
I have enough RAM - most of it is unused at any given moment.
According to perfmon - memory doesn't grow (so I don't think it's leaking).
The code is a mix of .NET and native C++, with some data marshaling back and forth.
I saw this on several different machines (servers with 24 logical cores).
One thing I saw in perfmon - Modified Page List Bytes indicator increases over time as CPU utilization degrades.
Edit 1
One of the third-party libraries used is OpenFst. It looks like the problem is closely related to a misuse of that library.
Specifically, I noticed that I get the following warning:
warning LNK4087: CONSTANT keyword is obsolete; use DATA
Edit 2
Since the question is closed and wasn't reopened, I will write up my findings and how the issue was solved in the body of the question (sorry) for future readers.
It turns out there is an openfst.def file that declares all the OpenFst FLAGS_* symbols to be exported for consuming applications/DLLs. I had to fix those to use the keyword DATA instead of CONSTANT (CONSTANT is obsolete because it's risky; more info: https://msdn.microsoft.com/en-us/library/aa271769(v=vs.60).aspx).
After that, no further degradation in CPU utilization was observed, and no further rise in the "Modified Page List Bytes" indicator. I suspect it was related to the default values of the FLAGS (specifically the garbage-collection flag FLAGS_fst_default_cache_gc), which were non-deterministic because of the misuse of the CONSTANT keyword in the openfst.def file.
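For illustration, a corrected module-definition file would look roughly like this (a hypothetical excerpt; the real openfst.def lists many more FLAGS_* symbols):

```
; hypothetical excerpt of openfst.def
LIBRARY openfst
EXPORTS
    FLAGS_fst_default_cache_gc        DATA   ; was: CONSTANT (obsolete, risky)
    FLAGS_fst_default_cache_gc_limit  DATA
```

With CONSTANT, consumers can end up reading the exported symbol's address rather than its value, which is exactly the kind of silent misbehaviour warning LNK4087 is about.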
Conclusion: understand your warnings! Eliminate as many of them as you can!
Thanks.
For a non-obvious issue like this, you should also use a profiler that actually samples the underlying hardware counters in the CPU. Most profilers that I'm familiar with use kernel-supplied statistics rather than the underlying hardware counters. This is especially true on Windows. (The reason is partly legacy, and partly that Windows wants its kernel statistics to be independent of hardware. The PAPI APIs attempt to address this but are still relatively new.)
One of the best profilers is Intel’s VTune. Yes, I work for Intel but the internal HPC people use VTune as well. Unfortunately, it costs. If you’re a student, there are discounts. If not, there is a trial period.
You can find a lot of optimization and performance issue diagnosis information at software.intel.com. Here are pointers for optimization and for profiling. Even if you are not using an x86 architecture, the techniques are still valid.
As to what might be the issue, a degradation that slow is strange.
How often do you touch new memory versus old? At what rate? If the rate is very low, you might still be slowly exhausting a resource, e.g. pages.
What are your memory access patterns? Does it change over time? How rapidly? Perhaps your memory access patterns over time are spreading, resulting in more cache misses.
Perhaps your partitioning of the problem space is such that you have entered a new computational domain and there is no real pathology.
Look at whether there are periodic maintenance activities that take place over a longer interval, although these would cause a periodic degradation, say every 24 hours. This doesn't sound like your situation, since what you are experiencing is a gradual degradation.
If you are using an x86 architecture, consider submitting a question in an Intel forum (e.g. "Intel® Clusters and HPC Technology" and "Software Tuning, Performance Optimization & Platform Monitoring").
Let us know what you ultimately find out.

c++ high memory usage application [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I am running a memory-intensive C++ application and it is being killed by the kernel for excessively high memory usage. I would have thought that the OS would automatically use swap when RAM gets full; however, I don't think my swap space is being utilised.
I have read the following two questions, but I can't relate it to my problem.
"How to avoid running out of memory in high memory usage application? C / C++"
Who "Killed" my process and why?
I will be grateful if someone can give me some hints/pointers to how I may solve this problem. Thanks.
Edit: I am running my application on a 64-bit Linux machine. My RAM and swap are 6 GB and 12 GB respectively.
I suspect your process is asking for more memory than is available. In situations where you know you're going to use the memory you ask for, you need to disable memory overcommit:
echo 2 > /proc/sys/vm/overcommit_memory
and/or put
vm.overcommit_memory=2
in /etc/sysctl.conf so the setting survives reboots.
If your process asks for 32 GB of RAM on a machine with 16 GB of RAM + swap, your malloc() (or new...) calls might very well succeed, but once you try to use that memory your process is going to get killed.
Perhaps you have (virtual) memory fragmentation and are trying to allocate a large block of memory which the OS cannot find as a contiguous block?
For instance, an array would require this, but if you create a large linked list on the heap you should be able to allocate non-contiguous memory.
How much memory are you trying to allocate, and how? Do you have a sufficient amount of free resources? If you debug your application, what happens at the moment the process is killed?

Low RAM usage + frequent allocation/deallocation causes Linux to swap out other programs [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
The program I am working on at the moment processes a large amount of data (>32 GB). Due to "pipelining", however, a maximum of around 600 MB is present in main memory at any given time (I checked; that works as planned).
If the program has finished, however, and I switch back to the workspace with Firefox open, for example (but also other programs), it takes a while until I can use it again (the HDD is also highly active for a while). This makes me wonder whether Linux (the operating system I use) swaps out other programs while my program is running, and why.
I have 4 GB of RAM installed on my machine, and while my program is active it never goes above 2 GB of utilization.
My program only allocates/deallocates dynamic memory of two different sizes: 32 and 64 MB chunks. It is written in C++, and I use new and delete. Should Linux not be smart enough to reuse these blocks once I have freed them and leave my other memory untouched?
Why does Linux kick my stuff out of memory?
Is this some other effect I have not considered?
Can I work around this problem without writing a custom memory management system?
The most likely culprit is file caching. The good news is that you can bypass file caching. Without the cache in the way, your software will run more quickly, but only if you don't need to reload the same data later.
You can do this directly with the Linux APIs, but I suggest you use a library such as Boost.Asio. If your software is I/O-bound, you should additionally make use of asynchronous I/O to improve performance.
All the recently used pages are causing older pages to get squeezed out of the disk cache. As a result, when some other program runs, it has to be paged back in.
What you want to do is use posix_fadvise (or posix_madvise if you're memory mapping the file) to eject pages you've forced the OS to cache so that your program doesn't have a huge cache footprint. This will let older pages from other programs remain in cache.

How to measure memory bandwidth currently being used on Linux?

I'm writing a small Linux application which logs the computer's power consumption along with CPU utilisation and disk utilisation. I'd like to add the ability to log the memory bandwidth currently being used, so I can see how well that correlates with power consumption.
I understand that I can get information about the amount of memory currently allocated from /proc/meminfo but, of course, that doesn't tell me how much bandwidth is being used at present. Does anyone know how I could measure the memory bandwidth currently in use?
edit
I'd like this to work primarily on the x86 and x86-64 platforms
It's highly CPU-dependent, but you'll need access to the CPU's performance registers. You may be able to do this via OProfile. Note, however, that not all CPUs have a performance register (or combination of registers) that can be used to calculate memory bandwidth usage.

Staying away from virtual memory in Windows\C++

I'm writing a performance-critical application where it's essential to store as much data as possible in physical memory before dumping to disk.
I can use ::GlobalMemoryStatusEx(...) and ::GetProcessMemoryInfo(...) to find out what percentage of physical memory is reserved/free and how much memory my current process handles.
Using this data I can make sure to dump when ~90% of the physical memory is in use, or ~90% of the maximum 2 GB per-application limit is hit.
However, I would like a way of simply obtaining how many bytes are actually left before the system will start using virtual memory, especially as the application will be compiled for both 32-bit and 64-bit, where the 2 GB limit doesn't exist.
How about this function:
int bytesLeftUntilVMUsed() {
    return 0;
}
it should give the correct result in nearly all cases I think ;)
Imagine running Windows 7 in 256 MB of RAM (MS suggests 1 GB minimum). That's effectively what you're asking the user to do by wanting to reserve 90% of available RAM.
The real question is: Why do you need so much RAM? What is the 'performance critical' criteria exactly?
Usually, this kind of question implies there's something horribly wrong with your design.
Update:
Using top-of-the-range RAM (DDR3) would give you a theoretical transfer speed of 12 GB/s, which equates to reading one 32-bit value every clock cycle with some bandwidth to spare. I'm fairly sure it is not possible to do anything useful with data coming into the CPU at that speed; instruction-processing stalls would interrupt the flow. The extra, unused bandwidth can be used to page data to/from a hard disk. Using RAID, this transfer rate can be quite high (about 1/16th of the RAM bandwidth). So it would be feasible to transfer data to/from the disk and process it without any degradation of performance: 16 cycles between reads is all it would take (OK, my maths might be a bit off here).
But if you throw Windows into the mix, it all goes to pot. Your memory can go away at any moment, your application can be paused arbitrarily, and so on. Locking memory into RAM would have adverse effects on the whole system, thus defeating the purpose of locking the memory.
If you explain what you're trying to achieve and the performance criteria, there are many people here who will help develop a suitable solution, because if you have to ask about system limits, you really are doing something wrong.
Even if you're able to stop your application from having memory paged out to disk, you'll still run into the problem that the VMM might be paging other programs out to disk, and that might affect your performance as well. Not to mention that another application might start up and consume memory that you're currently occupying, resulting in some of your application's memory being paged out. How are you planning to deal with that?
There is a way to use non-pageable memory via the non-paged pool, but (a) this pool is comparatively small and (b) it's used by device drivers and might only be usable from inside the kernel. It's also not recommended to use large chunks of it unless you want to destabilize your system.
You might want to revisit the design of your application and try to work around the possibility of having memory paged to disk before you either try to write your own VMM or turn a Windows machine into essentially a DOS box with more memory.
The standard solution is to not worry about "virtual" and worry about "dynamic".
The "virtual" part of virtual memory has to be looked at as a hardware function that you can only defeat by writing your own OS.
The dynamic allocation of objects, however, is simply your application program's design.
Statically allocate simple arrays of the objects you'll need. Use those arrays of objects. Increase and decrease the size of those statically allocated arrays until you have performance problems.
Ouch. The non-paged pool (the amount of RAM which cannot be swapped out or allocated to processes) is typically 256 MB. That's 12.5% of RAM on a 2 GB machine. If another 90% of physical RAM were allocated to a process, that would leave -2.5% for all other applications, services, the kernel, and drivers. Even if you allocated only 85% for your app, that would still leave only 2.5% ≈ 51 MB.