ramdisk with a file mirror - c++

I wanted to speed up compilation, so I was thinking I could have my files built on a ramdisk, but also have it flushed to the filesystem automatically and fall back to the filesystem if there is not enough RAM.
I may need something similar for an app I am writing, where I would like files to be cached in RAM and flushed to the FS. What are my options? Is there something like this that already exists (perhaps FUSE)? The app is a toy app (for now), and I need to compile C++ code repeatedly. As we know, when there is a specific problem to solve before progressing, the longer the compile takes, the less we get done.

RAM disks went the way of the dodo once the file system cache arrived. The cache can make much better decisions than a static RAM disk, since it is aware of RAM usage by other programs and of the position of the disk write head. And you get lazy write-back for free.

Compilation is CPU-bound, not disk-bound. If you use all your CPU cores with the appropriate build flag, you can easily saturate them on a typical PC. Unless you have some sort of supercomputer, I don't think a ramdisk will speed things up much.
For VS2008 this flag is /MP. It also exists on VS2005.

Not sure it helps to answer a question from almost 12 years ago, but just now I was looking for software to sync file systems or directories to do exactly that.
Both answers are correct, in the sense that the file system cache will attempt to predict what you need and make it available in RAM, and that, generally, building software benefits a lot from multithreading, possibly maxing out your CPU. But right now I'm building with Unity and I don't have control over build optimization.
That said, a virtual RAM disk has advantages, because those files are always in RAM, unlike the file system cache (FSC), which has to deal with resources and applications competing for disk access.
Another difference is that when an application closes a file handle or forces a sync, the FSC will try to get those files to the disk as soon as possible, in order to avoid problems (power failure and so on). I believe you can tune the FSC's behavior on Linux.
A sync to a RAM disk does not write to the physical disk; that may account for the performance difference you mentioned in your comment.
That said, I still need to find something to auto-sync my two file systems!

Related

Are there cross-platform ways to promote prefetching for reading a large Boost memory mapped_file?

I have a C++ application for Windows and Linux (e.g. Boost 1.53.2 on Amazon Linux) that is using Boost::iostream::mapped_file (i.e. a memory mapped file). The documentation doesn't mention "prefetch".
The application needs to sequentially read through large read-only files rapidly. Sometimes these files will be larger than available memory. So loading the whole file into memory at once may not be an option. But in all cases, processing will proceed from the beginning to the end sequentially.
It would be helpful if prefetching of upcoming pages happens in a way that keeps ahead of the processing of the pages (i.e. upcoming pages in memory before they are needed), yet not so far ahead that not-yet-processed pages are dropped from memory to make room.
I'm wondering if there are helpful cross-platform ways (Windows and Linux) to give hints or direction or otherwise promote the automated prefetching of the pages that will be needed in the not distant future. I expect the OS might do this to some degree automatically, but I am wondering if there is a convenient technique I should be using to improve on whatever is the default behavior.
Thanks in advance!
Not sure how portable, but I included fadvise and madvise in this answer:
boost file_mapping performance
There seem to be some good pointers for the non-POSIX Windows side here: What is fadvise/madvise equivalent on windows?
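To make the fadvise/madvise suggestion concrete, here is a minimal POSIX-only sketch using a raw mmap rather than Boost's mapped_file (the hints apply to the underlying descriptor and mapping either way; the Windows side would go through the mechanisms in the link above):

```cpp
// Hint the kernel that a file/mapping will be read sequentially, so it can
// read ahead of the consumer and drop already-processed pages behind it.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); close(fd); return 1; }
    const size_t len = static_cast<size_t>(st.st_size);

    // Advise sequential access on the descriptor: stronger read-ahead.
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { std::perror("mmap"); close(fd); return 1; }

    // Same hint on the mapping itself.
    madvise(p, len, MADV_SEQUENTIAL);

    // ... walk the mapping from start to end here ...

    munmap(p, len);
    close(fd);
    return 0;
}
```

With Boost's mapped_file you can apply the madvise() call to the region returned by data()/size() in the same way.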

does building from a RAM drive truly yield speed increase?

I'm working on a project that has thousands of .cpp files plus thousands more .h and .hpp and the build takes 28min running from an SSD.
We inherited this project from a different company just weeks ago, but perusing the makefiles we found that they explicitly disabled parallel builds via the .NOPARALLEL phony target; we're trying to find out if they had a good reason.
Worst case, I figured, the only way to speed this up would be to use a RAM drive.
So I followed the instructions from Tekrevue and installed Imdisk and then ran benchmarks using CrystalDiskMark:
(CrystalDiskMark screenshots: SSD vs. RAM drive results)
I also ran dd using Cygwin and there's a significant speedup (at least 3x) on the RAM drive compared to my SSD.
However, my build time didn't change by even one minute!
So then I thought: maybe my proprietary compiler calls some Windows API and causes a huge slowdown, so I built fftw from source on Cygwin.
What I expected was that my processor usage would climb to some maximum and stay there for the duration of the build. Instead, my usage was very spiky: one spike for each file compiled. I understand that even Cygwin still has to interact with Windows, so the fact that I still got spiky processor usage makes me assume it's not my compiler that's the issue.
OK, new theory: invoking the compiler for each source file has some huge overhead on Windows. So I copy-pasted from my build log, passed 45 files to my compiler in a single invocation, and compared that to invoking the compiler 45 times separately. Invoking it ONCE was faster, but only by 4 secs total for the 45 files.
And I saw the same "spiky" processor usage as when invoking the compiler once for each file.
Why can't I get the compiler to run faster even when running from a RAM drive? What's the overhead?
UPDATE #1
Commenters have been saying, I think, that the RAM drive thing is kind of unnecessary because Windows will cache the input and output files in RAM anyway.
Plus, maybe the RAM drive implementation (i.e. the drivers) is sub-optimal.
So, I'm not using the RAM drive anymore.
Also, people have said that I should run the 45-file build multiple times so as to remove caching warm-up effects: I ran it 4 times and each time it took 52 secs.
(Screenshots: CPU usage, taken 5 secs before compilation ended, and virtual memory usage)
When the compiler spits out stuff to disk, it's actually cached in RAM first, right?
Well, then this screenshot indicates that I/O is not an issue, or rather, that it's as fast as my RAM.
Question:
So since everything is in RAM, why isn't the CPU % higher more of the time?
Is there anything I can do to make a single-threaded/single-job build go faster?
(Remember this is single-threaded build for now)
UPDATE 2
It was suggested below that I should set the affinity of my compile-45-files invocation to 1, so that Windows doesn't bounce the invocation around between cores.
The result:
100% single-core usage! for the same 52secs
So it wasn't the hard drive, the RAM or the cache but the CPU that's the bottleneck.
**THANK YOU ALL** for your help!
========================================================================
My machine: Intel i7-4710MQ @ 2.5 GHz, 16 GB RAM
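For reference, the affinity pinning from UPDATE 2 can also be done programmatically rather than from the command line; a minimal Win32 sketch (how the suggestion was actually applied isn't stated above, so treat this as one possible way):

```cpp
// Pin the current process to logical processor 0 before launching a
// single-threaded build step; child processes inherit the affinity mask.
#include <windows.h>
#include <cstdio>

int main() {
    // Bit 0 set => only logical CPU 0 may run this process.
    if (!SetProcessAffinityMask(GetCurrentProcess(), 1)) {
        std::fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    // Anything started from here (e.g. via system("...")) inherits the mask,
    // so the compiler invocation stays on one core.
    return 0;
}
```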
I don't see why you are blaming the operating system so much. Besides sequential, dumb I/O (loading sources / saving intermediate output, which should be ruled out by seeing that an SSD and a ramdisk perform the same) and process startup (ruled out by compiling a single giant file), there's very little interaction between the compiler and the operating system.
Now, once you have ruled out "disk" and processor, I expect the bottleneck to be memory speed - not for the RAM-disk I/O part (which was probably already mostly saturated by the SSD), but for the compilation process itself.
That's actually quite a common problem: these days processors are usually faster than memory, which is often the bottleneck (that's the reason why it's currently critical to write cache-friendly code). The processor is probably wasting a significant amount of time waiting for out-of-cache data to be fetched from main memory.
Anyway, this is all speculation. If you want a reliable answer, as usual, you have to profile. Grab a sampling profiler from a list like this and go see where the compiler is spending its time. Personally, I expect to see a healthy dose of cache misses (or even page faults if you burned too much RAM for the ramdisk), but it could be anything.
Reading your source code from the drive is a very, very small part of the overhead of compiling software. Your CPU speed is far more relevant, as parsing and generating binaries are the slowest parts of the process.
Update:
Your graphs show a very busy CPU; I am not sure what you expect to see. Unless the build is multithreaded AND your kernel stops scheduling other, less intensive threads, this is certainly the graph of a busy processor.
Your trace is showing 23% CPU usage. Your CPU has 4 actual cores (with hyperthreading to make it look like 8). So, you're using exactly one core to its absolute maximum (plus or minus 2%, which is probably better accuracy than you can really expect).
The obvious conclusion from this would be that your build process is CPU bound, so improving your disk speed is unlikely to make much difference.
If you want substantially faster builds, you need to either figure out what's wrong with your current makefiles or else write entirely new ones without the problems, so you can support both partial and parallel builds.
That can gain you a lot. Essentially anything else you do (speeding up disks, overclocking the CPU, etc.) is going to give minor gains at best (maybe 20% if you're really lucky), whereas a proper build environment will probably give at least a 20:1 improvement for most typical builds.

Why is the first compilation of the day slower than next ones?

I've been using GCC/G++ for almost all my C and C++ projects for years, and there is something I never understood: the first use of one of them after starting the machine always takes one or two seconds longer than every subsequent compilation, until you shut down your computer.
As it happens only once a day, it is not really troublesome, but I just wonder why. Does it initialize something and then reuse it?
If yes, does it save something somewhere?
If yes, why does it have to reinitialize it every day?
Notably because of the page cache. If you have enough RAM, most of the data (including your source code files, and also the object files) tends to stay there.
If that bothers you, use an SSD and/or, while drinking your first coffee before your first compilation, run a command that reads all your source code, e.g. on Linux wc *.cc (you could even make it a crontab job with @reboot); read also http://linuxatemyram.com/
You might also use the suspend-to-disk facility instead of shutting down, but I don't think it is worth the pain (and I like /tmp/ being cleaned at boot time).
(I am guessing, or hoping, that you are using Linux with a native filesystem, e.g. Ext4 or BTRFS.)
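A portable take on the same trick, for anyone not on Linux or without wc handy: a small C++17 sketch that simply reads every file under a directory once to pull it into the page cache (the directory argument and recursive walk are illustrative):

```cpp
// Warm the page cache by streaming every regular file under a directory
// through a throwaway buffer, the portable equivalent of `wc *.cc`.
#include <filesystem>
#include <fstream>
#include <iostream>

int main(int argc, char** argv) {
    namespace fs = std::filesystem;
    const fs::path root = argc > 1 ? argv[1] : ".";

    std::size_t bytes = 0;
    char buf[1 << 16];
    for (const auto& entry :
         fs::recursive_directory_iterator(root, fs::directory_options::skip_permission_denied)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path(), std::ios::binary);
        while (in.read(buf, sizeof buf) || in.gcount() > 0)   // read and discard
            bytes += static_cast<std::size_t>(in.gcount());
    }
    std::cout << "touched " << bytes << " bytes\n";
    return 0;
}
```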
The short answer is caching, possibly at multiple levels.
For example, a lot of modern hard drives have a built-in cache, so if some data is read repeatedly (e.g. the gcc executables, libraries, makefiles, source files), the first access leaves the data in the device cache, and subsequent operations interact with the cache rather than reading from the platters. Operations on the cache are much faster than, for example, physically positioning the drive heads to read from the platters.
A number of operating systems also implement some form of page cache, which does a similar thing (just using system RAM as a cache).
Most modern machines have various levels of cache (in devices, managed by the OS, multiple cache levels in the CPU, etc).
Caching can, depending on configuration, affect both read and write operations. Caches that affect writing (write operations actually write to cache and return, and are committed - such as to a drive - at a later time) are part of the reason operating systems require an explicit shutdown, so they commit all cached writes.
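For completeness, this is what explicitly committing cached writes looks like from application code; a minimal POSIX sketch (the file name is hypothetical, and FlushFileBuffers() is the rough Windows counterpart):

```cpp
// Write a buffer and then force the page cache to commit it to the device,
// the same thing a clean shutdown does for all remaining dirty pages.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    int fd = open("output.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { std::perror("open"); return 1; }

    const char data[] = "important bytes";
    write(fd, data, std::strlen(data));   // lands in the page cache first

    fsync(fd);                            // blocks until the device has the data
    close(fd);
    return 0;
}
```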

How to avoid HDD thrashing

I am developing a large program which uses a lot of memory. The program is quite experimental and I add and remove big chunks of code all the time. Sometimes I will add a routine that is rather too memory-hungry, and the HDD will start thrashing and the program (and the whole system) will slow to a snail's pace. It can easily take 5 mins to shut it down!
What I would like is a mechanism for avoiding this scenario: either a run-time check, or even something done before running the program, which could say something like "If you run this program there is a risk of HDD thrashing - aborting now to avoid slowing to a snail's pace".
Any ideas?
EDIT: Forgot to mention, my program uses multiple threads.
You could consider using SetProcessWorkingSetSize. This would be useful in debugging, because your app will crash with a fatal exception when it runs out of memory instead of dragging your machine into a thrashing situation.
http://msdn.microsoft.com/en-us/library/ms686234%28VS.85%29.aspx
Similar SO question
Set Windows process (or user) memory limit
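A minimal sketch of that call (the sizes are illustrative; note that on its own the maximum acts as a soft limit, and SetProcessWorkingSetSizeEx with QUOTA_LIMITS_HARDWS_MAX_ENABLE is needed to make it a hard cap):

```cpp
// Cap the working set of the current process so excessive memory use shows
// up as failures/paging in this process rather than system-wide thrashing.
#include <windows.h>
#include <cstdio>

int main() {
    const SIZE_T minWS = 16  * 1024 * 1024;   // 16 MB  (illustrative)
    const SIZE_T maxWS = 512 * 1024 * 1024;   // 512 MB (illustrative)

    if (!SetProcessWorkingSetSize(GetCurrentProcess(), minWS, maxWS)) {
        std::fprintf(stderr, "SetProcessWorkingSetSize failed: %lu\n", GetLastError());
        return 1;
    }
    // ... run the memory-hungry experiment here ...
    return 0;
}
```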
Windows XP is terrible when there are multiple threads or processes accessing the disk at the same time. This is effectively what you experience when your application begins to swap, as the OS is writing out some pages while reading in others. Windows XP (and Server 2003, for that matter) is utter trash at this. This is a real shame, as it means that swapping is almost synonymous with thrashing on these systems.
Your options:
Microsoft fixed this problem in Vista and Server 2008. So stop using a 9 year old OS. :)
Use unbuffered I/O to read/write data to a file, and implement your own paging inside your application. Implementing your own "swap" like this enables you to avoid thrashing (a minimal sketch follows below).
See here many more details of this problem: How to obtain good concurrent read performance from disk
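A rough sketch of the unbuffered-I/O option above, with a hypothetical backing file and a 64 KB block size chosen to satisfy the sector-alignment rules that FILE_FLAG_NO_BUFFERING imposes (offsets, transfer sizes and buffer addresses must all be sector-aligned):

```cpp
// Application-managed paging: write one block to a backing file that
// bypasses the OS file cache, so our "swap" traffic doesn't fight with it.
#include <windows.h>
#include <cstdio>

int main() {
    const DWORD kBlock = 64 * 1024;   // multiple of any common sector size

    HANDLE h = CreateFileA("appswap.dat", GENERIC_READ | GENERIC_WRITE,
                           0, nullptr, OPEN_ALWAYS,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, nullptr);
    if (h == INVALID_HANDLE_VALUE) {
        std::fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    // VirtualAlloc returns page-aligned memory, which satisfies the alignment rule.
    void* block = VirtualAlloc(nullptr, kBlock, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!block) { CloseHandle(h); return 1; }

    DWORD written = 0;
    WriteFile(h, block, kBlock, &written, nullptr);   // "page out" one block we manage ourselves

    VirtualFree(block, 0, MEM_RELEASE);
    CloseHandle(h);
    return 0;
}
```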
I'm not familiar with Windows programming, but under Unix you can limit the amount of memory that a program can use with setrlimit(). Maybe there is something similar. The goal is to get the program to abort once it uses too much memory, rather than thrashing. The limit would be a bit less than the total physical memory on the machine. I would guess somewhere between 75% and 90%, but some experimentation would be necessary to find the optimal setting.
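On the Unix side, a minimal sketch of capping the address space with setrlimit(); the 14 GB figure is only an illustration of the 75-90% guideline above, for a 16 GB machine:

```cpp
// Cap total virtual memory so the program gets ENOMEM / std::bad_alloc
// instead of pushing the machine into swap.
#include <sys/resource.h>
#include <cstdio>

int main() {
    const rlim_t limit = static_cast<rlim_t>(14ULL * 1024 * 1024 * 1024);  // ~87% of 16 GB

    rlimit rl{};
    if (getrlimit(RLIMIT_AS, &rl) != 0) { std::perror("getrlimit"); return 1; }
    rl.rlim_cur = limit;   // soft limit: allocations beyond this fail
    if (setrlimit(RLIMIT_AS, &rl) != 0) { std::perror("setrlimit"); return 1; }

    // ... allocations past the limit now fail instead of thrashing ...
    return 0;
}
```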
Chances are your program could use some memory management. While there are a few programs that do need to hold everything in memory at once, odds are good that with a little bit of foresight you might be able to rework your program to reuse or discard a lot of the memory you need.
Your program will run much faster too. If you are using that much memory, then basically all of your built-in first and second level caches are likely overflowing, meaning the CPU is mostly waiting on memory loads instead of processing your code's instructions.
I'd rather determine reasonable minimum requirements for the computer your program is supposed to run on, and during installation either warn the user if there isn't enough memory available, or refuse to install.
Telling the user every time they start the program doesn't make sense.

Are there any downsides to using UPX to compress a Windows executable?

I've used UPX before to reduce the size of my Windows executables, but I must admit that I am naive to any negative side effects this could have. What's the downside to all of this packing/unpacking?
Are there scenarios in which anyone would recommend NOT UPX-ing an executable (e.g. when writing a DLL, Windows Service, or when targeting Vista or Win7)? I write most of my code in Delphi, but I've used UPX to compress C/C++ executables as well.
On a side note, I'm not running UPX in some attempt to protect my exe from disassemblers, only to reduce the size of the executable and prevent cursory tampering.
... there are downsides to using EXE compressors. Most notably:
Upon startup of a compressed EXE/DLL, all of the code is decompressed from the disk image into memory in one pass, which can cause disk thrashing if the system is low on memory and is forced to access the swap file. In contrast, with uncompressed EXE/DLLs, the OS allocates memory for code pages on demand (i.e. when they are executed).
Multiple instances of a compressed EXE/DLL create multiple instances of the code in memory. If you have a compressed EXE that contains 1 MB of code (before compression) and the user starts 5 instances of it, approximately 4 MB of memory is wasted. Likewise, if you have a DLL that is 1 MB and it is used by 5 running applications, approximately 4 MB of memory is wasted. With uncompressed EXE/DLLs, code is only stored in memory once and is shared between instances.
http://www.jrsoftware.org/striprlc.php#execomp
I'm surprised this hasn't been mentioned yet but using UPX-packed executables also increases the risk of producing false-positives from heuristic anti-virus software because statistically a lot of malware also uses UPX.
There are three drawbacks:
The whole code will be fully uncompressed in virtual memory, while in a regular EXE or DLL, only the code actually used is loaded in memory. This is especially relevant if only a small portion of the code in your EXE/DLL is used at each run.
If there are multiple instances of your DLL and EXE running, their code can't be shared across the instances, so you'll be using more memory.
If your EXE/DLL is already in cache, or on a very fast storage medium, or if the CPU you're running on is slow, you will experience reduced startup speed, as decompression will still have to take place, and you won't benefit from the reduced size. This is especially true for an EXE that is invoked repeatedly.
Thus the above drawbacks are more of an issue if your EXE or DLLs contain lots of resources; otherwise, they may not be much of a factor in practice, given the relative size of executables and available memory, unless you're talking about DLLs used by lots of executables (like system DLLs).
To dispel some incorrect information in other answers:
UPX will not affect your ability to run on DEP-protected machines.
UPX will not affect the ability of major anti-virus software, as they support UPX-compressed executables (as well as other executable compression formats).
UPX has been able to use LZMA compression for some time now (7zip's compression algorithm), use the --lzma switch.
The only time size matters is during download off the Internet. If you are using UPX then you actually get worse performance than if you use 7-Zip (based on my testing, 7-Zip compresses about twice as well as UPX). And when the file is left compressed on the target computer, your performance is decreased (see Lars' answer). So UPX is not a good solution for file size. Just 7-Zip the whole thing.
As for preventing tampering, it is a FAIL as well. UPX supports decompression too. If someone wants to modify the EXE, they will see it is compressed with UPX and simply uncompress it. The percentage of would-be crackers you might slow down does not justify the effort and the performance loss.
A better solution would be binary signing, or at least a hash. A simple hash-verification scheme is to take a hash of your binary together with a secret value (usually a GUID). Only your EXE knows the secret value, so it can recompute the hash for verification. This isn't perfect (the secret value can be retrieved). The ideal approach is to use a certificate and a signature.
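A toy sketch of that hash-plus-secret idea. FNV-1a is used only to keep the example self-contained, and the secret string and file handling are hypothetical; a real implementation would use a cryptographic hash and store the expected value outside the region being hashed (or, better, use a signature):

```cpp
// Hash the executable's bytes mixed with an embedded secret and compare the
// result against a value recorded at build time; a mismatch suggests tampering.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

static std::uint64_t fnv1a(const char* data, std::size_t len,
                           std::uint64_t h = 1469598103934665603ULL) {
    for (std::size_t i = 0; i < len; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 1099511628211ULL;
    }
    return h;
}

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: " << argv[0] << " <exe path>\n"; return 1; }

    const std::string secret = "example-secret-guid";   // hypothetical baked-in value

    std::ifstream in(argv[1], std::ios::binary);
    std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());

    std::uint64_t h = fnv1a(bytes.data(), bytes.size());
    h = fnv1a(secret.data(), secret.size(), h);   // mix in the secret

    std::cout << "binary+secret hash: " << std::hex << h << "\n";
    // Compare h against the build-time reference value; mismatch => reject.
    return 0;
}
```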
The final size of the executable on disk is largely irrelevant these days. Your program may load a few milliseconds faster, but once it starts running the difference is indistinguishable.
Some people may be more suspicious of your executable just because it is compressed with UPX. Depending on your end users, this may or may not be an important consideration.
The last time I tried to use it on a managed assembly, it munged it so badly that the runtime refused to load it. That's the only time I can think of when you wouldn't want to use it (and, really, it's been so long since I tried it that the situation may well be better now). I've used it extensively in the past on all types of unmanaged binaries and never had an issue.
If your only interest is in decreasing the size of the executables, then have you tried comparing the size of the executable with and without runtime packages? Granted you will have to also include the sizes of the packages overall along with your executable, but if you have multiple executables which use the same base packages, then your savings would be rather high.
Another thing to look at would be the graphics/glyphs you use in your program. You can save quite a bit of space by consolidating them into a single TImageList in a global data module rather than repeating them on each form. I believe each image is stored in the form resource as hex, so each byte takes up two bytes... you can shrink this a bit by loading the image from an RCData resource using a TResourceStream.
There are no drawbacks.
But just FYI, there is a very common misconception regarding UPX as--
resources are NOT just being compressed
Essentially you are building a new executable that acts as a "loader", while the "real" executable is section-stripped, compressed, and placed as a binary-data resource of the loader executable (regardless of the types of resources that were in the original executable).
Using reverse-engineering methods and tools, whether for educational purposes or otherwise, will show you information about the "loader executable", not valuable information about the original executable.
IMHO routinely UPXing is pointless; the reasons are spelled out above - mostly, memory is more expensive than disk.
Erik: the LZMA stub might be bigger. Even if the algorithm is better, it isn't always a net win.
Virus scanners that look for 'unknown' viruses can flag UPX compressed executables as having a virus. I have been told this is because several viruses use UPX to hide themselves. I have used UPX on software and McAfee will flag the file as having a virus.
The reason UPX has so many false alarms is because its open licensing allows malware authors to use and modify it with impunity. Of course, this issue is inherent to the industry, but sadly the great UPX project is plagued by this problem.
UPDATE: Note that as the Taggant project is completed, the ability to use UPX (or anything else) without causing false positives will be enhanced, assuming UPX supports it.
I believe there is a possibility that it might not work on computers that have DEP (Data Execution Prevention) turned on.
When Windows loads a binary, the first thing it does is Import/Export Table resolution. That is, for every DLL and API listed in the Import Table, it first loads the DLL at a randomly generated base address, and then writes that base address plus the offset of the DLL function back into the Import Table.
An EXE does not have an Export Table.
All of this happens even before execution jumps to the original entry point.
Then, after execution starts from the entry point, the EXE runs a small piece of code before starting the decompression algorithm. This small piece of code also means that very few Windows APIs are needed, resulting in a small Import Table.
But after the binary is decompressed, if it starts to use any Windows API that was not resolved before, it will likely crash. So it is essential that the decompression routine resolves and updates the Import Table for all the Windows APIs referenced by the decompressed code, before executing that code.
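In spirit, the stub does by hand what the loader normally does for each import; an illustrative LoadLibrary/GetProcAddress sketch (user32/MessageBoxA chosen arbitrarily, not taken from UPX itself):

```cpp
// Resolve an API at run time instead of relying on the loader-populated
// Import Table; the same kind of work an unpacker stub performs for the
// APIs referenced by the decompressed code.
#include <windows.h>

int main() {
    HMODULE user32 = LoadLibraryA("user32.dll");
    if (!user32) return 1;

    using MessageBoxA_t = int (WINAPI*)(HWND, LPCSTR, LPCSTR, UINT);
    auto pMessageBoxA = reinterpret_cast<MessageBoxA_t>(
        GetProcAddress(user32, "MessageBoxA"));

    if (pMessageBoxA)
        pMessageBoxA(nullptr, "resolved at run time", "demo", MB_OK);

    FreeLibrary(user32);
    return 0;
}
```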
References:
https://malwaretips.com/threads/malware-analysis-2-pe-imports-static-analysis.62135/