I've been wondering...
Are there some limitations with ccache?
If the difference in later compile times is so large,
why aren't more Linux developers using ccache more often?
I guess the simple answer is that ccache is great when the build system is broken (i.e. dependencies are not correctly tracked, and to get everything built correctly you might need make clean; make). On the other hand, if dependencies are correctly tracked, then ccache will not yield any advantage over plain make, and will actually incur the cost of maintaining and updating the cache (which might grow huge depending on the size of the project).
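If you do want to try it anyway, here is a minimal sketch of a typical setup (assuming GCC and a POSIX shell; the 5 GB cache limit is just an example value, not a recommendation):

# route compilations through ccache
export CC="ccache gcc"
export CXX="ccache g++"
# cap the cache so it cannot grow without bound (5 GB is an arbitrary example)
ccache -M 5G
# later: inspect hit/miss statistics and the current cache size
ccache -s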
Related
I have been trying to install LLVM on my system [i7 + 16GB RAM]. I have been following this tutorial: LLVM Install. But during the build it eats up all the RAM and the terminal closes automatically. Is there any way to solve this?
Thanks.
The resources consumed during the build can depend on various factors:
Number of build targets you are building. In general you should be able to skip a bunch of build targets (compiler-rt, libcxx, etc.).
The type of binaries that will be generated, i.e. shared vs. static. Enabling shared libraries (BUILD_SHARED_LIBS:ON) will consume far less memory.
The optimization flag. Debug, Release and RelWithDebInfo will also have an effect. A Debug build has a larger binary size, so it may consume more memory during the link step, but the build itself is faster because fewer optimizations are enabled. A Release build may consume less RAM during the link step.
Number of threads (-jN).
TLDR for reducing RAM pressure:
Enable shared libraries
Use Release builds
Keep the number of parallel jobs low (instead of the maximum -jN, try -j(N-2)). Using -j1 may use the least RAM but will take a long time to build.
Skip building as many libraries (e.g., LLVM_ENABLE_RUNTIMES) and targets (e.g., LLVM_TARGETS_TO_BUILD) as you can. This may not be trivial, as it requires spending some time with the CMakeCache.txt file.
Build only what you want, e.g., instead of invoking just ninja, invoke ninja clang or ninja opt. A sketch of such a configuration follows below.
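For concreteness, a hedged sketch of a configuration along these lines (the backend list, job counts and source path are example values, not tuned recommendations for your machine):

# configure a Release build with shared libraries and a single backend
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=ON \
  -DLLVM_TARGETS_TO_BUILD=X86 \
  -DLLVM_PARALLEL_LINK_JOBS=2
# build only the clang target, with fewer parallel jobs than cores
ninja -j6 clang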
The -march=native option to gcc generates different code depending on the architecture of the host. ccache does not store the machine architecture in its hash which means that if you change the architecture of the machine, for example to switch to a high performance VPS node, the cached object files may be incompatible.
How can I make sure I get the correct object files while still taking advantage of caching?
ccache does not store the architecture, but it does store the compiler flags that were used when an object was built for the first time. Therefore a potential solution to your problem could be to use a thin wrapper script that expands -march=native to the actual set of flags (e.g. using something like this) before passing them to ccache.
(I will, of course, leave the actual implementation as an exercise to the reader..)
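Still, here is a minimal sketch of what such a wrapper could look like (this assumes GCC and a POSIX shell; the -Q --help=target query is just one way to ask GCC what native resolves to on the current host, and the whole thing is illustrative rather than production-ready):

#!/bin/sh
# Resolve -march=native to the concrete CPU name on this host so that the
# architecture becomes part of the command line that ccache hashes.
native=$(gcc -march=native -Q --help=target 2>/dev/null \
         | awk '/^ *-march=/ {print $2; exit}')
# Rebuild the argument list, substituting the resolved value if we found one.
for arg in "$@"; do
  shift
  case "$arg" in
    -march=native) set -- "$@" "-march=${native:-native}" ;;
    *)             set -- "$@" "$arg" ;;
  esac
done
exec ccache gcc "$@"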
You can make sure to use machines with identical architectures in your build farm. Apart from that, I don't see how you can solve that problem.
Also remember that if you use -march=native then anyone who wants to run your binary needs to have the same (or possibly, newer but backwards-compatible) machine architecture. Which may or may not be a problem.
I'm developing an HFT trading application which is supposed to run on one machine only, so when compiling I add the -march=native -mtune=native flags.
But Boost is installed from the repository: yum install boost and yum install boost-devel.
What if I download the Boost sources and compile them myself on the target PC with the -march=native -mtune=native flags enabled? Will this speed up my application? Will it be a significant performance improvement? Is it worth it?
Theoretically, yes. But since Boost is highly generic, the chances of this being significant are really slim.
In my experience, the only parts that require prebuilt libraries have to do with lots of static data (Unicode, localization) and non-generic facilities that don't need performance.
In general: if you have performance issues in your application, try to find out where exactly the problem is. For that, enable performance analysis with your toolchain.
http://www.thegeekstuff.com/2012/08/gprof-tutorial/
To your question: Boost is mostly template code, which is compiled as part of your application when included. Before digging into library internals, I would check with gprof where your performance problem really is. I think that enabling some optimization flags while compiling the Boost libs will not have much effect. But why don't you try it?
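For reference, a minimal gprof workflow sketch (assuming g++ on Linux and a program that exits normally, since gmon.out is only written at exit; app and main.cpp are placeholder names):

# build with profiling instrumentation
g++ -O2 -pg -o app main.cpp
# run a representative workload; this writes gmon.out in the current directory
./app
# produce the flat profile and call graph
gprof ./app gmon.out > profile.txt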
In HFT, time really counts. So have your design reviewed by the best professionals you can get. Have your code reviewed by the best professionals you can get. Switch to a state-of-the-art compiler and use move semantics. Design for concurrency to reduce latency. And profile, profile, profile. In my nice little real-time world, at some point you are fast enough and done; you will probably be in a constant struggle for microseconds.
I'm new to Qt. I made a very simple Qt program; since I didn't have the commercial license, I used Dependency Walker and copied all the required .dlls from my Qt SDK folder into my release folder. It worked just fine, but when I checked its size it was very large. My question is: how can I make it smaller? Also, please explain the difference between static and dynamic libraries and how it affects the size of an application. How many ways are there to make a Qt app smaller without buying the commercial license? Any help is appreciated.
In general, there's nothing you can do to significantly reduce the size of the application and its dependencies. There are, however, a few things I don't think you have considered, or that can be done if necessary:
You don't need all Qt DLLs. Depending on which parts of Qt you used, you can leave out things like QtWebKit and others. This might well cut your application's redistribution size in half.
If it is your executable that is large, try compiling with options to optimize for size instead of speed. The final performance should be near or maybe even better than what the usual optimization does, depending on your application. This is -Os for GCC and /Os for MSVC.
Strip the DLLs and executables of debug symbols. Release builds should have this already done automatically. I do hope you're not complaining about the size of debug binaries...
Compact the binaries with something like UPX. This is only cosmetic though, and comes with some subtle drawbacks.
Recompile Qt with the above options; this might shave off 5-10% of the total Qt DLL size. You could also disable features (like unused Qt styles) you don't need, further reducing the size of the DLLs. But this is tedious and probably not worth it, as you'd need to recompile Qt whenever you update to the newest version.
That being said, if your code is GPL licensed, there's nothing stopping you from linking Qt statically. You will need to compile the static Qt libraries yourself, or find them somewhere. Generally, static linking will increase the size of the executable, but might allow the compiler to omit unused code from the image that would usually still be present in the DLL version. So yes, static linking might make your application's total size smaller.
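As an illustration of the size-optimization, stripping and UPX suggestions above, here is a hedged sketch for a qmake-based project built with GCC (e.g. MinGW or Linux; MyApp.pro and myapp are placeholder names, and the UPX step is optional and purely cosmetic):

# build the release configuration, optimizing for size instead of speed
qmake MyApp.pro "CONFIG+=release" "QMAKE_CXXFLAGS+=-Os"
make
# remove symbol tables from the resulting binary
strip myapp
# optionally compress the executable on disk (it self-decompresses at load time)
upx --best myapp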
I want to compile a C++ program to an intermediate code. Then, I want to compile the intermediate code for the current processor with all of its resources.
The first step is to compile the C++ program with optimizations (-O2), run the linker and do most of the compilation procedure. This step must be independent of operating system and architecture.
The second step is to compile the result of the first step, without the original source code, for the operating system and processor of the current computer, with optimizations and special instructions of the processor (-march=native). The second step should be fast and with minimal software requirements.
Can I do it? How do I do it?
Edit:
I want to do this because I want to distribute a platform-independent program that can use all the resources of the processor, without distributing the original source code, instead of shipping a separate build for each platform and operating system. It would be good if the second step were fast and easy.
Processors of the same architecture may have different features. x86 processors may have SSE1, SSE2 or other extensions, and they can be 32- or 64-bit. If I compile for a generic x86, the build will lack the SSE optimizations. After many years, processors will have new features, and the program will need to be compiled for the new processors.
Just a suggestion - google clang and LLVM.
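To make that suggestion a bit more concrete: one possible reading is to use LLVM bitcode as the intermediate form, emitted once with clang and compiled for the host CPU later. A rough sketch follows (note this glosses over real-world issues: LLVM IR still bakes in the target triple, ABI and some platform assumptions, so it is less portable than it first appears):

# step 1: compile and optimize to LLVM bitcode
clang++ -O2 -emit-llvm -c MyFile.cpp -o MyFile.bc
# step 2, on the target machine: compile the bitcode natively and link
clang++ -O2 -march=native MyFile.bc -o myprogram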
How much do you know about compilers? You seem to treat "-O2" as some magical flag.
For instance, register assignment is a typical optimization. You definitely need to know how many registers are available. There's no point in assigning foo to register 16, only to discover in phase 2 that you're targeting an x86.
And those architecture-dependent optimizations can be quite complex. Inlining depends critically on call cost, and that in turn depends on architecture.
Once you get to "processor-specific" optimizations, things get really tricky. It's very hard for a platform-specific compiler to be truly "generic" in its generation of object or "intermediate" code at an appropriate "level": unless it's something like IL code (C#'s IL or Java bytecode), a given compiler struggles to know "where to stop", since optimizations that rely on target-platform knowledge occur all over the place, at different levels of the compilation.
Another thought: what about compiling to "preprocessed" source code, typically with a "*.i" extension, and then compiling in a distributed manner on different architectures?
For example, most (if not all) C and C++ compilers support something like:
cl /P MyFile.cpp     (MSVC)
gcc -E MyFile.cpp    (GCC)
...each generates MyFile.i, the preprocessed file (with gcc -E you redirect the output or add -o MyFile.i). Now that the file has ALL the headers and other #defines expanded, you can compile that *.i file to the target object file (or executable) after distributing it to other systems. (You might need to get clever if your preprocessor macros are specific to the target platform, but it should be quite straightforward with your build system, which should generate the command line to do this preprocessing.)
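A minimal sketch of the two-step split with GCC (note that g++ expects the .ii suffix for preprocessed C++, whereas .i is treated as preprocessed C; MyFile.cpp is a placeholder name):

# step 1, on the machine with the headers and configuration: preprocess only
g++ -E MyFile.cpp -o MyFile.ii
# step 2, on the target machine: compile the self-contained file natively
g++ -O2 -march=native -c MyFile.ii -o MyFile.o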
This is the approach used by distcc to preprocess the file locally, so remote "build farms" need not have any headers or other packages installed. (You are guaranteed to get the same build product, no matter how the machines in the build farm are configured.)
Thus, it would similarly have the effect of centralizing the "configuration/pre-processing" for a single machine, but provide cross-compiling, platform-specific compiling, or build-farm support in a distributed manner.
FYI -- I really like the distcc concept, but the last update for that particular project was in 2008. So, I'd be interested in other similar tools/products if you find them. (In the meantime, I'm writing a similar tool.)