I am writing Fortran code that needs many continuation lines (thousands). I found that gfortran can compile it without problems; however, ifort complains
catastrophic error: Statement too long
and the compilation is aborted.
Is there an option that can make ifort ignore the continuation-line limit?
There is a limit in the standard on the number of continuation lines. A processor may be less strict, but it will still have limits. I don't think you can change that hard limit.
According to http://fortranwiki.org/fortran/show/Continuation+lines
The Intel Fortran compiler allows up to 511 lines by default, or 255
with -stand f95. The warning message is:
warning #5199: Too many continuation lines
The warning refers to more than 255 but fewer than 512 continuation lines, so it is not relevant to you; with thousands of continuations you hit the hard error instead.
This thread from 2007 also confirms that 511 is a hard limit: https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/268739
What you can do is use very long lines which exceed the 132-character limit of the Fortran standard (other compilers may require a flag to allow this). That is what @SteveLionel recommends in that thread.
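To sketch the workaround (flag names as I understand them; check your compiler versions' documentation): join groups of continued lines into fewer, much longer lines. ifort accepts free-form lines well beyond 132 characters by default (it only warns under -stand), while gfortran needs a flag to lift its line-length limit:

gfortran -ffree-line-length-none x.f90
ifort x.f90

The statements stay identical; you just trade continuation count for line length, staying under the 511-continuation ceiling.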
Let's say I have a C++ program and I compile it using g++. I then get an executable file which, as an example, has a size of 100 kb. I then add a couple of lines of C++ code and compile again, and the size of the executable has increased to 101 kb. Then I add the exact same block of C++ code and compile a third time. This time the executable has increased to 106 kb. Why does the same code sometimes increase the size of the executable by one amount, and another time by something much greater?
Also, the big increase only happens every couple of times; most of the time it increases by the same small amount.
There are a variety of reasons why the size change of the resulting binary is not linear with the code size change. This is particularly true if some kind of optimization is enabled.
Even in debug mode (no optimizations), the following things could cause this to happen:
The code size in the binary typically needs to be aligned to a certain size (dependent on the hardware). The size can only grow in multiples of the alignment (see the inspection commands after this list).
The same applies to metadata tables (relocation tables, debug information).
The compiler may reserve extra space for debug information, based just on the number of methods/variables in use.
With some compilers (not sure about gcc), code in a binary can be updated in place when only minor changes were made, instead of performing a full link on each build. This would result in different binary sizes when adding code and rebuilding vs. deleting the binary before each build.
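To check whether padding or metadata accounts for a jump, compare per-section sizes rather than the file size (standard binutils commands; output details vary by system):

size a.out
objdump -h a.out

size reports the text/data/bss sizes, and objdump -h lists section headers including each section's alignment. If the text size barely changed while the file grew noticeably, the difference is likely alignment padding or debug/relocation metadata rather than your added code.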
If optimizations are enabled, it gets even more confusing, due to possible optimization strategies:
The compiler may remove code it finds to be unreachable.
If optimizing for speed, loop unrolling is a good thing to do, but only up to a certain degree. If you add more code inside the loop, the compiler might decide that the extra code size is no longer worth the speed gain (see the sketch after this list).
Also, other optimizations work only up to a certain level, after which they do more harm than good. This could even result in the binary getting smaller when you add code.
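A sketch of the unrolling effect (illustrative only; results depend heavily on compiler and version):

int sum(const int* a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)  // a candidate loop for unrolling
        s += a[i];
    return s;
}

int main() {
    int a[16] = {};
    return sum(a, 16);
}

Building this once with g++ -O2 and once with g++ -O2 -funroll-loops, then running size on each binary, may show a text-size difference out of all proportion to the source change.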
These are just a bunch of possible reasons; there might be many more.
I am compiling Intel TBB community version tbb2017_20161128oss. While compiling, it runs a few test cases. In one of the test cases it gives me the warning:
./test_global_control.exe
TBB Warning: The number of workers is currently limited to 0. The request for 1 workers is ignored. Further requests for more workers will be silently ignored until the limit changes.
What does this warning mean for my platform? Should I refrain from using certain components of ITBB?
Usually for TBB tests you can ignore run-time warnings starting with "TBB Warning". Generally, these warnings are there to tell programmers that they may be using TBB sub-optimally or incorrectly. In the tests, however, the library is used in very complicated ways, and so sometimes warnings are issued.
This particular warning says that the program first limited the number of worker threads it is allowed to use, and then requested more workers than that limit allows. For the test, it's important to check that the behavior is correct in such corner cases; the warning itself is outside the test's control.
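For illustration, a minimal sketch (not the actual test code) of how such a limit arises with tbb::global_control: max_allowed_parallelism counts all threads including the master, so a value of 1 means zero workers, and a later request for workers triggers exactly this warning.

#include <tbb/global_control.h>
#include <tbb/parallel_for.h>

// Build (assuming an installed TBB): g++ -std=c++11 limit.cpp -ltbb
int main() {
    // Allow at most 1 thread in total (the master), i.e. 0 workers.
    tbb::global_control limit(tbb::global_control::max_allowed_parallelism, 1);

    // This still runs correctly, but only on the master thread; requests
    // for additional workers are ignored, producing the warning above.
    tbb::parallel_for(0, 100, [](int) { /* work */ });
    return 0;
}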
In real applications, these warnings can help diagnose unexpected situations, and so should not be ignored.
Since my GPU (Quadro FX 3700) doesn't support arch > sm_11, I was not able to use relocatable device code (rdc). Hence I combined all the utilities needed into one large file (say x.cu).
To give an overview of x.cu: it contains 2 classes with 5 member functions each, 20 device functions, 1 global kernel, and 1 kernel-caller function.
Now, when I try to compile via Nsight, it just hangs, showing Build% as 3.
When I try compiling using
nvcc x.cu -o output -I"."
it shows the following messages and compiles after a long time:
/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: Olimit was exceeded on function _Z18optimalOrderKernelPdP18PrepositioningCUDAdi; will not perform function-scope optimization.
To still perform function-scope optimization, use -OPT:Olimit=0 (no limit) or -OPT:Olimit=45022
/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: To override Olimit for all functions in file, use -OPT:Olimit=45022
(Compiler may run out of memory or run very slowly for large Olimit values)
Here optimalOrderKernel is the global kernel. Since compiling shouldn't take this much time, I want to understand the reason behind these messages, particularly Olimit.
Olimit is pretty clear, I think. It is a limit the compiler places on the amount of effort it will expend on optimizing code.
Most codes compile just fine using nvcc. However, no compiler is perfect, and some seemingly innocuous codes can cause the compiler to spend a long time at an optimization process that would normally be quick.
Since you haven't provided any code, I'm speaking in generalities.
Since there is the occasional case where the compiler spends a disproportionately long time in certain optimization phases, Olimit provides a convenient watchdog on an optimization process that is taking too long, and gives you some idea of why the compile is slow. When it is exceeded, certain optimization steps are aborted, and a "less optimized" version of your code is generated instead.
I think the compiler messages you received are quite clear on how to modify the Olimit depending on your intentions. You can override it to increase the watchdog period, or disable it entirely (by setting it to zero). In that case, the compile process could take an arbitrarily long period of time, and/or run out of memory, as the messages indicate.
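If your toolchain still uses the Open64 front end for sm_1x targets (which these messages suggest), the option can, as far as I know, be forwarded through nvcc; something along these lines (treat the exact syntax as an assumption and check nvcc --help for your version):

nvcc -Xopencc -OPT:Olimit=0 x.cu -o output -I"."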
I got a C++ program (source) that is said to work in parallel. However, if I compile it (I am using Ubuntu 10.04 and g++ 4.4.3) with g++ and run it, one of my two CPU cores gets full load while the other is doing "nothing".
So I spoke to the person who gave me the program. I was told that I had to set specific flags for g++ in order to get the program compiled for 2 CPU cores. However, looking at the code I'm not able to find any lines that point to parallelism.
So I have two questions:
Are there any C++ intrinsics for multithreaded applications, i.e., is it possible to write parallel code without any extra libraries (because I did not find any non-standard libraries included)?
Is it true that there are indeed flags for g++ that tell the compiler to compile the program for 2 CPU cores so it runs in parallel (and if so: what are they)?
AFAIK there are no compiler flags designed to make a single-threaded application exploit parallelism (it's definitely a nontrivial transformation), with the exception of automatic parallelization of loop iterations (-ftree-parallelize-loops), which still must be activated carefully. Even if there's no explicit thread creation, there may be OpenMP directives in the code that parallelize certain instruction sequences.
Look for the occurrence of "thread" and/or "std::thread" in the source code.
The current C++ language standard has no support for multi-processing in the language or the standard library. The proposed C++0x standard does have some support for threads, locks etc. I am not aware of any flags for g++ that would magically make your program do multi-processing, and it's hard to see what such flags could do.
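For reference, the threading support proposed for C++0x looks roughly like this (a minimal sketch; newer g++ releases accept it with -std=c++0x -pthread):

#include <iostream>
#include <thread>

void work(int id) {
    std::cout << "worker " << id << "\n";  // runs on its own thread
}

int main() {
    std::thread t1(work, 1);
    std::thread t2(work, 2);
    t1.join();  // wait for both threads to finish
    t2.join();
    return 0;
}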
The only thing I can think of is openMosix or LinuxPMI (the successor of openMosix). If the code uses processes, then the process "migration" technique makes it possible to put processes to work on different machines (which have the specified Linux distribution installed).
Check for threads (grep -i thread) and processes (grep fork) in your code. If neither exists, then check for MPI. MPI requires some extra configuration, as I recall (I only used it for some homework at university).
As mentioned, gcc (and other compilers) implement some forms of parallelism with OpenMP via pragmas, as in the sketch below.
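A minimal OpenMP example (illustrative; without the flag the pragma is ignored and the loop runs sequentially):

#include <cstdio>

int main() {
    // With -fopenmp, iterations are distributed across a thread pool.
    #pragma omp parallel for
    for (int i = 0; i < 8; ++i) {
        std::printf("iteration %d\n", i);
    }
    return 0;
}

Compile with:
g++ -fopenmp example.cpp -o example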
I have read that there is some overhead to using C++ exceptions for exception handling as opposed to, say, checking return values. I'm only talking about overhead that is incurred when no exception is thrown. I'm also assuming that you would need to implement the code that actually checks the return value and does the appropriate thing, whatever would be the equivalent to what the catch block would have done. And, it's also not fair to compare code that throws exception objects with 45 state variables inside to code that returns a negative integer for every error.
I'm not trying to build a case for or against C++ exceptions solely based on which one might execute faster. I heard someone make the case recently that code using exceptions ought to run just as fast as code based on return codes, once you take into account all the extra bookkeeping code that would be needed to check the return values and handle the errors. What am I missing?
There is a cost associated with exception handling on some platforms and with some compilers.
Namely, Visual Studio, when building a 32-bit target, will register a handler in every function that has local variables with a non-trivial destructor. Basically, it sets up a try/finally handler.
The other technique, employed by gcc and by Visual Studio targeting 64 bits, only incurs overhead when an exception is thrown (the technique involves traversing the call stack and doing table lookups). In cases where exceptions are rarely thrown, this can actually lead to more efficient code, as error codes don't have to be processed.
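To make that concrete, here is a hedged illustration (hypothetical functions, not from any real library) of the bookkeeping the table-based approach avoids on the non-throwing path:

#include <stdexcept>

// Hypothetical parsers, for illustration only.
int parse_rc(const char* s, int* out) {
    if (!s) return -1;  // error path
    *out = 42;
    return 0;           // success path: the caller still pays a branch
}

int parse_ex(const char* s) {
    if (!s) throw std::invalid_argument("null input");
    return 42;          // success path: no check at the call site
}

int main() {
    int v = 0;
    if (parse_rc("x", &v) != 0) return 1;  // explicit check on every call
    v = parse_ex("x");  // with table-based EH, the non-throwing
                        // path runs straight through
    return v == 42 ? 0 : 1;
}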
Only try/catch and try/except blocks take a few instructions to set up. The overhead should generally be negligible in every case except the tightest loops. But you wouldn't normally use try/catch/except in an inner loop anyway.
I would advise not to worry about this, and use a profiler instead to optimize your code where needed.
It's completely implementation dependent, but many recent implementations have very little or no performance overhead when exceptions aren't thrown. In fact you are right: correctly checking return codes from all functions in code that doesn't use exceptions can be slower than doing nothing in code using exceptions.
Of course, you would need to measure the performance for your particular requirements to be sure.
There is some overhead with exceptions (as the other answers pointed out).
But you do not have much of a choice nowadays. Try to disable exceptions in your project, and make sure that ALL dependent code and libraries can compile and run without them.
Do they work with exceptions disabled?
Let's assume they do! Then benchmark some cases, but note that you have to set a "disable exceptions" compile switch. Without that switch you still have the overhead - even if the code never throws exceptions.
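With g++, for example, that switch is -fno-exceptions (other compilers have their own equivalents), so a fair comparison builds both variants of your benchmark source (file name here is just a placeholder):

g++ -O2 benchmark.cpp -o bench_exc
g++ -O2 -fno-exceptions benchmark.cpp -o bench_noexc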
The only overhead is ~6 instructions, which add 2 SEH entries at the start of the function and remove them at the end. No matter how many try/catch blocks you have in a thread, it is always the same.
Also, what is this about local variables? I hear people always complaining about them when using try/catch. I don't get it, because the destructors would eventually be called anyway. Also, you shouldn't be letting an exception go up more than 1-3 calls.
I took Chip Uni's test code and expanded it a bit. I split the code into two source files (one with exceptions; one without). I made each benchmark run 1000 times, and I used clock_gettime() with CLOCK_REALTIME to record the start and end times of each iteration. Then I computed the mean and variance of the data. I ran this test with 64-bit versions of g++ 5.2.0 and clang++ 3.7.0 on an Intel Core i7 box with 16GB RAM that runs ArchLinux with kernel 4.2.5-1-ARCH. You can find the expanded code and the full results here.
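In essence, each iteration was timed like this (a simplified sketch, not the exact benchmark code; clock_gettime is POSIX and may need -lrt on older systems):

#include <cstdio>
#include <ctime>

int main() {
    timespec start, stop;
    clock_gettime(CLOCK_REALTIME, &start);
    // ... one run of the benchmarked workload goes here ...
    clock_gettime(CLOCK_REALTIME, &stop);
    long ns = (stop.tv_sec - start.tv_sec) * 1000000000L
            + (stop.tv_nsec - start.tv_nsec);
    std::printf("%ld nanoseconds\n", ns);
    return 0;
}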
g++
No Exceptions
Average: 30,022,994 nanoseconds
Standard Deviation: 1.25327e+06 nanoseconds
Exceptions
Average: 30,025,642 nanoseconds
Standard Deviation: 1.83422e+06 nanoseconds
clang++
No Exceptions
Average: 20,954,657 nanoseconds
Standard Deviation: 426,662 nanoseconds
Exceptions
Average: 23,916,638 nanoseconds
Standard Deviation: 1.72583e+06 nanoseconds
C++ Exceptions only incur a non-trivial performance penalty with clang++, and even that penalty is only ~14%.