how does the std::sqrt() function work? [duplicate] - c++

This question already has answers here:
How is the square root function implemented? [closed]
(15 answers)
Closed 4 years ago.
Does anyone know how the std::sqrt() function works? (or at least have an idea?)
I've seen methods on the internet that seemed really slow, using lots of approximations and iterations.
Everyone knows sqrt() function is slow, but I'd like to know how the one from std works so I could have a vague idea of when it is beneficial to avoid it. (yes, if I want to be sure I can profile, but it's still nice to have a vague idea)
EDIT: Didn't really formulate the question too well... What I'm interested in:
how would the fastest C++ function calculating a square root look like? (more or less, I just want to know the actual logic behind it)

Nowadays, on modern machines, floating point functions are passed off to the hardware (floating point unit or math-coprocessor).

Sometimes, it uses what the CPU offers:
$ cat main.cc
#include <cmath>
#include <ctime>
#include <cstdlib>
int main(){
srand (clock());
const double d = rand();
return std::sqrt(d) > 2 ? 1 : 0;
}
(the blahblah is just so nothing relevant is optimized away, don't run that program!)
$ g++ -S main.cc
$ cat main.s
.file "main.cc"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB106:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call clock
movl %eax, %edi
call srand
call rand
cvtsi2sd %eax, %xmm1
sqrtsd %xmm1, %xmm0
ucomisd %xmm0, %xmm0
jp .L5
.L2:
xorl %eax, %eax
ucomisd .LC0(%rip), %xmm0
seta %al
addq $8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L5:
.cfi_restore_state
movapd %xmm1, %xmm0
call sqrt
jmp .L2
.cfi_endproc
.LFE106:
.size main, .-main
.section .rodata.cst8,"aM",#progbits,8
.align 8
.LC0:
.long 0
.long 1073741824
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",#progbits
(hint: it is using a sqrt-cpu-instruction)

sqrt(); function Behind the scenes.
It always checks for the mid-points in a graph.
Example: sqrt(16)=4;
sqrt(4)=2;
Now if you give any input inside 16 or 4 like sqrt(10)==?
It finds the mid point of 2 and 4 i.e = x ,then again it finds the mid point of x and 4 (It excludes lower bound in this input). It repeats this step again and again until it gets the perfect answer i.e sqrt(10)==3.16227766017 .It lies b/w 2 and 4.All this in-built function are created using calculus,differentiation and Integration.

The standard does not specify a particular implementation.
One option is to look at a typical implementation, but you'll probably find it's heavily-optimised assembler.

Related

How to remove "noise" from GCC/clang assembly output?

I want to inspect the assembly output of applying boost::variant in my code in order to see which intermediate calls are optimized away.
When I compile the following example (with GCC 5.3 using g++ -O3 -std=c++14 -S), it seems as if the compiler optimizes away everything and directly returns 100:
(...)
main:
.LFB9320:
.cfi_startproc
movl $100, %eax
ret
.cfi_endproc
(...)
#include <boost/variant.hpp>
struct Foo
{
int get() { return 100; }
};
struct Bar
{
int get() { return 999; }
};
using Variant = boost::variant<Foo, Bar>;
int run(Variant v)
{
return boost::apply_visitor([](auto& x){return x.get();}, v);
}
int main()
{
Foo f;
return run(f);
}
However, the full assembly output contains much more than the above excerpt, which to me looks like it is never called. Is there a way to tell GCC/clang to remove all that "noise" and just output what is actually called when the program is ran?
full assembly output:
.file "main1.cpp"
.section .rodata.str1.8,"aMS",#progbits,1
.align 8
.LC0:
.string "/opt/boost/include/boost/variant/detail/forced_return.hpp"
.section .rodata.str1.1,"aMS",#progbits,1
.LC1:
.string "false"
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LCOLDB2:
.section .text._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LHOTB2:
.p2align 4,,15
.weak _ZN5boost6detail7variant13forced_returnIvEET_v
.type _ZN5boost6detail7variant13forced_returnIvEET_v, #function
_ZN5boost6detail7variant13forced_returnIvEET_v:
.LFB1197:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $_ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__, %ecx
movl $49, %edx
movl $.LC0, %esi
movl $.LC1, %edi
call __assert_fail
.cfi_endproc
.LFE1197:
.size _ZN5boost6detail7variant13forced_returnIvEET_v, .-_ZN5boost6detail7variant13forced_returnIvEET_v
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LCOLDE2:
.section .text._ZN5boost6detail7variant13forced_returnIvEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIvEET_v,comdat
.LHOTE2:
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LCOLDB3:
.section .text._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LHOTB3:
.p2align 4,,15
.weak _ZN5boost6detail7variant13forced_returnIiEET_v
.type _ZN5boost6detail7variant13forced_returnIiEET_v, #function
_ZN5boost6detail7variant13forced_returnIiEET_v:
.LFB9757:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $_ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__, %ecx
movl $39, %edx
movl $.LC0, %esi
movl $.LC1, %edi
call __assert_fail
.cfi_endproc
.LFE9757:
.size _ZN5boost6detail7variant13forced_returnIiEET_v, .-_ZN5boost6detail7variant13forced_returnIiEET_v
.section .text.unlikely._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LCOLDE3:
.section .text._ZN5boost6detail7variant13forced_returnIiEET_v,"axG",#progbits,_ZN5boost6detail7variant13forced_returnIiEET_v,comdat
.LHOTE3:
.section .text.unlikely,"ax",#progbits
.LCOLDB4:
.text
.LHOTB4:
.p2align 4,,15
.globl _Z3runN5boost7variantI3FooJ3BarEEE
.type _Z3runN5boost7variantI3FooJ3BarEEE, #function
_Z3runN5boost7variantI3FooJ3BarEEE:
.LFB9310:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl (%rdi), %eax
cltd
xorl %edx, %eax
cmpl $19, %eax
ja .L7
jmp *.L9(,%rax,8)
.section .rodata
.align 8
.align 4
.L9:
.quad .L30
.quad .L10
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.quad .L7
.text
.p2align 4,,10
.p2align 3
.L7:
call _ZN5boost6detail7variant13forced_returnIiEET_v
.p2align 4,,10
.p2align 3
.L30:
movl $100, %eax
.L8:
addq $8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L10:
.cfi_restore_state
movl $999, %eax
jmp .L8
.cfi_endproc
.LFE9310:
.size _Z3runN5boost7variantI3FooJ3BarEEE, .-_Z3runN5boost7variantI3FooJ3BarEEE
.section .text.unlikely
.LCOLDE4:
.text
.LHOTE4:
.globl _Z3runN5boost7variantI3FooI3BarEEE
.set _Z3runN5boost7variantI3FooI3BarEEE,_Z3runN5boost7variantI3FooJ3BarEEE
.section .text.unlikely
.LCOLDB5:
.section .text.startup,"ax",#progbits
.LHOTB5:
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB9320:
.cfi_startproc
movl $100, %eax
ret
.cfi_endproc
.LFE9320:
.size main, .-main
.section .text.unlikely
.LCOLDE5:
.section .text.startup
.LHOTE5:
.section .rodata
.align 32
.type _ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__, #object
.size _ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__, 58
_ZZN5boost6detail7variant13forced_returnIvEET_vE19__PRETTY_FUNCTION__:
.string "T boost::detail::variant::forced_return() [with T = void]"
.align 32
.type _ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__, #object
.size _ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__, 57
_ZZN5boost6detail7variant13forced_returnIiEET_vE19__PRETTY_FUNCTION__:
.string "T boost::detail::variant::forced_return() [with T = int]"
.ident "GCC: (Ubuntu 5.3.0-3ubuntu1~14.04) 5.3.0 20151204"
.section .note.GNU-stack,"",#progbits
Stripping out the .cfi directives, unused labels, and comment lines is a solved problem: the scripts behind Matt Godbolt's compiler explorer are open source on its github project. It can even do colour highlighting to match source lines to asm lines (using the debug info).
You can set it up locally so you can feed it files that are part of your project with all the #include paths and so on (using -I/...). And so you can use it on private source code that you don't want to send out over the Internet.
Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” shows how to use it (it's pretty self-explanatory but has some neat features if you read the docs on github), and also how to read x86 asm, with a gentle introduction to x86 asm itself for total beginners, and to looking at compiler output. He goes on to show some neat compiler optimizations (e.g. for dividing by a constant), and what kind of functions give useful asm output for looking at optimized compiler output (function args, not int a = 123;).
On the Godbolt compiler explorer, it can be useful to use -g0 -fno-asynchronous-unwind-tables if you want to uncheck the filter option for directives, e.g. because you want to see the .section and .p2align stuff in the compiler output. The default is to add -g to your options to get the debug info it uses to colour-highlight matching source and asm lines, but this means .cfi directives for every stack operation, and .loc for every source line, among other things.
With plain gcc/clang (not g++), -fno-asynchronous-unwind-tables avoids .cfi directives. Possibly also useful: -fno-exceptions -fno-rtti -masm=intel. Make sure to omit -g.
Copy/paste this for local use:
g++ -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti -fverbose-asm \
-Wall -Wextra foo.cpp -O3 -masm=intel -S -o- | less
Or -Os can be more readable, e.g. using div for division by non-power-of-2 constants instead of a multiplicative inverse even though that's a lot worse for performance and only a bit smaller, if at all.
But really, I'd recommend just using Godbolt directly (online or set it up locally)! You can quickly flip between versions of gcc and clang to see if old or new compilers do something dumb. (Or what ICC does, or even what MSVC does.) There's even ARM / ARM64 gcc 6.3, and various gcc for PowerPC, MIPS, AVR, MSP430. (It can be interesting to see what happens on a machine where int is wider than a register, or isn't 32-bit. Or on a RISC vs. x86).
For C instead of C++, you can use -xc -std=gnu11 to avoid flipping the language drop-down to C, which resets your source pane and compiler choices, and has a different set of compilers available.
Useful compiler options for making asm for human consumption:
Remember, your code only has to compile, not link: passing a pointer to an external function like void ext(void*p) is a good way to stop something from optimizing away. You only need a prototype for it, with no definition so the compiler can't inline it or make any assumptions about what it does. (Or inline asm like Benchmark::DoNotOptimize can force a compiler to materialize a value in a register, or forget about it being a known constant, if you know GNU C inline asm syntax well enough to use constraints to understand the effect you're having on what you're requiring of the compiler.)
I'd recommend using -O3 -Wall -Wextra -fverbose-asm -march=haswell for looking at code. (-fverbose-asm can just make the source look noisy, though, when all you get are numbered temporaries as names for the operands.) When you're fiddling with the source to see how it changes the asm, you definitely want compiler warnings enabled. You don't want to waste time scratching your head over the asm when the explanation is that you did something that deserves a warning in the source.
To see how the calling convention works, you often want to look at caller and callee without inlining.
You can use __attribute__((noipa)) foo_t foo(bar_t x) { ... } on a definition, or compile with gcc -O3 -fno-inline-functions -fno-inline-functions-called-once -fno-inline-small-functions to disable inlining. (But those command line options don't disable cloning a function for constant-propagation. noipa = no Inter-Procedural Analysis. It's even stronger than __attribute__((noinline,noclone)).) See From compiler perspective, how is reference for array dealt with, and, why passing by value(not decay) is not allowed? for an example.
Or if you just want to see how functions pass / receive args of different types, you could use different names but the same prototype so the compiler doesn't have a definition to inline. This works with any compiler. Without a definition, a function is just a black box to the optimizer, governed only by the calling convention / ABI.
-ffast-math will get many libm functions to inline, some to a single instruction (esp. with SSE4 available for roundsd). Some will inline with just -fno-math-errno, or other "safer" parts of -ffast-math, without the parts that allow the compiler to round differently. If you have FP code, definitely look at it with/without -ffast-math. If you can't safely enable any of -ffast-math in your regular build, maybe you'll get an idea for a safe change you can make in the source to allow the same optimization without -ffast-math.
-O3 -fno-tree-vectorize will optimize without auto-vectorizing, so you can get full optimization without if you want to compare with -O2 (which doesn't enable autovectorization on gcc11 and earlier, but does on all clang).
-Os (optimize for size and speed) can be helpful to keep the code more compact, which means less code to understand. clang's -Oz optimizes for size even when it hurts speed, even using push 1 / pop rax instead of mov eax, 1, so that's only interesting for code golf.
Even -Og (minimal optimization) might be what you want to look at, depending on your goals. -O0 is full of store/reload noise, which makes it harder to follow, unless you use register vars. The only upside is that each C statement compiles to a separate block of instructions, and it makes -fverbose-asm able to use the actual C var names.
clang unrolls loops by default, so -fno-unroll-loops can be useful in complex functions. You can get a sense of "what the compiler did" without having to wade through the unrolled loops. (gcc enables -funroll-loops with -fprofile-use, but not with -O3). (This is a suggestion for human-readable code, not for code that would run faster.)
Definitely enable some level of optimization, unless you specifically want to know what -O0 did. Its "predictable debug behaviour" requirement makes the compiler store/reload everything between every C statement, so you can modify C variables with a debugger and even "jump" to a different source line within the same function, and have execution continue as if you did that in the C source. -O0 output is so noisy with stores/reloads (and so slow) not just from lack of optimization, but forced de-optimization to support debugging. (also related).
To get a mix of source and asm, use gcc -Wa,-adhln -c -g foo.c | less to pass extra options to as. (More discussion of this in a blog post, and another blog.). Note that the output of this isn't valid assembler input, because the C source is there directly, not as an assembler comment. So don't call it a .s. A .lst might make sense if you want to save it to a file.
Godbolt's color highlighting serves a similar purpose, and is great at helping you see when multiple non-contiguous asm instructions come from the same source line. I haven't used that gcc listing command at all, so IDK how well it does, and how easy it is for the eye to see, in that case.
I like the high code density of godbolt's asm pane, so I don't think I'd like having source lines mixed in. At least not for simple functions. Maybe with a function that was too complex to get a handle on the overall structure of what the asm does...
And remember, when you want to just look at the asm, leave out the main() and the compile-time constants. You want to see the code for dealing with a function arg in a register, not for the code after constant-propagation turns it into return 42, or at least optimizes away some stuff.
Removing static and/or inline from functions will produce a stand-alone definition for them, as well as a definition for any callers, so you can just look at that.
Don't put your code in a function called main(). gcc knows that main is special and assumes it will only be called once, so it marks it as "cold" and optimizes it less.
The other thing you can do: If you did make a main(), you can run it and use a debugger. stepi (si) steps by instruction. See the bottom of the x86 tag wiki for instructions. But remember that code might optimize away after inlining into main with compile-time-constant args.
__attribute__((noinline)) may help, on a function that you want to not be inlined. gcc will also make constant-propagation clones of functions, i.e. a special version with one of the args as a constant, for call-sites that know they're passing a constant. The symbol name will be .clone.foo.constprop_1234 or something in the asm output. You can use __attribute__((noclone)) to disable that, too.).
For example
If you want to see how the compiler multiplies two integers: I put the following code on the Godbolt compiler explorer to get the asm (from gcc -O3 -march=haswell -fverbose-asm) for the wrong way and the right way to test this.
// the wrong way, which people often write when they're used to creating a runnable test-case with a main() and a printf
// or worse, people will actually look at the asm for such a main()
int constants() { int a = 10, b = 20; return a * b; }
mov eax, 200 #,
ret # compiles the same as return 200; not interesting
// the right way: compiler doesn't know anything about the inputs
// so we get asm like what would happen when this inlines into a bigger function.
int variables(int a, int b) { return a * b; }
mov eax, edi # D.2345, a
imul eax, esi # D.2345, b
ret
(This mix of asm and C was hand-crafted by copy-pasting the asm output from godbolt into the right place. I find it's a good way to show how a short function compiles in SO answers / compiler bug reports / emails.)
You can always look at the generated assembly from the object file, instead of using the compilers assembly output. objdump comes to mind.
You can even tell objdump to intermix source with assembly, making it easier to figure out what source line corresponds to what instructions. Example session:
$ cat test.cc
int foo(int arg)
{
return arg * 42;
}
$ g++ -g -O3 -std=c++14 -c test.cc -o test.o && objdump -dS -M intel test.o
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z3fooi>:
int foo(int arg)
{
return arg + 1;
0: 8d 47 01 lea eax,[rdi+0x1]
}
3: c3 ret
Explanation of objdump flags:
-d disassembles all executable sections
-S intermixes assembly with source (-g required while compiling with g++)
-M intel choses intel syntax over ugly AT&T syntax (optional)
I like to insert labels that I can easily grep out of the objdump output.
int main() {
asm volatile ("interesting_part_begin%=:":);
do_something();
asm volatile ("interesting_part_end%=:":);
}
I haven't had a problem with this yet, but asm volatile can be very hard on a compiler's optimizer because it tends to leave such code untouched.

assembly output of a simple C++ program

I am trying to understand the assembly output of a simple c++ program. This is my C++ program.
void func()
{}
int main()
{
func();
}
when I use g++ with --save-temps option to get the assembly code for the above program I get the following assembly code.
.file "main.cpp"
.text
.globl _Z4funcv
.type _Z4funcv, #function
_Z4funcv:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size _Z4funcv, .-_Z4funcv
.globl main
.type main, #function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
call _Z4funcv
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",#progbits
According to my knowledge on assembly there should be 3 sections of any assembly program which are data, text and bss. Also text section should start with 'global _start'. I can't see any of them in this assembly code.
Can someone please help me to understand the above assembly code. If you can relate to C++ code as well, It would be great.
Any kind of help is greatly appreciated.
Well, here it is line by line...
.file "main.cpp" # Debugging info (not essential)
.text # Start of text section (i.e. your code)
.globl _Z4funcv # Let the function _Z4funcv be callable
# from outside (e.g. from your main routine)
.type _Z4funcv, #function # Debugging info (possibly not essential)
_Z4funcv: # _Z4funcv is effectively the "name" of your
# function (C++ "mangles" the name; exactly
# how depends on your compiler -- Google "C++
# name mangling" for more).
.LFB0: # Debugging info (possibly not essential)
.cfi_startproc # Provides additional debug info (ditto)
pushq %rbp # Store base pointer of caller function
# (standard function prologue -- Google
# "calling convention" or "cdecl")
.cfi_def_cfa_offset 16 # Provides additional debug info (ditto)
.cfi_offset 6, -16 # Provides additional debug info (ditto)
movq %rsp, %rbp # Reset base pointer to a sensible place
# for this function to put its local
# variables (if any). Standard function
# prologue.
.cfi_def_cfa_register 6 # Debug ...
popq %rbp # Restore the caller's base pointer
# Standard function epilogue
.cfi_def_cfa 7, 8 # Debug...
ret # Return from function
.cfi_endproc # Debug...
.LFE0: # Debug...
.size _Z4funcv, .-_Z4funcv # Debug...
.globl main # Declares that the main function
# is callable from outside
.type main, #function # Debug...
main: # Your main routine (name not mangled)
.LFB1: # Debug...
.cfi_startproc # Debug...
pushq %rbp # Store caller's base pointer
# (standard prologue)
.cfi_def_cfa_offset 16 # Debug...
.cfi_offset 6, -16 # Debug...
movq %rsp, %rbp # Reset base pointer
# (standard prologue)
.cfi_def_cfa_register 6 # Debug...
call _Z4funcv # Call `func` (note name mangled)
movl $0, %eax # Put `0` in eax (eax is return value)
popq %rbp # Restore caller's base pointer
# (standard epilogue)
.cfi_def_cfa 7, 8 # Debug...
ret # Return from main function
.cfi_endproc # Debug...
.LFE1:
.size main, .-main # Debug...
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2" # fluff
.section .note.GNU-stack,"",#progbits # fluff
The linker knows to look for main (and not start) if it is using the standard C or C++ library (which it usually is, unless you tell it otherwise). It links some stub code (which contains start) into the final executable.
So, really, the only important bits are...
.text
.globl _Z4funcv
_Z4funcv:
pushq %rbp
movq %rsp, %rbp
popq %rbp
ret
.globl main
main:
pushq %rbp
movq %rsp, %rbp
call _Z4funcv
movl $0, %eax
popq %rbp
ret
If you want to start from scratch, and not have all the complicated standard library stuff getting in the way of your discovery, you can do something like this and achieve the same result as your C++ code:
.text
.globl _func
_func: # Just as above, really
push %ebp
mov %esp, %ebp
pop %ebp
ret
.globl _start
_start: # A few changes here
push %ebp
mov %esp, %ebp
call _func
movl $1, %eax # Invoke the Linux 'exit' syscall
movl $0, %ebx # With a return value of 0 (pick any char!)
int $0x80 # Actual invocation
The exit syscall is a bit painful, but necessary. If you don't have it, it tries to keep going and run the code that is "past" your code. As that could be important code or data, the machine should stop you with a Segmentation Fault error. Having the exit call avoids all this. If you are using the standard library (as will happen automatically in your C++ example) the exit stuff is taken care of by the linker.
Compile with gcc -nostdlib -o test test.s (noting that gcc is specifically told not to use the standard library). I should say that this is for a 32-bit system, and quite likely will not work on 64-bit. I don't have a 64-bit system to test on, but perhaps some helpful StackOverflower will chip in with a 64-bit translation.

Endless loop in C/C++ [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
There are several possibilities to do an endless loop, here are a few I would choose:
for(;;) {}
while(1) {} / while(true) {}
do {} while(1) / do {} while(true)
Is there a certain form which one should choose? And do modern compilers make a difference between the middle and the last statement or does it realize that it is an endless loop and skips the checking part entirely?
Edit: as it has been mentioned I forgot goto, but this was done out of the reason that I don't like it as a command at all.
Edit2: I made some grep on the latest versions taken from kernel.org. I does seems as nothing much changed over time (within the Kernel at least)
The problem with asking this question is that you'll get so many subjective answers that simply state "I prefer this...". Instead of making such pointless statements, I'll try to answer this question with facts and references, rather than personal opinions.
Through experience, we can probably start by excluding the do-while alternatives (and the goto), as they are not commonly used. I can't recall ever seeing them in live production code, written by professionals.
The while(1), while(true) and for(;;) are the 3 different versions commonly existing in real code. They are of course completely equivalent and results in the same machine code.
for(;;)
This is the original, canonical example of an eternal loop. In the ancient C bible The C Programming Language by Kernighan and Ritchie, we can read that:
K&R 2nd ed 3.5:
for (;;) {
...
}
is an "infinite" loop, presumably to be broken by other means, such
as a break or return. Whether to use while or for is largely a matter
of personal preference.
For a long while (but not forever), this book was regarded as canon and the very definition of the C language. Since K&R decided to show an example of for(;;), this would have been regarded as the most correct form at least up until the C standardization in 1990.
However, K&R themselves already stated that it was a matter of preference.
And today, K&R is a very questionable source to use as a canonical C reference. Not only is it outdated several times over (not addressing C99 nor C11), it also preaches programming practices that are often regarded as bad or blatantly dangerous in modern C programming.
But despite K&R being a questionable source, this historical aspect seems to be the strongest argument in favour of the for(;;).
The argument against the for(;;) loop is that it is somewhat obscure and unreadable. To understand what the code does, you must know the following rule from the standard:
ISO 9899:2011 6.8.5.3:
for ( clause-1 ; expression-2 ; expression-3 ) statement
/--/
Both clause-1 and expression-3 can be omitted. An omitted expression-2
is replaced by a nonzero constant.
Based on this text from the standard, I think most will agree that it is not only obscure, it is subtle as well, since the 1st and 3rd part of the for loop are treated differently than the 2nd, when omitted.
while(1)
This is supposedly a more readable form than for(;;). However, it relies on another obscure, although well-known rule, namely that C treats all non-zero expressions as boolean logical true. Every C programmer is aware of that, so it is not likely a big issue.
There is one big, practical problem with this form, namely that compilers tend to give a warning for it: "condition is always true" or similar. That is a good warning, of a kind which you really don't want to disable, because it is useful for finding various bugs. For example a bug such as while(i = 1), when the programmer intended to write while(i == 1).
Also, external static code analysers are likely to whine about "condition is always true".
while(true)
To make while(1) even more readable, some use while(true) instead. The consensus among programmers seem to be that this is the most readable form.
However, this form has the same problem as while(1), as described above: "condition is always true" warnings.
When it comes to C, this form has another disadvantage, namely that it uses the macro true from stdbool.h. So in order to make this compile, we need to include a header file, which may or may not be inconvenient. In C++ this isn't an issue, since bool exists as a primitive data type and true is a language keyword.
Yet another disadvantage of this form is that it uses the C99 bool type, which is only available on modern compilers and not backwards compatible. Again, this is only an issue in C and not in C++.
So which form to use? Neither seems perfect. It is, as K&R already said back in the dark ages, a matter of personal preference.
Personally, I always use for(;;) just to avoid the compiler/analyser warnings frequently generated by the other forms. But perhaps more importantly because of this:
If even a C beginner knows that for(;;) means an eternal loop, then who are you trying to make the code more readable for?
I guess that's what it all really boils down to. If you find yourself trying to make your source code readable for non-programmers, who don't even know the fundamental parts of the programming language, then you are only wasting time. They should not be reading your code.
And since everyone who should be reading your code already knows what for(;;) means, there is no point in making it further readable - it is already as readable as it gets.
It is very subjective. I write this:
while(true) {} //in C++
Because its intent is very much clear and it is also readable: you look at it and you know infinite loop is intended.
One might say for(;;) is also clear. But I would argue that because of its convoluted syntax, this option requires extra knowledge to reach the conclusion that it is an infinite loop, hence it is relatively less clear. I would even say there are more number of programmers who don't know what for(;;) does (even if they know usual for loop), but almost all programmers who knows while loop would immediately figure out what while(true) does.
To me, writing for(;;) to mean infinite loop, is like writing while() to mean infinite loop — while the former works, the latter does NOT. In the former case, empty condition turns out to be true implicitly, but in the latter case, it is an error! I personally didn't like it.
Now while(1) is also there in the competition. I would ask: why while(1)? Why not while(2), while(3) or while(0.1)? Well, whatever you write, you actually mean while(true) — if so, then why not write it instead?
In C (if I ever write), I would probably write this:
while(1) {} //in C
While while(2), while(3) and while(0.1) would equally make sense. But just to be conformant with other C programmers, I would write while(1), because lots of C programmers write this and I find no reason to deviate from the norm.
In an ultimate act of boredom, I actually wrote a few versions of these loops and compiled it with GCC on my mac mini.
the
while(1){} and for(;;) {}
produced same assembly results
while the do{} while(1); produced similar but a different assembly code
heres the one for while/for loop
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
LBB0_1: ## =>This Inner Loop Header: Depth=1
jmp LBB0_1
.cfi_endproc
and the do while loop
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl $0, -4(%rbp)
LBB0_1: ## =>This Inner Loop Header: Depth=1
jmp LBB0_2
LBB0_2: ## in Loop: Header=BB0_1 Depth=1
movb $1, %al
testb $1, %al
jne LBB0_1
jmp LBB0_3
LBB0_3:
movl $0, %eax
popq %rbp
ret
.cfi_endproc
Everyone seems to like while (true):
https://stackoverflow.com/a/224142/1508519
https://stackoverflow.com/a/1401169/1508519
https://stackoverflow.com/a/1401165/1508519
https://stackoverflow.com/a/1401164/1508519
https://stackoverflow.com/a/1401176/1508519
According to SLaks, they compile identically.
Ben Zotto also says it doesn't matter:
It's not faster.
If you really care, compile with assembler output for your platform and look to see.
It doesn't matter. This never matters. Write your infinite loops however you like.
In response to user1216838, here's my attempt to reproduce his results.
Here's my machine:
cat /etc/*-release
CentOS release 6.4 (Final)
gcc version:
Target: x86_64-unknown-linux-gnu
Thread model: posix
gcc version 4.8.2 (GCC)
And test files:
// testing.cpp
#include <iostream>
int main() {
do { break; } while(1);
}
// testing2.cpp
#include <iostream>
int main() {
while(1) { break; }
}
// testing3.cpp
#include <iostream>
int main() {
while(true) { break; }
}
The commands:
gcc -S -o test1.asm testing.cpp
gcc -S -o test2.asm testing2.cpp
gcc -S -o test3.asm testing3.cpp
cmp test1.asm test2.asm
The only difference is the first line, aka the filename.
test1.asm test2.asm differ: byte 16, line 1
Output:
.file "testing2.cpp"
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.text
.globl main
.type main, #function
main:
.LFB969:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
nop
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE969:
.size main, .-main
.type _Z41__static_initialization_and_destruction_0ii, #function
_Z41__static_initialization_and_destruction_0ii:
.LFB970:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
cmpl $1, -4(%rbp)
jne .L3
cmpl $65535, -8(%rbp)
jne .L3
movl $_ZStL8__ioinit, %edi
call _ZNSt8ios_base4InitC1Ev
movl $__dso_handle, %edx
movl $_ZStL8__ioinit, %esi
movl $_ZNSt8ios_base4InitD1Ev, %edi
call __cxa_atexit
.L3:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE970:
.size _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
.type _GLOBAL__sub_I_main, #function
_GLOBAL__sub_I_main:
.LFB971:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $65535, %esi
movl $1, %edi
call _Z41__static_initialization_and_destruction_0ii
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE971:
.size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
.section .ctors,"aw",#progbits
.align 8
.quad _GLOBAL__sub_I_main
.hidden __dso_handle
.ident "GCC: (GNU) 4.8.2"
.section .note.GNU-stack,"",#progbits
With -O3, the output is considerably smaller of course, but still no difference.
The idiom designed into the C language (and inherited into C++) for infinite looping is for(;;): the omission of a test form. The do/while and while loops do not have this special feature; their test expressions are mandatory.
for(;;) does not express "loop while some condition is true that happens to always be true". It expresses "loop endlessly". No superfluous condition is present.
Therefore, the for(;;) construct is the canonical endless loop. This is a fact.
All that is left to opinion is whether or not to write the canonical endless loop, or to choose something baroque which involves extra identifiers and constants, to build a superfluous expression.
Even if the test expression of while were optional, which it isn't, while(); would be strange. while what? By contrast, the answer to the question for what? is: why, ever---for ever! As a joke some programmers of days past have defined blank macros, so they could write for(ev;e;r);.
while(true) is superior to while(1) because at least it doesn't involve the kludge that 1 represents truth. However, while(true) didn't enter into C until C99. for(;;) exists in every version of C going back to the language described in the 1978 book K&R1, and in every dialect of C++, and even related languages. If you're coding in a code base written in C90, you have to define your own true for while (true).
while(true) reads badly. While what is true? We don't really want to see the identifier true in code, except when we are initializing boolean variables or assigning to them. true need not ever appear in conditional tests. Good coding style avoids cruft like this:
if (condition == true) ...
in favor of:
if (condition) ...
For this reason while (0 == 0) is superior to while (true): it uses an actual condition that tests something, which turns into a sentence: "loop while zero is equal to zero." We need a predicate to go nicely with "while"; the word "true" isn't a predicate, but the relational operator == is.
I use for(;/*ever*/;).
It is easy to read and it takes a bit longer to type (due to the shifts for the asterisks), indicating I should be really careful when using this type of loop. The green text that shows up in the conditional is also a pretty odd sight—another indication this construct is frowned upon unless absolutely necessary.
They probably compile down to nearly the same machine code, so it is a matter of taste.
Personally, I would chose the one that is the clearest (i.e. very clear that it is supposed to be an infinite loop).
I would lean towards while(true){}.
I would recommend while (1) { } or while (true) { }. It's what most programmers would write, and for readability reasons you should follow the common idioms.
(Ok, so there is an obvious "citation needed" for the claim about most programmers. But from the code I've seen, in C since 1984, I believe it is true.)
Any reasonable compiler would compile all of them to the same code, with an unconditional jump, but I wouldn't be surprised if there are some unreasonable compilers out there, for embedded or other specialized systems.
Is there a certain form which one should choose?
You can choose either. Its matter of choice. All are equivalent. while(1) {}/while(true){} is frequently used for infinite loop by programmers.
Well, there is a lot of taste in this one.
I think people from a C background are more likely to prefer for(;;), which reads as "forever". If its for work, do what the locals do, if its for yourself, do the one that you can most easily read.
But in my experience, do { } while (1); is almost never used.
All are going to perform same function, and it is true to choose what you prefer..
i might think "while(1) or while(true)" is good practice to use.
They are the same. But I suggest "while(ture)" which has best representation.

C++: doubles, precision, virtual machines and GCC

I have the following piece of code:
#include <cstdio>
int main()
{
if ((1.0 + 0.1) != (1.0 + 0.1))
printf("not equal\n");
else
printf("equal\n");
return 0;
}
When compiled with O3 using gcc (4.4,4.5 and 4.6) and run natively (ubuntu 10.10), it prints the expected result of "equal".
However the same code when compiled as described above and run on a virtual machine (ubuntu 10.10, virtualbox image), it outputs "not equal" - this is the case for when the O3 and O2 flags are set however not O1 and below. When compiled with clang (O3 and O2) and run upon the virtual machine I get the correct result.
I understand 1.1 can't be correctly represented using double and I've read "What Every Computer Scientist Should Know About Floating-Point Arithmetic" so please don't point me there, this seems to be some kind of optimisation that GCC does that somehow doesn't seem to work in virtual machines.
Any ideas?
Note: The C++ standard says type promotion in this situations is implementation dependent, could it be that GCC is using a more precise internal representation that when the inequality test is applied holds true - due to the extra precision?
UPDATE1: The following modification of the above piece of code, now results in the correct result. It seems at some point, for whatever reason, GCC turns off floating point control word.
#include <cstdio>
void set_dpfpu() { unsigned int mode = 0x27F; asm ("fldcw %0" : : "m" (*&mode));
int main()
{
set_dpfpu();
if ((1.0 + 0.1) != (1.0 + 0.1))
printf("not equal\n");
else
printf("equal\n");
return 0;
}
UPDATE2: For those asking about the const expression nature of the code, I've changed it as follows, and still fails when compiled with GCC. - but i assume the optimizer may be turning the following into a const expression too.
#include <cstdio>
void set_dpfpu() { unsigned int mode = 0x27F; asm ("fldcw %0" : : "m" (*&mode));
int main()
{
//set_dpfpu(); uncomment to make it work.
double d1 = 1.0;
double d2 = 1.0;
if ((d1 + 0.1) != (d2 + 0.1))
printf("not equal\n");
else
printf("equal\n");
return 0;
}
UPDATE3 Resolution: Upgrading virtualbox to version 4.1.8r75467 resolved the issue. However the their remains one issue, that is: why was the clang build working.
UPDATE: See this posting How to deal with excess precision in floating-point computations?
It address the issues of extended floating point precision. I forgot about the extended precision in x86. I remember a simulation that should have been deterministic, but gave different results on Intel CPUs than on PowePC CPUs. The causes was Intel's extended precision architecture.
This Web page talks about how to throw Intel CPUs into double-precision rounding mode: http://www.network-theory.co.uk/docs/gccintro/gccintro_70.html.
Does virtualbox guarantee that its floating point operations are identical to the the hardware's floating point operations? I could not find a guarantee like that with a quick Google search. I also did not find a promise that vituralbox FP ops conform to IEEE 754.
VMs are emulators that try-- and mostly succeed-- to the emulate a particular instruction set or architecture. They are just emulators, however, and subject to their own implementation quirks or design issues.
If you haven't already, post the question forums.virtualbox.org and see what the community says about it.
Yep that is really strange behavior, but it can actually easily be explained:
On x86 floating point registers internally use more precision (eg 80 instead of 64). This means the computation 1.0 + 0.1 will be computed with more precision (and since 1.1 can't be represented exactly in binary at all those extra bits WILL be used) in the registers. Only when storing the result to memory will it be truncated.
What this means is simple: If you compare a value loaded from memory with a value newly computed in the registers you'll get a "non-equal" back, because one value was truncated while the other wasn't. So that has nothing to do with VM/no VM it just depends on the code the compiler generates which can easily fluctuate as we see there.
Add it to the growing list of floating point surprises..
I can confirm the same behaviour of your non-VM code, but since I don't have a VM I haven't tested the VM part.
However, the compiler, both Clang and GCC will evaluate the constant expression at compile time. See the assembly output below (using gcc -O0 test.cpp -S) :
.file "test.cpp"
.section .rodata
.LC0:
.string "equal"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
call puts
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1"
.section .note.GNU-stack,"",#progbits
It looks like you understand assembly, but it's clear that there is only the "equal" string, there is no "not equal". So the comparison is not even done at run time, it just prints "equal".
I would try to code the calculation and comparison using assembly and see if you have the same behavior. If you have different behavior on the VM, then it's the way the VM does the calculation.
UPDATE 1: (Based on the "UPDATE 2" in the original question). Below is the gcc -O0 -S test.cpp output assembly (for 64 bit architecture). In it you can see the movabsq $4607182418800017408, %rax line twice. This will be for the two comparison flags, I haven't verified, but I presume the $4607182418800017408 value is 1.1 in floating point terms. It would be interesting to compile this on the VM, if you get the same result (two similar lines) then the VM will be doing something funny at run-time, otherwise it's a combination of VM and compiler.
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movabsq $4607182418800017408, %rax
movq %rax, -16(%rbp)
movabsq $4607182418800017408, %rax
movq %rax, -8(%rbp)
movsd -16(%rbp), %xmm1
movsd .LC1(%rip), %xmm0
addsd %xmm1, %xmm0
movsd -8(%rbp), %xmm2
movsd .LC1(%rip), %xmm1
addsd %xmm2, %xmm1
ucomisd %xmm1, %xmm0
jp .L6
ucomisd %xmm1, %xmm0
je .L7
I see you added another question:
Note: The C++ standard says type promotion in this situations is implementation dependent, could it be that GCC is using a more precise internal representation that when the inequality test is applied holds true - due to the extra precision?
The answer to that one is no. 1.1 is not exactly representable in a binary format, no matter how many bits the format has. You can get close, but not with an infinite number of zeros after the .1.
Or did you mean an entirely new internal format for decimals? No, I refuse to believe that. It wouldn't be very compatible if it did.

Questions re: assembly generated from my C++ by gcc

Compiling this code:
int main ()
{
return 0;
}
using:
gcc -S filename.cpp
...generates this assembly:
.file "heloworld.cpp"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl %ebp
.cfi_def_cfa_offset 8
movl %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
movl $0, %eax
popl %ebp
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
My questions:
Is everything after "." a comment?
What is .LFB0:?
What is .LFE0:?
Why is it so big code only for "int main ()" and "return 0;"?
P.S. I read alot of assembly net books, a lot (at least 30) of tutorials and all I can do is copy code and paste it or rewrite it. Now I'm trying a different approach to try to learn it somehow. The problem is I do understand what are movl, pop, etc, but don't understand how to combine these things to make code "flow". I don't know where or how to correctly start writing a program in asm is. I'm still static not dynamic as in C++ but I want to learn assembly.
As other have said, .file, .text, ... are assembler directives and .LFB0, .LFE0 are local labels. The only instruction in the generated code are:
pushl %ebp
movl %esp, %ebp
movl $0, %eax
popl %ebp
ret
The first two instruction are the function prologue. The frame pointer is stored on the stack and updated. The next intruction store 0 in eax register (i386 ABI states that integer return value are returned via the eax register). The two last instructions are function epilogue. The frame pointer is restored, and then the function return to its caller via the ret instruction.
If you compile your code with -O3 -fomit-frame-pointer, the code will be compiled to just two instructions:
xorl %eax,%eax
ret
The first set eax to 0 (it only takes two bytes to encode, while movl 0,%eax take 5 bytes), and the second is the ret instruction. The frame pointer manipulation is there to ease debugging (it is possible to get backtrace without it, but it is more difficult).
.file, .text, etc are assembler directives.
.LFB0, .LFE0 are local labels, which are normally used as branch destinations within a function.
As for the size, there are really only a few actual instructions - most of the above listing consists of directives, etc. For future reference you might also want to turn up the optimisation level to remove otherwise redudant instructions, i.e. gcc -Wall -O3 -S ....
It's just that there's a lot going on behind your simple program.
If you intend to read assembler outputs, by no means compile C++. Use plain C, the output is far clearer for a number of reasons.