I am optimizing some hotspots in my application and compilation is done using gcc-arm.
Now, is there any chance that the following statements result in different assembler code:
static const pixel_t roundedwhite = 4294572537U;
return (packed >= roundedwhite) ? purewhite : packed;
// OR
const pixel_t roundedwhite = 4294572537U;
return (packed >= roundedwhite) ? purewhite : packed;
// OR
return (packed >= 4294572537U) ? purewhite : packed;
Is there any chance that my ARM compiler might produce the unwanted code for the first case or should this get optimized anyway?
I assume it's pretty much the same, but, unfortunately, I am not that sure what gcc-arm does compared to ordinary gcc, and I can't access the disassembly listing.
Thank you very much.
Call gcc with the -S flag and take a look at the assembly:
-S
Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.
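For example, with an ARM cross toolchain the invocation might look like this (the exact compiler name, arm-none-eabi-gcc, is an assumption about your toolchain, and pixels.c is just a placeholder file name):
arm-none-eabi-gcc -O2 -S pixels.c -o pixels.s
Then compare the generated .s file for each of the three variants of the return statement.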
I would try it out myself and include the result in the answer, but I don't have an ARM compiler handy.
One difference is surely that the first version, with static, will use up some memory even if the value gets inlined into the expression. This would make sense if you wanted to compute a more complex expression once and then store the result, but for this simple constant the static is unnecessary. That said, the compiler will very likely inline the value, as this is a very simple optimization and there is no reason for it not to do so.
Say I have an assert() something like
assert( x < limit );
I took a look at the behaviour of the optimiser in GDC in release and debug builds with the following snippet of code:
uint cxx1( uint x )
{
assert( x < 10 );
return x % 10;
}
uint cxx1a( uint x )
in { assert( x < 10 ); }
body
{
return x % 10;
}
uint cxx2( uint x )
{
if ( !( x < 10 ))
assert(0);
return x % 10;
}
Now when I build in debug mode, the asserts have the very pleasing effect of triggering huge optimisation: GDC gets rid of the horrid code for the modulo operation entirely, because of its knowledge about the possible range of x due to the assert's if-condition. But in release mode the if-condition is discarded, so all of a sudden the horrid code comes back, and there is no longer any optimisation in cxx1(), nor even in cxx1a(). It is rather ironic that release mode generates far worse code than debug mode. Of course, no-one wants the executable code for the if-tests to be present in a release build; we want to lose all that overhead.
Now ideally, I would want to express the condition in the sense of communicating information to the compiler, regardless of release / debug builds, about conditions that may always be assumed to be true, and so such assumptions can guide optimisation in very powerful ways.
I believe some C++ compilers have something called __assume() or some such, but memory fails me here. GCC has a __builtin_unreachable() special directive which might be useable to build an assume() feature. Basically if I could build my own assume() directive it would have the effect of asserting certain truths about known values or known ranges and exposing / publishing these to optimisation passes regardless of release / debug mode but without generating any actual code at all for the assume() condition in a release build, while in debug mode it would be exactly the same as assert().
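For what it's worth, the same idea expressed as a C/C++ macro, a sketch only, assuming GCC's __builtin_unreachable is available (the ASSUME name and the NDEBUG switch are my own illustration, not an existing feature):

#include <assert.h>

// In a release build (NDEBUG) the condition is handed to the optimiser via
// __builtin_unreachable and, with -O, usually emits no check at all; in a
// debug build it degenerates to a plain assert().
#ifdef NDEBUG
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) assert(cond)
#endif

unsigned mod10(unsigned x)
{
    ASSUME(x < 10);   // the optimiser may now drop the general modulo code
    return x % 10;
}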
I tried an experiment, which you can see in cxx2: it always triggers the optimisation, so good job there, but even in release mode it generates what is effectively debug code for the assume()'s if-condition, namely a test and a conditional jump to an undefined instruction in order to halt the process.
Does anyone have any ideas about whether this is solvable? Or do you think this is a useful D compiler fantasy wish-list item?
As far as I know, __builtin_unreachable is the next best replacement for an assume-like function in GCC. In some cases the if condition might still not get optimized out, though: see "Assume" clause in gcc
The GCC builtins are available in GDC by importing gcc.builtins. Here's an example how to wrap the __builtin_unreachable function:
import gcc.builtins;
void assume()(bool condition)
{
if (!condition)
__builtin_unreachable();
}
bool foo(int a)
{
assume(a > 10);
return a > 10;
}
There are two interesting details here:
We don't need string mixins or similarly complicated stuff. As long as you compile with -O, GDC will completely optimize the function call away anyway.
For this to work, the assume function must get inlined. Unfortunately, inlining normal functions is not completely supported when assume is in a different module than the calling function. As a workaround we use a template with zero template arguments; this should make sure inlining always works.
You can test and modify this example here:
explore.dgnu.org
Now we (GDC developers) could easily rewrite assert(...) to if(...) __builtin_unreachable() in release mode. But this could break some code so dmd should implement this first.
OK, I really don't know what you want; cxx2 is the solution.
some more info
I want to prevent one system function from executing in a large project. It is impossible to redefine it or to add some ifdef logic, so I want to patch the code down to just a ret instruction.
The functions are:
void __cdecl _wassert(const wchar_t *, const wchar_t *, unsigned);
and:
void __dj_assert(const char *, const char *, int, const char *) __attribute__((__noreturn__));
So I need to patch the first one on Visual C++ compiler, and the second one on GCC compiler.
Can I just write the ret instruction directly at the address of the _wassert/__dj_assert function, for x86/x64?
UPDATE:
I just want to modify the function body like this:
*_wassert = `ret`;
Or maybe copy another function body like this:
void __cdecl _wassert_empty(const wchar_t *, const wchar_t *, unsigned)
{
}
for (int i = 0; i < sizeof(void*); i++) {
    ((char*)_wassert)[i] = ((char*)_wassert_empty)[i];
}
UPDATE 2:
I really don't understand why there are so many objections to silent asserts. In fact, there are no asserts in RELEASE mode, and nobody cares about that. I just want to be able to turn asserts on and off in DEBUG mode.
You need to understand the calling conventions for your particular processor ISA and system ABI. See this for x86 & x86-64 calling conventions.
Some calling conventions require more than a single ret machine instruction in the epilogue, and you have to account for that. BTW, the code of a function usually resides in a read-only code segment, and you'll need some dirty tricks to patch it and write into it.
You could compile a no-op function of the same signature, and ask the compiler to show the emitted assembler code (e.g. with gcc -O -Wall -fverbose-asm -S if using GCC....)
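A minimal sketch of that idea, using the GCC-side declaration from the question (the file and function names here are only illustrative):

/* A do-nothing function with the same parameter list as __dj_assert,
 * compiled only to inspect the generated prologue/epilogue with e.g.
 *   gcc -O -Wall -fverbose-asm -S noop_assert.c
 */
void noop_assert(const char *expr, const char *file, int line, const char *func)
{
    (void)expr; (void)file; (void)line; (void)func;  /* deliberately empty */
}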
On Linux you might use dynamic linker LD_PRELOAD tricks. If using a recent GCC you might perhaps consider customizing it with MELT, but I don't think it is worthwhile in your particular case...
However, you apparently have some assert failure. It is very unlikely that your program could continue without any undefined behavior. So practically speaking, your program will very likely crash elsewhere with your proposed "fix", and you'll lose more of your time with it.
Better take enough time to correct the original bug, and improve your development process. Your way is postponing a critical bug correction, and you are extremely likely to spend more time avoiding that bug fix than dealing with it properly (and finding it now, not later) as you should. Avoid increasing your technical debt and making your code base even more buggy and rotten.
My feeling is that you are going nowhere (except towards a big failure) with your approach of patching the binary to avoid assert-s. You should find out why they are violated and improve the code (either remove the obsolete assert, or improve it, or correct the bug elsewhere that the assert has detected).
On GNU/Linux you can use the --wrap option like this:
gcc source.c -Wl,--wrap,functionToPatch -o prog
and your source must add the wrapper function:
void __wrap_functionToPatch() {} // simply returns
Parameters and return values as needed for your function.
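For the concrete GCC case from the question, a hedged sketch might look like this (note that the real __dj_assert is declared noreturn, so letting the wrapper return assumes its callers tolerate that):

/* Link with: gcc source.c -Wl,--wrap,__dj_assert -o prog */
void __wrap___dj_assert(const char *expr, const char *file, int line, const char *func)
{
    (void)expr; (void)file; (void)line; (void)func;  /* swallow the assert */
}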
I have
const int MAX_CONNECTIONS = 500;
//...
if(clients.size() < MAX_CONNECTIONS) {
//...
}
I'm trying to find the "right" choice for MAX_CONNECTIONS. So I fire up gdb and set MAX_CONNECTIONS = 750. But it seems my code isn't responding to this change. I wonder if it's because the const int was resolved at compile time even though it wound up getting bumped at runtime. Does this sound right, and, using GDB is there any way I can bypass this effect without having to edit the code in my program? It takes a while just to warm up to 500.
I suspect that the compiler, seeing that the variable is const, is inlining the constant into the assembly and not having the generated code actually read the value of the MAX_CONNECTIONS variable. The C++ spec is worded in a way where if a variable of primitive type is explicitly marked const, the compiler can make certain assumptions about it for the purposes of optimization, since any attempt to change that constant is either (1) illegal or (2) results in undefined behavior.
If you want to use GDB to do things like this, consider marking the variable volatile rather than const to indicate to the compiler that it shouldn't optimize it. Alternatively, have this information controlled by some other data source (say, a configuration option inside a file) so that you aren't blasting the program's memory out from underneath it in order to change the value.
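A minimal sketch of that workaround (MAX_CONNECTIONS comes from the question; the can_accept helper is just an illustration):

#include <cstddef>

// With volatile, the generated code must re-read the variable, so changing it
// from GDB ("set var MAX_CONNECTIONS = 750") takes effect immediately.
volatile int MAX_CONNECTIONS = 500;

bool can_accept(std::size_t current_clients)
{
    return current_clients < static_cast<std::size_t>(MAX_CONNECTIONS);
}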
Hope this helps!
By telling it it's const, you're telling the compiler it has freedom to not load the value, but to build it directly into the code when possible. An allocated copy may still exist for those times when the particular instructions chosen need to load a value rather than having an immediate value, or it could be omitted by the compiler as well. That's a bit of a loose answer short on standardese, but that's the basic idea.
As this post is quite old, my answer is more like a reference to my future self. Assuming you compiled in debug mode, running the following expression in the debugger (lldb in my case) works:
const_cast<int&>(MAX_CONNECTIONS) = 750
If you have to change the constant often, e.g. in a loop, set a breakpoint and evaluate the expression each time the breakpoint is hit:
breakpoint set <location>
breakpoint command add <breakpoint_id>
const_cast<int&>(MAX_CONNECTIONS) = 750
DONE
In my ongoing experimentation with GCC inline assembly, I've run into a new problem regarding labels and inlined code.
Consider the following simple jump:
__asm__
(
"jmp out;"
"out:;"
:
:
);
This does nothing except jump to the out label. As is, this code compiles fine. But if you place it inside a function, and then compile with optimization flags, the compiler complains: "Error: symbol 'out' is already defined".
What seems to be happening is that the compiler is repeating this assembly code every time it inlines the function. This causes the label out to get duplicated, leading to multiple out labels.
So, how do I work around this? Is it really not possible to use labels in inline assembly? This tutorial on GCC inline assembly mentions that:
Thus, you can put your assembly into CPP macros and inline C functions, so anyone can use it as any C function/macro. Inline functions resemble macros very much, but are sometimes cleaner to use. Beware that in all those cases, code will be duplicated, so only local labels (of the 1: style) should be defined in that asm code.
I tried to find more information about these "local labels", but can't seem to find anything relating to inline assembly. It looks like the tutorial is saying that a local label is a number followed by a colon, (like 1:), so I tried using a label like that. Interestingly, the code compiled, but at run time it simply triggered a Segmentation Fault. Hmm...
So any suggestions, hints, answers...?
A declaration of a local label is indeed a number followed by a colon. But a reference to a local label needs a suffix of f or b, depending on whether you want to look forwards or backwards - i.e. 1f refers to the next 1: label in the forwards direction.
So declaring the label as 1: is correct; but to reference it, you need to say jmp 1f (because you are jumping forwards in this case).
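Applied to the snippet from the question, a minimal sketch (GCC inline assembly, same AT&T-style syntax as above) would be:

__asm__
(
    "jmp 1f;"   /* 1f = the next label named 1, searching forwards */
    "1:;"
    :
    :
);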
Well, this question isn't getting any younger, but there are two other interesting solutions.
1) This example uses %=. %= in an assembler template is replaced with a number that is "unique to each insn in the entire compilation. This is useful for making local labels that are referred to more than once in a given insn." Note that to use %=, you (apparently) must have at least one input (although you probably don't have to actually use it).
int a = 3;
asm (
"test %0, %0\n\t"
"jnz to_here%=\n\t"
"jz to_there%=\n\t"
"to_here%=:\n\t"
"to_there%=:"
::"r" (a));
This outputs:
test %eax, %eax
jnz to_here14
jz to_there14
to_here14:
to_there14:
Alternatively, you can use asm goto (added in GCC 4.5, I think). This actually lets you jump to C labels instead of just asm labels:
asm goto ("jmp %l0\n"
: /* no output */
: /* no input */
: /* no clobber */
: gofurther);
printf("Didn't jump\n");
// c label:
gofurther:
printf("Jumped\n");
I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contiguous in memory and that one address will map to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA API's to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
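A hedged sketch of the dbghelp route, assuming debug symbols (PDBs) are available; error handling is minimal and the helper name is my own:

#include <windows.h>
#include <dbghelp.h>
#include <string.h>
#pragma comment(lib, "dbghelp.lib")

// Returns nonzero if the instruction pointer maps to a symbol with the given name.
int ip_is_in_function(DWORD64 ip, const char *name)
{
    HANDLE process = GetCurrentProcess();
    if (!SymInitialize(process, NULL, TRUE))   /* load symbols for this process */
        return 0;

    char buffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = {0};
    SYMBOL_INFO *symbol = (SYMBOL_INFO *)buffer;
    symbol->SizeOfStruct = sizeof(SYMBOL_INFO);
    symbol->MaxNameLen = MAX_SYM_NAME;

    DWORD64 displacement = 0;
    int match = 0;
    if (SymFromAddr(process, ip, &displacement, symbol))
        match = (strcmp(symbol->Name, name) == 0);

    SymCleanup(process);
    return match;
}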
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
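For completeness, the comparison that goes with that trick usually looked something like the sketch below (same caveats apply; the uintptr_t casts avoid comparing unrelated function pointers directly):

#include <stdint.h>

// Only meaningful if the linker actually placed FooEnd right after Foo,
// which, as noted above, is not guaranteed.
int ip_in_foo(void *ip)
{
    uintptr_t p = (uintptr_t)ip;
    return p >= (uintptr_t)&Foo && p < (uintptr_t)&FooEnd;
}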
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff -- most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
The simplest solution is maintaining a state variable:
volatile int FOO_is_running = 0;
int Foo( int par ){
FOO_is_running = 1;
/* do the work */
FOO_is_running = 0;
return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.