I'm trying to figure out how to perform all optimizations on an LLVM Module (e.g., all -O3 optimizations). I've tried the following but I'm not sure that all possible optimizations are being applied (e.g., inlining).
//take string "llvm" (LLVM IR) and return "output_llvm" (optimized LLVM IR)
static string optimize(string llvm) {
LLVMContext &ctx = getGlobalContext();
SMDiagnostic err;
Module *ir = ParseIR(MemoryBuffer::getMemBuffer(llvm), err, ctx);
PassManager *pm = new PassManager();
PassManagerBuilder builder;
builder.OptLevel = 3;
builder.populateModulePassManager(*pm);
pm->run(*ir);
delete pm;
string output_llvm;
raw_string_ostream buff(output_llvm);
ir->print(buff, NULL);
return output_llvm;
}
Is there anything else I can do to improve the performance of the output LLVM IR?
EDIT: I have tried to add all of the optimizations from the AddOptimizationPasses() function in opt.cpp, as shown below:
PassManager *pm = new PassManager();
int optLevel = 3;
int sizeLevel = 0;
PassManagerBuilder builder;
builder.OptLevel = optLevel;
builder.SizeLevel = sizeLevel;
builder.Inliner = createFunctionInliningPass(optLevel, sizeLevel);
builder.DisableUnitAtATime = false;
builder.DisableUnrollLoops = false;
builder.LoopVectorize = true;
builder.SLPVectorize = true;
builder.populateModulePassManager(*pm);
pm->run(*module);
Also, I create a FunctionPassManager before I create the PassManager and add several passes like so:
FunctionPassManager *fpm = new FunctionPassManager(module);
// add several passes
fpm->doInitialization();
for (Function &f : *ir)
fpm->run(f);
fpm->doFinalization();
However, the performance is the same as running on the command line with -O1 whereas I can get much better performance on the command line using -O3. Any suggestions?
Follow the logic in the function AddOptimizationPasses in opt.cpp. This is the source of truth.
While looking into LLVM optimization I found this information on pass ordering and I think it's potentially telling for why someone might encounter this situation.
Depending on your language and the optimizations you're expecting, you may need to tune your optimization passes to your use case. In particular, the ordering of those passes may be important. For example, depending on whether your better -O3 run was optimizing completely unoptimized code or code that your program had already partially optimized, you may just need to reorder or duplicate some passes in order to get the expected final result.
Given the specific wording here and the fact that Eli's answer was accepted I'm not 100% sure if this is what the OP was seeing but this knowledge may be helpful for others with similar questions who find this answer like I did.
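To make the point about pass ordering concrete, here is a minimal sketch on a toy expression IR. It is not LLVM code: `Node`, `inlinePass`, `foldPass` and the function `f()` with body `1 + 2` are all made up for illustration. It only shows why running the same two passes in a different order can leave optimization opportunities on the table.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy expression IR (hypothetical, not LLVM): constants, additions, and calls
// to a single known function f() whose body is the expression 1 + 2.
struct Node {
    enum Kind { Const, Add, CallF } kind = Const;
    int value = 0;                   // used by Const
    std::unique_ptr<Node> lhs, rhs;  // used by Add
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeConst(int v) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Const;
    n->value = v;
    return n;
}

NodePtr makeAdd(NodePtr a, NodePtr b) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Add;
    n->lhs = std::move(a);
    n->rhs = std::move(b);
    return n;
}

NodePtr makeCallF() {
    auto n = std::make_unique<Node>();
    n->kind = Node::CallF;
    return n;
}

// "Inlining" pass: replace every call to f() with a copy of its body, 1 + 2.
void inlinePass(NodePtr &n) {
    if (n->kind == Node::CallF) {
        n = makeAdd(makeConst(1), makeConst(2));
    } else if (n->kind == Node::Add) {
        inlinePass(n->lhs);
        inlinePass(n->rhs);
    }
}

// "Constant folding" pass: collapse Add(Const, Const) into one Const, bottom-up.
void foldPass(NodePtr &n) {
    if (n->kind != Node::Add) return;
    foldPass(n->lhs);
    foldPass(n->rhs);
    if (n->lhs->kind == Node::Const && n->rhs->kind == Node::Const)
        n = makeConst(n->lhs->value + n->rhs->value);
}

int countNodes(const NodePtr &n) {
    if (n->kind == Node::Add)
        return 1 + countNodes(n->lhs) + countNodes(n->rhs);
    return 1;
}

// The program being optimized: f() + 4.
NodePtr makeProgram() { return makeAdd(makeCallF(), makeConst(4)); }

int sizeAfterFoldThenInline() {
    NodePtr p = makeProgram();
    foldPass(p);    // nothing to fold yet: the call hides the constants
    inlinePass(p);  // exposes (1 + 2) + 4, but no fold runs afterwards
    return countNodes(p);
}

int sizeAfterInlineThenFold() {
    NodePtr p = makeProgram();
    inlinePass(p);  // exposes (1 + 2) + 4
    foldPass(p);    // folds everything down to a single constant
    return countNodes(p);
}
```

In this toy, folding before inlining leaves five nodes, while inlining before folding (or simply running the fold pass a second time) leaves one. Real pipelines are far more involved, but the same effect is one reason the default pipelines repeat some passes.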
I am in the process of updating a NodeJS package, due to a breakage in NodeJS 14. This library makes use of C++ code. In NodeJS 12 the same code appears as a deprecation warning:
warning: ‘v8::Local<v8::Value> v8::Object::Get(v8::Local<v8::Value>)’ is deprecated: Use maybe version
With the code in question being:
v8::Local<v8::Object> options = v8::Local<v8::Object>::Cast(info[0]);
v8::Local<v8::Value> debug = options->Get(Nan::New<v8::String>("debug").ToLocalChecked());
if (true) {
v8::Local<v8::Value> leds = options->Get(Nan::New<v8::String>("leds").ToLocalChecked());
if (!leds->IsUndefined())
ws2811.channel[0].count = Nan::To<int>(leds).FromMaybe(ws2811.channel[0].count);
else
return Nan::ThrowTypeError("configure(): leds must be defined");
}
I did try the following, and while it does compile, the runtime behavior suggests it may be wrong, since I get a failure that didn't exist before this code change:
v8::Local<v8::Object> options = v8::Local<v8::Object>::Cast(info[0]);
Nan::MaybeLocal<v8::Value> debug = Nan::Get(options, Nan::New<v8::String>("debug").ToLocalChecked());
if (true) {
Nan::MaybeLocal<v8::Value> maybe_leds = Nan::Get(options, Nan::New<v8::String>("leds").ToLocalChecked());
v8::Local<v8::Value> leds;
if (!maybe_leds.IsEmpty() && maybe_leds.ToLocal(&leds))
ws2811.channel[0].count = Nan::To<int>(leds).FromMaybe(ws2811.channel[0].count);
else
return Nan::ThrowTypeError("configure(): leds must be defined");
}
Being pretty rusty with C++ and new to V8, I am a little confused as to what the right replacement for the Get method is, in this context. What I do think I understand is that we need to use MaybeLocal instead of Local. Doing a search turns up a lot of other people with similar issues, but nothing that I can use as a solution.
BTW this project does depend on nan.
The key insight is that most operations that involve JavaScript can throw an exception instead of returning a value. The MaybeLocal convention makes this explicit: a MaybeLocal is either a Local (if the function/operation returned a value), or empty (if there was an exception). If you have a v8::TryCatch or Nan::TryCatch, it will have caught the exception in the latter case.
There are several ways for embedding code to deal with MaybeLocals; the most elegant is the bool-returning .ToLocal(...) method. It is equivalent to checking .IsEmpty(), so you don't need to do both.
So that would give you:
Nan::TryCatch try_catch;
v8::Local<v8::String> leds_string = Nan::New<v8::String>("leds").ToLocalChecked();
Nan::MaybeLocal<v8::Value> maybe_leds = Nan::Get(options, leds_string);
v8::Local<v8::Value> leds;
if (!maybe_leds.ToLocal(&leds)) {
// An exception was thrown while reading `options["leds"]`.
// This is the same as `maybe_leds.IsEmpty() == true`.
// It is also the same as `try_catch.HasCaught() == true`.
return try_catch.ReThrow();
}
// Now follows your original code.
if (leds->IsUndefined()) {
// The `options` object didn't have a `leds` property, or it was undefined.
return Nan::ThrowTypeError("configure(): leds must be defined");
}
// Success case: all good.
ws2811.channel[0].count = Nan::To<int>(leds).FromMaybe(ws2811.channel[0].count);
See documentation at https://github.com/nodejs/nan/blob/master/doc/maybe_types.md.
With some more digging, this seems to be the right approach. It is essentially pieced together based on the Nan docs and snippets I found around the net:
v8::Local<v8::Object> options = v8::Local<v8::Object>::Cast(info[0]);
v8::MaybeLocal<v8::Value> debug = Nan::Get(options, Nan::New<v8::String>("debug").ToLocalChecked());
if (Nan::Has(options, Nan::New<v8::String>("leds").ToLocalChecked()).ToChecked()) {
Nan::MaybeLocal<v8::Value> maybe_leds = Nan::Get(options, Nan::New<v8::String>("leds").ToLocalChecked());
v8::Local<v8::Value> leds;
if (maybe_leds.ToLocal(&leds))
ws2811.channel[0].count = Nan::To<int>(leds).FromMaybe(ws2811.channel[0].count);
else
return Nan::ThrowTypeError("configure(): leds must be defined");
}
It would appear the Nan::Has check is important, otherwise the code does not seem to behave correctly.
I've got code that parses a file and stops if invalid conditions are met.
The code is in C++ and goes like this:
bool ok = true;
if (task1() == false)
ok = false;
if (ok && (task2() == false))
ok = false;
if (ok && (task3() == false))
ok = false;
cleanup();
return ok;
Now I'm looking into cleaner alternatives to get the same result.
As far as I see there are:
using a flag and many conditions as in the code above
There are many redundant tests for the same information.
The effect on the runtime will be negligible and probably
entirely removed by the compiler but it still makes the code
more complicated.
you could wrap the tasks in a method and return from it
This looks much cleaner but you spread your code over multiple
functions. Depending on your context there might be a long
list of parameters. Furthermore, it is also not ideal to
spread returns all over the method.
you could use exceptions
This will give some quite descriptive code but it is also
heavy as you just want to skip some calls. Furthermore, it
might not be exactly an exceptional case.
you could break from a do while(0) or another loop
or switch statement.
Well, it is not really meant for such a task. Other than
that you get a lightweight and compact implementation with
a descriptive keyword.
using a goto statement
That seems to combine most advantages. Still, I am unsure.
People everywhere state that breaking out of multiple loops
is the only remaining sensible use for this keyword.
I didn't find a discussion about such code. Which implementations are generally suggested? Is this case mentioned in any C++ coding guidelines? Are there other practical options?
Edit: My goal does not seem to be clear. I'm looking for the best way to break out of a procedure, not for a way to call three methods. The described problem is more of an example. I'm interested in arguments for and against different syntaxes to do this.
In the actual code, each method is a placeholder for a couple of lines of code that are similar but differ from each other. There are maybe 50 code blocks. An old code block looked like the following (I know that there are more things to optimize than just the objective of this question):
if (ok)
{
keyString = "speed";
tmpVal = configuration->getValue(section, keyString, -1);
if (tmpVal != -1)
{
m_speed = tmpVal;
if (m_speed < m_minSpeed)
m_minSpeed = m_speed;
m_needSpeed = true;
}
else if (m_needSpeed)
{
ok = false;
keyErr = keyString;
}
}
Assuming that all of these functions return a bool, it looks to me like the shown code is logically identical to:
bool ok = task1() && task2() && task3();
cleanup();
return ok;
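A small sketch of why this is equivalent: `&&` evaluates left to right and stops at the first operand that is false, so later tasks are skipped exactly as with the `ok` flag. The task bodies and the `callLog` string below are placeholders invented for the demonstration; only the shape of `runAll` matters.

```cpp
#include <cassert>
#include <string>

// Records which placeholder tasks actually ran (illustration only).
static std::string callLog;

bool task1() { callLog += "1"; return true; }
bool task2() { callLog += "2"; return false; }  // simulated failure
bool task3() { callLog += "3"; return true; }   // must not run after task2 fails
void cleanup() { callLog += "c"; }

bool runAll() {
    // && evaluates left to right and stops at the first false operand.
    bool ok = task1() && task2() && task3();
    cleanup();  // always runs, as in the original flag-based version
    return ok;
}
```

After one call to `runAll()` the log contains `12c`: `task3` was never called, and `cleanup` still ran.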
Why does my call to
jit->lookup("test");
hit a failed assert: "Resolving symbol outside this responsibility set"?
It does this when I create my function as:
define double @test() {
begin:
ret double 1.343000e+01
}
But it works fine (i.e., finds it without an assert) when I create the function as
define void @test() {
begin:
ret void
}
It is not a case of not finding the function "test"; the behavior is different if I look up a name that doesn't exist.
Here's the code that hits the assert:
ThreadSafeModule Create_M()
{
auto pCtx = make_unique<LLVMContext>();
LLVMContext& ctx = *pCtx;
auto pM = make_unique<Module>("myModule", ctx);
Module& M = *pM;
IRBuilder<> builder(ctx);
FunctionType* FT = FunctionType::get(Type::getDoubleTy(ctx),false);
Function* testFn = Function::Create(FT,
GlobalValue::LinkageTypes::ExternalLinkage, "test", M);
auto BB = BasicBlock::Create(ctx,"begin",testFn);
builder.SetInsertPoint(BB);
builder.CreateRet(ConstantFP::get(ctx,APFloat(13.43)));
outs() << M; // For debugging
return ThreadSafeModule(std::move(pM), std::move(pCtx));
}
int main()
{
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
// Create an LLJIT instance.
auto jit = ExitOnErr(LLJITBuilder().create());
auto M1 = Create_M();
ExitOnErr(jit->addIRModule(std::move(M1)));
auto testSym = ExitOnErr(jit->lookup("test"));
}
Replace the function creation with these lines and it doesn't have the problem:
FunctionType* FT = FunctionType::get(Type::getVoidTy(ctx),false);
Function* testFn = Function::Create(FT,
GlobalValue::LinkageTypes::ExternalLinkage, "test", M);
auto BB = BasicBlock::Create(ctx,"begin",testFn);
builder.SetInsertPoint(BB);
builder.CreateRetVoid();
I'd like to understand what the assert means, why it asserts in the one case and not the other, and what I need to do to get the double (*)() case to work. I did a lot of searching for documentation on LLVM responsibility sets, and found almost nothing. There is some mention at https://llvm.org/docs/ORCv2.html, but not enough for me to interpret what it is telling me with this assert.
I'm using the SVN repository version of LLVM as of 20-Aug-2019, building on Visual Studio 2017 15.9.6.
To fix this error, add ObjectLinkingLayer.setAutoClaimResponsibilityForObjectSymbols(true);
Such as:
auto jit = ExitOnErr(LLJITBuilder()
.setJITTargetMachineBuilder(std::move(JTMB))
.setObjectLinkingLayerCreator([&](ExecutionSession &ES, const Triple &TT) {
auto ll = make_unique<ObjectLinkingLayer>(ES,
make_unique<jitlink::InProcessMemoryManager>());
ll->setAutoClaimResponsibilityForObjectSymbols(true);
return move(ll);
})
.create());
This was indeed a bug in ORC LLJIT on the Windows platform.
See bug record here:
https://bugs.llvm.org/show_bug.cgi?id=44337
Fix commit reference:
https://github.com/llvm/llvm-project/commit/84217ad66115cc31b184374a03c8333e4578996f
For anyone building a custom JIT / compiler-layer stack by hand (not using LLJIT), all you need to do is force weak symbol autoclaim when emitting COFF images.
if (JTMB.getTargetTriple().isOSBinFormatCOFF())
{
ObjectLayer.setAutoClaimResponsibilityForObjectSymbols(true);
}
http://llvm.org/doxygen/classllvm_1_1orc_1_1ObjectLinkingLayer.html#aa30bc825696d7254aef0fe76015d10ff
If set, this ObjectLinkingLayer instance will claim responsibility for
any symbols provided by a given object file that were not already in
the MaterializationResponsibility instance.
Setting this flag allows higher-level program representations (e.g.
LLVM IR) to be added based on only a subset of the symbols they
provide, without having to write intervening layers to scan and add
the additional symbols. This trades diagnostic quality for convenience
however: If all symbols are enumerated up-front then clashes can be
detected and reported early (and usually deterministically). If this
option is set, clashes for the additional symbols may not be detected
until late, and detection may depend on the flow of control through
JIT'd code. Use with care.
I'm writing some audio code where basically everything is a tiny loop. Branch prediction failures, as I understand them, are a big enough performance issue that I struggle to keep the code branch-free. But that can only take me so far, which got me wondering about the different kinds of branching.
In C++, the conditional branch to fixed target:
int cond_fixed(bool p) {
if (p) return 10;
return 20;
}
And (if I understand this question correctly), the unconditional branch to variable target:
struct base {
virtual int foo() = 0;
};
struct a : public base {
int foo() { return 10; }
};
struct b : public base {
int foo() { return 20; }
};
int uncond_var(base* p) {
return p->foo();
}
Are there performance differences? It seems to me that if one of the two methods were obviously faster than the other, the compiler would simply transform the code to match.
For those cases where branch prediction is of very high importance, what details regarding performance are useful to know?
EDIT: The actual operation of p ? 10 : 20 is merely a placeholder. The actual operation following the branch is at least complex enough that doing both is inefficient. Additionally, if I had enough information to sensibly use __builtin_expect, branch prediction would be a non-issue in this case.
Side note: if you have a code like
if (p) a = 20; else a = 10;
then there usually isn't any branch: an optimizing compiler will typically use a conditional move (see: Why is a conditional move not vulnerable for Branch Prediction Failure?)
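Whether a conditional move is actually emitted depends on the compiler, the flags, and the target, so it is worth checking the assembly. As a rough sketch, the selection can also be written so that there is no branch in the source at all. The function names here are made up for the example.

```cpp
#include <cassert>

// Straightforward form: the compiler is free to emit a branch or a
// conditional move here; inspect the generated assembly to find out.
int select_ternary(bool p) {
    return p ? 20 : 10;
}

// Arithmetic form: no branch in the source. A bool converts to 0 or 1,
// so the result is 10 when p is false and 20 when p is true.
int select_arith(bool p) {
    return 10 + 10 * static_cast<int>(p);
}

// Mask form, useful when the two values are not related by simple arithmetic.
int select_mask(bool p, int if_true, int if_false) {
    // 0u - 1u wraps to all ones; 0u - 0u stays all zeros.
    const unsigned mask = 0u - static_cast<unsigned>(p);
    return static_cast<int>((static_cast<unsigned>(if_true) & mask) |
                            (static_cast<unsigned>(if_false) & ~mask));
}
```

These forms only pay off when both values are cheap to compute. As the question's edit notes, they do not help when the work behind each alternative is expensive.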
You didn't mention your compiler. I once used GCC for a performance-critical application (a contest at my university, actually) and I remember that GCC has the __builtin_expect builtin. I went through all the conditions in my code and ended up with a 5-10% speedup, which I found amazing, given that I had paid attention to pretty much everything I knew (memory layout etc.) and that I didn't change anything about the algorithm itself.
The algorithm was a pretty basic depth-first search, by the way. And I ran it on a Core 2 Duo, not sure which one though.
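For reference, a minimal sketch of how such hints are commonly wrapped. `__builtin_expect(expr, expected)` is the GCC/Clang builtin. The `LIKELY`/`UNLIKELY` macro names and the `sum_samples` function are conventions and examples chosen here, not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// __builtin_expect(expr, expected) is a GCC/Clang builtin; on other compilers
// these macros fall back to the plain condition.
#if defined(__GNUC__) || defined(__clang__)
#  define LIKELY(x)   (__builtin_expect(!!(x), 1))
#  define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

// Sums the samples, treating negative values as a rare error case.
// Returns false on the first negative sample, leaving *out untouched.
bool sum_samples(const int *samples, std::size_t n, long *out) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (UNLIKELY(samples[i] < 0))
            return false;  // hinted as the cold path
        total += samples[i];
    }
    *out = total;
    return true;
}
```

The hint mainly influences code layout at compile time. Whether it yields a measurable speedup depends on the code and the CPU, so it is best confirmed with a benchmark.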
I have a "MyFunction" and I keep obsessing over whether I should or shouldn't use goto in it and in similar (hopefully rare) circumstances. So I'm trying to establish a hard-and-fast habit for this situation. To do or not to do.
int MyFunction()
{ if (likely_condition)
{
condition_met:
// ...
return result;
}
else /*unlikely failure*/
{ // meet condition
goto condition_met;
}
}
I was intending to net the benefits of the failed conditional jump instruction for the likely case. However I don't see how the compiler could know which to streamline for case probability without something like this.
1. It works, right?
2. Are the benefits worth the confusion?
3. Are there better (less verbose, more structured, more expressive) ways to enable this optimization?
It appears to me that the optimization you're trying to do is mostly obsolete. Most modern processors have branch prediction built in, so (assuming it's used enough to notice) they track how often a branch is taken or not and predict whether the branch is likely to be taken or not based on its past pattern of being taken or not. In this case, speed depends primarily on how accurate that prediction is, not whether the prediction is for taken vs. not taken.
As such, you're probably best off with rather simpler code:
int MyFunction() {
if (!likely_condition) {
meet_condition();
}
// ...
return result;
}
A modern CPU will take that branch either way with equal performance if it makes the correct branch prediction. So if that is in an inner loop, the performance of if (unlikely) { meet condition } common code; will match what you have written.
Also, if you spell out the common code in both branches the compiler will generate code that is identical to what you have written: The common case will be emitted for the if clause and the else clause will jmp to the common code. You see this all the time with simpler terminal cases like *out = whatever; return result;. When debugging it can be hard to tell which return you're looking at because they've all been merged.
1. It looks like the code should work as you expect as long as condition_met: doesn't skip variable initializations.
2. No, and you don't even know that the obfuscated version compiles into more optimal code. Compiler optimizations (and processor branch prediction) are getting very smart in recent times.
3.
int MyFunction()
{
if (!likely_condition)
{
// meet condition
}
condition_met:
// ...
return result;
}
or, if it helps your compiler (check the assembly)
int MyFunction()
{
if (likely_condition); else
{
// meet condition
}
condition_met:
// ...
return result;
}
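On the caveat about skipped initializations: in C++ a goto may jump into a block, but not past the initialization of a variable that is still in scope at the label. Here is a minimal sketch of the question's shape that stays well-formed. The `likely_condition` parameter, `compute` and `base` are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-in for the work done once the condition holds.
int compute(int base) { return base * 2; }

int MyFunction(bool likely_condition) {
    int base = 1;  // initialized before either path, outside the if
    if (likely_condition) {
        // int scale = 2;  // ill-formed if uncommented: the goto below would
        //                 // jump past this initialization into its scope
condition_met:
        return compute(base);
    } else {
        base = 5;  // "meet condition" on the unlikely path
        goto condition_met;
    }
}
```

Declaring everything the shared tail needs before the `if`, as done with `base` here, is what keeps the jump legal.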
I would highly recommend using the __builtin_expect() builtin (GCC) or the equivalent for your particular C++ compiler (see Portable branch prediction hints) instead of using goto. Note that it takes two arguments, the condition and its expected value:
int MyFunction()
{ if (__builtin_expect(likely_condition, 1))
{
// ...
return result;
}
else /*unlikely failure*/
{ // meet condition
}
}
As others have also mentioned, goto is error-prone and evil to the bone.