Why does my call to
jit->lookup("test");
hit a failed assert: "Resolving symbol outside this responsibility set"?
It does this when I create my function as:
define double @test() {
begin:
ret double 1.343000e+01
}
But it works fine (i.e., finds it without an assert) when I create the function as
define void @test() {
begin:
ret void
}
It's not a case of the function "test" simply not being found: looking up a name that genuinely doesn't exist produces different behavior.
Here's the code that hits the assert:
ThreadSafeModule Create_M()
{
auto pCtx = make_unique<LLVMContext>();
LLVMContext& ctx = *pCtx;
auto pM = make_unique<Module>("myModule", ctx);
Module& M = *pM;
IRBuilder<> builder(ctx);
FunctionType* FT = FunctionType::get(Type::getDoubleTy(ctx),false);
Function* testFn = Function::Create(FT,
GlobalValue::LinkageTypes::ExternalLinkage, "test", M);
auto BB = BasicBlock::Create(ctx,"begin",testFn);
builder.SetInsertPoint(BB);
builder.CreateRet(ConstantFP::get(ctx,APFloat(13.43)));
outs() << M; // For debugging
return ThreadSafeModule(std::move(pM), std::move(pCtx));
}
int main()
{
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
// Create an LLJIT instance.
auto jit = ExitOnErr(LLJITBuilder().create());
auto M1 = Create_M();
ExitOnErr(jit->addIRModule(std::move(M1)));
auto testSym = ExitOnErr(jit->lookup("test"));
}
Replace the function creation with these lines and it doesn't have the problem:
FunctionType* FT = FunctionType::get(Type::getVoidTy(ctx),false);
Function* testFn = Function::Create(FT,
GlobalValue::LinkageTypes::ExternalLinkage, "test", M);
auto BB = BasicBlock::Create(ctx,"begin",testFn);
builder.SetInsertPoint(BB);
builder.CreateRetVoid();
I'd like to understand what the assert means, why it fires in the one case and not the other, and what I need to do to make the double (*)() case work. I did a lot of searching for documentation on LLVM responsibility sets and found almost nothing; there is some mention at https://llvm.org/docs/ORCv2.html, but not enough for me to interpret what this assert is telling me.
I'm using the SVN repository version of LLVM as of 20-Aug-2019, building on Visual Studio 2017 15.9.6.
To fix this error, add ObjectLinkingLayer.setAutoClaimResponsibilityForObjectSymbols(true);
Such as:
auto jit = ExitOnErr(LLJITBuilder()
.setJITTargetMachineBuilder(std::move(JTMB))
.setObjectLinkingLayerCreator([&](ExecutionSession &ES, const Triple &TT) {
auto ll = make_unique<ObjectLinkingLayer>(ES,
make_unique<jitlink::InProcessMemoryManager>());
ll->setAutoClaimResponsibilityForObjectSymbols(true);
return std::move(ll);
})
.create());
This was indeed a bug in ORC LLJIT on the Windows platform.
See the bug report here:
https://bugs.llvm.org/show_bug.cgi?id=44337
Fix commit reference:
https://github.com/llvm/llvm-project/commit/84217ad66115cc31b184374a03c8333e4578996f
For anyone building a custom JIT / compiler-layer stack by hand (not using LLJIT), all you need to do is force weak-symbol autoclaim when emitting COFF images:
if (JTMB.getTargetTriple().isOSBinFormatCOFF())
{
ObjectLayer.setAutoClaimResponsibilityForObjectSymbols(true);
}
http://llvm.org/doxygen/classllvm_1_1orc_1_1ObjectLinkingLayer.html#aa30bc825696d7254aef0fe76015d10ff
If set, this ObjectLinkingLayer instance will claim responsibility for
any symbols provided by a given object file that were not already in
the MaterializationResponsibility instance.
Setting this flag allows higher-level program representations (e.g.
LLVM IR) to be added based on only a subset of the symbols they
provide, without having to write intervening layers to scan and add
the additional symbols. This trades diagnostic quality for convenience
however: If all symbols are enumerated up-front then clashes can be
detected and reported early (and usually deterministically). If this
option is set, clashes for the additional symbols may not be detected
until late, and detection may depend on the flow of control through
JIT'd code. Use with care.
Related
I am playing with llvm (and antlr), working vaguely along the lines of the Kaleidoscope tutorial. I successfully created LLVM-IR code from basic arithmetic expressions both on top-level and as function definitions, which corresponds to the tutorial chapters up to 3.
Now I would like to incrementally add JIT support, starting with the top-level arithmetic expressions. Here is my problem:
Basic comparison makes it seem as if I follow the same sequence of function calls as the tutorial, only with a simpler code organization
The generated IR code looks good
The function definition is apparently found, since otherwise the code would exit (I verified this by intentionally looking up a misspelled function name)
However the call of the function pointer created by JIT evaluation always returns zero.
These snippets (excerpt) are executed as part of the antlr visitor of the main/entry-node of my grammar:
//Top node main -- top level expression
antlrcpp::Any visitMain(ExprParser::MainContext *ctx)
{
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();
llvm::InitializeNativeTargetAsmParser();
TheJIT = ExitOnErr( llvm::orc::KaleidoscopeJIT::Create() );
InitializeModuleAndPassManager();
// ... Code which visits the child nodes ...
}
InitializeModuleAndPassManager() is the same as in the tutorial:
static void InitializeModuleAndPassManager()
{
// Open a new context and module.
TheContext = std::make_unique<llvm::LLVMContext>();
TheModule = std::make_unique<llvm::Module>("commandline", *TheContext);
TheModule->setDataLayout(TheJIT->getDataLayout());
// Create a new builder for the module.
Builder = std::make_unique<llvm::IRBuilder<>>(*TheContext);
// Create a new pass manager attached to it.
TheFPM = std::make_unique<llvm::legacy::FunctionPassManager>(TheModule.get());
// Do simple "peephole" optimizations and bit-twiddling optzns.
TheFPM->add(llvm::createInstructionCombiningPass());
// Reassociate expressions.
TheFPM->add(llvm::createReassociatePass());
// Eliminate Common SubExpressions.
TheFPM->add(llvm::createGVNPass());
// Simplify the control flow graph (deleting unreachable blocks, etc).
TheFPM->add(llvm::createCFGSimplificationPass());
TheFPM->doInitialization();
}
This is the function which handles the top-level expression and which is also supposed to do JIT evaluation:
//Bare expression without function definition -- create anonymous function
antlrcpp::Any visitBareExpr(ExprParser::BareExprContext *ctx)
{
string fName = "__anon_expr";
llvm::FunctionType *FT = llvm::FunctionType::get(llvm::Type::getDoubleTy(*TheContext), false);
llvm::Function *F = llvm::Function::Create(FT, llvm::Function::ExternalLinkage, fName, TheModule.get());
llvm::BasicBlock *BB = llvm::BasicBlock::Create(*TheContext, "entry", F);
Builder->SetInsertPoint(BB);
llvm::Value* Expression=visit(ctx->expr()).as<llvm::Value* >();
Builder->CreateRet(Expression);
llvm::verifyFunction(*F);
//TheFPM->run(*F); // Commented out because I wanted to try JIT before optimization;
//it causes a compile error right now because I probably lack some related code.
//However, I do not assume that a missing optimization run causes the problem I'm seeing.
F->print(llvm::errs());
// Create a ResourceTracker to track JIT'd memory allocated to our
// anonymous expression -- that way we can free it after executing.
auto RT = TheJIT->getMainJITDylib().createResourceTracker();
auto TSM = llvm::orc::ThreadSafeModule(move(TheModule), move(TheContext));
ExitOnErr(TheJIT->addModule(move(TSM), RT));
InitializeModuleAndPassManager();
// Search the JIT for the __anon_expr symbol.
auto ExprSymbol = ExitOnErr(TheJIT->lookup("__anon_expr"));
// Get the symbol's address and cast it to the right type (takes no
// arguments, returns a double) so we can call it as a native function.
double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
double ret = FP();
fprintf(stderr, "Evaluated to %f\n", ret);
// Delete the anonymous expression module from the JIT.
ExitOnErr(RT->remove());
return F;
}
Now this is what happens as an example:
[robert#robert-ux330uak test4_expr_llvm_2]$ ./testmain '3*4'
define double @__anon_expr() {
entry:
ret float 1.200000e+01
}
Evaluated to 0.000000
I would be thankful for any ideas about what I might be doing wrong.
I am compiling C++ code with Clang. (Apple clang version 12.0.5 (clang-1205.0.22.11)).
Clang can give tips in case you misspell a variable:
#include <iostream>
int main() {
int my_int;
std::cout << my_it << std::endl;
}
spellcheck-test.cpp:5:18: error: use of undeclared identifier 'my_it'; did you mean 'my_int'?
std::cout << my_it << std::endl;
^~~~~
my_int
spellcheck-test.cpp:4:9: note: 'my_int' declared here
int my_int;
^
1 error generated.
My question is:
What is the criterion Clang uses to determine when to suggest another variable?
My experimentation suggests it is quite sophisticated:
If there is a second similarly named variable you might equally have meant (e.g. an additional int my_in;), it does not give a suggestion
If the suggested variable has the wrong type for the operation (e.g. by trying to print my_it.size() instead) it does not give a suggestion
Whether or not it gives the suggestion depends on a non-trivial comparison of variable names: it allows for both deletions and insertions of characters, and longer variable names allow for more insertion/deletions to be considered "similar".
You will not likely find it documented, but as Clang is open-source you can turn to the source to try to figure it out.
Clangd?
The particular diagnostic (from DiagnosticSemaKinds.td):
def err_undeclared_var_use_suggest : Error<
"use of undeclared identifier %0; did you mean %1?">;
is only ever referred to from clang-tools-extra/clangd/IncludeFixer.cpp:
// Try to fix unresolved name caused by missing declaration.
// E.g.
// clang::SourceManager SM;
// ~~~~~~~~~~~~~
// UnresolvedName
// or
// namespace clang { SourceManager SM; }
// ~~~~~~~~~~~~~
// UnresolvedName
// We only attempt to recover a diagnostic if it has the same location as
// the last seen unresolved name.
if (DiagLevel >= DiagnosticsEngine::Error &&
LastUnresolvedName->Loc == Info.getLocation())
return fixUnresolvedName();
Now, clangd is a language server and, to be honest, I don't know whether this is actually used by the Clang compiler frontend to yield certain diagnostics, but you're free to continue down the rabbit hole to tie these details together. The fixUnresolvedName above eventually performs a fuzzy search:
if (llvm::Optional<const SymbolSlab *> Syms = fuzzyFindCached(Req))
return fixesForSymbols(**Syms);
If you want to dig into the details, I would recommend starting with the fuzzyFindCached function:
llvm::Optional<const SymbolSlab *>
IncludeFixer::fuzzyFindCached(const FuzzyFindRequest &Req) const {
auto ReqStr = llvm::formatv("{0}", toJSON(Req)).str();
auto I = FuzzyFindCache.find(ReqStr);
if (I != FuzzyFindCache.end())
return &I->second;
if (IndexRequestCount >= IndexRequestLimit)
return llvm::None;
IndexRequestCount++;
SymbolSlab::Builder Matches;
Index.fuzzyFind(Req, [&](const Symbol &Sym) {
if (Sym.Name != Req.Query)
return;
if (!Sym.IncludeHeaders.empty())
Matches.insert(Sym);
});
auto Syms = std::move(Matches).build();
auto E = FuzzyFindCache.try_emplace(ReqStr, std::move(Syms));
return &E.first->second;
}
along with the type of its single function parameter, FuzzyFindRequest, in clangd/index/Index.h:
struct FuzzyFindRequest {
/// A query string for the fuzzy find. This is matched against symbols'
/// un-qualified identifiers and should not contain qualifiers like "::".
std::string Query;
/// If this is non-empty, symbols must be in at least one of the scopes
/// (e.g. namespaces) excluding nested scopes. For example, if a scope "xyz::"
/// is provided, the matched symbols must be defined in namespace xyz but not
/// namespace xyz::abc.
///
/// The global scope is "", a top level scope is "foo::", etc.
std::vector<std::string> Scopes;
/// If set to true, allow symbols from any scope. Scopes explicitly listed
/// above will be ranked higher.
bool AnyScope = false;
/// The number of top candidates to return. The index may choose to
/// return more than this, e.g. if it doesn't know which candidates are best.
llvm::Optional<uint32_t> Limit;
/// If set to true, only symbols for completion support will be considered.
bool RestrictForCodeCompletion = false;
/// Contextually relevant files (e.g. the file we're code-completing in).
/// Paths should be absolute.
std::vector<std::string> ProximityPaths;
/// Preferred types of symbols. These are raw representation of `OpaqueType`.
std::vector<std::string> PreferredTypes;
bool operator==(const FuzzyFindRequest &Req) const {
return std::tie(Query, Scopes, Limit, RestrictForCodeCompletion,
ProximityPaths, PreferredTypes) ==
std::tie(Req.Query, Req.Scopes, Req.Limit,
Req.RestrictForCodeCompletion, Req.ProximityPaths,
Req.PreferredTypes);
}
bool operator!=(const FuzzyFindRequest &Req) const { return !(*this == Req); }
};
Other rabbit holes?
The following commit may be another leg to start from:
[Frontend] Allow attaching an external sema source to compiler instance and extra diags to TypoCorrections
This can be used to append alternative typo corrections to an existing
diag. include-fixer can use it to suggest includes to be added.
Differential Revision: https://reviews.llvm.org/D26745
from which we may end up in clang/include/clang/Sema/TypoCorrection.h, which sounds like a feature more plausibly used by the compiler frontend than by the (clang extra tool) clangd. E.g.:
/// Gets the "edit distance" of the typo correction from the typo.
/// If Normalized is true, scale the distance down by the CharDistanceWeight
/// to return the edit distance in terms of single-character edits.
unsigned getEditDistance(bool Normalized = true) const {
if (CharDistance > MaximumDistance || QualifierDistance > MaximumDistance ||
CallbackDistance > MaximumDistance)
return InvalidDistance;
unsigned ED =
CharDistance * CharDistanceWeight +
QualifierDistance * QualifierDistanceWeight +
CallbackDistance * CallbackDistanceWeight;
if (ED > MaximumDistance)
return InvalidDistance;
// Half the CharDistanceWeight is added to ED to simulate rounding since
// integer division truncates the value (i.e. round-to-nearest-int instead
// of round-to-zero).
return Normalized ? NormalizeEditDistance(ED) : ED;
}
used in clang/lib/Sema/SemaDecl.cpp:
// Callback to only accept typo corrections that have a non-zero edit distance.
// Also only accept corrections that have the same parent decl.
class DifferentNameValidatorCCC final : public CorrectionCandidateCallback {
public:
DifferentNameValidatorCCC(ASTContext &Context, FunctionDecl *TypoFD,
CXXRecordDecl *Parent)
: Context(Context), OriginalFD(TypoFD),
ExpectedParent(Parent ? Parent->getCanonicalDecl() : nullptr) {}
bool ValidateCandidate(const TypoCorrection &candidate) override {
if (candidate.getEditDistance() == 0)
return false;
// ...
}
// ...
};
I would recommend checking out this ten-year-old blog post by Chris Lattner for a general idea of Clang's error-recovery mechanisms.
On Clang's Spell Checker, he writes:
One of the more visible things that Clang includes is a spell checker (also on reddit). The spell checker kicks in when you use an identifier that Clang doesn't know: it checks against other close identifiers and suggests what you probably meant.
...
Clang uses the well known Levenshtein distance function to compute the best match out of the possible candidates.
I’m trying to JIT compile some functions in an existing C/C++ program at runtime, but I’m running into some trouble with global variable initialization. Specifically, the approach I’ve taken is to use Clang to precompile the program into IR bitcode modules in addition to the executable. At runtime, the program loads the modules, transforms them (program specialization), compiles and executes them. As it turns out, I have some global variables that get initialized and modified during execution of the “host” program. Currently, these globals are also getting initialized in the JIT compiled code, whereas I’d like them to be mapped to the host global variables instead. Can someone help me with this?
A small repro is excerpted below. Full source code is here. The file somefunc.cpp gets precompiled during build, and is loaded in the main() function in testCompile.cpp. The global variable xyz is initialized to point to 25 in somefunc.cpp, but I’d like it to point to 10 as in main() instead. In other words, the assertion in main() should succeed.
I tried a few different ways to solve this problem. The ChangeGlobal() function attempts (unsuccessfully) to achieve this via updateGlobalMapping(). The second, more hacky approach uses a new global variable initialized appropriately. I can get this latter approach to work for some types of globals, but is there a more elegant approach?
//----------------- somefunc.h -----------------
extern int *xyz;
//----------------- somefunc.cpp -----------------
int abc = 25;
int *xyz = &abc;
int somefunc() {
return *xyz;
}
//----------------- testCompile.cpp -----------------
class JitCompiler {
public:
JitCompiler(const std::string module_file);
void LoadModule(const std::string& file);
template <typename FnType>
FnType CompileFunc(FnType fn, const std::string& fn_name);
void ChangeGlobal();
private:
std::unique_ptr<LLVMContext> context_;
Module *module_;
std::unique_ptr<ExecutionEngine> engine_;
};
void JitCompiler::ChangeGlobal() {
// ----------------- #1: UpdateGlobalMapping -----------------
//auto g = engine_->FindGlobalVariableNamed("xyz");
//engine_->updateGlobalMapping(g, &xyz);
//assert(engine_->getGlobalValueAddress("xyz") == (uint64_t) &xyz);
// ----------------- #2: Replace with new global -----------------
// ------- Ugly hack that works for globals of type T** ----------
auto g = engine_->FindGlobalVariableNamed("xyz");
Constant *addr_i = ConstantInt::get(*context_, APInt(64, (uint64_t) xyz));
auto addr = ConstantExpr::getIntToPtr(
addr_i, g->getType()->getPointerElementType());
GlobalVariable *n = new GlobalVariable(
*module_,
g->getType()->getPointerElementType(),
g->isConstant(),
g->getLinkage(),
addr,
g->getName() + "_new");
g->replaceAllUsesWith(n);
n->takeName(g);
g->eraseFromParent();
}
int main() {
xyz = new int (10);
JitCompiler jit("somefunc.bc");
jit.ChangeGlobal();
auto fn = jit.CompileFunc(&somefunc, "somefunc");
assert(somefunc() == fn());
}
A better approach is the combination of the two you presented: create a new global with external linkage, map it to &xyz via updateGlobalMapping(), and substitute it for the original:
auto g = engine_->FindGlobalVariableNamed("xyz");
GlobalVariable *n = new GlobalVariable(
    g->getType()->getPointerElementType(),
    g->isConstant(),
    GlobalValue::ExternalLinkage,
    nullptr, // no initializer; the mapping below supplies the address
    g->getName() + "_new");
engine_->updateGlobalMapping(n, &xyz);
g->replaceAllUsesWith(n);
n->takeName(g);
g->eraseFromParent();
Is it possible to write some f() template function that takes a type T and a pointer to member function of signature void(T::*pmf)() as (template and/or function) arguments and returns a const char* that points to the member function's __func__ variable (or to the mangled function name)?
EDIT: I am asked to explain my use-case. I am trying to write a unit-test library (I know there is a Boost Test library for this purpose). And my aim is not to use any macros at all:
struct my_test_case : public unit_test::test {
void some_test()
{
assert_test(false, "test failed.");
}
};
My test suite runner will call my_test_case::some_test() and if its assertion fails, I want it log:
ASSERTION FAILED (&my_test_case::some_test()): test failed.
I can use <typeinfo> to get the name of the class but the pointer-to-member-function is just an offset, which gives no clue to the user about the test function being called.
It seems like what you are trying to achieve is to get the name of the calling function in assert_test(). With gcc you can use backtrace to do that. Here is a naive example:
#include <iostream>
#include <string>
#include <cstdlib>
#include <execinfo.h>
#include <cxxabi.h>
namespace unit_test
{
struct test {};
}
std::string get_my_caller()
{
std::string caller("???");
void *bt[3]; // backtrace
char **bts; // backtrace symbols
size_t size = sizeof(bt)/sizeof(*bt);
int ret = -4;
/* get backtrace symbols */
size = backtrace(bt, size);
bts = backtrace_symbols(bt, size);
if (size >= 3) {
caller = bts[2];
/* demangle function name*/
char *name;
size_t pos = caller.find('(') + 1;
size_t len = caller.find('+') - pos;
name = abi::__cxa_demangle(caller.substr(pos, len).c_str(), NULL, NULL, &ret);
if (ret == 0)
caller = name;
free(name);
}
free(bts);
return caller;
}
void assert_test(bool expression, const std::string& message)
{
if (!expression)
std::cout << "ASSERTION FAILED " << get_my_caller() << ": " << message << std::endl;
}
struct my_test_case : public unit_test::test
{
void some_test()
{
assert_test(false, "test failed.");
}
};
int main()
{
my_test_case tc;
tc.some_test();
return 0;
}
Compiled with:
g++ -std=c++11 -rdynamic main.cpp -o main
Output:
ASSERTION FAILED my_test_case::some_test(): test failed.
Note: This is a gcc (linux, ...) solution, which might be difficult to port to other platforms!
TL;DR: It is not possible to do this in a reasonably portable way, other than using macros. Using debug symbols is a genuinely hard solution that will introduce maintenance and architecture problems down the road, and a bad one.
The names of functions, in any form, are not guaranteed to be stored in the binary [or anywhere else, for that matter]. Static free functions certainly don't have to expose their names to the rest of the world, and there is no real need for virtual member functions to have their names exposed either (except when the vtable is formed in A.c and the member function is in B.c).
It is also entirely permissible for the linker to remove ALL names of functions and variables. Names MAY be used by shared libraries to find functions not present in the binary, but the "ordinal" way can avoid that too, if the system is using that method.
I can't see any other solution than making assert_test a macro, and this is actually a GOOD use-case for macros. [Well, you could of course pass __func__ as an argument, but that's certainly NOT better than using macros in this limited case.]
Something like:
#define assert_test(x, y) do_assert_test(x, y, __func__)
and then implement do_assert_test to do what your original assert_test would do [less the impossible bit of figuring out the name of the function].
If it's unit tests, and you can be sure that you will always do this with debug symbols, you could solve it in a very non-portable way by building with debug symbols and then using the debug interface to find the name of the function you are currently in. The reason I say it's non-portable is that the debug API for a given OS is not standard - Windows does it one way, Linux another, and I'm not sure how it works in MacOS - and to make matters worse, my quick search on the subject seems to indicate that reading debug symbols doesn't have an API as such - there is a debug API that allows you to inspect the current process and figure out where you are, what the registers contain, etc, but not to find out what the name of the function is. So that's definitely a harder solution than "convince whoever needs to be convinced that this is a valid use of a macro".
I'm trying to figure out how to perform all optimizations on an LLVM Module (e.g., all -O3 optimizations). I've tried the following but I'm not sure that all possible optimizations are being applied (e.g., inlining).
//take string "llvm" (LLVM IR) and return "output_llvm" (optimized LLVM IR)
static string optimize(string llvm) {
LLVMContext &ctx = getGlobalContext();
SMDiagnostic err;
Module *ir = ParseIR(MemoryBuffer::getMemBuffer(llvm), err, ctx);
PassManager *pm = new PassManager();
PassManagerBuilder builder;
builder.OptLevel = 3;
builder.populateModulePassManager(*pm);
pm->run(*ir);
delete pm;
string output_llvm;
raw_string_ostream buff(output_llvm);
ir->print(buff, NULL);
return output_llvm;
}
Is there anything else I can do to improve the performance of the output LLVM IR?
EDIT: I have tried to add all of the optimizations from the AddOptimizationPasses() function in opt.cpp, as shown below:
PassManager *pm = new PassManager();
int optLevel = 3;
int sizeLevel = 0;
PassManagerBuilder builder;
builder.OptLevel = optLevel;
builder.SizeLevel = sizeLevel;
builder.Inliner = createFunctionInliningPass(optLevel, sizeLevel);
builder.DisableUnitAtATime = false;
builder.DisableUnrollLoops = false;
builder.LoopVectorize = true;
builder.SLPVectorize = true;
builder.populateModulePassManager(*pm);
pm->run(*module);
Also, I create a FunctionPassManager before I create the PassManager and add several passes like so:
FunctionPassManager *fpm = new FunctionPassManager(module);
// add several passes
fpm->doInitialization();
for (Function &f : *module)
fpm->run(f);
fpm->doFinalization();
However, the performance is the same as running on the command line with -O1 whereas I can get much better performance on the command line using -O3. Any suggestions?
Follow the logic in the function AddOptimizationPasses in opt.cpp. This is the source of truth.
While looking into LLVM optimization I found this information on pass ordering, and I think it potentially explains why someone might encounter this situation.
Depending on your language and the optimizations you're expecting, you may need to tune your optimization passes to your use cases. In particular, the ordering of those passes may matter. For example, if your better -O3 result came from optimizing completely unoptimized code, or code your program had already partially optimized, you may simply need to re-order or duplicate some passes to get the expected final result.
Given the specific wording here, and the fact that Eli's answer was accepted, I'm not 100% sure this is what the OP was seeing, but this knowledge may be helpful for others with similar questions who find this answer like I did.