Segfault when dereferencing iterator - C++

I'm following the example here. My complete code is
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/IR/InstIterator.h"
using namespace llvm;
namespace {
  struct Hello2 : public FunctionPass {
    static char ID; // Pass identification, replacement for typeid
    Hello2() : FunctionPass(ID) {}

    bool runOnFunction(Function &F) override {
      inst_iterator iter = inst_begin(F); // causes a segfault
      return false;
    }
  };
}

char Hello2::ID = 0;
static RegisterPass<Hello2> Y("hello2", "Hello World Pass");
following the "HelloWorld" pass example explained here. When I followed the example exactly, it worked fine, but with the modified pass code above, I get the segfault when I run opt.
(I'm using the same "hello.c" file for input as in the "HelloWorld" pass example, compiling it with clang, running make and calling my library with opt just as in the example.)
What's causing my segfault, and is there any way to test for it/avoid it?
EDIT
I traced the segfault to line 61 of InstIterator.h: BI = BB->begin();. When my code reaches that point, BBs is non-NULL, but BB=BBs->begin() is NULL. Thus the dereference of BB results in a segfault. The question of why BB is NULL (and why the constructor doesn't check for that) remains.
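For reference, the loop I was building toward looks roughly like this (my own sketch, not the original example; errs() additionally needs the llvm/Support/raw_ostream.h header):

#include "llvm/Support/raw_ostream.h"

bool runOnFunction(Function &F) override {
  // Walk every instruction in the function; dereferencing the iterator
  // yields the current Instruction.
  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) {
    errs() << *I << "\n";
  }
  return false;
}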

This problem was resolved when my system did an automatic update and got a new version of several llvm-3.5 packages (today). Previously, opt --version returned
LLVM version 3.5
Optimized build.
Built Mar 23 2014 (21:41:30).
Default target: x86_64-pc-linux-gnu
Host CPU: corei7
now it returns
LLVM version 3.5.0
Optimized build.
Built Jan 27 2015 (00:14:48).
Default target: x86_64-pc-linux-gnu
Host CPU: corei7

Related

How can I speed up compile times when developing a Clang-based tool?

I'm developing a Clang-based tool
(i.e., a ClangTool),
but the time to recompile a source file is longer than I would like.
Specifically, for the small example program below, it takes about 7
seconds to compile this one file, and I've only #included a bare
minimum. Once I start pulling in AST matchers, compile times climb to
20 seconds and more. That's a lot of friction in the development cycle,
especially when learning a new API and hence fixing a large number of
syntax errors.
I found some slides called
Building Clang/LLVM efficiently
by Tilmann Scheller from a talk at EuroLLVM 2015 which show modest
improvement (less than a factor of 2) by changing which compiler is used
(specifically, compiling with a clang that has been compiled by
gcc). I'm ignoring the linking improvements because linking is
tolerably fast in my case.
I tried using GCC precompiled headers
(PCH), but the results are mixed (details below). While compilation with PCH takes
about half as long, generating the PCH in the first place takes more
than twice as long as ordinary compilation, produces a huge PCH file
(300-500 MB), and is somewhat of a hassle to use (I have to reorganize
#includes across my project, manage additional dependencies and files,
and carefully monitor the process to be sure the PCHs are being used).
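For reference, the GCC PCH workflow I tried looks roughly like this (the header name is my own placeholder; the flags must match between the two steps or GCC ignores the PCH):

# 1. Collect the expensive clang/llvm #includes into one header, e.g. tool-pch.h.
# 2. Precompile it with exactly the same flags as the real compile:
g++ -x c++-header tool-pch.h -o tool-pch.h.gch $(llvm-config --cxxflags)
# 3. Compile as usual; GCC picks up tool-pch.h.gch automatically when
#    tool-pch.h is the first #include and the flags match.
g++ -c -o print-tu.o print-tu.cc $(llvm-config --cxxflags)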
Ideally, I would like to get compile times under one second. After all,
my source file is only about 100 lines of code (LOC). Now, the
preprocessor output has about 200K LOC, and the word template appears
in over 5000 of those lines, so I'm fully aware the compiler is doing
more than is immediately apparent. But my code is not directly using
more than a tiny fraction of that.
As a point of comparison, when using the Python bindings, it takes about
0.05 seconds to run a script of comparably minimal complexity. I
understand that the C++ bindings have richer functionality, but I'm not
using (yet?) anything beyond what the Python bindings have, yet suffer
orders of magnitude more delay due to using C++ instead of Python. (No,
I don't want to switch to Python.)
Might there be, or could one reasonably create, an alternative set of
C++ bindings with capabilities similar to the Python interface but
vastly improved compilation time? Or is there some other feasible way
to hit the goal of compilation in under a second?
I am aware of libclang,
the C bindings for Clang. But I don't want to write my tool in C any
more than I want to write it in Python (that is, not at all). If there
were something like libclang, but with an idiomatic C++ interface,
that might solve the problem, but I don't see anything like that.
Some measurements of different variations (each the median of 5 runs) with GCC-9.3:
Options      No PCH   Gen PCH   Use PCH   PCH Size
g++          6.8 s    14 s      2.5 s     342 MB
g++ -O2      6.9 s    14 s      2.5 s     345 MB
g++ -g       8.6 s    18 s      4.1 s     472 MB
g++ -g -O2   8.9 s    18 s      4.3 s     478 MB
The g++ -g configuration is the one I care most about.
@HolyBlackCat suggested trying with clang as the compiler. Here are the times for Clang+LLVM-14.0.0:
Options        No PCH   Gen PCH   Use PCH   PCH Size
clang          5.6 s    5.1 s     2.3 s     62 MB
clang -O2      5.6 s    5.2 s     2.5 s     62 MB
clang -g       5.6 s    5.2 s     2.4 s     62 MB
clang -g -O2   5.7 s    5.1 s     2.6 s     62 MB
That's certainly an improvement over g++, and may well be what I do going forward, although it falls short of my sub-second goal.
For completeness, I also measured GCC-12.1, and found it to be about 2% worse than GCC-9.3 for all measured times.
My setup:
Using binary distribution of clang+llvm-14.0.0 downloaded from github.com.
Compiling with GCC-9.3.0 on Linux Mint 20.1 for x86_64.
Quad-core Intel i5-6600K CPU @ 3.50 GHz.
8 GB RAM.
Linux is running inside VMware Workstation 12. (The VM is not the
problem. Many C++ projects compile very quickly in the VM, even
faster than compilation under the Windows host.)
Host is Windows 10 with 32 GB physical RAM.
Source file to compile:
// print-tu.cc
// Print contents of a Translation Unit.
// clang
#include "clang/AST/ASTConsumer.h" // clang::ASTConsumer
#include "clang/AST/ASTContext.h" // clang::ASTContext
#include "clang/AST/DeclBase.h" // clang::Decl
#include "clang/AST/DeclGroup.h" // clang::DeclGroupRef
#include "clang/Basic/SourceLocation.h" // clang::SourceLocation
#include "clang/Frontend/CompilerInstance.h" // clang::CompilerInstance
#include "clang/Frontend/CompilerInvocation.h" // clang::CompilerInvocation
#include "clang/Frontend/FrontendAction.h" // clang::FrontendAction, clang::ASTFrontendAction
#include "clang/Tooling/CommonOptionsParser.h" // clang::tooling::CommonOptionsParser
#include "clang/Tooling/Tooling.h" // clang::tooling::ClangTool, clang::tooling::newFrontendActionFactory
// llvm
#include "llvm/ADT/StringRef.h" // llvm::StringRef
#include "llvm/Support/CommandLine.h" // llvm::cl::extrahelp
// libc++
#include <iostream> // std::cout, etc.
// libc
#include <assert.h> // assert
using clang::tooling::CommonOptionsParser;
using clang::tooling::ClangTool;
using clang::tooling::newFrontendActionFactory;
using clang::ASTConsumer;
using clang::ASTContext;
using clang::ASTFrontendAction;
using clang::CompilerInstance;
using clang::CompilerInvocation;
using clang::Decl;
using clang::DeclGroupRef;
using clang::DiagnosticConsumer;
using clang::FileManager;
using clang::FrontendAction;
using clang::PCHContainerOperations;
using clang::SourceLocation;
using clang::TargetOptions;
using llvm::StringRef;
using std::cout;
// Apply a custom category to all command-line options so that they are the
// only ones displayed.
static llvm::cl::OptionCategory MyToolCategory("my-tool options");
// CommonOptionsParser declares HelpMessage with a description of the common
// command-line options related to the compilation database and input files.
// It's nice to have this help message in all tools.
static llvm::cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
// A help message for this specific tool can be added afterwards.
static llvm::cl::extrahelp MoreHelp("\nMore help text...\n");
// Implement ASTConsumer
class MyASTConsumer : public ASTConsumer {
public:      // data
  // The context for the TU, as established by 'Initialize'.
  ASTContext *m_astContext;

public:      // methods
  MyASTConsumer()
    : m_astContext(NULL)
  {}

  // This is called at the start of the TU.
  virtual void Initialize(ASTContext &ctx) override
  {
    m_astContext = &ctx;
  }

  // This is called at the end of the TU.
  virtual void HandleTranslationUnit(ASTContext &ctx) override
  {
    cout << "in HandleTranslationUnit\n";
    assert(m_astContext == &ctx);
  }

  virtual bool HandleTopLevelDecl(DeclGroupRef declGroup) override
  {
    cout << "in HandleTopLevelDecl\n";
    assert(m_astContext);
    for (Decl const *decl : declGroup) {
      SourceLocation loc = decl->getLocation();
      cout << " decl at "
           << loc.printToString(m_astContext->getSourceManager()) << '\n';
      cout << " kind name: " << decl->getDeclKindName() << '\n';
    }
    return true;
  }
};

// Implement the FrontendAction interface.
//
// Inheriting from ASTFrontendAction provides definitions for
// 'ExecuteAction' and 'usesPreprocessorOnly'.
class MyFrontendAction : public ASTFrontendAction {
public:
  virtual std::unique_ptr<ASTConsumer> CreateASTConsumer(
    CompilerInstance &ci,
    StringRef inFile) override
  {
    cout << "in CreateASTConsumer\n";
    TargetOptions const &targetOptions = ci.getTargetOpts();
    cout << " target options triple: " << targetOptions.Triple << '\n';
    cout << " inFile: " << inFile.str() << '\n';
    return std::unique_ptr<ASTConsumer>(new MyASTConsumer());
  }
};

int main(int argc, const char **argv)
{
  auto ExpectedParser = CommonOptionsParser::create(argc, argv, MyToolCategory);
  if (!ExpectedParser) {
    // Fail gracefully for unsupported options.
    llvm::errs() << ExpectedParser.takeError();
    return 2;
  }
  CommonOptionsParser& OptionsParser = ExpectedParser.get();
  ClangTool Tool(OptionsParser.getCompilations(),
                 OptionsParser.getSourcePathList());

  // Arrange to run 'MyFrontendAction' on each TU.
  return Tool.run(newFrontendActionFactory<MyFrontendAction>().get());
}
// EOF
Compilation command (without -g or -O2):
g++ -c -o print-tu.o print-tu.cc $(llvm-config --cxxflags)

Undefined symbol when calling function in C++, random chars added

In NodeJS I'm building an interface to a shared object in C. I have the following code:
#include <node.h>
#include "libcustom_encryption.h"

namespace demo {

using v8::Exception;
using v8::FunctionCallbackInfo;
using v8::Isolate;
using v8::Local;
using v8::Number;
using v8::Object;
using v8::String;
using v8::Value;

//
// This is the implementation of the "add" method
// Input arguments are passed using the
// const FunctionCallbackInfo<Value>& args struct
//
void DeviceGetVersion(const FunctionCallbackInfo<Value>& args)
{
  char ver[10] = {0};
  unsigned int ver_size = 0;

  device_get_version(ver, ver_size);

  Isolate* isolate = args.GetIsolate();

  //
  // 1. Save the value in to a isolate thing
  //
  Local<Value> str = String::NewFromUtf8(isolate, "Test");

  //
  // 2. Set the return value (using the passed in
  //    FunctionCallbackInfo<Value>&)
  //
  args.GetReturnValue().Set(str);
}

void Init(Local<Object> exports)
{
  NODE_SET_METHOD(exports, "devicegetversion", DeviceGetVersion);
}

NODE_MODULE(addon, Init)

}  // namespace demo
node-gyp configure: works
node-gyp build: works
LD_LIBRARY_PATH=libs/ node index.js: doesn't work
I get the following error:
node: symbol lookup error: /long_path/build/Release/app.node: undefined symbol: _Z18device_get_versionPcS_Phj
When the function is called, its name gets prepended and appended with what look like random characters. I assumed this was random data or some noise from memory. It seems as if the name used to call the function is bigger than it should be.
I'm not that experienced with mixing C++ and C, so I would love an explanation of what is happening here.
Tech specs:
GCC Version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
NodeJS Version: v6.2.0
Regarding "when the function is called it gets prepended and appended with random characters": that is the name mangling that happens in C++.
The actual error here is that the compiled module cannot link to the function device_get_version().
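You can decode the mangled name in the error message with c++filt; under the usual Itanium mangling it should come out roughly as below (verify locally), which shows it is a C++-mangled symbol:

$ c++filt _Z18device_get_versionPcS_Phj
device_get_version(char*, char*, unsigned char*, unsigned int)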
Your possible actions:
add the implementation of device_get_version to your module
properly link this function
simply remove that line and the error will disappear
UPD.
device_get_version may actually be a C function that is being treated as a C++ function (you can tell by the mangled name it has). Make sure your function is declared as
extern "C" {
void device_get_version(...);
}
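More commonly the guard lives in the C header itself, so every C++ translation unit that includes it gets C linkage automatically. A sketch (the parameter list here is a placeholder, not the real one from libcustom_encryption.h):

/* libcustom_encryption.h */
#ifdef __cplusplus
extern "C" {
#endif

void device_get_version(char *ver, unsigned int ver_size);

#ifdef __cplusplus
}   /* extern "C" */
#endif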

What is the difference between object.operator bool() and (bool) object?

I have a class for which I have overloaded operator bool explicitly, like this:
class Foo {
    explicit operator bool() {
        // return_something_here
    }
};
However, when I run the following two in gdb, I get:
(gdb) p fooobj.operator bool()
$7 = true
(gdb) p (bool)(fooobj)
$8 = false
What's the difference between the two invocations and why do they return different things?
Edit: I'm using the clang compiler.
Note: The second value (false) is the correct one, and it is what I want returned by the first syntax. I'm using a codegen, so I don't have complete control over what C++ gets generated, in case anyone is curious why I don't just use the second syntax.
Even in that case, the difference between the two would still be an unanswered question.
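For what it's worth, outside the debugger the two spellings invoke the same user-defined conversion, so a compiled program cannot tell them apart. A minimal sketch (my own example, not from the question):

#include <cassert>

struct Foo {
    explicit operator bool() const { return false; }
};

int main() {
    Foo f;
    bool a = f.operator bool();      // direct call of the conversion function
    bool b = (bool)f;                // C-style cast: also calls operator bool()
    bool c = static_cast<bool>(f);   // equivalent to the cast above here
    assert(a == b && b == c);        // all three agree
    return 0;
}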
I just ran a few quick tests, and it appears that gdb doesn't handle code compiled with clang well. Here is a test program:
#include <iostream>
using namespace std;

class Foo {
public:
    Foo() : m_Int(0) {}
    operator bool() {
        return true; // also tried false here
    }
private:
    int m_Int;
};

int main()
{
    Foo f;
    if (f.operator bool()) cout << "operator bool is true.\n";
    if ((bool)f) cout << "(bool)f is true.\n";
    return 0;
}
When the binary is run, the output is as expected, i.e. (bool)f is the same as f.operator bool(), regardless of the compiler. However, if gdb is used with code built using g++, then the p command behaves correctly. Yet when gdb is run on code built using clang++, I get:
(gdb) print f.operator bool()
Couldn't find method Foo::operatorbool
(gdb)
I'm running clang v. 3.4, gcc v. 4.8.4 on Ubuntu 14.04.
In fact, a quick search revealed this: Is it possible to debug a gcc-compiled program using lldb, or debug a clang-compiled program using gdb?. So, I tried lldb, and it worked as expected. This is consistent with the comment that was added as I was investigating.

Load LLVM bitcode into module: cannot convert ‘std::unique_ptr<llvm::Module>’ to ‘llvm::Module*’

I'm trying to compile the minimal example from here:
#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/Support/SourceMgr.h>

using namespace llvm;

int main()
{
    LLVMContext context;
    SMDiagnostic error;
    Module *m = parseIRFile("hello.bc", error, context);
    if (m)
    {
        m->dump();
    }
    return 0;
}
using
g++ myFile.cpp `llvm-config --cxxflags --ldflags --libs all --system-libs` -std=c++11 -ldl -lpthread
and get
error: cannot convert ‘std::unique_ptr<llvm::Module>’ to ‘llvm::Module*’ in initialization
All the examples, and the LLVM source itself, use llvm::Module * everywhere, so why do I get this error?
Note I use: LLVMVersion=3.6.0svn LLVM_CONFIGTIME= Thu Dec 18 10:51:37 CET 2014
Is it a problem with the 3.6 trunk? Should I opt for 3.5 branch?
Thx
Alex
The issue is that parseIRFile gives you back a unique_ptr<Module> and there is no implicit conversion from unique_ptr<Module> to Module* (which is good!!). To fix, just use the correct type:
std::unique_ptr<Module> m = parseIRFile(..);
auto m = parseIRFile(..); // avoid all future type issues
Using unique_ptr for memory management is much smarter than using raw pointers, and this interface makes it clear that you are responsible for the ownership of m. This way, you don't have to remember to delete it.
If you really really want to use a raw pointer, just call release on the returned object so that it no longer owns it:
Module* m = parseIRFile(..).release();
I only present that for completeness though - really prefer to keep your object a unique_ptr.
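Putting it together, the original example compiles with just the declaration of m changed (a sketch of the fix applied to the code above):

#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/Support/SourceMgr.h>

using namespace llvm;

int main()
{
    LLVMContext context;
    SMDiagnostic error;

    // parseIRFile returns std::unique_ptr<Module>, so store it as one.
    std::unique_ptr<Module> m = parseIRFile("hello.bc", error, context);
    if (m)
    {
        m->dump();
    }
    return 0;
}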

Googletest Parametrized tests crash

I've just learned about value-parametrized unit tests in googletest and would like to use them in my project.
I wrote a simple parametrized test.
Header:
#include <gtest/gtest.h>

namespace EnsembleClustering {

class ParametrizedGTest: public testing::TestWithParam<int> {
public:
    ParametrizedGTest();
    virtual ~ParametrizedGTest();
};

} /* namespace EnsembleClustering */
Source:
#include "ParametrizedGTest.h"
namespace EnsembleClustering {
ParametrizedGTest::ParametrizedGTest() {
// TODO Auto-generated constructor stub
}
ParametrizedGTest::~ParametrizedGTest() {
// TODO Auto-generated destructor stub
}
TEST_P(ParametrizedGTest, testParameter) {
int n = GetParam();
EXPECT_EQ(n, GetParam());
}
INSTANTIATE_TEST_CASE_P(ParametrizedGTestInstance,
ParametrizedGTest,
::testing::Values(100));
} /* namespace EnsembleClustering */
Now, when I run googletest as usual, the program crashes without any output. The gdb stack trace is
EnsembleClustering-D [C/C++ Application]
EnsembleClustering
Thread [1] (Suspended : Signal : EXC_BAD_ACCESS:Could not access memory)
__gnu_debug::_Safe_sequence_base::_M_attach_single() at 0x100528add
__gnu_debug::_Safe_sequence_base::_M_attach() at 0x100528a74
__gnu_debug::_Safe_iterator_base::_M_attach() at 0x100528bfe
__gnu_debug::_Safe_iterator_base::_Safe_iterator_base() at safe_base.h:90 0x1000016e9
__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<testing::internal::ParameterizedTestCaseInfoBase**, std::__cxx1998::vector<testing::internal::ParameterizedTestCaseInfoBase*, std::allocator<testing::internal::ParameterizedTestCaseInfoBase*> > >, std::__debug::vector<testing::internal::ParameterizedTestCaseInfoBase*, std::allocator<testing::internal::ParameterizedTestCaseInfoBase*> > >::_Safe_iterator() at safe_iterator.h:154 0x100002e9c
std::__debug::vector<testing::internal::ParameterizedTestCaseInfoBase*, std::allocator<testing::internal::ParameterizedTestCaseInfoBase*> >::begin() at vector:207 0x100001fbe
testing::internal::ParameterizedTestCaseRegistry::GetTestCasePatternHolder<EnsembleClustering::ParametrizedGTest>() at gtest-param-util.h:574 0x1000025b0
EnsembleClustering::ParametrizedGTest_testParameter_Test::AddToRegistry() at ParametrizedGTest.cpp:22 0x100001d3f
__static_initialization_and_destruction_0() at ParametrizedGTest.cpp:22 0x100001349
_GLOBAL__sub_I_ParametrizedGTest.cpp() at ParametrizedGTest.cpp:32 0x100001424
<...more frames...>
Am I doing something wrong or is this a bug in googletest? Can you reproduce this error?
EDIT: I am on Mac OS X 10.8.
From looking at the source code of gtest, the only case in which value-parameterized tests are not available is on Windows using VC7.1 with exceptions disabled:
// We don't support MSVC 7.1 with exceptions disabled now. Therefore
// all the compilers we care about are adequate for supporting
// value-parameterized tests.
#define GTEST_HAS_PARAM_TEST 1
So, you'll need to check how your MinGW was built and probably update it? And can you run the gtest unit tests to see if they execute the typed parameters test?
More information on MinGW:
On their FAQ they report that when using MinGW the following compile option for building gtest is required: PATH/TO/configure CC="gcc -mno-cygwin" CXX="g++ -mno-cygwin".
Complete Example:
#include <gtest/gtest.h>

namespace EnsembleClustering {

class ParametrizedGTest: public testing::TestWithParam<int> {
public:
    ParametrizedGTest();
    virtual ~ParametrizedGTest();
};

ParametrizedGTest::ParametrizedGTest() {
}

ParametrizedGTest::~ParametrizedGTest() {
}

TEST_P(ParametrizedGTest, testParameter) {
    int n = GetParam();
    EXPECT_EQ(n, GetParam());
}

INSTANTIATE_TEST_CASE_P(ParametrizedGTestInstance,
        ParametrizedGTest,
        ::testing::Values(100));

} /* namespace EnsembleClustering */

int main(int argc, char* argv[]) {
    ::testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}
I compiled this code using the following compiler call on Mac OS X 10.8:
g++ -IGTEST_INCLUDE_DIR -LGTEST_LIB_DIR -lgtest -o tt2 tt2.cpp
Where GTEST_INCLUDE_DIR and GTEST_LIB_DIR are the path where header and library files are stored. When you compile and execute, what happens?
Thanks @ChristianStaudt and @grundprinzip
I would like to point future readers to following link that explains this problem.
http://libcwd.sourceforge.net/reference-manual/group__enable__glibcxx__debug.html
This is a link to the documentation for the GLIBCXX_DEBUG flag. It states the following important points.
"Note that this flag changes the sizes and behavior of standard class templates such as std::vector, and therefore you can only link code compiled with debug mode and code compiled without debug mode if no instantiation of a container is passed between the two translation units."
"When to use it
It is a good idea to use this if you suspect problems related to iterators."
Now, if you look at the stack trace posted originally, the crash happens in vector<testing::internal::ParameterizedTestCaseInfoBase*> as gtest tries to get an iterator on this container using the begin() method.
In my case, the gtest lib was compiled without the GLIBCXX_DEBUG flag, but my test code was compiled with it. The test code worked like a charm when I compiled without this flag.
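For future readers, "compiling consistently" can look like the following sketch (paths are placeholders; it builds gtest from its fused source gtest-all.cc so both sides see the same _GLIBCXX_DEBUG setting, here with the flag enabled everywhere):

# Build gtest with the debug-mode flag...
g++ -D_GLIBCXX_DEBUG -isystem $GTEST_DIR/include -I$GTEST_DIR -pthread \
    -c $GTEST_DIR/src/gtest-all.cc -o gtest-all.o
ar rcs libgtest.a gtest-all.o
# ...and build the tests with the very same flag.
g++ -D_GLIBCXX_DEBUG -isystem $GTEST_DIR/include -pthread tt2.cpp libgtest.a -o tt2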