Iterate over all structs in a module - c++

I'm writing a ModulePass and I need to analyze every struct defined in the given module.
I understand that identified structs with a name are inserted in the ValueSymbolTable, but how can I iterate over all the other structs (identified with no name and literal structs)?

The correct way of doing this is:
#include "llvm/IR/TypeFinder.h"
llvm::TypeFinder StructTypes;
// Second argument is onlyNamed: pass false to also collect unnamed and literal structs.
StructTypes.run(M, false);
for (auto *STy : StructTypes)
    STy->dump();
You should not use any private/opaque types (like LLVMContextImpl) whose headers are not published.

The LLVMContextImpl instance associated with your current context should have two data structures, one containing all the identified structs in the current context (whether or not they have an explicit name), and the other containing all the literal structs.
To get the LLVMContextImpl instance:
Module& M = ...
LLVMContextImpl* C = M.getContext().pImpl;
For the identified structs:
C->NamedStructTypes
For the literal structs:
C->AnonStructTypes
Both are iterable containers (a StringMap for the first, a DenseMap for the second), so you can iterate over them to get all the types out.

bool runOnModule(Module &M) override
{
for(auto *S : M.getIdentifiedStructTypes())
{
S->dump();
}
return false;
}

Complementing Oak's answer, here's a more complete code example:
Module& M = ...
LLVMContextImpl* C = M.getContext().pImpl;
for (StringMap<StructType *>::iterator i = C->NamedStructTypes.begin(); i != C->NamedStructTypes.end(); ++i)
{
StructType *t = i->getValue();
t->dump(); fprintf(stderr, "\n");
}
LLVMContextImpl.h, being the header for a private implementation, isn't one of LLVM's public headers. You can get it from the LLVM source code — either copy it from there into your header search path or, for quick & dirty testing, do:
#include "/path/to/llvm/src/lib/VMCore/LLVMContextImpl.h"

How to use custom C++ attributes with Clang libTooling without modifying Clang code?

I'm writing some kind of tool that extracts the interface definitions of C++ code.
In the process of writing it, I decided to restrict the parser to process only the code that was explicitly marked for processing, and I thought that C++ attributes would be the best way to do that.
I'd prefer to add e.g. [[export]] annotations to entities I want to export, but I realised that libTooling is unable to see custom attributes without registering them in Clang code itself (I mean adding the attribute to tools/clang/include/clang/Basic/Attr.td).
Thus, my question is: is there a way to register the attribute without modifying that file (e.g. by registering the attribute programmatically or writing own Attr.td file)?
UPD: I'm using ASTMatchers library for source code analysis, so visitor-based approach probably does not work for me.
From what I can tell it is not possible to register custom attributes without directly modifying libtooling.
If you're willing to use preprocessor macros instead of attributes, there is a workaround that I've used in the past. The basic idea: declare an empty macro, write a preprocessor callback that records the location of each expansion of that macro in a queue, then in an AST visitor visit the records for classes, methods, or variables and check whether our macro precedes the entity.
For the preprocessor you'll need to extend clang::PPCallbacks and implement the MacroExpands method.
void MyPreProcessorCallback::MacroExpands(const clang::Token& MacroNameTok, const clang::MacroDefinition&, const clang::SourceRange Range, const clang::MacroArgs* const Args)
{
// Macro is not named for some reason.
if(!MacroNameTok.isAnyIdentifier())
{ return; }
if(MacroNameTok.getIdentifierInfo()->getName() == "EXPORT")
{
// Save Range into a queue.
}
else
{
return;
}
// If you want arguments you can declare the macro to have varargs and loop
// through the tokens, you can have any syntax you want as they're raw tokens.
// /* Get the argument list for this macro, because it's a
// varargs function all arguments are stored in argument 0. */
// const ::clang::Token* token = Args->getUnexpArgument(0u);
// // All tokens for the argument are stored in sequence.
// for(; token->isNot(::clang::tok::eof); ++token)
// {
// }
}
Inside your RecursiveASTVisitor you can implement visitors that look at the front of the queue and check whether that macro comes before the declaration in the translation unit, popping it if so. IIRC visitors of a type are all executed in order of declaration, so the queue should maintain the order. It is worth noting that all Decls of a type are visited in order, so care has to be taken when distinguishing between functions, variables, and classes.
bool MyAstVisitor::VisitFunctionDecl(::clang::FunctionDecl* const function)
{
if(::llvm::isa<::clang::CXXMethodDecl>(function))
{
// If you want to handle class member methods separately you
// can check here or implement `VisitCXXMethodDecl` and fast exit here.
}
if(ourExportTags.empty())
{
return true;
}
const ::clang::SourceLocation tagLoc = ourExportTags.front().getBegin();
const ::clang::SourceLocation declLoc = function->getBeginLoc();
if(getAstContext().getSourceManager().isBeforeInTranslationUnit(tagLoc, declLoc))
{
ourExportTags.pop_front();
// Handle export;
}
return true;
}
EDIT
I haven't used ASTMatchers before, but you could probably accomplish a similar result by writing a matcher, storing all of the declarations to a list, sorting based on location, and then comparing to the original export tag queue.
DeclarationMatcher matcher = functionDecl().bind("funcDecl");
class MyFuncMatcher : public clang::ast_matchers::MatchFinder::MatchCallback
{
public:
virtual void run(const clang::ast_matchers::MatchFinder::MatchResult& Result)
{
if(const FunctionDecl* func = Result.Nodes.getNodeAs<clang::FunctionDecl>("funcDecl"))
{
// Add to list for future processing
}
}
};
void joinTagsToDeclarations()
{
// Sort the declaration list
for(auto decl : myDeclList)
{
if(ourExportTags.empty())
{
break;
}
const ::clang::SourceLocation tagLoc = ourExportTags.front().getBegin();
const ::clang::SourceLocation declLoc = decl->getBeginLoc();
if(getAstContext().getSourceManager().isBeforeInTranslationUnit(tagLoc, declLoc))
{
ourExportTags.pop_front();
// Handle export;
}
}
}
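To make the pairing logic easier to follow in isolation, here is a minimal sketch of the same queue-against-sorted-declarations idea without any Clang dependency. Plain integer offsets stand in for clang::SourceLocation, and the Decl struct with its exported flag is a hypothetical placeholder for whatever "handle export" means in your tool.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for a declaration: an offset into the translation
// unit (in place of clang::SourceLocation) and a flag set when it is tagged.
struct Decl {
    unsigned offset;
    bool exported;
};

// Pairs each export tag with the first collected declaration that follows it.
// `tags` holds the offsets of the macro expansions, in source order.
inline void joinTagsToDeclarations(std::deque<unsigned> tags,
                                   std::vector<Decl>& decls) {
    // The preprocessor reports expansions in source order.
    assert(std::is_sorted(tags.begin(), tags.end()));

    // Matchers give no ordering guarantee here, so sort by location first.
    std::sort(decls.begin(), decls.end(),
              [](const Decl& a, const Decl& b) { return a.offset < b.offset; });

    for (Decl& decl : decls) {
        if (tags.empty())
            break;
        // Same test as isBeforeInTranslationUnit(tagLoc, declLoc).
        if (tags.front() < decl.offset) {
            tags.pop_front();
            decl.exported = true;  // "Handle export" would go here.
        }
    }
}
```

As in the code above, a tag is consumed by the next collected declaration after it, so a tag placed in front of a declaration kind you do not collect would attach to a later one; collecting every taggable kind into the same list avoids that.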

C++ template to read value from member variable or member function

I am writing a code generator and using flatbuffers for generating classes. The rest of the code generator will work with these classes in C++.
I have not been able to figure out how to keep the API consistent for reading data for two different types of classes that flatbuffer may generate. I am using the object api (testRecordT) in the example for whenever an object needs to be written to (and can be read back as well) and flatbuffer overlay for when the data can only be read from.
I have not been able to get any template or free functions to work to give me a consistent api that would work in both the cases.
Below is a snippet of what I am trying to get to work.
struct testRecordT {
int32_t field1;
std::string field2;
};
struct testRecord {
int32_t field1() const {
return 0;
// flatbuffer generated - return GetField<int32_t>(VT_FIELD1, 0);
}
const flatbuffers::String *field2() const {
return nullptr;
// flatbuffer generated - return GetPointer<const flatbuffers::String *>(VT_FIELD3);
}
};
void Test() {
testRecordT * members; // assume pointers are valid
testRecord * memberFunctions;
// Need to be able to create a read function/template that would work. This would simplify the code generation a lot. I can generate either one below, as long as it is consistent in both cases.
auto r = read(members->field1); // or read(members,field1)
auto v = read(memberFunctions->field1); // or read(memberFunctions,field1)
}
The read functions or template functions should be consistent. Any pointers or thoughts would be helpful. I am using C++17 with gcc 7.3.1.
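You can use std::invoke (declared in <functional>, available since C++17) for this. Given a pointer to member, it can either call a member function or read a data member, using the same call syntax.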
auto r = std::invoke(&testRecordT::field1, members);
auto v = std::invoke(&testRecord::field1, memberFunctions);
You can use std::invoke() for this problem.
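As a rough sketch of how that could be wrapped into the single read-style helper the question asks for, the following uses simplified stand-ins for the two record types. The name readField and the placeholder return value 7 are assumptions made for illustration; they are not part of flatbuffers or of the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Simplified stand-in for the object API class: plain data members.
struct testRecordT {
    std::int32_t field1;
    std::string field2;
};

// Simplified stand-in for the read-only overlay: accessor member functions.
struct testRecord {
    std::int32_t field1() const { return 7; }  // placeholder value
};

// Hypothetical helper: reads `member` from `*obj`, whether `member` is a
// pointer to a data member or a pointer to a nullary member function.
template <typename T, typename Member>
decltype(auto) readField(const T* obj, Member member) {
    assert(obj != nullptr);
    return std::invoke(member, obj);
}
```

With these definitions, readField(members, &testRecordT::field1) and readField(memberFunctions, &testRecord::field1) have the same shape, so a generator can emit one pattern for both kinds of class.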

reduce code duplication using macros

I was wondering if someone could give me a pointer on reducing duplication when coding.
I'm required to call a function a number of times to populate a structure, for example:
typedef struct {
uint16_t u16_a;
bool b_org;
char* c_c;
uint16_t u16_d;
} TEntry;
I need to populate each of these values with a function call. The return types vary, but the same function is used for all of them.
Would a macro be sufficient to create a template of some kind, so that the return type depends on the specific parameter (the string)?
for example:
Trrelevant::Trrelevant()
{
TPoint* u_apoint = Insufficient::FindValue("A");
if (u_bpoint != NULL) {
int a = u_apoint;
}
TPoint* p_apoint = Insufficient::FindValue("borg");
if (p_bpoint != NULL) {
bool b = p_bpoint;
}
TPoint* p_cpoint = Insufficient::FindValue("C");
if (etc != NULL) {
char* c = etc;
}
TEct* etc = Insufficient::FindValue("ETC");
if (etc != ETC) {
etc = etc;
}
TEntry entry = {a,
b,
c,
etc};
}
This code is not compiled or accurate; I'm just trying to illustrate. I'm weak in C++ and new to macros, but would anyone know a way to have a macro solve this?
Thank you for your time
You could do something like this, although I don't know what it really buys you.
#define QuickFindValue(NAME, TYPE, FUNCTION) \
TYPE *NAME##Value = Insufficient::FindValue(#NAME); \
if (NAME##Value != NULL) { FUNCTION; }
You would use it like so:
QuickFindValue(C, TPoint, {
char *c = CValue;
// Do stuff..
});
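For illustration, here is a self-contained sketch of that macro in use. TPoint and Insufficient::FindValue are stubbed out as hypothetical stand-ins because the real ones are not shown in the question, and the macro's third parameter is called BODY here. The condition is != NULL so that the body runs only when the lookup succeeds, mirroring the checks in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the question's TPoint type.
struct TPoint {
    int value;
};

// Hypothetical stub for the lookup: it knows "A" and "C" only and returns
// NULL for any other name.
struct Insufficient {
    static TPoint* FindValue(const char* name) {
        assert(name != NULL);
        static TPoint a = {1};
        static TPoint c = {3};
        if (std::strcmp(name, "A") == 0) return &a;
        if (std::strcmp(name, "C") == 0) return &c;
        return NULL;
    }
};

// Declares NAME##Value and runs BODY only when the lookup succeeded.
#define QuickFindValue(NAME, TYPE, BODY)                \
    TYPE* NAME##Value = Insufficient::FindValue(#NAME); \
    if (NAME##Value != NULL) { BODY; }

// Sums the values that were found; "B" is unknown to the stub, so its body
// is skipped.
inline int sumFound() {
    int total = 0;
    QuickFindValue(A, TPoint, total += AValue->value);
    QuickFindValue(B, TPoint, total += BValue->value);
    QuickFindValue(C, TPoint, total += CValue->value);
    return total;
}
```

Because the macro expands to a declaration followed by an if statement, it cannot be used as the sole body of an unbraced if or loop, and each NAME can be used only once per scope.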
I recently had the same kind of issue. I'm not sure what kind of source you use for your inputs.
Personally, I used XML as input.
I then have a Builder class that parses the XML and calls a factory function to build every struct in C++ using the data from the parser.
I don't think that macros or templates would be of any help (or it would be a bad solution).
Note that an external resource (like XML) is nice if you ever want to change things without recompiling.
Best

Referencing members of an object within a class type vector

OK guys, this problem has been bugging me all day and I can't seem to find a solution. I know it's a long post but I would be very grateful for any help you can offer.
I am working on a chatbot program that reads in from a .dat file to populate a library of keywords. Taking an object-oriented approach, I defined a class called "Keyword"; the class definition is shown below:
class Keyword
{
public:
//_Word holds keyword
vector<string> _Word;
//_Resp holds strings of responses
vector<string> _Resp;
//_Antymn holds words that are the opposite to keyword
vector<string> _Antymn;
// Constructor
Keyword()
{
// Clears members when new instance created
_Word.clear();
_Resp.clear();
_Antymn.clear();
}
};
Therefore every time a new keyword is found in the .dat file, a new instance of the class Keyword must be created. To store all these instances of Keyword I create another vector, this time of type Keyword, and call it library:
typedef vector<Keyword> Lib;
Lib library;// this is the same as saying vector<Keyword> library
Now this is the problem I have: After a user inputs a string I need to check if that string contains a keyword from the library i.e. I need to see if the string in _Word appears in the user input. Looking at it from a hierarchy of vectors you have:
The top level --> library //*starting point
--> Keyword
--> _Word
-->"A single string" <-- I want to reference this one
--> _Resp
-->"Many strings"
--> _Antymn
-->"Many strings"
Phew! I hope that made sense.
This is the code I started to write:
size_t User::findKeyword(Lib *Library)
{
size_t found;
int count = 0;
for(count = 0; count<Library->size(); count++)
{
found = _Input.find(Library->at(count)); // this line needs to reference _Word from each keyword instance within library
if(found!= string.npos)
return found;
}
return 0;
}
I have also tried to use the "operator[]" method but that doesn't seem to do what I want either.
Does anyone have any idea? I would be very surprised if it couldn't be done. Thank you in advance.
A bunch of issues first:
Identifiers beginning with an underscore followed by a capital letter are reserved in any namespace.
The clear() calls in the Keyword constructor are pointless and possibly harmful to optimization.
Why is _Word a vector? I thought it is one keyword.
#include <algorithm>
#include <string>
#include <vector>
struct Keyword
{
    // real words as identifiers, no underscores
    // anywhere if they are public
    std::string word;
    std::vector<std::string> responses;
    std::vector<std::string> antonym;
};
typedef std::vector<Keyword> Lib;
/// finding a keyword
// Returns a const_iterator because the library is passed by const reference.
Lib::const_iterator findKeyword(const Lib& l, const std::string& x) {
    // x must be captured for the lambda to compare against it
    return std::find_if(l.begin(), l.end(),
        [&x](const Keyword& kw) { return kw.word == x; });
    // if stuck on non C++11 compiler use a Functor
}
You have to change your code to this:
for(count = 0; count<Library->size(); count++)
{
    for(int j = 0; j < Library->at(count)._Word.size(); ++j){
        // access the _Word member of the Keyword, then index into it
        found = _Input.find(Library->at(count)._Word[j]);
        if(found != string::npos)
            return found;
    }
}
in order to access the member variable and to iterate through your vector of strings. Library->at(count) is an object of the class Keyword.
I assume that _Input.find() takes a string as argument.
If your Keyword instance stores just one keyword, you might as well change it to string _Word, so that you would not need the second loop.
for(count = 0; count<Library->size(); count++)
{
    found = _Input.find(Library->at(count)._Word);
    if(found != string::npos)
        return found;
}
And to reinforce the other comments: you should not start your variable names with an underscore followed by a capital letter, since such identifiers are reserved for the implementation.
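Putting the suggestions from the answers together, a minimal self-contained sketch could look like the following. It assumes each Keyword holds exactly one word, takes the user input as a parameter instead of using the _Input member, and returns std::string::npos rather than 0 when nothing matches, since 0 is also a valid match position. The member and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Keyword {
    std::string word;                    // the keyword itself
    std::vector<std::string> responses;  // possible replies
    std::vector<std::string> antonyms;   // opposites of the keyword
};

typedef std::vector<Keyword> Lib;

// Walks the library in order and returns the offset in `input` of the first
// keyword that occurs in it, or std::string::npos when none does.
inline std::size_t findKeyword(const Lib& library, const std::string& input) {
    for (std::size_t i = 0; i < library.size(); ++i) {
        // An empty keyword would match at offset 0 of every input.
        assert(!library[i].word.empty());
        const std::size_t found = input.find(library[i].word);
        if (found != std::string::npos)
            return found;
    }
    return std::string::npos;
}
```

Callers then compare the result against std::string::npos, the same way they would with std::string::find itself.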

Ensuring each struct has a unique ordinal number

I want to be able to create structs with each having a member that indicates the struct's (not the object's) order. There should be no run-time overhead, and I should be able to use the ordinal at compile-time.
The simplest approach doesn't work because for some reason static variables don't work at compile-time:
int nextOrdinal() {
static int ordinal;
return ordinal++;
}
struct S1 {
enum ordinal = nextOrdinal();
}
struct S2 {
enum ordinal = nextOrdinal();
}
How the structs are created isn't important to me at this moment. The problem seems to be that it's not possible to retain a state at compile-time, am I correct?
--Inspired by Boost.units dimensional analysis.
There are no variables at compile-time (excepting the very special case of inside of a CTFE function)--everything must be constant. Further, allowing CTFE variables to go static and pollute the interpreted environment would be a pretty iffy design choice.
Part of the problem is that the compiler doesn't make any guarantees (to my knowledge) about the order of compilation of various code units and may even (in the future) be able to compile pieces in parallel. In general you need to treat compile-time programming as a very strict functional environment with small pockets of flexible mutability (inside CTFE functions). To ensure consistency, CTFE-able functions must be pure and "Executed expressions may not reference any global or local static variables." http://dlang.org/function.html#interpretation
In short, I don't think there's any way to have the compiler store this state for you.
I don't know of a reliable way to do this, but if you want to order them based on their location in the source file you could do this:
import std.conv;
import std.stdio;
size_t nextOrdinal(size_t line = __LINE__)()
{
return line;
}
struct S1 {
enum ordinal = nextOrdinal();
}
struct S2 {
enum ordinal = nextOrdinal();
}
void main()
{
writeln(S1.ordinal);
writeln(S2.ordinal);
}
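If you have multiple files that call nextOrdinal you could end up with struct definitions which have the same ordinal value. You might consider encoding the file name too (this makes collisions less likely but does not rule them out, since different file and line pairs can produce the same sum):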
size_t nextOrdinal(string file = __FILE__, size_t line = __LINE__)()
{
size_t res;
foreach (ch; file)
res += ch;
return res + line;
}