Is there any way I could define an unordered_map<*,*> variable and, depending on the case, redefine it with the appropriate types?
I'm reading some binary files and the format is different for each, so depending on the format it can be <int, string>, <short, string>, <int, int>, etc.
The only way I can think of is to define it as <char *, char *>, but then I would have to define the hashing and other things to make that work.
Is there any other option?
EDIT: more context for the problem:
I will iterate over other lists and get the values from the unordered_maps; I will know what type of data I'm using for the key and use it to generate a JSON string as the result.
For more context, the file format looks like this:
INT number of fields to use. Example: 3
-- now there is a for loop from 1 to 3, as we have 3 fields
CHAR type of data (1 = int8, 2 = int16, 3 = int32, 4=string)
STRING name of the field
STRING alias of the field
-- end for
-- now I loop while not EOF
-- for each field
read value from file (int8, int16, int32, string) depending on the type of the field
first item of the for will be the KEY
if item != first, add the value to an unordered_map using the first as key
-- end for
-- end while
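A minimal sketch of reading the header described above. The question doesn't specify the binary encoding, so this assumes INT is a 4-byte little-endian integer, CHAR is a single byte and STRING is NUL-terminated; FieldDef, readHeader, readInt32 and readString are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One header entry: type code (1 = int8, 2 = int16, 3 = int32, 4 = string), name, alias.
struct FieldDef {
    char type;
    std::string name;
    std::string alias;
};

// ASSUMPTION: INT is stored as 4 bytes, little-endian.
static std::int32_t readInt32(std::istream& in) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4))
        throw std::runtime_error("unexpected end of file");
    return static_cast<std::int32_t>(std::uint32_t(b[0]) | std::uint32_t(b[1]) << 8 |
                                     std::uint32_t(b[2]) << 16 | std::uint32_t(b[3]) << 24);
}

// ASSUMPTION: STRING is stored NUL-terminated.
static std::string readString(std::istream& in) {
    std::string s;
    if (!std::getline(in, s, '\0'))
        throw std::runtime_error("unexpected end of file");
    return s;
}

// Reads the field count, then that many (type, name, alias) triples.
std::vector<FieldDef> readHeader(std::istream& in) {
    std::int32_t count = readInt32(in);
    std::vector<FieldDef> fields;
    for (std::int32_t i = 0; i < count; ++i) {
        FieldDef f;
        if (!in.get(f.type))
            throw std::runtime_error("unexpected end of file");
        f.name = readString(in);
        f.alias = readString(in);
        fields.push_back(f);
    }
    return fields;
}
```

The first entry of the returned vector then tells you the key type, and the remaining entries the value types.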
What are you going to store inside the map and how are you going to choose it?
There are two practical solutions to your problem:
Parametric Polymorphism
This is how you should try to solve your problem in the first place: by keeping the arguments of your unordered_map generic.
This is mostly done with a structure like
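#include <fstream>
#include <string>
#include <unordered_map>

class Reader {
public:
    virtual ~Reader() = default;
    virtual void readFile(const std::string& name) = 0;
};

template<typename K, typename V>
class RealReader : public Reader {   // must derive from Reader for override to compile
private:
    std::unordered_map<K,V> data;
public:
    void readFile(const std::string& name) override {
        std::ifstream in(name, std::ios::binary);
        K key;
        V value;
        // placeholder: formatted extraction; replace with the decoding your file format needs
        while (in >> key >> value)
            data[key] = value;
    }
};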
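The answer doesn't show how a concrete RealReader<K,V> gets picked at runtime. One way, sketched here assuming the type codes from the question (1 = int8, 2 = int16, 3 = int32, 4 = string), is a two-step switch that fixes the key type first and the value type second; makeReader and makeWithKey are hypothetical names, and the RealReader here is stripped down to just the map.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

struct Reader {
    virtual ~Reader() = default;
};

template<typename K, typename V>
struct RealReader : Reader {
    std::unordered_map<K, V> data;   // would be filled by readFile()
};

// Step 2: the key type K is fixed, choose V from the value's type code.
template<typename K>
std::unique_ptr<Reader> makeWithKey(char valueType) {
    switch (valueType) {
    case 1: return std::make_unique<RealReader<K, std::int8_t>>();
    case 2: return std::make_unique<RealReader<K, std::int16_t>>();
    case 3: return std::make_unique<RealReader<K, std::int32_t>>();
    case 4: return std::make_unique<RealReader<K, std::string>>();
    }
    throw std::invalid_argument("unknown value type code");
}

// Step 1: choose K from the key's type code.
std::unique_ptr<Reader> makeReader(char keyType, char valueType) {
    switch (keyType) {
    case 1: return makeWithKey<std::int8_t>(valueType);
    case 2: return makeWithKey<std::int16_t>(valueType);
    case 3: return makeWithKey<std::int32_t>(valueType);
    case 4: return makeWithKey<std::string>(valueType);
    }
    throw std::invalid_argument("unknown key type code");
}
```

Every key/value combination gets instantiated, so the number of instantiations grows with the square of the number of types; for four types that is sixteen classes.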
Subtype Polymorphism
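Define your own Key and/or Value classes so that you can define a std::unordered_map<Key*,Value*>, and then subclass these custom types for the types you need. With pointer keys, the map needs a hash and an equality functor that look at the pointed-to objects; the defaults would only compare addresses.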
Without knowing how these are going to be used it's difficult to tell what's best.
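A minimal sketch of the subtype approach for keys only; Key, TypedKey, KeyPtrHash, KeyPtrEq and KeyMap are hypothetical names, and values are plain std::string to keep it short. The functors forward to virtual hash()/equals() so that two equal keys stored at different addresses still find each other.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Abstract key: each concrete key type knows how to hash and compare itself.
struct Key {
    virtual ~Key() = default;
    virtual std::size_t hash() const = 0;
    virtual bool equals(const Key& other) const = 0;
};

template<typename T>
struct TypedKey : Key {
    T value;
    explicit TypedKey(T v) : value(v) {}
    std::size_t hash() const override { return std::hash<T>()(value); }
    bool equals(const Key& other) const override {
        // keys of different dynamic types never compare equal
        const TypedKey* o = dynamic_cast<const TypedKey*>(&other);
        return o != nullptr && o->value == value;
    }
};

// Adapters so std::unordered_map hashes and compares the objects, not the pointers.
struct KeyPtrHash {
    std::size_t operator()(const Key* k) const { return k->hash(); }
};
struct KeyPtrEq {
    bool operator()(const Key* a, const Key* b) const { return a->equals(*b); }
};

// The map only stores pointers; the Key objects must outlive it.
using KeyMap = std::unordered_map<const Key*, std::string, KeyPtrHash, KeyPtrEq>;
```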
I ended up using a self-defined type and void * for the data.
So the struct holds the type of the variable along with a pointer to its data.
Here is the result:
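#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
using std::string;

// INT8, INT16, INT32, INT64, INT128 and CHAR2 are the type codes used in the
// files; they are assumed to be defined elsewhere.
struct fieldVariant {
    char type;
    void * data;   // points to the raw integer bytes, or to a std::string for CHAR2
    fieldVariant(char _type, void * _data) : type(_type), data(_data) {}
};

// Number of raw bytes making up a fixed-size value of the given type.
inline size_t fieldSize(char type) {
    switch (type) {
    case INT8:   return 1;
    case INT16:  return 2;
    case INT32:  return 4;
    case INT64:  return 8;
    case INT128: return 16;
    default:     return 0;
    }
}

struct fieldHash {
    size_t operator()(const fieldVariant * val) const
    {
        // hash the characters of the string, not the bytes of the std::string object
        if (val->type == CHAR2)
            return std::hash<string>()(*static_cast<const string *>(val->data));
        // arithmetic on void * is not valid C++, so walk the bytes as unsigned char
        const unsigned char * bytes = static_cast<const unsigned char *>(val->data);
        size_t h = 0;
        for (size_t i = 0; i < fieldSize(val->type); i++)
            h = 5 * h + bytes[i];
        return h;
    }
};

struct fieldEql {
    bool operator()(const fieldVariant *s1, const fieldVariant *s2) const {
        if (s1->type != s2->type)
            return false;   // values of different types never compare equal
        if (s1->type == CHAR2)
            return *static_cast<const string *>(s1->data) == *static_cast<const string *>(s2->data);
        return std::memcmp(s1->data, s2->data, fieldSize(s1->type)) == 0;
    }
};

// The map stores pointers only; the fieldVariant objects and their data must outlive it.
std::unordered_map<fieldVariant *, fieldVariant *, fieldHash, fieldEql> data;
void add(fieldVariant * key, fieldVariant * value) { data[key] = value; }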
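For comparison, a different technique than the void * approach above: since C++17, std::variant gives a closed set of types with hashing and equality already defined, so no hand-written hasher, comparator or type switch is needed. This is a sketch assuming C++17 and the integer/string field types from the question; Field, FieldMap and toJson are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <variant>

// One alternative per field type that can appear in the file header.
using Field = std::variant<std::int8_t, std::int16_t, std::int32_t, std::int64_t, std::string>;

// std::variant provides std::hash and operator== when every alternative does.
// Values held in different alternatives (int8 7 vs int32 7) are different keys.
using FieldMap = std::unordered_map<Field, Field>;

// Renders a field as a JSON token; strings are quoted but NOT escaped here.
std::string toJson(const Field& f) {
    return std::visit([](const auto& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, std::string>)
            return "\"" + v + "\"";
        else
            return std::to_string(v);
    }, f);
}
```

Ownership is also simpler: the map holds the values themselves, so there are no pointers to keep alive.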
I have an instance of CXCursor of kind CXCursor_CXXMethod. I want to find out if the function is const or volatile, for example:
class Foo {
public:
void bar() const;
void baz() volatile;
void qux() const volatile;
};
I could not find anything useful in the libclang documentation. I tried clang_isConstQualifiedType and clang_isVolatileQualifiedType, but they always seem to return 0 for C++ member function types.
I can think of two approaches:
Using the libclang lexer
The code which appears in this SO answer works for me; it uses the libclang tokenizer to break a method declaration apart, and then records any keywords outside of the method parentheses.
It does not access the AST of the code, and as far as I can tell doesn't involve the parser at all. If you are sure the code you investigate is proper C++, I believe this approach is safe.
Disadvantages: This solution does not appear to take into account preprocessing directives, so the code has to be processed first (e.g., passed through cpp).
Example code (the file to parse must be the first argument to your program, e.g. ./a.out bla.cpp):
#include "clang-c/Index.h"
#include <string>
#include <set>
#include <iostream>
std::string GetClangString(CXString str)
{
    const char* tmp = clang_getCString(str);
    // copy the text first, then dispose the CXString on every path to avoid a leak
    std::string translated = tmp ? std::string(tmp) : std::string();
    clang_disposeString(str);
    return translated;
}
void GetMethodQualifiers(CXTranslationUnit translationUnit,
std::set<std::string>& qualifiers,
CXCursor cursor) {
qualifiers.clear();
CXSourceRange range = clang_getCursorExtent(cursor);
CXToken* tokens;
unsigned int numTokens;
clang_tokenize(translationUnit, range, &tokens, &numTokens);
// Only record keywords after the parameter list has closed, so that a return
// type such as "void" (or "const" in "const int* f()") is not reported.
int depth = 0;
bool pastParams = false;
for (unsigned int i = 0; i < numTokens; i++) {
std::string token = GetClangString(clang_getTokenSpelling(translationUnit, tokens[i]));
if (token == "(") {
depth++;
} else if (token == ")") {
if (--depth == 0)
pastParams = true;
} else if (depth == 0 && (token == "{" || token == ";")) {
break;
} else if (clang_getTokenKind(tokens[i]) == CXToken_Keyword &&
depth == 0 && pastParams) {
qualifiers.insert(token);
}
}
clang_disposeTokens(translationUnit, tokens, numTokens);
}
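int main(int argc, char *argv[]) {
if (argc < 2) {
std::cerr << "usage: " << argv[0] << " file.cpp" << std::endl;
return 1;
}
CXIndex Index = clang_createIndex(0, 0);
CXTranslationUnit TU = clang_parseTranslationUnit(Index, 0,
argv, argc, 0, 0, CXTranslationUnit_None);
if (!TU) {
std::cerr << "failed to parse " << argv[1] << std::endl;
clang_disposeIndex(Index);
return 1;
}
// Set the file you're interested in, and the code location
// (line and column of the method to inspect; adjust for your input):
CXFile file = clang_getFile(TU, argv[1]);
int line = 5;
int column = 6;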
CXSourceLocation location = clang_getLocation(TU, file, line, column);
CXCursor cursor = clang_getCursor(TU, location);
std::set<std::string> qualifiers;
GetMethodQualifiers(TU, qualifiers, cursor);
for (std::set<std::string>::const_iterator i = qualifiers.begin(); i != qualifiers.end(); ++i) {
std::cout << *i << std::endl;
}
clang_disposeTranslationUnit(TU);
clang_disposeIndex(Index);
return 0;
}
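The token-scanning idea behind the lexer approach can be tried without libclang. In this sketch the tokens are a hypothetical list of (spelling, is-keyword) pairs standing in for what clang_tokenize and clang_getTokenKind would report; trailingQualifiers is a hypothetical name. Only keywords after the parameter list has closed are collected, so a return type such as void, or a const inside the parameters, is not reported.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a libclang token: spelling plus "is it a keyword".
using Token = std::pair<std::string, bool>;

// Collects keywords that follow the outermost parameter list, stopping at the
// body or the terminating semicolon (only when not inside parentheses).
std::set<std::string> trailingQualifiers(const std::vector<Token>& tokens) {
    std::set<std::string> result;
    int depth = 0;
    bool pastParams = false;
    for (const Token& t : tokens) {
        const std::string& s = t.first;
        if (s == "(") {
            ++depth;
        } else if (s == ")") {
            if (--depth == 0) pastParams = true;
        } else if (depth == 0 && (s == "{" || s == ";")) {
            break;
        } else if (t.second && depth == 0 && pastParams) {
            result.insert(s);
        }
    }
    return result;
}
```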
Using libclang's Unified Symbol Resolution (USR)
This approach involves using the parser itself, and extracting qualifier information from the AST.
Advantages: Seems to work for code with preprocessor directives, at least for simple cases.
Disadvantages: My solution parses the USR, which is undocumented, and might change in the future. Still, it's easy to write a unit-test to guard against that.
Take a look at $(CLANG_SRC)/tools/libclang/CIndexUSRs.cpp, it contains the code that generates a USR, and therefore contains the information required to parse the USR string. Specifically, lines 523-529 (in LLVM 3.1's source downloaded from www.llvm.org) for the qualifier part.
Add the following function somewhere:
void parseUsrString(const std::string& usrString, bool* isVolatile, bool* isConst, bool *isRestrict) {
    *isVolatile = *isConst = *isRestrict = false;
    // The qualifier digit follows the last '#' of a method's USR, e.g. "...qux#5".
    size_t hashLocation = usrString.rfind('#');
    if (hashLocation == std::string::npos || hashLocation == usrString.length() - 1)
        return;
    char c = usrString[hashLocation + 1];
    if (c < '0' || c > '7')
        return;   // no qualifier digit present
    int x = c - '0';
    *isConst = x & 0x1;
    *isVolatile = x & 0x4;
    *isRestrict = x & 0x2;
}
and in main(),
CXString usr = clang_getCursorUSR(cursor);
const char *usr_string = clang_getCString(usr);
std::cout << usr_string << "\n";
bool isVolatile, isConst, isRestrict;
parseUsrString(usr_string, &isVolatile, &isConst, &isRestrict);
std::cout << "restrict, volatile, const: " << isRestrict << " " << isVolatile << " " << isConst << "\n";
clang_disposeString(usr);
Running on Foo::qux() from
#define BLA const
class Foo {
public:
void bar() const;
void baz() volatile;
void qux() BLA volatile;
};
produces the expected result of
c:#C#Foo#F#qux#5
restrict, volatile, const: 0 1 1
Caveat: you might have noticed that libclang's source suggests my code should use isVolatile = x & 0x2 and not 0x4, so you may need to replace 0x4 with 0x2. It's possible that my implementation (OS X) has them swapped.
I have a couple of arrays:
const string a_strs[] = {"cr=1", "ag=2", "gnd=U", "prl=12", "av=123", "sz=345", "rc=6", "pc=12345"};
const string b_strs[] = {"cr=2", "sz=345", "ag=10", "gnd=M", "prl=11", "rc=6", "cp=34", "cv=54", "av=654", "ct=77", "pc=12345"};
which I then need to split on '=' and put the values into the struct (the rc key maps to the fc field in the struct). The struct looks like this:
struct predict_cache_key {
predict_cache_key() :
av_id(0),
sz_id(0),
cr_id(0),
cp_id(0),
cv_id(0),
ct_id(0),
fc(0),
gnd(0),
ag(0),
pc(0),
prl_id(0)
{ }
int av_id;
int sz_id;
int cr_id;
int cp_id;
int cv_id;
int ct_id;
int fc;
char gnd;
int ag;
int pc;
long prl_id;
};
The problem I am encountering is that the arrays are not in sequence, nor in the same sequence as the struct fields. So I need to check each key and come up with a scheme to put its value into the matching struct field.
Any help in using C or C++ to solve the above?
Perhaps I didn't understand correctly, but the obvious solution is to split each array element into key and value and then write a lo-o-ong if-else-if-else ... sequence like
if (!strcmp(key, "cr"))
my_struct.cr = value;
else if (!strcmp(key, "ag"))
my_struct.ag = value;
...
You can automate the creation of such a sequence with the help of the C preprocessor, e.g.
#define PROC_KEY_VALUE_PAIR(A) else if (!strcmp(key,#A)) my_struct.A = value
Because of the leading else, you write the code this way:
if (0);
PROC_KEY_VALUE_PAIR(cr);
PROC_KEY_VALUE_PAIR(ag);
...
The only problem is that some of your struct fields have an _id suffix; for them you'd need a slightly different macro that pastes the _id suffix (my_struct.A##_id).
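A self-contained sketch of the macro approach with both variants, using a hypothetical cut-down struct (cache_key) and a hypothetical set_field function. #A turns the macro argument into the key string, and A##_id pastes the suffix onto the field name. It starts the chain with if (false) {} rather than if (0); to avoid an empty-body warning; the effect is the same.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, cut-down version of predict_cache_key.
struct cache_key {
    int cr_id;
    int ag;
    long prl_id;
};

// Key and field name are identical, e.g. "ag" -> my_struct.ag
#define PROC_KEY_VALUE_PAIR(A) \
    else if (!std::strcmp(key, #A)) my_struct.A = value
// Field name is the key plus "_id", e.g. "cr" -> my_struct.cr_id
#define PROC_KEY_VALUE_PAIR_ID(A) \
    else if (!std::strcmp(key, #A)) my_struct.A##_id = value

// Stores value in the field selected by key; returns false for unknown keys.
bool set_field(cache_key& my_struct, const char* key, long value) {
    if (false) {}   // dummy head so that every macro can start with "else"
    PROC_KEY_VALUE_PAIR_ID(cr);
    PROC_KEY_VALUE_PAIR(ag);
    PROC_KEY_VALUE_PAIR_ID(prl);
    else return false;
    return true;
}
```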
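This shouldn't be too hard. Your first problem is that your arrays don't have a common fixed size, so you'd have to pass the size of each array, or, which I'd prefer, end each array with a sentinel. Since the elements are std::string, NULL can't be the sentinel (constructing a std::string from a null pointer is undefined behavior), so use an empty string, e.g.
const string a_strs[] = {"cr=1", "ag=2", "gnd=U", ""};
Then I would write a (private) helper function that parses the string:
#include <cstdlib>
#include <cstring>
#include <string>
using std::string;

/* Splits "key=value": copies the key into buffer (NUL-terminated)
 * and converts the value with atoi. Returns false if there is no '='. */
bool
parse_string(const string &str, char *buffer, size_t b_size, int *num)
{
    char *ptr;
    strncpy(buffer, str.c_str(), b_size);
    buffer[b_size - 1] = 0;   /* strncpy doesn't terminate on truncation */
    /* find the '=' */
    ptr = strchr(buffer, '=');
    if (!ptr) return false;
    *ptr = '\0';              /* buffer now holds only the key */
    ptr++;
    *num = atoi(ptr);
    return true;
}
Then you can do what qrdl suggested, in a simple for loop (array being a_strs or b_strs):
for (const string *cur_str = array; !cur_str->empty(); cur_str++)
{
    char key[128];
    int value = 0;
    if (!parse_string(*cur_str, key, sizeof(key), &value))
        continue;
    /* and here what qrdl suggested */
    if (!strcmp(key, "cr")) cr_id = value;
    else if ...
}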
EDIT: you should probably use long instead of int, and atol instead of atoi, because your prl_id is of type long. Second, if there could be badly formatted numbers after the '=', you should use strtol, which can detect errors.
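A sketch of the strtol variant suggested in the EDIT; parse_key_value is a hypothetical name. It rejects a missing '=', an empty or only partially numeric value, and out-of-range numbers. A char field such as gnd=U still needs its own handling, since U is not a number.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Splits "key=value" and converts the value with strtol.
// Returns false on a missing '=', an empty/garbage value, or overflow.
bool parse_key_value(const std::string& str, std::string& key, long& num) {
    std::string::size_type eq = str.find('=');
    if (eq == std::string::npos) return false;
    key = str.substr(0, eq);
    const char* begin = str.c_str() + eq + 1;
    char* end = nullptr;
    errno = 0;
    long v = std::strtol(begin, &end, 10);
    if (end == begin || *end != '\0' || errno == ERANGE) return false;
    num = v;
    return true;
}
```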
I've written a little code that lets you initialize fields by name, without having to worry too much about whether the strings come in a different order than the fields.
Here is how you use it in your own code:
/* clients using the above classes derive from lookable_fields */
struct predict_cache_key : private lookable_fields<predict_cache_key> {
predict_cache_key(std::vector<std::string> const& vec) {
for(std::vector<std::string>::const_iterator it = vec.begin();
it != vec.end(); ++it) {
std::size_t i = it->find('=');
set_member(it->substr(0, i), it->substr(i + 1));
}
}
long get_prl() const {
return prl_id;
}
private:
/* ... and define the members that can be looked up. i've only
* implemented int, char and long for this answer. */
BEGIN_FIELDS(predict_cache_key)
FIELD(av_id);
FIELD(sz_id);
FIELD(gnd);
FIELD(prl_id);
END_FIELDS()
int av_id;
int sz_id;
char gnd;
long prl_id;
/* ... */
};
int main() {
std::string const a[] = { "av_id=10", "sz_id=10", "gnd=c",
"prl_id=1192" };
predict_cache_key haha(std::vector<std::string>(a, a + 4));
}
The framework is below (in an actual source file it has to come before the client code above):
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

template<typename T>
struct entry {
    enum type { tchar, tint, tlong } type_name;
    /* default ctor, so we can std::map it */
    entry() { }
    template<typename R>
    entry(R (T::*ptr)) {
        set_ptr(ptr);
    }
    void set_ptr(char (T::*ptr)) {
        type_name = tchar;
        charp = ptr;
    }
    void set_ptr(int (T::*ptr)) {
        type_name = tint;
        intp = ptr;
    }
    void set_ptr(long (T::*ptr)) {
        type_name = tlong;
        longp = ptr;
    }
    /* only the member selected by type_name is valid */
    union {
        char (T::*charp);
        int (T::*intp);
        long (T::*longp);
    };
};
#define BEGIN_FIELDS(CLASS) \
    friend struct lookable_fields<CLASS>; \
    private: \
    static void init_fields_() { \
        typedef CLASS parent_class;
#define FIELD(X) \
    lookable_fields<parent_class>::entry_map()[#X].set_ptr(&parent_class::X)
#define END_FIELDS() \
    }
template<typename Derived>
struct lookable_fields {
protected:
    typedef entry<Derived> entry_t;
    typedef std::map<std::string, entry_t> entry_map_t;
    lookable_fields() {
        (void) &initializer; /* instantiate the object */
    }
    /* a function-local static is constructed on first use, so it exists
     * before init_fields_() fills it, whatever the static init order is */
    static entry_map_t& entry_map() {
        static entry_map_t m;
        return m;
    }
    void set_member(std::string const& member, std::string const& value) {
        typename entry_map_t::iterator it = entry_map().find(member);
        if(it == entry_map().end()) {
            std::ostringstream os;
            os << "member '" << member << "' not found";
            throw std::invalid_argument(os.str());
        }
        Derived * derived = static_cast<Derived*>(this);
        std::istringstream ss(value);
        switch(it->second.type_name) {
        case entry_t::tchar: {
            /* convert to char */
            ss >> (derived->*it->second.charp);
            break;
        }
        case entry_t::tint: {
            /* convert to int */
            ss >> (derived->*it->second.intp);
            break;
        }
        case entry_t::tlong: {
            /* convert to long */
            ss >> (derived->*it->second.longp);
            break;
        }
        }
    }
private:
    struct init_helper {
        init_helper() {
            Derived::init_fields_();
        }
    };
    /* will call the derived class's static init function */
    static init_helper initializer;
};
template<typename T>
typename lookable_fields<T>::init_helper lookable_fields<T>::initializer;
It works using the lesser-known pointers to data members, which you can take from a class using the syntax &classname::member.
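A minimal illustration of pointers to data members, independent of the framework above; point, pick_member and set_member_value are hypothetical names. The pointer records which member to use, and is bound to a concrete object with .* or ->*.

```cpp
#include <cassert>

struct point { int x; int y; };

// Returns "which member", not an address inside any particular object.
int point::* pick_member(bool wantX) {
    return wantX ? &point::x : &point::y;
}

// Binds the member pointer to a concrete object with .*
void set_member_value(point& p, int point::* member, int value) {
    p.*member = value;
}
```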
Indeed, as many have answered, the parsing problem should be separated from the object-construction problem. The Factory pattern is well suited for that.
The Boost.Spirit library also solves the parse->function problem in a very elegant way (using an EBNF-like notation).
I always like to separate the 'business logic' from the framework code.
You can achieve this by first writing down "what you want to do" in a very convenient way, and working toward "how to do it" from there.
const CMemberSetter<predict_cache_key>* setters[] =
#define SETTER( tag, type, member ) new TSetter<predict_cache_key,type>( tag, &predict_cache_key::member )
// tag is already a string literal, so it is passed through unchanged
{ SETTER( "av", int, av_id )
, SETTER( "sz", int, sz_id )
, SETTER( "cr", int, cr_id )
, SETTER( "cp", int, cp_id )
, SETTER( "cv", int, cv_id )
, SETTER( "ct", int, ct_id )
, SETTER( "rc", int, fc )   // the rc key maps to the fc field
, SETTER( "gnd", char, gnd )
, SETTER( "ag", int, ag )
, SETTER( "pc", int, pc )
, SETTER( "prl", long, prl_id )
};
PCKFactory<predict_cache_key> factory ( setters );
predict_cache_key a = factory.factor( a_strs );
predict_cache_key b = factory.factor( b_strs );
And the framework to achieve this:
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>
using std::string;

// conversion from key=value pair to "set the value of a member"
// this class merely recognises a key and extracts the value part of the key=value string
//
template< typename BaseClass >
struct CMemberSetter {
    const std::string key;
    CMemberSetter( const string& aKey ): key( aKey ){}
    virtual ~CMemberSetter() {}
    bool try_set_value( BaseClass& p, const string& key_value ) const {
        // match "key=" exactly, so that a key is never matched as a mere prefix
        if( key_value.compare( 0, key.size() + 1, key + "=" ) == 0 ) {
            action( p, key_value.substr( key.size() + 1 ) );
            return true;
        }
        else return false;
    }
    virtual void action( BaseClass& p, const string& value ) const = 0;
};
// implementation of the action method
//
template< typename BaseClass, typename T >
struct TSetter : public CMemberSetter<BaseClass> {
    typedef T BaseClass::*TMember;
    TMember member;
    // a dependent base class has to be named with its template argument
    TSetter( const string& aKey, const TMember t ): CMemberSetter<BaseClass>( aKey ), member(t){}
    virtual void action( BaseClass& p, const std::string& valuestring ) const {
        // get value ("T value();" would declare a function, not a variable)
        T value = T();
        std::istringstream ss( valuestring );
        ss >> value;
        (p.*member) = value;
    }
};
template< typename BaseClass >
struct PCKFactory {
    typedef std::vector<const CMemberSetter<BaseClass>*> setter_list;
    setter_list aSetters;
    template< std::size_t N >
    PCKFactory( const CMemberSetter<BaseClass>* (&setters)[N] )
        : aSetters( setters, setters+N ) {}
    template< std::size_t N >
    BaseClass factor( const string (&key_value_pairs) [N] ) const {
        BaseClass pck;
        // process each key=value pair (N replaces the MSVC-only _countof)
        for( const string* pair = key_value_pairs; pair != key_value_pairs + N; ++pair )
        {
            typename setter_list::const_iterator itSetter = aSetters.begin();
            while( itSetter != aSetters.end() ) { // optimization possible
                if( (*itSetter)->try_set_value( pck, *pair ) )
                    break;
                ++itSetter;
            }
        }
        return pck;
    }
};
The problem is that you don't have the meta-information to refer to the struct elements at run time (something like structVar.$ElementName = ..., where $ElementName is not the element name but a (char?) variable containing the name of the element to use).
My solution would be to add this meta-information.
This should be an array with the offset of the elements in the struct.
Quick-n-dirty solution: add an array with the strings; the resulting code should look like this:
const char * wordlist[] = {"pc","gnd","ag","prl_id","fc"};
const int offsets[] = { offsetof(mystruct, pc), offsetof(mystruct, gnd), offsetof(mystruct, ag), offsetof(mystruct, prl_id), offsetof(mystruct, fc)};
const int sizes[] = { sizeof(((mystruct *)0)->pc), sizeof(((mystruct *)0)->gnd), sizeof(((mystruct *)0)->ag), sizeof(((mystruct *)0)->prl_id), sizeof(((mystruct *)0)->fc) };
To store a value you would then do something like this:
index = 0;
while (index < 5 && strcmp(wordlist[index], key))
    index++;
if (index < 5)
    memcpy((char *)&mystructvar + offsets[index], &value, sizes[index]);   /* offsets are in bytes */
else
    fprintf(stderr, "Key not valid\n");
This lookup loop can get costly for bigger structures, but C doesn't allow array indexing with strings. Computer science has a solution for this problem, though: perfect hashes.
With such a function, the code would look like this:
hash = calc_perf_hash(key);
memcpy((char *)&mystructvar + offsets[hash], &value, sizes[hash]);
But how to obtain these perfect hash functions (I called it calc_perf_hash)?
There are algorithms where you just feed in your keywords and the function comes out, and luckily someone has even implemented them: look for the "gperf" tool/package in your favourite OS/distribution.
There you would just input your element names and it outputs ready-to-use C code for a perfect hash function (by default it generates a function "hash", which returns the hash, and an "in_word_set" function, which decides whether a given key is in the word list).
Because the hash values come in a different order, you of course have to initialize the offsets and sizes arrays in hash order.
Another problem you have (which the other answers don't take into account) is type conversion. The others use an assignment; I use memcpy, which is no better: copying an int into a smaller field such as a char only gives the right value on little-endian machines.
Here I would suggest replacing the sizes array with another array:
const char * modifier[]={"%i","%c", ...
where each string is the sscanf conversion specifier used to read that field. This way you can replace the assignment/copy with
sscanf(valueString, modifier[hash], (char *)&mystructvar + offsets[hash]);
Of course you can vary this, for example by including the "element=" part in the format string. Then you can pass the complete string to sscanf without preprocessing it; I think this depends strongly on the rest of your parse routine.
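A runnable sketch of the offset-table idea combined with sscanf, using a hypothetical mystruct whose keys follow the question ("prl" for prl_id, "rc" for fc); set_from_string and field_def are hypothetical names. To give every sscanf call a correctly typed pointer, this swaps the table of format strings for a small per-field type tag, and it uses a plain linear search where gperf's generated hash would go.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical struct; keys follow the question ("prl" -> prl_id, "rc" -> fc).
struct mystruct { int pc; char gnd; int ag; long prl_id; int fc; };

enum field_kind { K_INT, K_CHAR, K_LONG };

struct field_def {
    const char* name;      // key as it appears before '='
    std::size_t offset;    // byte offset of the member, from offsetof
    field_kind kind;       // selects the sscanf conversion
};

static const field_def fields[] = {
    {"pc",  offsetof(mystruct, pc),     K_INT},
    {"gnd", offsetof(mystruct, gnd),    K_CHAR},
    {"ag",  offsetof(mystruct, ag),     K_INT},
    {"prl", offsetof(mystruct, prl_id), K_LONG},
    {"rc",  offsetof(mystruct, fc),     K_INT},
};

// Parses one "key=value" string into s; false for unknown keys or bad values.
bool set_from_string(mystruct& s, const char* kv) {
    const char* eq = std::strchr(kv, '=');
    if (!eq) return false;
    std::size_t keylen = static_cast<std::size_t>(eq - kv);
    for (const field_def& f : fields) {
        if (std::strlen(f.name) != keylen || std::strncmp(f.name, kv, keylen) != 0)
            continue;
        // offsets are byte counts, so do the arithmetic on a char pointer
        char* field = reinterpret_cast<char*>(&s) + f.offset;
        switch (f.kind) {
        case K_INT:  return std::sscanf(eq + 1, "%d", reinterpret_cast<int*>(field)) == 1;
        case K_CHAR: return std::sscanf(eq + 1, "%c", field) == 1;
        case K_LONG: return std::sscanf(eq + 1, "%ld", reinterpret_cast<long*>(field)) == 1;
        }
    }
    return false;
}
```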
Were I to do this in straight C, I wouldn't use the mother of all if's. Instead, I would do something like this:
#include <stddef.h>
#include <string.h>
typedef struct {
    const char *fieldName;
    int structOffset;
    int fieldSize;
} t_fieldDef;
typedef struct {
    int fieldCount;
    t_fieldDef *defs;
} t_structLayout;
/* linear search by name; returns NULL if no field matches */
t_fieldDef *GetFieldDefByName(const char *name, t_structLayout *layout)
{
    t_fieldDef *defs = layout->defs;
    int count = layout->fieldCount;
    for (int i=0; i < count; i++) {
        if (strcmp(name, defs->fieldName) == 0)
            return defs;
        defs++;
    }
    return NULL;
}
/* meta-circular usage */
static t_fieldDef metaFieldDefs[] = {
{ "fieldName", offsetof(t_fieldDef, fieldName), sizeof(const char *) },
{ "structOffset", offsetof(t_fieldDef, structOffset), sizeof(int) },
{ "fieldSize", offsetof(t_fieldDef, fieldSize), sizeof(int) }
};
static t_structLayout metaFieldDefLayout =
{ sizeof(metaFieldDefs) / sizeof(t_fieldDef), metaFieldDefs };
This lets you look up a field by name at runtime using a compact description of the struct layout. It is fairly easy to maintain, but I don't like the sizeof(mumble) in the actual usage code: it requires that every struct definition be labeled with a comment saying "don't effing change the types or content without changing them in the t_fieldDef array for this structure". There also needs to be NULL checking.
I'd also prefer that the lookup use either binary search or a hash, but this is probably good enough for most cases. If I were to use a hash, I'd put a NULL hash table pointer into the t_structLayout and build the table on the first search.
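A sketch of the binary-search variant, using the C library's bsearch over a t_fieldDef table sorted by name; sample, sampleDefs, compareByName and FindFieldDef are hypothetical names. The price is that the table has to be kept in strcmp order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

typedef struct {
    const char *fieldName;
    int structOffset;
    int fieldSize;
} t_fieldDef;

struct sample { int ag; char gnd; long prl_id; };   // hypothetical struct

// Must stay sorted by fieldName (strcmp order) for bsearch to work.
static const t_fieldDef sampleDefs[] = {
    { "ag",     offsetof(sample, ag),     sizeof(int)  },
    { "gnd",    offsetof(sample, gnd),    sizeof(char) },
    { "prl_id", offsetof(sample, prl_id), sizeof(long) },
};

// bsearch passes the search key first and a table element second.
static int compareByName(const void *key, const void *elem) {
    return std::strcmp(static_cast<const char *>(key),
                       static_cast<const t_fieldDef *>(elem)->fieldName);
}

// O(log n) lookup by name; returns NULL if the name is not in the table.
const t_fieldDef *FindFieldDef(const char *name, const t_fieldDef *defs, size_t count) {
    return static_cast<const t_fieldDef *>(
        std::bsearch(name, defs, count, sizeof(t_fieldDef), compareByName));
}
```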
I tried your idea and got the error
error: ISO C++ forbids declaration of ‘map’ with no type
in Eclipse CDT on Ubuntu Linux.
Note that you need to include <map> in the "*.h" file
to use your code without this error message.
#include <map>
// a framework
template<typename T>
struct entry {
enum type { tchar, tint, tlong } type_name;
/* default ctor, so we can std::map it */
entry() { }
template<typename R>
entry(R (T::*ptr)) {
etc' etc'......