How to validate input based on CFG? - c++

Consider this grammar:
expr ::= LP expr RP
| expr PLUS|MINUS expr
| expr STAR|SLASH expr
| term
term ::= INTEGER|FLOAT
A context-free grammar is defined as G = ( V, Σ, R, S ), where (in this case):
V = { expr, term }
Σ = { LP, RP, PLUS, MINUS, STAR, SLASH, INTEGER, FLOAT }
R = //was defined above
S = expr
Now let's define a small class called Parser, whose definition is (code samples are in C++):
class Parser
{
public:
Parser();
void Parse();
private:
void parseRecursive(vector<string> rules, unsigned ruleIndex, unsigned startingTokenIndex, string prevRule);
bool isTerm(string token); //returns true if token is in upper case
vector<string> split(...); //input: string; output: vector of words split by delim
map<string, vector<string>> ruleNames; //contains grammar definition
vector<int> tokenList; //our input set of tokens
};
To make it easier to move between rules, every grammar rule is split into 2 parts: a key (before ::=) and its alternatives (after ::=), so the grammar above becomes the following map:
std::map<string, vector<string>> ruleNames =
{
{ "expr", {
"LP expr RP",
"expr PLUS|MINUS expr",
"expr STAR|SLASH expr",
"term"
}
},
{ "term", { "INTEGER", "FLOAT" } }
};
For testing purposes, the string (2 + 3) * 4 has been tokenized into the following sequence
{ TK_LP, TK_INTEGER, TK_PLUS, TK_INTEGER, TK_RP, TK_STAR, TK_INTEGER }
and used as the input for Parser.
Now for the hardest part: the algorithm. From what I understand, I was thinking about this:
1) Take the first rule from the starting symbol's vector (LP expr RP) and split it into words.
2) Check whether the first word in the rule is a terminal.
If the word is a terminal, compare it with the first token.
If they are equal, increase the token index and move to the next word in the rule.
If they are not equal, keep the token index and move to the next rule.
If the word is not a terminal and it was not used in the previous recursion, increase the token index and go into recursive parsing (passing the new rules and the non-terminal word).
While I am not sure about this algorithm, I still tried to implement it (unsuccessfully, of course):
1) Outer Parse function that initiates the recursion:
void Parser::Parse()
{
int startingTokenIndex = 0;
string word = this->startingSymbol;
for (int ruleIndex = 0; ruleIndex < this->ruleNames[word].size(); ruleIndex++)
{
this->parseRecursive(this->ruleNames[word], ruleIndex, startingTokenIndex, "");
}
}
2) Recursive function:
void Parser::parseRecursive(vector<string> rules, unsigned ruleIndex, unsigned startingTokenIndex, string prevRule)
{
printf("%s - %s\n", rules[ruleIndex].c_str(), this->tokenNames[this->tokenList[startingTokenIndex]].c_str());
vector<string> temp = this->split(rules[ruleIndex], ' ');
vector<vector<string>> ruleWords;
bool breakOutter = false;
for (unsigned wordIndex = 0; wordIndex < temp.size(); wordIndex++)
{
ruleWords.push_back(this->split(temp[wordIndex], '|'));
}
for (unsigned wordIndex = 0; wordIndex < ruleWords.size(); wordIndex++)
{
breakOutter = false;
for (unsigned subWordIndex = 0; subWordIndex < ruleWords[wordIndex].size(); subWordIndex++)
{
string word = ruleWords[wordIndex][subWordIndex];
if (this->isTerm(word))
{
if (this->tokenNames[this->tokenList[startingTokenIndex]] == this->makeToken(word))
{
printf("%s ", word.c_str());
startingTokenIndex++;
} else {
breakOutter = true;
break;
}
} else {
if (prevRule != word)
{
startingTokenIndex++;
this->parseRecursive(this->ruleNames[word], 0, startingTokenIndex, word);
prevRule = word;
}
}
}
if (breakOutter)
break;
}
}
What changes should I perform to my algorithm to make it work?

The method depends on whether you want a hand-written, one-off parser or a compiler-compiler (parser generator). Parser generators mostly use LR parsing; hand-written parsers are usually LL.
A hand-written LL parser typically uses recursive descent: for each non-terminal, a function is created that implements it.
For example, for grammar:
S -> S + A | A
A -> a | b
Let us eliminate the left recursion and left-factor the grammar (LL parsers cannot handle left recursion):
S -> As
s -> + As | epsilon
A -> a | b
Such an implementation is possible:
void S (void)
{
    A ();
    s ();
}
void s (void)
{
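    // Note: this consumes the token even when it is not '+';
    // a real parser would peek here or push the token back.
    if (get_next_token ()->value != '+')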
        return;
    A ();
    s ();
}
void A (void)
{
    token * tok = get_next_token ();
    if (tok->value != 'a' && tok->value != 'b')
            syntax_error ();
}
If you want to add a syntax-directed definition (SDD), the inherited attributes are passed as function arguments and the synthesized attributes are returned as output values.
Comment:
do not collect all the tokens up front; fetch them as needed with get_next_token ().
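Applied to the grammar from the question, the same recursive-descent idea could look like the sketch below. It is only an illustration under some assumptions: the left-recursive rules are rewritten into loops, the rewritten grammar gives STAR/SLASH higher precedence than PLUS/MINUS (the original grammar is ambiguous and does not say), and the names Token, Validator and accept are hypothetical, not taken from the question's Parser class. For simplicity it takes the whole token vector rather than calling get_next_token ().

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Token kinds mirroring the terminals in the question's grammar.
enum Token { TK_LP, TK_RP, TK_PLUS, TK_MINUS, TK_STAR, TK_SLASH, TK_INTEGER, TK_FLOAT };

// Recursive-descent validator for a left-recursion-free version of the grammar:
//   expr   ::= term ((PLUS|MINUS) term)*
//   term   ::= factor ((STAR|SLASH) factor)*
//   factor ::= LP expr RP | INTEGER | FLOAT
class Validator {
public:
    explicit Validator(std::vector<Token> tokens) : toks(std::move(tokens)) {}

    // True only if the whole token list can be derived from expr.
    bool validate() {
        pos = 0;
        return expr() && pos == toks.size();
    }

private:
    // Consume the next token if it has the given kind.
    bool accept(Token t) {
        if (pos < toks.size() && toks[pos] == t) { ++pos; return true; }
        return false;
    }
    bool expr() {
        if (!term()) return false;
        while (accept(TK_PLUS) || accept(TK_MINUS))
            if (!term()) return false;
        return true;
    }
    bool term() {
        if (!factor()) return false;
        while (accept(TK_STAR) || accept(TK_SLASH))
            if (!factor()) return false;
        return true;
    }
    bool factor() {
        if (accept(TK_LP)) return expr() && accept(TK_RP);
        return accept(TK_INTEGER) || accept(TK_FLOAT);
    }

    std::vector<Token> toks;
    std::size_t pos = 0;
};
```

Constructing a Validator from the token sequence for (2 + 3) * 4 and calling validate() should return true, while a sequence such as TK_INTEGER TK_INTEGER is rejected because tokens are left over after expr has been matched.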

Related

Permutations of String Using Stack C++

The program I have below finds all the permutations of a given string using a stack without recursion. I am having some trouble understanding what the place in the struct is for and how it plays into the logic for the algorithm. Could anyone help me understand this code? I have a struct that only has two entities:
class Node
{
public:
string word; // stores the word in the node
Node *next;
};
I would just like to understand why the place entity is needed.
Here is the code that finds all the permutations of a given string:
struct State
{
State (std::string topermute_, int place_, int nextchar_, State* next_ = 0)
: topermute (topermute_)
, place (place_)
, nextchar (nextchar_)
, next (next_)
{
}
std::string topermute;
int place;
int nextchar;
State* next;
};
std::string swtch (std::string topermute, int x, int y)
{
std::string newstring = topermute;
newstring[x] = newstring[y];
newstring[y] = topermute[x]; //avoids temp variable
return newstring;
}
void permute (std::string topermute, int place = 0)
{
// Linked list stack.
State* top = new State (topermute, place, place);
while (top != 0)
{
State* pop = top;
top = pop->next;
if (pop->place == pop->topermute.length () - 1)
{
std::cout << pop->topermute << std::endl;
}
for (int i = pop->place; i < pop->topermute.length (); ++i)
{
top = new State (swtch (pop->topermute, pop->place, i), pop->place + 1, i, top);
}
delete pop;
}
}
int main (int argc, char* argv[])
{
if (argc!=2)
{
std::cout<<"Proper input is 'permute string'";
return 1;
}
else
{
permute (argv[1]);
}
return 0;
}
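place is the index of the position that is being fixed next: every position before place has already been chosen. Inside the for loop, place acts as the pivot while i runs over the remaining positions, and swtch swaps the character at i into position place. Each resulting string is pushed onto the stack with place + 1, so the next round fixes the following position; once place reaches the last index, the string is a complete permutation and is printed.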
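The role of place may be easier to see in a recursive version of the same algorithm, where the explicit stack of State objects is replaced by the call stack. This is only a sketch for comparison, not the code from the question; the three-argument permute that collects results into a vector instead of printing them is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Recursive counterpart of the stack-based permute(): positions before
// `place` are already fixed, positions from `place` onward are still free.
void permute(std::string s, std::size_t place, std::vector<std::string>& out)
{
    if (s.empty() || place == s.length() - 1) {
        out.push_back(s); // nothing left to choose: a complete permutation
        return;
    }
    for (std::size_t i = place; i < s.length(); ++i) {
        std::string next = s;
        std::swap(next[place], next[i]); // choose the character for position `place`
        permute(next, place + 1, out);   // then fix the remaining positions
    }
}
```

Each recursive call corresponds to one State pushed in the original code: topermute is s and place is the depth of the recursion.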

bad_alloc when attempting to print string that was assigned to member of $$ struct

During our compiler's intermediate code generation phase, and more specifically while testing the arithmetic expressions and assignment rules, I noticed that although the respective quads are constructed successfully, we sometimes get a bad_alloc exception when printing them out. After tracing it, it looks like it's caused by the printQuads() method, specifically the following access of the key string:
if(q.result != nullptr && q.result->sym != nullptr) {
cout << "quad " << opcodeStrings[q.op] << " inside if key check for" << opcodeStrings[q.op] << endl;
resultKey = q.result->sym->key;
}
I'll try to include the code that's relevant instead of dumping 500 lines of code here.
So, below you can see our assignmentexpr and basic arithmetic expression rules and actions:
expr: assignexpr
| expr PLUS expr
{
bool isExpr1Arithm = check_arith($1);
bool isExpr2Arithm = check_arith($3);
if(!isExpr1Arithm || !isExpr2Arithm)
{
//string msg = !isExpr1Arithm ? "First operand isn\'t a number in addition!" : "Second operand isn\'t a number in addition!";
yyerror(token_node, "Both addition operands must be numbers!");
} else
{
double result = $1->numConst + $3->numConst;
$$ = newexpr(arithmetic_e);
$$->sym = newtemp(scope);
$$->numConst = result;
emit(add, $1, $3, $$, nextquadlabel(), yylineno);
}
}
| expr MIN expr
{
bool isExpr1Arithm = check_arith($1);
bool isExpr2Arithm = check_arith($3);
if(!isExpr1Arithm || !isExpr2Arithm)
{
//string msg = !isExpr1Arithm ? "First operand isn\'t a number in subtraction!" : "Second operand isn\'t a number in subtraction!";
yyerror(token_node, "Both subtraction operands must be numbers!");
} else
{
double result = $1->numConst - $3->numConst;
$$ = newexpr(arithmetic_e);
$$->sym = newtemp(scope);
$$->numConst = result;
emit(sub, $1, $3, $$, nextquadlabel(), yylineno);
}
}
| expr MUL expr
{
bool isExpr1Arithm = check_arith($1);
bool isExpr2Arithm = check_arith($3);
if(!isExpr1Arithm || !isExpr2Arithm)
{
//string msg = !isExpr1Arithm ? "First operand isn\'t a number in subtraction!" : "Second operand isn\'t a number in subtracion!";
yyerror(token_node, "Both multiplication operands must be numbers!");
} else
{
double result = $1->numConst * $3->numConst;
$$ = newexpr(arithmetic_e);
$$->sym = newtemp(scope);
$$->numConst = result;
emit(mul, $1, $3, $$, nextquadlabel(), yylineno);
}
}
| expr DIV expr
{
bool isExpr1Arithm = check_arith($1);
bool isExpr2Arithm = check_arith($3);
if(!isExpr1Arithm || !isExpr2Arithm)
{
//string msg = !isExpr1Arithm ? "First operand isn\'t a number in subtraction!" : "Second operand isn\'t a number in subtracion!";
yyerror(token_node, "Both division operands must be numbers!");
} else
{
if($3->numConst == 0) {
yyerror(token_node, "division by 0!");
} else {
double result = $1->numConst / $3->numConst;
$$ = newexpr(arithmetic_e);
$$->sym = newtemp(scope);
$$->numConst = result;
emit(div_op, $1, $3, $$, nextquadlabel(), yylineno);
}
}
}
| expr MOD expr
{
bool isExpr1Arithm = check_arith($1);
bool isExpr2Arithm = check_arith($3);
if(!isExpr1Arithm || !isExpr2Arithm)
{
//string msg = !isExpr1Arithm ? "First operand isn\'t a number in subtraction!" : "Second operand isn\'t a number in subtracion!";
yyerror(token_node, "Both modulus operands must be numbers!");
} else
{
if($3->numConst == 0) {
yyerror(token_node, "division by 0!");
} else {
double result = fmod($1->numConst,$3->numConst);
$$ = newexpr(arithmetic_e);
$$->sym = newtemp(scope);
$$->numConst = result;
emit(mod_op, $1, $3, $$, nextquadlabel(), yylineno);
}
}
}
...
assignexpr: lvalue ASSIGN expr { if ( isMemberOfFunc )
{
isMemberOfFunc=false;
}
else{ if ( islocalid==true ){
islocalid = false;
}else{
if ( isLibFunc($1->sym->key) ) yyerror(token_node,"Library function \"" + $1->sym->key + "\" is not lvalue!");
if (SymTable_lookup(symtab,$1->sym->key,scope,false) && isFunc($1->sym->key,scope)) yyerror(token_node,"User function \"" + $1->sym->key + "\" is not lvalue!");
}
}
if($1->type == tableitem_e)
{
// lvalue[index] = expr
emit(tablesetelem,$1->index,$3,$1,nextquadlabel(),yylineno);
$$ = emit_iftableitem($1,nextquadlabel(),yylineno, scope);
$$->type = assignment;
} else
{
emit(assign,$3,NULL,$1,nextquadlabel(),yylineno); //lval = expr;
$$ = newexpr(assignment);
$$->sym = newtemp(scope);
emit(assign, $1,NULL,$$,nextquadlabel(),yylineno);
}
}
;
The printQuads method is the following:
void printQuads() {
unsigned int index = 1;
cout << "quad#\t\topcode\t\tresult\t\targ1\t\targ2\t\tlabel" <<endl;
cout << "-------------------------------------------------------------------------------------------------" << endl;
for(quad q : quads) {
string arg1_type = "";
string arg2_type = "";
cout << "quad before arg1 type check" << endl;
if(q.arg1 != nullptr) {
switch (q.arg1->type) {
case const_bool:
arg1_type = "\'" + BoolToString(q.arg1->boolConst) + "\'";
break;
case const_string:
arg1_type = "\"" + q.arg1->strConst + "\"";
break;
case const_num:
arg1_type = to_string(q.arg1->numConst);
break;
case var:
arg1_type = q.arg1->sym->key;
break;
case nil_e:
arg1_type = "nil";
break;
default:
arg1_type = q.arg1->sym->key;
break;
}
}
cout << "quad before arg2 type check" << endl;
if(q.arg2 != nullptr) {
switch (q.arg2->type) {
case const_bool:
arg2_type = "\'" + BoolToString(q.arg2->boolConst) + "\'";
break;
case const_string:
arg2_type = "\"" + q.arg2->strConst + "\"";
break;
case const_num:
arg2_type = to_string(q.arg2->numConst);
break;
case nil_e:
arg2_type = "nil";
break;
default:
arg2_type = q.arg2->sym->key;
break;
}
}
string label = "";
if(q.op == if_eq || q.op == if_noteq || q.op == if_lesseq || q.op == if_greatereq
|| q.op == if_less || q.op == if_greater || q.op == jump) label = q.label;
string resultKey = "";
cout << "quad before key check" << endl;
if(q.result != nullptr && q.result->sym != nullptr) {
cout << "quad " << opcodeStrings[q.op] << " inside if key check for" << opcodeStrings[q.op] << endl;
resultKey = q.result->sym->key;
}
cout << "quad after key check" << endl;
cout << index << ":\t\t" << opcodeStrings[q.op] << "\t\t" << resultKey << "\t\t" << arg1_type << "\t\t" << arg2_type << "\t\t" << label << "\t\t" << endl;
index++;
}
}
The quads variable is just a vector of quads. Here is the quad struct:
enum expr_t {
var,
tableitem_e,
user_func,
lib_func,
arithmetic_e,
assignment,
newtable_e,
const_num,
const_bool,
const_string,
nil_e,
bool_e
};
struct expr {
expr_t type;
binding* sym;
expr* index;
double numConst;
string strConst;
bool boolConst;
expr* next;
};
struct quad {
iopcode op;
expr* result;
expr* arg1;
expr* arg2;
unsigned int label;
unsigned int line;
};
The binding* is defined as follows and is a symbol table binding:
enum SymbolType{GLOBAL_, LOCAL_, FORMAL_, USERFUNC_, LIBFUNC_, TEMP};
struct binding{
std::string key;
bool isactive = true;
SymbolType sym;
//vector<binding *> formals;
scope_space space;
unsigned int offset;
unsigned int scope;
int line;
};
Here are the emit(), newtemp & newexpr() methods:
void emit(
iopcode op,
expr* arg1,
expr* arg2,
expr* result,
unsigned int label,
unsigned int line
){
quad p;
p.op = op;
p.arg1 = arg1;
p.arg2 = arg2;
p.result = result;
p.label = label;
p.line = line;
currQuad++;
quads.push_back(p);
}
binding *newtemp(unsigned int scope){
string name = newTempName();
binding* sym = SymTable_get(symtab,name,scope);
if (sym== nullptr){
SymTable_put(symtab,name,scope,TEMP,-1);
binding* sym = SymTable_get(symtab,name,scope);
return sym;
}else return sym;
}
string newTempName(){
string temp = "_t" + to_string(countertemp) + " ";
countertemp++;
return temp;
}
expr* newexpr(expr_t exprt){
expr* current = new expr;
current->sym = NULL;
current->index = NULL;
current->numConst = 0;
current->strConst = "";
current->boolConst = false;
current->next = NULL;
current->type = exprt;
return current;
}
unsigned int countertemp = 0;
unsigned int currQuad = 0;
Symbol table cpp file:
#include <algorithm>
bool isHidingBindings = false;
/* Return a hash code for pcKey.*/
static unsigned int SymTable_hash(string pcKey){
size_t ui;
unsigned int uiHash = 0U;
for (ui = 0U; pcKey[ui] != '\0'; ui++)
uiHash = uiHash * HASH_MULTIPLIER + pcKey[ui];
return (uiHash % DEFAULT_SIZE);
}
/*If b contains a binding with key pcKey, returns 1.Otherwise 0.
It is a checked runtime error for oSymTable and pcKey to be NULL.*/
int Bucket_contains(scope_bucket b, string pcKey){
vector<binding> current = b.entries[SymTable_hash(pcKey)]; /*find the entry binding based on the argument pcKey*/
for (int i=0; i<current.size(); i++){
binding cur = current.at(i);
if (cur.key==pcKey) return 1;
}
return 0;
}
/*Returns the index of the bucket that corresponds to the scope 'scope'. If there is no
bucket yet for that scope and create is true, it creates the corresponding bucket in
oSymTable and returns its index. Otherwise it returns -1.*/
int indexofscope(SymTable_T &oSymTable, unsigned int scope, bool create){
int index=-1;
for(int i=0; i<oSymTable.buckets.size(); i++) if (oSymTable.buckets[i].scope == scope) index=i;
if ( index==-1 && create ){
scope_bucket newbucket;
newbucket.scope = scope;
oSymTable.buckets.push_back(newbucket);
index = oSymTable.buckets.size()-1;
}
return index;
}
/*If there is no binding with key pcKey in oSymTable, puts a new binding with
this key into the table and returns 1. Otherwise, it just returns 0.
It is a checked runtime error for oSymTable and pcKey to be NULL.*/
int SymTable_put(SymTable_T &oSymTable, string pcKey,unsigned int scope, SymbolType st, unsigned int line){
int index = indexofscope(oSymTable,scope, true);
if(index==-1) cerr<<"ERROR"<<endl;
scope_bucket *current = &oSymTable.buckets.at(index);
if ( Bucket_contains(*current, pcKey) && st != FORMAL_ && st != LOCAL_) return 0; /*If the binding exists in oSymTable return 0.*/
binding newnode;
newnode.key = pcKey;
newnode.isactive = true;
newnode.line = line;
newnode.sym = st;
newnode.scope = scope;
current->entries[SymTable_hash(pcKey)].push_back(newnode);
return 1;
}
/*Takes as arguments oSymTable and the scope that we want to deactivate.
If that scope does not exist in oSymTable it returns -1. Otherwise 0.*/
void SymTable_hide(SymTable_T &oSymTable, unsigned int scope){
isHidingBindings = true;
for(int i=scope; i >= 0; i--) {
if(i == 0) return;
int index = indexofscope(oSymTable,i,false);
if(index == -1) continue;
scope_bucket *current = &oSymTable.buckets.at(index);
for (int i=0; i<DEFAULT_SIZE; i++) {
for (int j=0; j<current->entries[i].size(); j++) {
if(current->entries[i].at(j).sym == LOCAL_ || current->entries[i].at(j).sym == FORMAL_)
current->entries[i].at(j).isactive = false;
}
}
}
}
void SymTable_show(SymTable_T &oSymTable, unsigned int scope){
isHidingBindings = false;
for(int i=scope; i >= 0; i--) {
if(i == 0) return;
int index = indexofscope(oSymTable,i,false);
if(index == -1) continue;
scope_bucket *current = &oSymTable.buckets.at(index);
for (int i=0; i<DEFAULT_SIZE; i++) {
for (int j=0; j<current->entries[i].size(); j++) {
if(current->entries[i].at(j).sym == LOCAL_ || current->entries[i].at(j).sym == FORMAL_)
current->entries[i].at(j).isactive = true;
}
}
}
}
bool SymTable_lookup(SymTable_T oSymTable, string pcKey, unsigned int scope, bool searchInScopeOnly){
for(int i=scope; i >= 0; i--) {
if(searchInScopeOnly && i != scope) break;
int index = indexofscope(oSymTable,i,false);
if(index == -1) continue;
scope_bucket current = oSymTable.buckets[index];
for(vector<binding> entry : current.entries) {
for(binding b : entry) {
if(b.key == pcKey && b.isactive) return true;
else if(b.key == pcKey && !b.isactive) return false;
}
}
}
return false;
}
binding* SymTable_lookupAndGet(SymTable_T &oSymTable, string pcKey, unsigned int scope) noexcept{
for ( int i=scope; i >= 0; --i ){
int index = indexofscope(oSymTable,i,false );
if (index==-1) continue;
scope_bucket &current = oSymTable.buckets[index];
for (auto &entry : current.entries) {
for (auto &b : entry ){
if ( b.key == pcKey ) return &b;
}
}
}
return nullptr;
}
/*Takes as arguments oSymTable, the key of the binding we are looking for and the scope of the binding.
The function returns the value of that binding. Otherwise it returns 0.*/
binding* SymTable_get(SymTable_T &oSymTable, const string pcKey, unsigned int scope){
for ( int i=scope; i >= 0; --i )
{
const int index = indexofscope( oSymTable, i, false );
if ( index == -1 )
{
continue;
}
scope_bucket& current = oSymTable.buckets[index];
for ( auto& entry : current.entries)
{
for ( auto& b : entry )
{
if ( b.key == pcKey )
{
return &b;
}
}
}
}
return nullptr;
}
When run with the following test file, the issue occurs at the z5 = 4 / 2; expression's assign quad:
// simple arithmetic operations
z1 = 1 + 2;
z10 = 1 + 1;
z2 = 1 - 3;
z3 = 4 * 4;
z4 = 5 / 2;
What's confusing is that if I print out the sym->key after each emit() in the arithmetic-related actions, I can see the keys just fine. But once I try to access them inside the printQuads it will fail (for the div operation at least so far). This has me thinking that maybe we are shallow copying the binding* sym thus losing the key? But how come the rest of them are printed normally?
I'm thinking that the issue (which has occurred before at various stages) could be caused by us using a ton of copy-by-value instead of by-reference, but I can't exactly confirm this because most of the time it works (I'm guessing that means that this is undefined behavior?).
I'm sure this is very difficult to help debug but maybe someone will eyeball something that I can't see after this many hours.
Debugging by eyeballing your code is probably a useful skill, but it's far from the most productive form of debugging. These days, it's much less necessary, since there are lots of good tools which you can use to detect problems. (Here, I do mean "you", specifically. I can't use any of those tools because I don't have your complete project in front of me. And nor do I particularly want it; this is not a request for you to paste hundreds of lines of code).
You're almost certainly right that your problem is related to some kind of undefined behaviour. If you're correct about the bad_alloc exception being thrown by what is effectively a copy of a std::string, then it's most likely the result of the thing being copied from not being a valid std::string. Perhaps it's an actual std::string object whose internal members have been corrupted; perhaps the pointer is not actually pointing to an active std::string (which I think is the real problem, see below). Or perhaps it's something else.
Either way, the error occurred long before the bug manifests itself, so you're only going to stumble upon where it happened by blind luck. On the other hand, there are a variety of memory error detection tools available which may be able to pinpoint the precise moment in which you violated the contract by reading or writing to memory which didn't belong to you. These include Valgrind and AddressSanitizer (also known as ASan); one or both of these is certainly available for the platform on which you are developing your project. (I say that confidently even without knowing what that platform is, but you'll have to do a little research to find the one which works best for your particular environment. Both of those names can be looked up on Wikipedia.) These tools are very easy to use, and extraordinarily useful; they can save you hours or days of debugging and a lot of frustration. As an extra added bonus, they can detect bugs you don't even know you have, saving you the embarrassment of shipping a program which will blow up in the hands of the customer or the person who is marking your assignment. So I strongly recommend learning how to use them.
I probably should leave it at that, because it's better motivation to learn to use the tools. Still, I can't resist making a guess about where the problem lies. But honestly, you will learn a lot more by ignoring what I'm about to say and trying to figure out the problem yourself.
Anyway, you don't include much in the way of information about your SymTable_T class, and the inconsistent naming convention makes me wonder if you even wrote its code; perhaps it was part of the skeleton code you were given for this assignment. From what I can see in SymTable_put and SymTable_get, the SymTable_T includes something like a hash table, but doesn't use the C++ standard library associative containers. (That's a mistake from the beginning, IMHO. This assignment is about learning how to generate code, not how to write a good hash table. The C++ standard library associative containers are certainly adequate for your purposes, whether or not they are the absolute ideal for your use case, and they have the enormous advantages of already being thoroughly documented and debugged.)
It's possible that SymTable_T was not originally written in C++ at all. The use of free-standing functions like SymTable_put and SymTable_get rather than class methods is difficult to explain unless the functions were originally written in C, which doesn't allow object methods. On the other hand, they appear to use C++ standard library collections, as evidenced by the call to push_back in SymTable_put:
current->entries[SymTable_hash(pcKey)].push_back(newnode);
That suggests that entries is a std::vector (although there are other possibilities), and if it is, it should raise a red flag when you combine it with this, from SymTable_get (whitespace-edited to save screen space here):
for ( auto& entry : current.entries) {
for ( auto& b : entry ) {
if ( b.key == pcKey )
return &b;
}
}
(To be honest, I don't understand that double loop. To start with, you seem to be ignoring the fact that there is a hash table somewhere in that data structure, but beyond that, it seems to me that entry should be a binding (that's what SymTable_put pushes onto the entries container), and I don't see where a binding is an iterable object. Perhaps I'm not reading that correctly.)
Regardless, evidently SymTable_get is returning a reference to something which is stored in a container, probably a std::vector, and that container is modified from time to time by having new elements pushed onto it. And pushing a new element onto the end of a std::vector invalidates all existing references to every element of the vector. (See https://en.cppreference.com/w/cpp/container/vector/push_back)
Thus, newtemp, which returns a binding* acquired from SymTable_get, is returning a pointer which may be invalidated in the future by some call to SymTable_put (though not by every call to that function; only the ones where the stars align unhappily). That pointer is then stored into a data object which will (much later) be given to printQuads, which will attempt to use the pointer to make a copy of a string which it will attempt to print. And, as I mentioned towards the beginning of this treatise, trying to use an object which is pointed to by a dangling pointer is Undefined Behaviour.
As a minor note, making a copy of a string in order to print it out is completely unnecessary. A reference would work just fine, and save a bunch of unnecessary memory allocations. But that won't fix the problem (if my guess turns out to be correct) because printing through a dangling pointer is just as Undefined Behaviour as making a copy through a dangling pointer, and will likely manifest in some other mysterious way.
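If the guess about dangling pointers is right, one way out is to store the bindings in a container that keeps element addresses stable when new elements are added. The sketch below uses std::unordered_map, which guarantees that pointers and references to elements stay valid across insertions even when the table rehashes. Binding, SymbolTable, put, get and the "scope:name" key format are hypothetical names chosen for illustration; they are not the SymTable_T API from the question.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for the question's `binding`.
struct Binding {
    std::string key;
    unsigned int scope;
};

// Hypothetical symbol table keyed by "<scope>:<name>".
class SymbolTable {
public:
    // Returns the existing binding for (name, scope) or creates a new one.
    Binding* put(const std::string& name, unsigned int scope) {
        auto result = table.emplace(makeKey(name, scope), Binding{name, scope});
        return &result.first->second;
    }
    // Returns nullptr if there is no such binding.
    Binding* get(const std::string& name, unsigned int scope) {
        auto it = table.find(makeKey(name, scope));
        return it == table.end() ? nullptr : &it->second;
    }

private:
    static std::string makeKey(const std::string& name, unsigned int scope) {
        return std::to_string(scope) + ":" + name;
    }
    // Insertions may rehash, but pointers to stored elements remain valid.
    std::unordered_map<std::string, Binding> table;
};
```

With a table like this, a Binding* obtained for one temporary stays usable after thousands of later insertions, which is exactly what a std::vector-backed bucket cannot promise.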

Elegant Way To Generate Composite Permutations in C++

I have a class Test which contains two vectors of a Letter class, a user defined type for which the less-than operator (<) has been implemented. How can I best generate all possible permutations of Test?
class Test
{
vector<Letter> letter_box_a;
vector<Letter> letter_box_b;
}
So if letter_box_a contains the letters A and B and letter_box_b contains C and D the valid permutations of Test would be (AB)(CD), (BA)(CD), (AB)(DC) and (BA)(DC).
Although I am able to brute force it, I was trying to write a better (more elegant/efficient) function which would internally call std::next_permutation on the underlying containers allowing me to do
Test test;
while (test.set_next_permutation())
{
// Do the stuff
}
but it appears to be a bit trickier than I first anticipated. I don't necessarily need an STL solution but would like an elegant solution.
I would think you could do something like
bool Test::set_next_permutation() {
auto &a = letter_box_a, &b = letter_box_b; // entirely to shorten the next line
return std::next_permutation(a.begin(), a.end()) || std::next_permutation(b.begin(), b.end());
}
(Of course, a while loop will skip the initial permutation in any case. You want a do...while loop instead.)
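Put together with a do...while loop, this idea can be sketched as follows. The struct is a simplified stand-in for the question's Test: char replaces Letter, the members are public, and all_permutations is a hypothetical helper added only to collect the results. It assumes both boxes start in sorted order; otherwise some combinations are skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the question's Test; char replaces Letter.
struct Test {
    std::vector<char> letter_box_a;
    std::vector<char> letter_box_b;

    // Advance box a; when it wraps around to sorted order, advance box b.
    // Returns false once both boxes have wrapped around.
    bool set_next_permutation() {
        return std::next_permutation(letter_box_a.begin(), letter_box_a.end())
            || std::next_permutation(letter_box_b.begin(), letter_box_b.end());
    }
};

// Collects every combination as a string, starting from the current state.
std::vector<std::string> all_permutations(Test test) {
    std::vector<std::string> seen;
    do {
        seen.push_back(std::string(test.letter_box_a.begin(), test.letter_box_a.end())
                     + std::string(test.letter_box_b.begin(), test.letter_box_b.end()));
    } while (test.set_next_permutation());
    return seen;
}
```

The short-circuit || is what makes this work: box b is only advanced when box a has just wrapped around, much like a carry in an odometer.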
If you want to use std::next_permutation, you need a nested loop for each vector you are permuting:
std::string s0 = "ab";
std::string s1 = "cd";
do
{
do
{
cout << s0 << "" << s1 << endl;
} while (std::next_permutation(s0.begin(), s0.end()));
} while (std::next_permutation(s1.begin(), s1.end()));
Output:
abcd
bacd
abdc
badc
And, in the class:
class Foo
{
public:
Foo(std::string_view arg_a, std::string_view arg_b)
: a(arg_a)
, b(arg_b)
, last(false)
{ }
void reset_permutations()
{
last = false;
}
bool next_permutation(std::string& r)
{
if (last)
return false;
if (not std::next_permutation(a.begin(), a.end()))
if (not std::next_permutation(b.begin(), b.end()))
last = true;
r = a + b;
return true;
}
private:
std::string a, b;
bool last;
};
int main(int argc, const char *argv[])
{
Foo foo("ab", "cd");
string s;
while (foo.next_permutation(s))
cout << s << endl;
return 0;
}

C++ std::set<string> Alphanumeric custom comparator

I'm solving a problem that involves sorting the non-redundant permutations of a string.
For example, if the input string is "8aC", then the output should be ordered like {"Ca8", "C8a", "aC8", "a8C", "8Ca", "8aC"}. I chose the C++ std::set data structure because each time I insert a string into the std::set, it is kept sorted automatically and duplicates are eliminated. The output is fine.
But I want to sort the set in an alphanumeric order that differs from the default one. I want to customize the set's comparator so that the priority is: upper case > lower case > digit.
I tried to customize comparator but it was quite frustrating. How can I customize the sorting order of the set? Here's my code.
set<string, StringCompare> setl;
for (i = 0; i < f; i++)
{
setl.insert(p[i]); //p is String Array. it has the information of permutation of String.
}
for (set<string>::iterator iter = setl.begin(); iter != setl.end(); ++iter)
cout << *iter << endl; //printing set items. it works fine.
struct StringCompare
{
bool operator () (const std::string s_left, const std::string s_right)
{
/*I want to use my character comparison function in here, but have no idea about that.
I'm not sure about that this is the right way to customize comparator either.*/
}
};
int compare_char(const char x, const char y)
{
if (char_type(x) == char_type(y))
{
return ( (int) x < (int) y) ? 1 : 0 ;
}
else return (char_type(x) > char_type(y)) ? 1 : 0;
}
int char_type(const char x)
{
int ascii = (int)x;
if (ascii >= 48 && ascii <= 57) // digit
{
return 1;
}
else if (ascii >= 97 && ascii <= 122) // lowercase
{
return 2;
}
else if (ascii >= 65 && ascii <= 90) // uppercase
{
return 3;
}
else
{
return 0;
}
}
You are almost there, but you should compare your strings lexicographically.
I made a few small changes to your code.
int char_type( const char x )
{
if ( isupper( x ) )
{
// upper case has the highest priority
return 0;
}
if ( islower( x ) )
{
return 1;
}
if ( isdigit( x ) )
{
// digit has the lowest priority
return 2;
}
// something else
return 3;
}
bool compare_char( const char x, const char y )
{
if ( char_type( x ) == char_type( y ) )
{
// same type so that we are going to compare characters
return ( x < y );
}
else
{
// different types
return char_type( x ) < char_type( y );
}
}
struct StringCompare
{
bool operator () ( const std::string& s_left, const std::string& s_right ) const
{
std::string::const_iterator iteLeft = s_left.begin();
std::string::const_iterator iteRight = s_right.begin();
// we are going to compare each character in strings
while ( iteLeft != s_left.end() && iteRight != s_right.end() )
{
if ( compare_char( *iteLeft, *iteRight ) )
{
return true;
}
if ( compare_char( *iteRight, *iteLeft ) )
{
return false;
}
++iteLeft;
++iteRight;
}
// either of strings reached the end.
if ( s_left.length() < s_right.length() )
{
return true;
}
// otherwise.
return false;
}
};
Your comparator is on the right track. I would change the parameters to const references, like this:
bool operator () (const std::string &s_left, const std::string &s_right)
and start by this simple implementation:
return s_left < s_right;
This will give the default behaviour and give you confidence you are on the right track.
Then compare one character at a time with a for loop that runs over the length of the shorter of the two strings. You can get the characters out of a string with operator[] (e.g. s_left[i]).
You're very nearly there with what you have.
In your comparison functor you are given two std::strings. What you need to do is to find the first position where the two strings differ. For that, you can use std::mismatch from the standard library. This returns a std::pair filled with iterators pointing to the first two elements that are different:
auto iterators = std::mismatch(std::begin(s_left), std::end(s_left),
std::begin(s_right), std::end(s_right));
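Provided neither iterator has reached the end of its string, you can dereference the two iterators we've been given to get the characters:
char c_left = *iterators.first;
char c_right = *iterators.second;
You can pass those two characters to your compare_char function and it should all work :-) If one string is a prefix of the other, std::mismatch stops at the end of the shorter one; handle that case separately (the shorter string sorts first) instead of dereferencing. Note that the four-iterator overload of std::mismatch used here requires C++14.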
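Putting the std::mismatch idea together, a minimal sketch could look like the following. The names MismatchCompare and sort_by_class are made up for this illustration, and char_type here is a stand-in for the question's helper, assuming the intended order is upper case, then lower case, then digits, then anything else.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Stand-in for the question's helper. Assumed ranking:
// upper case (0) < lower case (1) < digits (2) < everything else (3).
int char_type(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    if (std::isupper(u)) return 0;
    if (std::islower(u)) return 1;
    if (std::isdigit(u)) return 2;
    return 3;
}

bool compare_char(char x, char y)
{
    if (char_type(x) == char_type(y)) return x < y;
    return char_type(x) < char_type(y);
}

// Hypothetical comparator name, chosen for this sketch.
struct MismatchCompare
{
    bool operator()(const std::string& s_left, const std::string& s_right) const
    {
        // Four-iterator overload (C++14): never runs past the shorter string.
        auto its = std::mismatch(s_left.begin(), s_left.end(),
                                 s_right.begin(), s_right.end());
        // Left is exhausted: it sorts first only if right still has characters.
        if (its.first == s_left.end()) return its.second != s_right.end();
        // Right is exhausted but left is not: left is longer, so not smaller.
        if (its.second == s_right.end()) return false;
        // Both iterators point at the first differing characters.
        return compare_char(*its.first, *its.second);
    }
};

// Hypothetical convenience wrapper: returns a sorted copy of the input.
std::vector<std::string> sort_by_class(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end(), MismatchCompare());
    return words;
}
```

The same comparator type can also be given to ordered containers, for example std::set<std::string, MismatchCompare>. Checking both end iterators before dereferencing is what keeps the prefix case (such as "a" versus "ab") well defined.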
Not absolutely sure about this, but you may be able to use an enumeration to your advantage, or an array from which you read certain indices in whichever order you like.
You can use one enumeration to define the order in which you would like to output the data and another that contains the data to be output; then you can loop over them to assign the values to the output in a permuted order.
namespace CustomType
{
// One enumerator per ordering of the three character classes
// (C = capitals, a = lower case, 8 = digits). Identifiers cannot start
// with a digit, so the digit-first orderings get a leading underscore.
enum Outs { Ca8 = 0, C8a, aC8, a8C, _8Ca, _8aC };
// Position of a class within the chosen ordering.
enum Order { First = 0, Second, Third };
void PlayCard(Outs input)
{
char permuted[3] = { 'C', 'a', '8' };
if (input == Ca8) // Enumerator is visible without qualification
{
permuted[First] = 'C';
permuted[Second] = 'a';
permuted[Third] = '8';
}// else use a different order
else // this might be much better
{
for (int i = First; i <= Third; i++)
{
// look up which class belongs at position i for this ordering
// and assign it to permuted[i]
}
}
}
}
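This should work:
bool operator () (const std::string& s_left, const std::string& s_right) const
{
// only look at positions that exist in both strings
for(std::size_t i = 0;i < s_left.size() && i < s_right.size();i++){
const unsigned char l = s_left[i], r = s_right[i];
if(l == r) continue; // equal characters decide nothing, move on
if(isupper(l)){
if(isupper(r)) return l < r;
else return true; // upper case sorts before everything else
}
else if(islower(l)){
if(islower(r)) return l < r;
else if(isupper(r)) return false;
else return true; // lower case sorts before digits and other characters
}
else if(isdigit(l)){
if(isdigit(r)) return l < r;
else if(islower(r) || isupper(r)) return false;
else return true; // digits sort before other characters
}
else{
// l is neither a letter nor a digit: it sorts after all of those
if(isupper(r) || islower(r) || isdigit(r)) return false;
return l < r;
}
}
// one string is a prefix of the other: the shorter one sorts first
return s_left.size() < s_right.size();
}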

How to find out whether a member function is const or volatile with libclang?

I have an instance of CXCursor of kind CXCursor_CXXMethod. I want to find out if the function is const or volatile, for example:
class Foo {
public:
void bar() const;
void baz() volatile;
void qux() const volatile;
};
I could not find anything useful in the documentation of libclang. I tried clang_isConstQualifiedType and clang_isVolatileQualifiedType but these always seem to return 0 on C++ member function types.
I can think of two approaches:
Using the libclang lexer
The code which appears in this SO answer works for me; it uses the libclang tokenizer to break a method declaration apart, and then records any keywords outside of the method parentheses.
It does not access the AST of the code, and as far as I can tell doesn't involve the parser at all. If you are sure the code you investigate is proper C++, I believe this approach is safe.
Disadvantages: This solution does not appear to take into account preprocessing directives, so the code has to be processed first (e.g., passed through cpp).
Example code (the file to parse must be the first argument to your program, e.g. ./a.out bla.cpp):
#include "clang-c/Index.h"
#include <string>
#include <set>
#include <iostream>
std::string GetClangString(CXString str)
{
// Copy the text out first so the CXString is released on every path.
const char* tmp = clang_getCString(str);
std::string translated = (tmp == NULL) ? "" : tmp;
clang_disposeString(str);
return translated;
}
void GetMethodQualifiers(CXTranslationUnit translationUnit,
std::set<std::string>& qualifiers,
CXCursor cursor) {
qualifiers.clear();
CXSourceRange range = clang_getCursorExtent(cursor);
CXToken* tokens;
unsigned int numTokens;
clang_tokenize(translationUnit, range, &tokens, &numTokens);
bool insideBrackets = false;
for (unsigned int i = 0; i < numTokens; i++) {
std::string token = GetClangString(clang_getTokenSpelling(translationUnit, tokens[i]));
if (token == "(") {
insideBrackets = true;
} else if (token == "{" || token == ";") {
break;
} else if (token == ")") {
insideBrackets = false;
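} else if (clang_getTokenKind(tokens[i]) == CXToken_Keyword &&
!insideBrackets) {
// Any keyword outside the parameter list is recorded, so keywords that
// precede the method name (e.g. void, virtual, static) end up in the set too.
qualifiers.insert(token);
}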
}
clang_disposeTokens(translationUnit, tokens, numTokens);
}
int main(int argc, char *argv[]) {
CXIndex Index = clang_createIndex(0, 0);
CXTranslationUnit TU = clang_parseTranslationUnit(Index, 0,
argv, argc, 0, 0, CXTranslationUnit_None);
// Set the file you're interested in, and the code location:
CXFile file = clang_getFile(TU, argv[1]);
int line = 5;
int column = 6;
CXSourceLocation location = clang_getLocation(TU, file, line, column);
CXCursor cursor = clang_getCursor(TU, location);
std::set<std::string> qualifiers;
GetMethodQualifiers(TU, qualifiers, cursor);
for (std::set<std::string>::const_iterator i = qualifiers.begin(); i != qualifiers.end(); ++i) {
std::cout << *i << std::endl;
}
clang_disposeTranslationUnit(TU);
clang_disposeIndex(Index);
return 0;
}
Using libclang's Unified Symbol Resolution (USR)
This approach involves using the parser itself, and extracting qualifier information from the AST.
Advantages: Seems to work for code with preprocessor directives, at least for simple cases.
Disadvantages: My solution parses the USR, which is undocumented, and might change in the future. Still, it's easy to write a unit-test to guard against that.
Take a look at $(CLANG_SRC)/tools/libclang/CIndexUSRs.cpp; it contains the code that generates a USR, and therefore the information required to parse the USR string. See lines 523-529 (in LLVM 3.1's source downloaded from www.llvm.org) for the qualifier part.
Add the following function somewhere:
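void parseUsrString(const std::string& usrString, bool* isVolatile, bool* isConst, bool *isRestrict) {
*isVolatile = *isConst = *isRestrict = false;
// Assumption about the undocumented USR layout: the qualifier digit,
// if present, is the character right after the last '#'.
size_t hashLocation = usrString.rfind("#");
if (hashLocation == std::string::npos || hashLocation == usrString.length() - 1) {
return;
}
char x = usrString[hashLocation + 1];
if (x < '0' || x > '7') {
return; // not a qualifier digit
}
int bits = x - '0';
*isConst = (bits & 0x1) != 0;
*isVolatile = (bits & 0x4) != 0;
*isRestrict = (bits & 0x2) != 0;
}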
and in main(),
CXString usr = clang_getCursorUSR(cursor);
const char *usr_string = clang_getCString(usr);
std::cout << usr_string << "\n";
bool isVolatile, isConst, isRestrict;
parseUsrString(usr_string, &isVolatile, &isConst, &isRestrict);
printf("restrict, volatile, const: %d %d %d\n", isRestrict, isVolatile, isConst);
clang_disposeString(usr);
Running on Foo::qux() from
#define BLA const
class Foo {
public:
void bar() const;
void baz() volatile;
void qux() BLA volatile;
};
produces the expected result of
c:@C@Foo@F@qux#5
restrict, volatile, const: 0 1 1
Caveat: you might have noticed that libclang's source suggests my code should be isVolatile = x & 0x2 and not 0x4, so it might be the case that you should replace 0x4 with 0x2. It's possible my implementation (OS X) has them swapped.
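To see why the trailing 5 in the USR above gives "const volatile", it helps to decode the digit by hand: the character '5' has the value 5, which is 101 in binary, so bits 0x1 and 0x4 are set and 0x2 is not. The sketch below does just that. MethodQualifiers and decodeQualifierChar are hypothetical names, not part of libclang, and the mask layout (const = 0x1, restrict = 0x2, volatile = 0x4) is the one assumed by parseUsrString, so the caveat about 0x2 and 0x4 applies here as well.

```cpp
// Hypothetical helper type, not part of libclang.
struct MethodQualifiers
{
    bool isConst;
    bool isRestrict;
    bool isVolatile;
};

// Decodes a single qualifier character such as the trailing '5' of a USR.
// Assumed mask layout: const = 0x1, restrict = 0x2, volatile = 0x4.
MethodQualifiers decodeQualifierChar(char c)
{
    MethodQualifiers q = { false, false, false };
    if (c < '0' || c > '7') {
        return q; // not a qualifier digit: report no qualifiers
    }
    const int bits = c - '0'; // '5' -> 5 -> binary 101
    q.isConst = (bits & 0x1) != 0;
    q.isRestrict = (bits & 0x2) != 0;
    q.isVolatile = (bits & 0x4) != 0;
    return q;
}
```

A function like this is also a convenient place to hang the unit test mentioned earlier: if a future libclang changes the encoding, a test that parses a known const volatile method and checks the decoded flags would be expected to fail.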