How to build a recursive descent parser - c++

I've been working on a recursive descent parser for a simple calculator. When something is declared, it is declared as either an int or a float. Currently I am saving the names into two different vectors, one for int and one for float. At this point I don't care what values are associated with them; I only care that each name is declared before it is used.
My issue is that I have to output a warning message if an int and a float are used together in an operation such as float + int.
An expression is term+expression, term-expression, or term. In recursive descent, how can I check whether an int is being used in an operation with a float? Sorry if the explanation is not clear; I'm finding it a bit difficult to explain. I can add more code if necessary; I just didn't want to flood the question with code.
Edit:
There is still a lot of code missing; I figured I'd just include the important part, but I can upload the entire thing if need be. I see some people didn't understand the main question. One of the requirements is: "When integer and float values are mixed in +, -, * and /, the integer is converted to a float. Print a message indicating the line number and that a conversion would be required." At the moment the program reads from a file. If you write "int x;" the program saves x in the int vector; then when you write something such as "x=5;" it acknowledges that x has been declared and the assignment passes. My issue is a case like int x; float y; int z; x=5; y=7.5; z=x+y; how can I check for that when my program only saves the type of each variable and not its value? Essentially I'm wondering whether it would be possible to scan the completed parse as if it were a string, or use some other method of finding out whether an operation mixing int and float is being performed.
The lexical scanner was generated with flex.
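class Token {
    Tokentype type;
    string value;
    int linenum;
public:
    // Record the source line too, so error messages can report it;
    // otherwise getLinenum() returns an uninitialized value.
    Token(Tokentype t, string v = "", int line = 0)
        : type(t), value(v), linenum(line) {}
    Tokentype getType() { return type; }
    string getValue() { return value; }
    int getLinenum() { return linenum; }
};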
vector<string> int_list;
vector<string> float_list;
class PTree {
PTreeNodetype type;
PTree *left;
PTree *right;
public:
PTree(PTreeNodetype t, PTree *l=0, PTree *r=0) {
type = t;
left = l;
right = r;
}
PTreeNodetype getType(){ return type;}
};
// expr ::= term PLUS expr | term MINUS expr | term
PTree *
Expr() {
    PTree *term = Term();
    if (!term)
        return 0;
    Token *t = getToken();
    if (t == NULL) {            // unexpected end of input; there is no token to delete
        delete term;
        return 0;
    }
    if (t->getType() != T_SC) {
        if (t->getType() == T_RPAREN) {
            pushbacktoken(t);   // leave the ')' for the caller
            return new PTree(EXPR, term);
        }
        if (t->getType() != T_PLUS && t->getType() != T_MINUS) {
            cout << t->getLinenum() << ":" << "Error: expected + or -" << endl;
            // assuming pushbacktoken() keeps the pointer, t must not be deleted here
            pushbacktoken(t);
            delete term;
            return 0;
        }
        delete t;               // the + or - has been consumed
        PTree *expr = Expr();
        if (!expr) {
            delete term;
            return 0;
        }
        return new PTree(EXPR, term, expr);
    }
    pushbacktoken(t);           // leave the ';' for the statement rule
    return new PTree(EXPR, term);
}

I think you need to explain the structure of your code a little more.
In an interpreter like the one you describe, there are normally three things going on:
A lexer/scanner generates a token stream.
A parser takes the tokens and builds semantic objects.
An interpreter consumes the tree of semantic objects and executes it.
Stage 1 doesn't need to care that you are adding an int and a float. Stage 2 can populate a warning field in your semantic object/struct, which the interpreter prints when it finds it populated; alternatively, the interpreter can recognize the warning condition itself.
To provide you any more detail or use more specific terminology we would need to see how you are representing operations.
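A minimal sketch of that stage-2 idea, assuming a hypothetical BinaryOp struct (not the asker's PTree) whose operand types the parser has already looked up: the parser fills in the warning text, and the interpreter only has to print it.

```cpp
#include <ostream>
#include <string>

enum class ValType { Int, Float };

// Hypothetical semantic object for one binary operation (not the asker's PTree).
struct BinaryOp {
    char op;              // '+', '-', '*' or '/'
    ValType lhs, rhs;     // operand types, already looked up by the parser
    int line;             // source line of the operator
    std::string warning;  // set by the parser when a conversion is required
};

// Stage 2 (parser): build the node and record the warning condition.
BinaryOp makeBinaryOp(char op, ValType lhs, ValType rhs, int line) {
    BinaryOp node{op, lhs, rhs, line, ""};
    if (lhs != rhs)
        node.warning = "line " + std::to_string(line) + ": int converted to float";
    return node;
}

// Stage 3 (interpreter): print the warning if it was populated, then execute.
void execute(const BinaryOp& node, std::ostream& out) {
    if (!node.warning.empty())
        out << node.warning << '\n';
    // ... evaluate the operation here ...
}
```

Keeping the message on the node means the parser never prints anything itself; whichever stage runs last decides when warnings are shown.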

I see two options, depending on what you are doing.
First: don't worry about it while you are building the parse tree. Later, when you walk the tree, you can easily check this and throw an error.
Second: use different rules for int and float, so you would have a rule for adding two ints and a rule for adding two floats. This also means you wouldn't have a single number rule (which I am guessing you do have) that mixes ints and floats.
I definitely recommend the first way.
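A sketch of the first option, assuming a hypothetical Node type rather than the asker's PTree (which would need getLeft()/getRight() accessors and a place to store the variable name): after parsing, a recursive walk computes each subtree's type from a std::map symbol table and records a message wherever an int meets a float.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Type { Int, Float };

// Hypothetical parse-tree node (not the asker's PTree).
struct Node {
    char op = 0;                        // 0 for a variable leaf, else '+', '-', '*' or '/'
    std::string name;                   // variable name when op == 0
    std::unique_ptr<Node> left, right;  // operands when op != 0
    int line = 0;                       // source line of the operator
};

// Small helpers for building trees.
std::unique_ptr<Node> var(const std::string& name) {
    auto n = std::make_unique<Node>();
    n->name = name;
    return n;
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r, int line) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->left = std::move(l);
    n->right = std::move(r);
    n->line = line;
    return n;
}

// Walk the finished tree bottom-up. A leaf's type comes from the symbol table;
// an operator's type is Float if either operand is Float, and mixing the two
// records a conversion message for that line.
Type check(const Node& n, const std::map<std::string, Type>& symbols,
           std::vector<std::string>& messages) {
    if (n.op == 0)
        return symbols.at(n.name);      // throws std::out_of_range if undeclared
    Type l = check(*n.left, symbols, messages);
    Type r = check(*n.right, symbols, messages);
    if (l != r) {
        messages.push_back(std::to_string(n.line) + ": int converted to float");
        return Type::Float;
    }
    return l;
}
```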

Calculators don't traditionally "declare" things, so it's unclear what your calculator knows when it is parsing an expression.
If I assume that you "declare i int, r real" before the expression "i*r" is parsed, you seem to have several questions:
a) How do you know, as you parse, whether i and r have been declared? The technical answer is that during parsing you don't have to know; you can parse, build up a tree, and do such checking later. On a practical level, people often weave symbol lookups into the parsing process (this gets messier as your language gets bigger, so it isn't recommended for anything other than calculators [you'll discover that most C compilers do this, adding to their messiness]). The answer is easy: keep a list of defined symbol strings around, and when you encounter an identifier, check whether it's in the list.
b) How do you know the type of "i" or "r"? Easy. Associate the declared type with the symbol string, e.g., (i, int), (r, real). Such associated sets of declarations are commonly called symbol tables.
c) How do you know whether operations are operating on the same ("the right") kind of values? Here you need to associate a "type" with every operand. Constants have an obvious type: 1.0 is real, 1 is integer. "i" is integer, and your parser knows this because it looked up the type (above); similarly for "r". Each expression term then has to check its operands for compatibility. What might not be obvious is that each expression also has to compute its result type; e.g., 3 * 4.0 is real, not integer. So, in parallel with the parsing machinery, you need to propagate a type.
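A sketch of that parallel type propagation, assuming a std::map as the symbol table from (b) and a Type enum with just Int and Float; TypeChecker and its message text are illustrative names. Each parse function returns the type of what it parsed instead of a value, and combine() records the conversion message. Unlike the asker's right-recursive expr ::= term PLUS expr, it uses loops for + - and * /, which keeps the operators left-associative.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Type { Int, Float };

// Recursive descent over one expression that computes types instead of values.
// Each mixed int/float operation in + - * / is recorded as a warning.
class TypeChecker {
public:
    TypeChecker(std::string src, std::map<std::string, Type> symbols, int line)
        : src_(std::move(src)), symbols_(std::move(symbols)), line_(line) {}

    // Parse the whole input and return the type of the expression.
    Type parse() {
        Type t = expr();
        if (peek() != '\0') throw std::runtime_error("unexpected input");
        return t;
    }
    const std::vector<std::string>& warnings() const { return warnings_; }

private:
    // expr ::= term { ('+' | '-') term }
    Type expr() {
        Type t = term();
        while (peek() == '+' || peek() == '-') {
            ++pos_;
            t = combine(t, term());
        }
        return t;
    }
    // term ::= factor { ('*' | '/') factor }
    Type term() {
        Type t = factor();
        while (peek() == '*' || peek() == '/') {
            ++pos_;
            t = combine(t, factor());
        }
        return t;
    }
    // factor ::= number | identifier | '(' expr ')'
    Type factor() {
        char c = peek();
        if (c == '(') {
            ++pos_;
            Type t = expr();
            if (peek() != ')') throw std::runtime_error("expected )");
            ++pos_;
            return t;
        }
        if (isDigit(c)) {                       // literal: 1 is Int, 1.0 is Float
            bool isFloat = false;
            while (pos_ < src_.size() && (isDigit(src_[pos_]) || src_[pos_] == '.')) {
                if (src_[pos_] == '.') isFloat = true;
                ++pos_;
            }
            return isFloat ? Type::Float : Type::Int;
        }
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            std::string name;                   // identifier: look up its declared type
            while (pos_ < src_.size() &&
                   (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_'))
                name += src_[pos_++];
            auto it = symbols_.find(name);
            if (it == symbols_.end()) throw std::runtime_error(name + " is not declared");
            return it->second;
        }
        throw std::runtime_error("unexpected character");
    }
    // Mixing Int and Float yields Float, and the mix is reported.
    Type combine(Type a, Type b) {
        if (a == b) return a;
        warnings_.push_back("line " + std::to_string(line_) + ": int converted to float");
        return Type::Float;
    }
    // Skip blanks and return the next character without consuming it ('\0' at end).
    char peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    std::string src_;
    std::map<std::string, Type> symbols_;
    int line_;
    std::size_t pos_ = 0;
    std::vector<std::string> warnings_;
};
```

For the asker's example, checking "x + y" with x declared int and y declared float returns Float and records one warning for that line.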

+1 to voidlogic. His answer should give you a basic idea of how to build a recursive descent parser. If you are having trouble with a certain part of yours, it would be nice to get a little more detail about how you are structuring your code.
If you would like to see an example of one, look at this implementation.

Here is a book that may help:
Compilers: Principles, Techniques and Tools ("Dragon Book") by A. Aho, M. Lam, R. Sethi and J. Ullman.
Here is a set of tools that may help you:
GNU flex
GNU bison

Related

How do I identify a variable name in a piece of code

I am trying to write a Halstead complexity measure in X++ (the language isn't important), and I think the best way of doing this is by using regex on the source.
I have managed to do 90% of it but am struggling with variable names.
How do I identify a variable name in a piece of code?
Given the following piece of code
public void main()
{
int a, b, c, av;
className class;
strFmt("%1 %2 %3", a, b, c);
av = (a + b + c) / 3;
info("avg = %1");
if(a)
{
a++;
class.add(a);
}
else
{
b++;
class.subtract(b);
}
this.main();
}
I expect it to return "a" "b" "c" "av" "class".
For the Halstead measure I need to count the instances of each. My plan was to store the above in a list and then use whatever is in the list in a regex query; catering for every possible use of a variable with a regex alone would be insane.
I think you would have to reflect on the AOT in order to get the different variables.
You could use reflection with TreeNode or maybe you could use the XPPCompiler to get info on the objects you're processing to help:
info(strFmt("%1", new xppCompiler().dumpClass('salesformletter')));
This question made me curious about how to do this, and I came across this great post, which has a custom AX tool to measure complexity plus a 175-page graduate paper written about it.
http://bojanjovicic.com/complexity-tool-dynamics-ax-2009/
I'm experimenting with it now and looking how I can build onto it.
I'm back with the actual answer! Use the SysScannerClass and TreeNode object to correctly parse the code. Here's a beautiful sample I wrote that should make it cake.
static void JobParseSourceCode(Args _args)
{
TreeNode treeNode = TreeNode::findNode(@'\Data Dictionary\Tables\SalesTable\Methods\find');
SysScannerClass sysScannerClass = new SysScannerClass(treeNode);
int symbol;
int curLine;
str lineStr;
setPrefix("Scanning " + treeNode.treeNodePath());
for (symbol = sysScannerClass.firstSymbol(); symbol; symbol = sysScannerClass.nextSymbol())
{
if (curLine != sysScannerClass.line())
{
curLine = sysScannerClass.line();
lineStr = sysScannerClass.sourceLine(curLine);
}
// NOTE: symbol corresponds to macros in #TokenTypes
info(strFmt("Line %1: %2\t(Col %3): '%4' - MacroValue: %5, [%6]", curLine, lineStr, sysScannerClass.col(), sysScannerClass.strValue(), symbol, xppScanner::symbolClass(symbol)));
}
}
Well, the example does not quite qualify as X++ source, because class is a reserved word and cannot be used as a variable name.
Besides that, a crude search for [a-zA-Z_][a-zA-Z_0-9]* would give you all strings that could be a variable name. But without a full parser you would have trouble determining whether each one is a keyword, class name, table name et cetera, or a genuine variable name.
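To see what that crude search produces, here is a C++ sketch using std::regex (X++ would need its own regex facility; this only illustrates the pattern). The keyword set is supplied by the caller and is an assumption; as the answer says, method and class names such as strFmt still slip through.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Collect every distinct word matching [a-zA-Z_][a-zA-Z_0-9]* that is not in
// the caller-supplied keyword set, in order of first appearance.
std::vector<std::string> candidateIdentifiers(const std::string& src,
                                              const std::set<std::string>& keywords) {
    static const std::regex ident("[a-zA-Z_][a-zA-Z_0-9]*");
    std::vector<std::string> result;
    std::set<std::string> seen;
    for (std::sregex_iterator it(src.begin(), src.end(), ident), end; it != end; ++it) {
        std::string word = it->str();
        if (!keywords.count(word) && seen.insert(word).second)
            result.push_back(word);
    }
    return result;
}
```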
You could also use TextBuffer to tokenize your source:
static void TokenTest(Args _args)
{
str src = @'
public void main()
{
int a = 7, b = 11, c = 13, av;
info(strFmt("%1 %2 %3", a, b, c));
av = (a + b + c) / 3;
info(strFmt("avg = %1"));
this.main();
}
';
TextBuffer t = new TextBuffer();
t.ignoreCase(true);
t.setText(src); // Set the text to break in to tokens
while (t.nextToken(false,' (){}.,:;!=+-*/\n')) // The delimiters to search
{
info(t.token());
}
}
This will not work with strings and comments of course.
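One way around that limitation, sketched in C++ rather than X++: blank out comments and string-literal contents before tokenizing. This assumes C-style // and /* */ comments and double-quoted strings with backslash escapes; X++ also has single-quoted strings, which this sketch does not handle.

```cpp
#include <cstddef>
#include <string>

// Remove comments and the contents of string literals so that a later
// word-based scan does not pick up words inside them.
std::string stripStringsAndComments(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src.compare(i, 2, "//") == 0) {            // line comment: skip to end of line
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (src.compare(i, 2, "/*") == 0) {     // block comment: skip past */
            std::size_t close = src.find("*/", i + 2);
            i = (close == std::string::npos) ? src.size() : close + 2;
            out += ' ';
        } else if (src[i] == '"') {                    // string literal: keep only the quotes
            ++i;
            while (i < src.size() && src[i] != '"') {
                if (src[i] == '\\') ++i;               // skip the escaped character
                ++i;
            }
            ++i;                                       // step over the closing quote
            out += "\"\"";
        } else {
            out += src[i++];
        }
    }
    return out;
}
```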
There is even an undocumented Keywords kernel class to play with!
Maybe the best choice would be to integrate with the cross reference tool, it has done the splitting for you!
I am afraid your remaining 10% may take 90% of your time!
You can use regex101.com to experiment with regexes. I think you can use the look-ahead (?=...) and look-behind (?<=...) groups:
This regex will match all your variables:
/(?!void)(?<=[ \(])[a-z]+(?=[, ;+*\/\)])/
And here the proof:
http://regex101.com/r/hS9dQ6/2
I ended up cheating with the solution. I had already got all of the operator information (int, public, method names, etc.), so I just used substitution on the source and then ran the following regex, which found the operands for the metric.
'_?\w+(?=([^"]*"[^"]*")*[^"]*$)|".+"'
There were some really good answers on here so I am going to look into using a hybrid of them to improve the implementation at a later date but for now we are getting the information we need and it seems to work for all cases we have tested it on.
If anyone is interested the regex I used for the operators is the following
(?i)\(|\{|\w+(?=(\(|:|\.))|\w+(?=\s\w)|(break|continue|return|true|false|retry|asc|breakpoint|desc|null|pause|throw|ttsAbort|ttsBegin|ttsCommit)(?=;)|((try|catch|else|by|do)(?=\n))|(\+=|-=|>=|<=|==|!=|=|\+\+|--|<<|>>|&&|\|\||\*|\/|\+|-|~|&|\^|\||>|<|!|\?|::|:|\.)+(?=([^"]*"[^"]*")*[^"]*$)
It covers all the reserved keywords not handled by the first four alternatives, and I also got the list of operators that X++ uses.
It will need some modification to be used in other languages but considering other languages have better ways of dealing with these things you probably don't need it.
Thanks for all your answers

How to verify a data type input

I have a queue, and I need to verify the input data type and throw an exception if the input isn't of the same data type as the queue's elements. How can I do this?
MAIN.cpp
try {
cout << "Insert character: ";
cin >> ch;
prova.push(ch);
}
catch (wrong_insert& k) {
k.allert();
};
This is the push function:
template <class t>
void queue<t>::push(const t& entry)
{
if(*I need this condition*) throw wrong_insert();
if(empty())
{
head_insert(front_ptr, entry);
rear_ptr = front_ptr;
}
else
{
insert(rear_ptr, entry);
rear_ptr = rear_ptr->link();
}
++count;
cout << "Inserted!" << endl;
}
and this is the exception class:
class wrong_insert
{
public:
wrong_insert() : message("Wrong data inserted!"){};
void allert(){ cout << message;};
private:
string message;
};
I would like to add this as a comment, but the site does not allow me to, since I don't have 50 reputation yet.
I think if the type is wrong it will not compile in the first place.
If I understand your question right you are asking how you can check that no-one tries to insert say an integer into your queue of chars.
The answer is that you cannot, and need not do that at runtime. As C++ is a strongly typed language you will not be able to compile code where the data types are incompatible. Converting a char to an int is perfectly allowed and will not generate an error, but passing an int as a char may trigger an error/warning depending on the situation.
Note however, that you are in some cases allowed to pass an integer to a char if (a) asking specifically for it with a cast, or (b) if the compiler can know that the value of the integer is small enough to fit into a char. This happens at compile time. (NB Depending on your version of C++ such automatic conversions may give you either a warning or an error, so pay attention to your warnings.)
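If you want the compiler to reject even the implicit conversions mentioned above, one common technique is a deleted template overload. The sketch below uses a hypothetical strict_queue built on std::deque rather than the asker's linked-list queue; the can_push trait exists only to demonstrate the rejection with static_assert, and std::void_t requires C++17.

```cpp
#include <cstddef>
#include <deque>
#include <type_traits>
#include <utility>

// Hypothetical queue that accepts values of exactly type T.
template <class T>
class strict_queue {
public:
    void push(const T& entry) { items_.push_back(entry); }

    // Any other argument type is an exact match for this template, so it wins
    // over the implicit conversion to T; because it is deleted, the call fails to compile.
    template <class U>
    void push(const U&) = delete;

    std::size_t size() const { return items_.size(); }
    const T& front() const { return items_.front(); }

private:
    std::deque<T> items_;
};

// Detection trait: is q.push(value) a well-formed call?
template <class Q, class V, class = void>
struct can_push : std::false_type {};

template <class Q, class V>
struct can_push<Q, V, std::void_t<decltype(std::declval<Q&>().push(std::declval<const V&>()))>>
    : std::true_type {};

static_assert(can_push<strict_queue<char>, char>::value, "char is accepted");
static_assert(!can_push<strict_queue<char>, int>::value, "int is rejected at compile time");
```

With this in place, prova.push(42) on a strict_queue<char> is a compile error instead of a silent conversion, so no runtime wrong_insert check is needed for that case.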
If you are planning to use this queue with classes that inherit from each other, your question becomes more relevant, as you are allowed to pass an object of a derived type as a reference or pointer of the base-class type. Here you can use the typeid operator, but be aware that it relies on RTTI (Run-Time Type Identification), which adds some extra code to your executable.
C++ is all about "pay for it only if you use it", and if you need it (and the performance doesn't suffer too much) use it.
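A minimal sketch of the typeid check for the inheritance case, assuming hypothetical Shape/Circle/Square classes. typeid reports the dynamic type only when the base class is polymorphic (has at least one virtual function), which is why Shape has a virtual destructor. A push() could throw wrong_insert whenever such a check returns false.

```cpp
#include <typeinfo>

struct Shape { virtual ~Shape() = default; };  // polymorphic base, so typeid sees the dynamic type
struct Circle : Shape {};
struct Square : Shape {};

// True only if the object's dynamic type is exactly Expected
// (an object of a class derived from Expected does not count).
template <class Expected>
bool hasExactType(const Shape& s) {
    return typeid(s) == typeid(Expected);
}
```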

if then else in ocamlyacc

Can anyone briefly explain how I can implement if-then-else in ocamlyacc? I've defined the tokens IF, THEN and ELSE in the lexical analyzer (ocamllex). For the condition, I have defined the tokens GREATERTHAN, LESSERTHAN and EQUALTO for integers. I've searched many tutorials, but to no avail!
UPDATE:
I want to interpret the result and return the value of the expression dictated by the if-else statement.
You have to define rules (in the action, $2, $4 and $6 refer to the condition and the two statements; $1, $3 and $5 are the IF, THEN and ELSE tokens themselves):
ifthenelse :
| IF condition THEN statement ELSE statement { IfThenElse($2,$4,$6) }
condition :
| INT EQUALTO INT { Cond(EqualTo,$1,$3) }
| INT LESSERTHAN INT { Cond(LesserThan,$1,$3) }
| INT GREATERTHAN INT { Cond(GreaterThan,$1,$3) }
Don't forget to define a regular expression for INT in your lexer file.
Perhaps you've seen it, but the OCaml manual gives a complete ocamllex/ocamlyacc example that calculates the values of expressions: Desk Calculator Example.
The example shows that you can calculate your result in the ocamlyacc actions if you want to. For a simple example, it's not at all hard to follow. In a more realistic case, you would probably want to construct an abstract syntax tree for later processing (such as evaluation). The code has a similar flavor except that the cases are given by the different constructors of your AST type rather than by the different grammar rules.

Infix equation solver c++ while loop stack

I'm creating an infix problem solver, and it crashes in the final while loop that finishes the last part of the equation.
I call a final while loop in main to solve what's left on the stack, and it hangs there; if I pop the last element from the stack, it leaves the loop but returns the wrong answer.
//
//
//
//
//
#include <iostream>
#include<stack>
#include<string>
#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>
#include <sstream>
using namespace std;
#define size 30
int count=0;
int count2=0;
int total=0;
stack< string > prob;
char equ[size];
char temp[10];
string oper;
string k;
char t[10];
int j=0;
char y;
int solve(int f,int s, char o)
{
cout<<"f="<<f<<endl;
cout<<"s="<<s<<endl;
cout<<"o="<<o<<endl;
int a;
if (o== '*')//checks the operand stack for operator
{
cout << f << "*" << s << endl;
a= f*s;
}
if (o == '/')//checks the operand stack for operator
{
cout << f << "/" << s << endl;
if(s==0)
{
cout<<"Cant divide by 0"<<endl;
}
else
a= f/s;
}
if (o == '+')//checks the operand stack for operator
{
cout << f << "+" << s << endl;
a= f+s;
}
if (o == '-')//checks the operand stack for operator
{
cout << f << "-" << s << endl;
a= f-s;
}
return a;
}
int covnum()
{
int l,c;
k=prob.top();
for(int i=0;k[i]!='\n';i++)t[i]=k[i];
return l=atoi(t);
}
char covchar()
{
k=prob.top();
for(int i=0;k[i]!='\n';i++)t[i]=k[i];
return t[0];
}
void tostring(int a)
{
stringstream out;
out << a;
oper = out.str();
}
void charstack(char op)
{
oper=op;
prob.push(oper);
}
void numstack(char n[])
{
oper=n;
prob.push(oper);
}
void setprob()
{
int f,s;
char o;
char t;
int a;
int i;
t=covchar();
if(ispunct(t))
{
if(t=='(')
{
prob.pop();
}
if(t==')')
{
prob.pop();
}
else if(t=='+'||'-')
{
y=t;
prob.pop();
}
else if(t=='/'||'*')
{
y=t;
prob.pop();
}
}
cout<<"y="<<y<<endl;
i=covnum();
cout<<"i="<<i<<endl;
s=i;
prob.pop();
t=covchar();
cout<<"t="<<t<<endl;
if(ispunct(t))
{
o=t;
prob.pop();
}
i=covnum();
cout<<"i="<<i<<endl;
f=i;
prob.pop();
t=covchar();
if (t=='('||')')
{
prob.pop();
}
a=solve(f,s, o);
tostring(a);
prob.push(oper);
cout<<"A="<<prob.top()<<endl;
}
void postfix()
{
int a=0;
char k;
for(int i=0;equ[i]!='\0';i++)
{
if(isdigit(equ[i]))//checks array for number
{
temp[count]=equ[i];
count++;
}
if(ispunct(equ[i]))//checks array for operator
{
if(count>0)//if the int input is done convert it to a string and push to stack
{
numstack(temp);
count=0;//resets the counter
}
if(equ[i]==')')//if char equals the ')' then set up and solve that bracket
{
setprob();
i++;//pushes i to the next thing in the array
total++;
}
while(equ[i]==')')//if char equals the ')' then set up and solve that bracket
{
i++;
}
if(isdigit(equ[i]))//checks array for number
{
temp[count]=equ[i];
count++;
}
if(ispunct(equ[i]))
{
if(equ[i]==')')//if char equals the ')' then set up and solve that bracket
{
i++;
}
charstack(equ[i]);
}
if(isdigit(equ[i]))//checks array for number
{
temp[count]=equ[i];
count++;
}
}
}
}
int main()
{
int a=0;
char o;
int c=0;
cout<<"Enter Equation: ";
cin>>equ;
postfix();
while(!prob.empty())
{
setprob();
a=covnum();
cout<<a<<" <=="<<endl;
prob.pop();
cout<<prob.top()<<"<top before c"<<endl;
c=covnum();
a=solve(c,a,y);
}
cout<<"Final Awnser"<<a<<endl;
system ("PAUSE");
return 0;
}
I hope this isn't too harsh, but the code appears to be riddled with various problems. I'm not going to attempt to address all of them, but for starters, your immediate crashes come from accessing aggregates out of bounds.
Example:
for(int i=0;k[i]!='\n';i++)
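k is an instance of std::string, and none of the strings you push on the stack contain '\n', so this loop runs past the end of k (and of the 10-element array t). std::string keeps track of its own length, so you should do something like this instead (you also need to null-terminate t before passing it to atoi, or simply call std::stoi(k)):
for(std::size_t i=0;i<k.size();i++)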
Those are the more simple kind of errors, but I also see some errors in the overall logic. For example, your tokenizer (postfix function) does not handle the case where the last part of the expression is an operand. I'm not sure if that's an allowed condition, but it's something an infix solver should handle (and I recommend renaming this function to something like tokenize as it's really confusing to have a function called 'postfix' for an infix solver).
Most of all, my advice to you is to make some general changes to your approach.
1. Learn the debugger. I can't stress this enough. You should be testing your code as you write it and using the debugger to trace through it and make sure that state variables are set correctly.
2. Don't use any global variables to solve this problem. It might be tempting to avoid passing things around everywhere, but you're going to make #1 harder, and you're also limiting the generality of your solution. The small amount of time you save by not passing variables can easily cost you much more time if you get things wrong. You can also look into making a class that stores some of these things as member variables, so you don't have to pass them to the class methods; but especially for temporary state like 'equ', which you don't even need after you tokenize it, just pass it into the tokenize function and do away with it.
3. Initialize your variables as soon as you can (ideally when they are first defined). I see a lot of obsolete C-style practices where you declare all your variables at the top of a scope. Try to limit the scope in which you use variables; that will make your code safer and easier to get correct. It ties in with avoiding globals (#2).
4. Prefer alternatives to macros when you can, and when you can't, use BIG_UGLY_NAMES for them to distinguish them from everything else. Using #define to create a preprocessor definition for 'size' actually prevents the code above that uses the string's 'size' method from working. That can and should be a simple integral constant or, better yet, you can simply use std::string for 'equ' (besides making it not a file-scope global).
5. Prefer the standard C++ library headers when you can: <ctype.h> should be <cctype>, <stdlib.h> should be <cstdlib>, and <stdio.h> should be <cstdio>. Mixing the C-style .h headers with standard C++ headers in the same compilation unit can cause problems with some compilers, and you'll also miss out on important things like namespace scoping and function overloading.
6. Finally, take your time with the solution and put some care and love into it. I realize that it's homework and you're under a deadline, but you'll be facing even tougher deadlines in the real world, where this kind of coding just won't be acceptable. Name your identifiers properly, format your code legibly, and document what your functions do (not just how each line of code works, which is something you shouldn't be doing so much later, as you understand the language better). Some coding TLC will take you a long way. Really think about how to design solutions to a problem (if you're taking a procedural approach, decompose the problem into procedures that are general units of work, not a mere chopped-up version of your overall logic). #2 will help with this.
Example: rather than a function named 'postfix' that works with some global input string, manipulates some global stack and partially evaluates the expression, make it accept an input string and return the individual tokens. Now it's a general function you can reuse anywhere, and you have also reduced it to a much easier problem to solve and test. Document it and name it that way as well, focusing on its usage and what it accepts and returns. For instance:
// Tokenize an input string. Returns the individual tokens as
// a collection of strings.
std::vector<std::string> tokenize(const std::string& input);
This is purely an example and it may or may not be the best one for this particular problem, but if you take a proper approach to designing procedures, the end result is that you should have built yourself a library of well-tested, reusable code that you can use again and again to make all your future projects that much easier. You'll also make it easier to decompose complex problems into a number of simpler problems to solve which will make everything easier and the whole coding and testing process much smoother.
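One possible implementation of that declaration, as a sketch: it assumes the expression contains only non-negative integer literals, + - * /, parentheses and whitespace, and throws std::invalid_argument for anything else. A trailing operand such as the 7 in "1 + 7" is emitted like any other token, which is the case the answer says postfix() misses.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Tokenize an input string. Returns the individual tokens as
// a collection of strings: each run of digits is one token, and each
// operator or parenthesis is a token of its own. Whitespace is skipped.
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
            tokens.push_back(input.substr(start, i - start));
        } else if (std::string("+-*/()").find(input[i]) != std::string::npos) {
            tokens.push_back(std::string(1, input[i]));
            ++i;
        } else {
            throw std::invalid_argument("unexpected character in expression");
        }
    }
    return tokens;
}
```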
I see a number of things which all likely contribute to the issue of it not working:
There is no error or bounds checking. I realize that this is homework and as such may have specific requirements that eliminate the need for some checks, but you still need some to ensure you are parsing the input correctly. What if you exceed the array size of equ/temp/t? What if your stack is empty when you try to pop/top it?
There are a few if statements that look like else if (t == '+' || '-') which most likely doesn't do what you want them to. This expression is actually always true since '-' is non-zero and is converted to a true value. You probably want else if (t == '+' || t == '-').
As far as I can tell you seem to skip parsing or adding '(' to the stack which should make it impossible for you to actually evaluate the expression properly.
You have a while loop in the middle of postfix() which skips multiple ')' but doesn't do anything.
Your code is very hard to follow. Properly naming variables and functions and eliminating most of the globals (you don't actually need most of them) would help a great deal, as would proper indentation and a few spaces in expressions.
There are other minor issues not particularly worth mentioning. For example, the covchar() and covnum() functions are much more complex than needed.
I've written a few postfix parsers over the years, and I can't really follow what you are trying to do. That isn't to say your approach is impossible, but I would suggest re-examining the base logic needed to parse the expression, particularly nested levels of brackets.
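As a point of comparison, here is a sketch of the standard two-stack approach (an operand stack and an operator stack, as in Dijkstra's shunting-yard algorithm) evaluating integer expressions with nested brackets directly. It is not a fix of the code above; it assumes non-negative integer literals, no unary minus, and integer division.

```cpp
#include <cctype>
#include <cstddef>
#include <stack>
#include <stdexcept>
#include <string>

// Precedence of a binary operator; '(' gets 0 so an operator never pops it.
int precedence(char op) {
    return (op == '*' || op == '/') ? 2 : (op == '+' || op == '-') ? 1 : 0;
}

// Pop one operator and two operands, apply the operator, push the result.
void applyTop(std::stack<long>& values, std::stack<char>& ops) {
    if (values.size() < 2 || ops.empty()) throw std::runtime_error("malformed expression");
    long rhs = values.top(); values.pop();
    long lhs = values.top(); values.pop();
    char op = ops.top(); ops.pop();
    switch (op) {
        case '+': values.push(lhs + rhs); break;
        case '-': values.push(lhs - rhs); break;
        case '*': values.push(lhs * rhs); break;
        case '/':
            if (rhs == 0) throw std::runtime_error("division by zero");
            values.push(lhs / rhs);
            break;
        default: throw std::runtime_error("malformed expression");
    }
}

// Evaluate an integer infix expression with + - * / and nested parentheses.
long evaluate(const std::string& input) {
    std::stack<long> values;
    std::stack<char> ops;
    for (std::size_t i = 0; i < input.size(); ) {
        char c = input[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            long n = 0;
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i])))
                n = n * 10 + (input[i++] - '0');
            values.push(n);
            continue;
        }
        if (c == '(') {
            ops.push(c);
        } else if (c == ')') {
            while (!ops.empty() && ops.top() != '(') applyTop(values, ops);
            if (ops.empty()) throw std::runtime_error("unbalanced )");
            ops.pop();                                  // discard the matching '('
        } else if (precedence(c) > 0) {
            // Left-associative: first apply pending operators of equal or higher precedence.
            while (!ops.empty() && precedence(ops.top()) >= precedence(c)) applyTop(values, ops);
            ops.push(c);
        } else {
            throw std::runtime_error("unexpected character");
        }
        ++i;
    }
    while (!ops.empty()) {
        if (ops.top() == '(') throw std::runtime_error("unbalanced (");
        applyTop(values, ops);
    }
    if (values.size() != 1) throw std::runtime_error("malformed expression");
    return values.top();
}
```

Every stack access is guarded, so malformed input produces an exception instead of popping an empty stack.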

In which case is if(a=b) a good idea? [duplicate]

Possible Duplicate:
Inadvertent use of = instead of ==
C++ compilers let you know via warnings that you wrote,
if( a = b ) { //...
and that it might be a mistake, since you probably meant to write:
if( a == b ) { //...
But is there a case where the warning should be ignored, because it's a good way to use this "feature"?
I don't see any code clarity reason possible, so is there a case where it’s useful?
Two possible reasons:
Assign & Check
The = operator (when not overloaded) evaluates to the value it assigned (in C++, to the left-hand operand itself). This is what allows statements such as a=b=c=3. In the context of your question, it also allows you to do something like this:
bool global;//a global variable
//a function
int foo(bool x){
//assign the value of x to global
//if x is equal to true, return 4
if (global=x)
return 4;
//otherwise return 3
return 3;
}
...which is equivalent to but shorter than:
bool global;//a global variable
//a function
int foo(bool x){
//assign the value of x to global
global=x;
//if x is equal to true, return 4
if (global==true)
return 4;
//otherwise return 3
return 3;
}
Also, it should be noted (as Billy ONeal stated in a comment below) that this also works when the left-hand operand of = is an object of a class with a conversion operator to a type that can be implicitly converted to bool. In other words, (a=b) will evaluate to true or false if a is of a type that can be converted to a boolean value.
So the following is similar to the above, except that the left-hand operand of = is an object rather than a bool:
#include <iostream>
using namespace std;
class Foo {
public:
operator bool (){ return true; }
Foo(){}
};
int main(){
Foo a;
Foo b;
if (a=b)
cout<<"true";
else
cout<<"false";
}
//output: true
Overloaded = operator
Since C++ allows operator overloading, = is sometimes overloaded to do something other than what it does with primitive types. In these cases, performing the = operation on an object could return a bool (if that's how operator= was overloaded for that type).
So the following code would perform the = operation on a with b as an argument. Then it would conditionally execute some code depending on the return value of that operation:
if (a=b){
//execute some code
}
Here, a would have to be an object, and b would have to be of the type expected by the overloaded = operator for a's type. To learn more about operator overloading, see the Wikipedia article on operator overloading, which includes C++ examples.
while ( (line = readNextLine()) != EOF) {
processLine();
}
You could use it to test whether a function returned an error:
if (error_no = some_function(...)) {
// Handle error
}
This assumes that some_function returns an error code in case of an error, and zero otherwise.
This is a consequence of basic feature of the C language:
The value of an assignment operation is the assigned value itself.
The fact that you can use that "return value" as the condition of an if() statement is incidental.
By the way, this is the same trick that allows this crazy conciseness:
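void my_strcpy(char *s, const char *t)   // not named strcpy, to avoid clashing with the standard function
{
    while( *s++ = *t++ );   // copies each char, including the terminating '\0'
}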
Of course, the while exits when the null character in t is reached, but that character is copied to the destination string s at the same time.
As for whether it is a good idea: usually not, as it reduces code readability and is prone to errors.
Although the construct is perfectly legal syntax and your intent may truly be as shown below, don't leave the "!= 0" part out.
if( (a = b) != 0 ) {
...
}
The person looking at the code 6 months, 1 year, 5 years from now, at first glance, is simply going to believe the code contains a "classic bug" written by a junior programmer and will try to "fix" it. The construct above clearly indicates your intent and will be optimized out by the compiler. This would be especially embarrassing if you are that person.
Your other option is to heavily load it with comments. But the above is self-documenting code, which is better.
Lastly, my preference is to do this:
a = b;
if( a != 0 ) {
...
}
This is about as clear as the code can get. If there is a performance hit, it is virtually zero.
A common example where it is useful might be:
do {
...
} while (current = current->next);
I know that with this syntax you can avoid putting an extra line in your code, but I think it takes away some readability from the code.
This syntax is very useful for things like the one suggested by Steven Schlansker, but using it directly as a condition isn't a good idea.
This isn't actually a deliberate feature of C, but a consequence of two other features:
Assignment returns the assigned value
This is useful for performing multiple assignments, like a = b = 0, or loops like while ((n = getchar()) != EOF).
Numbers and pointers have truth values
C didn't have a bool type until the 1999 standard, so it used int to represent Boolean values. Backward compatibility requires C and C++ to allow non-bool expressions in if, while, and for.
So, if a = b has a value and if is lenient about what values it accepts, then if (a = b) works. But I'd recommend using if ((a = b) != 0) instead to discourage anyone from "fixing" it.
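The same explicit-comparison style carries over to C++ strings. A small sketch (splitCommas is a hypothetical helper, not from the question) that assigns inside the loop condition but still compares against std::string::npos, so the intent is visible to the next reader:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split s on commas. The assignment to pos happens inside the condition,
// but the explicit != npos comparison makes it clear it is not a typo for ==.
std::vector<std::string> splitCommas(const std::string& s) {
    std::vector<std::string> parts;
    std::size_t start = 0, pos;
    while ((pos = s.find(',', start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    parts.push_back(s.substr(start));   // the text after the last comma
    return parts;
}
```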
You should make the check explicit, in a clearer coding style, rather than relying on an implicit assign-and-check. Example:
if ((fp = fopen("filename.txt", "wt")) != NULL) {
// Do something with fp
}
void some( int b ) {
int a = 0;
if( a = b ) {
// or do something with a
// knowing that is not 0
}
// b remains the same
}
But is there a case where the warning should be ignored because it's a good way to use this "feature"? I don't see any code clarity reason possible, so is there a case where it's useful?
The warning can be suppressed by placing an extra pair of parentheses around the assignment, which sort of clarifies the programmer's intent. Common cases I've seen that match the (a = b) case directly look something like:
if ( (a = expression_with_zero_for_failure) )
{
// do something with 'a' to avoid having to reevaluate
// 'expression_with_zero_for_failure' (might be a function call, e.g.)
}
else if ( (a = expression2_with_zero_for_failure) )
{
// do something with 'a' to avoid having to reevaluate
// 'expression2_with_zero_for_failure'
}
// etc.
As to whether writing this kind of code is useful enough to justify the common mistakes that beginners (and sometimes even professionals in their worst moments) encounter when using C++, it's difficult to say. It's a legacy inherited from C and Stroustrup and others contributing to the design of C++ might have gone a completely different, safer route had they not tried to make C++ backwards compatible with C as much as possible.
Personally I think it's not worth it. I work in a team and I've encountered this bug several times before. I would have been in favor of disallowing it (requiring parentheses or some other explicit syntax at least or else it's considered a build error) in exchange for lifting the burden of ever encountering these bugs.
char buf[256];
char *l;
while ((l = fgets(buf, sizeof buf, stdin)) != NULL) {
    printf("%s", l);  // fgets keeps the trailing '\n'
}
This is of course the simplest example, and there are many cases where this is useful. The main thing to remember is that an assignment expression evaluates to the assigned value: (a = true) yields true, just as (a = false) yields false.
Preamble
Note that this answer is about C++ (I started writing this answer before the tag "C" was added).
Still, after reading Jens Gustedt's comment, I realized it was not the first time I wrote this kind of answer. Truth is, this question is a duplicate of another, to which I gave the following answer:
Inadvertent use of = instead of ==
So, I'll shamelessly quote myself here to add an important piece of information: if is not about comparison. It's about evaluation.
This difference is very important, because it means anything can go inside the parentheses of an if as long as it can be evaluated to a Boolean. And this is a good thing.
Now, limiting the language by forbidding = where all other operators are allowed would be a dangerous exception, one whose benefit would be far from certain and whose drawbacks would be numerous.
For those who are uneasy about the = typo, there are solutions (see Alternatives below).
About the valid uses of if(i = 0) [Quoted from myself]
The issue is that you're looking at the problem upside down. Unlike in some other languages, the "if" notation is not about comparing two values.
The C/C++ if statement accepts any expression that evaluates to either a Boolean or a null/non-null value. This expression can be a comparison of two values, and/or something much more complex.
For example, you can have:
if(i >> 3)
{
    std::cout << "i is at least 8" << std::endl; // assuming i is non-negative
}
This shows that, in C/C++, the if expression is not limited to == and =. Anything will do, as long as it can be evaluated as true or false (C++), or as zero or non-zero (C/C++).
About valid uses
Back to the non-quoted answer.
The following notation:
if(MyObject * p = findMyObject())
{
    // uses p
}
enables the user to declare p and then use it inside the if. It is syntactic sugar, but an interesting one. For example, imagine the case of an XML DOM-like object whose type is unknown until runtime, and you need to use RTTI:
void foo(Node * p_p)
{
    if(BodyNode * p = dynamic_cast<BodyNode *>(p_p))
    {
        // this is a <body> node
    }
    else if(SpanNode * p = dynamic_cast<SpanNode *>(p_p))
    {
        // this is a <span> node
    }
    else if(DivNode * p = dynamic_cast<DivNode *>(p_p))
    {
        // this is a <div> node
    }
    // etc.
}
RTTI should not be abused, of course, but this is just one example of this syntactic sugar.
Another use is what is called C++ variable injection. In Java, there is this cool keyword:
synchronized(p)
{
    // Now, the Java code is synchronized using p as a mutex
}
In C++, you can do it, too. I don't have the exact code in mind (nor the exact Dr. Dobb's Journal article where I discovered it), but this simple define should be enough for demonstration purposes:
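// auto_lock: an RAII type (not shown) that locks in its constructor,
// unlocks in its destructor, and converts to true.
// A variable declared in an if condition is scoped to that if,
// so no unique (e.g. __LINE__-based) name is needed.
#define synchronized(lock) \
    if (auto_lock synchronized_guard_{lock})
synchronized(p)
{
    // Now, the C++ code is synchronized using p as a mutex
}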
(Note that this macro is quite primitive and should not be used as is in production code. The real macro uses an if and a for. See the sources below for a more correct implementation.)
In the same way, by mixing injection with if and for declarations, you can write a primitive foreach macro (if you want an industrial-strength foreach, use Boost's).
About your typo problem
Your problem is a typo, and there are multiple ways to limit its frequency in your code. The most important one is to make sure the left-hand-side operand is constant.
For example, this code won't compile for multiple reasons:
if( NULL = b ) // won't compile because it is illegal
// to assign a value to r-values.
Or even better:
const T a ;
// etc.
if( a = b ) // Won't compile because it is illegal
// to modify a constant object
This is why, in my code, const is one of the most used keywords you'll find. Unless I really want to modify a variable, it is declared const, and thus the compiler protects me from most errors, including the typo that motivated you to write this question.
But is there a case where the warning should be ignored because it's a good way to use this "feature"? I don't see any code clarity reason possible so is there a case where it's useful?
Conclusion
As shown in the examples above, there are multiple valid uses for the feature you used in your question.
My own code is an order of magnitude cleaner and clearer since I started using the code injection enabled by this feature:
void foo()
{
    // some code
    LOCK(mutex)
    {
        // some code protected by a mutex
    }
    FOREACH(char c, MyVectorOfChar)
    {
        // using 'c'
    }
}
... which makes the rare times I was confronted with this typo a negligible price to pay (and I can't remember the last time I made this typo without being caught by the compiler).
Interesting sources
I finally found the articles I had read on variable injection. Here they are:
FOR_EACH and LOCK (2003-11-01)
Exception Safety Analysis (2003-12-01)
Concurrent Access Control & C++ (2004-01-01)
Alternatives
If one fears falling victim to the =/== typo, then perhaps a macro could help:
#include <iostream>
#define EQUALS ==
#define ARE_EQUALS(lhs,rhs) ((lhs) == (rhs))
int main(int argc, char* argv[])
{
    int a = 25 ;
    double b = 25 ;
    if(a EQUALS b)
        std::cout << "equals" << std::endl ;
    else
        std::cout << "NOT equals" << std::endl ;
    if(ARE_EQUALS(a, b))
        std::cout << "equals" << std::endl ;
    else
        std::cout << "NOT equals" << std::endl ;
    return 0 ;
}
This way, one can protect oneself from the typo without needing a language restriction (which would cripple the language) for a bug that happens rarely (i.e., almost never, as far as I remember in my own code).
There's an aspect of this that hasn't been mentioned: C doesn't prevent you from doing anything it doesn't have to prevent. C's job is to give you enough rope to hang yourself with, not to assume that it's smarter than you. And it's good at it.
Never!
The exceptions cited above don't trigger the compiler warning. In the cases where the compiler does warn, assignment in the condition is never a good idea.
RegEx sample
RegEx *r;  // RegEx: a hypothetical regex class with an IsMatch() method
if ((r = new RegEx("\\w*"))->IsMatch()) {
    // ... do something here
}
else if ((r = new RegEx("\\d*"))->IsMatch()) {
    // ... do something here
}
Assign a value test
int i = 0;
if((i = 1) == 1) {
    // i has just been assigned the int value 1, so this branch is taken
}
else {
    // never reached
}
My favourite is:
if (CComQIPtr<DerivedClassA> a = BaseClassPtr)
{
    ...
}
else if (CComQIPtr<DerivedClassB> b = BaseClassPtr)
{
    ...
}