Lex is not returning what I want - c++

%{
#include<stdio.h>
int n_chars = 0;
int n_lines = 0;
%}
%%
"if"|"else"|"while"|"do"|"switch"|"case" {
printf("Keyword");
}
[a-zA-Z][a-z|0-9]* {printf("Identifier");}
[0-9]* {printf("Number");}
"!"|"#"|"*"|"&"|"^"|"%"|"$"|"#" {printf("Special Character");}
\n { ++n_lines, ++n_chars; }
. ++n_chars;
%%
int yywrap() {
return 1;
}
main(int argc[], char *argv[]) {
yyin = fopen("index.txt", "r");
printf("Number of characters is: %d", n_chars);
yylex();
return 0;
}
My code above returns: Number of characters is: 0
The content of my file index.txt is:
if hello #
while 1
do test
Why does it return 0? What I expect is the number of all characters and also it should tell me if it is a keyword, an identifier or a special character.
I must be doing something wrong, since I am very new to this.
I am using EditPlus. So any help would be appreciated!

There are at least two problems with your code.
First, you print n_chars before calling yylex, so it is still 0 when it is printed.
Second, the last rule for . will not be matched for anything that is matched by one of the rules above it, so you will not get the total number of characters with this approach.
If I call yylex first, I only get the number of "other" characters, such as spaces and newlines.
To count all characters, you can do either of the following:
Add the statement n_chars += strlen(yytext); to the first four rules to count the characters that were matched by each rule (strlen needs #include <string.h>).
Add the statement REJECT to the first four rules so that the scanner continues searching and therefore also matches the . rule, whose action is ++n_chars;.

Related

How to match *anything* until a delimiter is encountered in RE-flex lexer?

I am using the RE/flex lexer for my project. In it, I want to match the syntax corresponding to ('*)".*?"\1. For example, it should match "foo" and ''"bar"'', but should not match ''"baz"'.
But the RE/flex matcher doesn't support lookaheads, lookbehinds or backreferences. So, is there a correct way to match this using the reflex matcher? The nearest I could achieve was the following lexer:
%x STRING
%%
'*\" {
textLen = 0uz;
quoteLen = size();
start(STRING);
}
<STRING> {
\"'* {
if (size() - textLen < quoteLen) goto MORE_TEXT;
matcher().less(textLen + quoteLen);
start(INITIAL);
res = std::string{matcher().begin(), textLen};
return TokenKind::STR;
}
[^"]* {
MORE_TEXT:
textLen = size();
matcher().more();
}
<<EOF>> {
std::cerr << "Lexical error: Unterminated 'STRING' \n";
return TokenKind::ERR;
}
}
%%
The meta-character . in RE/flex matches any character, whether it is part of a valid or an invalid UTF-8 sequence. The inverted character class [^...], on the other hand, matches only valid UTF-8 sequences that are absent from the character class.
So the problem with the above lexer is that it matches only valid UTF-8 sequences inside strings, whereas I want it to match anything inside the string until the delimiter.
I considered three workarounds, but all three seem to have some issues.
Use skip(). This skips all characters until it reaches the delimiter, but in the process it consumes all the string content, so I don't get to keep it.
Use .*?/\" instead of [^"]*. This works for every properly terminated string, but the lexer gets jammed if the string is not terminated.
Consume the string content character by character using the . pattern. Since . is synchronizing, it can even match invalid UTF-8 sequences. But this approach feels way too slow.
So is there any better approach for solving this?
I didn't find a proper way to solve the problem, but I got it working with a dirty hack based on the second workaround mentioned above.
Instead of relying on the scanner loop generated by RE/flex, I added a custom loop inside the rule that matches the start of a string. In that loop, instead of failing with a "scanner jammed" error, I flush the remaining text and display an unterminated string error message.
%x STRING
%%
'*\" {
auto textLen = 0uz;
const auto quoteLen = size();
matcher().pattern(PATTERN_STRING);
while (true) {
switch (matcher().scan()) {
case 1:
if (size() - textLen < quoteLen) break;
matcher().less(textLen + quoteLen);
res = std::string{matcher().begin(), textLen};
return TokenKind::STR;
case 0:
if (!matcher().at_end()) matcher().set_end(true);
std::cerr << "Lexical error: Unterminated 'STRING' \n";
return TokenKind::ERR;
default:
std::unreachable();
case 2:;
}
textLen = size();
matcher().more();
}
}
<STRING>{
\"'* |
.*?/\" |
<<EOF>> std::unreachable();
}
%%

Why is isdigit() not working?

Code:
#include <iostream>
#include <string>
using namespace std;
string s1,s2,s3,s4,s5;
int ex(string s){
int i;
if(isdigit(s)){
i = atoi(s.c_str);
}
else
return -1;
return i;
}
int main(){
int t;cin>>t;int v1,v2,v3;
while(t--){
cin>>s1>>s2>>s3>>s4>>s5;
v1=ex(s1);
v2=ex(s2);
v3=ex(s3);
if(v1<0) v1=v3-v2;
if(v2<0) v2=v3-v1;
if(v3<0) v3=v1+v2;
cout<<v1<<" + "<<v2<<" = "<<v3;
}
return 0;
}
Error:
error: no matching function for call to 'isdigit(std::string&)'
if(isdigit(s)){
I tried searching the previous posts about this but still could not figure out why the isdigit(s) call does not work.
The task is this: the input has the form
47 + machula = 53, where machula is some word,
and the output should be 47 + 6 = 53.
isdigit is meant to check whether a single character is a digit or not, not a string. That's why the call isdigit(s) fails to compile.
You could use std::stoi. However, keep in mind that it will throw an exception if no conversion could be performed by the function.
try
{
i = std::stoi(s);
}
catch ( ... )
{
// Deal with the exception
}
You could also check whether the first character of the string is a digit before attempting to use std::stoi.
if ( !(s.empty()) && isdigit(s[0]) )
{
i = std::stoi(s);
}
NB
From the comment by @RemyLebeau:
The above check is not sufficient, because it does not guarantee that all characters in the string are digits. std::stoi() parses as much of the string as it can and then reports the index of the first unconverted character through its optional second argument, even if that is the position of the null terminator. It also skips leading whitespace, so checking only the first character may reject input that std::stoi() would have parsed successfully.

Why can't the regular expressions "//" and "/*" match single-line comments and block comments?

I want to count the empty lines, single-line comments and block comments in a C++ program.
I wrote the tool using flex, but it can't match C++ block comments.
1 flex code:
%{
int block_flag = 0;
int empty_num = 0;
int single_line_num = 0;
int block_line_num = 0;
int line = 0;
%}
%%
^[\t ]*\n {
empty_num++;
printf("empty line\n");
}
"//" {
single_line_num++;
printf("single line comment\n");
}
"/*" {
block_flag = 1;
block_line_num++;
printf("block comment begin.block line:%d\n", block_line_num);
}
"*/" {
block_flag = 0;
printf("block comment end.block line:%d\n", block_line_num);
}
^(.*)\n {
if(block_flag)
block_line_num++;
else
line++;
}
%%
int main(int argc , char *argv[])
{
yyin = fopen(argv[1], "r");
yylex();
printf("lines :%d\n" ,line);
fclose(yyin);
return 0;
}
2 hello.c
bbg@ubuntu:~$ cat hello.c
#include <stdlib.h>
//
//
/*
*/
/* */
3 output
bbg@ubuntu:~$ ./a.out hello.c
empty line
empty line
lines :6
Why can't "//" and "/*" match the single-line comments and block comments?
Flex:
doesn't search. It matches patterns sequentially, each match starting where the previous one ended.
always picks the pattern with the longest match. (If two or more patterns match exactly the same amount of text, it picks the first one.)
So, you have
"//" { /* Do something */ }
and
^.*\n { /* Do something else */ }
Suppose it has just matched the second one, so we're at the beginning of a line, and suppose the line starts //. Now, both these patterns match, but the second one matches the whole line, whereas the first one only matches two characters. So the second one wins. That wasn't what you wanted.
Hint 1: You probably want // comments to match to the end of the line
Hint 2: There is a regular expression which will match /* comments, although it's a bit tedious: "/*"[^*]*"*"+([^*/][^*]*"*"+)*"/" Unfortunately, if you use that, it won't count line ends for you, but you should be able to adapt it to do what you want.
Hint 3: You might want to think about comments which start in the middle of a line, possibly having been indented. Your rule ^.*\n will swallow an entire line without even looking to see if there is a comment somewhere inside it.
Hint 4: String literals hide comments.

sscanf for this type of string

I'm not quite sure even after reading the documentation how to do this with sscanf.
Here is what I want to do:
given a string of text:
Read up to the first 64 chars or until space is reached
Then there will be a space, an = and then another space.
Following that I want to extract another string either until the end of the string or if 8192 chars are reached. I would also like it to change any occurrences in the second string of "\n" to the actual newline character.
I have: "%64s = %8192s" but I do not think this is correct.
Thanks
Ex:
element.name = hello\nworld
Would have string 1 with element.name and string2 as
hello
world
I do recommend std::regex for this, but apart from that, you should be fine with a little error checking:
#include <cstdio>
int main(int argc, const char *argv[])
{
char s1[65];
char s2[8193];
if (2!=std::scanf("%64s = %8192s", s1, s2))
puts("oops");
else
std::printf("s1 = '%s', s2 = '%s'\n", s1, s2);
return 0;
}
Your format string looks right to me; however, sscanf will not change occurrences of "\n" to anything else. To do that you would need to write a loop that uses strtok, or even just a simple for loop that evaluates each character in the string and swaps it for whatever character you prefer. You will also need to check the sscanf return value to determine whether the 2 strings were indeed scanned correctly. sscanf returns the number of fields successfully scanned according to your format string.
@sehe shows the correct usage of sscanf including the check for the proper return value.

C++ printf: newline (\n) from commandline argument

How do I print a format string that is passed as a command-line argument?
example.cpp:
#include <iostream>
int main(int ac, char* av[])
{
printf(av[1],"anything");
return 0;
}
try:
example.exe "print this\non newline"
output is:
print this\non newline
instead I want:
print this
on newline
No, do not do that! That is a very severe vulnerability. You should never accept format strings as input. If you would like to print a newline whenever you see a "\n", a better approach would be:
#include <iostream>
#include <cstdlib>
int main(int argc, char* argv[])
{
if ( argc != 2 ){
std::cerr << "Exactly one parameter required!" << std::endl;
return 1;
}
int idx = 0;
const char* str = argv[1];
while ( str[idx] != '\0' ){
if ( (str[idx]=='\\') && (str[idx+1]=='n') ){
std::cout << std::endl;
idx+=2;
}else{
std::cout << str[idx];
idx++;
}
}
return 0;
}
Or, if you are including the Boost C++ Libraries in your project, you can use the boost::replace_all function to replace instances of "\\n" with "\n", as suggested by Pukku.
At least if I understand correctly, your question is really about converting the "\n" escape sequence into a new-line character. That conversion happens at compile time, so if (for example) you enter the "\n" on the command line, it gets printed out as "\n" instead of being converted to a new-line character.
I wrote some code years ago to convert escape sequences when you want it done. Please don't pass it as the first argument to printf though. If you want to print a string entered by the user, use fputs, or the "%s" conversion format:
int main(int argc, char **argv) {
if (argc > 1)
printf("%s", translate(argv[1]));
return 0;
}
You can't do that because \n and the like are parsed by the C compiler. In the generated code, the actual numerical value is written.
What this means is that your input string will have to actually contain the character value 13 (or 10 or both) to be considered a new line because the C functions do not know how to handle these special characters since the C compiler does it for them.
Alternatively you can just replace every instance of \\n with \n in your string before sending it to printf.
Passing user arguments directly to printf as the format string enables an exploit known as a format string attack.
See the Wikipedia article on the topic for more details.
There's no way to automatically have the string contain a newline. You'll have to do some kind of string replace on your own before you use the parameter.
It is only the compiler that converts \n etc to the actual ASCII character when it finds that sequence in a string.
If you want to do it for a string that you get from somewhere, you need to manipulate the string directly and replace the string "\n" with a CR/LF etc. etc.
If you do that, don't forget that "\\" becomes '\' too.
Please never ever use char* buffers in C++, there is a nice std::string class that's safer and more elegant.
I know an answer, but is this thread still active?
Anyway, you can try
example.exe "print this$(echo -e "\n ")on newline"
I tried it and it ran.
Regards,
Shahid nx