Flex error negative range in character class - regex

I am writing a parser using Flex and Bison and have defined various tokens as:
[0-9]+ { yylval.str=strdup(yytext); return digit; }
[0-9]+\.[0-9]* { yylval.str=strdup(yytext); return floating; }
[a-zA-Z_][a-zA-Z0-9_]* { yylval.str=strdup(yytext); return key; }
[a-zA-Z/][a-zA-Z_-/.]* { yylval.str=strdup(yytext); return string; }
[a-zA-Z0-9._-]+ { yylval.str=strdup(yytext); return hostname; }
["][a-zA-Z0-9!@#$%^&*()_-+=.,/?]* { yylval.str=strdup(yytext); return qstring1; }
[a-zA-Z0-9!@#$%^&*()_-+=.,/?]*["] { yylval.str=strdup(yytext); return qstring2; }
[#].+ { yylval.str=strdup(yytext); return comment;}
[ \n\t] {} /* Ignore white space. */
. {printf("ERR:L:%d\n", q); return ERROR;}
And it shows an error "Negative Range in Character Class" in the regexps for string, qstring1 and qstring2.
Can someone please help me with where I went wrong?
The spec is that:
Non quoted strings may contain ASCII alphanumeric characters, underscores, hyphens, forward slash and period and must start with letter or slash.
Quoted strings may contain any alphanumeric character between the quotes.
I have taken two different strings for quoted strings for some more specifications to be fulfilled.
Thanks.

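For string, qstring1 and qstring2, you need to place the hyphen (-) as either the first or the last character of the character class [], or escape it as \- if it appears elsewhere. Otherwise Flex reads it as a range operator, and _-/ and _-+ are ranges whose end comes before their start.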
(string)
[a-zA-Z/][a-zA-Z_./-]*
(qstring1)
["][a-zA-Z0-9!@#$%^&*()_+=.,/?-]*
(qstring2)
[a-zA-Z0-9!@#$%^&*()_+=.,/?-]*["]

The hyphen (-) needs to be escaped with a backslash.
For qstring1, try the following:
["][a-zA-Z0-9!@#$%^&*()_\-+=.,/?]*

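Inside a character class, a hyphen between two characters defines a range, and the range must run from the lower character code to the higher one.
For example, this line of code:
[+-/*><=] {printf("Operator %c\n",yytext[0]); return yytext[0];}
won't give any error, because + (43) comes before / (47) in ASCII, so +-/ is a valid range. Note that this range also covers the comma, the hyphen and the period, which is probably not intended.
Whereas:
[+-*/><=] {printf("Operator %c\n",yytext[0]); return yytext[0];}
will, because * (42) comes before + (43), so +-* is a negative range.
Hope it helps.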
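
The rule behind these answers can be checked mechanically. The following C++ sketch is not part of Flex: findNegativeRange is a hypothetical helper that scans the body of a character class (the text between [ and ]) and reports the first hyphen whose right-hand endpoint has a lower character code than its left-hand endpoint. It assumes ASCII input and deliberately ignores POSIX classes such as [:alpha:] and a leading ^.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Flex API): returns the index of the first hyphen
// that forms a descending ("negative") range, or std::string::npos if none.
// Simplification: a backslash just makes the next character a literal.
std::size_t findNegativeRange(const std::string& body) {
    bool havePrev = false;   // is there a literal that could start a range?
    unsigned char prev = 0;  // that literal
    std::size_t i = 0;
    while (i < body.size()) {
        unsigned char c = static_cast<unsigned char>(body[i]);
        if (c == '\\' && i + 1 < body.size()) {
            // Escaped character: always a literal, even if it is a hyphen.
            prev = static_cast<unsigned char>(body[i + 1]);
            havePrev = true;
            i += 2;
            continue;
        }
        if (c == '-' && havePrev && i + 1 < body.size()) {
            // Hyphen between two characters: this is the range prev-next.
            std::size_t j = i + 1;
            unsigned char next = static_cast<unsigned char>(body[j]);
            if (next == '\\' && j + 1 < body.size()) {
                ++j;
                next = static_cast<unsigned char>(body[j]);
            }
            if (next < prev) {
                return i;  // end of range sorts before its start
            }
            havePrev = false;  // a range endpoint cannot start another range
            i = j + 1;
            continue;
        }
        // Ordinary literal, including a hyphen in first or last position.
        prev = c;
        havePrev = true;
        ++i;
    }
    return std::string::npos;
}
```

Called on the body of the original string rule, a-zA-Z_-/., the helper points at the hyphen between _ and /. Called on the corrected body, a-zA-Z_./-, it finds nothing, because a trailing hyphen is a plain literal.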

Related

Evaluating a string against a pattern with RegExp in Flutter

I'm trying to evaluate a string against a set list of parameters with RegExp in Flutter. For example, the string must contain at least:
One capital letter
One lowercase letter
One number from 0-9
One special character, such as $ or !
This is basically for a password entry field of an application. I have set things up, firstly using validateStructure as follows:
abstract class PasswordValidator {
bool validateStructure(String value);
}
Then, I have used the RegExp function as follows:
class PasswordValidatorSpecial implements PasswordValidator {
bool validateStructure(String value) {
String pattern =
r'^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[!@#\$&*~£]).{8,}$';
RegExp regEx = new RegExp(pattern);
return regEx.hasMatch(value);
}
}
This does work well, in a sense that when I pass a string/password through it, it does tell me if at least one of the criteria is not met. However, what I would like to do is for the output to be more specific, telling me which of those criteria isn't met.
For example, if the password were to have everything but a number (from 0-9) I would want to be able to get the output to specifically say that a number is missing, but everything else is present.
How would I adapt my code to be able to do that? I thought perhaps by using conditional 'if' statement, although I don't know how that would work. Thanks!
That's right. You can use RegExr to check your RegExp, then split it into its parts and test each part separately to produce a specific error. Also, instead of returning a bool value, you can return a String value, as in the following function:
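String validateStructure(String value) {
  // Each pattern tests exactly one requirement.
  String patternUpperCaseCharacters = r'^(?=.*?[A-Z])';
  String patternLowerCaseCharacters = r'^(?=.*?[a-z])';
  String patternNumbers = r'^(?=.*?[0-9])';
  String patternSpecialCharacters = r'^(?=.*?[!@#\$&*~£])';
  // Report the first requirement that is not met.
  if (!RegExp(patternUpperCaseCharacters).hasMatch(value)) {
    return "You need at least one capital letter";
  }
  if (!RegExp(patternLowerCaseCharacters).hasMatch(value)) {
    return "You need at least one lowercase letter";
  }
  if (!RegExp(patternNumbers).hasMatch(value)) {
    return "You need at least one number";
  }
  if (!RegExp(patternSpecialCharacters).hasMatch(value)) {
    return "You need at least one special character";
  }
  // Assumption for this example: an empty string means every check passed.
  return "";
}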

regex for at least one character

How do I check if a string contains at least one character? I want to eliminate strings that contain only special characters, so I've decided that the easiest way is to check whether there is at least one letter or digit. I've created [a-zA-Z0-9]{1,} and [a-zA-Z0-9]+, but neither of these works.
boost::regex noSpecialCharacters("[a-zA-Z0-9]+");
boost::regex noSpecialCharacters2("[a-zA-Z0-9]{1,}");
string tab[SIZE] = {"father", "apple is red"};
for (int i = 0; i < SIZE; i++) {
if (!boost::regex_match(tab[i], noSpecialCharacters)) {
puts("This is it!");
} else {
puts("or not");
}
if (!boost::regex_match(tab[i], noSpecialCharacters2)) {
puts("This is it!");
} else {
puts("or not");
}
}
for "apple is red" the answer is correct but for "father" it doesn't work.
apple is red won't match because, as per here (my bold):
Note that the result is true only if the expression matches the whole of the input sequence.
That means the spaces make it invalid. It then goes on to say (again, my bold):
If you want to search for an expression somewhere within the sequence then use regex_search.
If all you're looking for is one valid character somewhere in there, you can just use regex_match() with ".*[a-zA-Z0-9].*" or regex_search() with "[a-zA-Z0-9]".
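
A minimal sketch of that difference follows. It uses std::regex from the C++11 standard library rather than Boost, purely so that it compiles without extra dependencies; regex_match and regex_search have the same whole-string versus anywhere semantics in both. The function names isEntirelyAlnum and containsAlnum are made up for this illustration.

```cpp
#include <regex>
#include <string>

// True only if the WHOLE string consists of letters and digits,
// because regex_match must consume the entire input.
bool isEntirelyAlnum(const std::string& s) {
    static const std::regex re("[a-zA-Z0-9]+");
    return std::regex_match(s, re);
}

// True if at least one letter or digit appears ANYWHERE in the string,
// because regex_search accepts a match on any substring.
bool containsAlnum(const std::string& s) {
    static const std::regex re("[a-zA-Z0-9]");
    return std::regex_search(s, re);
}
```

With these, "father" passes both checks, "apple is red" passes only containsAlnum because of the spaces, and a string made up solely of punctuation passes neither.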

using \ in a string as literal instead of an escape

bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
So the problem right now is: Xcode is not reading in the '\'. When I was debugging in the stringMatch function, expr appeared to be only 'asb' instead of the literal 'a\sb'.
And Xcode is spitting out a warning at the line:
string a = "a\sb" : Unknown escape sequence
Edit: I have already tried using "a\\sb", it reads in as "a\\sb" as literal.
bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a\\sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
C and C++ treat a backslash in a string literal as the start of an escape sequence. To get a literal backslash, you have to escape it by adding an extra backslash to your string.
These are the common escape sequences:
\a - Bell(beep)
\b - Backspace
\f - Formfeed
\n - New line
\r - Carriage Return
\t - Horizontal Tab
\\ - Backslash
\' - Single Quotation Mark
\" - Double Quotation Mark
\ooo - Octal Representation
\xdd - Hexadecimal Representation
EDIT: Xcode is behaving abnormally on your machine. So I can suggest you this.
bool stringMatch(const char *expr, const char *str) {
// do something to compare *(expr+i) == '\\'
// In this case it is comparing against a backslash
// i is some integer
}
int main() {
string a = "a" "\x5C" "sb";
string b = "a b";
cout << stringMatch(a.c_str(), b.c_str()) << endl;
return 1;
}
Don't worry about the spaces in the declaration of string a; adjacent string literals are concatenated by the compiler.
EDIT 2: Indeed Xcode is reading your "a\\sb" literally; that's how it deals with escaped backslashes. When you output string a = "a\\sb" to the console, you'll see a\sb. But when you pass string a between methods as an argument or as a private member, it will take the extra backslash literally. You have to design your code with this in mind so that it ignores the extra backslash. It's up to you how you handle the string.
EDIT 3: Edit 1 is your optimal answer here, but here's another one.
Add code in your stringMatch() method to replace double backslashes with single backslash.
You just need to add this extra line at the very start of the function:
expr=[expr stringByReplacingOccurrencesOfString:@"\\\\" withString:@"\\"];
This should solve the double backslash problem.
EDIT 4:
Some people think Edit 3 is ObjectiveC and thus is not optimal, so another option in ObjectiveC++.
void searchAndReplace(std::string& value, std::string const& search,std::string const& replace)
{
std::string::size_type next;
for(next = value.find(search); // Try and find the first match
next != std::string::npos; // next is npos if nothing was found
next = value.find(search,next) // search for the next match starting after
// the last match that was found.
)
{
// Inside the loop. So we found a match.
value.replace(next,search.length(),replace); // Do the replacement.
next += replace.length(); // Move to just after the replace
// This is the point where we start
// the next search from.
}
}
EDIT 5: If you change the const char * in stringMatch() to string, it will be less complex for you.
expr.replace(/*size_t*/ pos1, /*size_t*/ n1, /*const string&*/ str );
EDIT 6: From C++11 on, there exists something like raw string literals.
This means you don't have to escape, instead, you can write the following:
string a = R"raw(a\sb)raw";
Note that raw can be replaced by any delimiter of your choosing, including none at all. A custom delimiter is useful when the string itself contains a sequence such as )" that would otherwise end the literal early. Raw string literals mainly make sense when you would otherwise have to escape many characters, as with std::regex patterns.
P.S. You have all the answers now, so it's up to you which one you implement to get the best results.
Xcode is spitting out that warning because it is interpreting \s in "a\sb" as an escape sequence, but \s is not a valid escape sequence. It gets replaced with just s so the string becomes "asb".
Escaping the backslash like "a\\sb" is the correct solution. If this somehow didn't work for you please post more details on that.
Here's an example.
#include <iostream>
#include <string>
int main() {
std::string a = "a\\sb";
std::cout << a.size() << ' ' << a << '\n';
}
The output of this program looks like:
4 a\sb
If you get different output please post it. Also please post exactly what problem you observed when you tried "a\\sb" earlier.
Regexes can be a pain in C++ because backslashes have to be escaped this way. C++11 has raw strings that don't allow any kind of escaping, so escaping the backslash is unnecessary: R"(a\sb)".
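
To make the equivalence concrete, here is a small sketch comparing the two spellings. The helper names escapedLiteral, rawLiteral and countBackslashes are invented for this example; only the literal syntax itself comes from the language.

```cpp
#include <cstddef>
#include <string>

// Ordinary literal: the backslash itself has to be escaped in the source.
std::string escapedLiteral() { return "a\\sb"; }

// Raw literal (C++11): everything between R"( and )" is taken verbatim.
std::string rawLiteral() { return R"(a\sb)"; }

// Counts the backslash characters actually stored in the string.
std::size_t countBackslashes(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == '\\') {
            ++n;
        }
    }
    return n;
}
```

Both functions return the same four-character string containing exactly one backslash, so a comparison such as *(expr+i) == '\\' inside stringMatch sees a single backslash at index 1 either way.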

Lex parsing without spaces

I am coding a custom shell using Lex, Yacc, and C++. It is being run in a Unix environment. It currently works fine as long as there are spaces between the tokens. For example:
ls | grep test > out
will pass:
WORD PIPE WORD WORD GREAT WORD
to Yacc, and then actions are taken from there. However, I need it to work when there are no spaces as well. For example:
ls|grep test>out
should work the same as the previous command. However, it currently only passes:
WORD WORD
Is there a way to parse the input before Lex tokenizes it?
Edit:
Here is my Lex file:
%{
#include <string.h>
#include "y.tab.h"
%}
%%
\n {
return NEWLINE;
}
[ \t] {
/* Discard spaces and tabs */
}
">" { return GREAT; }
">&" { return GREATAMPERSAND; }
">>" { return GREATGREAT; }
">>&" { return GREATGREATAMPERSAND; }
"<" { return LESS; }
"|" { return PIPE; }
"&" { return AMPERSAND; }
[^ \t\n][^ \t\n]* {
/* Assume that file names have only alpha chars */
yylval.string_val = strdup(yytext);
return WORD;
}
. {
/* Invalid character in input */
return NOTOKEN;
}
%%
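You need to change your definition of a WORD. Right now, when it encounters any non-whitespace character, it considers everything up to the next whitespace as part of that WORD. Since that pattern also accepts |, > and similar characters, ls|grep is consumed as a single WORD.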
You want to change that so it doesn't include any of the punctuation you're using for other purposes:
[^ \t\n\>\<\|\&]+ {
/* Assume that file names have only alpha chars */
yylval.string_val = strdup(yytext);
return WORD;
}
I figured it out. WORD was including the pipes and other special characters.
I changed it to
[^\|\>\<\& \t\n][^\|\>\<\& \t\n]* {
yylval.string_val = strdup(yytext);
return WORD;
}
and now it works.
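
The effect of the corrected WORD pattern can be imitated outside Lex. The sketch below is a hand-written C++ scanner, not generated Lex code, and tokenNames and endsWord are hypothetical names. It treats the operator characters as word terminators, as the fixed rule does, and tries the longer > operators first to mimic Lex's longest-match behaviour.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Characters that end a WORD: whitespace plus the shell operators.
static bool endsWord(char c) {
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '|' || c == '>' || c == '<' || c == '&';
}

// Returns the token names the scanner would hand to the parser.
std::vector<std::string> tokenNames(const std::string& line) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < line.size()) {
        char c = line[i];
        if (c == ' ' || c == '\t') { ++i; continue; }  // discard blanks
        if (c == '\n') { out.push_back("NEWLINE"); ++i; continue; }
        if (c == '|') { out.push_back("PIPE"); ++i; continue; }
        if (c == '<') { out.push_back("LESS"); ++i; continue; }
        if (c == '&') { out.push_back("AMPERSAND"); ++i; continue; }
        if (c == '>') {
            // Longest operator first: >>&, then >>, then >&, then >.
            if (line.compare(i, 3, ">>&") == 0) {
                out.push_back("GREATGREATAMPERSAND"); i += 3;
            } else if (line.compare(i, 2, ">>") == 0) {
                out.push_back("GREATGREAT"); i += 2;
            } else if (line.compare(i, 2, ">&") == 0) {
                out.push_back("GREATAMPERSAND"); i += 2;
            } else {
                out.push_back("GREAT"); i += 1;
            }
            continue;
        }
        // WORD: runs until whitespace or an operator character.
        while (i < line.size() && !endsWord(line[i])) { ++i; }
        out.push_back("WORD");
    }
    return out;
}
```

With this definition, ls|grep test>out and ls | grep test > out produce the same sequence: WORD PIPE WORD WORD GREAT WORD.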

Regex Rejecting matches because of Instr

What's the easiest way to do an "instring" type function with a regex? For example, how could I reject a whole string because of the presence of a single character such as :? For example:
this - okay
there:is - not okay because of :
More practically, how can I match the following string:
//foo/bar/baz[1]/ns:foo2/@attr/text()
For any node test on the xpath that doesn't include a namespace?
(/)?(/)([^:/]+)
Will match the node tests but includes the namespace prefix which makes it faulty.
I'm still not sure whether you just wanted to detect if the Xpath contains a namespace, or whether you want to remove the references to the namespace. So here's some sample code (in C#) that does both.
class Program
{
static void Main(string[] args)
{
string withNamespace = @"//foo/ns2:bar/baz[1]/ns:foo2/@attr/text()";
string withoutNamespace = @"//foo/bar/baz[1]/foo2/@attr/text()";
ShowStuff(withNamespace);
ShowStuff(withoutNamespace);
}
static void ShowStuff(string input)
{
Console.WriteLine("'{0}' does {1}contain namespaces", input, ContainsNamespace(input) ? "" : "not ");
Console.WriteLine("'{0}' without namespaces is '{1}'", input, StripNamespaces(input));
}
static bool ContainsNamespace(string input)
{
// a namespace must start with a character, but can have characters and numbers
// from that point on.
return Regex.IsMatch(input, @"/?\w[\w\d]+:\w[\w\d]+/?");
}
static string StripNamespaces(string input)
{
return Regex.Replace(input, @"(/?)\w[\w\d]+:(\w[\w\d]+)(/?)", "$1$2$3");
}
}
Hope that helps! Good luck.
Match on :? I think the question isn't clear enough, because the answer is so obvious:
if (Regex.IsMatch(input, ":")) // reject
You might want \w which is a "word" character. From javadocs, it is defined as [a-zA-Z_0-9], so if you don't want underscores either, that may not work....
I don't know regex syntax very well, but could you not do:
[any alpha numeric]\*:[any alphanumeric]\*
I think something like that should work no?
Yeah, my question was not very clear. Here's a solution but rather than a single pass with a regex, I use a split and perform iteration. It works as well but isn't as elegant:
string xpath = "//foo/bar/baz[1]/ns:foo2/@attr/text()";
string[] nodetests = xpath.Split( new char[] { '/' } );
for (int i = 0; i < nodetests.Length; i++)
{
if (nodetests[i].Length > 0 && Regex.IsMatch( nodetests[i], @"^(\w|\[|\])+$" ))
{
// does not have a ":", we can manipulate it.
}
}
xpath = String.Join( "/", nodetests );
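
The same split-and-filter idea can be written without a regex at all, since rejecting a string for containing one specific character is a plain substring search. The following C++ sketch is only an illustration: stepsWithoutNamespace is a hypothetical name, and treating every colon as a namespace prefix is a simplification that would also reject axis syntax such as child::foo.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits an XPath on '/' and keeps only the steps that contain no ':'.
std::vector<std::string> stepsWithoutNamespace(const std::string& xpath) {
    std::vector<std::string> kept;
    std::istringstream in(xpath);
    std::string step;
    while (std::getline(in, step, '/')) {
        if (step.empty()) {
            continue;  // produced by "//" or by a leading '/'
        }
        if (step.find(':') != std::string::npos) {
            continue;  // has a prefix such as ns:, so reject it
        }
        kept.push_back(step);
    }
    return kept;
}
```

For the path from the question, this keeps foo, bar, baz[1], @attr and text(), and drops ns:foo2.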