Finding number between [/ and ] using regex in C++ - c++

I want to find the number between [/ and ] (12345 in this case).
I have written such code:
float num;
string line = "A111[/12345]";
boost::regex e ("[/([0-9]{5})]");
boost::smatch match;
if (boost::regex_search(line, match, e))
{
std::string s1(match[1].first, match[1].second);
num = boost::lexical_cast<float>(s1); //convert to float
cout << num << endl;
}
However, I get this error: The error occurred while parsing the regular expression fragment: '/([0-9]{5}>>>HERE>>>)]'.

You need to double escape the [ and ] that special characters in regex denoting character classes. The correct regex declaration will be
boost::regex e ("\\[/([0-9]{5})\\]");
This is necessary because C++ compiler also uses a backslash to escape entities like \n, and regex engine uses the backslash to escape special characters so that they are treated like literals. Thus, backslash gets doubled. When you need to match a literal backslash, you will have to use 4 of them (i.e. \\\\).

Use the following (escape [ and ] because they are special characters in regex meaning a character class):
\\[/([0-9]{5})\\]
^^ ^^

Related

regex_replace is returning empty string

I am trying to remove all characters that are not digit, dot (.), plus/minus sign (+/-) with empty character/string for float conversion.
When I pass my string through regex_replace function I am returned an empty string.
I belive something is wrong with my regex expression std::regex reg_exp("\\D|[^+-.]")
Code
#include <iostream>
#include <regex>
int main()
{
std::string temporary_recieve_data = " S S +456.789 tg\r\n";
std::string::size_type sz;
const std::regex reg_exp("\\D|[^+-.]"); // matches not digit, decimal point (.), plus sign, minus sign
std::string numeric_string = std::regex_replace(temporary_recieve_data, reg_exp, ""); //replace the character that are not digit, dot (.), plus-minus sign (+,-) with empty character/string for float conversion
std::cout << "Numeric String : " << numeric_string << std::endl;
if (numeric_string.empty())
{
return 0;
}
float data_value = std::stof(numeric_string, &sz);
std::cout << "Float Value : " << data_value << std::endl;
return 0;
}
I have been trying to evaluate my regex expression on regex101.com for past 2 days but I am unable to figure out where I am wrong with my regular expression. When I just put \D, the editor substitutes non-digit character properly but soon as I add or condition | for not dot . or plus + or minus - sign the editor returns empty string.
The string is empty because your regex matches each character.
\D already matches every character that is not a digit.
So plus, hyphen and the period thus far are consumed.
And digits get consumed by the negated class: [^+-.]
Further the hyphen indicates a range inside a character class.
Either escape it or put it at the start or end of the char-class.
(funnily the used range +-. 43-46 even contained a hyphen)
Remove the alternation with \D and put \d into the negated class:
[^\d.+-]+
See this demo at regex101 (attaching + for one or more is efficient)

Groovy complaining about illegal character range in regex

Groovy 2.4 here. I am trying to build a regex that will filter out all the following characters:
`,./;[]-&<>?:"()|
Here's my best attempt:
static void main(String[] args) {
// `,./;[]-&<>?:"()|
String regex = "`,./;[]-&<>?:\"()|"
String test = "ooekrofkrofor ` oxkeoe , wdkeodeko / kodek ] woekoedk \" swjiej ' wsjwdjeiji :"
println test.replaceAll(regex, "")
}
However this produces a compile error on the regex string definition, complaining:
illegal character range (to < from)
Not sure if this is a Java or Groovy thing, but I can't figure out how to define the regex properly so that it quiets the error and correctly strips these "illegal characters" out of my string. Any ideas?
It seems to me you want to remove all the characters listed in your regex variable. The problem is that you declared a sequence while you need a character class (enclose the characters with []).
See Groovy demo:
String regex = "[`,./;\\[\\]&<>?:\"()|-]+"
^ ^^^^^^ ^ ^
String test = "ooekrofkrofor ` oxkeoe , wdkeodeko / kodek ] woekoedk \" swjiej ' wsjwdjeiji :"
println test.replaceAll(regex, "")
Output: ooekrofkrofor oxkeoe wdkeodeko kodek woekoedk swjiej ' wsjwdjeiji
The pattern now contains a character class matching any of the characters defined inside it - [`,./;\[\]&<>?:\"()|-] - one or more times due to the + quantifier. Note that inside the character class, ] and [ must always be escaped, and the - can be left unescaped when placed at the start/end of the character class.
You need to escape a few special characters in your pattern:
String regex = "[`,./;\\[]\\-&<>?:\"\\(\\)|]+"
Note using double \\ to turn them into a single \ in the string, so when the pattern is parsed, the next character is escaped.

How to match a string with an opening brace { in C++ regex

I have about writing regexes in C++. I have 2 regexes which work fine in java. But these throws an error namely
one of * + was not preceded by a valid regular expression C++
These regexes are as follows:
regex r1("^[\s]*{[\s]*\n"); //Space followed by '{' then followed by spaces and '\n'
regex r2("^[\s]*{[\s]*\/\/.*\n") // Space followed by '{' then by '//' and '\n'
Can someone help me how to fix this error or re-write these regex in C++?
See basic_regex reference:
By default, regex patterns follow the ECMAScript syntax.
ECMAScript syntax reference states:
characters:
\character
description: character
matches: the character character as it is, without interpreting its special meaning within a regex expression.
Any character can be escaped except those which form any of the special character sequences above.
Needed for: ^ $ \ . * + ? ( ) [ ] { } |
So, you need to escape { to get the code working:
std::string s("\r\n { \r\nSome text here");
regex r1(R"(^\s*\{\s*\n)");
regex r2(R"(^\s*\{\s*//.*\n)");
std::string newtext = std::regex_replace( s, r1, "" );
std::cout << newtext << std::endl;
See IDEONE demo
Also, note how the R"(pattern_here_with_single_escaping_backslashes)" raw string literal syntax simplifies a regex declaration.

C++11 regex to tokenize Mathematical Expression

I have the following code to tokenize a string of the format: (1+2)/((8))-(100*34):
I'd like to throw an error to the user if they use an operator or character that isn't part of my regex.
e.g if user enters 3^4 or x-6
Is there a way to negate my regex, search for it and if it is true throw the error?
Can the regex expression be improved?
//Using c++11 regex to tokenize input string
//[0-9]+ = 1 or many digits
//Or [\\-\\+\\\\\(\\)\\/\\*] = "-" or "+" or "/" or "*" or "(" or ")"
std::regex e ( "[0-9]+|[\\-\\+\\\\\(\\)\\/\\*]");
std::sregex_iterator rend;
std::sregex_iterator a( infixExpression.begin(), infixExpression.end(), e );
queue<string> infixQueue;
while (a!=rend) {
infixQueue.push(a->str());
++a;
}
return infixQueue;
-Thanks
You can run a search on the string using the search expression [^0-9()+\-*/] defined as C++ string as "[^0-9()+\\-*/]" which finds any character which is NOT a digit, a round bracket, a plus or minus sign (in real hyphen), an asterisk or a slash.
The search with this regular expression search string should not return anything otherwise the string contains a not supported character like ^ or x.
[...] is a positive character class which means find a character being one of the characters in the square brackets.
[^...] is a negative character class which means find a character NOT being one of the characters in the square brackets.
The only characters which must be escaped within square brackets to be interpreted as literal character are ], \ and - whereby - must not be escaped if being first or last character in the list of characters within the square brackets. But it is nevertheless better to escape - always within square brackets as this makes it easier for the regular expression engine / function to detect that the hyphen character should be interpreted as literal character and not with meaning "FROM x to z".
Of course this expression does not check for missing closing round brackets. But formula parsers do often not require that there is always a closing parenthesis for every opening parenthesis in comparison to a compiler or script interpreter simply because not needed to calculate the value based on entered formula.
Answer is given already but perhaps someone might need this
[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\\\\(\\)\\/\\*]
This regex separates floats, integers and arithmetic operators
Heres the trick:
[0-9]?([0-9]*[.])?[0-9]+ -> if its a digit and has a point, then grab the digits with the point and the digits that follows it, if not, just grab the digits.
Sorry if my answer isn't clear, i just learned regex and found this solution by my own by just trial and errors.
Heres the code (it takes a mathematical expression and split all digits and operators into a vector)
NOTE: I don't know if it accepts whitespaces, meaning that the mathematical expression that i worked with had no whitespaces. Example: 4+2*(3+1) and would separate everything nicely, but i havent tried with whitespaces.
/* Separate every int or float or operator into a single string using regular expression and store it in untokenize vector */
string infix; //The string to be parse (the arithmetic operation if you will)
vector<string> untokenize;
std::regex words_regex("[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\\\\(\\)\\/\\*]");
auto words_begin = std::sregex_iterator(infix.begin(), infix.end(), words_regex);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
cout << (*i).str() << endl;
untokenize.push_back((*i).str());
}
Output:
(<br/>
1<br/>
+<br/>
2<br/>
)<br/>
/<br/>
(<br/>
(<br/>
8<br/>
)<br/>
)<br/>
-<br/>
(<br/>
100<br/>
*<br/>
34<br/>
)<br/>

Ignore String containing special words (Months)

I am trying to find alphanumeric strings by using the following regular expression:
^(?=.*\d)(?=.*[a-zA-Z]).{3,90}$
Alphanumeric string: an alphanumeric string is any string that contains at least a number and a letter plus any other special characters it can be # - _ [] () {} ç _ \ ù %
I want to add an extra constraint to ignore all alphanumerical strings containing the following month formats :
JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
One solution is to actually match an alphanumerical string. Then check if this string contains one of these names by using the following function:
vector<string> findString(string s)
{
vector<string> vec;
boost::regex rgx("JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
");
boost::smatch match;
boost::sregex_iterator begin {s.begin(), s.end(), rgx},
end {};
for (boost::sregex_iterator& i = begin; i != end; ++i)
{
boost::smatch m = *i;
vec.push_back(m.str());
}
return vec;
}
Question: How can I add this constraint directly into the regular expression instead of using this function.
One solution is to use negative lookahead as mentioned in How to ignore words in string using Regular Expressions.
I used it as follows:
String : 2-hello-001
Regular expression : ^(?=.*\d)(?=.*[a-zA-Z]^(?!Jan|Feb|Mar)).{3,90}$
Result: no match
Test website: http://regexlib.com/
The edit provided by #Robin and #RyanCarlson : ^[][\w#_(){}ç\\ù%-]{3,90}$ works perfectly in detecting alphanumeric strings with special characters. It's just the negative lookahead part that isn't working.
You can use negative look ahead, the same way you're using positive lookahead:
(?=.*\d)(?=.*[a-zA-Z])
(?!.*(?:JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)).{3,90}$
Also you regex is pretty unclear. If you want alphanumerical strings with a length between 3 and 90, you can just do:
/^(?!.*(?:JANVIER|F[Eé]VRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AO[Uù]T|SEPTEMBRE|OCTOBRE|NOVEMBRE|D[Eé]CEMBRE|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))
[][\w#_(){}ç\\ù%-]{3,90}$/i
the i flag means it will match upper and lower case (so you can reduce your forbidden list), \w is a shortcut for [0-9a-zA-Z_] (careful if you copy-paste, there's a linebreak here for readability between (?! ) and [ ]). Just add in the final [...] whatever special characters you wanna match.