Qt C++ QRegExp parse string - c++

I have a string str and want to extract two lists from it: the parts that follow '+' and the parts that follow '-':
QString str = "+asdf+zxcv-tyupo+qwerty-yyuu oo+llad dd ff";
// I need these two strings:
// 1. For '+': asdf,zxcv,qwerty,llad dd ff
// 2. For '-': tyupo,yyuu oo
QRegExp rx("[\\+\\-](\\w+)");
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
    qDebug() << rx.cap(0);
    pos += rx.matchedLength();
}
Output I need:
"+asdf"
"+zxcv"
"-tyupo"
"+qwerty"
"-yyuu oo"
"+llad dd ff"
Output I get:
"+asdf"
"+zxcv"
"-tyupo"
"+qwerty"
"-yyuu"
"+llad"
If I replace \\w with .*, the whole string is matched at once and the output is:
"+asdf+zxcv-tyupo+qwerty-yyuu oo+llad dd ff"

You can use the following regex:
[+-]([^-+]+)
The regex breakdown:
[+-] - either a + or -
([^-+]+) - a capturing group matching 1 or more symbols other than - and +.

Your regexp matches more than the sign:
[\\+\\-](\\w+)
\______/\____/
    ^      ^--- one or more word characters (letters, digits, underscore)
    ^--- '+' or '-' sign
So what you are matching is the +/- sign and the word that follows it directly. Because \w does not match spaces, the match stops at the first space. If you want to capture only the +/- signs, use [+-] as the regular expression.
EDIT:
To get the strings including the spaces, you need
QRegExp rx("[+-](\\w|\\s)+");

Related

regex to match all whitespace except those between words and surrounding hyphens?

I'd like to sanitize a string so that all whitespace is removed, except whitespace between words and around hyphens:
1234 - Text | OneWord , Multiple Words | Another Text , 456 -> 1234 - Text|OneWord,Multiple Words|Another Text,456
std::regex regex(R"(\B\s+|\s+\B)"); // get rid of whitespace except between words
auto newStr = std::regex_replace(str, regex, "*");
newStr = std::regex_replace(newStr, std::regex(R"(\*-\*)"), " - "); // '*' must be escaped to match literally
newStr = std::regex_replace(newStr, std::regex(R"(\*)"), "");
This is what I currently use, but it is rather ugly, and I'm wondering if there is a regex that does this in one go.
You can use
(\s+-\s+|\b\s+\b)|\s+
Replace with $1, a backreference to the substring captured in Group 1, so whitespace matched by Group 1 is kept and all other whitespace is removed. Details:
(\s+-\s+|\b\s+\b) - Group 1: a - with one or more whitespaces on both sides, or one or more whitespaces in between word boundaries
| - or
\s+ - one or more whitespaces.
See the C++ demo:
std::string s("1234 - Text | OneWord , Multiple Words | Another Text , 456");
std::regex reg(R"((\s+-\s+|\b\s+\b)|\s+)");
std::cout << std::regex_replace(s, reg, "$1") << std::endl;
// => 1234 - Text|OneWord,Multiple Words|Another Text,456

How to find the exact substring with regex in c++11?

I am trying to find substrings that are not surrounded by other a-zA-Z0-9 symbols.
For example: I want to find substring hello, so it won't match hello1 or hellow but will match Hello and heLLo!##$%.
Here is my sample code:
std::string s = "1mySymbol1, /_mySymbol_ mysymbol";
const std::string sub = "mysymbol";
std::regex rgx("[^a-zA-Z0-9]*" + sub + "[^a-zA-Z0-9]*", std::regex::icase);
std::smatch match;
while (std::regex_search(s, match, rgx)) {
    std::cout << match.size() << "match: " << match[0] << '\n';
    s = match.suffix();
}
The result is:
1match: mySymbol
1match: , /_mySymbol_
1match: mysymbol
But I don't understand why the first occurrence, 1mySymbol1, also matches my regex.
How can I create a proper regex that ignores such strings?
Update:
If I change it like this:
std::string s = "mySymbol, /_mySymbol_ mysymbol";
const std::string sub = "mysymbol";
std::regex rgx("[^a-zA-Z0-9]+" + sub + "[^a-zA-Z0-9]+", std::regex::icase);
then I find only the substring in the middle:
1match: , /_mySymbol_
and not the substrings at the beginning and at the end.
The regex [^a-zA-Z0-9]* will match 0 or more characters, so it's perfectly valid for [^a-zA-Z0-9]*mysymbol[^a-zA-Z0-9]* to match mysymbol in 1mySymbol1 (allowing for case insensitivity). As you saw, this is fixed when you use [^a-zA-Z0-9]+ (matching 1 or more characters) instead.
With your update, you see that this doesn't match strings at the beginning or end. That's because [^a-zA-Z0-9]+ has to match 1 or more characters (which don't exist at the beginning or end of the string).
You have a few options:
Use beginning/end anchors: (?:[^a-zA-Z0-9]+|^)mysymbol(?:[^a-zA-Z0-9]+|$) (non-alphanumeric OR beginning of string, followed by mysymbol, followed by non-alphanumeric OR end of string).
Use negative lookahead and negative lookbehind: (?<![a-zA-Z0-9])mysymbol(?![a-zA-Z0-9]) (match mysymbol which doesn't have an alphanumeric character before or after it). Note that with this approach the match won't include the characters before/after mysymbol. Also note that the ECMAScript grammar used by C++11 std::regex supports lookahead but not lookbehind, so this pattern needs an engine such as Boost.Regex or PCRE.
I recommend using https://regex101.com/ to play around with regular expressions. It lists all the different constructs you can use.

String Replacing in Regex

I am trying to replace text in a string using a regex. I accomplished it in C# using the same pattern, but in Swift it does not work as needed.
Here is my code:
var pattern = "\\d(\\()*[x]"
let oldString = "2x + 3 + x2 +2(x)"
let newString = oldString.stringByReplacingOccurrencesOfString(pattern, withString:"*" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
print(newString)
What I want after replacement is :
"2*x + 3 +x2 + 2*(x)"
What I am getting is :
"* + 3 + x2 +*)"
Try this:
(?<=\d)(?=x)|(?<=\d)(?=\()
This pattern does not match any characters in the given string; it matches zero-width positions between characters.
For example, (?<=\d)(?=x) matches the position between a digit and 'x'.
(?<= starts a lookbehind assertion and (?= starts a lookahead assertion.
(?<=\d)(?=\() matches the position between a digit and '('.
So the pattern before escaping is:
(?<=\d)(?=x)|(?<=\d)(?=\()
In a Swift string literal only the backslashes need to be doubled; the parentheses stay as they are:
(?<=\\d)(?=x)|(?<=\\d)(?=\\()

Extract numbers from string (Regex C++)

Let's say I have a string S = "1 this is a number=200; Val+54 4class find57"
I want to use a regex to extract only these numbers:
num[1] = 1
num[2] = 200
num[3] = 54
but not the 4 in "4class" or the 57 in "find57"; that is, only numbers that are surrounded by operators or spaces.
I tried this code, but it gives no results:
std::string str = "1 this is a number=200; Val+54 4class find57";
boost::regex re("(\\s|\\-|\\*|\\+|\\/|\\=|\\;|\n|$)([0-9]+)(\\s|\\-|\\*|\\+|\\/|\\;|\n|$)");
boost::sregex_iterator m1(str.begin(), str.end(), re);
boost::sregex_iterator m2;
for (; m1 != m2; ++m1) {
    advm1->Lines->Append((*m1)[1].str().c_str());
}
By the way, I'm using C++ Builder XE6.
Just use word boundaries. \b matches between a word character and a non-word character.
\b\d+\b
OR
\b[0-9]+\b
In a C++ string literal the backslashes must be doubled: \\b\\d+\\b or \\b[0-9]+\\b

Regular expression that matches string equals to one in a group

For example, I want to match a string that ends with the same word it begins with, so that the following strings match:
aaa dsfj gjroo gnfsdj riier aaa
sdf foiqjf skdfjqei adf sdf sdjfei sdf
rew123 jefqeoi03945 jq984rjfa;p94 ajefoj384 rew123
This one could do the job:
/^(\w+\b).*\b\1$/
explanation:
/ : regex delimiter
^ : start of string
( : start capture group 1
\w+ : one or more word character
\b : word boundary
) : end of group 1
.* : any number of any char
\b : word boundary
\1 : group 1
$ : end of string
/ : regex delimiter
M42's answer is fine except for the degenerate case: it will not match a string that consists of only one word. To accept those within a single regexp, use:
/^(?:(\w+\b).*\b\1|\w+)$/
Also, matching only the necessary parts may be significantly faster on very large strings. Here are my solutions in JavaScript:
RegExp:
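function areEdgeWordsTheSame(str) {
    var m = str.match(/^(\w+)\b/);
    // Guard against strings with no leading word; \b keeps "aaa xaaa" from matching.
    return m !== null && new RegExp('\\b' + m[1] + '$').test(str);
}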
String:
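function areEdgeWordsTheSame(str) {
    var first = str.indexOf(' ');
    if (first < 0) return true; // a single word is both the first and the last word
    // Compare whole words, not just the last `first` characters.
    return str.slice(0, first) === str.slice(str.lastIndexOf(' ') + 1);
}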
I don't think a regular expression is the right choice here. Why not split the line into an array and compare the first and the last item?
In C#:
string[] words = line.Split(' ');
return words.Length >= 2 && words[0] == words[words.Length - 1];