Qt 4.8.4 MAC Address QRegExp - regex

I'm trying to get Qt to match a MAC Address ( 1a:2b:3c:4d:5e:6f ) using a QRegExp. I can't seem to get it to match - what am I doing wrong?
I am forcing it to try and match the string:
"48:C1:AC:55:86:F3"
Here are my attempts:
// Define a RegEx to match the mac address
//QRegExp regExMacAddress("[0-9a-F]{1,2}[\.:-]){5}([0-9a-F]{1,2}");
//QRegExp regExMacAddress("[0-9a-F]{0,2}:[0-9a-F]{0,2}:[0-9a-F]{0,2}:[0-9a-F]{0,2}:[0-9a-F]{0,2}:[0-9a-F]{0,2}");
//regExMacAddress.setPatternSyntax(QRegExp::RegExp);
// Ensure that the hexadecimal characters are upper case
hwAddress = hwAddress.toUpper();
qDebug() << "STRING TO MATCH: " << hwAddress << "MATCHED IT: " << regExMacAddress.indexIn(hwAddress) << " Exact Match: " << regExMacAddress.exactMatch(hwAddress);
// Check the mac address format
if ( regExMacAddress.indexIn(hwAddress) == -1 ) {

In your first example the opening parenthesis is missing, and \. is not a valid C++ escape sequence (it must be written \\.; see the QRegExp documentation for the explanation). In both examples a-F matches nothing, because 'a' > 'F' makes it an invalid range.
The correct answer is in kenrogers's comment, but I'll duplicate it for you:
([0-9A-F]{2}[:-]){5}([0-9A-F]{2})
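If you also want to match . as a separator, you should use (keep - last in the character class so that it is taken literally rather than as a range operator):
([0-9A-F]{2}[:.-]){5}([0-9A-F]{2})
If you also want to match lower case characters, you should use:
([0-9A-Fa-f]{2}[:.-]){5}([0-9A-Fa-f]{2})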
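For readers without Qt at hand, here is a minimal sketch of the same check using std::regex instead of QRegExp. The helper name isMacAddress is hypothetical, and the character class is written with - last so that it is a literal. std::regex_match plays the role of QRegExp::exactMatch here, since both require the whole string to match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `text` is six two-digit hex groups
// separated by ':', '.' or '-'. std::regex stands in for QRegExp.
bool isMacAddress(const std::string& text) {
    // '-' is last in the class, so it is a literal and not a range operator.
    static const std::regex pattern("([0-9A-Fa-f]{2}[:.-]){5}[0-9A-Fa-f]{2}");
    // regex_match only succeeds if the entire string matches the pattern.
    return std::regex_match(text, pattern);
}
```

Calling isMacAddress("48:C1:AC:55:86:F3") should return true without upper-casing the input first, because the class accepts both cases.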

Related

QRegExp to extract array name and index

I am parsing some strings. If I encounter something like "Foo(bar)", I want to extract "Foo" and "bar"
How do I do it using QRegExp?
First thing, if you are using Qt 5 then rather use QRegularExpression class
The QRegularExpression class introduced in Qt 5 is a big improvement upon QRegExp, in terms of APIs offered, supported pattern syntax and speed of execution.
Secondly, get a visual tool that helps when testing and defining regular expressions; I use an online one.
To get the "Foo" and "Bar" from your example, I can suggest the following pattern:
(\w+)\((\w+)\)
--------------
The above means:
(\w+) - Capture one or more word characters (capture group 1)
\( - followed by an opening parenthesis
(\w+) - then capture one or more word characters (capture group 2)
\) - followed by a closing parenthesis
This pattern must be escaped for direct usage in the Qt regular expression:
const QRegularExpression expression( "(\\w+)\\((\\w+)\\)" );
QRegularExpressionMatch match = expression.match( "Foo(bar)" );
if( match.hasMatch() ) {
qDebug() << "0: " << match.captured( 0 ); // 0 is the complete match
qDebug() << "1: " << match.captured( 1 ); // First capture group
qDebug() << "2: " << match.captured( 2 ); // Second capture group
}
Output is:
0: "Foo(bar)"
1: "Foo"
2: "bar"
See the pattern in action online here. Hover the mouse over the parts in the "Expression" box to see the explanations or over the "Text" part to see the result.
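If Qt 5 is not available, the same pattern can be tried with the standard library. This is only a sketch: the function name splitNameAndIndex is made up for illustration, and a raw string literal is used so that the pattern needs no doubled backslashes.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: splits text such as "Foo(bar)" into "Foo" and "bar".
// Returns false and leaves the outputs untouched if nothing matches.
bool splitNameAndIndex(const std::string& text, std::string& name, std::string& index) {
    // Raw string literal: the regex reads exactly as in the explanation above.
    static const std::regex pattern(R"((\w+)\((\w+)\))");
    std::smatch match;
    if (!std::regex_search(text, match, pattern)) {
        return false;
    }
    name = match[1].str();   // first capture group
    index = match[2].str();  // second capture group
    return true;
}
```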

Regex for replacing printf-style calls with ostream left-shift syntax

The logging facility for our C++ project is about to be refactored to use repeated left-shift operators (in the manner of Qt's qDebug() syntax) instead of printf-style variadic functions.
Suppose the logging object is called logger. Let's say we want to show the ip and port of the server we connected to. In the current implementation, the usage is:
logger.logf("connected to %s:%d", ip, port);
After the refactor, the above call would become:
logger() << "connected to" << ip << ":" << port;
Manually replacing all these calls would be extremely tedious and error-prone, so naturally, I want to use a regex. As a first pass, I could replace the .logf(...) call, yielding
logger() "connected to %s:%d", ip, port;
However, reformatting this string to the left-shift syntax is where I have trouble. I managed to create the separate regexes for capturing printf placeholders and comma-delimited arguments. However, I don't know how to properly correlate the two.
In order to avoid repetition of the fairly unwieldy regexes, I will use the placeholder (printf) to refer to the printf placeholder regex (returning the named group token), and (args) to refer to the comma-delimited arguments regex (returning the named group arg). Below, I will give the outputs of various attempts applied to the relevant part of the above line, i.e.:
"connected to %s:%d", ip, port
/(printf)(args)/g produces no match.
/(printf)*(args)/g produces two matches, containing ip and port in the named group arg (but nothing in token).
/(printf)(args)*/g achieves the opposite result: it produces two matches, containing %s and %d in the named group token, but nothing in arg.
/(printf)*(args)*/g returns 3 matches: the first two contain %s and %d in token, the third contains port in arg. However, regex101 reports "20 matches - 207 steps" and seems to match before every character.
I figured that perhaps I need to specify that the first capturing group is always between double quotes. However, neither /"(printf)"(args)/g nor /"(printf)(args)/g produces any matches.
/(printf)"(args)/g produces one (incorrect) match, containing %d in group token and ip in arg, and substitution consumes the entire string between those two strings (so entering # for the substitution string results in "connected to %s:#, port). Obviously, this is not the desired outcome, but it's the only version where I could at least get both named groups in a single match.
Any help is greatly appreciated.
Edited to correct broken formatting
Disclaimer: This is a workaround; it's far from perfect and may lead to errors. Be careful when you commit the changes and, if you can, have a colleague proofread the diff to reduce the chance of breakage.
You may try this multi-step replacement, working from the maximum number of arguments you have in the solution down to the minimum (here I'll go from 3 to 0).
Let's consider logger.logf("connected to %s:%d some %s random text", ip, port, test);
You can match this with this regex: logger.logf\("(.*?)(%[a-z])(.*?)(%[a-z])(.*?)(%[a-z])(.*?)",(.*?)(?:, (.*?))?(?:, (.*?))?\); which will give you the following groups:
1. [75-88] `connected to `
2. [88-90] `%s`
3. [90-91] `:`
4. [91-93] `%d`
5. [93-99] ` some `
6. [99-101] `%s`
7. [101-113] ` random text`
8. [115-118] ` ip`
9. [120-124] `port`
10. [126-130] `test`
Replace with logger() << "\1" << \8 << "\3" << \9 << "\5" << \10 << "\7"; will give you
logger() << "connected to " << ip << ":" << port << " some " << test << " random text";
Now the step with 2 args: the example string is logger.logf("connected to %s:%d some random text", ip, port); and the corresponding regex is logger.logf\("(.*?)(%[a-z])(.*?)(%[a-z])(.*?)",(.*?)(?:, (.*?))?\);
The matching is the following:
1. [13-26] `connected to `
2. [26-28] `%s`
3. [28-29] `:`
4. [29-31] `%d`
5. [31-48] ` some random text`
6. [50-53] ` ip`
7. [55-59] `port`
And the replace string: logger() << "\1" << \6 << "\3" << \7 << "\5"; outputs:
logger() << "connected to " << ip << ":" << port << " some random text";
Finally, the step with 1 arg:
Input: logger.logf("Some %s text", port);
Regex: logger.logf\("(.*?)(%[a-z])(.*?)",(.*?)\);
Replacement: logger() << "\1" << \4 << "\3";
Output: logger() << "Some " << port << " text";
What about empty groups?
Let's say input is not logger.logf("Some %s text", port); but logger.logf("Some %s", port);. The output will then be:
logger() << "Some " << port << "";
You'll have to remove << "" to get something clean.
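As an alternative to running one regex per argument count, the interleaving can be sketched as a small function. This is not the answer's method, only an illustration under assumptions: the name toStreamCall is hypothetical, the format string and the argument expressions are assumed to be already separated, placeholders are matched with the same %[a-z] pattern as above (so %% and width specifiers such as %02d are not handled), and placeholders without a matching argument are silently dropped. Empty literals are skipped, which avoids the trailing << "".

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: rebuilds a printf-style call as a chain of << operators.
// `format` is the format string without its quotes; `args` are the argument
// expressions in order.
std::string toStreamCall(const std::string& format, const std::vector<std::string>& args) {
    static const std::regex placeholder("%[a-z]");  // same simplification as above
    std::string out = "logger()";
    std::size_t argIndex = 0;
    std::size_t literalStart = 0;

    // Appends a quoted literal, skipping empty ones to avoid `<< ""`.
    auto appendLiteral = [&out](const std::string& text) {
        if (!text.empty()) {
            out += " << \"" + text + "\"";
        }
    };

    for (std::sregex_iterator it(format.begin(), format.end(), placeholder), end; it != end; ++it) {
        const std::size_t matchPos = static_cast<std::size_t>(it->position());
        appendLiteral(format.substr(literalStart, matchPos - literalStart));
        if (argIndex < args.size()) {
            out += " << " + args[argIndex++];
        }
        literalStart = matchPos + static_cast<std::size_t>(it->length());
    }
    appendLiteral(format.substr(literalStart));  // text after the last placeholder
    return out + ";";
}
```

For example, toStreamCall("Some %s", {"port"}) yields logger() << "Some " << port; with no empty literal at the end. The generated code should still be reviewed by hand, for the reasons given in the disclaimer.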

Regex grouping matches with C++ 11 regex library

I'm trying to use a regex for group matching. I want to extract two strings from one big string.
The input string looks something like this:
tХB:Username!Username#Username.tcc.domain.com Connected
tХB:Username!Username#Username.tcc.domain.com WEBMSG #Username :this is a message
tХB:Username!Username#Username.tcc.domain.com Status: visible
The Username can be anything. Same goes for the end part this is a message.
What I want to do is extract the Username that comes after the pound sign #. Not from any other place in the string, since that can vary as well. I also want to get the message from the string that comes after the colon :.
I tried that with the following regex. But it never outputs any results.
regex rgx("WEBMSG #([a-zA-Z0-9]) :(.*?)");
smatch matches;
for(size_t i=0; i<matches.size(); ++i) {
cout << "MATCH: " << matches[i] << endl;
}
I'm not getting any matches. What is wrong with my regex?
Your regular expression is incorrect because neither capture group does what you want. The first is looking to match a single character from the set [a-zA-Z0-9] followed by <space>:, which works for single character usernames, but nothing else. The second capture group will always be empty because you're looking for zero or more characters, but also specifying the match should not be greedy, which means a zero character match is a valid result.
Fixing both of these your regex becomes
std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
But simply instantiating a regex and a match_results object does not produce matches, you need to apply a regex algorithm. Since you only want to match part of the input string the appropriate algorithm to use in this case is regex_search.
std::regex_search(s, matches, rgx);
Putting it all together
std::string s{R"(
tХB:Username!Username#Username.tcc.domain.com Connected
tХB:Username!Username#Username.tcc.domain.com WEBMSG #Username :this is a message
tХB:Username!Username#Username.tcc.domain.com Status: visible
)"};
std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
std::smatch matches;
if(std::regex_search(s, matches, rgx)) {
std::cout << "Match found\n";
for (size_t i = 0; i < matches.size(); ++i) {
std::cout << i << ": '" << matches[i].str() << "'\n";
}
} else {
std::cout << "Match not found\n";
}
Live demo
"WEBMSG #([a-zA-Z0-9]) :(.*?)"
This regex only matches strings that contain a one-character username followed by a message after the colon, and the second group will always be empty, because a non-greedy .*? with nothing after it matches as few characters as possible, which is zero.
This should work:
"WEBMSG #([a-zA-Z0-9]+) :(.*)"
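The difference between the two patterns can be checked with a small sketch. The helper captureMessage and its "<no match>" marker are assumptions made for this illustration; it simply runs std::regex_search and returns the second capture group.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the second capture group (the message),
// or the marker "<no match>" if the pattern is not found in the line.
std::string captureMessage(const std::string& line, const std::string& pattern) {
    const std::regex rgx(pattern);
    std::smatch matches;
    if (!std::regex_search(line, matches, rgx)) {
        return "<no match>";
    }
    return matches[2].str();
}
```

With the original pattern, a multi-character username gives no match at all, and a one-character username such as in "WEBMSG #a :hello" matches but leaves the message group empty. With the corrected pattern the message group holds the rest of the line.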

QRegExp not finding expected string pattern

I am working in Qt 5.2, and I have a piece of code that takes in a string and enters one of several if statements based on its format. One of the formats searched for is the letters "RCV", followed by a variable number of digits, a decimal point, and then one more digit. There can be more than one of these values in the line, separated by "|"; for example, it could be one value like "RCV0123456.1" or multiple values like "RCV12345.1|RCV678.9". Right now I am using the QRegExp class to find this, like this:
QString value = "RCV000030249.2|RCV000035360.2"; //Note: real test value from my code
if(QRegExp("^[RCV\d+\.\d\|?]+$").exactMatch(value))
std::cout << ":D" << std::endl;
else
std::cout << ":(" << std::endl;
I want it to use the if statement, but it keeps going into the else statement. Is there something I'm doing wrong with the regular expression?
Your expression should be written as @vahancho mentioned in a comment:
if(QRegExp("^[RCV\\d+\\.\\d\\|?]+$").exactMatch(value))
If you use C++11, then you can use its raw strings feature:
if(QRegExp(R"(^[RCV\d+\.\d\|?]+$)").exactMatch(value))
Aside from escaping the backslashes, which others have mentioned in answers and comments,
There can be more than one of these values in the line, separated by "|", for example it could be one value like "RCV0123456.1" or multiple values like "RCV12345.1|RCV678.9".
[RCV\d+\.\d\|?] may not be doing what you expect. Perhaps you want () instead of []:
/^
[RCV\d+\.\d\|?]+ # One or more characters from the list:
# R, C, V, a digit, a +, a dot, a digit, a |, a ?
$/x
/^
(
RCV\d+\.\d # RCV, some digits, a dot, followed by a digit
\|? # Optional: a |
)+ # Quantifier of one or more
$/x
Also, maybe you could revise the regex such that each | requires the group to be matched *again* (note that (?1) is PCRE subpattern-call syntax, which QRegExp does not support):
/^
(RCV\d+\.\d) # RCV, some digits, a dot, followed by a digit
(
\|(?1) # A |, then match subpattern 1 (Above)
)* # Quantifier of zero or more, so a single value still matches
$/x
To check that the line contains only valid occurrences, and to require a | before the second and every later occurrence (your implementation would not require the |, even once the escaping is fixed):
QString value = "RCV000030249.2|RCV000035360.2"; //Note: real test value from my code
if(QRegExp("^RCV\\d+\\.\\d(\\|RCV\\d+\\.\\d)*$").exactMatch(value))
std::cout << ":D" << std::endl;
else
std::cout << ":(" << std::endl;
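The difference between the character-class pattern and the grouped pattern can be demonstrated without Qt. The sketch below uses std::regex as a stand-in for QRegExp, and both function names are invented for the example. std::regex_match already requires the whole string to match, so the ^ and $ anchors are left out.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true for one or more RCV<digits>.<digit> values,
// where each additional value must be preceded by '|'.
bool isRcvList(const std::string& value) {
    static const std::regex strict(R"(RCV\d+\.\d(\|RCV\d+\.\d)*)");
    return std::regex_match(value, strict);
}

// Hypothetical helper: the original character-class pattern, for comparison.
// It accepts any mix of R, C, V, digits, '+', '.', '|' and '?'.
bool matchesCharacterClass(const std::string& value) {
    static const std::regex loose(R"([RCV\d+\.\d\|?]+)");
    return std::regex_match(value, loose);
}
```

Both functions accept "RCV000030249.2|RCV000035360.2", but only the character-class version also accepts nonsense such as "???".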

QRegExp not extracting text as expected

I am trying to extract text from between square brackets on a line of text. I've been messing with the regex for some time now, and cannot get what I need. (I can't even explain why the output is what it is). Here's the code:
QRegExp rx_timestamp("\[(.*?)\]");
int pos = rx_timestamp.indexIn(line);
if (pos > -1) {
qDebug() << "Captured texts: " << rx_timestamp.capturedTexts();
qDebug() << "timestamp cap: " <<rx_timestamp.cap(0);
qDebug() << "timestamp cap: " <<rx_timestamp.cap(1);
qDebug() << "timestamp cap: " <<rx_timestamp.cap(2);
} else qDebug() << "No indexin";
The input line is:
messages:[2013-10-08 09:13:41] NOTICE[2366] chan_sip.c: Registration from '"xx000 <sip:xx000#183.229.164.42:5060>' failed for '192.187.100.170' - No matching peer found
And the output is:
Captured texts: (".")
timestamp cap: "."
timestamp cap: ""
timestamp cap: ""
Can someone explain what is going on? Why is cap returning "." when no such character exists between the square brackets?
Can someone correct the regex to extract the timestamp from between the square brackets?
You are missing two things. Escaping the backslash, and using setMinimal. See below.
QString line = "messages:[2013-10-08 09:13:41] NOTICE[2366] chan_sip.c: Registration from '\"xx000 <sip:xx000#183.229.164.42:5060>' failed for '192.187.100.170' - No matching peer found";
QRegExp rx_timestamp("\\[(.*)\\]");
rx_timestamp.setMinimal(true);
int pos = rx_timestamp.indexIn(line);
if (pos > -1) {
qDebug() << "Captured texts: " << rx_timestamp.capturedTexts();
qDebug() << "timestamp cap: " <<rx_timestamp.cap(0);
qDebug() << "timestamp cap: " <<rx_timestamp.cap(1);
qDebug() << "timestamp cap: " <<rx_timestamp.cap(2);
} else qDebug() << "No indexin";
Output:
Captured texts: ("[2013-10-08 09:13:41]", "2013-10-08 09:13:41")
timestamp cap: "[2013-10-08 09:13:41]"
timestamp cap: "2013-10-08 09:13:41"
timestamp cap: ""
UPDATE: What is going on:
A backslash in C++ source code starts an escape sequence, such as \n. To have a backslash show up in a regular expression you have to escape the backslash itself, like so: \\. The regular expression engine then sees a single \, just as it would in Ruby, Perl or Python.
The square brackets should be escaped, too, because in a regex they normally delimit a character class (a set or range of characters).
So for the regular expression engine to see a literal square bracket character you need to send it
\[
but a C++ source file can't get a \ character into a string without two of them in a row, so it turns into
\\[
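The two levels of escaping can be checked in plain C++. The sketch below is an illustration with std::regex rather than QRegExp, and the names kEscapedPattern, kRawPattern and firstBracketed are made up for it. One difference to be aware of: std::regex with its default ECMAScript grammar accepts the lazy quantifier .*? directly, whereas QRegExp needs setMinimal(true) as shown above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The same regex written twice: with doubled backslashes in an ordinary
// literal, and unescaped in a C++11 raw string literal.
const std::string kEscapedPattern = "\\[(.*?)\\]";
const std::string kRawPattern = R"(\[(.*?)\])";

// Hypothetical helper: returns the text inside the first [...] pair,
// or an empty string if the line contains no such pair.
std::string firstBracketed(const std::string& line) {
    static const std::regex pattern(kRawPattern);
    std::smatch match;
    return std::regex_search(line, match, pattern) ? match[1].str() : std::string();
}
```

Both constants hold the identical eleven characters \[(.*?)\], which is what the regex engine actually receives.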
While learning regex, I liked using this regex tool by GSkinner. It has a listing on the right hand side of the page of unique codes and characters.
QRegExp doesn't follow common regex syntax exactly. If you study the documentation you will find a lot of small differences, such as how it handles greedy vs. lazy matching.
QRegExp and double-quoted text for QSyntaxHighlighter
How the captures are listed is pretty typical of regex parsers, as far as I have seen. The capture list starts with the entire match, followed by the first capture group (whatever was enclosed by the first set of parentheses), and so on.
http://qt-project.org/doc/qt-5.0/qtcore/qregexp.html#cap
http://qt-project.org/doc/qt-5.0/qtcore/qregexp.html#capturedTexts
To find more matches, you have to iteratively call indexIn.
http://qt-project.org/doc/qt-5.0/qtcore/qregexp.html#indexIn
QString str = "offsets: 1.23 .50 71.00 6.00";
QRegExp rx("\\d*\\.\\d+"); // primitive floating point matching
int count = 0;
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
++count;
pos += rx.matchedLength();
}
// pos will be 9, 14, 18 and finally 24; count will end up as 4
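The same iteration can be sketched with the standard library, where std::sregex_iterator takes care of advancing past each match. The function name findFloats is hypothetical; the pattern is the same primitive floating point pattern as above.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every substring that matches the primitive
// floating point pattern, in order of appearance.
std::vector<std::string> findFloats(const std::string& text) {
    static const std::regex pattern(R"(\d*\.\d+)");
    std::vector<std::string> found;
    // A default-constructed iterator marks the end of the match sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

For the string "offsets: 1.23 .50 71.00 6.00" this returns the four values 1.23, .50, 71.00 and 6.00.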
Hope that helps.